Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

To: Balbir Singh <balbir@...>
Cc: Andrew Morton <akpm@...>, Pavel Emelianov <xemul@...>, <linux-kernel@...>, <linux-mm@...>
Date: Tuesday, October 2, 2007 - 11:46 am

On Tue, 2 Oct 2007, Balbir Singh wrote:

> I agree with putting the memory controller stuff on hold from 2.6.24.

Sorry, Balbir, I've failed to get back to you, still attending to
priorities.  Let me briefly summarize my issue with the mem controller:
you've not yet given enough attention to swap.

I accept that full swap control is something you're intending to add
incrementally later; but the current state doesn't make sense to me.

The problems are swapoff and swapin readahead.  These pull pages into
the swap cache, which are assigned to the cgroup (or the whatever-we-
call-the-remainder-outside-all-the-cgroups) which is running swapoff
or faulting in its own page; yet they very clearly don't (in general)
belong to that cgroup, but to other cgroups which will be discovered
later.

I did try removing the cgroup mods to mm/swap_state.c, so swap pages
get assigned to a cgroup only once it's really known; but that's not
enough by itself, because cgroup RSS reclaim doesn't touch those
pages, so the cgroup can easily OOM much too soon.  I was thinking
that you need a "limbo" cgroup for these pages, which can be attacked
for reclaim along with any cgroup being reclaimed, but from which
pages are readily migrated to their real cgroup once that's known.

But I had to switch over to other work before trying that out:
perhaps the idea doesn't really fly at all.  And it might well
be no longer needed once full mem+swap control is there.
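
To make the "limbo" idea concrete, here is a toy userspace model in C.
It is only a sketch under assumptions: every toy_* name is hypothetical,
nothing here is memory controller code, and a real version would have
to deal with LRU lists and locking that are ignored below.  Group 0
plays the limbo group; swap cache pages are charged there first, moved
to their real group once it is known, and treated as reclaim candidates
whichever group is under pressure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: group 0 is "limbo", 1..N-1 are real groups. */
enum { TOY_LIMBO = 0, TOY_NGROUPS = 3, TOY_FREE = -1 };

struct toy_page {
    int owner;                  /* group currently charged, or TOY_FREE */
};

struct toy_groups {
    long usage[TOY_NGROUPS];    /* pages charged to each group */
};

/* Swapoff or readahead brings a page in: owner unknown, charge limbo. */
static void toy_charge_limbo(struct toy_groups *g, struct toy_page *p)
{
    p->owner = TOY_LIMBO;
    g->usage[TOY_LIMBO]++;
}

/* The real owner is discovered later: move the charge out of limbo. */
static void toy_claim(struct toy_groups *g, struct toy_page *p, int real)
{
    if (p->owner != TOY_LIMBO)
        return;                 /* already claimed or freed */
    g->usage[TOY_LIMBO]--;
    g->usage[real]++;
    p->owner = real;
}

/*
 * Reclaim on behalf of group `who`: its own pages and limbo pages are
 * both candidates.  Returns the number of pages freed (at most `want`).
 */
static int toy_reclaim(struct toy_groups *g, struct toy_page *pages,
                       size_t n, int who, int want)
{
    int freed = 0;

    for (size_t i = 0; i < n && freed < want; i++) {
        int owner = pages[i].owner;

        if (owner != who && owner != TOY_LIMBO)
            continue;
        g->usage[owner]--;
        pages[i].owner = TOY_FREE;
        freed++;
    }
    return freed;
}
```

The point of the model is only the accounting: limbo pages never count
against a real group's limit, yet they cannot pile up unreclaimed.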

So in the current memory controller, that unuse_pte mem charge I was
originally worried about failing (I hadn't at that point delved in
to see how it tries to reclaim) actually never fails (and never
does anything): the page is already assigned to some cgroup-or-
whatever and is never charged to vma->vm_mm at that point.

And small point: once that is sorted out and the page is properly
assigned in unuse_pte, you'll be needing to pte_unmap_unlock and
pte_offset_map_lock around the mem_cgroup_charge call there -
you're right to call it with GFP_KERNEL, but cannot do so while
holding the page table locked and mapped.  (But because the page
lock is held, there shouldn't be any raciness to dropping and
retaking the ptl.)
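
A rough userspace sketch of that ordering, in C.  All toy_* names are
made-up stand-ins (toy_unlock for pte_unmap_unlock, toy_lock for
pte_offset_map_lock, toy_charge for a mem_cgroup_charge that may sleep);
none of this is the actual unuse_pte code.  The recheck of the pte after
relocking is an extra precaution added for the sketch, not something the
page lock argument above requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page table entry and its lock. */
struct toy_pt {
    bool ptl_held;      /* models "page table locked and mapped" */
    int pte;            /* models the pte contents */
    int charged;        /* successful charges so far */
};

static void toy_lock(struct toy_pt *pt)   { pt->ptl_held = true; }
static void toy_unlock(struct toy_pt *pt) { pt->ptl_held = false; }

/* Models a charge that may sleep: refuse if the lock is still held. */
static int toy_charge(struct toy_pt *pt)
{
    if (pt->ptl_held)
        return -1;      /* would be sleeping in atomic context */
    pt->charged++;
    return 0;
}

/*
 * Returns 0 on success, -1 if the charge failed, 1 if the pte no longer
 * holds the expected swap entry.
 */
static int toy_unuse_pte(struct toy_pt *pt, int swap_pte, int new_pte)
{
    int ret;

    toy_lock(pt);
    if (pt->pte != swap_pte) {
        toy_unlock(pt);
        return 1;
    }

    toy_unlock(pt);             /* drop the lock around the charge */
    ret = toy_charge(pt);
    toy_lock(pt);               /* retake it before touching the pte */

    if (ret) {
        toy_unlock(pt);
        return -1;
    }
    if (pt->pte != swap_pte) {  /* defensive recheck after relocking */
        pt->charged--;          /* undo the charge */
        toy_unlock(pt);
        return 1;
    }

    pt->pte = new_pte;
    toy_unlock(pt);
    return 0;
}
```

Calling toy_charge with the lock still held fails in this model, which
is the situation the drop-and-retake avoids.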

Hugh