Re: [RFC][-mm] Memory controller hierarchy support (v1)

To: Paul Menage <menage@...>, Pavel Emelianov <xemul@...>
Cc: YAMAMOTO Takashi <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, Balbir Singh <balbir@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Saturday, April 19, 2008 - 1:35 am

This applies on top of 2.6.25-rc8-mm2. The next version will be based
on 2.6.25-mm1.

This code is built on top of Pavel's hierarchy patches.

1. It propagates the charges upwards. A charge incurred on a cgroup
is propagated to root. If any of the counters along the hierarchy
is over limit, reclaim is initiated from the parent. We reclaim
pages from the parent and the children below it. We also keep track
of the last child from which pages were reclaimed and start from there
in the next reclaim.
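
The upward propagation described above can be sketched roughly as follows.
This is a simplified user-space model, not the code from the patch: the
struct counter type and the charge_hierarchy()/uncharge_hierarchy() names
are hypothetical, locking is omitted, and the choice to return the
over-limit counter (so that the caller knows where to start reclaim) is an
assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a resource counter; not the kernel's struct res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* NULL for the root */
};

/*
 * Charge val to c and to every ancestor up to the root.
 * Returns NULL on success. If some counter on the path would go over its
 * limit, the partial charges are rolled back and that counter is returned,
 * so the caller can reclaim from it and from the groups below it.
 */
static struct counter *charge_hierarchy(struct counter *c, unsigned long val)
{
	struct counter *cur, *undo;

	for (cur = c; cur != NULL; cur = cur->parent) {
		if (cur->usage > cur->limit || val > cur->limit - cur->usage) {
			/* undo the charges already applied below cur */
			for (undo = c; undo != cur; undo = undo->parent)
				undo->usage -= val;
			return cur;
		}
		cur->usage += val;
	}
	return NULL;
}

/* Remove val from c and from every ancestor. */
static void uncharge_hierarchy(struct counter *c, unsigned long val)
{
	for (; c != NULL; c = c->parent) {
		assert(c->usage >= val);
		c->usage -= val;
	}
}
```

Every successful charge touches each counter on the path to the root, which
is the per-charge cost raised later in this thread.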

TODOs/Open Questions

1. We need to hold cgroup_mutex while walking through the children
in reclaim. We need to figure out the best way to do so. Should
cgroups provide a helper function/macro for it?
2. Do not allow children to have a limit greater than their parent's.
3. Allow the user to select whether hierarchical support is required.
4. Fine-tune the logic for reclaiming from children.

Testing

This code was tested on a UML instance, where it compiled and worked well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

include/linux/res_counter.h | 14 ++++
kernel/res_counter.c | 42 +++++++++++---
mm/memcontrol.c | 128 +++++++++++++++++++++++++++++++++++---------
3 files changed, 148 insertions(+), 36 deletions(-)

diff -puN include/linux/res_counter.h~memory-controller-hierarchy-support include/linux/res_counter.h
--- linux-2.6.25-rc8/include/linux/res_counter.h~memory-controller-hierarchy-support 2008-04-19 11:00:28.000000000 +0530
+++ linux-2.6.25-rc8-balbir/include/linux/res_counter.h 2008-04-19 11:00:28.000000000 +0530
@@ -43,6 +43,10 @@ struct res_counter {
* the routines below consider this to be IRQ-safe
*/
spinlock_t lock;
+ /*
+ * the parent counter. used for hierarchical resource accounting
+ */
+ struct res_counter *parent;
};

/**
@@ -82,7 +86,12 @@ enum {
* helpers for accounting
*/

-void res_counter_init(struct res_counter ...

To: Balbir Singh <balbir@...>
Cc: Pavel Emelianov <xemul@...>, YAMAMOTO Takashi <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Saturday, April 19, 2008 - 11:49 am

On Fri, Apr 18, 2008 at 10:35 PM, Balbir Singh

There's already a function, cgroup_lock(). But it would be nice to
avoid such heavy locking here, particularly since memory allocations
can occur with cgroup_mutex held, which could lead to a nasty deadlock
if the allocation triggered reclaim.

One of the things that I've been considering was to put the
parent/child/sibling hierarchy explicitly in cgroup_subsys_state. This
would give subsystems their own copy to refer to, and could use their
own internal locking to synchronize with callbacks from cgroups that
might change the hierarchy. Cpusets could make use of this too, since

My thoughts on this would be:

1) Never attach a first-level child's counter to its parent. As
Yamamoto points out, otherwise we end up with extra global operations
whenever any cgroup allocates or frees memory. Limiting the total
system memory used by all user processes doesn't seem to be something
that people are going to generally want to do, and if they really do
want to they can just create a non-root child and move the whole
system into that.

The one big advantage that you currently get from having all
first-level children be attached to the root is that the reclaim logic
automatically scans other groups when it reaches the top-level - but I
think that can be provided as a special-case in the reclaim traversal,
avoiding the overhead of hitting the root cgroup that we have in this
patch.

2) Always attach other children's counters to their parents - if the
user didn't want a hierarchy, they could create a flat grouping rather
than nested groupings.
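
A minimal sketch of the attachment rule in points 1) and 2), under stated
assumptions: struct group, struct counter and group_init() are hypothetical
names for a user-space model, not existing cgroup or res_counter
interfaces. The root and its first-level children get no parent counter;
deeper groups are always linked to their parent.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified counter model; not the kernel's struct res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
	struct counter *parent;	/* NULL means "charges stop here" */
};

/* Hypothetical group with one embedded counter. */
struct group {
	struct group *parent;	/* NULL for the root group */
	struct counter res;
};

static void group_init(struct group *g, struct group *parent,
		       unsigned long limit)
{
	g->parent = parent;
	g->res.usage = 0;
	g->res.limit = limit;

	/*
	 * Rule 1: a first-level child (its parent is the root) is not
	 * attached, so charges never reach the root counter.
	 * Rule 2: any deeper child is always attached to its parent.
	 */
	if (parent != NULL && parent->parent != NULL)
		g->res.parent = &parent->res;
	else
		g->res.parent = NULL;
}
```

With this rule, a charge walk that follows res.parent stops at the
first-level group, so unrelated top-level groups never contend on a shared
root counter.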

Paul
--

To: Paul Menage <menage@...>
Cc: <balbir@...>, <xemul@...>, <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, <kamezawa.hiroyu@...>
Date: Monday, April 21, 2008 - 2:33 am

Yeah - I suppose cpusets could use it, though
it's not critical. A fair bit of work already
went into cpusets so that it would not need to
traverse this hierarchy on any critical code path,
or while holding inconvenient locks.

So cpusets shouldn't be the driving motivation
for this, but it will likely be happy to go along
for the ride.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--

To: Paul Menage <menage@...>
Cc: Pavel Emelianov <xemul@...>, YAMAMOTO Takashi <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Sunday, April 20, 2008 - 4:16 am

Very cool! I look forward to that infrastructure. I'll also look at the cpuset

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: Balbir Singh <balbir@...>
Cc: Paul Menage <menage@...>, YAMAMOTO Takashi <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Saturday, April 19, 2008 - 6:47 am

I thought about it recently. Can we have a cgroup file, which will
control whether to attach a res_counter to the parent? This will

--

To: Pavel Emelyanov <xemul@...>
Cc: Paul Menage <menage@...>, YAMAMOTO Takashi <yamamoto@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Date: Sunday, April 20, 2008 - 3:43 am

It's one of the TODOs.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: <menage@...>, <xemul@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>
Date: Saturday, April 19, 2008 - 2:56 am

i wonder how much performance impact this involves.

it increases the number of atomic ops per charge/uncharge and
makes the common case (success) of every charge/uncharge in a system

if i read it correctly, it makes us hit the last child again and again.

i think you want to reclaim from all cgroups under the curr_cgroup,
including e.g. children's children.
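
a rough sketch of such a walk, as a user-space model only: struct node,
subtree_next() and reclaim_subtree() are hypothetical names, the
"reclaimable" field stands in for real page reclaim, and no locking against
hierarchy changes is shown. the walk is iterative (pre-order, using parent
pointers), so it visits every descendant without recursion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node standing in for a cgroup. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	unsigned long reclaimable;	/* pages this group could give back */
};

/* Next node in pre-order inside the subtree of root, or NULL when done. */
static struct node *subtree_next(struct node *pos, struct node *root)
{
	if (pos->first_child != NULL)
		return pos->first_child;
	while (pos != root) {
		if (pos->next_sibling != NULL)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

/*
 * Take up to want pages from root and all of its descendants,
 * including children's children. Returns the number of pages taken.
 */
static unsigned long reclaim_subtree(struct node *root, unsigned long want)
{
	unsigned long got = 0;
	struct node *pos;

	for (pos = root; pos != NULL && got < want;
	     pos = subtree_next(pos, root)) {
		unsigned long take = pos->reclaimable;

		if (take > want - got)
			take = want - got;
		pos->reclaimable -= take;
		got += take;
	}
	return got;
}
```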

YAMAMOTO Takashi
--

To: YAMAMOTO Takashi <yamamoto@...>
Cc: <menage@...>, <xemul@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>
Date: Saturday, April 19, 2008 - 4:34 am

Yes, it does. I'll run some tests to see what the overhead looks like. The
multi-hierarchy feature is very useful though and one of the TODOs is to make

Yes, good point, I should break out the function, so that we can work around the
recursion problem. Charging can cause further recursion, since we check for

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: <balbir@...>
Cc: YAMAMOTO Takashi <yamamoto@...>, <menage@...>, <xemul@...>, <linux-kernel@...>, <linux-mm@...>, <containers@...>
Date: Sunday, April 20, 2008 - 8:41 pm

On Sat, 19 Apr 2008 14:04:00 +0530
I think multilevel cgroups are useful, but this routine's handling of the
hierarchy does not look good. One easy way to counter this is to have a child
borrow some amount of charge from its parent, which reduces the number of checks.
If you go this way, please show in your plan how the overhead can be reduced.
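
The borrowing idea could look roughly like the sketch below. It is only an
illustration under assumptions: struct pool, struct child, child_charge()
and the BORROW_BATCH size are all hypothetical, locking is omitted, and
returning unused charge to the parent on uncharge is left out.

```c
#include <assert.h>
#include <stddef.h>

#define BORROW_BATCH 32UL	/* hypothetical batch size, in pages */

/* Parent-side counter (simplified model). */
struct pool {
	unsigned long usage;
	unsigned long limit;
};

/* Child that charges against a reserve borrowed from its parent. */
struct child {
	struct pool *parent;
	unsigned long usage;	/* pages charged by this child */
	unsigned long borrowed;	/* pages already charged to the parent */
};

/* Returns 0 on success, -1 if the parent cannot cover the charge. */
static int child_charge(struct child *c, unsigned long val)
{
	if (c->usage + val > c->borrowed) {
		unsigned long need = c->usage + val - c->borrowed;
		unsigned long avail = c->parent->limit - c->parent->usage;
		unsigned long ask;

		if (need > avail)
			return -1;
		/* borrow a whole batch if possible, else exactly what is needed */
		ask = need > BORROW_BATCH ? need : BORROW_BATCH;
		if (ask > avail)
			ask = need;
		c->parent->usage += ask;
		c->borrowed += ask;
	}
	/* common case: covered by the local reserve, parent untouched */
	c->usage += val;
	return 0;
}
```

In this model the parent counter is touched once per batch rather than once
per charge, at the cost of the parent's usage overstating what the child
really holds.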

BTW, do you have ideas for children<->parent attributes other than 'limit'?
For example, 'priority' between children.

Thanks,
-Kame

--
