Re: [PATCH 1/7] cgroups: Shrink struct cgroup_subsys

Previous thread: [GIT PUL] block driver updates for 2.6.37-rc1 by Jens Axboe on Friday, October 22, 2010 - 1:00 am. (1 message)

Next thread: [PATCH] rcu: force all adopted callbacks are invoked before next cpu-offline by Lai Jiangshan on Friday, October 22, 2010 - 1:23 am. (1 message)
From: Li Zefan
Date: Friday, October 22, 2010 - 1:09 am

Stephane posted a patchset to add perf_cgroup subsystem, so perf can
be used to monitor all threads belonging to a cgroup.

But if you already mounted a cgroup hierarchy but without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you're not able to use per-cgroup perf feature.

This patchset alleviates the pain, and then a subsytem can be
bound/unbound to/from a hierarchy which has sub-cgroups in it.

Some subsystems still can't take advantage of this patchset, memcgroup
and cpuset in specific.

For cpuset, if a hierarchy has a sub-cgroup and the cgroup has tasks,
we can't decide sub-cgroup's cpuset.mems and cpuset.cpus automatically
if we try to bind cpuset to this hierarchy.

For memcgroup, memcgroup uses css_get/put(), and due to some complexity,
for now bindable subsystems should not use css_get/put().

Usage:

# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt

(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt

There's another limitation, cpuacct should not be bound to any mounted
hierarchy before the above operation. But that's not a problem, as you
can remove it from a hierarchy and bind it to another one.

---
 Documentation/cgroups/cgroups.txt |   26 ++-
 include/linux/cgroup.h            |   25 +++-
 kernel/cgroup.c                   |  305 +++++++++++++++++++++++++++++++------
 kernel/cgroup_freezer.c           |   19 ++-
 kernel/sched.c                    |    1 +
 net/sched/cls_cgroup.c            |    1 +
 security/device_cgroup.c          |    1 +
 7 files changed, 311 insertions(+), 67 deletions(-)
--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:09 am

On x86_32, sizeof(struct cgroup_subsys) shrinks from 276 bytes
to 264.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..e23ded6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -481,14 +481,16 @@ struct cgroup_subsys {
 	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
 
 	int subsys_id;
-	int active;
-	int disabled;
-	int early_init;
+
+	unsigned int active:1;
+	unsigned int disabled:1;
+	unsigned int early_init:1;
 	/*
 	 * True if this subsys uses ID. ID is not available before cgroup_init()
 	 * (not available in early_init time.)
 	 */
-	bool use_id;
+	unsigned int use_id:1;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
-- 
1.7.0.1

--

From: Paul Menage
Date: Thursday, October 28, 2010 - 4:34 pm

Acked-by: Paul Menage <menage@google.com>

Maybe use "bool" here?

--

From: Li Zefan
Date: Sunday, November 7, 2010 - 10:23 pm

(Sorry for the delayed reply)


Do you mean:

bool active;
bool disabled;
...

?

With alignment 5-8 bool values == 8 bytes in 64-bit machine, compared to
--

From: Paul Menage
Date: Tuesday, November 9, 2010 - 2:05 pm

I meant specifying it as:

bool active:1;
bool disabled:1;

i.e. keeping the bit-sized flags but also keeping the bool semantics.
Having said that, I'm not really sure why saving 12 bytes per
subsystem is worth a patch.

Paul
--

From: Li Zefan
Date: Tuesday, November 9, 2010 - 5:52 pm

Every thing that reduces code size (without sacrifice readability
and maintain maintainability) should be worth.

This is one of the reasons we accept patches that replacing
kmalloc+memset with kzalloc, which just saves 8 bytes in my box.
--

From: Paul Menage
Date: Tuesday, November 9, 2010 - 6:53 pm

Are you sure? I don't have a buildable kernel tree at the moment, but
the following fragment compiled fine for me (with gcc 4.4.3):

struct foo {
  _Bool b1:1;
  _Bool b2:1;
};

and was sized at one byte. And "bool" is just a typedef of _Bool in

Agreed, within reason. But this patch doesn't reduce code size - it
makes the code fractionally more complicated and reduces the *binary*

Replacing two function calls with one function call is a code
simplification and hence (generally) a good thing - the minuscule
reduction in binary size reduction that comes with it is just noise.

Paul
--

From: Li Zefan
Date: Tuesday, November 9, 2010 - 7:06 pm

It's a commonly used skill in kernel code, so I can't say it makes
code more complicated.

That said, I'll happily drop this patch. It just came to me when I
started to add new bool values to the structure. Or if you prefer
--

From: Paul Menage
Date: Tuesday, November 9, 2010 - 7:15 pm

bool xxx:1 is fine with me - I think it's worth keeping the semantics
of these being boolean flags as obvious as possible.

I just wouldn't invest much effort in similar patches in the future
(given that there are only likely to be, what, <20 instances of
cgroup_subsys in the kernel?). Shrinking a structure that had
potentially very many instances, such as css_set or cg_cgroup_link,
would be different of course.

Paul
--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:09 am

Stephane posted a patchset to add perf_cgroup subsystem, so perf can
be used to monitor all threads belonging to a cgroup.

But if you already mounted a cgroup hierarchy but without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you're not able to use per-cgroup perf feature.

This patchset alleviates the pain, and then a subsytem can be bind/unbind
to/from a hierarchy which has sub-cgroups in it.

For a cgroup subsystem to become bindable, the can_bind flag of
struct cgroup_subsys should be set, and provide ->bind() callback
if necessary.

But for some constraints, not all subsystems can take advantage of
this patch. For example, we can't decide a cgroup's cpuset.mems and
cpuset.cpus automatically, so cpuset is not bindable.

Usage:

# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(assume cpuacct is bindable, and we add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |    5 +
 kernel/cgroup.c        |  225 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 187 insertions(+), 43 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e23ded6..49369ff 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,6 +490,11 @@ struct cgroup_subsys {
 	 * (not available in early_init time.)
 	 */
 	unsigned int use_id:1;
+	/*
+	 * Indicate if this subsystem can be bound/unbound to/from a cgroup
+	 * hierarchy which has child cgroups.
+	 */
+	unsigned int can_bind:1;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6c36750..46df5f8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,7 @@
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
 #include <linux/eventfd.h>
 #include <linux/poll.h>
+#include <linux/bitops.h>
 
 #include ...
From: Peter Zijlstra
Date: Friday, October 22, 2010 - 5:47 am

You mean to say that you cannot add cpuset to an existing hierarchy
right? Not that you cannot add perf/cpuacct to an existing cpuset
hierarchy?


--

From: Li Zefan
Date: Sunday, October 24, 2010 - 5:59 pm

Exactly. You can see from the example. 
--

From: Matt Helsley
Date: Friday, October 22, 2010 - 2:38 pm

You could just say it's a depth-first walk except we process the parent before


Is there something wrong with the indentation here? I can't
see the closing brace for the "if (root->number_of_cgroups > 1)"

I think you could avoid the can_bind flag field entirely and do:

			if (!subsys[i]->bind)

Also, if we're going with my "split out unbind" suggestion I think
the part here would be:

		for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT)
			if (!subsys[i]->unbind)
				return -EBUSY;

<snip>

Cheers,
	-Matt Helsley
--

From: Li Zefan
Date: Sunday, October 24, 2010 - 6:23 pm

Nope. For some subsystems we just set the flag and need not to
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 4:57 pm

The standard term for that is "pre-order traversal". You shouldn't
need a diagram.

Paul
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 4:55 pm

Maybe call this "bindable"?

Basic idea looks great, it could do with a bunch more comments, and
maybe locking rules.

Is there any chance of a lock inversion between dir->i_mutex and
cgroup_lock in hierarchy_popuiate_dir() ?

Paul
--

From: Li Zefan
Date: Sunday, November 7, 2010 - 10:26 pm

The lock order is:

mutex_lock(&dir->i_mutex)
  mutex_lock(&cgroup_mutex)
    mutex_lock(&dir->i_mutex, I_MUTEX_CHILD)

it should be safe.
--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:10 am

This allows us to unbind a cgroup subsystem from a hierarchy
which has sub-cgroups in it.

Due to some complexity, for now subsytems that use css_get/put()
can't be unbound from such a hierarchy.

Usage:

# mount -t cgroup -o cpuset,cpuacct xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup.c |   76 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 46df5f8..9ce3fdb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1036,19 +1036,42 @@ static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
 	return ret;
 }
 
-static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+static int hierarchy_remove_css(struct cgroup *cgrp, void *data)
 {
-	unsigned long added_bits = (unsigned long)data;
+	unsigned long removed_bits = (unsigned long)data;
 	int i;
+
+	/*
+	 * There might be some pointers to the cgroup_subsys_state
+	 * that we are going to destroy.
+	 */
+	synchronize_rcu();
+
+	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		subsys[i]->destroy(subsys[i], cgrp);
+		cgrp->subsys[i] = NULL;
+	}
+
+	return 0;
+}
+
+static void hierarchy_update_css_sets(struct cgroup *cgrp,
+				      unsigned long bits, bool add)
+{
 	struct cg_cgroup_link *link;
+	int i;
 
 	write_lock(&css_set_lock);
 	list_for_each_entry(link, &cgrp->css_sets, cgrp_link_list) {
 		struct css_set *cg = link->cg;
 		struct hlist_head *hhead;
 
-		for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
-			cg->subsys[i] = cgrp->subsys[i];
+		for_each_set_bit(i, &bits, CGROUP_SUBSYS_COUNT) {
+			if (add)
+				cg->subsys[i] = cgrp->subsys[i];
+			else
+				cg->subsys[i] = dummytop->subsys[i];
+		}
 
 		/* rehash */
 		hlist_del(&cg->hlist);
@@ -1056,7 +1079,21 @@ static int ...
From: Paul Menage
Date: Thursday, October 28, 2010 - 5:02 pm

Should these technically be rcu_assign_pointer() ?

Paul
--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:11 am

For those subsystems (debug, cpuacct, net_cls and devices),
setting the can_bind flag is sufficient.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup.c          |    1 +
 kernel/sched.c           |    1 +
 net/sched/cls_cgroup.c   |    1 +
 security/device_cgroup.c |    1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9ce3fdb..6364bb5 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5124,5 +5124,6 @@ struct cgroup_subsys debug_subsys = {
 	.destroy = debug_destroy,
 	.populate = debug_populate,
 	.subsys_id = debug_subsys_id,
+	.can_bind = 1,
 };
 #endif /* CONFIG_CGROUP_DEBUG */
diff --git a/kernel/sched.c b/kernel/sched.c
index 51944e8..cae104f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9329,6 +9329,7 @@ struct cgroup_subsys cpuacct_subsys = {
 	.destroy = cpuacct_destroy,
 	.populate = cpuacct_populate,
 	.subsys_id = cpuacct_subsys_id,
+	.can_bind = 1,
 };
 #endif	/* CONFIG_CGROUP_CPUACCT */
 
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 37dff78..020ddfe 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -38,6 +38,7 @@ struct cgroup_subsys net_cls_subsys = {
 #define net_cls_subsys_id net_cls_subsys.subsys_id
 #endif
 	.module		= THIS_MODULE,
+	.can_bind	= 1,
 };
 
 
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 8d9c48f..b8136fc 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -473,6 +473,7 @@ struct cgroup_subsys devices_subsys = {
 	.destroy = devcgroup_destroy,
 	.populate = devcgroup_populate,
 	.subsys_id = devices_subsys_id,
+	.can_bind = 1,
 };
 
 int devcgroup_inode_permission(struct inode *inode, int mask)
-- 
1.7.0.1

--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:11 am

To make it bindable, we need to thaw all processes when unbinding
the freezer subsystem from a cgroup hierarchy.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h  |    3 ++-
 kernel/cgroup.c         |   22 ++++++++++++++++++++--
 kernel/cgroup_freezer.c |   19 +++++++++++++++++--
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 49369ff..1e4e1df 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -478,7 +478,8 @@ struct cgroup_subsys {
 	int (*populate)(struct cgroup_subsys *ss,
 			struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
-	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
+	void (*bind)(struct cgroup_subsys *ss, struct cgroup *cgrp,
+		     bool unbind);
 
 	int subsys_id;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6364bb5..12c1f7c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1105,6 +1105,22 @@ static int hierarchy_populate_dir(struct cgroup *cgrp, void *data)
 	return 0;
 }
 
+static int hierarchy_subsys_bind(struct cgroup *cgrp, void *data)
+{
+	struct cgroup_subsys *ss = data;
+
+	ss->bind(ss, cgrp, false);
+	return 0;
+}
+
+static int hierarchy_subsys_unbind(struct cgroup *cgrp, void *data)
+{
+	struct cgroup_subsys *ss = data;;
+
+	ss->bind(ss, cgrp, true);
+	return 0;
+}
+
 /*
  * Call with cgroup_mutex held. Drops reference counts on modules, including
  * any duplicate ones that parse_cgroupfs_options took. If this function
@@ -1172,7 +1188,8 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 			list_move(&ss->sibling, &root->subsys_list);
 			ss->root = root;
 			if (ss->bind)
-				ss->bind(ss, cgrp);
+				cgroup_walk_hierarchy(hierarchy_subsys_bind,
+						      ss, cgrp);
 			mutex_unlock(&ss->hierarchy_mutex);
 			/* refcount was already taken, and we're keeping it */
 		} else if (bit & removed_bits) {
@@ -1180,7 +1197,8 @@ static int ...
From: Matt Helsley
Date: Friday, October 22, 2010 - 1:57 pm

Based on experience using cgroups and questions we've fielded in the
past on IRC I'd say users will really appreciate this.

We're planning to use the freezer in checkpoint/restart. Since
checkpoint requires the tasks to remain frozen for the duration of
the syscall we add a kernel-internal freezer subsystem interface
which prevents the cgroup from thawing. So we'll need some way to
"block" unbinding for that as well.

Perhaps the bind op should be able to return an error when
unbind == true? Although that raises the question of what to do if
only one of multiple unbinds fails..

Alternately you could split the bind/unbind op function pointers
and get rid of the boolean argument. Then just don't fill in the
freezer's unbind op and refuse to unbind subsystems that lack
the unbind op. That seems a bit cleaner for now at least.

Cheers,

--

From: Matt Helsley
Date: Friday, October 22, 2010 - 2:46 pm

Or alternately to the alternate :):  add a can_unbind:1 field and keep
the boolean bind op argument. That might be best -- avoids an extra
function pointer and still splits the bind/unbind-ability of a
subsystem.

Cheers,
	-Matt Helsley
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 5:06 pm

I think I'd vote for the separate bind() / unbind() callbacks.

Paul
--

From: Matt Helsley
Date: Friday, October 22, 2010 - 2:57 pm

And in case you're curious here are the portions of the checkpoint/restart
tree which implement the above:

http://www.linux-cr.org/git/?p=linux-cr.git;a=commitdiff;h=a584c1c6b729a544c76cbb173ce...
http://www.linux-cr.org/git/?p=linux-cr.git;a=commitdiff;h=34869146ecd662352c0024e0769...

Cheers,
	-Matt Helsley
--

From: Li Zefan
Date: Sunday, October 24, 2010 - 6:15 pm

How about, we allow unbinding only when all the cgroups' state is

We can use bindable:1 along with a callback can_bind().

For some subsystems, we just set bindable to true. For freezer subsystem,
we set it to true and provide freezer_can_bind(), and return true only
if no cgroup's state is FROZEN.
--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:12 am

For now bindable subsystems should not use css_get/put(), so warn
on this misuse.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |    7 +++++--
 kernel/cgroup.c        |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 1e4e1df..10e4e02 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -80,13 +80,15 @@ struct cgroup_subsys_state {
 
 /* bits in struct cgroup_subsys_state flags field */
 enum {
-	CSS_ROOT, /* This CSS is the root of the subsystem */
-	CSS_REMOVED, /* This CSS is dead */
+	CSS_ROOT,	/* This CSS is the root of the subsystem */
+	CSS_REMOVED,	/* This CSS is dead */
+	CSS_NO_GET,	/* Forbid calling css_get/put() */
 };
 
 /* Caller must verify that the css is not for root cgroup */
 static inline void __css_get(struct cgroup_subsys_state *css, int count)
 {
+	BUG_ON(test_bit(CSS_NO_GET, &css->flags));
 	atomic_add(count, &css->refcnt);
 }
 
@@ -119,6 +121,7 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
 	if (test_bit(CSS_ROOT, &css->flags))
 		return true;
+	BUG_ON(test_bit(CSS_NO_GET, &css->flags));
 	while (!atomic_inc_not_zero(&css->refcnt)) {
 		if (test_bit(CSS_REMOVED, &css->flags))
 			return false;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 12c1f7c..88e265d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -936,6 +936,9 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
 	css->cgroup = cgrp;
 	atomic_set(&css->refcnt, 1);
 	css->flags = 0;
+	/* For now, a bindable subsystem should not call css_get/put(). */
+	if (ss->can_bind)
+		set_bit(CSS_NO_GET, &css->flags);
 	css->id = NULL;
 	if (cgrp == dummytop)
 		set_bit(CSS_ROOT, &css->flags);
-- 
1.7.0.1

--

From: Paul Menage
Date: Thursday, October 28, 2010 - 5:05 pm

Maybe re-title this patch to BUG() if a bindable subsystem calls css_get() ?

Paul

--

From: Li Zefan
Date: Friday, October 22, 2010 - 1:12 am

Provide a usage example, and update the bind() callback API.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 Documentation/cgroups/cgroups.txt |   26 +++++++++++++++++---------
 1 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..5b5382a 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -363,17 +363,23 @@ Note this will add ns to the hierarchy but won't remove memory or
 cpuset, because the new options are appended to the old ones:
 # mount -o remount,ns /dev/cgroup
 
+For some subsystems you can bind them to a mounted hierarchy or
+remove them from it, even if there're sub-cgroups in it:
+# mount -t cgroup -o freezer hier1 /dev/cgroup
+# echo $$ > /dev/cgroup/my_cgroup
+# mount -o freezer,cpuset hier1 /dev/cgroup
+(failed)
+# mount -o freezer,cpuacct hier1 /dev/cgroup
+# mount -o cpuacct hier1 /dev/cgroup
+
+Note cpuacct should be sit in the default hierarchy before remount.
+
 To Specify a hierarchy's release_agent:
 # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
   xxx /dev/cgroup
 
 Note that specifying 'release_agent' more than once will return failure.
 
-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
 Then under /dev/cgroup you can find a tree that corresponds to the
 tree of the cgroups in the system. For instance, /dev/cgroup
 is the cgroup that holds the whole system.
@@ -623,13 +629,15 @@ initialization which might be required before a task could attach.  For
 example in cpusets, no task may attach before 'cpus' and 'mems' are set
 up.
 
-void bind(struct cgroup_subsys *ss, struct cgroup *root)
+void bind(struct cgroup_subsys *ss, struct cgroup *cgrp, bool unbind)
 ...
From: KAMEZAWA Hiroyuki
Date: Sunday, October 24, 2010 - 5:36 pm

On Fri, 22 Oct 2010 16:12:25 +0800

Can this operation be used with noop cgroup ?
(IOW, perf cgroup can be attached to noop cgroup ?)

Thanks,
-Kame

--

From: Li Zefan
Date: Sunday, October 24, 2010 - 5:52 pm

Yes, you can. I've tested it.
--

From: Li Zefan
Date: Sunday, October 24, 2010 - 5:56 pm

But, wait. It stuck when I moved a task from a sub-cgroup to the root.
Need to dig more..
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 5:13 pm

This is a bit vague. How about:

For non-bindable subsystems, this will  only involve movement
between the default hierarchy (which never has sub-cgroups) and a
hierarchy that is being created/destroyed (and hence has no sub-cgroups).

For binadable subsystems, this may also involve movement between the
default hierarchy and a mounted hierarchy that's populated with
sub-cgroups.

Also, the docs should mention that a cgroup setting the can_bind flag
has to be able to support side-effect free movement of a task into any
just-created cgroup, and into the root cgroup at any time. i.e. it's
not suitable for any subsystem where can_attach() might return false
for the root cgroup or a newly-created cgroup, or attach() might have
side-effects for those same cases.

Actually, perhaps we should forbid the combination of having both an
attach() callback and can_bind=true ?

Also, post_clone() doesn't get called when creating the css hierarchy
during binding.

Paul
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 5:15 pm

Oops, ignore that - I noticed that it does.

Paul
--

From: Li Zefan
Date: Sunday, November 7, 2010 - 10:27 pm

This is much better. :)

Documentation often causes my headache due to my limited English skill.
--

From: Peter Zijlstra
Date: Friday, October 22, 2010 - 5:50 am

Right, so the only remaining problem I see with this approach is that
you cannot profile two different hierarchies at the same time, but I
can't really think of a solution to that problem (nor do I care very
much).

Seems like a nice approach, Thanks Li!
--

From: Li Zefan
Date: Sunday, October 24, 2010 - 6:07 pm

Paul had a patch to allow some subsystems to be added to multi-hierarchies,
which may help. But it forbids accessing t->cgroups, which makes this
feature of limited use.


Thanks!
--

From: Paul Menage
Date: Thursday, October 28, 2010 - 4:33 pm

I more or less dropped it since I couldn't see a big use for it - all
the examples I had were more easily implemented as basic cgroup

Not exactly - it meant that you couldn't do a fast dereference via
t->cgroups to get to your cgroup_subsys_state, since there was no
longer a unique css for each task for that subsystem - there would be
one per mounted instance of the subsystem. Instead it became necessary
to follow a chain of pointers to find the appropriate css for the
hierarchy in question.

In general this also means that it's harder for the multi-bound
subsystem have any real-world meaning - it's mostly useful for
stateless subsystems, or ones where the cgroup_subsys_state contents
have no direct connection with machine-wide data.

Paul
--

Previous thread: [GIT PUL] block driver updates for 2.6.37-rc1 by Jens Axboe on Friday, October 22, 2010 - 1:00 am. (1 message)

Next thread: [PATCH] rcu: force all adopted callbacks are invoked before next cpu-offline by Lai Jiangshan on Friday, October 22, 2010 - 1:23 am. (1 message)