All control files created by cgroup subsystems are given a prefix corresponding to their subsystem name. But control files provided by cgroups itself have no prefix. Currently that set of files is just "tasks", "notify_on_release" and "release_agent", but that set is likely to expand in the future. To reduce the risk of clashes, it would make sense to prefix these files and any future ones with the "cgroup." prefix. The only reason that I can see *not* to do this would be for compatibility with 2.6.24. Do people think this is a strong enough reason to leave the existing names? If distros are planning to ship products based on 2.6.24, presumably they'd be adding their own patches anyway, so they could add a trivial prefix change patch too. (I realise this discussion would have been more useful *before* 2.6.24 shipped, but I didn't quite get round to it ...) A compromise might be to keep "tasks" unprefixed, and say that future names get the "cgroup." prefix; in this case I'd be inclined to add the prefix to notify_on_release and release_agent on the grounds that there's much less chance of breaking anyone with those files since (I suspect) they're much less used. Note that if you mount a cgroup filesystem with the "noprefix" option, which is what the cpuset filesystem wrapper does, no subsystems have prefixes, and in this case the "cgroup." prefix wouldn't be used either. So this doesn't affect any users that explicitly mount cpusets rather than cgroups. Thoughts? Paul --
This makes most sense to me. It won't break any existing software (most likely) while it seems reasonable to leave 'tasks' unprefixed as this is something that any software using any subsystem of cgroup would be using anyway and it is not that much associated with a particular subsystem. Regards, Peter Litov. --
And it makes most sense to me too, though I still doubt name collision will be a problem. --
Another stray idea, for the master of stray ideas ;), how about (1) renaming
all the cgroup files as you like, Paul M, even including tasks to cgroup.tasks
or whatever, and then (2) adding symlinks for the legacy names, such as:
tasks -> cgroup.tasks
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--
So, there have been various options suggested over the course of this thread:

1) no code changes, just stake out all names matching a certain regexp (e.g. "[a-z].*") as being potentially used by the kernel in the future; document this, and let users who are worried about name clashes avoid these names
   pros: no work involved, avoids potentially complex changes to solve a possibly non-problem.
   cons: leaves an intermingled namespace; since this would be a convention rather than an enforced rule, users might be unaware that they're setting themselves up for a fall

2) separate out the kernel-generated names and user-generated names by putting the user-generated names in a "groups" sub-directory (can be a mount option that's automatically disabled for cpusets).
   pros: completely solves problem of intermingled namespaces; makes it easier to see sub-groups at a glance
   cons: extra code, slightly more awkward to deal with in the general case, is incompatible with the code that was in mainline in the brief period of time since 2.6.24 was finalized.

3) prefix all cgroup-provided files with "cgroup."
   pros: hardly any extra code; mostly solves the namespace problem since user names are much less likely to begin with "cgroup."
   cons: changes name of "tasks" file visible in 2.6.24; doesn't help if a future new subsystem is introduced with a commonly-used name that might clash with existing user-generated names.

4) prefix all future cgroup files with "cgroup." (possibly including existing notify_on_release and release_agent files?)
   pros: no extra code, just involves slightly longer strings for future new files; no incompatibility issues
   cons: ugly inconsistency between new cgroup files and grandfathered old ones, plus same clash problems as option 3

Paul --
I don't see the mixing of kernel generated filenames with user generated names to be a practical problem. There just aren't that many names in play here.

I'd think that renaming even the few cgroup files that were published in 2.6.24 would be a fairly unacceptable incompatibility.

We could accomplish that much by decreeing that future new kernel generated names that we might add follow some stronger convention, such as the cgroup_ or appropriate subsystem prefix. No need to change the existing well known names for this reason.

Agreed.

Yuck.

You're complicating this more than necessary, to solve a problem that exists only in your imagination ;).

Simple, overlapping, namespaces really are ok, so long as the number of distinct names and

Just set some convention for future added names; that's enough to enable others adding user space names to avoid collisions. Such as using only lower case letters and underscores, or the cgroup_ or other prefix

That's sufficient reason.

Actually, in terms of 'common names used by humans' some of these names, "tasks" and "notify_on_release", date back much earlier than that.

Please don't rename these two files in cgroups; and of course absolutely don't rename them in cpusets. Please don't end up with different names of these files, depending on whether you're in cgroups or cpusets, either.

Not to me ;)

Yuck.

Lordy lordy -- a bunch of intrusive, complicating crap to solve a

Unnecessary complication. No need to burden our children and grand children with this special case, forever after, in order to solve some empty set of

I'll keep a bucket of ice water handy, for that patch ;).

One nice property of the cpuset file system is that there is exactly one directory per cpuset. The kernel creates regular files; the user creates directories; no exceptions. I encourage us to keep that state

I probably won't insist on a full force NAK so long as cpusets are unchanged, thanks to such compatibility measures; though I...
But that's part of my point - is it reasonable to describe a system That already happens - when mounted as the "cpuset" filesystem, we have names like "mems_allowed". When mounted as cgroups, we have names No, I don't like that idea either. Paul --
> Subsystem-created files already have an appropriate prefix.
Good ... such little consistencies in naming are helpful, when
voluntarily self imposed, without incompatible changes to published
In their earlier cpuset context, "tasks" and "notify_on_release" have
been well known for years.
And breaking actual compatibility, in a way that will break more than
the tiniest set of users, even over one release, should not be done
without both (1) a really good cause, and (2) a plan for migrating
Yeah - I figured you wouldn't risk my wrath changing this in cpusets ;).
Ok - it does make sense that cpuset specific files, when embedded in the
more general purpose cgroups, are adorned with name prefixes that they
didn't have in the legacy API. And besides, it is what is -- we released
it, and there's no compelling disaster forcing a change.
But generic cgroup infrastructure files, such as "tasks", have not had
such adornments. And now, that is what it is -- released, and sufficient
Good (and extra credit for saying so with fewer words and more politely
than I managed ;).
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--To me it seems quite logical that files belonging to the cgroup subsystem would have no prefix, and I don't see any good reason to add one. -serge --
On Thu, 28 Feb 2008 13:14:05 -0800 It would be easier to judge if we could see the full directory tree. Because if something is in /foo/bar/cgroup/notify_on_release then prefixing the filename with "cgroup_" seems pretty pointless. --
On Thu, Feb 28, 2008 at 1:21 PM, Andrew Morton The point would be to avoid situations where a user has code that creates a group directory called "foo", and then in a future kernel release cgroups introduces a control file called "foo". If it's prefixed, then the user just has to avoid creating groups prefixed by "cgroup." or any subsystem name, so collisions will be less likely. Paul --
Have you already run into that case? You said the set of files belong to cgroup itself is likely to increase - do you have some candidates in mind? Perhaps ones which are likely to conflict with choice taskgroup names? If anything I'd say add a 'prefix_cgroup' mount option and use it to decide whether to prefix or not (rather than use the noprefix option). -serge --
Nothing concrete right now. One example that I already proposed was the "cgroup.api" file but that's shelved for now, until such time as I It's hard to determine what likely taskgroup names would be. For my own use, pretty much every group has a unique 64-bit ID in the name so this isn't something I worry about directly affecting our systems. But The existing "noprefix" option controls whether the *subsystems* get prefixes. It's mainly there to provide backwards compatibility for cpusets. Existing cpusets clients would be expecting to find files with names like "mems_allowed" rather than "cpuset.mems_allowed". So mounting with the "noprefix" option (which happens automatically when you mount the "cpuset" filesystem wrapper) gives the same result as before. Currently "noprefix" has no effect on cgroup files, since they never have a prefix anyway. Yes, we could add a new mount option "prefixcgroup", and let people decide which they want. But I still don't see any argument *against* doing the prefixing automatically (rather than an argument that it's no better or worse) other than wanting 2.6.24 compatibility. Paul --
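The naming rules described in this message can be summarised in a few lines of Python. This is only a sketch of the rules as stated in the thread, not kernel code: the function name `control_file_name` and the `prefix_core` flag (standing in for the proposed "cgroup." prefix, or a "prefixcgroup"-style mount option) are hypothetical.

```python
def control_file_name(base, subsystem=None, noprefix=False, prefix_core=False):
    """Return the name a control file would be visible under.

    subsystem=None models a file provided by cgroups itself, such as
    "tasks", "notify_on_release" or "release_agent".
    noprefix models the existing "noprefix" mount option.
    prefix_core is hypothetical: it models the proposal to give
    cgroup-provided files a "cgroup." prefix.
    """
    if noprefix:
        # With "noprefix" nothing is prefixed, so legacy cpuset
        # clients keep seeing names like "mems_allowed".
        return base
    if subsystem is not None:
        # Subsystem files are prefixed with the subsystem name.
        return f"{subsystem}.{base}"
    # Files owned by cgroups itself have no prefix today.
    return f"cgroup.{base}" if prefix_core else base
```

For instance, `control_file_name("mems_allowed", "cpuset")` gives `cpuset.mems_allowed`, while the same call with `noprefix=True` gives the legacy `mems_allowed`.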
One likely new file that people agreed a while ago could be useful would be a "procs" file, similar to "tasks", but acting (and reporting) on thread groups rather than individual tasks. Paul --
Yes, I remember this. This feature would be extremely useful. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL --
On Thu, 28 Feb 2008 13:28:31 -0800 Maybe cgroups shouldn't be putting kernel-generated files in places where user-specified files appear? (Am still thrashing around a bit here without an overview of the overall layout and naming). --
On Thu, Feb 28, 2008 at 1:40 PM, Andrew Morton
Well, that API (mixing control files and group directories in the same
directory namespace) was inherited directly from cpusets.
It wouldn't be hard to throw that away and move all the user-created
group directories into their own subdirectory, i.e. change the
existing directory layout from something like:
/mnt/cgroup/
    tasks
    cpu.shares
    memory.limit_in_bytes
    memory.usage_in_bytes
    user_created_groupname1/
        tasks
        cpu.shares
        memory.limit_in_bytes
        memory.usage_in_bytes
    user_created_groupname2/
        tasks
        cpu.shares
        memory.limit_in_bytes
        memory.usage_in_bytes
to something like:
/mnt/cgroup/
    tasks
    cpu.shares
    memory.limit_in_bytes
    memory.usage_in_bytes
    groups/
        user_created_groupname1/
            tasks
            cpu.shares
            memory.limit_in_bytes
            memory.usage_in_bytes
            groups/
        user_created_groupname2/
            tasks
            cpu.shares
            memory.limit_in_bytes
            memory.usage_in_bytes
            groups/
That would completely solve the namespace problem, at the cost of a
little extra verbosity/inelegance for human users (I suspect
programmatic users would prefer it), and lack of compatibility with
2.6.24. I'd also need to make the existing model a mount option so
Pretty much the same as cpusets, other than the additional
kernel-generated files in each directory, as provided by the resource
subsystems. So the same potential problem faced cpusets, but the fact
that new cpuset features weren't being developed quickly meant the
problem was less likely to actually bite people.
Paul
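For a program that manages groups, the difference between the two layouts comes down to how a logical group path is turned into a directory path. The Python sketch below illustrates that mapping under the layouts described above; the function name `group_dir` and the `groupdir` flag are hypothetical and only model the choice between the two styles.

```python
import posixpath


def group_dir(mount_point, group_path, groupdir=True):
    """Map a logical group path such as "a/b" to a directory path.

    groupdir=True models the proposed layout, where every level of
    user-created groups sits under a "groups" subdirectory.
    groupdir=False models the existing flat layout, where group
    directories share a directory with the control files.
    """
    path = mount_point
    for name in group_path.split("/"):
        if not name:
            continue  # ignore empty components such as a leading "/"
        if groupdir:
            path = posixpath.join(path, "groups", name)
        else:
            path = posixpath.join(path, name)
    return path
```

With a mount point of `/mnt/cgroup`, the nested group `user_created_groupname1/user_created_groupname2` maps to `/mnt/cgroup/groups/user_created_groupname1/groups/user_created_groupname2` in the proposed layout and to `/mnt/cgroup/user_created_groupname1/user_created_groupname2` in the existing one.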
--On Thu, 28 Feb 2008 14:06:30 -0800 That doesn't. It sounds like cpusets legacy has mucked us up here? Could we do something like auto-prefixing user-created directories with a fixed string so that there is no way in which the user can cause a collision with kernel-created files? I suppose that would break cpusets back-compatibility as well? If so, we could do the prefixing only for non-cpusets directories, but that's getting hm. I guess that all the kernel-generated file names are known up-front and that they are instantiated early, so if a user tried to create a cgroup called "tasks", then that would just fail. But, as you say, later addition of new kernel-created files might collide with prior userspace installations. So yet another option would be to promise to prefix all _future_ kernel-generated files with "kern_", and to change the implementation now to reject any user-created files which start with "kern_". hm. --
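The "kern_" idea amounts to one extra check when a group is created. The Python sketch below models that check in user-space terms; it is an illustration of the rule as described in this message, not the kernel implementation, and the names `RESERVED_PREFIX` and `may_create_group` are hypothetical.

```python
RESERVED_PREFIX = "kern_"  # prefix promised for future kernel-generated files


def may_create_group(name, existing_entries):
    """Return True if a user-created group called `name` would be allowed.

    existing_entries is the set of names already present in the parent
    directory, which includes the kernel-generated control files.
    """
    if name in existing_entries:
        # Already the case today: creating a group called "tasks" fails
        # because the kernel-created file exists first.
        return False
    if name.startswith(RESERVED_PREFIX):
        # The proposed new rule: keep the reserved prefix free for the kernel.
        return False
    return True
```

Under this rule a name such as `batch_jobs` stays available, while `tasks` and `kern_anything` are refused.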
OK, here's a patch that does that. It's not quite right yet, as it crashes on unmount, but the basic idea is there. It adds the additional "groups" directory by default, but uses the previous behaviour if the nogroupdir option is passed (done by default for cpusets).
Thoughts? Is this a direction we want to go in? As an option, or by default?
Paul
include/linux/cgroup.h | 2
kernel/cgroup.c | 164 +++++++++++++++++++++++++++++++++++--------------
kernel/cpuset.c | 2
3 files changed, 122 insertions(+), 46 deletions(-)
Index: subdir-2.6.25-rc3/include/linux/cgroup.h
===================================================================
--- subdir-2.6.25-rc3.orig/include/linux/cgroup.h
+++ subdir-2.6.25-rc3/include/linux/cgroup.h
@@ -105,7 +105,7 @@ struct cgroup {
struct cgroup *parent; /* my parent */
struct dentry *dentry; /* cgroup fs entry */
-
+ struct dentry *group_dentry;
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
Index: subdir-2.6.25-rc3/kernel/cgroup.c
===================================================================
--- subdir-2.6.25-rc3.orig/kernel/cgroup.c
+++ subdir-2.6.25-rc3/kernel/cgroup.c
@@ -139,6 +139,7 @@ inline int cgroup_is_removed(const struc
/* bits in struct cgroupfs_root flags field */
enum {
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_NOGROUPDIR, /* user-created sub-groups go in the main directory */
};
static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -475,6 +476,16 @@ static struct css_set *find_css_set(
return res;
}
+static inline struct cgroup *__d_cgrp(struct dentry *dentry)
+{
+ return dentry->d_fsdata;
+}
+
+static inline struct cftype *__d_cft(struct dentry *dentry)
+{
+ return dentry->d_fsdata;
+}
+
/*
* There is one global cgroup mutex. We also require taking
* task_lock() when dereferencing a task's cgroup subsys pointers.
@@ -598,33 +609,41 @@ static vo...
So ... this proposal adds a 'groups' subdirectory in each cgroup, and places
the user generated subgroups in there.
It looks like an unnecessary, incompatible and complicating change to me.
For example, what would have been cgroup:
/mnt/cgroup/user_created_groupname1/user_created_groupname2
now becomes:
/mnt/cgroup/cgroups/user_created_groupname1/cgroups/user_created_groupname2
Right?
Why would you do this?
There is no problem with the current implementation, no bug we're
having trouble coding a fix for, no feature we're having trouble adding.
The current code, that simply doesn't allow colliding user names
because the kernel provided names are already there, works just fine.
You're doing this just to "protect the user from themself", to make
it more difficult for them to rely on some name that in a future
version is no longer available.
It annoys users, and rightfully so, to have to permanently deal with
interface warts, because the computer is trying to protect the user
from some hypothetical scenario that is not a problem the user needs
solving.
There is really a trivial resolution to this ... stake out what
additional kernel generated names might ever be added ... some
pattern(s) of characters which all future names will match, which
leave wide swaths of names safely available, in perpetuity, for
user created names, with no risk of future collision.
And did I say incompatible with released versions?
Hopefully Paul M isn't too surprised that I'm not endorsing this one ;).
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
--Well, the additional components are called "groups" not "cgroups", but Correct. "Future-proofing" and "Forward planning" are two Yes, we could just say "the kernel reserves the right to use any names that begin with a lower-case letter, and no others", and be done with it. That leaves a bit of an ugly taste in my mouth, but if people seem Not at all incompatible if it requires a mount option to enable it ... Paul --
Ah yes - "groups" - right, sorry:

The pattern might be stronger (more restrictive) than "[a-z].*"

The pattern might be something like:
  1) the known set of existing names (a short, specific list), plus
  2) "[a-z]+\.[a-z]+(_[a-z]+)*"

That second pattern is some lower case letters, a dot, and some more lower case letters, possibly with some embedded underscores.

That is, except for the grandfathered existing known names, such as "tasks", you would be promising that all names added in the future would look like the examples (for some string of lower case letters "subsys"):
  subsys.foo
  subsys.bar_baz
or for cgroup infrastructure names (using "kern" or "groups" prefix, I don't have a clear preference):
  groups.stuff
  groups.this_and_that

Then for instance any name (not already in use) that had either (1) no embedded dot '.', or (2) at least one character other than "[a-z_.]+", or (3) other variants too numerous to list, would be safe for user created group names.

Or, for a simpler and more restrictive regex pattern, don't allow underscores, resulting in all kernel generated names matching:
  1) the known list, or
  2) "[a-z]+\.[a-z]+"

Note here "safe for user created" names just means safe from collision with kernel generated names, not necessarily safe from collision with other user generated names. That is regardless of what you do here, you cannot protect the delicate user from possible collision. You can only protect them from collision with "your" names. This risks imposing extra complexity on users just so you can avoid being blamed for the name collisions they might still experience anyway.

When I phrase it that way, my enthusiasm for this proposal

Ah - are you saying I missed another detail? That depending on how they mounted it, the path might be either of:
  /mnt/cgroup/groups/user_created_groupname1/groups/user_created_groupname2
or
  /mnt/cgroup/user_created_groupname1/user_created_groupname2

So code that knows something about the...
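A tool that creates groups could apply the suggested convention with a short check like the Python sketch below. It takes the second pattern literally from this message; the `GRANDFATHERED` set lists only the three cgroup files named at the start of the thread and is an assumption, since the full "known list" is not spelled out here. The function name `may_be_kernel_name` is hypothetical.

```python
import re

# Assumption: only the three cgroup-provided files named in this thread.
# A real list would also need any other unprefixed legacy names.
GRANDFATHERED = frozenset({"tasks", "notify_on_release", "release_agent"})

# Lower case letters, a dot, then lower case words joined by underscores.
FUTURE_KERNEL_NAME = re.compile(r"[a-z]+\.[a-z]+(_[a-z]+)*")


def may_be_kernel_name(name):
    """Return True if `name` falls in the space the kernel would reserve."""
    if name in GRANDFATHERED:
        return True
    return FUTURE_KERNEL_NAME.fullmatch(name) is not None
```

A group name is then safe from kernel collisions whenever `may_be_kernel_name()` returns False, for example a name with no dot, or one containing a digit or an upper case letter. As the message notes, this says nothing about collisions between user-created names.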
Why make it more complex? If we're going for a solution that involves an implicit partition of the namespace that the user has to be aware Of course - I don't think anyone's suggesting that the kernel can do anything about two competing users fighting over who gets to create cgroup "Foo". But IMO it's crazy to have multiple uncoordinated That was one of the questions that I left open - we could add it Yes, but compared to all the other configuration details that a userspace cgroup manager needs to work with, I think this would be a The main reason it's not my primary choice is that it's an implicit rule that the user has to know about or risk getting bitten in future. Making that rule complex is even worse, I think. Of course it does have the big advantage that no code changes are needed, just documentation. So if people prefer it, then I'm not going to fight hard. Paul --
It's a trade-off, between how many names the pattern covers, and how complicated it is. I don't really have a preference between "[a-z].*", "[a-z]+\.[a-z_]+", or other variants. This sort of namespace partitioning is common in other venues, such as one or two leading underscores in various C or Python names having particular 'system' uses. Perhaps 90% of users who ever are in a position to construct a name in the cgroup namespace won't worry much about this one way or the other. For them, the strlen() of the regex pattern describing future possible kernel generated cgroup names won't matter. A few such constructors of cgroup names will appreciate Ok - as I suspected. Note again, you can't keep the user from getting bitten. You can only avoid being one of the biters. From the users perspective, since you can't actually eliminate the risk of them ever seeing a collision, any complexity that you impose on their code risks being viewed as your taxing them more for your benefit (avoiding any blame to you) than for their benefit (actually avoiding all collision risk, which we can't practically do.) All name spaces co-operatively maintained in the commons have this risk of collision. As such name spaces go, the cgroup name space is easy. It's a small world. A few simple lexical conventions should suffice. I'd suggest something like you promise to stay in the "[a-z]+\.[a-z_]+" space, where the leading "[a-z]+" prefix is "cgroup" or one of a modest, slowly growing, set of cgroup subsystems, plus the existing grand-fathered names. Nothing here keeps others from intruding in that same space; you would just be promising not to go outside of that space. Self-imposed restraint like this "sells" better. The vast majority can just pay no mind; the small minority that care will appreciate that you're imposing constraints on yourself (on cgroup kernel code) that make their lives a little safer (one less chance of name collision) without imposing any additional constraints...
On Thu, Feb 28, 2008 at 2:21 PM, Andrew Morton That's something like putting them all in their own sub-directory, but We already have something like that in place, actually. When you mount the legacy "cpuset" filesystem, it just passes through to the cgroup filesystem with the mount options "cpuset,noprefix", i.e. mount a cgroups hierarchy with just cpusets bound to it, and *don't* prefix subsystem control files with the subsystem name. It wouldn't be hard to also have a "nosubdir" mount option that keeps the existing single-level style, and have the cpuset filesystem pass that option. Paul --
