Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> --- Documentation/cpusets.txt | 91 +- fs/proc/base.c | 4 include/linux/cgroup_subsys.h | 6 include/linux/cpuset.h | 12 include/linux/mempolicy.h | 12 include/linux/sched.h | 3 init/Kconfig | 7 kernel/cpuset.c | 1192 +++++------------------------ kernel/exit.c | 2 kernel/fork.c | 3 mm/mempolicy.c | 2 11 files changed, 278 insertions(+), 1056 deletions(-) diff -puN Documentation/cpusets.txt~task-cgroupsv11-make-cpusets-a-client-of-cgroups Documentation/cpusets.txt --- a/Documentation/cpusets.txt~task-cgroupsv11-make-cpusets-a-client-of-cgroups +++ a/Documentation/cpusets.txt @@ -7,6 +7,7 @@ Written by Simon.Derr@bull.net Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. Modified by Paul Jackson <pj@sgi.com> Modified by Christoph Lameter <clameter@sgi.com> +Modified by Paul Menage <menage@google.com> CONTENTS: ========= @@ -16,10 +17,9 @@ CONTENTS: 1.2 Why are cpusets needed ? 1.3 How are cpusets implemented ? 1.4 What are exclusive cpusets ? - 1.5 What does notify_on_release do ? - 1.6 What is memory_pressure ? - 1.7 What is memory spread ? - 1.8 How do I use cpusets ? + 1.5 What is memory_pressure ? + 1.6 What is memory spread ? + 1.7 How do I use cpusets ? 2. Usage Examples and Syntax 2.1 Basic Usage 2.2 Adding/removing cpus @@ -44,18 +44,19 @@ hierarchy visible in a virtual file syst hooks, beyond what is already present, required to manage dynamic job placement on large systems. -Each task ...
Paul M,
This snippet from the memory allocation hot path worries me a bit.
Once per memory page allocation, we go through here, needing to peak inside
the current tasks cpuset to see if it has changed (it's 'mems_generation'
value doesn't match the last seen value we have stashed in the task struct.)
@@ -653,20 +379,19 @@ void cpuset_update_task_memory_state(voi
struct task_struct *tsk = current;
struct cpuset *cs;
- if (tsk->cpuset == &top_cpuset) {
+ if (task_cs(tsk) == &top_cpuset) {
/* Don't need rcu for top_cpuset. It's never freed. */
my_cpusets_mem_gen = top_cpuset.mems_generation;
} else {
rcu_read_lock();
- cs = rcu_dereference(tsk->cpuset);
- my_cpusets_mem_gen = cs->mems_generation;
+ my_cpusets_mem_gen = task_cs(current)->mems_generation;
rcu_read_unlock();
}
With this new cgroup code, the task_cs macro was added, -twice-,
which deals with the fact that what used to be a single pointer
in the task struct directly to the tasks cpuset is now roughly
two more dereferences and an indexing away:
static inline struct cpuset *task_cs(struct task_struct *task)
{
return container_of(task_subsys_state(task, cpuset_subsys_id),
struct cpuset, css);
}
static inline struct cgroup_subsys_state *task_subsys_state(
struct task_struct *task, int subsys_id)
{
return rcu_dereference(task->cgroups->subsys[subsys_id]);
}
At a minimum, could you change that last added line to use 'tsk'
instead of 'current'? This should save one instruction, as 'tsk'
will likely already be in a register.
+ my_cpusets_mem_gen = task_cs(tsk)->mems_generation;
I guess the two, rather than one, invocations of task_cs() won't matter
much, as they are on the same address, so the second invocation will
hit cache lines just found on the first invocation.
I wonder if we can save any cache line hits on this, or if there is
someway to measure whether or not this has noticeable performance
impact.
... Probably ...It's two constant-indexed dereferences *in total*, compared to a single constant-indexed dereference in the pre-cgroup case. The cpuset pointer is found at task->cgroups->subsys[cpuset_subsys_id], where cpuset_subsys_id is a I didn't notice any performance hit on a pure allocate/free memory benchmark relative to non-cgroup cpusets. (There was a small performance hit relative to not using cpusets at all, but that was to be expected). Paul -
Ok - the C expression is longer and I didn't realize how
little difference it made in the end (the executing code.)
Good - thanks.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
-
Good.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
-
