From: Paul Jackson <email@example.com>
Add a new per-cpuset flag called 'sched_load_balance'.
When enabled in a cpuset (the default value), it tells the kernel
scheduler to provide the normal load balancing on the CPUs in that
cpuset, sometimes moving tasks from one CPU to a second CPU if the
second CPU is less loaded and if that task is allowed to run there.
When disabled (by writing "0" to the file), it tells the kernel
scheduler that load balancing is not required for the CPUs in
that cpuset.
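As a minimal sketch of the write described above: the helper name
set_sched_load_balance and the example mount point /dev/cpuset are
assumptions for illustration, not part of this patch. Only the file
name 'sched_load_balance' and the "0" value come from the text.

```python
import os


def set_sched_load_balance(cpuset_dir, enabled):
    """Write "1" or "0" to the sched_load_balance file of one cpuset.

    cpuset_dir is the directory representing the cpuset, for example a
    hypothetical /dev/cpuset/rt if the cpuset filesystem happens to be
    mounted at /dev/cpuset (an assumption, adjust to the local mount).
    """
    path = os.path.join(cpuset_dir, "sched_load_balance")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path
```

Calling set_sched_load_balance("/dev/cpuset/rt", False) would then
request that load balancing be turned off for that (hypothetical)
cpuset, subject to the overlap rules described in this message.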
Even if this flag is disabled for some cpuset, the kernel
may still have to load balance some or all of the CPUs in that
cpuset, if some overlapping cpuset has its sched_load_balance
flag enabled.
If there are some CPUs that are not in any cpuset whose
sched_load_balance flag is enabled, the kernel scheduler will
not load balance tasks to those CPUs.
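The rule above can be sketched as a small set computation. This is an
illustrative model only, not the kernel code: the function names and
the (cpus, enabled) pair representation are assumptions.

```python
def balanced_cpus(cpusets):
    """CPUs that belong to at least one cpuset with the flag enabled.

    cpusets is an iterable of (cpus, sched_load_balance) pairs, where
    cpus is a set of CPU numbers and the second item is a bool.
    """
    covered = set()
    for cpus, enabled in cpusets:
        if enabled:
            covered |= set(cpus)
    return covered


def unbalanced_cpus(all_cpus, cpusets):
    """CPUs to which the scheduler would not load balance tasks."""
    return set(all_cpus) - balanced_cpus(cpusets)


# Top cpuset spans CPUs 0-7 with the flag disabled; one child keeps
# the flag enabled on CPUs 0-3, another disables it on CPUs 4-7.
example = [
    (set(range(8)), False),
    ({0, 1, 2, 3}, True),
    ({4, 5, 6, 7}, False),
]
print(sorted(unbalanced_cpus(range(8), example)))  # [4, 5, 6, 7]
```

In this configuration CPUs 4-7 are left out of load balancing, while
CPUs 0-3 are still balanced among themselves.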
Moreover, the kernel will partition the 'sched domains'
(non-overlapping sets of CPUs over which load balancing is
attempted) into the finest granularity partition that it can
find, while still keeping any two CPUs that are in the same
sched_load_balance enabled cpuset in the same element of the
partition.
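One way to picture the partitioning described above is to merge every
pair of overlapping cpusets that have the flag enabled until the
remaining sets are disjoint. The sketch below is a simplified model
under that assumption; it is not the algorithm in the patch itself,
and the function name sched_domain_partition is hypothetical.

```python
def sched_domain_partition(cpusets):
    """Finest partition keeping each enabled cpuset inside one element.

    cpusets is an iterable of (cpus, sched_load_balance) pairs. Cpusets
    with the flag disabled contribute nothing; enabled cpusets that
    share a CPU are merged into a single domain.
    """
    domains = []  # pairwise disjoint sets of CPUs built so far
    for cpus, enabled in cpusets:
        if not enabled or not cpus:
            continue
        merged = set(cpus)
        keep = []
        for dom in domains:
            if dom & merged:
                merged |= dom      # overlaps: fold into the new domain
            else:
                keep.append(dom)   # disjoint: stays a separate domain
        keep.append(merged)
        domains = keep
    return sorted(domains, key=min)


example = [
    (set(range(8)), False),  # top cpuset, flag disabled
    ({0, 1}, True),
    ({1, 2}, True),          # overlaps the previous cpuset on CPU 1
    ({4, 5}, True),
]
print(sched_domain_partition(example))  # [{0, 1, 2}, {4, 5}]
```

Here CPUs 0-2 end up in one domain because the two enabled cpusets
overlap, CPUs 4-5 form a second domain, and CPUs 3, 6 and 7 are in no
domain at all. Enabling the flag in the top cpuset would collapse
everything into a single domain covering CPUs 0-7.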
This serves two purposes:
1) It provides a mechanism for real time isolation of some CPUs, and
2) it can be used to improve performance on systems with many CPUs
by supporting configurations in which load balancing is not done
across all CPUs at once, but rather only done in several smaller
disjoint sets of CPUs.
This mechanism replaces the earlier overloading of the per-cpuset
flag 'cpu_exclusive'; that overloading was removed in an earlier
patch.
See further the Documentation and comments in the code itself.
Signed-off-by: Paul Jackson <firstname.lastname@example.org>
Andrew - this patch goes right after your *-mm patch:
Documentation/cpusets.txt | 141 ++++++++++++++++++++++++-
Ah - good catch - thanks, Andrew. I should put together a fix
to check these kmalloc calls.
For rebuild_sched_domains(), this can mean we just up and return having
For arch_init_sched_domains(), the code calling arch_init_sched_domains()
is not checking for failure, and doesn't have a trivial fallback code
path in the case of failure. I'll have to think about that one just a
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <email@example.com> 1.925.600.0401