Yup - it's not a good name for asking for a partition.
That's because it isn't asking for a partition.
It's asking for load balancing over the CPUs in the cpuset so marked.
Yup - it's asking for load balancing over that set. That is why it is
called that. There's no idea here of better or worse load balancing,
that's an internal kernel scheduler subtlety -- it's just a request that
load balancing be done.
That is what is visible to user space: whether or not tasks get moved
from overloaded CPUs to underloaded, though still allowed, CPUs.
This is visible to user space in two ways:
1) as task movement, which may or may not be what is desired, and
2) as kernel CPU cycles spent, because load balancing costs CPU cycles
that increase more than linearly with the number of CPUs being
balanced.
The user doesn't give a hoot what a 'sched domain' is. They care to
manage (1) whether their tasks might move under a load imbalance, and
(2) how many CPU cycles the kernel spends providing this service.
You would do this with the current, single rooted cpuset (and now
cgroup) mechanism by having multiple immediate child cpusets of the
root cpuset, which partition the system CPUs. There is no need to
invent some bastardized multiple root structure.
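
As a rough illustration of what "partition the system CPUs" means for the
immediate children of the root cpuset, here is a minimal sketch. The function
name and the example cpuset names ("batch", "realtime") are made up for this
example; they are not part of the patch or of any kernel interface.

```python
def is_partition(all_cpus, child_cpusets):
    """Return True if the child cpusets cover every CPU exactly once.

    all_cpus:      iterable of CPU numbers in the root cpuset
    child_cpusets: dict mapping a (hypothetical) child cpuset name
                   to an iterable of the CPU numbers it contains
    """
    seen = set()
    for name, cpus in child_cpusets.items():
        cpus = set(cpus)
        if not cpus or cpus & seen:
            return False  # an empty child, or one overlapping a sibling
        seen |= cpus
    # Every CPU of the root must belong to some child.
    return seen == set(all_cpus)


if __name__ == "__main__":
    children = {"batch": range(0, 4), "realtime": range(4, 8)}
    print(is_partition(range(8), children))
```

Two sibling cpusets that split CPUs 0-3 and 4-7 pass this check; siblings
that share a CPU, or that leave a CPU uncovered, do not.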
I don't know what proposal you are reacting to here. Clearly not this
patch that I have proposed, as it is trivially easy to indicate whether
you want to load balance the root cpuset - by setting or clearing the
'sched_load_balance' flag in the root cpuset.
How could it possibly get any more direct than that?
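
For concreteness, a minimal sketch of setting or clearing that flag from user
space. It assumes the flag shows up as a file named 'sched_load_balance' in
each cpuset directory, and that the cpuset file system is mounted somewhere
such as /dev/cpuset; the mount point and the helper names below are
assumptions for illustration, so adjust them to your system.

```python
import os


def set_load_balance(cpuset_dir, enabled):
    """Write 1 or 0 to the sched_load_balance file of one cpuset.

    cpuset_dir is the directory of the cpuset, e.g. the (assumed)
    mount point /dev/cpuset for the root cpuset.
    """
    path = os.path.join(cpuset_dir, "sched_load_balance")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def get_load_balance(cpuset_dir):
    """Return True if the cpuset's sched_load_balance file holds 1."""
    path = os.path.join(cpuset_dir, "sched_load_balance")
    with open(path) as f:
        return f.read().strip() == "1"
```

With those assumptions, set_load_balance("/dev/cpuset", False) would ask the
kernel not to balance across the root cpuset as a whole, and the same call on
a child cpuset's directory would do so for that child.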
My approach doesn't do that - perhaps we aren't communicating.
We are in complete agreement that the admin should specify what they
want, and leave it to the kernel to figure out how to do it.
Excellent -- I'm glad you like my approach </sarcasm>
We are in complete agreement in insisting on this.
In short:
The kernel scheduler's dynamic sched domains are --not-- the service
being provided to the user. "Sched domains" are just the kernel
internal mechanism.
The service being provided is dynamic load balancing of tasks from
overloaded CPUs to underloaded CPUs.
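
To make the user-visible behavior concrete, here is a toy model of one
balancing step. It is only an illustration of "move a task from an overloaded
CPU to an underloaded CPU it is still allowed on"; it is not the kernel
scheduler's algorithm, and every name in it is invented for this sketch.

```python
def balance_step(run_queues, allowed):
    """Move one task off the busiest CPU, if that reduces the imbalance.

    run_queues: dict mapping CPU number -> list of task names queued there
    allowed:    dict mapping task name -> set of CPUs the task may run on
    Returns (task, src_cpu, dst_cpu) for the move made, or None.
    """
    src = max(run_queues, key=lambda c: len(run_queues[c]))
    for task in run_queues[src]:
        # Only consider allowed CPUs where the move narrows the gap.
        targets = [
            c for c in allowed[task]
            if c in run_queues and c != src
            and len(run_queues[c]) + 1 < len(run_queues[src])
        ]
        if targets:
            dst = min(targets, key=lambda c: len(run_queues[c]))
            run_queues[src].remove(task)
            run_queues[dst].append(task)
            return task, src, dst
    return None
```

A task pinned to the overloaded CPU stays put; a task allowed elsewhere may
move. Whether that movement happens at all is what the flag controls.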
Some users will want to disable load balancing on some cpusets, because
either:
(1) it's too expensive to balance really large cpusets unless really
needed, or
(2) real time users don't want to waste the CPU cycles doing
balancing even on small cpusets.
If you think I repeated everything two or three times above ... good,
you're right - I did.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401