On Tue, 2008-10-14 at 18:37 +0530, Vaidyanathan Srinivasan wrote:
Ah, yes another use case of this ;-)
Right - I was hoping for some feedback from the arch folks (maybe I
should have CC'ed linux-arch) on this issue.
Yes, trouble is that as soon as its not a constant, we get into a
generic optimisation problem, which I'd rather not try to solve in the
CPU scheduler.
Agreed, getting a normalized value is possibly non-trivial. If we'd look
at things like (avg) cpu-speed over a measured time interval its doable,
but once we start looking at instructions completed and similar things
this might be a little more difficult.
Then again, we could perhaps re-normalize the value such that the avg
value over all cpus ends up being 1 - then again, SMT might ruin this
scheme.
Yes, I just tossed it in to not forget about it ;-)
The thing is, I know I wanted to write about two corner cases, but I've
already forgotten one.. I'm still hoping it will again occur to me :-)
Exactly, using a bitmask type systems allows doing that more easily,
because then a domain can be multiple types at once.
We can even consider adding more information like: shared_l1, shared_l2
and shared_l3. So that we have the full cache hierarchy available.
Ah, lets assume a 2 cpu system.
When presented with 2 tasks of weight 1024 and 1536, this constitues an
infeasible weight distribution.
There is no work-conserving way to schedule those two tasks, such that
their received cpu-time is proportionally fair.
However, when presented with 3 tasks each of weight 1024, this is
statically-infeasible but dynamically-feasible.
That is, there is no static distribution of tasks such that each tasks
receives a proportionally fair share of the cpu-time. However, by
rotating the excess task between the 2 cpus, we can ensure fairness.
In the latter case, we always have an imbalance between runqueues which
is smaller than 1 task.
In this case we can do two things, not schedule in which case we choose
to maintain the static distribution, this is called under-scheduling.
Or we move 1 task despite the fact that we'll tip the imbalance the
other way around, this is called over-scheduling.
As you can see, over-scheduling allows for fairness in more cases, but
at some expense.
A lot of people prefer the added determinism of the extra fairness, but
not everybody. Hence the tunable.
--