Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group()

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Peter Zijlstra
Date: Tuesday, October 14, 2008 - 6:25 am

On Tue, 2008-10-14 at 18:37 +0530, Vaidyanathan Srinivasan wrote:

Ah, yes another use case of this ;-)


Right - I was hoping for some feedback from the arch folks (maybe I
should have CC'ed linux-arch) on this issue.


Yes, trouble is that as soon as its not a constant, we get into a
generic optimisation problem, which I'd rather not try to solve in the
CPU scheduler.


Agreed, getting a normalized value is possibly non-trivial. If we'd look
at things like (avg) cpu-speed over a measured time interval its doable,
but once we start looking at instructions completed and similar things
this might be a little more difficult.

Then again, we could perhaps re-normalize the value such that the avg
value over all cpus ends up being 1 - then again, SMT might ruin this
scheme.


Yes, I just tossed it in to not forget about it ;-)

The thing is, I know I wanted to write about two corner cases, but I've
already forgotten one.. I'm still hoping it will again occur to me :-)


Exactly, using a bitmask type systems allows doing that more easily,
because then a domain can be multiple types at once.

We can even consider adding more information like: shared_l1, shared_l2
and shared_l3. So that we have the full cache hierarchy available.


Ah, lets assume a 2 cpu system.

When presented with 2 tasks of weight 1024 and 1536, this constitues an
infeasible weight distribution.

There is no work-conserving way to schedule those two tasks, such that
their received cpu-time is proportionally fair.

However, when presented with 3 tasks each of weight 1024, this is
statically-infeasible but dynamically-feasible.

That is, there is no static distribution of tasks such that each tasks
receives a proportionally fair share of the cpu-time. However, by
rotating the excess task between the 2 cpus, we can ensure fairness.

In the latter case, we always have an imbalance between runqueues which
is smaller than 1 task.

In this case we can do two things, not schedule in which case we choose
to maintain the static distribution, this is called under-scheduling.

Or we move 1 task despite the fact that we'll tip the imbalance the
other way around, this is called over-scheduling.

As you can see, over-scheduling allows for fairness in more cases, but
at some expense.

A lot of people prefer the added determinism of the extra fairness, but
not everybody. Hence the tunable.

--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC PATCH v2 0/5] sched: modular find_busiest_group(), Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
[RFC PATCH v2 1/5] sched: load calculation for each group ..., Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
[RFC PATCH v2 2/5] sched: calculate statistics for current ..., Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
[RFC PATCH v2 3/5] sched: collect statistics required for ..., Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
[RFC PATCH v2 4/5] sched: small imbalance corrections, Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
[RFC PATCH v2 5/5] sched: split find_busiest_group(), Vaidyanathan Srinivasan, (Thu Oct 9, 5:09 am)
Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group(), Vaidyanathan Srinivasan, (Thu Oct 9, 6:36 pm)
Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group(), Peter Zijlstra, (Tue Oct 14, 5:09 am)
Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group(), Vaidyanathan Srinivasan, (Tue Oct 14, 6:07 am)
Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group(), Peter Zijlstra, (Tue Oct 14, 6:25 am)
Re: [RFC PATCH v2 0/5] sched: modular find_busiest_group(), Vaidyanathan Srinivasan, (Fri Oct 24, 3:04 am)