Huge CFS vruntime spread (18 minutes) has been observed with LTTng while simply
running Xorg on a uniprocessor machine, 184.108.40.206 kernel. A detailed explanation is in
my ELC2010 presentation at:
(the presentation includes slides, ad-hoc CFS instrumentation patches and the
wakeup-latency test program)
I've torn the CFS scheduler apart over the past few days to figure out what is causing
this weird behavior, and the culprit seems to be place_entity(). The problem
appears to be the cumulative effect of letting the min_vruntime go backward when
putting sleepers back on the runqueue. It lets the vruntime spread grow to
"entertaining" values (it is supposed to be in the 5ms range, not 18 minutes!).
In the original code, a max between the sched entity's vruntime and the calculated
vruntime was supposed to "ensure that the thread's time never goes backward". But I
don't see why we even care about that. The key point is that the min_vruntime
of the runqueue should not go backward.
I propose to fix this by calculating the relative offset from
min_vruntime + sysctl_sched_latency rather than directly from min_vruntime. I
also ensure that the value never goes below min_vruntime.
Under the Xorg workload, moving a few windows around and starting Firefox while
executing the wakeup-latency.c program (a program waking up every 10ms and
reporting its wakeup latency), this patch brings the worst latency down from 60ms
to 12ms. Even while doing a kernel compilation at the same time, the worst latency
now stays around 20ms.
I'm submitting this patch ASAP, since it seems to fix CFS issues that many
people have been complaining about. I'm sending it as an RFC because testing its
effect on more workloads would be welcome.
I can see that place_entity() has stayed more or less the same since 2.6.24 (and
maybe even earlier, as the code was merely reorganised between 2.6.23 and 2.6.24),
so we can expect this to be a problem people have been experiencing for a while.
Signed-off-by: Mathieu Desnoyers ...