Linux: Measuring The Interactivity Estimator's Overhead

Submitted by Jeremy
on November 24, 2003 - 7:38pm

Con Kolivas [interview] recently posted a "demo" patch to the Linux kernel mailing list (lkml) in response to concerns that his interactivity estimator [story] might be causing a performance hit. The patch entirely removes the interactivity estimator from the 2.6 kernel, giving the unbelievers a chance to see with their own eyes that these ~350 lines of code are not the source of any throughput problems. He explains:

"The estimator should not somehow make your cpu 20% slower unless something is horribly wrong. It simply reorders which tasks go first, and if they do get to go first they round robin more frequently. Overall within a larger timespace the amount of time taken to do the work is the same (or slightly better in some settings). Delaying the cpu bound tasks and then letting them run for a longer timeslice when there is a larger cpu window allows them to benefit more from cpu cache. While the code looks complicated, the overhead is miniscule."


From: Con Kolivas [email blocked]
To: linux kernel mailing list [email blocked]
Subject: Demo patch - no interactivity 2.6
Date: Mon, 24 Nov 2003 10:16:08 +1100

I created this patch for demonstration purposes.

People have raised concerns about the overhead of the interactivity estimator 
in 2.6 and it's effect on throughput. Some anecdotes report wild accusations 
of 20% loss (without hard data). While others believe there is no need for 
interactivity estimation in the kernel with modern cpus.  

The work put in by myself to fine tune the estimator was carefully checked to 
ensure there were no regressions to performance and there have actually been 
minor improvements along the way. Beyond this is the interactivity estimator 
infrastructure already in place.

The estimator should not somehow make your cpu 20% slower unless something is 
horribly wrong. It simply reorders which tasks go first, and if they do get 
to go first they round robin more frequently. Overall within a larger 
timespace the amount of time taken to do the work is the same (or slightly 
better in some settings). Delaying the cpu bound tasks and then letting them 
run for a longer timeslice when there is a larger cpu window allows them to 
benefit more from cpu cache. While the code looks complicated, the overhead 
is miniscule.

So to demonstrate this more clearly, here is a patch that removes the 
interactivity estimator entirely, leaving the scheduler mechanism otherwise 
unchanged. No tasks get dynamic priority changes, and all tasks are allowed 
to run out their full timeslice and will then expire. There is no selective 
reinsertion into the active array. This deletes about 350 lines of code.

I have benchmarked this patch in a range of different benchmarks, and tried 
using it in real world settings.  There is no demonstrable performance 
benefit to this patch. Cpu throughput is the same in my testing (up to 16x). 
Not surprisingly, interactivity of this is also quite appalling. At rest it 
is fine, but as any load is placed on the system, X stutters and jerks. Audio 
is not too bad actually as a consequence of the fact that audio threads go to 
sleep before they expire so they always wake up in the active array and will 
not have too long a scheduling latency from there.

I do not recommend you use this patch in any real world setting as it is 
mildly DOS exploitable by repeated waking tasks, but otherwise works fine. I 
guess there are some embedded devices or specific one-use settings for the 
kernel where this would be ok to use. The only other setting I could think of 
is as a base patch for re-writing a new interactivity estimator from scratch.

Con

[patch]
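
To make the central claim concrete (reordering tasks changes who waits, not how much total work gets done), here is a minimal toy simulation. It is not taken from the patch or from the kernel: the `simulate` function, the task names, and the tick counts are all hypothetical, and the "estimator" is reduced to a flag that picks tasks marked interactive first. It models the active and expired arrays described in the email, but not cache effects, sleeping tasks, or the cost of the estimator code itself.

```python
from collections import deque


def simulate(tasks, timeslice, favor_interactive):
    """Toy two-array scheduler (illustrative only, not kernel code).

    tasks: dict mapping name -> (work_ticks, is_interactive)
    Returns (total_ticks, finish_times) where finish_times maps each
    task name to the clock tick at which it completed.
    """
    remaining = {name: work for name, (work, _) in tasks.items()}
    interactive = {name for name, (_, flag) in tasks.items() if flag}
    active = deque(tasks)  # tasks that still have a timeslice this round
    expired = deque()      # tasks that used up their timeslice
    clock = 0
    finish = {}

    while active or expired:
        if not active:
            # Everyone has expired: swap the two arrays and start over.
            active, expired = expired, active

        if favor_interactive:
            # Crude stand-in for the estimator: run interactive tasks first.
            name = next((n for n in active if n in interactive), active[0])
            active.remove(name)
        else:
            # No estimator: strict queue order, full timeslice, then expire.
            name = active.popleft()

        ran = min(timeslice, remaining[name])
        clock += ran
        remaining[name] -= ran
        if remaining[name] == 0:
            finish[name] = clock
        else:
            expired.append(name)

    return clock, finish


if __name__ == "__main__":
    # Two hypothetical cpu hogs and one short interactive task.
    demo = {"hog1": (100, False), "hog2": (100, False), "ui": (10, True)}
    for favor in (False, True):
        total, finish = simulate(demo, timeslice=50, favor_interactive=favor)
        print(f"favor_interactive={favor}: total={total} ui done at {finish['ui']}")
```

In this toy setup both runs finish all work at the same tick, while the interactive task completes far earlier when it is favored. That mirrors the shape of the argument in the email, though real throughput numbers can only come from benchmarking the actual patch.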


From: Ingo Molnar [email blocked]
Subject: Re: Demo patch - no interactivity 2.6
Date: Wed, 26 Nov 2003 10:59:24 +0100 (CET)

On Mon, 24 Nov 2003, Con Kolivas wrote:

> I created this patch for demonstration purposes.
>
> People have raised concerns about the overhead of the interactivity
> estimator in 2.6 and it's effect on throughput. Some anecdotes report
> wild accusations of 20% loss (without hard data). [...]

this claim is nonsense, agreed. The only small change in performance
should be for microbenchmark things like lmbench's lat_ctx. But this cost
is well worth it.

Ingo
