Linux: Voluntary Kernel Preemption

Submitted by Jeremy on July 10, 2004 - 11:45am.
Linux news

In response to a recent discussion on the lkml, Ingo Molnar posted a patch that introduces "voluntary kernel preemption" into the Linux 2.6 kernel. Ingo explains the problem that he is working to address:

"As most of you are probably aware of it, there have been complaints on [the] lkml that the 2.6 kernel is not suitable for serious audio work due to high scheduling latencies (e.g. the Jackit people complained). I took a look at latencies and indeed 2.6.7 is pretty bad - latencies up to 50 msec (!) can be easily triggered using common workloads, on fast 2GHz+ x86 system - even when using the fully preemptible kernel!"

Ingo explains that voluntary preemption achieves latencies comparable to those of the 2.4 kernel with the low latency patches, with the goal of eliminating every latency above roughly 1 millisecond, but that it gets there in a very different way:

"Unlike the lowlatency patches, this patch doesn't add a lot of new scheduling points to the source code, it rather reuses a rich but currently inactive set of scheduling points that already exist in the 2.6 tree: the might_sleep() debugging checks. Any code point that does might_sleep() is in fact ready to sleep at that point. So the patch activates these debugging checks to be scheduling points. This reduces complexity and impact quite significantly."


From: Ingo Molnar [email blocked]
To:  linux-kernel
Subject: [announce] [patch] Voluntary Kernel Preemption Patch
Date: 	Fri, 9 Jul 2004 20:26:38 +0200


as most of you are probably aware of it, there have been complaints on
lkml that the 2.6 kernel is not suitable for serious audio work due to
high scheduling latencies (e.g. the Jackit people complained). I took a
look at latencies and indeed 2.6.7 is pretty bad - latencies up to 50
msec (!) can be easily triggered using common workloads, on fast 2GHz+
x86 system - even when using the fully preemptible kernel!

to solve this problem, Arjan van de Ven and I went over various kernel
functions to determine their preemptability and we re-created from
scratch a patch that is equivalent in performance to the 2.4 lowlatency
patches but is different in design, impact and approach:

  http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.7-bk20-H2

  (Note to kernel patch reviewers: the split voluntary_resched type of
  APIs, the feature #ifdefs and runtime flags are temporary and were
  only introduced to enable a easy benchmarking/comparisons. I'll split
  this up into small pieces once there's testing feedback and actual
  audio users had their say!)

unlike the lowlatency patches, this patch doesn't add a lot of new
scheduling points to the source code, it rather reuses a rich but
currently inactive set of scheduling points that already exist in the
2.6 tree: the might_sleep() debugging checks. Any code point that does
might_sleep() is in fact ready to sleep at that point. So the patch
activates these debugging checks to be scheduling points. This reduces
complexity and impact quite significantly.

but even using these (over one hundred) might_sleep() points there were
still a number of latency sources in the kernel - we identified and
fixed them by hand, either via additional might_sleep() checks, or via
explicit rescheduling points. Sometimes lock-break was necessary as
well.

as a practical goal, this patch aims to fix all latency sources that
generate higher than ~1 msec latencies. We'd love to learn about
workloads that still cause audio skipping even with this patch applied,
but i've been unable to generate any load that creates higher than 1msec
latencies. (not counting driver initialization routines.)

this patch is also more configurable than the 2.4 lowlatency patches
were: there's a .config option to enable voluntary preemption, and there
are runtime /proc/sys knobs and boot-time flags to turn voluntary
preemption (CONFIG_VOLUNTARY_PREEMPT) and kernel preemption
(CONFIG_PREEMPT) on/off:

        # turn on/off voluntary preemption (if CONFIG_VOLUNTARY_PREEMPT)
	echo 1 > /proc/sys/kernel/voluntary_preemption
	echo 0 > /proc/sys/kernel/voluntary_preemption

        # turn on/off the preemptible kernel feature (if CONFIG_PREEMPT)
	echo 1 > /proc/sys/kernel/kernel_preemption
	echo 0 > /proc/sys/kernel/kernel_preemption

the 'voluntary-preemption=0/1' and 'kernel-preemption=0/1' boot options
can be used to control these flags at boot-time.

all 4 combinations make sense if both CONFIG_PREEMPT and
CONFIG_VOLUNTARY_PREEMPT are enabled - great for performance/latency
testing and comparisons.

The stock 2.6 kernel is equivalent to:

   voluntary_preemption:0 kernel_preemption:0

the 2.6 kernel with voluntary kernel preemption is equivalent to:

   voluntary_preemption:1 kernel_preemption:0

the 2.6 kernel with preemptible kernel enabled is:

   voluntary_preemption:0 kernel_preemption:1

and the preemptible kernel enhanced with additional lock-breaks is 
enabled via:

   voluntary_preemption:1 kernel_preemption:1

it is safe to change these flags anytime.

The patch is against 2.6.7-bk20, and it also includes fixes for kernel
bugs that were uncovered while developing this patch. While it works for
me, be careful when using this patch!

Testreports, comments, suggestions are more than welcome,

	Ingo


From: Christoph Hellwig [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Fri, 9 Jul 2004 20:51:05 +0100

> unlike the lowlatency patches, this patch doesn't add a lot of new
> scheduling points to the source code, it rather reuses a rich but
> currently inactive set of scheduling points that already exist in the
> 2.6 tree: the might_sleep() debugging checks. Any code point that does
> might_sleep() is in fact ready to sleep at that point. So the patch
> activates these debugging checks to be scheduling points. This reduces
> complexity and impact quite significantly.

I don't think this is a good idea. Just because a function might sleep
it doesn't mean it should sleep. I'd rather add the might_sleep() to
cond_resched() and replace the former with the latter in the cases where
it makes sense.

From: Andrea Arcangeli [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 01:50:17 +0200

On Fri, Jul 09, 2004 at 08:51:05PM +0100, Christoph Hellwig wrote:
> > unlike the lowlatency patches, this patch doesn't add a lot of new
> > scheduling points to the source code, it rather reuses a rich but
> > currently inactive set of scheduling points that already exist in the
> > 2.6 tree: the might_sleep() debugging checks. Any code point that does
> > might_sleep() is in fact ready to sleep at that point. So the patch
> > activates these debugging checks to be scheduling points. This reduces
> > complexity and impact quite significantly.
>
> I don't think this is a good idea. Just because a function might sleep
> it doesn't mean it should sleep. I'd rather add the might_sleep() to
> cond_resched() and replace the former with the latter in the cases where
> it makes sense.

agreed. might_sleep() just like BUG() can be defined to noop.
cond_resched() is the API to use.

the other bad thing is that there is no point for the sysctl (in 2.4
that made no sense at all too, yeah it only makes sense for benchmarking
easily w/ and w/o the feature but it must be optimized away at the very
least with a config option for production), if need_resched is set we
_must_ schedule no matter what (a sysctl can only introduce a bug if
something). If we spend any cpu checking the sysctl, we should instead
spend such cpu to check need_resched in the first place.

The rest is of course very welcome, but you should remove all the
pollution from the patch to make it mergeable. Just convert all those
might to cond_resched() and remove all the superflous volountary stuff
and config options.

As worse you can leave a single config option LOW_RESCHEDULE_OVERHEAD
with PREEMPT=n, that could remove some cond_resched() from an extremely
fast path if you're concerned about adding branches in some critical
point, but you really seem not concerned since with
CONFIG_PREEMPT_VOLUNTARY=y (the only way to enable it) you even _waste_
cpu on these paths to check a worthless sysctl that can only introduce
bugs at runtime since it overrides the wishes of the scheduler. If
scheduler is bad fix the scheduler, but as soon as need_resched is set
no sysctl must be allowed to mask the wishes of the scheduler.

From: Andrea Arcangeli [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 02:52:08 +0200

On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> agreed. might_sleep() just like BUG() can be defined to noop.

BTW, this reminded me a related topic that I can't recall being ever
mentioned on l-k: BUG_ON can also be optimized away. So people should be
careful not to do write this:

	BUG_ON(test_and_set_bit(p))

but to write this instead:

	if (unlikely(test_and_set_bit(p)))
		BUG()

(in short the check inside a BUG_ON must be strictly read-only since
it's not guaranteed to be computed)

From: Dave Jones [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 02:02:02 +0100

On Sat, Jul 10, 2004 at 02:52:08AM +0200, Andrea Arcangeli wrote:
> On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> > agreed. might_sleep() just like BUG() can be defined to noop.
>
> BTW, this reminded me a related topic that I can't recall being ever
> mentioned on l-k:

google for 'BUG_ON side effects'. It's come up a number of times 8-)
Doesn't mean it isn't worth repeating however.

	Dave

From: Arjan van de Ven [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 08:32:22 +0200

On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> the other bad thing is that there is no point for the sysctl (in 2.4
> that made no sense at all too, yeah it only makes sense for benchmarking
> easily w/ and w/o the feature but it must be optimized away at the very
> least with a config option for production),

as Ingo wrote, that is the plan, all that "crud" is there just to make
it easy to benchmark for now.

From: Ingo Molnar [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 14:48:14 +0200

* Christoph Hellwig [email blocked] wrote:

> > unlike the lowlatency patches, this patch doesn't add a lot of new
> > scheduling points to the source code, it rather reuses a rich but
> > currently inactive set of scheduling points that already exist in the
> > 2.6 tree: the might_sleep() debugging checks. Any code point that does
> > might_sleep() is in fact ready to sleep at that point. So the patch
> > activates these debugging checks to be scheduling points. This reduces
> > complexity and impact quite significantly.
>
> I don't think this is a good idea. Just because a function might
> sleep it doesn't mean it should sleep. I'd rather add the
> might_sleep() to cond_resched() and replace the former with the latter
> in the cases where it makes sense.

think of voluntary preemption as a variant of CONFIG_PREEMPT with
different tradeoffs: it doesnt preempt as much code but it's cheaper (in
terms of code footprint and overhead) and less risky (in terms of code
affected).

doesnt mean it should be scheduled to', which is the wrong approach
because it is ultimately the decision of the user which tasks get
scheduled (by giving processes various priorities) and the decision of
the scheduler (for freely schedulable tasks). The preemption decision
does not depend and should not depend on the kernel function utilized!

if you dont care about latencies and want to maximize throughput (for
e.g. servers) then you dont want to enable CONFIG_PREEMPT_VOLUNTARY.
That way you get artificial batching of parallel workloads.

FYI, i am also preparing a preemption patch where there's a (per-task)
tunable for 'expected maximum latency' and the kernel would measure
latencies and not do a forced preemption unless this latency is being
exceeded. Voluntary preemption and CONFIG_PREEMPT means this tunable has
a value of 0 - we reschedule as soon as possible. Server workloads mean
a much higher tolerated latency value in the range of 50 msecs or so.
Both are fair expectations and settings.

	Ingo





July 11, 2004 - 12:17pm
Anonymous

WHAT? COoperative multitasking in Linux?
Ingo! Microsoft already made this mistake once, please!
COoperative multitasking is NONSENSE!
