In response to a recent discussion on the lkml, Ingo Molnar [interview] posted a patch that introduces "voluntary kernel preemption" into the Linux 2.6 kernel. Ingo explains the problem that he is working to address:
"As most of you are probably aware of it, there have been complaints on [the] lkml that the 2.6 kernel is not suitable for serious audio work due to high scheduling latencies (e.g. the Jackit people complained). I took a look at latencies and indeed 2.6.7 is pretty bad - latencies up to 50 msec (!) can be easily triggered using common workloads, on fast 2GHz+ x86 system - even when using the fully preemptible kernel!"
Ingo explains that voluntary preemption provides latencies equal to what was possible in 2.4 with the low latency patches, aiming for no latency greater than 1 millisecond, but achieves this in a very different way:
"Unlike the lowlatency patches, this patch doesn't add a lot of new scheduling points to the source code, it rather reuses a rich but currently inactive set of scheduling points that already exist in the 2.6 tree: the might_sleep() debugging checks. Any code point that does might_sleep() is in fact ready to sleep at that point. So the patch activates these debugging checks to be scheduling points. This reduces complexity and impact quite significantly."
From: Ingo Molnar [email blocked]
To: linux-kernel
Subject: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Fri, 9 Jul 2004 20:26:38 +0200

as most of you are probably aware of it, there have been complaints on lkml that the 2.6 kernel is not suitable for serious audio work due to high scheduling latencies (e.g. the Jackit people complained). I took a look at latencies and indeed 2.6.7 is pretty bad - latencies up to 50 msec (!) can be easily triggered using common workloads, on fast 2GHz+ x86 system - even when using the fully preemptible kernel!

to solve this problem, Arjan van de Ven and I went over various kernel functions to determine their preemptability and we re-created from scratch a patch that is equivalent in performance to the 2.4 lowlatency patches but is different in design, impact and approach:

    http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.7-bk20-H2

(Note to kernel patch reviewers: the split voluntary_resched type of APIs, the feature #ifdefs and runtime flags are temporary and were only introduced to enable a easy benchmarking/comparisons. I'll split this up into small pieces once there's testing feedback and actual audio users had their say!)

unlike the lowlatency patches, this patch doesn't add a lot of new scheduling points to the source code, it rather reuses a rich but currently inactive set of scheduling points that already exist in the 2.6 tree: the might_sleep() debugging checks. Any code point that does might_sleep() is in fact ready to sleep at that point. So the patch activates these debugging checks to be scheduling points. This reduces complexity and impact quite significantly.

but even using these (over one hundred) might_sleep() points there were still a number of latency sources in the kernel - we identified and fixed them by hand, either via additional might_sleep() checks, or via explicit rescheduling points. Sometimes lock-break was necessary as well.

as a practical goal, this patch aims to fix all latency sources that generate higher than ~1 msec latencies. We'd love to learn about workloads that still cause audio skipping even with this patch applied, but i've been unable to generate any load that creates higher than 1msec latencies. (not counting driver initialization routines.)

this patch is also more configurable than the 2.4 lowlatency patches were: there's a .config option to enable voluntary preemption, and there are runtime /proc/sys knobs and boot-time flags to turn voluntary preemption (CONFIG_VOLUNTARY_PREEMPT) and kernel preemption (CONFIG_PREEMPT) on/off:

    # turn on/off voluntary preemption (if CONFIG_VOLUNTARY_PREEMPT)
    echo 1 > /proc/sys/kernel/voluntary_preemption
    echo 0 > /proc/sys/kernel/voluntary_preemption

    # turn on/off the preemptible kernel feature (if CONFIG_PREEMPT)
    echo 1 > /proc/sys/kernel/kernel_preemption
    echo 0 > /proc/sys/kernel/kernel_preemption

the 'voluntary-preemption=0/1' and 'kernel-preemption=0/1' boot options can be used to control these flags at boot-time.

all 4 combinations make sense if both CONFIG_PREEMPT and CONFIG_VOLUNTARY_PREEMPT are enabled - great for performance/latency testing and comparisons.

The stock 2.6 kernel is equivalent to:

    voluntary_preemption:0 kernel_preemption:0

the 2.6 kernel with voluntary kernel preemption is equivalent to:

    voluntary_preemption:1 kernel_preemption:0

the 2.6 kernel with preemptible kernel enabled is:

    voluntary_preemption:0 kernel_preemption:1

and the preemptible kernel enhanced with additional lock-breaks is enabled via:

    voluntary_preemption:1 kernel_preemption:1

it is safe to change these flags anytime.

The patch is against 2.6.7-bk20, and it also includes fixes for kernel bugs that were uncovered while developing this patch. While it works for me, be careful when using this patch!

Testreports, comments, suggestions are more than welcome,

Ingo
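The idea of turning a debugging check into a scheduling point can be sketched in user-space C. This is only an illustration under stated assumptions, not code from the patch: the names might_sleep_sketch(), schedule_sketch() and the four state variables are hypothetical stand-ins for kernel state, and the scheduler is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state (not real kernel symbols). */
static bool need_resched_flag;            /* scheduler wants to run another task */
static bool in_atomic_context;            /* true while sleeping would be illegal */
static bool voluntary_preemption = true;  /* models the runtime on/off knob */
static int  schedule_calls;               /* counts how often the CPU was given up */

/* Stand-in for the scheduler: record the switch and clear the request. */
void schedule_sketch(void)
{
    schedule_calls++;
    need_resched_flag = false;
}

/* A might_sleep()-style check that doubles as a voluntary scheduling point. */
void might_sleep_sketch(void)
{
    /* Debugging half: the caller claims that sleeping is legal here. */
    assert(!in_atomic_context);

    /* Scheduling half: since sleeping is legal, yield if it was requested. */
    if (voluntary_preemption && need_resched_flag)
        schedule_sketch();
}
```

With the knob off, a pending reschedule request is left untouched at the check; with it on, the same check yields the CPU. That the knob can hold back a pending request is the behaviour the replies below object to.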
From: Christoph Hellwig [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Fri, 9 Jul 2004 20:51:05 +0100

> unlike the lowlatency patches, this patch doesn't add a lot of new
> scheduling points to the source code, it rather reuses a rich but
> currently inactive set of scheduling points that already exist in the
> 2.6 tree: the might_sleep() debugging checks. Any code point that does
> might_sleep() is in fact ready to sleep at that point. So the patch
> activates these debugging checks to be scheduling points. This reduces
> complexity and impact quite significantly.

I don't think this is a good idea. Just because a function might sleep it doesn't mean it should sleep. I'd rather add the might_sleep() to cond_resched() and replace the former with the latter in the cases where it makes sense.
From: Andrea Arcangeli [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 01:50:17 +0200

On Fri, Jul 09, 2004 at 08:51:05PM +0100, Christoph Hellwig wrote:
> > unlike the lowlatency patches, this patch doesn't add a lot of new
> > scheduling points to the source code, it rather reuses a rich but
> > currently inactive set of scheduling points that already exist in the
> > 2.6 tree: the might_sleep() debugging checks. Any code point that does
> > might_sleep() is in fact ready to sleep at that point. So the patch
> > activates these debugging checks to be scheduling points. This reduces
> > complexity and impact quite significantly.
>
> I don't think this is a good idea. Just because a function might sleep
> it doesn't mean it should sleep. I'd rather add the might_sleep() to
> cond_resched() and replace the former with the latter in the cases where
> it makes sense.

agreed. might_sleep() just like BUG() can be defined to noop. cond_resched() is the API to use.

the other bad thing is that there is no point for the sysctl (in 2.4 that made no sense at all too, yeah it only makes sense for benchmarking easily w/ and w/o the feature but it must be optimized away at the very least with a config option for production), if need_resched is set we _must_ schedule no matter what (a sysctl can only introduce a bug if something). If we spend any cpu checking the sysctl, we should instead spend such cpu to check need_resched in the first place.

The rest is of course very welcome, but you should remove all the pollution from the patch to make it mergeable. Just convert all those might to cond_resched() and remove all the superflous volountary stuff and config options.

As worse you can leave a single config option LOW_RESCHEDULE_OVERHEAD with PREEMPT=n, that could remove some cond_resched() from an extremely fast path if you're concerned about adding branches in some critical point, but you really seem not concerned since with CONFIG_PREEMPT_VOLUNTARY=y (the only way to enable it) you even _waste_ cpu on these paths to check a worthless sysctl that can only introduce bugs at runtime since it overrides the wishes of the scheduler. If scheduler is bad fix the scheduler, but as soon as need_resched is set no sysctl must be allowed to mask the wishes of the scheduler.
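The alternative that Christoph and Andrea describe can be sketched the same way: the debugging check lives inside the rescheduling call, and no runtime switch can hold back a pending request. This is a user-space illustration under assumptions, not kernel code; cond_resched_sketch(), might_sleep_sketch() and the state variables are hypothetical names, and the int return value (1 if a reschedule happened) is a convention chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state (not real kernel symbols). */
static bool need_resched_flag;   /* scheduler wants to run another task */
static bool in_atomic_context;   /* true while sleeping would be illegal */
static int  schedule_calls;      /* counts how often the CPU was given up */

/*
 * Pure debugging check. Building with -DNDEBUG turns assert() into nothing,
 * which mirrors the point that might_sleep() may be compiled out entirely.
 */
void might_sleep_sketch(void)
{
    assert(!in_atomic_context);
}

/*
 * Explicit rescheduling point: runs the debugging check, then yields
 * whenever a reschedule is pending. There is deliberately no sysctl-style
 * flag here. Returns 1 if the CPU was given up, 0 otherwise.
 */
int cond_resched_sketch(void)
{
    might_sleep_sketch();
    if (!need_resched_flag)
        return 0;
    need_resched_flag = false;
    schedule_calls++;
    return 1;
}
```

In this shape the rescheduling behaviour no longer depends on whether the debugging check is compiled in, which is the property both replies ask for.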
From: Andrea Arcangeli [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 02:52:08 +0200

On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> agreed. might_sleep() just like BUG() can be defined to noop.

BTW, this reminded me a related topic that I can't recall being ever mentioned on l-k: BUG_ON can also be optimized away. So people should be careful not to do write this:

    BUG_ON(test_and_set_bit(p))

but to write this instead:

    if (unlikely(test_and_set_bit(p))
        BUG()

(in short the check inside a BUG_ON must be strictly read-only since it's not guaranteed to be computed)
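Andrea's warning is easy to reproduce outside the kernel. The sketch below assumes a build in which the checks are compiled out; the SKETCH_NO_BUG_CHECKS switch, the simplified non-atomic test_and_set_bit() and the claim_*() helpers are hypothetical and written only for this illustration (the kernel's real helper is atomic and has a different signature, and the unlikely() hint is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical switch: 1 models a build where bug checks are compiled out. */
#define SKETCH_NO_BUG_CHECKS 1

#if SKETCH_NO_BUG_CHECKS
#define BUG()        do { } while (0)
#define BUG_ON(cond) do { } while (0)   /* cond is never evaluated */
#else
#define BUG()        abort()
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)
#endif

/* Simplified, non-atomic stand-in: set bit nr and return its old value. */
bool test_and_set_bit(unsigned int nr, unsigned long *word)
{
    unsigned long mask = 1UL << nr;
    bool old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Fragile: the side effect lives inside BUG_ON() and can vanish with it. */
void claim_fragile(unsigned int nr, unsigned long *word)
{
    BUG_ON(test_and_set_bit(nr, word));
}

/* Robust: the side effect always happens; only the reaction is optional. */
void claim_robust(unsigned int nr, unsigned long *word)
{
    if (test_and_set_bit(nr, word))
        BUG();
}

/* With checks compiled out, only the robust form still sets the bit. */
void bug_on_demo(void)
{
    unsigned long a = 0, b = 0;

    claim_fragile(3, &a);
    claim_robust(3, &b);
    assert(a == 0);
    assert(b == (1UL << 3));
}
```

Setting SKETCH_NO_BUG_CHECKS to 0 makes both helpers set the bit, which is why the bug tends to show up only in builds where the checks are disabled.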
From: Dave Jones [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 02:02:02 +0100

On Sat, Jul 10, 2004 at 02:52:08AM +0200, Andrea Arcangeli wrote:
> On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> > agreed. might_sleep() just like BUG() can be defined to noop.
>
> BTW, this reminded me a related topic that I can't recall being ever
> mentioned on l-k:

google for 'BUG_ON side effects'. It's come up a number of times 8-) Doesn't mean it isn't worth repeating however.

Dave
From: Arjan van de Ven [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 08:32:22 +0200

On Sat, Jul 10, 2004 at 01:50:17AM +0200, Andrea Arcangeli wrote:
> the other bad thing is that there is no point for the sysctl (in 2.4
> that made no sense at all too, yeah it only makes sense for benchmarking
> easily w/ and w/o the feature but it must be optimized away at the very
> least with a config option for production),

as Ingo wrote, that is the plan, all that "crud" is there just to make it easy to benchmark for now.
From: Ingo Molnar [email blocked]
Subject: Re: [announce] [patch] Voluntary Kernel Preemption Patch
Date: Sat, 10 Jul 2004 14:48:14 +0200

* Christoph Hellwig [email blocked] wrote:
> > unlike the lowlatency patches, this patch doesn't add a lot of new
> > scheduling points to the source code, it rather reuses a rich but
> > currently inactive set of scheduling points that already exist in the
> > 2.6 tree: the might_sleep() debugging checks. Any code point that does
> > might_sleep() is in fact ready to sleep at that point. So the patch
> > activates these debugging checks to be scheduling points. This reduces
> > complexity and impact quite significantly.
>
> I don't think this is a good idea. Just because a function might
> sleep it doesn't mean it should sleep. I'd rather add the
> might_sleep() to cond_resched() and replace the former with the latter
> in the cases where it makes sense.

think of voluntary preemption as a variant of CONFIG_PREEMPT with different tradeoffs: it doesnt preempt as much code but it's cheaper (in terms of code footprint and overhead) and less risky (in terms of code affected).

doesnt mean it should be scheduled to', which is the wrong approach because it is ultimately the decision of the user which tasks get scheduled (by giving processes various priorities) and the decision of the scheduler (for freely schedulable tasks). The preemption decision does not depend and should not depend on the kernel function utilized!

if you dont care about latencies and want to maximize throughput (for e.g. servers) then you dont want to enable CONFIG_PREEMPT_VOLUNTARY. That way you get artificial batching of parallel workloads.

FYI, i am also preparing a preemption patch where there's a (per-task) tunable for 'expected maximum latency' and the kernel would measure latencies and not do a forced preemption unless this latency is being exceeded. Voluntary preemption and CONFIG_PREEMPT means this tunable has a value of 0 - we reschedule as soon as possible. Server workloads mean a much higher tolerated latency value in the range of 50 msecs or so. Both are fair expectations and settings.

Ingo
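The per-task tunable Ingo mentions is not shown in the thread, so the following is only one possible reading of it, written as plain C. Everything here is an assumption made for illustration: the struct task_sketch type, the max_latency_us field, the choice to attach the value to the waiting task, and the use of microsecond timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task setting: tolerated scheduling latency in microseconds. */
struct task_sketch {
    uint64_t max_latency_us;   /* 0 means "reschedule as soon as possible" */
};

/*
 * Decide whether the running code should be forcibly preempted on behalf of
 * a waiting task. requested_at_us is when the reschedule was requested and
 * now_us is the current time, both on the same clock.
 */
bool should_force_preempt(const struct task_sketch *waiter,
                          uint64_t requested_at_us, uint64_t now_us)
{
    assert(now_us >= requested_at_us);

    uint64_t waited_us = now_us - requested_at_us;

    /* Preempt only once the tolerated latency has been used up. */
    return waited_us >= waiter->max_latency_us;
}
```

Under this reading a value of 0 reproduces today's behaviour (preempt immediately), while a server-style value of 50000 would let the kernel batch work for up to 50 msec before forcing a switch.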
Included in ck
I have this patch in my -bk20 -ck development snapshot:
http://ck.kolivas.org/patches/2.6/2.6.7/2.6.7-bk20/snapshot-2.6.7-bk20-c...
Question
I already have the bk20+ck4 kernel source. To patch to ck5,
do I have to delete /usr/src/linux-2.6.7 and extract linux-2.6.7.tar.bz2 again?
--
Thanks!
http://directfb.org (no it's not my home page :) )
Clean
If it was a snapshot it will definitely be easier to start with a clean tree.
What happened to the hoopla about pre-empt?
I thought the preemption was supposed to make the latency extremely low?
re: What happened to the hoopla about pre-empt?
This is discussed in an earlier story.
x86 only?
anyone have an idea as to why this patch is x86-only? hmm... guess i'll add the option to arch/x86_64/Kconfig and hope for the best. :)
Ingo the x86 man
Seems Ingo starts lots of patches as x86 only if you watch lkml.
Maybe it's all he has
Not many people are blessed with having a Sparc, an IA64 or an AMD64 lying around, so I'm guessing he can't ensure that it works on those platforms. Probably better for him to mark it one way instead of getting flooded by bug reports from untested platforms.
Auzy
The driver on demand project
Will it be in the -MM tree?
Will it be in the -MM tree? I found that Fedora Core 2's 2.6.5 kernel has a heavy latency problem: when I play music and surf with Mozilla, playback almost stops and there is no sound output.
Not the problem
The problem you are having is that the scheduler does not run your music player enough, or with low enough latency.
It is most likely *not* a problem with resched points or lock hold times.
WHAT? Cooperative multitasking
WHAT? Cooperative multitasking in Linux?
Ingo! Microsoft already made this mistake once, please!
Cooperative multitasking is NONSENSE!
Chill out
This patch is about preemption inside system calls, which is a different matter from preemptive multitasking. Linux always has had and always will have a preemptive task scheduler. However, when a system call is made, the task scheduler is traditionally not involved: the system call runs until it completes, at which point normal scheduling resumes. It is also possible to reschedule during a system call, which reduces latency for other processes that would otherwise have to wait on a call they didn't even make, but that introduces more complexity into the kernel code. What we are trying to decide here is whether the increased complexity of making kernel code preemptible is worth the reduced latency we gain.
Incidentally
Incidentally, cooperative multitasking is not nonsense in many scenarios. It does typically use system resources more efficiently than preemptive multitasking. Many embedded RTOSes use what amounts to cooperative multitasking for their non-interrupt driven tasks.
All that said, none of that has anything to do with Ingo's patch. His patch simply adds more rescheduling points to the kernel. This is the non-CONFIG_PREEMPT way to reduce scheduling latency.
Awesome!
It's good to see that someone still thinks about the desktop and multimedia usage.
It seems to me that the majority of the core kernel developers have a mindset that's oriented more towards usage scenarios that are typical for servers. It's been typical for Linux, in the last couple "stable" series, to work pretty well on servers, but exhibit a horrendous behaviour on the desktop or for multimedia applications.
I'm not saying the core developers are doing a bad job, they're doing an outstanding job overall, but it's a statistical truth that they're turning a blind eye to the desktop and multimedia things.
Sure, there were always patches floating around trying to correct the situation, but that actually confirms my point: those were patches, afterthoughts, quick hacks, instead of being part of the core design.
Every time someone tries to fix that, there's significant resistance from the core developers. The result? When i'm using Linux to record music digitally, i have to be careful which apps i'm running in the background, otherwise the machine will skip parts of the sound stream. Reboot same machine in Windows and that never happens. And that's an AthlonXP/1800, not quite state of the art, but pretty powerful nevertheless.
I remember Linus saying, a while ago, that the server stuff has been pretty much figured out, and the most interesting things in the Linux world are going to happen on the desktop.
Well, until the Linux kernel gets anywhere near a state where it's at least decent from the desktop perspective, that ain't going to happen.
I mean, have you tried to do multimedia work on stock Linux kernels? It sucks. On the said AthlonXP/1800 i have 4 IDE ports, just so i can hook up each IDE drive alone to its own IDE cable, for performance reasons. I'm capturing a DV stream from a digital camcorder to a hard-drive that's solely used for this purpose. Well, guess what - if Mozilla or Evolution so much as pass gas or something, the video capture program will skip frames. The situation improved a bit since i reformatted the multimedia drive with XFS, but it's still quite ridiculous.
Hello, core developers... Desktop? Multimedia? Please?
Strange. When i'm using Windows
Strange.
When i'm using Windows to type something, i have to be careful which apps i'm running in the background, otherwise the machine will skip keypresses. And that's P4/2600. On a much slower machine (P3/600), I never lose keypresses.
Probably buggy drivers and/or
Probably buggy drivers and/or mobo chipset. I remember hearing recently that at least one family of recent laptops miss "key up" events, and so whatever they do to work around it in Windows might be eating your keystrokes.
Will it be merged in the main
Will it be merged into the main kernel tree or the next development kernel? If not, does that mean the patch is lost?