Linux: Realtime Preemption

Submitted by Jeremy on October 11, 2004 - 10:34pm.
Linux news

Since his first announcement of voluntary kernel preemption, Ingo Molnar has continued to work on reducing overall latencies in the kernel. His latest -T4 patch introduces a new kernel configuration option, about which he says, "the big change in this release is the addition of PREEMPT_REALTIME, which is a new implementation of a fully preemptible kernel model" in which "spinlocks and rwlocks use semaphores and are preemptible by default", and "the _irq variants of these locks do not disable interrupts but rely on IRQ threading to exclude against interrupt contexts."

Ingo compares his latest effort with MontaVista's recent real-time patchset. To begin, he explains that his implementation uses a feature of gcc "to detect the type of the spinlock compile-time and to switch to the mutex or raw_spinlock API accordingly." He points out that this greatly reduces the size of the patch, making it easier to maintain. He notes three further differences: he uses native Linux semaphores to implement preemption; beyond converting spinlocks, he also converts rwlocks to rw-semaphores; and he keeps 90 locks as raw (non-mutex) spinlocks, where the MontaVista patches keep roughly 30.
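The post does not name the gcc feature involved. One builtin that can make this kind of compile-time choice is __builtin_types_compatible_p, and the following is a minimal user-space sketch under that assumption, not the code from the -T4 patch. The two lock structs and the demo_raw_spin_lock()/demo_mutex_lock() helpers are hypothetical stand-ins that only count calls, so the dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two lock types (not the kernel's definitions). */
typedef struct { int spin_count; } raw_spinlock_t;  /* would always spin */
typedef struct { int sleep_count; } spinlock_t;     /* would be mutex-backed */

/* Hypothetical back ends: they only record which path was taken. */
static void demo_raw_spin_lock(raw_spinlock_t *l)
{
    assert(l != NULL);
    l->spin_count++;
}

static void demo_mutex_lock(spinlock_t *l)
{
    assert(l != NULL);
    l->sleep_count++;
}

/* Evaluates to 1 at compile time if 'lock' is a pointer to 'type', else 0. */
#define TYPE_EQUAL(lock, type) \
    __builtin_types_compatible_p(__typeof__(lock), type *)

/*
 * One API name for both lock types. The condition is a compile-time
 * constant, so the compiler can drop the branch that does not apply.
 * The casts are needed because both branches must still type-check.
 */
#define spin_lock(lock)                                                 \
    do {                                                                \
        _Static_assert(TYPE_EQUAL((lock), raw_spinlock_t) ||            \
                       TYPE_EQUAL((lock), spinlock_t),                  \
                       "spin_lock(): unsupported lock type");           \
        if (TYPE_EQUAL((lock), raw_spinlock_t))                         \
            demo_raw_spin_lock((raw_spinlock_t *)(lock));               \
        else                                                            \
            demo_mutex_lock((spinlock_t *)(lock));                      \
    } while (0)
```

With this arrangement, switching a lock between the two behaviours only requires changing its declared type; every spin_lock(&lock) call site stays as it is, which is the property Ingo credits for the smaller patch.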

For those eager to try out Ingo's latest patch, he cautions, "CONFIG_PREEMPT_REALTIME is default-off and i'd only suggest to enable it on non-critical systems. It is the first iteration of this feature and it will sure have rough edges. Not for the faint hearted!" The patch is part of his ongoing effort to reduce overall kernel latency and remains a work in progress.


From: Ingo Molnar [email blocked]
To:  linux-kernel
Subject: [patch] CONFIG_PREEMPT_REALTIME, 'Fully Preemptible Kernel', VP-2.6.9-rc4-mm1-T4
Date: 	Mon, 11 Oct 2004 16:29:53 +0200


i've released the -T4 VP patch:

  http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc4-mm1-T4

the big change in this release is the addition of PREEMPT_REALTIME,
which is a new implementation of a fully preemptible kernel model:

 - spinlocks and rwlocks use semaphores and are preemptible by default

 - the _irq variants of these locks do not disable interrupts but rely
   on IRQ threading to exclude against interrupt contexts.

note that this implementation is different from other kernel-preemption
patches, in a number of key areas. Initially i looked at merging the
MontaVista patchset from two days ago but decided to implement a new one
from scratch to cure a number of conceptual problems:

 - this patch auto-detects the 'type' of the lock at compilation time. 

   All fully-preemptible kernel patches i've seen so far suffer from one
   nasty problem: they are very large because they redefine _all_ the
   spinlock APIs to provide separation between 'mutex based' and
   'original' spinlocks. E.g. check out the sheer size of the MontaVista
   patchset: Linux-2.6.9-rc3-RT_spinlock1.patch and
   Linux-2.6.9-rc3-RT_spinlock2.patch are 84K and 92K and they convert
   ~30 core spinlocks to new APIs.

   OTOH this patch converts _90_ spinlocks in roughly half the
   patchsize, which makes a large difference in maintainability.

   How it works: this implementation uses a gcc feature to detect the
   type of the spinlock compile-time and to switch to the mutex or
   raw_spinlock API accordingly. Only one, very isolated change has to
   be done to switch a generic spinlock to a spin-only lock: spinlock_t
   is changed to raw_spinlock_t and the initializer is fixed up. All the
   other code remains untouched - and this even if a single C module
   contains both mutex-based and spinlock-based API calls. This approach
   is quite close to a simple object-oriented lock type - but written in
   C and compatible with the existing spinlock APIs.

 - i used the native Linux semaphores/rwsems to implement
   spinlock/rwlock preemption. E.g. the MontaVista patches use separate
   synchronization objects (kmutex/pmutex) to implement this.

   I believe using native semaphores is the better approach
   architecturally because this means that we have to add priority
   inheritance handling only once and to the native Linux semaphores. 
   This has the additional benefit of fixing all mutex-using
   kernel code's priority inheritance problems. (which kmutex/pmutex
   does not solve.)

   OTOH the MontaVista patches naturally have the advantage of having a
   working priority-inheritance mechanism in the pmutex code, right now. 
   (I did a brief attempt to plug the pmutex code into this patch but it
   didnt look good of a match - but others might want to try to 
   integrate it nevertheless.)

   also, another bad property of the kmutex/pmutex code is that it uses
   assembly which makes it quite hard to port to non-x86 architectures. 
   OTOH, the native Linux semaphores and rwsems work on every
   architecture.

 - the patch converts rwlocks too, while e.g. the MontaVista patchset
   still keeps rwlocks as spinlocks. It is important to convert rwlocks
   to rw-semaphores, most notably this allow the conversion of the
   tasklist and signal spinlocks.

 - finally, i went for correctness primarily, not latencies. I checked
   out the MontaVista patches and they categorize roughly 30 spinlocks
   as the ones that are necessary to be 'raw'. Unfortunately this is
   inadequate, my patch excludes 90 such locks and it's still probably
   not a 100% correct conversion. The core kernel needs changes in the
   locking infrastructure to get rid of most of the these 90 non-mutex
   locks.

it is highly recommended to enable DEBUG_PREEMPT when enabling
PREEMPT_REALTIME. It will warn about all the places that are unsafe. The
patch is x86-only for the time being, but the changes necessary for
other architectures should be relatively low.

NOTE: CONFIG_PREEMPT_REALTIME is default-off and i'd only suggest to
enable it on non-critical systems. It is the first iteration of this
feature and it will sure have rough edges. Not for the faint hearted!

NOTE2: some of the lock-break functionality offered by the -VP patchset
is disabled if PREEMPT_REALTIME is enabled - this is temporary. This
will likely result in an increase of the maximum measured latencies.

NOTE3: since so many spinlocks are still non-mutex, even average
latencies will be well above what we could achieve - but i wanted to
reach a known-correct codebase first. For example, most of the
networking spinlocks had to be made non-mutex because of networking's
use of RCU locking primitives and per-CPU data structures. The same is
true for the VFS - many of its locks are non-mutex still due to RCU. 
Once this infrastructure work is done the size of the patch will
decrease significantly.

to build a -T4 tree from scratch the patching order is:

   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.8.tar.bz2
 + http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.9-rc4.bz2
 + http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/2.6.9-rc4-mm1.bz2
 + http://redhat.com/~mingo/voluntary-preempt/voluntary-preempt-2.6.9-rc4-mm1-T4

	Ingo


From: Ingo Molnar [email blocked]
To: Daniel Walker [email blocked]
Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel
Date: Mon, 11 Oct 2004 22:49:59 +0200


* Daniel Walker [email blocked] wrote:

> On Sun, 2004-10-10 at 14:59, Ingo Molnar wrote:
> > * Andrew Morton [email blocked] wrote:
> >
> > > Lockmeter gets in the way of all this activity in a big way. I'll
> > > drop it.
> >
> > great. Daniel, would you mind to merge your patchkit against the
> > following base:
> >
> > -mm3, minus lockmeter, plus the -T3 patch
>
> No problem. Next release will be without lockmeter. Thanks for the
> patches.

what do you think about the PREEMPT_REALTIME stuff in -T4? Ideally, if
you agree with the generic approach, the next step would be to add your
priority inheritance handling code to Linux semaphores and
rw-semaphores. The sched.c bits for that looked pretty straightforward.
The list walking is a bit ugly but probably unavoidable - the only other
option would be 100 priority queues per semaphore -> yuck.

	Ingo

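The "list walking" Ingo mentions can be pictured with a small user-space sketch of priority inheritance on a semaphore-like lock. This is an illustration under stated assumptions, not MontaVista's pmutex code or anything from the -T4 patch: struct task, struct pi_sem, pi_sem_block() and pi_sem_release() are hypothetical names, a lower number means a higher priority, and the owner is assumed to hold only this one lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: lower 'prio' value means higher priority in this sketch. */
struct task {
    int normal_prio;            /* priority the task was assigned */
    int prio;                   /* effective priority, possibly boosted */
    struct task *next_waiter;   /* link in a lock's wait list */
};

/* Hypothetical lock with a single wait list kept sorted by priority. */
struct pi_sem {
    struct task *owner;
    struct task *waiters;
};

/* Task 't' blocks on 's', which must currently be owned by another task. */
static void pi_sem_block(struct pi_sem *s, struct task *t)
{
    struct task **pp = &s->waiters;

    assert(s->owner != NULL && s->owner != t);

    /* The list walk: skip waiters of equal or higher priority. */
    while (*pp != NULL && (*pp)->prio <= t->prio)
        pp = &(*pp)->next_waiter;
    t->next_waiter = *pp;
    *pp = t;

    /* Inheritance: the owner runs at least at the new waiter's priority. */
    if (t->prio < s->owner->prio)
        s->owner->prio = t->prio;
}

/* The owner releases 's'; returns the new owner, or NULL if nobody waits. */
static struct task *pi_sem_release(struct pi_sem *s)
{
    struct task *old = s->owner;
    struct task *next = s->waiters;

    assert(old != NULL);
    old->prio = old->normal_prio;   /* drop any boost (single-lock assumption) */

    if (next != NULL) {
        s->waiters = next->next_waiter;
        next->next_waiter = NULL;
    }
    s->owner = next;                /* head of the sorted list has top priority */
    return next;
}
```

The walk in pi_sem_block() costs time proportional to the number of waiters. The alternative Ingo dismisses, a separate queue per priority level in every semaphore, would avoid the walk at the price of much larger lock objects.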

From: Sven Dietrich [email blocked]
Subject: RE: [ANNOUNCE] Linux 2.6 Real Time Kernel
Date: Mon, 11 Oct 2004 14:44:36 -0700


I think Daniel has some separate thoughts, here are mine:

Regarding the list walking stuff:

There are a lot of hashing options, indexing, etc. that could be done.
We thought of it as a future optimization. An easy fix would be to
insert RT processes at the front, non-RT from the tail of the queue.

Regarding patch size: clearly this is an issue. We are working on
creating a good map of spinlock nestings, to help with this. Will
publish that ASAP.

IMO the number of raw_spinlocks should be lower, I said teens before.

Theoretically, it should only need to be around hardware registers and
some memory maps and cache code, plus interrupt controller and other
SMP-contended hardware.

Practically, its an efficiency judgement call. Its not worth blocking
for 5 instructions in a critical section under any circumstance, so the
deepest nested locks should probably remain spinlocks.

There are some concurrency issues in kernel threads, and I think there
is a lot of work here. The abstraction for LOCK_OPS is a good
alternative, but like the spin_undefs, its difficult to tell in the
code whether you are dealing with a mutex or a spinlock.

Regarding the use of the system semaphore:

We have WIP on PMUTEX modified to use atomic_t, thereby eliminating the
assembly for instant portability. Its slow, but optimizations are
allowed for. Of course for actual portability the IRQ threads must also
be running on those other platforms. Your IRQ abstraction is ideal for
that.

Eventually, I think that we will see optimization - the last touches
would have the final mutex code converted back to assembly, for
performance reasons.

There are a whole lot of caveats and race conditions that have not yet
been unearthed by the brief LKML testing. A lot of them have to do with
wakeups of tasks blocked on a mutex, and differentiating between
blocked "ready" and blocked "mutex" states.

Here the system semaphore may have an advantage. With that, maybe we
can work back towards the abstraction, so that we can evaluate both
solutions for their specific advantages.

I'll have to take a look at the new T4 patch in detail, but at first
glance it seems that both mutexes could coexist in the abstraction.

We'll give it a test run, and look forward to your thoughts.

Thanks,

Sven

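Sven's "easy fix" for the wait list can be sketched in a few lines of user-space C. This only illustrates the idea as stated in his message, not the pmutex implementation: struct waiter, struct wait_queue and the wq_* functions are hypothetical names, and the queue is a plain circular doubly linked list with a sentinel node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record. */
struct waiter {
    int pid;
    int is_rt;                  /* nonzero for a real-time task */
    struct waiter *prev, *next;
};

/* Circular list: head.next is the first waiter, head.prev the last. */
struct wait_queue {
    struct waiter head;
};

static void wq_init(struct wait_queue *q)
{
    q->head.next = q->head.prev = &q->head;
}

/* RT waiters go to the front, everyone else to the tail; no list walk. */
static void wq_add(struct wait_queue *q, struct waiter *w)
{
    struct waiter *after;

    assert(q != NULL && w != NULL);
    after = w->is_rt ? &q->head : q->head.prev;
    w->prev = after;
    w->next = after->next;
    after->next->prev = w;
    after->next = w;
}

/* Remove and return the waiter to wake next, or NULL if the queue is empty. */
static struct waiter *wq_pop(struct wait_queue *q)
{
    struct waiter *w = q->head.next;

    if (w == &q->head)
        return NULL;
    w->prev->next = w->next;
    w->next->prev = w->prev;
    w->next = w->prev = NULL;
    return w;
}
```

Insertion takes constant time, but the scheme does not order real-time waiters among themselves: as written, a later RT arrival is placed ahead of an earlier one regardless of their relative priorities. That is consistent with Sven describing it as an easy fix rather than a final design.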

From: Ingo Molnar [email blocked]
Subject: Re: [ANNOUNCE] Linux 2.6 Real Time Kernel
Date: Mon, 11 Oct 2004 23:54:20 +0200


* Sven Dietrich [email blocked] wrote:

> IMO the number of raw_spinlocks should be lower, I said teens before.
>
> Theoretically, it should only need to be around hardware registers and
> some memory maps and cache code, plus interrupt controller and other
> SMP-contended hardware.

yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
all locking still happens as raw spinlocks.

But, there is a 'correctness' _minimum_ set of spinlocks that _must_ be
raw spinlocks - this i tried to map in the -T4 patch. The patch does run
on SMP systems for example. (it was developed as an SMP kernel - in fact
i never compiled it as UP :-|.) If code has per-CPU or preemption
assumptions then there is no choice but to make it a raw spinlock, until
those assumptions are fixed.

> There are some concurrency issues in kernel threads, and I think there
> is a lot of work here. The abstraction for LOCK_OPS is a good
> alternative, but like the spin_undefs, its difficult to tell in the
> code whether you are dealing with a mutex or a spinlock.

what do you mean by 'it's difficult to tell'? In -T4 you do the choice
of type in the data structure and the API adapts automatically. If the
type is raw_spinlock_t then a spin_lock() is turned into a
_raw_spin_lock(). If the type is spinlock_t then the spin_lock() is
redirected to mutex_lock(). It's all transparently done and always
correct.

> There are a whole lot of caveats and race conditions that have not yet
> been unearthed by the brief LKML testing. [...]

actually, have you tried your patchset on an SMP box? As far as i can
see the locking in it ignores SMP issues _completely_, which makes the
choice of locks much less useful.

	Ingo


From: Sven Dietrich [email blocked]
Subject: RE: [ANNOUNCE] Linux 2.6 Real Time Kernel
Date: Mon, 11 Oct 2004 16:05:11 -0700


>
> * Sven Dietrich [email blocked] wrote:
>
> > IMO the number of raw_spinlocks should be lower, I said teens before.
> >
> > Theoretically, it should only need to be around hardware registers and
> > some memory maps and cache code, plus interrupt controller and other
> > SMP-contended hardware.
>
> yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
> all locking still happens as raw spinlocks.
>
> But, there is a 'correctness' _minimum_ set of spinlocks that _must_ be
> raw spinlocks - this i tried to map in the -T4 patch. The patch does run
> on SMP systems for example. (it was developed as an SMP kernel - in fact
> i never compiled it as UP :-|.) If code has per-CPU or preemption
> assumptions then there is no choice but to make it a raw spinlock, until
> those assumptions are fixed.
>

The grunt work is in identifying those problem areas and coming up with
elegant, low-impact solutions.

RCU locks is one example as mentioned before. We had a fix to serialize
RCU access, but weren't happy with that. We were hoping to get some
input on this, but these problems seem to show up more readily on slow
systems (we are also testing with a bunch of old P1, P2 and K6 boxes
all far sub 1 GHz)

> > There are some concurrency issues in kernel threads, and I think there
> > is a lot of work here. The abstraction for LOCK_OPS is a good
> > alternative, but like the spin_undefs, its difficult to tell in the
> > code whether you are dealing with a mutex or a spinlock.
>
> what do you mean by 'it's difficult to tell'? In -T4 you do the choice
> of type in the data structure and the API adapts automatically. If the
> type is raw_spinlock_t then a spin_lock() is turned into a
> _raw_spin_lock(). If the type is spinlock_t then the spin_lock() is
> redirected to mutex_lock(). It's all transparently done and always
> correct.
>

I was making this observation: One can't look at an arbitrary piece of
code and tell if it will be a spinlock or a mutex. One has to go look
elsewhere. In the spin_undefs case one can look the top of the file and
check for it, in the LOCK_OPS case, you have to call up the data
structure declaration.

> > There are a whole lot of caveats and race conditions that have not yet
> > been unearthed by the brief LKML testing. [...]
>
> actually, have you tried your patchset on an SMP box? As far as i can
> see the locking in it ignores SMP issues _completely_, which makes the
> choice of locks much less useful.
>

We stated that its been tested minimally on SMP. That means we have had
it up and running and found it to be unstable. I fully agree that SMP
is the superset to get it working on, and that PMutex is not perfect at
this point.

We will take a look at the T5 patch and see what we can do about PI for
the system semaphore, but I am not sure how portable it would be
without also touching the assembly. FWIW PMutex is already based in
part on the system semaphore, so we might get similar problems when
porting elsewhere.

I think we should try and eliminate the mutex as an issue ASAP so we
can move on to the real meat.

We have spec'd some requirements in the rttReleaseNotes, clearly not
all are being met, but we hoped to capture most of them.

I have copied Arndt Heursch and Witold Jaworski in Germany, maybe they
will also have some insights.

Sven




Linux is looking more and mor

October 15, 2004 - 5:15am
Anonymous

Linux is looking more and more like Solaris/BSD everyday. This is the same locking/interrupt model they use.

Finally, they are coming around. ;-)

RE: Linux more and more

October 15, 2004 - 6:58am
Anonymous

What, you thought Linux has an R&D budget? They have to get their
roadmap somewhere. Like little beamer cronies who are told what
to write and what to work on based on the latest marketing fluff.
But then again Linux != innovation. It has a higher purpose?

whhaat?

October 21, 2004 - 5:55pm
Anonymous

I don't know this and what you guys are trying to do, but what the hell is preempt?

What does preempt do?

What's its purpose?
