On Fri, Apr 16, 2010 at 05:14:15PM +0200, Peter Zijlstra wrote:
Excellent point -- much of the issue really does center around
forward-progress guarantees. In fact the Linux kernel has a number of
locking primitives that require different degrees of forward-progress
guarantee from the code in their respective critical sections:
o spin_lock_irqsave(): Critical sections must guarantee forward
progress against everything except NMI handlers.
o raw_spin_lock(): Critical sections must guarantee forward
progress against everything except IRQ (including softirq)
and NMI handlers.
o spin_lock(): Critical sections must guarantee forward
progress against everything except IRQ (again including softirq)
and NMI handlers and (given CONFIG_PREEMPT_RT) higher-priority
realtime tasks.
o mutex_lock(): Critical sections need not guarantee
forward progress, as general blocking is permitted.
The other issue is the scope of the lock. The Linux kernel has
the following:
o BKL: global scope.
o Everything else: scope defined by the use of the underlying
lock variable.
One of the many reasons that we are trying to get rid of the BKL is that
it combines global scope with relatively weak forward-progress guarantees.
So here is how the various RCU flavors stack up:
o rcu_read_lock_bh(): critical sections must guarantee forward
progress against everything except NMI handlers and IRQ handlers.
Unlike with raw_spin_lock(), softirq handlers are not excepted,
because softirqs are disabled across the critical section. Global
in scope, so that violating the forward-progress guarantee risks
OOMing the system.
o rcu_read_lock_sched(): critical sections must guarantee
forward progress against everything except NMI and IRQ handlers
(again including softirq handlers). Global in scope, so that
violating the forward-progress guarantee risks OOMing the system.
o rcu_read_lock(): critical sections must guarantee forward
progress against everything except NMI handlers, IRQ handlers,
softirq handlers, and (in CONFIG_PREEMPT_RT) higher-priority
realtime tasks. Global in scope, so that violating the
forward-progress guarantee risks OOMing the system.
o srcu_read_lock(): critical sections need not guarantee forward
progress, as general blocking is permitted. Scope is controlled
by the use of the underlying srcu_struct structure.
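Because scope is what sets srcu_read_lock() apart from the other flavors, here is a rough userspace illustration (a toy, not kernel code). The names struct toy_srcu_domain and the toy_domain_*() functions are hypothetical, and a single atomic reader count per domain stands in for the real per-CPU counters. The only point is that a reader in one domain has no effect on whether a grace period can end in another domain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy SRCU domain, NOT the kernel's srcu_struct: one atomic
 * count of active readers stands in for the real per-CPU counters. */
struct toy_srcu_domain {
	atomic_int readers;
};

/* Explicit initialization, which the real srcu_struct also needs. */
static void toy_domain_init(struct toy_srcu_domain *d)
{
	atomic_init(&d->readers, 0);
}

static void toy_domain_read_lock(struct toy_srcu_domain *d)
{
	atomic_fetch_add(&d->readers, 1);
}

static void toy_domain_read_unlock(struct toy_srcu_domain *d)
{
	int old = atomic_fetch_sub(&d->readers, 1);

	assert(old > 0); /* catches an unbalanced unlock */
	(void)old;
}

/* A grace period for domain d can end only once d's own readers are gone;
 * readers in any other domain are irrelevant to it. */
static bool toy_domain_gp_could_end(struct toy_srcu_domain *d)
{
	return atomic_load(&d->readers) == 0;
}
```

So a reader that blocks for a long time in domain a delays only a's grace periods, while users of domain b are unaffected. This contrasts with the global scope of the other three flavors.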
As you say, one can block in rcu_read_lock() critical sections, but
the only blocking that is really safe is blocking that is subject to
priority inheritance. This prohibits mutexes, because although the
mutexes themselves are subject to priority inheritance, blocking within
the mutexes' critical sections might well not be.
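A userspace analog may make the distinction clearer, assuming POSIX threads with the priority-inheritance option (as on Linux with glibc). The helpers init_pi_mutex() and pi_critical_section() are hypothetical. PTHREAD_PRIO_INHERIT boosts the holder when a higher-priority thread waits for this particular mutex. Nothing in that attribute, however, constrains what the holder does while holding the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: create a mutex using POSIX priority inheritance. */
static int init_pi_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret = pthread_mutexattr_init(&attr);

	if (ret != 0)
		return ret;
	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (ret == 0)
		ret = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Hypothetical critical section. Priority inheritance covers contention
 * for m itself. If this code instead waited for an event posted by some
 * unrelated low-priority thread, that wait would get no boost. This is
 * the gap that makes blocking on a mutex inside a preemptible RCU reader
 * unsafe. */
static int pi_critical_section(pthread_mutex_t *m, int *counter)
{
	int ret = pthread_mutex_lock(m);

	if (ret != 0)
		return ret;
	(*counter)++;
	return pthread_mutex_unlock(m);
}
```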
So the easy response is "just use SRCU." Of course, SRCU has some
disadvantages at the moment:
o The return value from srcu_read_lock() must be passed to
srcu_read_unlock(). I believe that I can fix this.
o There is no call_srcu(). I believe that I can fix this.
o SRCU uses a flat per-CPU counter scheme that is not particularly
scalable. I believe that I can fix this.
o SRCU's current implementation makes it almost impossible to
implement priority boosting. I believe that I can fix this.
o SRCU requires explicit initialization of the underlying
srcu_struct. Unfortunately, I don't see a reasonable way
around this. Not yet, anyway.
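To show why the index has to travel from srcu_read_lock() to srcu_read_unlock(), and why the counter scan does not scale, here is a minimal userspace sketch. It assumes a two-index, per-CPU counter scheme modeled loosely on SRCU. struct toy_srcu, TOY_NR_CPUS and the toy_srcu_*() functions are hypothetical. CPU numbers are passed in explicitly, and the real implementation's memory-ordering subtleties are glossed over by using sequentially consistent C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* Toy model, NOT the kernel's srcu_struct: readers count themselves under
 * the current index; the updater flips the index and waits for the old
 * index's counters to sum to zero. */
struct toy_srcu {
	atomic_int idx;
	atomic_int c[2][TOY_NR_CPUS];
};

static void toy_srcu_init(struct toy_srcu *ss)
{
	atomic_init(&ss->idx, 0);
	for (int i = 0; i < 2; i++)
		for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
			atomic_init(&ss->c[i][cpu], 0);
}

/* Returns the index this reader counted itself under. */
static int toy_srcu_read_lock(struct toy_srcu *ss, int cpu)
{
	int idx = atomic_load(&ss->idx) & 1;

	atomic_fetch_add(&ss->c[idx][cpu], 1);
	return idx;
}

/* Must decrement under the SAME idx as the matching lock, even if the
 * updater has flipped ss->idx meanwhile. cpu may differ from the lock
 * side (a sleeping reader can migrate); only the per-index sum matters. */
static void toy_srcu_read_unlock(struct toy_srcu *ss, int cpu, int idx)
{
	atomic_fetch_sub(&ss->c[idx][cpu], 1);
}

/* Flat scan over every CPU: O(TOY_NR_CPUS) per check. */
static int toy_srcu_readers(struct toy_srcu *ss, int idx)
{
	int sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += atomic_load(&ss->c[idx][cpu]);
	return sum;
}

/* Updater: steer new readers to the other index and return the old one,
 * whose readers a grace period must wait out. */
static int toy_srcu_flip(struct toy_srcu *ss)
{
	return atomic_fetch_xor(&ss->idx, 1) & 1;
}
```

Suppose a reader instead re-read ss->idx at unlock time after a flip. It would decrement index 1 rather than index 0. Index 0 would then never drain, and the grace period would never end. Handing the saved index back to the caller avoids this, at the price of the API wart noted in the first item above.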
So, is there anything else that you don't like about SRCU?
Thanx, Paul