Expose the new lock protection lock.
This can be used to annotate places where we take multiple locks of the
same class and avoid deadlocks by always taking another (top-level) lock
first.
NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/lockdep.h | 2 ++
include/linux/spinlock.h | 6 ++++++
kernel/spinlock.c | 11 +++++++++++
3 files changed, 19 insertions(+)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -410,8 +410,10 @@ static inline void print_irqtrace_events
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# ifdef CONFIG_PROVE_LOCKING
# define spin_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, NULL, i)
+# define spin_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 2, n, i)
# else
# define spin_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 1, NULL, i)
+# define spin_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 1, NULL, i)
# endif
# define spin_release(l, n, i) lock_release(l, n, i)
#else
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -183,8 +183,14 @@ do { \
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
+# define spin_lock_nest_lock(lock, nest_lock) \
+ do { \
+ typecheck(struct lockdep_map *, &nest_lock->dep_map); \
+ _spin_lock_nest_lock(lock, &nest_lock->dep_map); \
+ } while (0)
#else
# define spin_lock_nested(lock, subclass) _spin_lock(lock)
+# define spin_lock_nest_lock(lock, nest_lock) _spin_lock(lock)
#endif
#define write_lock(lock) _write_lock(lock)
Index: ...> NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit. A) It is probably a good idea to put this in a comment somewhere near where spin_lock_nest_lock() is declared. B) It is probably a good idea to write that comment in such a way that dumb people like me understand what the limit is. The sentence I quoted above is too telegraphic for me to get. Is the point that no more than 48 spinlocks can be held at once, even if the inner locks are protected by some top level lock? Or do you mean something else? - R. --
No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything covered by lockdep) held by any one code-path; including nested interrupt contexts. --
> No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything > covered by lockdep) held by any one code-path; including nested > interrupt contexts. Does that mean that something like the new mm_take_all_locks() operation is going to explode if someone tries to use it with lockdep on? - R. --
Gah - yes, clearly nobody tried this.. :-/ Just looking at the code it will not only run into this limit, but it would warn about recursion on the second file/anon vma due to utter lack of annotation. Why are people still developing without lockdep? /me sad --
On Mon, Aug 04, 2008 at 04:32:12PM +0200, Peter Zijlstra wrote: > On Mon, 2008-08-04 at 07:26 -0700, Roland Dreier wrote: > > > No more than 48 locks (mutexes, rwlocks, spinlock, RCU, everything > > > covered by lockdep) held by any one code-path; including nested > > > interrupt contexts. > > > > Does that mean that something like the new mm_take_all_locks() operation > > is going to explode if someone tries to use it with lockdep on? > > Gah - yes, clearly nobody tried this.. :-/ > > Just looking at the code it will not only run into this limit, but it > would warn about recursion on the second file/anon vma due to utter lack > of annotation. > > Why are people still developing without lockdep? More puzzling, is why hasn't this triggered in the Fedora rawhide kernels, which do have lockdep enabled. Dave -- http://www.codemonkey.org.uk --
My guess is that the kvm thing attaches before there are any vma, and leaves after all the vma are gone. So it would never actually trigger. --
Yes, lockdep seems to be fine with kvm in kernel mainline in my current and past testing. Andrew asked me to check this long ago. vmx ~ # zgrep LOCKDEP /proc/config.gz CONFIG_LOCKDEP_SUPPORT=y CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCKDEP=y CPA self-test: 4k 26112 large 1997 gb 0 x 2652[ffff880000000000-ffff8800bffff000] miss 262144 4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000] miss 262144 4k 186880 large 1683 gb 0 x 43021[ffff880000000000-ffff8800bffff000] miss 262144 ok. loaded kvm module () apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 Ignoring de-assert INIT to vcpu 0 kvm: emulating exchange as write apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 Ignoring de-assert INIT to vcpu 0 apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 Ignoring de-assert INIT to vcpu 0 apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 apic write: bad size=1 fee00030 Ignoring de-assert INIT to vcpu 0 Ignoring de-assert INIT to vcpu 0 Ignoring de-assert INIT to vcpu 0 I can't see lockdep errors in dmesg starting one more multiple VM in a loop (all run on a quadcore). GRU is likely the same. The only real place where lockdep is unusable in my experience is preempt-RT, it grinds it to an halt during boot on 8-way before reaching the shell. --
Dave Jones just handed me: David Miller just did a patch that might fix that. --
Sorry, that wasn't the problem, but my current testing passed because of another error... to test this report I rebuilt a kvm configured for rhel not for mainline so it wasn't the right test... When "long ago" I tested that this was working fine (actually when Andrew asked me), I guess lockdep was working because the implementation with the vmalloc array was slightly different, otherwise I don't know. I'm fairly certain that it worked fine at some I can reproduce this now yes after a 'make sync'. Obviously this is a bug in lockdep that it trips over this otherwise if lockdep was right the kernel should deadlock while this is just a false positive and everything runs fine. I assume it can't understand the spinlock address is different (I think it uses the address as key only for static locks), so I wonder if you could call print_deadlock_bug() from the failure path of the spinlock to solve this? Do things like double_rq_lock works just because rq1 and rq2 don't Woow cool, after 11 months I lost any hope that lockdep could ever work in that environment... Was it an actual bug or is this some way to lower the complexity of the graph build? --
From: Andrea Arcangeli <andrea@qumranet.com> Lockdep can't recognize if spinlocks are at a different address. So trylock avoids lockdep to generate false positives. After lockdep will be fixed this change can and should be reverted. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> --- (btw it's not like I forgot to sync, but I sync against the wrong source tree previously because it was in the bash history, it's a bit In the meantime (as I doubt lockdep will get fixed any time soon) this will workaround it. diff -r 3469dce61df1 mm/mmap.c --- a/mm/mmap.c Tue Jul 29 20:01:28 2008 +0200 +++ b/mm/mmap.c Mon Aug 04 19:41:53 2008 +0200 @@ -2279,8 +2279,13 @@ static void vm_lock_anon_vma(struct anon /* * The LSB of head.next can't change from under us * because we hold the mm_all_locks_mutex. + * + * spin_lock would confuse lockdep who can't + * differentiate between the 'mapping' always changing + * address. */ - spin_lock(&anon_vma->lock); + while (!spin_trylock(&anon_vma->lock)) + cpu_relax(); /* * We can safely modify head.next after taking the * anon_vma->lock. If some other vma in this mm shares @@ -2310,7 +2315,13 @@ static void vm_lock_mapping(struct addre */ if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) BUG(); - spin_lock(&mapping->i_mmap_lock); + /* + * spin_lock would confuse lockdep who can't + * differentiate between the 'mapping' always changing + * address. + */ + while (!spin_trylock(&mapping->i_mmap_lock)) + cpu_relax(); } } --
From: Andrea Arcangeli <andrea@qumranet.com> Lockdep can't recognize if spinlocks are at a different address. So using trylock in a loop is one way to avoid lockdep to generate false positives. After lockdep will be fixed this change can and should be reverted. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> --- Resubmit because I didn't update the subject and so I improved the comments too. This is mostly to show it's a bug in lockdep and it can be trivially worked around without having to fix lockdep for real, any superior solution to this hack is more than welcome and recommended. diff -r 3469dce61df1 mm/mmap.c --- a/mm/mmap.c Tue Jul 29 20:01:28 2008 +0200 +++ b/mm/mmap.c Mon Aug 04 19:54:27 2008 +0200 @@ -2279,8 +2279,12 @@ static void vm_lock_anon_vma(struct anon /* * The LSB of head.next can't change from under us * because we hold the mm_all_locks_mutex. + * + * spin_lock would confuse lockdep who doesn't notice + * the 'anon_vma' always changing address. */ - spin_lock(&anon_vma->lock); + while (!spin_trylock(&anon_vma->lock)) + cpu_relax(); /* * We can safely modify head.next after taking the * anon_vma->lock. If some other vma in this mm shares @@ -2310,7 +2314,12 @@ static void vm_lock_mapping(struct addre */ if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) BUG(); - spin_lock(&mapping->i_mmap_lock); + /* + * spin_lock would confuse lockdep who doesn't notice + * the 'mapping' always changing address. + */ + while (!spin_trylock(&mapping->i_mmap_lock)) + cpu_relax(); } } --
NAK, come-on, you didn't even bother to look at the available --
> NAK, come-on, you didn't even bother to look at the available > annotations.. Agree that this trylock-in-a-loop is a horrible hack that should never be merged. However I wonder what you think the right way to annotate the potentially unbounded number of locks taken in mm_take_all_locks() is? - R. --
Good question, one which I don't currently have an answer for - see my other mail why getting rid of the MAX_LOCK_DEPTH limitation is hard. I'll give it another serious go tomorrow (if nothing else preempts me). --
Let's say when I hear prove-locking my instinct tells me to walk away as fast as I can. I'll try to explain why I prefer the trylock loop (which btw is the only one that will hide the lockdep false positives here and I welcome you to fix it in another way, well another way that comes to mind is to call __raw_spin_lock which I didn't do because I don't like those lowlevel details in common code). So about prove-locking: 1) in production is disabled so when I get bugreports I've to grab locking deadlock information as usual (sysrq+t/p or preferably lkcd/kdump) 2) while coding it's useless as well because I don't need this thing to debug and fix any deadlocks 3) this only finds bugs after the system is hung and I can fix it by other means then Despite having used this for a while, it never happened to me that this has been useful as single time, I only run into false positives and systems grinding to an halt so not allowing debug kernels to run which reduced even more the ability to debug the kernel. I'm not aware of anybody else who was debugging that fixed bugs thanks to this code. So in my experience this only has hurt me and I don't want it ever enabled in anything I deal with. If only this had a slight chance to find bugs that don't easily trigger at runtime it'd be an entirely different matter. To be clear: my criticism is only to prove-locking, for example I find very useful that there are other things like spinlock debug options that warns if you call GFP with GFP_KERNEL while you're in atomic context (which unfortunately only works with preempt enabled for no good reason because the atomic context is maintained by non-preempt too but anyway). The spinlock sleep debug effectively find bugs that would otherwise go unnoticed. Let's say I'm only interested in stuff that shows bugs that would otherwise go unnoticed, everything else I can debug it by myself without prove-locking printk in the logs. So this can only help if you aren't capable of ...
You're so wrong it not even funny. It reports about deadlocks before they happen. All it needs is to observe a lock order violation and it will report it. In order for the dead-lock to happen, you need to actually hit the violation concurrently with the normal order. IOW, lockdep can even spot deadlocks on a non-preempt single cpu setup where they can never actually happen. Furthermore, it does more than the simple lock deadlocks, it also takes IRQ state into account. So it can tell about hard or soft irq recursion deadlocks. Having lockdep on while developing saves a lot of trouble - in fact it _has_ caught many real bugs before they could be introduced to mainline, ask Arjan who has supervised driver development. Not only that, it caught plenty of real bugs in mainline as well as -rt. These days it appears to not catch many because the tree is in such good shape, but that is fully thanks to lockdep. That is not to say it's perferct - lockdep certainly does have it limitations. But your portrail is very in-accurate. --
Now tell me how it helps to report them... It tells me the system has crashed and where, it's not like I couldn't figure it out by myself but just noticing nothing works and all cpus are spinning in some The point is that this is a runtime evaluation of lock orders, if runtime isn't the lucky one that reproduces the deadlock, it'll find Ask the preempt-RT folks, advertising infinite-cpu-smp on UP. No need of lockdep to reproduce deadlocks that couldn't be found without it in UP. Infact lockdep only avoids to optimize away the spinlock, you don't need preempt-RT to reproduce at zero-runtime-cost in a equally easily debuggable way whatever the lockdep can reproduced on a UP compile. You've only to run a SMP kernel on UP, big deal that lockdep Ok this gets just a bit more interesting but my understanding is that again this only "reports" stuff that crashes hard and is trivially Are you saying those bugs weren't trivially debuggable anyway by kernel-hackers with a simple sysrq+p/t or kgdb stack trace? If yes, I can only imagine that nobody takes care of setting up debugging options that actually are _zero_ runtime cost and don't I've an hard time to believe you, the only thing I can agree is that if you don't want to run smp kernel on UP (or even better preempt-RT on UP) you can run lockdep to find recursion. smp-preempt-RT on UP is a real feature that is useful debugging, because it makes bugs visible that weren't, lockdep is useless as far as I can tell. --
Hi Andrea, Sometimes it's hard to actually trigger a deadlock condition during development, especially if you're not developing on a big iron machine. --
My understand is that lockdep can't do static analysis of code, it's not a checker. It can only evaluate the locks in the order the cpu takes them. And if the CPU never takes them in the wrong order because only your unlikely to happen slow path takes them in the wrong order while the other cpus hold them in the opposite order, lockdep can't say nothing at all, how could it? If it would say something when the cpu doesn't deadlock a moment later, then it would be a false positive 99% of the time. So now we learn lockdep also lead developers to think if their code doesn't generate lockdep errors it will be safe in production, so again more damage thanks to lockdep. Lockdep is a printk that reports that you were incredibly lucky, that you got the lock inversion during development even if there was only once chance in a thousand, and that you can celebrate. But I know I can figure it out by myself when the system has crashed without something eating all cpu and polluting kernel common code to tell me. In other words I imagine lockdep as something like this: char *ptr = NULL BUG_ON(!ptr); /* this is lockdep */ *ptr = 100; Luckily nobody dreams to add BUG_ON that checks for null pointer to dereference, as there's the mmu for that. The same way for the null pointer dereference, there are zerocost means to debug all sort of kernel crashing deadlocks. Now perhaps I misunderstood lockdep entirely but I doubt as this thing is driven by calls made by the spin_lock functions, and it can't know anything about the spin_lock that haven't been run by the cpu yet. --
On Mon, 4 Aug 2008 23:30:18 +0200
You did ;)
At runtime lockdep will "learn" the lock ordering rules, building a
graph in memory. If it ever sees the thus-learnt rules violated, it
will warn.
Simple example:
static DEFINE_MUTEX(a);
static DEFINE_MUTEX(b);
void f1(void)
{
mutex_lock(a);
mutex_lock(b);
}
void f2(void)
{
mutex_lock(b);
mutex_lock(a);
}
void doit(void)
{
f1();
f2();
}
lockdep will warn here about the ranking violation. As soon as it
occurs. Even though the system never deadlocked.
It's very very clever and powerful.
--
Yes I wasn't aware of the AB BA feature that doesn't require the
inversion to happen on both cpus at the same time, despite having
spent weeks when it was kernel crashing systems at boot (which shows
it's not immediately easy to understand how that thing works by
reading the code).
static DEFINE_MUTEX(a);
static DEFINE_MUTEX(b);
static DEFINE_MUTEX(c);
void f1(void)
{
mutex_lock(c);
mutex_lock(a);
mutex_lock(b);
mutex_unlock(c);
mutex_unlock(a);
mutex_unlock(b);
}
void f2(void)
{
mutex_lock(c);
mutex_lock(b);
mutex_lock(a);
mutex_unlock(c);
mutex_unlock(b);
mutex_unlock(a);
}
void f3(void)
{
mutex_lock(a);
mutex_lock(b);
mutex_unlock(a);
mutex_unlock(b);
}
void f4(void)
{
mutex_lock(b);
mutex_lock(a);
mutex_unlock(b);
mutex_unlock(a);
}
void doit(void)
{
f1();
f2();
stop_machine(f3);
stop_machine(f4);
I'm curious, does it also handle the above? I mean not a big deal to
refactor the code to shutdown lockdep warnings, but it doesn't look
that clever to me. It seems just clever the minimum enough to be
useful with its AB BA memory.
Now I see lockdep has a good point to exist, for the lock inversion
"memory" but this is by no mean a feature I can remotely like in
general given how inaccurate and prone for false positives it is. It
looks more a necessary evil to deal with, like I tried to do with my
patch (even before understanding it was more useful than what I
thought initially).
--
On Mon, 4 Aug 2008 23:30:18 +0200 yes lockdep will only complain WHEN you take them in the wrong order But you claimed you would for sure be in a deadlock at that point which this comment totally puzzles me... rather than calling you naive... where was this said or even implied???? -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I already said I didn't know about that despite having spent a fair amount of time trying to understand why lockdep crashes systems at boot about an year ago. I admit I didn't understand much about it and reducing its computation time didn't look feasible, perhaps my fault, It was just a logical conclusion of the statement that lockdep will warn of the following classes of locking bugs: "lock inversion scenarios"... There's no warning anywhere that it might not find them at all. Or point me to where it is warned you still have to read the code and verify it yourself, or you still risk to AB BA in production. If it was warned I wouldn't have mentioned it, people seem to talk like if lockdep is a checker doing static analyses of all paths when it can't. Partly of what I said before is true, even if I didn't understand the actual details of the AB BA memory it has by reading the code: the BA may happen only when system is OOM etc... so even lockdep memory may never find it. So my warning that code might AB BA deadlock even if lockdep doesn't warn sounds fair enough and without it I'm still afraid it can lead to developers think everything is ok with regard to AB BA. In any case (even if everyone already understand lockdep better than I did before this discussion), even if I'm wrong a sign of warning that lockdep isn't enough, can't hurt. --
On Tue, 5 Aug 2008 00:30:11 +0200 interesting; lockdep has been working for the last.. 2 1/2 years at least, and I don't remember seeing bugreports against it from you that would describe it as totally non-functional. Oh well.. seems you're rather preoccupied about it; that's ok, you're entitled to your opinion even if I don't agree with it ;-) -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
I reported it to Peter. If you see David's email, I guess it can be implied that I wasn't the only one aware that prove-locking made certain systems non functional, I thought it was widespread knowledge maybe not. It's amazing that things seem to have improved on that side, it surely So let me understand better: your opinion is that all of lockdep is useful, not just the AB BA detection? By reading the source again after 11 months to me it still looks check_deadlock() only has knowledge of the current context. It loops over the task struct checking all the locks of the current task! Combine the great feature that check_deadlock provides, with crashing at boot, and I hope that better explains my feeling about lockdep-prove-locking. This check_deadlock() thing is the real core of my dislike of prove-locking! The check_noncircular part I totally agree it's useful now that I see it works differently than check_deadlock (when I read it last time I thought it worked the same as check_deadlock). check_noncircular being useful doesn't automatically make check_deadlock useful. And incidentally it's exactly this check_deadlock part that is trapping on my code and that is now requiring silly changes to the common code (the ones I did) or to make the common code even more complex (what Peter is planning I guess). --
On Mon, 4 Aug 2008 23:09:54 +0200 I think you totally misunderstand things then. Lockdep will report a problem if it *ever* sees a BA order after it has seen a BA order. They don't have to happen at the same time. Or even within hours of eachother. They MIGHT happen... in a narrow time window, when you have a deadlock. But lockdep will warn you about the order violation without actually having to dealock... because the AB is likely to be done already most of the time when the BA happens. --
For me the false positive was generated by the recursive spinlock feature, not the deadlock inversion feature. If you limit my comments to the recursive spinlock feature perhaps you'll be more likely to So this is about the lock inversion feature, I admit I didn't realize it has memory that locks have been taken in AB order and spawns the first time locks have been taken in BA order. But this will lead to more false positives because there's nothing wrong to do AB BA if there's C lock taken before them! Perhaps enforcing lockdep all over the kernel is worth it just for this AB BA thing in case it doesn't only spawn false positives, but still I can't like something that is as inaccurate and prone for errors as lockdep and spreading all over the place to try to reduce the false positives it emits. I can't like that but that's just me... others clearly love it. --
From: Andrea Arcangeli <andrea@qumranet.com> It tells you about deadlock scenerios that you haven't even encoutered yet. It shows bugs for parallel code path sequences that have not even occurred yet. But I think you're beyond the point of being able to see why lockdep is valuable. So why don't we just move on. --
> The point is that this is a runtime evaluation of lock orders, if > runtime isn't the lucky one that reproduces the deadlock, it'll find > nothing at all. I think the point you miss is that lockdep can report a potential deadlock, even if the deadlock does not actually occur. For example suppose there is an AB-BA deadlock somewhere. For this to actually trigger, we have to have one CPU running the AB code path at exactly the moment another CPU runs the BA code path, with the right timing so one CPU holds A and tries to grab B while the other CPU already holds B. With lockdep, we just have to have the AB code path run once at any point, and then the BA code path run at any later time (even days after the AB code path has released all the locks). And then we get a warning dump that explains the exact potential deadlock. - R. --
Thanks a lot for the detailed explanation of check_noncircular. I agree check_noncircular is surely a good argument not to get rid of prove-locking as a whole. But check_noncircular is also a red-herring in this context. It's not check_noncircular trapping here, check_deadlock traps with false positives instead. The question is what are those false positives buying us? To avoid a developer to press sysrq+p or break on kgdb? Let's focus on check_deadlock->print_deadlock_bug and somebody who's not beyond the point please explain what print_deadlock_bug reports that does not actually occur and why it's a good idea to change the common code to accommodate for its false positives instead of getting rid of it for good. --
> Let's focus on check_deadlock->print_deadlock_bug and somebody who's
> not beyond the point please explain what print_deadlock_bug reports
> that does not actually occur and why it's a good idea to change the
> common code to accommodate for its false positives instead of getting
> rid of it for good.
check_deadlock operates on classes of locks, so it can warn about
potential deadlocks, eg if we have
foo(obj1, obj2)
{
lock(obj1);
lock(obj2);
...
then foo(obj, obj); is a deadlock but lockdep can warn about foo(obj,
different_obj) without triggering the deadlock in reality. Of course
this leads to false positives, and we sometimes have to change correct
code to help lockdep, but usually such rewriting leads to simpler
clearer better locking anyway.
- R.
--
It surely doesn't lead to simpler and clearer better locking in the case we're discussing here, and I don't know of other cases where it leads to better locking but feel free to make examples. --
Lockdep doesn't look at individual lock instances, but instead works on lock classes. The rq locks used to have a separate class each, but the third patch in this series changes that. The advantage of not working on lock instances is that the dependency graph becomes much smaller, and it will report problems much earlier, the down-side is that you will have to augment the information a little. Normally locks are grouped in classes based on their init site; eg. every instance initialized by a particular spin_lock_init() will belong to the same class. Besides this grouping, you can use things like spin_lock_nested(), which can be used to annotate simple recursion (up to 8 sub-classes). Then there is the possibility to explicitly create classes and assign locks to classes using lockdep_set_class(). Either way - there is a limit of MAX_LOCK_DEPTH (48) locks that can be tracked per task (including nesting interrupts). Making that dynamic is rather challenging - as it would involve He reduced the search complexity of the graph, look at the first patch in this series. --
From: Peter Zijlstra <a.p.zijlstra@chello.nl> Peter, your patches are just as effective at making all of my 64-cpu, 128-cpu, etc. machines usable with lockdep. --
OK, so the expected usage is:
spin_lock(&outer_lock);
/* take in any order */
spin_lock_nest_lock(&inner[0], &outer_lock);
spin_lock_nest_lock(&inner[2], &outer_lock);
spin_lock_nest_lock(&inner[1], &outer_lock);
...
?
And it's OK to
1. take inner locks one at a time without holding the outer lock
2. use plain spin_lock on inner locks when you're taking them one at
a time, and
3. release the outer lock before releasing the inner locks
but it's not OK to try to use different outer locks for a given inner lock.
Yes?
--
Yes (there it no requirement that the outer lock is a spinlock_t, just Only if you then release the inner locks in the reverse order you took them - the nested release code (releasing a lock that is not on the top of the stack) basically pops and pushes all the locks, the push will It doesn't validate this part - as with most lockdep annotations you can annotate away real deadlocks. --
OK. I don't actually need to do this, but I was asking for
completeness. But to clarify, you only need to do the reverse unlock if
you do it after unlocking the outer lock? If you're still holding the
Hm, OK.
J
--
Release order should always be totally irrelevant, whether you hold outer locks or not. Only the order of _getting_ locks matter. And yes, if there is an outer lock, even the order of getting locks is irrelevant, as long as anybody who gets more than one inner lock always holds the outer one. Linus --
I agree, its just non-trivial to convince lockdep of this. I can give it a go though. --
Right. But Peter said lockdep was fussy about it for some reason. If
the point of the exercise is to eliminate spurious warnings, it would be
nice to avoid new ones in the process...
J
--
But I need to disagree on a programming practice style. Unlocking locks in a non nested order is just bad programming practice. Unless there is a good reason to do so, one should never release locks in a non reverse order they were taken. This can be a source of bugs, where people might notice an outer lock being released and think the inner locks were too. Lately the kernel has been going through a lot of clean ups that have been making the kernel a much more maintainable beast. I feel we should enforce the rule of unlocking order (again, unless there is a good reason not to). Not for a technical reason, but just for a more maintainable one. -- Steve --
No it is not. Unlocking locks in non-nested order can sometimes be very much the rigth thing to do, and thinking otherwise is (a) naive and (b) can generate totally unnecessary and pointless bugs. The thing is, sometimes you have to do it, and imposing totally made-up rules ("unlocks have to nest") just confuses everybody. The FACT is, that unlocks do not have to nest cleanly. That's a rock solid *FACT*. The locking order matters, and the unlocking order does not. And if you cannot accept that as a fact, and you then say "unlock order should matter just to keep things nice and clean", then you end up being screwed and/or confused when you can't hold to the unlock order. There are many perfectly valid reasons not to unlock in reverse order. Don't create make-believe rules that break those reasons for no gain. For example: - let's say that you have a singly-linked list of objects. - you need to lock all objects, do something, and then unlock all objects. - the *only* sane way to do that is to just traverse the list twice. - that means that you will unlock the objects in the same order you locked them, _not_ in reverse ("nested") order. - if you force a rule of "unlocks must be nested", then YOU ARE A F*CKING MORON. It's that simple. Don't do made-up rules that have no technical reason for them. Lock ordering matters. Unlock ordering does not. It really is that simple. Don't confuse the issue by claiming anything else. Linus --
Unfortunately, you cut out my comment that I stated "unless there is a
good reason not to",
Yes, I totally agree that there are good reasons not to unlock in
reverse order, and the example
you gave happens to be one of them.
I just find that seeing something like:
lock(A);
lock(B);
[do something]
unlock(A);
unlock(B);
just seems to be sloppy.
I wont harp on this, it only came up in conversation in which someone
pointed out your
post.
-- Steve
--
Well, teh thing is, as long as you think nesting is good, you end up being Of course you'll often get nesting almost by mistake. For example, any normal locking that is hierarchical tends to nest naturally, especially if you end up using lots of small functions (which you should). Or simply due to error handling. So in that sense, nesting may be a "natural" thing, but partly exactly _because_ it is fairly natural, we should not try to make it even more than that, because then when things don't nest (and it does happen), if the "things should nest" camp gets too strong, bugs happen. They happen because somebody looks at the non-nesting case, and thinks it's sloppy, and tries to fix it. And in the process just complicates matters - and very likely introduces bugs. And THAT is why people shouldn't think that locks "should" nest. Because they shouldn't. They often do, but we're better off making people very aware of the fact that 'often' isn't 'good design', it's just a matter of 'it happens in practice' rather than 'it should be that way'. I know for a fact that some people thought unlocking in non-nested order was a bug. And I believe that belief is a dangerous one. Linus --
Ah, OK. You are fighting against nesting nazis, fair enough. I have written a bit of code where nesting was not possible (similar to your example, but I call those traversal locking not nesting). I just find that the locks should be nested when the nesting is natural. Breaking the nesting on natural nesting locks is a bug, IMHO. But as you know, there are several programmers out there that can not determine the difference between natural nesting locks and non nesting locks. By adding such a rule, those that can not tell the difference will be making a lot of needless noise, hence, it is best not to make any such rule. Lesson learned. I'll now go back to debugging my code. -- Steve --
An outer one might be more likely to be contended, so you might want to release it asap. Other times, you have lock A and lock B held (like scheduler rqs). You can say unlock(A); unlock(B); I don't really think it would make things more maintainable, FWIW. --
I actually did come across one bug in my lifetime where the out of nesting order of unlocks caused a bug. IIRC, it had to do (as Linus mentioned) with lots of little functions that required locking. One of these functions was between the out of order unlocking and was taking another inner lock. I don't remember the exact details, but it was something that made me try to nest locks and unlocks nicely when possible. And as Linus pointed out, there are several cases where it just doesn't make sense to nest. -- Steve --
Please fold (or tell me to provide a new patch)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff -u linux-2.6/include/linux/spinlock.h linux-2.6/include/linux/spinlock.h
--- linux-2.6/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -185,8 +185,8 @@
# define spin_lock_nested(lock, subclass) _spin_lock_nested(lock, subclass)
# define spin_lock_nest_lock(lock, nest_lock) \
do { \
- typecheck(struct lockdep_map *, &nest_lock->dep_map); \
- _spin_lock_nest_lock(lock, &nest_lock->dep_map); \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map);\
+ _spin_lock_nest_lock(lock, &(nest_lock)->dep_map); \
} while (0)
#else
# define spin_lock_nested(lock, subclass) _spin_lock(lock)
diff -u linux-2.6/kernel/spinlock.c linux-2.6/kernel/spinlock.c
--- linux-2.6/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -323,7 +323,7 @@
LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
}
-EXPORT_SYMBOL(_spin_lock_nest_lock)
+EXPORT_SYMBOL(_spin_lock_nest_lock);
#endif
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -22,6 +22,8 @@ int in_lock_functions(unsigned long addr
void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass)
__acquires(lock);
+void __lockfunc _spin_lock_nest_lock(spinlock_t *lock, struct lockdep_map *map)
+ __acquires(lock);
void __lockfunc _read_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _write_lock(rwlock_t *lock) __acquires(lock);
void __lockfunc _spin_lock_bh(spinlock_t *lock) __acquires(lock);
--
