Re: [stable] [PATCH] lockdep: fix mismatched lockdep_depth/curr_chain_hash

Previous thread: [PATCH] LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash by Gregory Haskins on Friday, October 5, 2007 - 12:22 am. (2 messages)

Next thread: [PATCH RFC 1/2] IRQ: Modularize the setup_irq code (1) by Ahmed S. Darwish on Friday, October 5, 2007 - 1:30 am. (2 messages)
To: <mingo@...>
Cc: <linux-kernel@...>, <linux-rt-users@...>, <ghaskins@...>
Date: Friday, October 5, 2007 - 12:03 am

Hi Ingo,
I am seeing a problem on the latest -rt where lockdep completely overwhelms
the system to the point that it grinds to a halt on large (8-way+) systems.
The problem seems to be that the class->locks_before and locks_after grow
unbounded (I have observed over 1M+ entries in them) so a lock_acquire call
can take over 10 seconds to finish resolving. Related to this seems to be
that lockdep appears to see a chain-hash miss over and over for what I would
assume should be an established graph (for instance, in
double_lock_balance() in an rt_overload condition). Turning off
PROVE_LOCKING (statically, or by setting debug_locks=0 dynamically restores
the system to normal behavior.

I took some time tonight to study lockdep (it is quite an impressive body of
code!), and came up with the following "fix". It does improve things
significantly by addressing what I believe is the issue with the
cache-misses (though it would appear there are still a few more issues
there that need addressing as some boots are still very lethargic). I use
the term "fix" loosely since I am not confident that I fully understand the
intention of your logic here so I can't say for sure if it was really
broken, or if I have made it worse ;)

Could you comment on what I have done here, or offer any advice on what to
look for elsewhere? I based the patch on pure linux-2.6.git since I see the
same issue (by visual inspection, that is) there as well.

Thanks in advance!
-Greg

------

LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash

It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

kernel/lockdep.c | 2 +-
1 files changed, 1 ins...

To: Gregory Haskins <ghaskins@...>, stable <stable@...>
Cc: <mingo@...>, linux-kernel <linux-kernel@...>
Date: Friday, October 5, 2007 - 5:31 am

Stable team,

please consider this patch for the next 22-stable.

---
Subject: lockdep: fix mismatched lockdep_depth/curr_chain_hash
From: Gregory Haskins <ghaskins@novell.com>

It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.

[ peterz: this might explain some of the lockdep is so _slow_ complaints. ]
[ mingo: this does not impact the correctness of validation, but may slow
down future operations significantly, if the chain gets very long. ]

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/lockdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c
+++ linux-2.6/kernel/lockdep.c
@@ -2166,7 +2166,6 @@ out_calc_hash:
}
#endif
chain_key = iterate_chain_key(chain_key, id);
- curr->curr_chain_key = chain_key;

/*
* Trylock needs to maintain the stack of held locks, but it
@@ -2215,6 +2214,7 @@ out_calc_hash:
if (unlikely(!debug_locks))
return 0;

+ curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
#ifdef CONFIG_DEBUG_LOCKDEP

-

To: Peter Zijlstra <peterz@...>
Cc: Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Monday, October 8, 2007 - 1:24 pm

I don't see this patch in Linus's upstream tree. We need it there to be
able to accept it for -stable. Or is this just a bugfix of other things
that are already in his tree?

thanks,

-

To: Greg KH <greg@...>
Cc: Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Monday, October 8, 2007 - 1:36 pm

I send Linus a similar patch, haven´t seem him pick it up yet.
I´ll notify you when and if he picks it up.

-

To: Peter Zijlstra <peterz@...>
Cc: Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Monday, October 8, 2007 - 1:39 pm

Great, that would be great for us -stable monkeys...

greg k-h
-

To: Greg KH <greg@...>
Cc: Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Thursday, October 25, 2007 - 1:48 pm

3aa416b07f0adf01c090baab26fb70c35ec17623

-

To: Peter Zijlstra <peterz@...>
Cc: Greg KH <greg@...>, Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Thursday, October 25, 2007 - 2:55 pm

And for both 2.6.22 and 2.6.23...

-

To: Chuck Ebbert <cebbert@...>
Cc: Peter Zijlstra <peterz@...>, Gregory Haskins <ghaskins@...>, stable <stable@...>, <mingo@...>, linux-kernel <linux-kernel@...>
Date: Wednesday, October 31, 2007 - 10:37 am

It does not apply to 2.6.22 at all, so unless someone sends us a
backported version, I'll not apply it there.

thanks,

greg k-h
-

Previous thread: [PATCH] LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash by Gregory Haskins on Friday, October 5, 2007 - 12:22 am. (2 messages)

Next thread: [PATCH RFC 1/2] IRQ: Modularize the setup_irq code (1) by Ahmed S. Darwish on Friday, October 5, 2007 - 1:30 am. (2 messages)