Hi Ingo,
I am seeing a problem on the latest -rt where lockdep completely overwhelms
the system to the point that it grinds to a halt on large (8-way+) systems.
The problem seems to be that the class->locks_before and locks_after grow
unbounded (I have observed over 1M+ entries in them) so a lock_acquire call
can take over 10 seconds to finish resolving. Related to this seems to be
that lockdep appears to see a chain-hash miss over and over for what I would
assume should be an established graph (for instance, in
double_lock_balance() in an rt_overload condition). Turning off
PROVE_LOCKING (statically, or by setting debug_locks=0 dynamically restores
the system to normal behavior.I took some time tonight to study lockdep (it is quite an impressive body of
code!), and came up with the following "fix". It does improve things
significantly by addressing what I believe is the issue with the
cache-misses (though it would appear there are still a few more issues
there that need addressing as some boots are still very lethargic). I use
the term "fix" loosely since I am not confident that I fully understand the
intention of your logic here so I can't say for sure if it was really
broken, or if I have made it worse ;)Could you comment on what I have done here, or offer any advice on what to
look for elsewhere? I based the patch on pure linux-2.6.git since I see the
same issue (by visual inspection, that is) there as well.Thanks in advance!
-Greg------
LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash
It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---kernel/lockdep.c | 2 +-
1 files changed, 1 ins...
Stable team,
please consider this patch for the next 22-stable.
---
Subject: lockdep: fix mismatched lockdep_depth/curr_chain_hash
From: Gregory Haskins <ghaskins@novell.com>It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.[ peterz: this might explain some of the lockdep is so _slow_ complaints. ]
[ mingo: this does not impact the correctness of validation, but may slow
down future operations significantly, if the chain gets very long. ]Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/lockdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c
+++ linux-2.6/kernel/lockdep.c
@@ -2166,7 +2166,6 @@ out_calc_hash:
}
#endif
chain_key = iterate_chain_key(chain_key, id);
- curr->curr_chain_key = chain_key;/*
* Trylock needs to maintain the stack of held locks, but it
@@ -2215,6 +2214,7 @@ out_calc_hash:
if (unlikely(!debug_locks))
return 0;+ curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
#ifdef CONFIG_DEBUG_LOCKDEP-
I don't see this patch in Linus's upstream tree. We need it there to be
able to accept it for -stable. Or is this just a bugfix of other things
that are already in his tree?thanks,
-
I send Linus a similar patch, haven´t seem him pick it up yet.
I´ll notify you when and if he picks it up.-
Great, that would be great for us -stable monkeys...
greg k-h
-
3aa416b07f0adf01c090baab26fb70c35ec17623
-
And for both 2.6.22 and 2.6.23...
-
It does not apply to 2.6.22 at all, so unless someone sends us a
backported version, I'll not apply it there.thanks,
greg k-h
-
| Andrew Morton | -mm merge plans for 2.6.23 |
| Chuck Ebbert | Why do so many machines need "noapic"? |
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 023/196] MCP_UCB1200: Convert from class_device to device |
git: | |
| David Miller | Re: [BUG] New Kernel Bugs |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 31/37] dccp: Remove manual influence on NDP Count feature |
| Gregory Haskins | [RFC PATCH 00/17] virtual-bus |
