This fixes a lockdep warning when invoking destroy_workqueue(), because the lockdep annotations are invoked under cpu_add_remove_lock. So, move the lockdep annotations before taking cpu_add_remove_lock in destroy_workqueue(), this will not affect the original purpose of adding them for destroy_workqueue() etc. However, it will affect another caller of cleanup_workqueue_thread(), that is, workqueue_cpu_callback(). This should be fine, because there are no other cases than cpu hotplug could call it. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> --- diff --git a/kernel/workqueue.c b/kernel/workqueue.c index dee4865..0f050e2 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1025,9 +1025,6 @@ static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq) if (cwq->thread == NULL) return; - lock_map_acquire(&cwq->wq->lockdep_map); - lock_map_release(&cwq->wq->lockdep_map); - flush_cpu_workqueue(cwq); /* * If the caller is CPU_POST_DEAD and cwq->worklist was not empty, @@ -1055,6 +1052,9 @@ void destroy_workqueue(struct workqueue_struct *wq) const struct cpumask *cpu_map = wq_cpu_map(wq); int cpu; + lock_map_acquire(&wq->lockdep_map); + lock_map_release(&wq->lockdep_map); + cpu_maps_update_begin(); spin_lock(&workqueue_lock); list_del(&wq->list); --
OK, but nobody should take cpu_maps_update_begin() under wq->lockdep_map, in particular work->func() must not. I must have missed something, but it seems to me this patch tries to supress the valid warning. Could you please clarify? Oleg. --
Because lockdep annotations are added to prevent other locks are taken
Sure, below is the whole warning. Please teach me how this is valid.
Thanks.
Mar 31 16:15:00 dhcp-66-70-5 kernel: bonding: bond0: released all slaves
Mar 31 16:15:00 dhcp-66-70-5 kernel:
Mar 31 16:15:00 dhcp-66-70-5 kernel: =======================================================
Mar 31 16:15:00 dhcp-66-70-5 kernel: [ INFO: possible circular locking dependency detected ]
Mar 31 16:15:00 dhcp-66-70-5 kernel: 2.6.34-rc3 #90
Mar 31 16:15:00 dhcp-66-70-5 kernel: -------------------------------------------------------
Mar 31 16:15:00 dhcp-66-70-5 kernel: modprobe/5264 is trying to acquire lock:
Mar 31 16:15:00 dhcp-66-70-5 kernel: ((bond_dev->name)){+.+...}, at: [<ffffffff8108524a>] cleanup_workqueue_thread+0x2b/0x10b
Mar 31 16:15:00 dhcp-66-70-5 kernel:
Mar 31 16:15:00 dhcp-66-70-5 kernel: but task is already holding lock:
Mar 31 16:15:00 dhcp-66-70-5 kernel: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810631d1>] cpu_maps_update_begin+0x1e/0x27
Mar 31 16:15:00 dhcp-66-70-5 kernel:
Mar 31 16:15:00 dhcp-66-70-5 kernel: which lock already depends on the new lock.
Mar 31 16:15:01 dhcp-66-70-5 kernel:
Mar 31 16:15:01 dhcp-66-70-5 kernel:
Mar 31 16:15:01 dhcp-66-70-5 kernel: the existing dependency chain (in reverse order) is:
Mar 31 16:15:01 dhcp-66-70-5 kernel:
Mar 31 16:15:01 dhcp-66-70-5 kernel: -> #3 (cpu_add_remove_lock){+.+.+.}:
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a6bc1>] validate_chain+0x1019/0x1540
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a7e75>] __lock_acquire+0xd8d/0xe55
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810aa3a4>] lock_acquire+0x160/0x1af
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff815523f8>] mutex_lock_nested+0x64/0x4e9
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810631d1>] cpu_maps_update_begin+0x1e/0x27
Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810853cd>] destroy_workqueue+0x41/0x107
Mar 31 16:15:02 ...Hello, guys. I still have some trouble interpreting lockdep warnings. Please This (cpu hotplug -> wq) is the expected sequence. Plug cpu This is bond_uninit() calling destroy_workqueue() but I don't get how These two are form a workqueue worker thread and I don't quite This seems to be from the original thread of frame#3. It's grabbing wq lock here but the problem is that the lock will be released immediately, so bond_dev->name (the wq) can't be held by the time it reaches frame#3. How is this dependency chain completed? Is it Isn't there a circular dependency here? bonding_exit() calls destroy_workqueue() under rtnl_mutex but destroy_workqueue() should flush works which could be trying to grab rtnl_lock. Or am I completely misunderstanding locking here? Thanks. -- tejun --
wq lock is held *after* cpu_add_remove_lock, lockdep also said this, Sure, that is why I sent another patch for bonding. :) After this patch, another lockdep warning appears, it is exactly what you expect. Thanks. --
Hello, Yeah yeah, I'm just failing to see how the other direction is completed. ie. where does the kernel try to grab cpu_add_remove_lock Hmmm... can you please try to see whether this circular locking warning involving wq->lockdep_map is reproducible w/ the bonding locking fixed? I still can't see where wq -> cpu_add_remove_lock dependency is created. Thanks. -- tejun --
I thought this is obvious.
Here it is:
void destroy_workqueue(struct workqueue_struct *wq)
{
const struct cpumask *cpu_map = wq_cpu_map(wq);
int cpu;
cpu_maps_update_begin(); <----------------- Hold cpu_add_remove_lock here
spin_lock(&workqueue_lock);
list_del(&wq->list);
spin_unlock(&workqueue_lock);
for_each_cpu(cpu, cpu_map)
cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); <------ See below
cpu_maps_update_done(); <----------------- Release cpu_add_remove_lock here
...
static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
{
/*
* Our caller is either destroy_workqueue() or CPU_POST_DEAD,
* cpu_add_remove_lock protects cwq->thread.
*/
if (cwq->thread == NULL)
return;
lock_map_acquire(&cwq->wq->lockdep_map); <-------------- Lockdep complains here.
lock_map_release(&cwq->wq->lockdep_map);
...
Am I missing something??
Thanks.
--
Hello, Yeap, the above is cpu_add_remove_lock -> wq->lockdep_map dependency. I can see that but I'm failing to see where the dependency the other direction is created. Thanks. -- tejun --
Hmm, it looks like I misunderstand lock_map_acquire()? From the changelog, I thought it was added to complain its caller is holding a lock when invoking it, thus cpu_add_remove_lock is not an exception. Thanks! --
Oh, I see, wq->lockdep_map is acquired again in run_workqueue(), so I was wrong. :) I think you and Oleg are right, the lockdep warning is not irrelevant. Sorry for the noise, ignore this patch please. Thanks. --
Oops, typo, I meant "is irrelevant." ;) --
Hello, Oh, that just tells the code is trying to grab a pseudo lock. It's not really a lock but to lockdep it looks like one and lockdep can use Yeah, I think the circular dependency you reported on wq->lockdep_map is completed only through dependency through rtnl_mutex. If you fix rtnl_mutex locking, it should go away too. Thanks. -- tejun --
Oh, I can never understand the output from lockdep, it is much more clever than me ;) OK, so work->func() takes rtnl_mutex. This means it is not safe to do flush_workqueue() or destroy_workqueue() However, rtnl_link_unregister() takes rtnl_mutex and then bond_uninit() does cleanup_workqueue_thread(). So, looks like this warning is valid, this path can deadlock if destroy_workqueue() is called when bond->mii_work is queued. Lockdep decided to blaim cpu_add_remove_lock in this chain. Oleg. --
Yes, this is what makes me confused. ;) Thanks! --
