Re: [2.6.33-rc5] Weird deadlock when shutting down

Previous thread: [PATCH] time: Cleanup direct xtime usage in xen by John Stultz on Thursday, January 28, 2010 - 9:24 pm. (1 message)

Next thread: [RFC 0/2 PATCH] remove member rt_se from struct rt_rq by Yong Zhang on Thursday, January 28, 2010 - 11:56 pm. (4 messages)
From: Benjamin Herrenschmidt
Date: Thursday, January 28, 2010 - 9:41 pm

Hi Ingo !

Johannes and I see this on our quad G5s... it -could- be similar to
one reported a short while ago by Xiaotian Feng <xtfeng@gmail.com>
under the subject [2.6.33-rc4] sysfs lockdep warnings on cpu hotplug.
 
Basically, the machine deadlocks right after printing the following
when doing a shutdown:

halt/4071 is trying to acquire lock:
 (s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0

but task is already holding lock:
 (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

<nothing else ... machine deadlocked here>

Any idea ?

Cheers,
Ben.


--

From: Johannes Berg
Date: Thursday, February 18, 2010 - 2:36 am

This is still happening with -rc8. Any news?

johannes

--

From: Linus Torvalds
Date: Thursday, February 18, 2010 - 9:31 am

You don't have a full backtrace for these things?

We've had lots of trouble with the cpu governors, and I suspect the 
problem isn't new, but the lockdep warning is likely new (see commit 
846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep annotations 
for the sysfs active reference").

So it is likely to be an old issue that (a) now gets warned about and (b) 
might have had timing changes enough to trigger it.

I suspect it is G5-specific (or specific to whatever CPU frequency code 
that gets used there), since I think we'd have had lots of reports if this 
happened on x86.

		Linus
--

From: Johannes Berg
Date: Thursday, February 18, 2010 - 11:45 am

Well, it used to not deadlock and actually shut down the machine :) So
in that sense it's definitely new. It might have printed a lockdep
warning before, which you wouldn't normally see since the machine turns

Yeah, that's puzzling me as well.

johannes
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Saturday, February 20, 2010 - 12:13 am

On Fri, Feb 19, 2010 at 2:45 AM, Johannes Berg

Does my following untested patch help?

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>

---------
From: Johannes Berg
Date: Saturday, February 20, 2010 - 1:56 am

Sorry, no. I'll hook up a screen to the box after I return from the
fresh market.

johannes
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Saturday, February 20, 2010 - 2:06 am

On Sat, Feb 20, 2010 at 4:56 PM, Johannes Berg

Are you sure there is no difference? :-/

Also, could you please also apply the 4 patches from Eric?

You can get them here:
http://lkml.org/lkml/2010/2/11/334

Thanks much for your testing!
--

From: Johannes Berg
Date: Saturday, February 20, 2010 - 3:53 am

No ... could be a different deadlock now :) Not sure how likely that is

Will do.

johannes
From: Johannes Berg
Date: Saturday, February 20, 2010 - 4:28 am

It deadlocks after

Disabling non-boot CPUs ...

Not sure if that counts as a difference...

johannes
From: Johannes Berg
Date: Saturday, February 20, 2010 - 5:07 am

I suspect the BUG: key not in data! thing I get now disables lockdep (it
seems to be mostly due to module loading btw) and then I don't get any
output here.

Seems it's all busted.

johannes
From: =?utf-8?Q?Am=C3=A9rico?= Wang
Date: Saturday, February 20, 2010 - 6:44 am

I am not sure neither...

That message is displayed before shutting down the devices.

To verify, you can add some printk() in the end of
__cpufreq_remove_dev(), or enable CONFIG_CPU_FREQ_DEBUG.

Thanks!
--

From: Johannes Berg
Date: Saturday, February 20, 2010 - 6:57 am

That is already enabled.

johannes

--

From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Sunday, February 21, 2010 - 2:51 am

On Sat, Feb 20, 2010 at 9:57 PM, Johannes Berg

Ok, I got it.

Could you test the patch below? Thanks!

----------------->

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
From: Johannes Berg
Date: Sunday, February 21, 2010 - 3:17 am

No change, sorry, still hangs right after "Disabling non-boot CPUs ..."
before the machine turns off.

johannes
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Monday, February 22, 2010 - 1:19 am

On Sun, Feb 21, 2010 at 6:17 PM, Johannes Berg

Oh, I see, then this will be another problem.

My previous patch is to fix the cpufreq lockdep warning mentioned
in Benjamin's report, so this hang should be caused by other problem,
not the cpufreq lockdep problem.

Thanks.
--

From: Johannes Berg
Date: Monday, February 22, 2010 - 1:39 am

Right, sounds like -- and I haven't seen that lockdep report during
shutdown any more.

johannes
From: Johannes Berg
Date: Sunday, February 21, 2010 - 3:43 am

Incidentally, the machine also freezes hard without any output at all if
I "echo 0 > /sys/.../cpu1/online".

johannes
From: Xiaotian Feng
Date: Sunday, February 21, 2010 - 3:55 am

On Sun, Feb 21, 2010 at 6:43 PM, Johannes Berg

It might be nothing related with cpufreq. I think there's something
wrong during the _cpu_down path.
put more debug printks into _cpu_down(), if we can find kernel is
--

From: Xiaotian Feng
Date: Sunday, February 21, 2010 - 4:12 am

and it looks like this breakage is only seen on powerMac G5, so it
might be arch specific. Maybe some commit in powermac breaks G5's
--

From: Johannes Berg
Date: Sunday, February 21, 2010 - 4:14 am

Hmm, not sure ... it seems to be in __stop_machine(), in this code:

        printk("got cpu\n");
        for_each_online_cpu(i) {
                sm_work = per_cpu_ptr(stop_machine_work, i);
                INIT_WORK(sm_work, stop_cpu);
                queue_work_on(i, stop_machine_wq, sm_work);
        }
        /* This will release the thread on our CPU. */
        put_cpu();
        printk("put cpu\n");


which is weird... the "got cpu" printk is the last thing I see.

johannes
From: Johannes Berg
Date: Sunday, February 21, 2010 - 4:22 am

As odd as that may be, it hangs in put_cpu() here.

johannes

--

From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Monday, February 22, 2010 - 1:34 am

On Sun, Feb 21, 2010 at 7:22 PM, Johannes Berg

Hmm, does adding synchronize_sched() in _cpu_down() help?

Something like this:

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 677f253..681f5c5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -228,6 +228,7 @@ static int __ref _cpu_down(unsigned int cpu, int
tasks_frozen)
        cpumask_copy(old_allowed, &current->cpus_allowed);
        set_cpus_allowed_ptr(current, cpu_active_mask);

+       synchronize_sched();
        err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
        if (err) {
                set_cpu_active(cpu, true);
--

From: Johannes Berg
Date: Monday, February 22, 2010 - 2:04 am

No luck.

johannes

--

From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Monday, February 22, 2010 - 2:12 am

On Mon, Feb 22, 2010 at 5:04 PM, Johannes Berg

Ok, thanks.

Since it hangs in put_cpu() which is just preempt_enable(), so I began
to suspect if we need a synchronize_sched(), or some barrier perhaps.
I am not sure at all.

Before other experts look at this, I think doing a bisect would be very
useful.

Again, thanks for your testing!
--

From: Johannes Berg
Date: Monday, February 22, 2010 - 2:14 am

I was afraid you'd say that, it'll take forever though since I need to
walk over to it after every shutdown, see if it turned off and then turn
it on again (and possibly off).... I guess I'll get started on that.

johannes

--

From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Monday, February 22, 2010 - 2:21 am

On Mon, Feb 22, 2010 at 5:14 PM, Johannes Berg

No problem.

Feel free to wait for the experts like Linus taking this. Leave as it is. ;)

Thanks.
--

From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Monday, February 22, 2010 - 1:22 am

Maybe not, there could be some generic code bug that will only be
exposed on a specific arch.

Thanks.
--

From: Dave Young
Date: Saturday, February 20, 2010 - 12:42 am

On Fri, Feb 19, 2010 at 2:45 AM, Johannes Berg

before shutdown, you can:
echo N > /proc/sys/kernel/printk_delay



-- 
Regards
dave
--

From: Johannes Berg
Date: Saturday, February 20, 2010 - 1:46 am

Well if I understand Américo correctly then it won't have printed
anything (even if it were to deadlock) before adding the lockdep
annotations to s_active, so I guess that theory is out.

johannes
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Friday, February 19, 2010 - 9:46 pm

On Fri, Feb 19, 2010 at 12:31 AM, Linus Torvalds

Right.

This is a real deadlock case found by lockdep added to s_active.

The problem is that we did kobject_put(&data->kobj) while holding policy_rwsem
which is used to protect 'data'. It is not so easy to fix this,
probably we need to
do more work on cpufreq code.

Thanks.
--

From: Johannes Berg
Date: Saturday, February 20, 2010 - 1:45 am

But it doesn't make sense that it's just an existing real deadlock that
is now found -- it never occurred previously!

Anyway, I'll try your patch, thanks for that.

johannes
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Saturday, February 20, 2010 - 2:30 am

On Thu, Feb 18, 2010 at 5:36 PM, Johannes Berg

Hey, johannes

Not sure if you made some mistake here, the one you report here [1]
is _not_ the same with this one reported by Benjamin.

Please make sure what you are talking about here is the same one.

Thanks.

1. http://lkml.org/lkml/2010/2/18/33
--

From: Johannes Berg
Date: Saturday, February 20, 2010 - 3:52 am

I'm talking about the problem Ben reported -- that one is completely
different. Was your patch supposed to address _that_ one?

johannes

From: David Howells
Date: Thursday, March 25, 2010 - 4:28 am

I see this now, with full backtrace:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.34-rc2-cachefs #115
-------------------------------------------------------
halt/2291 is trying to acquire lock:
 (s_active#31){++++.+}, at: [<ffffffff81104950>] sysfs_addrm_finish+0x31/0x5a

but task is already holding lock:
 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<ffffffff812a3a92>] lock_policy_rwsem_write+0x4a/0x7b

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
       [<ffffffff81053da2>] __lock_acquire+0x1343/0x16cd
       [<ffffffff81054183>] lock_acquire+0x57/0x6d
       [<ffffffff813637b6>] down_write+0x3f/0x62
       [<ffffffff812a3a92>] lock_policy_rwsem_write+0x4a/0x7b
       [<ffffffff812a3b36>] store+0x39/0x79
       [<ffffffff81103589>] sysfs_write_file+0x103/0x13f
       [<ffffffff810afddc>] vfs_write+0xad/0x172
       [<ffffffff810aff5a>] sys_write+0x45/0x6c
       [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

-> #0 (s_active#31){++++.+}:
       [<ffffffff81053a59>] __lock_acquire+0xffa/0x16cd
       [<ffffffff81054183>] lock_acquire+0x57/0x6d
       [<ffffffff81104086>] sysfs_deactivate+0x8c/0xc9
       [<ffffffff81104950>] sysfs_addrm_finish+0x31/0x5a
       [<ffffffff81104a33>] sysfs_remove_dir+0x75/0x88
       [<ffffffff811bafae>] kobject_del+0x16/0x37
       [<ffffffff811bb00d>] kobject_release+0x3e/0x66
       [<ffffffff811bbd71>] kref_put+0x43/0x4d
       [<ffffffff811baf29>] kobject_put+0x47/0x4b
       [<ffffffff812a39b2>] __cpufreq_remove_dev+0x1da/0x236
       [<ffffffff8136178e>] cpufreq_cpu_callback+0x62/0x7a
       [<ffffffff81048362>] notifier_call_chain+0x32/0x5e
       [<ffffffff810483ed>] __raw_notifier_call_chain+0x9/0xb
       [<ffffffff81351736>] _cpu_down+0x90/0x29e
       [<ffffffff810311d3>] disable_nonboot_cpus+0x6f/0x105
       [<ffffffff8103f845>] ...
From: =?UTF-8?Q?Am=C3=A9rico_Wang?=
Date: Thursday, March 25, 2010 - 7:43 pm

I already send a patch for this:

https://patchwork.kernel.org/patch/83558/

It is still in -mm tree, I think.

--

From: David Howells
Date: Friday, March 26, 2010 - 5:22 am

Tested-by: David Howells <dhowells@redhat.com>
--

Previous thread: [PATCH] time: Cleanup direct xtime usage in xen by John Stultz on Thursday, January 28, 2010 - 9:24 pm. (1 message)

Next thread: [RFC 0/2 PATCH] remove member rt_se from struct rt_rq by Yong Zhang on Thursday, January 28, 2010 - 11:56 pm. (4 messages)