Re: [Regression] Commit 2344abbcbdb82140050e8be29d3d55e4f6fe860b breaks resume on nx6325

From: Rafael J. Wysocki
Date: Saturday, September 20, 2008 - 4:24 pm

Hi,

Unfortunately resume from suspend to RAM is completely broken on my hp nx6325
because of

commit 2344abbcbdb82140050e8be29d3d55e4f6fe860b
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Sep 16 11:32:50 2008 -0700

    clockevents: make device shutdown robust

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reverting this commit makes things work again.

I know what this commit is for, but it obviously needs a replacement. :-(

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Saturday, September 20, 2008 - 4:32 pm

One more thing: "broken" means that the box doesn't resume (the suspend part
seems to work correctly). Instead it seems to enter a never-ending loop that
cannot be broken by any means except the power button, so I think the loop
runs with interrupts disabled.

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 12:03 pm

Update:

After some more debugging I verified that the $subject commit in fact breaks
CPU hotplugging (the 'online' part), so I should be able to get some more
information about what really happens.

This will still be difficult, because the box hangs solid (magic sysrq doesn't
work in this state) almost immediately after a (failing) attempt to 'online' the
previously 'offlined' CPU.
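
For anyone who wants to try the same offline/online cycle, here is a minimal
sketch in Python. It assumes the usual sysfs layout
(/sys/devices/system/cpu/cpuN/online) and must be run as root. The helper names
(set_cpu_online, is_cpu_online, cycle_cpu) are made up for illustration and are
not part of any kernel or library API.

```python
from pathlib import Path

# Assumed standard sysfs location of the per-CPU hotplug controls.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def online_path(cpu, root=SYSFS_CPU_ROOT):
    """Return the path of the 'online' control file for the given CPU."""
    return Path(root) / f"cpu{cpu}" / "online"


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write 1 (online) or 0 (offline) to the CPU's control file."""
    online_path(cpu, root).write_text("1\n" if online else "0\n")


def is_cpu_online(cpu, root=SYSFS_CPU_ROOT):
    """Read the control file back and report whether the CPU is online."""
    return online_path(cpu, root).read_text().strip() == "1"


def cycle_cpu(cpu, root=SYSFS_CPU_ROOT):
    """Offline the CPU, try to online it again, and report the final state."""
    set_cpu_online(cpu, False, root)
    set_cpu_online(cpu, True, root)
    return is_cpu_online(cpu, root)
```

On a healthy kernel, cycle_cpu(1) should return True. On a kernel showing the
problem reported here, expect the 'online' write to fail or the machine to
hang, so do not run this on a box with unsaved work.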

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, September 21, 2008 - 3:36 pm

Update:

It turns out that as long as X is not used (i.e. I don't switch to it from the
console), the machine doesn't hang after the failing attempt to 'online' CPU1
(previously 'offlined').  This observation allowed me to get some more data.

Below is the trace of the process trying to 'online' CPU1 (obtained with
'echo t > /proc/sysrq-trigger'):

[  385.548018] bash          D 0000000000000000     0  3842   3796
[  385.548018]  ffff880074cdfb78 0000000000000082 ffff880074cdfac8 ffff88007500e3c0
[  385.548018]  ffffffff8073c900 ffffffff8073c900 ffffffff8073c900 ffffffff8073c900
[  385.548018]  ffffffff8073c900 ffffffff8073c900 ffffffff80738c80 ffffffff8073c900
[  385.548018] Call Trace:
[  385.548018]  [<ffffffff80457078>] schedule_timeout+0x22/0xbf
[  385.548018]  [<ffffffff8022c678>] ? need_resched+0x1e/0x28
[  385.548018]  [<ffffffff80235150>] ? __cond_resched+0x34/0x3a
[  385.548018]  [<ffffffff80456edd>] wait_for_common+0xd5/0x13c
[  385.548018]  [<ffffffff802309e7>] ? default_wake_function+0x0/0xf
[  385.548018]  [<ffffffff80456fce>] wait_for_completion+0x18/0x1a
[  385.548018]  [<ffffffff80248d5f>] synchronize_rcu+0x32/0x3b
[  385.548018]  [<ffffffff80248de0>] ? wakeme_after_rcu+0x0/0x10
[  385.548018]  [<ffffffff8023262a>] partition_sched_domains+0xc3/0x1ee
[  385.548018]  [<ffffffff8026850f>] cpuset_track_online_cpus+0x26d/0x281
[  385.548018]  [<ffffffff80230a13>] ? wake_up_process+0x10/0x12
[  385.548018]  [<ffffffff8024e7e1>] notifier_call_chain+0x33/0x5b
[  385.548018]  [<ffffffff8024e848>] __raw_notifier_call_chain+0x9/0xb
[  385.548018]  [<ffffffff8024e859>] raw_notifier_call_chain+0xf/0x11
[  385.548018]  [<ffffffff804555f4>] _cpu_up+0xd9/0x114
[  385.548018]  [<ffffffff80455686>] cpu_up+0x57/0x67
[  385.548018]  [<ffffffff8044b8d0>] store_online+0x4d/0x75
[  385.548018]  [<ffffffff80399e53>] sysdev_store+0x1b/0x1d
[  385.548018]  [<ffffffff802ea1c2>] sysfs_write_file+0xdf/0x114
[  385.548018]  [<ffffffff802a3ae1>] vfs_write+0xa7/0xe1
[  385.548018]  ...
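
To read a trace like the one above more easily, the frames can be split into
those the unwinder confirmed and those marked with '?' (addresses found on the
stack that may be stale). The following is only a sketch, assuming the
"[<address>] symbol+0xoff/0xsize" line format shown in this trace. The names
parse_frames and reliable_chain are hypothetical.

```python
import re

# One frame: "[<addr>] [? ]symbol+0xoffset/0xsize", as printed in the trace above.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(text):
    """Return (symbol, reliable) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("sym"), match.group("unreliable") is None))
    return frames


def reliable_chain(text):
    """Keep only the frames that are not marked with '?'."""
    return [sym for sym, reliable in parse_frames(text) if reliable]


# A few lines copied from the trace above.
SAMPLE = """\
[  385.548018]  [<ffffffff80457078>] schedule_timeout+0x22/0xbf
[  385.548018]  [<ffffffff8022c678>] ? need_resched+0x1e/0x28
[  385.548018]  [<ffffffff80456edd>] wait_for_common+0xd5/0x13c
[  385.548018]  [<ffffffff80248d5f>] synchronize_rcu+0x32/0x3b
[  385.548018]  [<ffffffff804555f4>] _cpu_up+0xd9/0x114
"""

print(" <- ".join(reliable_chain(SAMPLE)))
```

Read this way, the trace shows the task sleeping in synchronize_rcu(), reached
from _cpu_up() through the CPU notifier chain.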

From: Ingo Molnar
Date: Monday, September 22, 2008 - 1:41 am

I think RCU is not broken; RCU just happens to be the first entity
affected by C1E-enter magically killing the (previously working) lapic
timer clockevents device. We are working on a fix.

	Ingo
--

From: Rafael J. Wysocki
Date: Monday, September 22, 2008 - 8:37 am

Thanks!

Rafael
--