2.6.36-rc1 hangs during XFS barrier test for /

Previous thread: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs by Don Zickus on Friday, August 20, 2010 - 8:05 am. (48 messages)

Next thread: cleancache followup from LSF10/MM summit by Dan Magenheimer on Friday, August 20, 2010 - 8:14 am. (2 messages)
From: Torsten Kaiser
Date: Friday, August 20, 2010 - 8:08 am

Hello,

after installing 2.6.36-rc1 my system gets stuck during "Mounting root..."

I'm using an initramfs to mount the root fs, because I'm using a
stacked setup with md (raid1) -> dm-crypt -> xfs.

Strange side effect: sometimes the cursor stops blinking for a few
seconds, but then resumes blinking. Each of these stalls is
accompanied by an RCU stall message.

From the serial console:
[    8.039603] Freeing unused kernel memory: 564k freed
[    8.049070] Write protecting the kernel read-only data: 10240k
[    8.059173] Freeing unused kernel memory: 604k freed
[    8.068930] Freeing unused kernel memory: 1732k freed
[   40.364439] SysRq : Changing Loglevel
[   40.371605] Loglevel set to 6
[   56.760017] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=4004 jiffies)
[   86.780016] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=7006 jiffies)
[  116.800018] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=10008 jiffies)
[  146.820018] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=13010 jiffies)
[  159.135015] SysRq : Show Blocked State
[  159.142014]  ffff88007f7449f0 0000000000000046 ffff8800071abd10 ffff880000000000
[  159.145007]  ffff88007ff4f770 0000000000012740 ffff8800071abfd8 0000000000012740
[  159.145007]  ffff8800071abfd8 ffff88007f744c50 ffff8800071abfd8 ffff88007f744c48
[  159.145007] Call Trace:
[  159.145007]  [<ffffffff8143ef40>] ? dm_wq_work+0x0/0x1a0
[  159.145007]  [<ffffffff8155e7fd>] ? io_schedule+0x3d/0x60
[  159.145007]  [<ffffffff8143e13a>] ? dm_wait_for_completion+0xba/0x150
[  159.145007]  [<ffffffff81035870>] ? default_wake_function+0x0/0x20
[  159.145007]  [<ffffffff8143ef40>] ? dm_wq_work+0x0/0x1a0
[  159.145007]  [<ffffffff8143ef40>] ? dm_wq_work+0x0/0x1a0
[  159.230029]  [<ffffffff8143ef82>] ? dm_wq_work+0x42/0x1a0
[  159.230029]  [<ffffffff8104d21b>] ? process_one_work+0xfb/0x370
[  159.230029]  [<ffffffff8104ed7c>] ? ...
From: Paul E. McKenney
Date: Friday, August 20, 2010 - 12:32 pm

This indicates that you have a "longer than average loop", probably
with interrupts disabled across the loop.  Documentation/RCU/stallwarn.txt
has more information on this condition.
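
For anyone reading along, the stall lines from the serial console log can be
summarized with a small script. This is only a sketch: the regular expression
and the helper names (parse_stalls, estimate_hz) are made up for illustration,
and it assumes each wrapped stall message has been re-joined onto a single
line and that only CPU numbers appear between the braces.

```python
import re

# Stall lines copied from the serial console log, each re-joined onto one line.
LOG = """\
[   56.760017] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=4004 jiffies)
[   86.780016] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=7006 jiffies)
[  116.800018] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=10008 jiffies)
[  146.820018] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 2} (detected by 0, t=13010 jiffies)
"""

# Hypothetical pattern for this particular message layout only.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: \S+ detected stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\}\s+\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies\)"
)


def parse_stalls(text):
    """Return one dict per stall message found in text."""
    stalls = []
    for m in STALL_RE.finditer(text):
        stalls.append({
            "timestamp": float(m.group("ts")),                      # seconds since boot
            "cpus": [int(c) for c in m.group("cpus").split()],      # stalled CPUs
            "detected_by": int(m.group("by")),                      # reporting CPU
            "jiffies": int(m.group("t")),                           # t= value from the message
        })
    return stalls


def estimate_hz(stalls):
    """Estimate the tick rate from the first and last stall reports."""
    first, last = stalls[0], stalls[-1]
    return (last["jiffies"] - first["jiffies"]) / (last["timestamp"] - first["timestamp"])


stalls = parse_stalls(LOG)
stalled_cpus = sorted({c for s in stalls for c in s["cpus"]})
hz = estimate_hz(stalls)
print("stalled CPUs:", stalled_cpus)
print("reporting CPUs:", sorted({s["detected_by"] for s in stalls}))
print("estimated HZ:", round(hz))
```

Run against the four messages above, this shows that every report names the
same stalled CPU and the same reporting CPU, and that the t= values advance in
step with the log timestamps, which is consistent with one CPU being stuck for
the whole period rather than several short stalls.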

--

From: Torsten Kaiser
Date: Sunday, August 22, 2010 - 9:39 am

On Fri, Aug 20, 2010 at 9:32 PM, Paul E. McKenney

My initramfs seems to have eaten the real OOPS. It was the (already
reported) "kernel BUG at drivers/scsi/scsi_lib.c:1113".


The stalled CPU was #2, the same CPU that hit the scsi_lib BUG.
So this very much looks like fallout from that.

Thanks for your tip, but I think I will only bother you again if I'm
still seeing this after that OOPS has been fixed.

Thanks,

Torsten
--