Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds

Previous thread: Re: kernel BUG at lib/radix-tree.c:473! by zhang wenjie on Saturday, August 16, 2008 - 8:37 pm. (1 message)

Next thread: patching kdb to Centos kernel : error by Satish Eerpini on Saturday, August 16, 2008 - 11:00 pm. (3 messages)
From: Greg Donald
Date: Saturday, August 16, 2008 - 9:36 pm

I got this while rsync'ing an NFS share onto a local disk:

[42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
[42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42374.229433] reiserfs/0    D c1f36180     0  1322      2
[42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0 1c823428 00002669 f5e932c0
[42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780 f5e932c0 00000002 00000001
[42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246 f5dbded4 c013cbce e31e12ec
[42374.356837] Call Trace:
[42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
[42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
[42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
[42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
[42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
[42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
[42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
[42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
[42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
[42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
[42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
[42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
[42374.833713]  [<c0131c93>] kthread+0x40/0x69
[42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
[42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
[42374.929777]  =======================
[42374.957001] 3 locks held by reiserfs/0/1322:
[42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
[42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at: [<c012ebfe>] run_workqueue+0xa7/0x18e
[42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>] flush_commit_list+0x119/0x505


I deleted a few GBs of data and ran it again but was unable to
reproduce it.  This was on 2.6.27-rc3.

I don't see any corruption.  Fluke?


-- 
Greg Donald
--

From: Andrew Morton
Date: Tuesday, August 19, 2008 - 11:52 pm

Seems that about 100% of the reports we get of this warning triggering
are sys_sync, transaction commit, etc.

Does kerneloops.org disagree with me?

If not, I vote we kill it.
--

From: Ingo Molnar
Date: Wednesday, August 20, 2008 - 2:19 am

ok. How about quadrupling the timeout, as per the patch below?

More than 8 minutes of uninterruptible wait: is that a reasonable limit?

I had this warning trigger a couple of times during development, 
alerting me to hung tasks.

	Ingo

------------------>
From 3fb4198766c38aa03492cc3996475076073c22ea Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 20 Aug 2008 11:17:40 +0200
Subject: [PATCH] softlockup: increase hung tasks check from 2 minutes to 8 minutes


increase the timeout. If it still triggers for people, we can kill it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/softlockup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index b75b492..17a0580 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = 480;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;
 
--
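Ingo's question is how long a task may sit in uninterruptible ("D") sleep before it is reported; quadrupling the default gives 4 × 120 s = 480 s = 8 minutes. As a rough illustration only, the following is a simplified user-space model of such a check, assuming the detector flags a D-state task whose context-switch count has not changed for a full timeout period. The struct, field, and function names (`model_task`, `check_task`, `first_warning`) are hypothetical and this is not the actual kernel/softlockup.c code; the "zero means no checking" rule and the budget of 10 warnings are taken from the sysctls visible in the patch above.

```c
#include <stdbool.h>

/*
 * Simplified, hypothetical user-space model of the hung-task check.
 * Not the kernel implementation; names are illustrative only.
 */
struct model_task {
	bool uninterruptible;            /* task is in 'D' state */
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* count seen at the previous scan */
	unsigned long last_seen;         /* seconds; when the count last moved */
};

/* 0 means "infinite timeout", i.e. no checking, as in the patch comment. */
static unsigned long hung_task_timeout_secs = 480;
/* Global budget: stop printing after this many warnings. */
static unsigned long hung_task_warnings = 10;

/* Returns true if a "blocked for more than N seconds" warning would fire. */
static bool check_task(struct model_task *t, unsigned long now)
{
	if (hung_task_timeout_secs == 0 || !t->uninterruptible)
		return false;

	/* Scheduled since the last scan, so it is making progress. */
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		t->last_seen = now;
		return false;
	}

	/* Not stuck for a full timeout period yet. */
	if (now - t->last_seen < hung_task_timeout_secs)
		return false;

	/* Warning budget exhausted: stay silent from now on. */
	if (hung_task_warnings == 0)
		return false;
	hung_task_warnings--;

	/* Reset the clock so a still-stuck task is reported once per timeout. */
	t->last_seen = now;
	return true;
}

/*
 * Scans a task stuck in D state since t=0 every `period` seconds up to
 * `horizon` seconds; returns the time of the first warning, or -1 if none.
 */
static long first_warning(unsigned long timeout, unsigned long period,
			  unsigned long horizon)
{
	struct model_task t = { .uninterruptible = true };

	hung_task_timeout_secs = timeout;
	hung_task_warnings = 10;
	for (unsigned long now = 0; now <= horizon; now += period)
		if (check_task(&t, now))
			return (long)now;
	return -1;
}
```

In this model, with a 60-second scan period, `first_warning(120, 60, 1000)` returns 120 and `first_warning(480, 60, 1000)` returns 480, so the patch delays the first report of a stuck task by six minutes, while a timeout of 0 never reports at all. The shared warning budget is what caps how many backtraces a long outage, such as the NFS case discussed below, can print.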

From: Andi Kleen
Date: Wednesday, August 20, 2008 - 3:00 am

There should be a way to disable them for NFS and other network
file systems at least. Having network issues is not that uncommon
and flooding the log with backtraces every time they happen
when a network fs is mounted is not very useful.

-Andi

--

From: Andi Kleen
Date: Wednesday, August 20, 2008 - 2:59 am

And NFS -- I just had the kernel log on one of my nfsroot test systems
flooded with them when the Ethernet cable was disconnected for some
time and NFS blocked. It scared me at first, but after analysis it
didn't seem very useful. I imagine it would scare normal users far
more.

-Andi
--
