Re: Hung task - sync - 2.6.33-rc7 w/md6 multicore rebuild in process
From: Michael Breuer
Subject: Re: Hung task - sync - 2.6.33-rc7 w/md6 multicore rebuild in process
Date: Thursday, February 18, 2010 - 10:11 am
On 02/17/2010 09:39 PM, Jan Kara wrote:
>> On 2/13/2010 11:51 AM, Michael Breuer wrote:
>>
>>> Scenario:
>>>
>>> 1. raid6 (software - 6 1Tb sata drives) doing a resync (multi core
>>>    enabled)
>>> 2. rebuilding kernel (rc8)
>>> 3. system became sluggish - top & vmstat showed all 12Gb ram used -
>>>    albeit 10g of fs cache. It seemed as though reclaim of fs cache became
>>>    really slow once there were no more "free" pages.
>>>    vmstat <after hung task reported - don't have from before>
>>>    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>>     r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
>>>     0  1    808 112476 347592 9556952    0    0    39   388  158  189  1 18 77  4  0
>>> 4. Worrying a bit about the looming instability, I typed, "sync."
>>> 5. sync took a long time, and was reported by the kernel as a hung
>>>    task (repeatedly) - see below.
>>> 6. entering additional sync commands also hang (unsurprising, but
>>>    figured I'd try as non-root).
>>> 7. The running sync (pid 11975) cannot be killed.
>>> 8. echo 1 > drop_caches does clear the fs cache. System behaves better
>>>    after this (but sync is still hung).
>>>
>>> config attached.
>>>
>>> Running with sky2 dma patches (in rc8) and increased the audit name
>>> space to avoid the flood of name space maxed warnings.
>>>
>>> My current plan is to let the raid rebuild complete and then reboot
>>> (to rc8 if the bits made it to disk)... maybe with a backup of
>>> recently changed files to an external system.
>>>
>>> Feb 13 10:54:13 mail kernel: INFO: task sync:11975 blocked for more than 120 seconds.
>>> Feb 13 10:54:13 mail kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Feb 13 10:54:13 mail kernel: sync D 0000000000000002 0 11975 6433 0x00000000
>>> Feb 13 10:54:13 mail kernel: ffff8801c45f3da8 0000000000000082 ffff8800282f5948 ffff8800282f5920
>>> Feb 13 10:54:13 mail kernel: ffff88032f785d78 ffff88032f785d40 000000030c37a771 0000000000000282
>>> Feb 13 10:54:13 mail kernel: ffff8801c45f3fd8 000000000000f888 ffff88032ca00000 ffff8801c61c9750
>>> Feb 13 10:54:13 mail kernel: Call Trace:
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81154730>] ? bdi_sched_wait+0x0/0x20
>>> Feb 13 10:54:13 mail kernel: [<ffffffff8115473e>] bdi_sched_wait+0xe/0x20
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81537b4f>] __wait_on_bit+0x5f/0x90
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81154730>] ? bdi_sched_wait+0x0/0x20
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81537bf8>] out_of_line_wait_on_bit+0x78/0x90
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81078650>] ? wake_bit_function+0x0/0x50
>>> Feb 13 10:54:13 mail kernel: [<ffffffff8104ac55>] ? wake_up_process+0x15/0x20
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81155daf>] bdi_sync_writeback+0x6f/0x80
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81155de2>] sync_inodes_sb+0x22/0x100
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81159902>] __sync_filesystem+0x82/0x90
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81159a04>] sync_filesystems+0xf4/0x120
>>> Feb 13 10:54:13 mail kernel: [<ffffffff81159a91>] sys_sync+0x21/0x40
>>> Feb 13 10:54:13 mail kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>>>
>>> <this repeats every 120 seconds - all the same traceback>
>>>
>> Note: this cleared after about 90 minutes - sync eventually completed.
>> I'm thinking that with multicore enabled the resync is able to starve
>> out normal system activities that weren't starved w/o multicore.
>>
> Hmm, it is a bug in writeback code. But as Linus pointed out, it's not really
> clear why it's *so* slow. So when it happens again, could you please sample for
> a while (like every second for 30 seconds) stacks of blocked tasks via
> Alt-Sysrq-W? I'd like to see where flusher threads are hanging... Thanks.
>
> Honza
>
Ok - got it. Sync is still spinning, btw... attaching log extract as well as dmesg output.
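
For anyone wanting to reproduce the sampling requested above without sitting at the console, something along these lines might work. This is only a rough sketch, not what was actually run for the attached logs: it assumes root, a kernel with magic SysRq enabled, and that writing "w" to /proc/sysrq-trigger is an acceptable stand-in for pressing Alt-Sysrq-W. The function name and its parameters are made up for illustration.

```python
import time

# Standard procfs path for triggering SysRq actions from userspace.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sample_blocked_tasks(samples=30, interval=1.0,
                         trigger=SYSRQ_TRIGGER, sleep=time.sleep):
    """Request a blocked-task dump `samples` times, `interval` seconds apart.

    Each write of "w" asks the kernel to log the stacks of tasks in
    uninterruptible (D) state; the dumps land in the kernel log, not here.
    `trigger` and `sleep` are injectable (hypothetical parameters) so the
    loop can be exercised without touching the real trigger file.
    Returns the number of dumps requested.
    """
    requested = 0
    for i in range(samples):
        with open(trigger, "w") as f:
            f.write("w")
        requested += 1
        # No need to wait after the final sample.
        if i < samples - 1:
            sleep(interval)
    return requested
```

Calling sample_blocked_tasks() as root with the defaults should give roughly the "every second for 30 seconds" series; the resulting stacks can then be pulled from dmesg or the syslog.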