Hello,
I am experiencing weird system hangs. Once about 2-5 weeks system freezes=
=20
and stops accepting remote connections, so it is no longer possible to=20
connect to most important services: smtp (postfix), www (squid) or even=20
ssh. Such connection is accepted but then it hangs.What is strange, that previously established ssh session is usable. It is=
=20
possible to work on such system until you do something stupid like "less=20
/var/log/all.log". Using strace I found that process blocks on:--- strace: being ---
execve("/usr/bin/tail", ["tail", "-f", "/var/log/all.log"], [/* 33 vars */]=
) =3D 0
brk(0) =3D 0x8052000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
=3D 0x6ff00000
access("/etc/ld.so.preload", R_OK) =3D -1 ENOENT (No such file or dire=
ctory)
open("/etc/ld.so.cache", O_RDONLY) =3D 3
fstat64(3, {st_mode=3DS_IFREG|0644, st_size=3D20944, ...}) =3D 0
mmap2(NULL, 20944, PROT_READ, MAP_PRIVATE, 3, 0) =3D 0x6fefa000
close(3) =3D 0
open("/lib/libc.so.6", O_RDONLY) =3D 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0RY\1\0004\0\0\0"...=
, 512) =3D 512
fstat64(3, {st_mode=3DS_IFREG|0755, st_size=3D1175920, ...}) =3D 0
mmap2(NULL, 1185212, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
=3D 0x6fdd8000
mmap2(0x6fef4000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DE=
NYWRITE, 3, 0x11b) =3D 0x6fef4000
mmap2(0x6fef7000, 9660, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANO=
NYMOUS, -1, 0) =3D 0x6fef7000
close(3) =3D 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
=3D 0x6fdd7000
set_thread_area({entry_number:-1 -> 6, base_addr:0x6fdd76b0, limit:1048575,=
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_prese=
nt:0, useable:1}) =3D 0
mprotect(0x6fef4000, 4096, PROT_READ) =3D 0
mprotect(0x6ff1c000, 4096, PROT_READ) =3D 0
munmap(0x6fefa...
Is this a regression? If so, what's the most recent kernel that didn't show
the problem?The symptoms could be consistent with some place doing a
balance_dirty_pages while holding a lock that is required for IO, but I can't
see a smoking gun (you've got contention on i_mutex, but that should be
OK).Can you see if there is any memory under writeback that isn't being
completed (sysrq+M), also a list the locks held after the hang might be
helpful (compile in lockdep and sysrq+D)Is anything currently running? (sysrq+P and even a full sysrq+T task list
could be useful).Are any IO errors occurring at all?
Thanks,
Nick
-
It seems that 2.6.23.x still fails but somehow different. I updated my=20
bugreport at: http://bugzilla.kernel.org/show_bug.cgi?id=3D9182. There are=
=20
new attachments with traces and an oops that happened while I was taking=20
the debugging data.Thank you.
Best regards,
=09=09=09Krzysztof Ol=EAdzki
OK. It did it again :(, so I send additional debug information. Today I=20
discovered that "echo i > /proc/sysrq-trigger" makes my system being=20SysRq : Show Memory
Mem-info:
DMA per-cpu:
Normal per-cpu:
Active:281859 inactive:140551 dirty:46161 writeback:0 unstable:0
free:39081 slab:50571 mapped:3261 pagetables:356 bounce:0
DMA free:8704kB min:512kB low:640kB high:768kB active:48kB inactive:0kB pre=
sent:16256kB pages_scanned:12 all_unreclaimable? no
lowmem_reserve[]: 0 2015
Normal free:147620kB min:65016kB low:81268kB high:97524kB active:1127388kB =
inactive:562204kB present:2064260kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
DMA: 4*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*20=
48kB 2*4096kB =3D 8704kB
Normal: 15781*4kB 522*8kB 264*16kB 56*32kB 9*64kB 8*128kB 6*256kB 1*512kB 1=
*1024kB 2*2048kB 16*4096kB =3D 147620kB
Swap cache: add 5022, delete 4928, find 2281/2804, race 0+0
Free swap =3D 2927208kB
Total swap =3D 2927608kB
Free swap: 2927208kB
524224 pages of RAM
0 pages of HIGHMEM
6345 reserved pages
249411 pages shared
94 pages swap cached
46161 pages dirty
0 pages writeback
3261 pages mapped
50571 pages slabSysRq : Show Locks Held
Showing all locks held in the system:
1 lock held by syslogd/2197:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by freshclam/2898:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by crond/3061:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by snmpd/3546:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by squid/3606:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by zabbix_agentd/3728:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_a...
Mem-info:
DMA per-cpu:
Normal per-cpu:
Active:340169 inactive:117325 dirty:48579 writeback:0 unstable:0
free:25266 slab:28357 mapped:3109 pagetables:816 bounce:0
DMA free:8784kB min:512kB low:640kB high:768kB active:0kB inactive:72kB pre=
sent:16256kB pages_scanned:36 all_unreclaimable? no
lowmem_reserve[]: 0 2015
Normal free:92280kB min:65016kB low:81268kB high:97524kB active:1360676kB i=
nactive:469228kB present:2064260kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
DMA: 12*4kB 6*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2=
048kB 2*4096kB =3D 8784kB
Normal: 2724*4kB 1863*8kB 95*16kB 0*32kB 1*64kB 3*128kB 0*256kB 0*512kB 1*1=
024kB 1*2048kB 15*4096kB =3D 92280kB
Swap cache: add 6396, delete 6331, find 3048/3691, race 0+0
Free swap =3D 2927260kB
Total swap =3D 2927608kB
Free swap: 2927260kB
524224 pages of RAM
0 pages of HIGHMEM
6345 reserved pages
302073 pages shared
65 pages swap cached
48579 pages dirty
0 pages writeback
3109 pages mapped
28357 pages slab
816 pages pagetablesSysRq : Show Locks Held
Showing all locks held in the system:
1 lock held by syslogd/2163:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by freshclam/2864:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by named/2925:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by crond/3027:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by squid/3559:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by zabbix_agentd/3701:
#0: (&inode->i_mutex){--..}, at: [<7814aabc>] generic_file_aio_write+0x4=
3/0xb6
1 lock held by agetty/3753:
#0: (&tty->atomic_read_lock){--..}, at: [<7829cdde>] read_chan+0x1b1/0x5=
4b
1 lock held...
OK. I'll try to do it next time if there will be a chance. It may take=20
I'll have to check - maybe I have this captured. If not I'll check it next=
=20Didn't notice - so no.
Thank you.
Best regards,
=09=09=09Krzysztof Ol=EAdzki
This trace puzzles me, what is: unix_dgram_recvmsg doing there.
Also, it has two invocations of: ext3_file_writesingle write, no networking, also stuck in balance_dirty_pages().
-
No, no loopback:
# mount
/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=3Djournal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=3Djournal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosui=
d,nodev,noatime,data=3Dwriteback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosu=
id,nodev,noatime,data=3Dwriteback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,node=
v,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=3D0664,devgid=
=3D85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg=
,intr,tcp,addr=3D192.168.129.26)Exactly. Strange, isn't it?
Thanks.
Best regards,
=09=09=09=09Krzysztof Ol=EAdzki
| Alan Cox | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
| Adrian Bunk | Re: LSM conversion to static interface |
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
| Winkler, Tomas | RE: iwlwifi: fix build bug in "iwlwifi: fix LED stall" |
