On Feb 11, 2008 11:15 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
Last report:
http://marc.info/?l=linux-kernel&m=120129854023202
[snip]
2.6.24-rc3-mm2:
http://marc.info/?l=linux-kernel&m=119636996902805
-> crash in ether1394 and another one in sunrpc
http://marc.info/?l=linux-kernel&m=119671371413299
-> crash in tcp v4
http://marc.info/?l=linux-kernel&m=119888251227487
-> crash in sunrpc (first oops in mail)
2.6.24-rc6-mm1:
http://marc.info/?l=linux-kernel&m=119888251227487
-> crash net-core (skb_release_data) (second oops in mail)
http://marc.info/?l=linux-kernel&m=119898573229965
-> another one in net-core (skb_release_data)
http://marc.info/?l=linux-kernel&m=119910705115373
-> crash in sunrpc
http://marc.info/?l=linux-kernel&m=119921272203686
-> crash in tcp v4
http://marc.info/?l=linux-kernel&m=119933661810303
-> crash in sunrpc
http://marc.info/?l=linux-kernel&m=119946018207746
-> crash in firewire thread
http://marc.info/?l=linux-kernel&m=119958976409612
-> crash in sunrpc
2.6.24-rc8-mm1:
http://marc.info/?l=linux-kernel&m=119671371413299
-> unknown, but some list check triggered (kernel BUG at lib/list_debug.c:33!)
2.6.25-rc1:
http://marc.info/?l=linux-kernel&m=120276641105256&w=2
-> crash in ether1394
All looks network related, and my testcase did stress the network
somewhat because it was reading large files from a NFSv4 share.
But I agree with Stefan, that its not looking like a ether1394 bug.
The code did not change and looking at the code from these crashed in
my eyes it can't crash. The list is checked before calling into the
processing functions and the locking also looks right. It very much
looks like the list itself got corrupted somehow. And that the other
network system also suffer from a similar corruption.
But I have no clue where to look for the corruptor. I tried
slub_debug=FZP, but that also crashed (after several days of working)
without the slub debugging catching anything.
The corruptor also might be in the disk subsystem, as testcase also
stresses the disks.
My current best guess are the changes between rc2-mm1 and rc3-mm2 in
the git trees sched, scsi-misc or x86. Maybe git-xfs, but during all
the crashes my root xfs filesystem did lose some files because their
contents was still in writeback, but the directories or other
filesystem structures where not damaged once.
Torsten
--