Re: 2.6.24-rc6-mm1

To: Jarek Poplawski <jarkao2@...>
Cc: Herbert Xu <herbert@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Neil Brown <neilb@...>, J. Bruce Fields <bfields@...>, <netdev@...>, Tom Tucker <tom@...>
Date: Friday, January 4, 2008 - 11:21 am

On Jan 4, 2008 2:30 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:

I'm open to any suggestions and will try to answer any questions.
The only thing that is sadly not practical is bisecting the broken-out
mm-patches, as triggering this error is too unreliable /
time-consuming.


Yes, without these fixes I can't boot.
But they should only be run while starting the arrays, so I doubt
that this is the cause.
(Also -rc3-mm2 did not need this fix)

My skbuff-double-free-detector is still in there, but was never triggered.


???
I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and
different warnings about list operations.

I think I have always posted everything I have seen before the
crashes. (captured via serial console)

(If you mean the lockdep problem in -rc6: that is more or less a
missing annotation during early bootup. The only problem with it is
that it causes lockdep to be turned off, so lockdep cannot be used
to find any real problem. A fix for that is in -mm, so I do have
lockdep on the mm kernels.)


Yes, but Herbert explicitly mentioned double-freeing an skb, so I
tried to catch this.
I do not know enough about the network core to verify the locking of
the involved lists.


Yes, I think I really need to redo the git-nfsd-test.
With IOMMU_DEBUG enabled, rc6-mm1 worked for 52 packages; only a second
run of kde-packages triggered it, after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered
it this time) or kdeartwork-styles.

Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did
not trigger):
[15593.236374] Unable to handle kernel NULL pointer
dereference<3>list_add corruption. prev->next should be next
(ffffffff8078a410), but was ffff81011ec01e68. (prev=ffff81011ec01e68).
[15593.236374]  at 0000000000000000 RIP:
[15593.236374]  [<0000000000000000>]
[15593.236374] PGD 79d22067 PUD 7acd7067 PMD 0
[15593.236374] Oops: 0010 [1] SMP
[15593.236374] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15593.236374] CPU 2
[15593.236374] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat sg hid pata_amd i2c_nforce2
[15593.236374] Pid: 510, comm: khpsbpkt Not tainted 2.6.24-rc6-mm1 #15
[15593.236374] RIP: 0010:[<0000000000000000>]  [<0000000000000000>]
[15593.236374] RSP: 0018:ffff81007eed3ee8  EFLAGS: 00010206
[15593.236374] RAX: ffff81007eed3ef0 RBX: ffff81011ec01e40 RCX: ffff81011ec01e40
[15593.236374] RDX: ffff81011ec01e68 RSI: ffff81011ec01e68 RDI: 0000000000000000
[15593.236374] RBP: ffff81007eed3f10 R08: 0000000000000000 R09: 0000000000000001
[15593.236374] R10: 0000000000000001 R11: 0000000000000058 R12: ffff81007eed3ef0
[15593.236374] R13: ffffffff80470e50 R14: 0000000000000000 R15: 0000000000000000
[15593.236374] FS:  00007f76e6c98700(0000) GS:ffff81011ff1f000(0000)
knlGS:00000000556f46c0
[15593.236374] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[15593.236374] CR2: 0000000000000000 CR3: 0000000079d29000 CR4: 00000000000006e0
[15593.236374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15593.236374] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15593.236374] Process khpsbpkt (pid: 510, threadinfo
ffff81007eed2000, task ffff81007eed0000)
[15593.236374] Stack:  ffffffff80470f0b ffff81011ec01e68
ffff81011ec014a8 00000000fffffffc
[15593.236374]  0000000000000000 ffff81007eed3f40 ffffffff8024d72d
00000000000001fb
[15593.236374]  ffff81007ff2bd98 00000000000001fb ffff81007ff2bcf0
ffff81007ff2df40
[15593.236374] Call Trace:
[15593.236374]  [<ffffffff80470f0b>] hpsbpkt_thread+0xbb/0x140
[15593.236374]  [<ffffffff8024d72d>] kthread+0x4d/0x80
[15593.236374]  [<ffffffff8020c4b8>] child_rip+0xa/0x12
[15593.236374]  [<ffffffff8020bbcf>] restore_args+0x0/0x30
[15593.236374]  [<ffffffff8024d6e0>] kthread+0x0/0x80
[15593.236374]  [<ffffffff8020c4ae>] child_rip+0x0/0x12
[15593.236374]
[15593.236374]
[15593.236374] Code:  Bad RIP value.
[15593.236374] RIP  [<0000000000000000>]
[15593.236374]  RSP <ffff81007eed3ee8>
[15593.236374] CR2: 0000000000000000
[15593.236377] ---[ end trace 11d2dc0fdbe1651f ]---
[15627.875963] ------------[ cut here ]------------
[15627.875963] kernel BUG at lib/list_debug.c:33!
[15627.875963] invalid opcode: 0000 [2] SMP
[15627.875963] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15627.875963] CPU 3
[15627.875963] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat sg hid pata_amd i2c_nforce2
[15627.875963] Pid: 6258, comm: nxssh Tainted: G      D 2.6.24-rc6-mm1 #15
[15627.875963] RIP: 0010:[<ffffffff803bd954>]  [<ffffffff803bd954>]
__list_add+0x54/0x60
[15627.875963] RSP: 0000:ffff81007ffb3c80  EFLAGS: 00010082
[15627.875963] RAX: 0000000000000079 RBX: 0000000000000082 RCX: 000000000000b9f1
[15627.875963] RDX: 0000000000001514 RSI: 0000000000000001 RDI: ffffffff807641c0
[15627.875963] RBP: ffff81007ffb3c80 R08: 0000000000000001 R09: 0000000000000010
[15627.875963] R10: 0000000000000000 R11: 0000000000000020 R12: ffff81011ec01e40
[15627.875963] R13: ffff81011ec01e68 R14: 0000000000000002 R15: ffff81007eee2000
[15627.875963] FS:  00007f3531da2700(0000) GS:ffff81011ff1f280(0000)
knlGS:00000000556f46c0
[15627.875963] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15627.875963] CR2: 00007ff643d49fe0 CR3: 0000000079c37000 CR4: 00000000000006e0
[15627.875963] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15627.875963] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15627.875963] Process nxssh (pid: 6258, threadinfo ffff810079d50000,
task ffff810079d4e000)
[15627.875963] Stack:  ffff81007ffb3ca0 ffffffff8046f9e8
ffff81011ec01e40 000000000000000c
[15627.875963]  ffff81007ffb3d40 ffffffff8047027c ffff81007ddd8000
ffff81007ddd8048
[15627.875963]  ffff81007ffb3ce0 ffffffff805d4366 ffff81007ddd8000
000000000000000c
[15627.875963] Call Trace:
[15627.875963]  <IRQ>  [<ffffffff8046f9e8>] queue_packet_complete+0x48/0x80
[15627.875963]  [<ffffffff8047027c>] hpsb_packet_received+0x51c/0x6d0
[15627.875963]  [<ffffffff805d4366>] _spin_unlock+0x26/0x30
[15627.875963]  [<ffffffff8047cc3d>] dma_rcv_tasklet+0x22d/0x430
[15627.875963]  [<ffffffff8021273e>] read_hpet+0xe/0x10
[15627.875963]  [<ffffffff805d48f2>] _spin_unlock_irqrestore+0x42/0x60
[15627.875963]  [<ffffffff8023d8b3>] tasklet_action+0x53/0xd0
[15627.875963]  [<ffffffff8023d754>] __do_softirq+0x84/0x110
[15627.875963]  [<ffffffff8020c82c>] call_softirq+0x1c/0x30
[15627.875963]  [<ffffffff8020eaa5>] do_softirq+0x65/0xc0
[15627.875963]  [<ffffffff8023d6c5>] irq_exit+0x95/0xa0
[15627.875963]  [<ffffffff8020ebbf>] do_IRQ+0x8f/0x100
[15627.875963]  [<ffffffff8020bb26>] ret_from_intr+0x0/0xf
[15627.875963]  <EOI>
[15627.875963]
[15627.875963] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[15627.875963] RIP  [<ffffffff803bd954>] __list_add+0x54/0x60
[15627.875963]  RSP <ffff81007ffb3c80>
[15627.875963] ---[ end trace 11d2dc0fdbe1651f ]---
[15627.875963] Kernel panic - not syncing: Aiee, killing interrupt handler!

first oops:
(gdb) list *0xffffffff80470f0b
0xffffffff80470f0b is in hpsbpkt_thread (drivers/ieee1394/ieee1394_core.c:1139).
1134                    INIT_LIST_HEAD(&tmp);
1135                    spin_lock_irq(&pending_packets_lock);
1136                    list_splice_init(&hpsbpkt_queue, &tmp);
1137                    spin_unlock_irq(&pending_packets_lock);
1138
1139                    list_for_each_entry_safe(packet, p, &tmp, queue) {
1140                            list_del_init(&packet->queue);
1141                            packet->complete_routine(packet->complete_data);
1142                    }
1143

second oops:
(gdb) list *0xffffffff8046f9e8
0xffffffff8046f9e8 is in queue_packet_complete
(drivers/ieee1394/ieee1394_core.c:1115).
1110                    return;
1111            }
1112            if (packet->complete_routine != NULL) {
1113                    spin_lock_irqsave(&pending_packets_lock, flags);
1114                    list_add_tail(&packet->queue, &hpsbpkt_queue);
1115                    spin_unlock_irqrestore(&pending_packets_lock, flags);
1116                    wake_up_process(khpsbpkt_thread);
1117            }
1118            return;
1119    }
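
For readers unfamiliar with the list debugging code: the "list_add
corruption" message and the BUG at lib/list_debug.c:33 in the second
oops come from consistency checks that are made before a new entry is
linked into a list. What follows is a minimal userspace sketch of that
kind of check, not the kernel source. The names checked_list_add,
checked_list_add_tail and stale_reinit_demo are hypothetical, and the
scenario (an entry re-initialised while the list head still points at
it) is only one assumed way to reach the reported state
prev->next == prev; it is not a diagnosis of this bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Link 'new' between 'prev' and 'next', but only if the two neighbours
 * still point at each other. Returns false where a debug build would
 * report "list_add corruption" and stop.
 */
static bool checked_list_add(struct list_head *new,
                             struct list_head *prev,
                             struct list_head *next)
{
    if (next->prev != prev)
        return false;   /* "next->prev should be prev" */
    if (prev->next != next)
        return false;   /* "prev->next should be next" (the case in the log) */

    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return true;
}

static bool checked_list_add_tail(struct list_head *new, struct list_head *head)
{
    return checked_list_add(new, head->prev, head);
}

/*
 * Assumed scenario: entry x is queued, then re-initialised without being
 * unlinked, so head.prev still points at x while x.next points at x.
 * Returns 1 if the next tail insertion is rejected, 0 if it is accepted,
 * -1 if the initial insertion unexpectedly fails.
 */
int stale_reinit_demo(void)
{
    struct list_head head, x, y;

    init_list_head(&head);
    init_list_head(&x);
    init_list_head(&y);

    if (!checked_list_add_tail(&x, &head))
        return -1;
    assert(head.prev == &x && x.next == &head);

    init_list_head(&x);     /* stale re-init: x.next == &x, head untouched */

    return checked_list_add_tail(&y, &head) ? 0 : 1;
}
```

Calling stale_reinit_demo() from a small main() shows the second
insertion being rejected, because the tail entry no longer points back
at the list head.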

Torsten
