Re: 2.6.24-rc6-mm1

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: J. Bruce Fields <bfields@...>
Cc: Herbert Xu <herbert@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Neil Brown <neilb@...>, <netdev@...>, Tom Tucker <tom@...>
Date: Thursday, January 3, 2008 - 1:02 am

On Jan 2, 2008 10:57 PM, J. Bruce Fields <bfields@fieldses.org> wrote:

The problem with that is, that triggering the bug is not easy so
marking anything 'good' is questionable.
This time I needed to compile over 50 packages until it triggered.

I was using 2.6.24-rc6-mm1 again, but with a crude hack (see end of
mail) that I hope should catch any double-frees of skbs.
None of my warnings triggered, only a list corruption again in
svc_xprt_enqueue(), but this time with an additional output about
whats wrong with the list:
[17023.029519] list_add corruption. prev->next should be next
(ffff8100d20ec1c8), but was ffff81009c5a6
c28. (prev=ffff81009c5a6c28).
[17023.029537] ------------[ cut here ]------------
[17023.031445] kernel BUG at lib/list_debug.c:33!
[17023.033280] invalid opcode: 0000 [1] SMP
[17023.034967] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[17023.038209] CPU 3
[17023.039047] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tu
ner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common
compat_ioctl32 videobuf_dma_sg videobuf_core b
tcx_risc usbhid tveeprom videodev hid v4l2_common v4l1_compat sg
pata_amd i2c_nforce2
[17023.039519] Pid: 20564, comm: nfsv4-svc Not tainted 2.6.24-rc6-mm1 #14
[17023.039519] RIP: 0010:[<ffffffff803bd834>]  [<ffffffff803bd834>]
__list_add+0x54/0x60
[17023.039519] RSP: 0018:ffff8101002c9dc0  EFLAGS: 00010282
[17023.039519] RAX: 0000000000000088 RBX: ffff810110125c00 RCX: 0000000000000000
[17023.039519] RDX: ffff81010067c000 RSI: 0000000000000001 RDI: ffffffff80764140
[17023.039519] RBP: ffff8101002c9dc0 R08: 0000000000000001 R09: 0000000000000000
[17023.039519] R10: ffff81000100a088 R11: 0000000000000001 R12: ffff8100d20ec180
[17023.039519] R13: ffff8100d20ec1b8 R14: ffff8100d20ec1b8 R15: ffff8101188e4600
[17023.039519] FS:  00007ff7a870c6f0(0000) GS:ffff81011ff0cd00(0000)
knlGS:0000000000000000
[17023.039519] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17023.039519] CR2: 00000000024df510 CR3: 0000000032539000 CR4: 00000000000006e0
[17023.039519] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17023.039519] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17023.039519] Process nfsv4-svc (pid: 20564, threadinfo
ffff8101002c8000, task ffff81010067c000)
[17023.039519] Stack:  ffff8101002c9e00 ffffffff805c18ab
ffff8100d20ec188 ffff8101188e4600
[17023.039519]  ffff81009c5a6c00 ffff81010d118000 ffff810110125c00
ffff8101188e4610
[17023.039519]  ffff8101002c9e10 ffffffff805c1997 ffff8101002c9ee0
ffffffff805c2ac4
[17023.039519] Call Trace:
[17023.039519]  [<ffffffff805c18ab>] svc_xprt_enqueue+0x1ab/0x240
[17023.039519]  [<ffffffff805c1997>] svc_xprt_received+0x17/0x20
[17023.039519]  [<ffffffff805c2ac4>] svc_recv+0x394/0x7c0
[17023.039519]  [<ffffffff805c1ece>] svc_send+0xae/0xd0
[17023.039519]  [<ffffffff802305f0>] default_wake_function+0x0/0x10
[17023.039519]  [<ffffffff803178b9>] nfs_callback_svc+0x79/0x130
[17023.039519]  [<ffffffff80232a67>] finish_task_switch+0x67/0xe0
[17023.039519]  [<ffffffff8020c4b8>] child_rip+0xa/0x12
[17023.039519]  [<ffffffff8020bbcf>] restore_args+0x0/0x30
[17023.039519]  [<ffffffff805b69cd>] __svc_create_thread+0xdd/0x200
[17023.039519]  [<ffffffff80317840>] nfs_callback_svc+0x0/0x130
[17023.039519]  [<ffffffff8020c4ae>] child_rip+0x0/0x12
[17023.039519]
[17023.039519]
[17023.039519] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16 48 89 e5 e8
[17023.039519] RIP  [<ffffffff803bd834>] __list_add+0x54/0x60
[17023.039519]  RSP <ffff8101002c9dc0>
[17023.039524] ---[ end trace a9257b24a4b10968 ]---
[17023.041451] Kernel panic - not syncing: Aiee, killing interrupt handler!

I also wonder if this really is only one bug, or two. (One in skb
handling and one in svc_xprt_enqueue's list code)

Summary of what I think is related to this:

* 2.6.24-rc2-mm1 might not have it, but had a bug in the nfsv4 sillyrenames.
* 2.6.24-rc3-mm1 did not work for me at all (IO-APIC problem)
* 2.6.24-rc3-mm2 has the bug
  - I have seen a crash in ether1394_complete_cb
  - I have seen 3 crashes in svc_xprt_enqueue, two with slub_debug=FZP
  - I have seen a crash in tcp_send_ack -> __alloc_skb
  - patched with svc_xprt_received(&svsk->sk_xprt); removed from
svc_create_socket()
    -> still crashed in svc_xprt_enqueue
  - patched "skb_release_all(dst);" back to "skb_release_data(dst);"
in skb_morph() (net/core/skbuff.c).
    -> seemed to work (200+ packages compiled+installed)
* 2.6.24-rc4-mm1 and -rc5-mm1: not tried, was still testing -rc3-mm2
* 2.6.24-rc6 did not crash, but I did not test it for too long
* 2.6.24-rc6-mm1 has the bug
  - I have seen a crash in skb_release_data with vanilla -mm1
  - patched "skb_release_all(dst);" back to "skb_release_data(dst);" in
skb_morph() (net/core/skbuff.c).
    -> crashed in xs_tcp_data_recv, skb_morph() was not called once
  - patched with notfreed-hack
    -> crashed in svc_xprt_enqueue, skb_morph() was not called once

I will see if I have the time to try git-nfsd, but as I'm not able to
trigger the bug reliable, I will not be able to really declare any
tree unrelated...

Torsten

My skb-double-free-hack: I added a new __u8 field to sk_buff after its
cb field and the following warnings:
--- skbuff.c~   2008-01-03 05:29:32.092408637 +0100
+++ skbuff.c    2008-01-02 23:00:14.114535678 +0100
@@ -205,6 +205,7 @@ struct sk_buff *__alloc_skb(unsigned int
        memset(skb, 0, offsetof(struct sk_buff, tail));
        skb->truesize = size + sizeof(struct sk_buff);
        atomic_set(&skb->users, 1);
+       skb->notfreed=1;
        skb->head = data;
        skb->data = data;
        skb_reset_tail_pointer(skb);
@@ -317,13 +318,18 @@ static void kfree_skbmem(struct sk_buff

        switch (skb->fclone) {
        case SKB_FCLONE_UNAVAILABLE:
+               WARN_ON(!skb->notfreed);
+               skb->notfreed=0;
                kmem_cache_free(skbuff_head_cache, skb);
                break;

        case SKB_FCLONE_ORIG:
                fclone_ref = (atomic_t *) (skb + 2);
-               if (atomic_dec_and_test(fclone_ref))
+               if (atomic_dec_and_test(fclone_ref)) {
+                 WARN_ON(!skb->notfreed);
+                 skb->notfreed=0;
                        kmem_cache_free(skbuff_fclone_cache, skb);
+                }
                break;

        case SKB_FCLONE_CLONE:
@@ -335,8 +341,11 @@ static void kfree_skbmem(struct sk_buff
                 */
                skb->fclone = SKB_FCLONE_UNAVAILABLE;

-               if (atomic_dec_and_test(fclone_ref))
+               if (atomic_dec_and_test(fclone_ref)) {
+                 WARN_ON(!other->notfreed);
+                 other->notfreed=0;
                        kmem_cache_free(skbuff_fclone_cache, other);
+                }
                break;
        }
 }
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
2.6.24-rc6-mm1, Andrew Morton, (Sun Dec 23, 3:30 am)
Re: 2.6.24-rc6-mm1, Dave Young, (Wed Dec 26, 4:37 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Dec 23, 12:27 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Fri Dec 28, 6:53 pm)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Fri Dec 28, 7:07 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Dec 29, 12:51 pm)
Re: 2.6.24-rc6-mm1, Herbert Xu, (Sat Dec 29, 9:30 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Dec 29, 11:34 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Mon Dec 31, 4:15 pm)
Re: 2.6.24-rc6-mm1, Herbert Xu, (Tue Jan 1, 8:04 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Wed Jan 2, 2:29 pm)
Re: 2.6.24-rc6-mm1, Herbert Xu, (Wed Jan 2, 5:51 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Fri Jan 4, 6:23 am)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Fri Jan 4, 9:30 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Fri Jan 4, 11:21 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Fri Jan 4, 5:24 pm)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Fri Jan 4, 8:07 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Jan 5, 4:01 am)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Sat Jan 5, 6:13 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Jan 5, 10:52 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Jan 5, 6:10 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sat Jan 5, 11:16 pm)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Sat Jan 5, 9:25 pm)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Sat Jan 5, 11:28 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Jan 6, 6:41 am)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Sun Jan 6, 7:23 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Jan 6, 7:35 am)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Sun Jan 6, 9:33 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Jan 6, 4:03 pm)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Mon Jan 7, 2:16 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Fri Jan 25, 5:06 pm)
Re: 2.6.24-rc6-mm1, Ingo Molnar, (Tue Jan 8, 11:59 am)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Tue Jan 8, 7:57 pm)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Wed Jan 9, 5:04 am)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Wed Jan 9, 8:54 pm)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Tue Jan 8, 8:27 pm)
Re: 2.6.24-rc6-mm1, FUJITA Tomonori, (Tue Jan 8, 8:54 pm)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Tue Jan 8, 9:07 pm)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Sun Jan 6, 4:27 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Jan 6, 6:30 am)
Re: 2.6.24-rc6-mm1, Jarek Poplawski, (Sun Jan 6, 10:52 am)
Re: 2.6.24-rc6-mm1, J. Bruce Fields, (Wed Jan 2, 5:57 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Thu Jan 3, 11:37 am)
Re: 2.6.24-rc6-mm1, J. Bruce Fields, (Thu Jan 3, 2:52 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Thu Jan 3, 1:02 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Tue Jan 1, 8:59 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Tue Jan 1, 2:29 pm)
Re: 2.6.24-rc6-mm1, Randy Dunlap, (Sun Dec 30, 1:41 am)
Re: 2.6.24-rc6-mm1, J. Bruce Fields, (Sun Dec 30, 5:24 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Sun Dec 30, 5:35 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Mon Dec 31, 9:17 am)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Sun Dec 23, 4:39 pm)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Thu Dec 27, 7:42 am)
Re: 2.6.24-rc6-mm1, Torsten Kaiser, (Thu Dec 27, 10:30 am)
Re: 2.6.24-rc6-mm1 Kernel panics at different functions (), Kamalesh Babulal, (Thu Dec 27, 4:49 am)
Re: 2.6.24-rc6-mm1 Kernel panics at different functions (), Kamalesh Babulal, (Fri Dec 28, 5:11 am)
Re: 2.6.24-rc6-mm1 Kernel panics at different functions (), Kamalesh Babulal, (Thu Dec 27, 6:25 am)
Re: 2.6.24-rc6-mm1 - e1000 breakage, James Morris, (Wed Dec 26, 7:39 pm)
Re: 2.6.24-rc6-mm1 (driver core/sysfs), Randy Dunlap, (Mon Dec 31, 4:11 pm)
Re: 2.6.24-rc6-mm1 (driver core/sysfs), Greg KH, (Fri Jan 11, 9:05 pm)
[patch] auto-qa Kconfig, Ingo Molnar, (Mon Jan 14, 12:11 pm)
Re: [patch] auto-qa Kconfig, Pavel Machek, (Tue Jan 15, 6:13 pm)
Re: 2.6.24-rc6-mm1 (build problem: gpio/W1), Randy Dunlap, (Mon Dec 31, 2:19 pm)
Re: 2.6.24-rc6-mm1 (build problem: gpio/W1), Evgeniy Polyakov, (Sat Jan 5, 11:29 am)
Re: 2.6.24-rc6-mm1 (build problem: gpio/W1), Ville , (Sat Jan 5, 12:16 pm)
Re: 2.6.24-rc6-mm1 (build problem: gpio/W1), Randy Dunlap, (Sat Jan 5, 1:18 pm)
Re: 2.6.24-rc6-mm1 (build problem: gpio_keys), Randy Dunlap, (Mon Dec 31, 2:18 pm)
Re: 2.6.24-rc6-mm1 (build problem: gpio_keys), David Brownell, (Mon Dec 31, 2:40 pm)
[PATCH -mm] gpio: fix x86 build problem: gpio_keys, Randy Dunlap, (Mon Dec 31, 3:10 pm)
Re: [PATCH -mm] gpio: fix x86 build problem: gpio_keys, Ingo Molnar, (Tue Jan 1, 11:32 am)
Re: 2.6.24-rc6-mm1 (build problem: v4l / i2c), Randy Dunlap, (Mon Dec 31, 2:18 pm)
[PATCH -mm] driver core: build with SYSFS=n, Randy Dunlap, (Mon Dec 31, 2:05 pm)
[PATCH -mm] crypto: scatterwalk.h needs sched.h, Randy Dunlap, (Mon Dec 31, 2:05 pm)
Re: [PATCH -mm] crypto: scatterwalk.h needs sched.h, Herbert Xu, (Mon Dec 31, 6:31 pm)
Re: 2.6.24-rc6-mm1: __raw_spin_is_contended undefined, Joseph Fannin, (Wed Dec 26, 10:21 pm)
Re: 2.6.24-rc6-mm1: __raw_spin_is_contended undefined, Nick Piggin, (Thu Dec 27, 1:21 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, Mariusz Kozlowski, (Wed Dec 26, 8:29 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Wed Dec 26, 11:05 pm)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, Adrian Bunk, (Fri Dec 28, 7:22 pm)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Sat Dec 29, 4:14 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, Adrian Bunk, (Sat Dec 29, 4:48 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Sat Dec 29, 4:54 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, Adrian Bunk, (Sat Dec 29, 5:06 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Sat Dec 29, 5:18 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, Adrian Bunk, (Sat Dec 29, 5:53 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Sat Dec 29, 5:15 am)
Re: 2.6.24-rc6-mm1: some section mismatches on sparc64, David Miller, (Sat Dec 29, 4:27 am)
Re: 2.6.24-rc6-mm1, Andreas Mohr, (Tue Dec 25, 5:51 pm)
Re: 2.6.24-rc6-mm1: suspend broken on HP nx6325 due to cpufr..., Rafael J. Wysocki, (Sun Dec 23, 6:54 pm)
Re: 2.6.24-rc6-mm1: suspend broken on HP nx6325 due to cpufr..., Rafael J. Wysocki, (Mon Dec 24, 10:13 am)
Re: 2.6.24-rc6-mm1, Rafael J. Wysocki, (Sun Dec 23, 8:35 am)
Re: 2.6.24-rc6-mm1, H. Peter Anvin, (Sun Dec 23, 7:09 pm)
Re: 2.6.24-rc6-mm1, Ingo Molnar, (Sun Dec 23, 9:00 am)
Re: 2.6.24-rc6-mm1, Rafael J. Wysocki, (Sun Dec 23, 9:48 am)
Re: 2.6.24-rc6-mm1, Rafael J. Wysocki, (Sun Dec 23, 9:53 am)
Re: 2.6.24-rc6-mm1, Sam Ravnborg, (Sun Dec 23, 4:09 pm)
Re: 2.6.24-rc6-mm1, Rafael J. Wysocki, (Sun Dec 23, 6:44 pm)
Re: 2.6.24-rc6-mm1, Ingo Molnar, (Sun Dec 23, 7:04 am)
Re: 2.6.24-rc6-mm1, Ingo Molnar, (Sun Dec 23, 7:10 am)
Re: 2.6.24-rc6-mm1, Andrew Morton, (Sun Dec 23, 7:34 am)
Re: 2.6.24-rc6-mm1, Ingo Molnar, (Sun Dec 23, 7:57 am)
Re: 2.6.24-rc6-mm1, Christoph Hellwig, (Sun Dec 23, 8:12 am)