Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

To: Linux Kernel Mailing List <linux-kernel@...>
Date: Thursday, October 4, 2007 - 12:13 pm

While running ffsb tests on my ext4 filesystem, I got an Oops in
cache_alloc_refill().
I turned on SLAB debugging and here is the message I got:

slab: Internal list corruption detected in cache 'buffer_head'(30),
slabp ffff81007e100100(1515870810). Hexdump:

000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
0a0: 18 68 50 7d 00 81 ff ff
------------[ cut here ]------------
kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923!
invalid opcode: 0000 [1] SMP
CPU 2
Modules linked in: qla2xxx
Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2
RIP: 0010:[<ffffffff802758b6>] [<ffffffff802758b6>] check_slabp+0xb5/0xc1
RSP: 0018:ffff8100774bb958 EFLAGS: 00010096
RAX: 0000000000000001 RBX: ffff81007e100100 RCX: 0000000000006d20
RDX: 00000000ffffffff RSI: 0000000000000046 RDI: ffff81007e347280
RBP: 00000000000000a8 R08: 0000000000000005 R09: ffffffff8060bb10
R10: 00000000000ae468 R11: 0000000500000002 R12: 00000000000000a8
R13: ffff81007e347280 R14: ffff81007e347280 R15: 0000000000000002
FS: 0000000041802950(0063) GS:ffff81007e0c4728(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000005f83d00c CR3: 0000000078149000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ffsb (pid: 4041, threadinfo ffff8100774ba000, task ffff81007dbdc7a0)
Stack: 000000000000000d 000000000000000e ffff81007e100100 ffff81007e342398
ffff81007e078488 ffffffff80277069 ...
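
For readers who want to inspect the dump, the hexdump above can be decoded with a short script. This is only a sketch: it assumes a little-endian x86-64 layout (consistent with the register dump), and the helper names `parse_hexdump` and `qwords` are made up for illustration. In include/linux/poison.h, 0x5a is POISON_INUSE and 0xa5 is POISON_END, so the script simply counts the 0x5a bytes and prints the dump as 64-bit words. It does not try to map the bytes onto any structure.

```python
import struct

# Hexdump rows copied verbatim from the report (offset: bytes).
HEXDUMP = """\
000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
0a0: 18 68 50 7d 00 81 ff ff
"""

POISON_INUSE = 0x5A  # value from include/linux/poison.h


def parse_hexdump(text):
    """Return the dumped bytes, ignoring the 'offset:' column."""
    data = bytearray()
    for line in text.splitlines():
        _, _, rest = line.partition(":")
        data.extend(int(tok, 16) for tok in rest.split())
    return bytes(data)


def qwords(data):
    """Decode the bytes as little-endian 64-bit words (x86-64 assumed)."""
    return [struct.unpack_from("<Q", data, off)[0]
            for off in range(0, len(data), 8)]


data = parse_hexdump(HEXDUMP)
words = qwords(data)
poison = sum(1 for b in data if b == POISON_INUSE)

print(f"{len(data)} bytes dumped, {poison} of them are 0x5a")
for off, word in zip(range(0, len(data), 8), words):
    print(f"{off:03x}: {word:016x}")
```

Read this way, most of the first 0x70 bytes are the poison pattern, and the remaining words look like kernel addresses (0xffff81... and 0xffffffff80...).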

To: Valerie Clement <valerie.clement@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>
Date: Thursday, October 4, 2007 - 5:43 pm

slabp->inuse = 1515870810 looks bogus. Is this easily reproducible?
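
One way to see why the value is suspicious is to look at it in hex. The sketch below assumes the counter is a 32-bit field (the `width=4` default is that assumption), and `as_repeated_byte` is a hypothetical helper written for this note, not a kernel function. It checks whether a value consists of a single byte repeated.

```python
def as_repeated_byte(value, width=4):
    """Return the byte if `value` is one byte repeated `width` times, else None."""
    raw = value.to_bytes(width, "little")
    return raw[0] if len(set(raw)) == 1 else None


inuse = 1515870810  # value printed in parentheses after the slabp address
num = 30            # value printed in parentheses after the cache name

print(hex(inuse))
print(as_repeated_byte(inuse))
print(as_repeated_byte(num))
```

The counter is 0x5a repeated four times, the same byte that fills most of the hexdump, which suggests the field was overwritten by a fill pattern rather than incremented to that value.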

-

To: Badari Pulavarty <pbadari@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, ext4 development <linux-ext4@...>
Date: Friday, October 5, 2007 - 9:41 am

Hi Badari,
Thanks for your answer.
I could not reproduce it without the latest ext4 patches, so I suspect a
bug in one of them.
But how can I debug this?
Which other debug traces can I turn on?

Valérie

-

To: Valerie Clement <valerie.clement@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, ext4 development <linux-ext4@...>, <cmm@...>
Date: Friday, October 5, 2007 - 10:54 am

Let me understand: you applied the latest ext4 patchsets? If so, Mingming
has some slab-cleanup changes in the patchset. You can try backing them
out and see whether the problem goes away.

Thanks,
Badari

-

To: Badari Pulavarty <pbadari@...>
Cc: Valerie Clement <valerie.clement@...>, Linux Kernel Mailing List <linux-kernel@...>, ext4 development <linux-ext4@...>
Date: Friday, October 5, 2007 - 4:30 pm

It's unlikely to be jbd_slab_cleanup.patch, which actually gets rid
of slab allocation for the buffers passed down to disk I/O and replaces
it with get_free_page directly.

Could you send me the profile used for the ffsb test?

Thanks,
Mingming

-

To: Badari Pulavarty <pbadari@...>
Cc: Linux Kernel Mailing List <linux-kernel@...>, ext4 development <linux-ext4@...>
Date: Friday, October 5, 2007 - 6:06 pm

-------- Forwarded Message --------
From: Valerie Clement <valerie.clement@bull.net>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c
Date: Thu, 04 Oct 2007 18:13:46 +0200
While running ffsb tests on my ext4 filesystem, I got an Oops in
cache_alloc_refill().
I turned on SLAB debugging and here is the message I got:

slab: Internal list corruption detected in cache 'buffer_head'(30),
slabp ffff81007e100100(1515870810). Hexdump:

=======================>

The slabp->inuse counter looks corrupted (1515870810); it should not be
greater than cachep->num, which looks valid (30).

000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
0a0: 18 68 50 7d 00 81 ff ff
------------[ cut here ]------------
kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923!
invalid opcode: 0000 [1] SMP
CPU 2
Modules linked in: qla2xxx
Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2
RIP: 0010:[<ffffffff802758b6>] [<ffffffff802758b6>] check_slabp+0xb5/0xc1
RSP: 0018:ffff8100774bb958 EFLAGS: 00010096
RAX: 0000000000000001 RBX: ffff81007e100100 RCX: 0000000000006d20
RDX: 00000000ffffffff RSI: 0000000000000046 RDI: ffff81007e347280
RBP: 00000000000000a8 R08: 0000000000000005 R09: ffffffff8060bb10
R10: 00000000000ae468 R11: 0000000500000002 R12: 00000000000000a8
R13: ffff81007e347280 R14: ffff81007e347280 R15: 0000000000000002
FS: 0000000041802950(0063) GS:ffff81007e0c4728(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0...
