Re: repeatable slab corruption with LTP msgctl08

Previous thread: [PATCH] [BLOCK] request_module() use format string by maximilian attems on Thursday, June 12, 2008 - 12:46 am. (1 message)

Next thread: 2.6.26-rc5-mm3 by Andrew Morton on Thursday, June 12, 2008 - 1:59 am. (100 messages)
To: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>
Cc: <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 1:13 am

Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
version is ltp-full-20070228 (lots of retro-computing there).

Config is at http://userweb.kernel.org/~akpm/config-vmm.txt

./testcases/bin/msgctl08 crashes after ten minutes or so:

slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
Modules linked in:

Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff 0000000e
f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670 f7c032ac 000000d0
f7c01500 0000000e f7c03288 f7c06df0 f29ec088 00000098 00000000 00000000
Call Trace:
[<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
[<c017bdd4>] ? __kmalloc+0x154/0x160
[<c0257663>] ? load_msg+0x33/0x150
[<c0257663>] ? load_msg+0x33/0x150
[<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
[<c0257cc9>] ? do_msgsnd+0x49/0x2e0
[<c0126f1f>] ? __do_softirq+0x6f/0x100
[<c0126e58>] ? _local_bh_enable+0x48/0xa0
[<c0257f92>] ? sys_msgsnd+0x32/0x40
[<c0106e12>] ? sys_ipc+0xb2/0x240
[<c0102f58>] ? s...

To: Andrew Morton <akpm@...>
Cc: Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Friday, June 13, 2008 - 3:03 am

Andrew,

Sorry for answering so late (I only saw you e-mail yesterday afternoon).
I have just run msgctl08, setting msgmni to 1722 and turning on the SLAB
debug:

CONFIG_SLAB=y
CONFIG_SLABINFO=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

kernel: linux-2.6.26-rc5-mm1
ltp: ltp-full-20080430

But I could not reproduce the bug.

Will try to investigate more.

Regards,
Nadia
--

To: Nadia Derbey <Nadia.Derbey@...>
Cc: Andrew Morton <akpm@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Friday, June 13, 2008 - 4:33 am

Same result with 2.6.26-rc6.
I saw from your config file that you're running a 2.6.26-rc2. Will try
with that one.

Regards,
Nadia
--

To: Nadia Derbey <Nadia.Derbey@...>
Cc: Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Friday, June 13, 2008 - 4:42 am

erk. It'll make me a week to bisect this. I suppose I can plod away
at it in the background, but I won't be able to do that until the end

No, that was just when I last checked in that machine's .config file.
It was yesterday's mainline, with that .config and `make oldconfig'.
--

To: Andrew Morton <akpm@...>
Cc: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 6:35 am

Hi Andrew,

Looking at the above dump, slabp->free is 0x0f and the bufctl it points to
is 0xff ("BUFCTL_END") which marks the last element in the chain. This is
wrong as the total number of objects in the slab (cachep->num) is 26 but
the number of objects in use (slabp->inuse) is 20. So somehow you have
managed to lost 6 objects from the bufctl chain.

I really don't understand how your bufctl chains has so many BUFCTL_END
elements in the first place. It's doesn't look like the memory has been
stomped on (slab->s_mem, for example, is 0xf2906088), so I'd look for a
double kfree() of size 128 somewhere...

Pekka
--

To: Pekka J Enberg <penberg@...>
Cc: Andrew Morton <akpm@...>, Nadia Derbey <Nadia.Derbey@...>, <linux-kernel@...>, <linux-mm@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 2:33 pm

Hmm. double kfree() should be cached by the redzone code.
And I disagree with your link interpretation:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010:
inuse: 14 00 00 00 (20 entries in use, 6 should be free)
free: 0f 00 00 00
nodeid: 00 00 00 00
bufctl[0x00] ff ff ff ff 020: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x4] fd ff ff ff 030: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x8] fd ff ff ff 040: fd ff ff ff fd ff ff ff 00 00 00 00
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff 19 00 00 00
bufctl[0x10] 17 00 00 00 060: fd ff ff ff fd ff ff ff 0b 00 00 00
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x18] fd ff ff ff 080: 10 00 00 00

free: points to entry 0x0f.
bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
0x19 points to 0x10
0x10 points to 0x17
0x17 is a BUFCTL_ACTIVE - that's a bug.
but: 0x13 is a valid link entry, is points to 0x0b
0x0b points to 0x00, which is BUFCTL_END.

IMHO the most probable bug is a single bit error:
bufctl[0x10] should be 0x13 instead of 0x17.

What about printing all redzone words? That would allow us to validate the bufctl chain.

Andrew: Could you post the new oops?

--
Manfred

--

To: Manfred Spraul <manfred@...>
Cc: Andrew Morton <akpm@...>, Nadia Derbey <Nadia.Derbey@...>, <linux-kernel@...>, <linux-mm@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 3:27 pm

Ah, you're, of course, right. For some reason I was looking at 8-bit
bufctls when they are, in fact, 32-bit.

Pekka
--

To: Manfred Spraul <manfred@...>
Cc: Pekka J Enberg <penberg@...>, Nadia Derbey <Nadia.Derbey@...>, <linux-kernel@...>, <linux-mm@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 2:41 pm

umm, what new oops?

I have four saved away here:

slab: Internal list corruption detected in cache 'size-96'(32), slabp ea2a5040(28). Hexdump:

000: 20 90 b5 ec 88 54 80 f7 e0 00 00 00 e0 50 2a ea
010: 1c 00 00 00 17 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff 18 00 00 00 1f 00 00 00
080: fd ff ff ff fd ff ff ff 1c 00 00 00 ff ff ff ff
090: fd ff ff ff fd ff ff ff fd ff ff ff
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
last sysfs file:
Modules linked in:

Pid: 5535, comm: msgctl08 Not tainted (2.6.26-rc5-mm3 #3)
EIP: 0060:[<c0184b5b>] EFLAGS: 00010082 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: ea2a50db ECX: 00000001 EDX: eea93230
ESI: ea2a5040 EDI: 0000009c EBP: eea95e84 ESP: eea95e60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 5535, ti=eea94000 task=eea93230 task.ti=eea94000)
Stack: c048897a 000000ff 00000020 ea2a5040 0000001c f7801140 f7822478 00000010
ea2a5040 eea95eec c0185eaf 00000000 00000010 000000d0 00000001 f78054ac
000000d0 f7801140 00000010 f7805488 f7808d38 c063a068 eea935e8 00000001
Call Trace:
[<c0185eaf>] ? cache_alloc_refill+0xcf/0x6c0
[<c01865f4>] ? __kmalloc+0x154/0x160
[<c0264113>] ? load_msg+0x33/0x150
[<c0264113>] ? load_msg+0x33/0x150
[<c02648ab>] ? do_msgsnd+0x17b/0x2e0
[<c0264779>] ? do_msgsnd+0x49/0x2e0
[<c02647d8>] ? do_msgsnd+0xa8/0x2e0
[<c0264a42>] ? sys_msgsnd+0x32/0x40
[<c0106eb2>] ? sys_ipc+0xb2/0x240
[<c02855e4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0102f35>] ? sysenter_past_esp+0x6a/0xa5
=======================
Code: 02 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 ...

To: Andrew Morton <akpm@...>
Cc: Pekka J Enberg <penberg@...>, Nadia Derbey <Nadia.Derbey@...>, <linux-kernel@...>, <linux-mm@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 3:09 pm

bufctl[0x00] 13 00 00 00 020: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x04] fd ff ff ff 030: 06 00 00 00 ff ff ff ff fd ff ff ff
bufctl[0x08] 18 00 00 00 040: fd ff ff ff fd ff ff ff 17 00 00 00
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x10] fd ff ff ff 060: fd ff ff ff fd ff ff ff 05 00 00 00
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff 00 00 00 00
bufctl[0x18] 0f 00 00 00 080: fd ff ff ff

bufcfl[0x00] fd ff ff ff 020: fd ff ff ff fd ff ff ff 07 00 00 00
bufctl[0x04] fd ff ff ff 030: fd ff ff ff fd ff ff ff 08 00 00 00
bufctl[0x08] 0f 00 00 00 040: fd ff ff ff fd ff ff ff ff ff ff ff
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x10] fd ff ff ff 060: fd ff ff ff 03 00 00 00 fd ff ff ff
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x18 fd ff ff ff 080: fd ff ff ff

All bugs appear to be a spurious 0x04 in a bufctl[nr%8==0].

Either someone does a set_bit() or your cpu is breaking down.

From looking at the the msgctl08 test: it shouldn't produce any races,
it just does lots of bulk msgsnd()/msgrcv() operations. Always one
thread sends, one thread receives on each queue. It's probably more a
scheduler stresstest than anything else.

Attached is a completely untested patch:
- add 8 bytes to each slabp struct: This changes the alignment of the
bufctl entries.
- add a hexdump of the redzone bytes. Andrew: how do you log the oops?
it might scroll of the screen.

--
Manfred

To: Manfred Spraul <manfred@...>
Cc: <penberg@...>, <Nadia.Derbey@...>, <linux-kernel@...>, <linux-mm@...>, <clameter@...>
Date: Thursday, June 12, 2008 - 3:20 pm

On Thu, 12 Jun 2008 21:09:10 +0200

Well. It is about ten years old. But this is the first sign of a

OK, I'll try that this evening (eight hours hence).

netconsole-to-disk.

--

To: Pekka J Enberg <penberg@...>
Cc: Andrew Morton <akpm@...>, Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>
Date: Thursday, June 12, 2008 - 1:13 pm

Looks pretty strange. Could this be rerun with SLAB_DEBUG or with SLUB
with full debugging?
--

To: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 2:34 am

ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
easy for someone to bisect it.

What's that? Sigh. OK. I wasn't doing anything much anyway.
--

To: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 4:02 am

Oh drat. git-bisect tells me that this one-year-old msgctl08's
execution time vastly increased when we added

commit f7bf3df8be72d98afa84f5ff183e14c1ba1e560d
Author: Nadia Derbey <Nadia.Derbey@bull.net>
Date: Tue Apr 29 01:00:39 2008 -0700

ipc: scale msgmni to the amount of lowmem

But we already knew that, and LTP got changed to fix it.

So I was wrong in assuming that the long-execution-time correlates with
the slab-corruption bug.

And the slab corruption bug takes half an hour to reproduce and an
unknown amount of time to not-reproduce. I don't think I'll be able to
complete this before I disappear for over a week.

--

To: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 4:15 am

Doing

echo 16 > /proc/sys/kernel/msgmni

on the tree at the above bisection point
(f7bf3df8be72d98afa84f5ff183e14c1ba1e560d is most-recently-applied
patch), msgctl08 runs to completion in 15.7 seconds.

Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
completion, in 20.9 seconds. So

- it got slower

- the crash requires the huge value of /proc/sys/kernel/msgmni (1746)
to reproduce.

--

To: Andrew Morton <akpm@...>
Cc: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>
Date: Thursday, June 12, 2008 - 4:35 am

That is because it scales itself to the number of available msg queues.
So with Nadia's patch there are more and it runs slower.

In fact it seems to start one process per message queue, so perhaps it's
just running out of processes or something. Ok it should not segfault.

BTW a great way to debug slab corruptions with LTP faster is to run with
a slab thrasher stress module like http://firstfloor.org/~andi/crasher-26.diff

-Andi
--

To: Andi Kleen <andi@...>
Cc: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>
Date: Thursday, June 12, 2008 - 6:24 am

Well I tried that. It didn't actually seem to do much (no CPU time
consumed) so I revved it up a bit:

--- a/drivers/char/crasher.c~crasher-26-speedup
+++ a/drivers/char/crasher.c
@@ -59,7 +59,7 @@ struct mem_buf {
static unsigned long crasher_random(void)
{
rand_seed = rand_seed*69069L+1;
- return rand_seed^jiffies;
+ return (rand_seed^jiffies) & 3;
}

void crasher_srandom(unsigned long entropy)
_

But it hasn't crashed after 57 minutes.

I don't think that is how we should fix this bug ;)

I'm pretty much out of time on this one.
--

To: Andrew Morton <akpm@...>
Cc: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>
Date: Thursday, June 12, 2008 - 6:41 am

This means it doesn't corrupt other slabs. Maybe only its own?

-Andi

--

To: Andi Kleen <andi@...>
Cc: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>
Date: Thursday, June 12, 2008 - 5:08 am

Something like that might be needed. I ran it again and it took 49
minutes to crash.

--

To: Nadia Derbey <Nadia.Derbey@...>, Manfred Spraul <manfred@...>, <linux-kernel@...>, <linux-mm@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Thursday, June 12, 2008 - 4:16 am

--

Previous thread: [PATCH] [BLOCK] request_module() use format string by maximilian attems on Thursday, June 12, 2008 - 12:46 am. (1 message)

Next thread: 2.6.26-rc5-mm3 by Andrew Morton on Thursday, June 12, 2008 - 1:59 am. (100 messages)