Re: repeatable slab corruption with LTP msgctl08

Previous thread: [PATCH] [BLOCK] request_module() use format string by maximilian attems on Wednesday, June 11, 2008 - 9:46 pm. (1 message)

Next thread: 2.6.26-rc5-mm3 by Andrew Morton on Wednesday, June 11, 2008 - 10:59 pm. (100 messages)
From: Andrew Morton
Date: Wednesday, June 11, 2008 - 10:13 pm

Running current mainline on my old 2-way PIII.  Distro is RH FC1.  LTP
version is ltp-full-20070228 (lots of retro-computing there).

Config is at http://userweb.kernel.org/~akpm/config-vmm.txt


./testcases/bin/msgctl08 crashes after ten minutes or so:

slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP 
Modules linked in:

Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff 0000000e 
       f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670 f7c032ac 000000d0 
       f7c01500 0000000e f7c03288 f7c06df0 f29ec088 00000098 00000000 00000000 
Call Trace:
 [<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
 [<c017bdd4>] ? __kmalloc+0x154/0x160
 [<c0257663>] ? load_msg+0x33/0x150
 [<c0257663>] ? load_msg+0x33/0x150
 [<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
 [<c0257cc9>] ? do_msgsnd+0x49/0x2e0
 [<c0126f1f>] ? __do_softirq+0x6f/0x100
 [<c0126e58>] ? _local_bh_enable+0x48/0xa0
 [<c0257f92>] ? sys_msgsnd+0x32/0x40
 [<c0106e12>] ? sys_ipc+0xb2/0x240
 [<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
 [<c0102f1d>] ? ...
From: Andrew Morton
Date: Wednesday, June 11, 2008 - 11:34 pm

ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
easy for someone to bisect it.

What's that?  Sigh.  OK.  I wasn't doing anything much anyway.
--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 1:02 am

Oh drat.  git-bisect tells me that this one-year-old msgctl08's
execution time vastly increased when we added

commit f7bf3df8be72d98afa84f5ff183e14c1ba1e560d
Author: Nadia Derbey <Nadia.Derbey@bull.net>
Date:   Tue Apr 29 01:00:39 2008 -0700

    ipc: scale msgmni to the amount of lowmem
    

But we already knew that, and LTP got changed to fix it.

So I was wrong in assuming that the long-execution-time correlates with
the slab-corruption bug.

And the slab corruption bug takes half an hour to reproduce and an
unknown amount of time to not-reproduce.  I don't think I'll be able to
complete this before I disappear for over a week.

--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 1:15 am

Doing

	echo 16 > /proc/sys/kernel/msgmni

on the tree at the above bisection point
(f7bf3df8be72d98afa84f5ff183e14c1ba1e560d is most-recently-applied
patch), msgctl08 runs to completion in 15.7 seconds.

Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
completion, in 20.9 seconds.  So

- it got slower

- the crash requires the huge value of /proc/sys/kernel/msgmni (1746)
  to reproduce.

--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 1:16 am

From: Andi Kleen
Date: Thursday, June 12, 2008 - 1:35 am

That is because it scales itself to the number of available msg queues.
So with Nadia's patch there are more and it runs slower.

In fact it seems to start one process per message queue, so perhaps it's 
just running out of processes or something. Ok it should not segfault.

BTW a great way to debug slab corruptions with LTP faster is to run with
a slab thrasher stress module like http://firstfloor.org/~andi/crasher-26.diff

-Andi
--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 2:08 am

Something like that might be needed.  I ran it again and it took 49
minutes to crash.

--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 3:24 am

Well I tried that.  It didn't actually seem to do much (no CPU time
consumed) so I revved it up a bit: 

--- a/drivers/char/crasher.c~crasher-26-speedup
+++ a/drivers/char/crasher.c
@@ -59,7 +59,7 @@ struct mem_buf {
 static unsigned long crasher_random(void)
 {
         rand_seed = rand_seed*69069L+1;
-        return rand_seed^jiffies;
+        return (rand_seed^jiffies) & 3;
 }
 
 void crasher_srandom(unsigned long entropy)
_


But it hasn't crashed after 57 minutes.

I don't think that is how we should fix this bug ;)

I'm pretty much out of time on this one.
--

From: Andi Kleen
Date: Thursday, June 12, 2008 - 3:41 am

This means it doesn't corrupt other slabs. Maybe only its own?

-Andi

--

From: Pekka J Enberg
Date: Thursday, June 12, 2008 - 3:35 am

Hi Andrew,


Looking at the above dump, slabp->free is 0x0f and the bufctl it points to 
is 0xff ("BUFCTL_END") which marks the last element in the chain. This is 
wrong as the total number of objects in the slab (cachep->num) is 26 but 
the number of objects in use (slabp->inuse) is 20. So somehow you have 
managed to lost 6 objects from the bufctl chain.

I really don't understand how your bufctl chains has so many BUFCTL_END 
elements in the first place. It's doesn't look like the memory has been 
stomped on (slab->s_mem, for example, is 0xf2906088), so I'd look for a 
double kfree() of size 128 somewhere...

		Pekka
--

From: Christoph Lameter
Date: Thursday, June 12, 2008 - 10:13 am

Looks pretty strange. Could this be rerun with SLAB_DEBUG or with SLUB 
with full debugging?
--

From: Manfred Spraul
Date: Thursday, June 12, 2008 - 11:33 am

Hmm. double kfree() should be cached by the redzone code.
And I disagree with your link interpretation:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010:
inuse: 14 00 00 00 (20 entries in use, 6 should be free)
free:  0f 00 00 00
nodeid: 00 00 00 00
bufctl[0x00] ff ff ff ff 020: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x4] fd ff ff ff  030: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x8] fd ff ff ff  040: fd ff ff ff fd ff ff ff 00 00 00 00
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff 19 00 00 00
bufctl[0x10] 17 00 00 00 060: fd ff ff ff fd ff ff ff 0b 00 00 00
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x18] fd ff ff ff 080: 10 00 00 00

free: points to entry 0x0f.
bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
0x19 points to 0x10
0x10 points to 0x17
0x17 is a BUFCTL_ACTIVE - that's a bug.
but: 0x13 is a valid link entry, is points to 0x0b
0x0b points to 0x00, which is BUFCTL_END.

IMHO the most probable bug is a single bit error:
bufctl[0x10] should be 0x13 instead of 0x17.

What about printing all redzone words? That would allow us to validate the bufctl chain.

Andrew: Could you post the new oops?

--
	Manfred

--

From: Andrew Morton
Date: Thursday, June 12, 2008 - 11:41 am

umm, what new oops?

I have four saved away here:

slab: Internal list corruption detected in cache 'size-96'(32), slabp ea2a5040(28). Hexdump:

000: 20 90 b5 ec 88 54 80 f7 e0 00 00 00 e0 50 2a ea
010: 1c 00 00 00 17 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff 18 00 00 00 1f 00 00 00
080: fd ff ff ff fd ff ff ff 1c 00 00 00 ff ff ff ff
090: fd ff ff ff fd ff ff ff fd ff ff ff
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP 
last sysfs file: 
Modules linked in:

Pid: 5535, comm: msgctl08 Not tainted (2.6.26-rc5-mm3 #3)
EIP: 0060:[<c0184b5b>] EFLAGS: 00010082 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: ea2a50db ECX: 00000001 EDX: eea93230
ESI: ea2a5040 EDI: 0000009c EBP: eea95e84 ESP: eea95e60
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 5535, ti=eea94000 task=eea93230 task.ti=eea94000)
Stack: c048897a 000000ff 00000020 ea2a5040 0000001c f7801140 f7822478 00000010 
       ea2a5040 eea95eec c0185eaf 00000000 00000010 000000d0 00000001 f78054ac 
       000000d0 f7801140 00000010 f7805488 f7808d38 c063a068 eea935e8 00000001 
Call Trace:
 [<c0185eaf>] ? cache_alloc_refill+0xcf/0x6c0
 [<c01865f4>] ? __kmalloc+0x154/0x160
 [<c0264113>] ? load_msg+0x33/0x150
 [<c0264113>] ? load_msg+0x33/0x150
 [<c02648ab>] ? do_msgsnd+0x17b/0x2e0
 [<c0264779>] ? do_msgsnd+0x49/0x2e0
 [<c02647d8>] ? do_msgsnd+0xa8/0x2e0
 [<c0264a42>] ? sys_msgsnd+0x32/0x40
 [<c0106eb2>] ? sys_ipc+0xb2/0x240
 [<c02855e4>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0102f35>] ? sysenter_past_esp+0x6a/0xa5
 =======================
Code: 02 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 7a 89 48 c0 ...
From: Manfred Spraul
Date: Thursday, June 12, 2008 - 12:09 pm

bufctl[0x00] 13 00 00 00 020: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x04]  fd ff ff ff 030: 06 00 00 00 ff ff ff ff fd ff ff ff
bufctl[0x08] 18 00 00 00 040: fd ff ff ff fd ff ff ff 17 00 00 00
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x10] fd ff ff ff 060: fd ff ff ff fd ff ff ff 05 00 00 00
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff 00 00 00 00
bufctl[0x18] 0f 00 00 00 080: fd ff ff ff

bufcfl[0x00] fd ff ff ff 020: fd ff ff ff fd ff ff ff 07 00 00 00
bufctl[0x04] fd ff ff ff 030: fd ff ff ff fd ff ff ff 08 00 00 00
bufctl[0x08] 0f 00 00 00 040: fd ff ff ff fd ff ff ff ff ff ff ff
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x10] fd ff ff ff 060: fd ff ff ff 03 00 00 00 fd ff ff ff
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x18 fd ff ff ff 080: fd ff ff ff

All bugs appear to be a spurious 0x04 in a bufctl[nr%8==0].

Either someone does a set_bit() or your cpu is breaking down.

 From looking at the the msgctl08 test: it shouldn't produce any races, 
it just does lots of bulk msgsnd()/msgrcv() operations. Always one 
thread sends, one thread receives on each queue. It's probably more a 
scheduler stresstest than anything else.

Attached is a completely untested patch:
- add 8 bytes to each slabp struct: This changes the alignment of the 
bufctl entries.
- add a hexdump of the redzone bytes. Andrew: how do you log the oops? 
it might scroll of the screen.

--
    Manfred
From: Andrew Morton
Date: Thursday, June 12, 2008 - 12:20 pm

On Thu, 12 Jun 2008 21:09:10 +0200

Well.  It is about ten years old.  But this is the first sign of a

OK, I'll try that this evening (eight hours hence).


netconsole-to-disk.

--

From: Pekka Enberg
Date: Thursday, June 12, 2008 - 12:27 pm

Ah, you're, of course, right. For some reason I was looking at 8-bit 
bufctls when they are, in fact, 32-bit.

		Pekka
--

From: Nadia Derbey
Date: Friday, June 13, 2008 - 12:03 am

Andrew,

Sorry for answering so late (I only saw you e-mail yesterday afternoon).
I have just run msgctl08, setting msgmni to 1722 and turning on the SLAB 
debug:

CONFIG_SLAB=y
CONFIG_SLABINFO=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

kernel: linux-2.6.26-rc5-mm1
ltp: ltp-full-20080430

But I could not reproduce the bug.

Will try to investigate more.

Regards,
Nadia
--

From: Nadia Derbey
Date: Friday, June 13, 2008 - 1:33 am

Same result with 2.6.26-rc6.
I saw from your config file that you're running a 2.6.26-rc2. Will try 
with that one.

Regards,
Nadia
--

From: Andrew Morton
Date: Friday, June 13, 2008 - 1:42 am

erk.  It'll make me a week to bisect this.  I suppose I can plod away
at it in the background, but I won't be able to do that until the end

No, that was just when I last checked in that machine's .config file. 
It was yesterday's mainline, with that .config and `make oldconfig'.
--

Previous thread: [PATCH] [BLOCK] request_module() use format string by maximilian attems on Wednesday, June 11, 2008 - 9:46 pm. (1 message)

Next thread: 2.6.26-rc5-mm3 by Andrew Morton on Wednesday, June 11, 2008 - 10:59 pm. (100 messages)