Re: 2.6.25-git2: BUG: unable to handle kernel paging request at ffffffffffffffff

To: Zdenek Kabelac <zdenek.kabelac@...>
Cc: Ingo Molnar <mingo@...>, Jiri Slaby <jirislaby@...>, Rafael J. Wysocki <rjw@...>, <paulmck@...>, David Miller <davem@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, <linux-ext4@...>, <herbert@...>, Pekka Enberg <penberg@...>, Christoph Lameter <clameter@...>
Date: Wednesday, April 23, 2008 - 11:53 am

On Wed, 23 Apr 2008, Zdenek Kabelac wrote:

Goodie, two of the backtraces (the parent-is-sleeping warning and the 
immediately subsequent oops) look like the same thing that should already 
be fixed in current -git. But there is some interesting stuff there..


Yes, that's interesting to see.


This is indeed an interesting issue: arch/x86/kernel/smpboot.c does an IPI 
call to start_secondary, and yes, it looks suspicious to have that 
lock_ipi_call_lock there (and in particular the unlock_ipi_call_lock that 
enables interrupts within it). Ingo?

But the really interesting one is the later kmalloc() debugging triggers, 
because this one is, I suspect, very much a sign of the memory corruption 
bug you see. 

There are two reasons that make me say that:

 - the callback is in networking code and wireless, which was one of the 
   possible suspects.

 - the padding pattern which *should* have been POISON_INUSE (0x5a) has 
   been overwritten with:

   Padding 0xffff8100201a0000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
   ....
   Padding 0xffff8100201a71a0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk¥
   Padding 0xffff8100201a71b0:  cc cc cc cc cc cc cc cc 00 00 1a 20 00 81 ff ff ÌÌÌÌÌÌÌÌ......ÿÿ
   Padding 0xffff8100201a71c0:  cd 70 17 a0 ff ff ff ff 00 00 00 00 73 05 00 00 Íp..ÿÿÿÿ....s...
   Padding 0xffff8100201a71d0:  b6 54 58 00 01 00 00 00 d5 71 26 81 ff ff ff ff ¶TX.....Õq&.ÿÿÿÿ
   Padding 0xffff8100201a71e0:  00 00 00 00 7c 05 00 00 97 54 58 00 01 00 00 00 ....|....TX.....

   which in turn is interesting because it very much looks like SLUB 
   re-used a page for something else (the values that things got 
   overwritten by are largely SLUB's own poison bytes: 6b is POISON_FREE, 
   the a5 at the end of the list of 6b's is POISON_END, while cc is 
   SLUB_RED_ACTIVE).
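
As a rough illustration of reading such a dump, the sketch below labels runs of bytes in one hexdump row with the SLUB poison names. The byte values are the ones named above plus SLUB_RED_INACTIVE (0xbb) from include/linux/poison.h; the helper name classify() is made up for this example and is not a kernel facility.

```python
# Poison byte values as defined in include/linux/poison.h.
SLUB_POISON = {
    0x5A: "POISON_INUSE",
    0x6B: "POISON_FREE",
    0xA5: "POISON_END",
    0xBB: "SLUB_RED_INACTIVE",
    0xCC: "SLUB_RED_ACTIVE",
}


def classify(hex_row):
    """Group the bytes of one hexdump row into (label, count) runs.

    Hypothetical helper: bytes that match no known poison value are
    labelled "data".
    """
    runs = []
    for byte in bytes.fromhex(hex_row):
        label = SLUB_POISON.get(byte, "data")
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return [tuple(run) for run in runs]


# The rows at ...71a0 and ...71b0 from the dump above.
row_71a0 = classify("6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5")
row_71b0 = classify("cc cc cc cc cc cc cc cc 00 00 1a 20 00 81 ff ff")
print(row_71a0)
print(row_71b0)
```

None of the bytes in these rows is 0x5a, which is the point: the region holds the layout of a freed object and its red zone, not untouched padding.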

To me, that pattern looks like an order-3 allocation (correct: that's what 
kmalloc-4096 is supposed to be using!) got released, and the stuff at the 
end (with slub debugging, there's only room for 7 4096-byte allocations 
there, so 71b0 is past the end) is that SLUB debug info.

The first word of that busy allocation is ffff8100201a0000, which is also 
the base pointer to the whole order-3 page ("Free pointer"), followed by 
the SLAB tracking data.
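
To show how that word is read out of the dump, here is a small sketch that decodes the row at ...71b0 and the 24 bytes after it as little-endian values. The tracking layout (address, cpu, pid, timestamp) is an assumption based on SLUB's struct track of that era; treat the field names as illustrative.

```python
import struct

# Row at ...71b0: 8 red-zone bytes, then an 8-byte little-endian pointer.
row_71b0 = bytes.fromhex("cc cc cc cc cc cc cc cc 00 00 1a 20 00 81 ff ff")
red_zone, free_pointer = struct.unpack("<QQ", row_71b0)

# Next 24 bytes (rows ...71c0 and the first half of ...71d0), read as
# one tracking record. Assumed layout: void *addr; int cpu; int pid;
# unsigned long when.
track_raw = bytes.fromhex(
    "cd 70 17 a0 ff ff ff ff 00 00 00 00 73 05 00 00"
    "b6 54 58 00 01 00 00 00"
)
addr, cpu, pid, when = struct.unpack("<QiiQ", track_raw)

print(hex(red_zone), hex(free_pointer))
print(hex(addr), cpu, pid, hex(when))
```

Under that assumed layout the record decodes to a caller address in the module area (0xffffffffa01770cd), cpu 0 and pid 1395.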

Looks like possibly a double free to me (with the first free causing the 
page to be re-used, and the second free being the one that triggers the 
debug message). But maybe Pekka or Christoph are better at reading those 
oopses.

Now, the first slab debug trigger then does:

   FIX kmalloc-4096: Restoring 0xffff8100201a0000-0xffff8100201a7e16=0x5a

to "restore" the data to its expected values, which is why the *second* 
one triggers, because now the allocation that was re-used got overwritten 
with that free pattern, and then you get more complaints about *that*, and 
the skb pointers themselves now have bogus data in them (overwritten 
twice: first with 0x5a, to restore the first one, then with 0xcc for the 
second warning).

So then the subsequent "general protection fault" is just because of bogus 
skb pointers due to the still-in-use allocation being overwritten by all 
these poison values.
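
The chain of events above can be mimicked with a toy model. This is not SLUB code: check_and_restore() is a made-up stand-in for the debug check, and the 64-byte buffer is an arbitrary stand-in for the page. It only shows why "restoring" a region that someone else now owns turns their pointers into garbage.

```python
POISON_INUSE = 0x5A


def check_and_restore(page, start, end):
    """Toy version of the debug check: report corruption, then "fix" it.

    Returns True if the region still held the expected pattern; otherwise
    overwrites the region with that pattern and returns False.
    """
    intact = all(b == POISON_INUSE for b in page[start:end])
    if not intact:
        page[start:end] = bytes([POISON_INUSE]) * (end - start)
    return intact


# Region the debug code believes is untouched padding.
page = bytearray([POISON_INUSE]) * 64

# The page was re-used: a new owner stores a live pointer in it.
page[16:24] = (0xFFFF8100201A0000).to_bytes(8, "little")

# The stale free runs the check, which complains and restores the pattern.
was_intact = check_and_restore(page, 0, 64)

# The new owner now reads back its "pointer".
clobbered = int.from_bytes(page[16:24], "little")
print(was_intact, hex(clobbered))
```

The value read back is 0x5a5a5a5a5a5a5a5a, which is not a canonical x86-64 address, so dereferencing it faults.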

And finally, the stuff at the very end (BUG: sleeping function called from 
invalid context and the SPIN IRQ one) are just warnings because we killed 
a process in a critical section, so all the preempt and irq flags are just 
wrong. Those can be ignored entirely.

But what is interesting is that this does look networking-related. I 
suspect it's the suspend/resume that triggers something with the 
dev_open() thing, which re-uses an already-free'd pointer or whatever. I 
have no clue about exactly what goes wrong, but I really would suspect 
that whole "network device down/up" sequence during the suspend.

I've left the kernel trace appended, since I added a few more people to 
the discussion.

		Linus
