Re: 2.6.34-rc4 : OOPS in unmap_vma

Previous thread: linux-next: manual merge of the crypto tree with Linus' tree by Stephen Rothwell on Tuesday, April 13, 2010 - 6:48 pm. (2 messages)

Next thread: [PATCH][resend] namei.c : update mnt when it needed by Huang Shijie on Tuesday, April 13, 2010 - 7:16 pm. (2 messages)
From: Parag Warudkar
Date: Tuesday, April 13, 2010 - 6:53 pm

Not sure if this is related to the recent mm/vma fixes - got this 
while rebooting (kexec) latest git -

[    0.000000] Linux version 2.6.34-rc4 (paragw@parag-laptop) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #19 SMP PREEMPT Tue Apr 13 20:59:37 EDT 2010
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-2.6.34-rc4 root=UUID=0a0bb1b9-978c-4e16-8e43-aae24e172e12 ro quiet splash
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000100 - 000000000009fc00 (usable)
[    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000ef000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000b8f2f000 (usable)
[    0.000000]  BIOS-e820: 00000000b8f2f000 - 00000000b8f31000 (reserved)
[    0.000000]  BIOS-e820: 00000000b8f31000 - 00000000b9d70000 (usable)
[    0.000000]  BIOS-e820: 00000000b9d70000 - 00000000b9d80000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000b9d80000 - 00000000bc4e0000 (usable)
[    0.000000]  BIOS-e820: 00000000bc4e0000 - 00000000bc6e0000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bc6e0000 - 00000000bde92000 (usable)
[    0.000000]  BIOS-e820: 00000000bde92000 - 00000000bde9a000 (reserved)
[    0.000000]  BIOS-e820: 00000000bde9a000 - 00000000bdebf000 (usable)
[    0.000000]  BIOS-e820: 00000000bdebf000 - 00000000bdecf000 (reserved)
[    0.000000]  BIOS-e820: 00000000bdecf000 - 00000000bdfcf000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bdfcf000 - 00000000bdfff000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bdfff000 - 00000000be000000 (usable)
[    0.000000]  BIOS-e820: 00000000be000000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed10000 - 00000000fed14000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed18000 - 00000000fed1a000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed1c000 - ...
From: Borislav Petkov
Date: Tuesday, April 13, 2010 - 11:17 pm

From: Parag Warudkar <parag.lkml@gmail.com>
Date: Tue, Apr 13, 2010 at 09:53:46PM -0400



hmm, it doesn't look like it. Your code translates to something like

   0:   b8 00 00 00 00          mov    $0x0,%eax
   5:   80 ff ff                cmp    $0xff,%bh
   8:   ff 48 21                decl   0x21(%rax)
   b:   45 80 48 8b 45          rex.RB orb    $0x45,-0x75(%r8)
  10:   80 48 ff c8             orb    $0xc8,-0x1(%rax)
  14:   48 3b 85 40 ff ff ff    cmp    -0xc0(%rbp),%rax
  1b:   48 8b 85 50 ff ff ff    mov    -0xb0(%rbp),%rax
  22:   48 0f 42 7d 80          cmovb  -0x80(%rbp),%rdi
  27:   48 89 7d 80             mov    %rdi,-0x80(%rbp)
  2b:*  48 8b 38                mov    (%rax),%rdi     <-- trapping instruction
  2e:   48 85 ff                test   %rdi,%rdi
  31:   0f 84 f5 04 00 00       je     0x52c
  37:   48                      rex.W
  38:   b8 fb 0f 00 00          mov    $0xffb,%eax
  3d:   00 c0                   add    %al,%al
  3f:   ff                      .byte 0xff


which I could correlate with what I get here (comments added):

	.loc 1 1051 0
	movabsq	$549755813888, %rax	#, tmp158	PGDIR_SIZE
.LVL392:
	leaq	(%r12,%rax), %rax	#,
	movq	%rax, -88(%rbp)	#, %sfp
	movabsq	$-549755813888, %rax	#, tmp159	PGDIR_MASK
	andq	%rax, -88(%rbp)	# tmp159, %sfp
	movq	-88(%rbp), %rdx	# %sfp, tmp160
	movq	-72(%rbp), %rax	# %sfp, tmp161
	decq	%rdx	# tmp160			__boundary
	decq	%rax	# tmp161			__end
	cmpq	%rax, %rdx	# tmp161, tmp160	rFLAGS
	movq	-72(%rbp), %rax	# %sfp,
	cmovb	-88(%rbp), %rax	# %sfp,,
	movq	-112(%rbp), %rdx	# %sfp,		pgd
	movq	%rax, -88(%rbp)	#, %sfp
	movq	(%rdx), %rax	# <variable>.pgd, pgd$pgd

and if this output is correct and if you scroll back a little in your
assembly output, you should probably see that the value computed in
pgd_offset() is being saved in -0x80(%rbp) and reloaded again for use.
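
For reference, the arithmetic in the listing above (add PGDIR_SIZE, mask with PGDIR_MASK, decrement both values, compare, cmovb) can be sketched in plain C as below. This is a userspace sketch, not the kernel source: it assumes x86_64 4-level paging (PGDIR_SHIFT of 39, which matches the constant 549755813888 in the .s output), and the function name pgd_addr_end_sketch is made up for illustration.

```c
#include <stdint.h>

/* Assumption: x86_64 with 4-level paging, so one PGD entry spans 2^39 bytes. */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)  /* 549755813888 */
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))           /* 0xffffff8000000000 */

/*
 * Hypothetical helper mirroring the pgd_addr_end() arithmetic:
 * return the next PGD boundary after addr, or end if end comes first.
 */
uint64_t pgd_addr_end_sketch(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	/*
	 * Subtracting 1 from both sides keeps the unsigned comparison
	 * correct when boundary wraps around to 0 at the top of the
	 * address space. These are the two decq instructions above.
	 */
	return (boundary - 1 < end - 1) ? boundary : end;
}
```

With addr = 0x400000 and end = 0x600000 the whole range sits inside one PGD entry, so the result is simply end; the pgd dereference that follows in the loop is the instruction that traps.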

So you oops when dereferencing that pgd value in %rax (%rdx in my case),
*pgd in pgd_none_or_clear_bad(pgd) which is called in the below ...
From: Linus Torvalds
Date: Wednesday, April 14, 2010 - 7:32 am

There's a large constant (0xffffff8000000000) in there at the beginning, 
and the disassembly hasn't found the start of the next instruction very 
cleanly. The same is true at the end: another large constant is cut off in 
the middle. 

The byte just before the dumped instruction stream is almost certainly 
'48h', and the last byte of the last constant is 0xff, and the disassembly 
ends up being:

   0:	48 b8 00 00 00 00 80 	mov    $0xffffff8000000000,%rax
   7:	ff ff ff 
   a:	48 21 45 80          	and    %rax,-0x80(%rbp)
   e:	48 8b 45 80          	mov    -0x80(%rbp),%rax
  12:	48 ff c8             	dec    %rax
  15:	48 3b 85 40 ff ff ff 	cmp    -0xc0(%rbp),%rax
  1c:	48 8b 85 50 ff ff ff 	mov    -0xb0(%rbp),%rax
  23:	48 0f 42 7d 80       	cmovb  -0x80(%rbp),%rdi
  28:	48 89 7d 80          	mov    %rdi,-0x80(%rbp)
  2c:*	48 8b 38             	mov    (%rax),%rdi     <-- trapping instruction
  2f:	48 85 ff             	test   %rdi,%rdi
  32:	0f 84 f5 04 00 00    	je     0x52d
  38:	48 b8 fb 0f 00 00 00 	mov    $0xffffc00000000ffb,%rax
  3f:	c0 ff ff 
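
As a small sanity check of the re-alignment above, here is a sketch in C that decodes the 64-bit immediate of a "mov $imm64,%rax" (REX.W prefix 0x48, opcode 0xb8, then eight little-endian immediate bytes). The function name decode_movabs_rax is made up for illustration, and the leading 0x48 and trailing 0xff bytes are the guessed ones from the text above, not bytes that were present in the oops dump.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode "48 b8 <imm64>" (mov $imm64,%rax).
 * Returns 1 and stores the immediate on success, 0 if the bytes
 * do not start with that exact encoding or are too short.
 */
int decode_movabs_rax(const uint8_t *code, size_t len, uint64_t *imm)
{
	uint64_t value = 0;
	int i;

	if (len < 10 || code[0] != 0x48 || code[1] != 0xb8)
		return 0;

	/* The immediate is stored least significant byte first. */
	for (i = 0; i < 8; i++)
		value |= (uint64_t)code[2 + i] << (8 * i);

	*imm = value;
	return 1;
}
```

Feeding it 48 b8 00 00 00 00 80 ff ff ff gives 0xffffff8000000000, and 48 b8 fb 0f 00 00 00 c0 ff ff gives 0xffffc00000000ffb. Without the leading 0x48 the check fails, which is the same misalignment the first disassembly ran into.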

But yes, you found the right spot (that 0xffffff8000000000 constant is 

Yup. Close enough. Btw, it's often good to look at both the *.s code _and_ 
the *.lst code. If you do "make mm/memory.lst", you'll find those big 
constants easily, and then you'll see the code this way:

	        do {
	                next = pgd_addr_end(addr, end);
	ffffffff81b2aa45:       48 b8 00 00 00 00 80    mov    $0x8000000000,%rax
	ffffffff81b2aa4c:       00 00 00
	ffffffff81b2aa4f:       49 8d 04 04             lea    (%r12,%rax,1),%rax
	ffffffff81b2aa53:       48 89 45 a8             mov    %rax,-0x58(%rbp)
	ffffffff81b2aa57:       48 b8 00 00 00 00 80    mov    $0xffffff8000000000,%rax
	ffffffff81b2aa5e:       ff ff ff
	ffffffff81b2aa61:       48 21 45 a8             and    %rax,-0x58(%rbp)
	ffffffff81b2aa65:       48 8b 45 b8             mov    -0x48(%rbp),%rax
	ffffffff81b2aa69:       48 8b 55 a8             mov    ...
From: Borislav Petkov
Date: Wednesday, April 14, 2010 - 8:22 am

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, Apr 14, 2010 at 07:32:08AM -0700


Right, the decodecode output looked kinda strange to me and I tried
to match the instruction order and find the location. But yeah, now
that I'm looking at show_registers(), we don't start dumping on a precise
instruction boundary but simply dump 64 bytes in the default case. No time

[..]

OK, I can't say that I'm a Linux newbie, but the .lst code is new to me.

Well, Parag said something about a kexec kernel, so it would definitely be
interesting to know what he means there: a kexec-enabled kernel, or the
"second" kernel his machine kexec'd into after a previous failure. I
think this could clarify the situation a bit.

Thanks for looking over the asm.

-- 
Regards/Gruss,
Boris.
--

From: Vivek Goyal
Date: Wednesday, April 14, 2010 - 9:07 am

FWIW, just a data point. I pulled in the latest kernel and I can boot it
through BIOS as well as kexec boot on my x86_64 box.

Vivek
--

From: Parag Warudkar
Date: Wednesday, April 14, 2010 - 2:58 pm

Hi Borislav


It was the kexec'ed kernel that oopsed - the first kernel had no issues.
It was kexec'ing from 2.6.34-rc4 to the same kernel.

After that I have tried to reboot via kexec to try to reproduce the
issue but it either hung completely or resulted in corrupted X and
non-moving cursor.
However, kexec from the distro kernel (Ubuntu 2.6.32-20) to itself works just fine.

I will start a bisect as soon as I find time.

Parag
--

From: Maciej Rutecki
Date: Friday, April 16, 2010 - 7:41 am

I created a Bugzilla entry at 
https://bugzilla.kernel.org/show_bug.cgi?id=15795
for your bug report, please add your address to the CC list in there, thanks!

-- 
Maciej Rutecki
http://www.maciek.unixy.pl
--
