Re: [2.6.27] overlapping early reservations [was: early exception - lockdep related?]

From: Luca Tettamanti
Date: Friday, September 5, 2008 - 12:17 pm

Hum, kernel says:

http://img177.imageshack.us/my.php?image=overlappingus2.jpg

Overlapping early reservations b98000-eff266 RAMDISK to 200000-d09cf7
TEXT DATA BSS

It would appear that the initramfs is overlapping the kernel itself,
is the boot loader (LILO) doing something stupid?
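
A minimal sketch of the check behind that message, using the two ranges from the panic line. It assumes the ranges are half-open (start inclusive, end exclusive); the helper names parse_range and overlap are made up for illustration and are not kernel functions.

```python
def parse_range(text):
    """Parse 'start-end' hex text such as 'b98000-eff266' into (start, end)."""
    start, end = text.split("-")
    return int(start, 16), int(end, 16)


def overlap(a, b):
    """Return the shared (start, end) of two half-open ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


# Ranges copied from the "Overlapping early reservations" line.
ramdisk = parse_range("b98000-eff266")
kernel = parse_range("200000-d09cf7")

shared = overlap(ramdisk, kernel)
if shared is not None:
    size = shared[1] - shared[0]
    print(f"RAMDISK and TEXT DATA BSS share {shared[0]:x}-{shared[1]:x} ({size} bytes)")
else:
    print("no overlap")
```

The shared span is bounded by the start of the ramdisk and the end of the kernel image, which is why the kernel refuses the second reservation.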

Luca
--

From: Peter Zijlstra
Date: Friday, September 5, 2008 - 12:25 pm

Suppose it is; let's ask hpa.

--

From: H. Peter Anvin
Date: Friday, September 5, 2008 - 1:18 pm

It definitely looks like it.

	-hpa
--

From: Luca Tettamanti
Date: Saturday, September 6, 2008 - 6:20 am

Is there anything that the kernel could do to confuse LILO?
The issue started appearing with 2.6.27 and the outcome of the boot
process varies between versions and seems sensitive to configuration
changes (though a "bad" kernel consistently fails).

Orthogonal to my problem: the panic() in reserve_early is useless for
debugging since the output won't reach the screen or the serial
console (even worse: the kernel takes an exception while trying to
execute the panic). Is it acceptable to replace it with an
early_printk + hlt?

Luca
--

From: Ingo Molnar
Date: Saturday, September 6, 2008 - 7:51 am

good question. Does your successful 2.6.26 bootup actually _depend_ on
the initrd? Or does it perhaps have enough built-in drivers to boot
just fine?

in that case v2.6.26 might just have stomped on the initrd silently,
corrupted it (during kernel decompress), and the initrd unpacker saw the
corruption and ignored it. Userspace wouldn't care as the kernel had all
the drivers it needed.

or perhaps something made your v2.6.27 bzImage larger so that the 

very much so. I was wondering about that already.

In any case it would make sense to turn that particular overlap
situation into a warning message and disable initrd decompress - and try
to boot with whatever is built into the kernel.

	Ingo
--

From: Yinghai Lu
Date: Saturday, September 6, 2008 - 9:06 am

console=uart8250,io,0x3f8,115200n8
could help

YH
--

From: Luca Tettamanti
Date: Monday, September 8, 2008 - 10:54 am

Nope, parse_early_param() is called in start_kernel(), my kernel dies

How does LILO decide where to put the initrd (I find LILO code...
obscure)? I mean, it gets a compressed image: how does it know the
size of the uncompressed kernel image? Is it the payload_length in the
real mode header? (answer to self: no, it appears to be the compressed

I'm already using the latest version.

Luca
--

From: Yinghai Lu
Date: Monday, September 8, 2008 - 11:04 am

On Mon, Sep 8, 2008 at 10:54 AM, Luca Tettamanti <kronos.it@gmail.com> wrote:

can you post boot log with working kernel + "debug"?

YH
--

From: Luca Tettamanti
Date: Monday, September 8, 2008 - 12:14 pm

This is the map of the early reservations (will send the dmesg + debug later):

[    0.000000] (6 early reservations) ==> bootmem [0000000000 - 00bbf90000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0000200000 - 0000d012b8]    TEXT DATA BSS ==> [0000200000 - 0000d012b8]
[    0.000000]   #3 [00037dc000 - 00040fe2d9]          RAMDISK ==> [00037dc000 - 00040fe2d9]
[    0.000000]   #4 [000009c800 - 0000100000]    BIOS reserved ==> [000009c800 - 0000100000]
[    0.000000]   #5 [0000008000 - 000000b000]          PGTABLE ==> [0000008000 - 000000b000]
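
A rough sketch for checking a map like the one above for clashes. The log text is copied from this message; the regular expression, the helper names and the half-open treatment of the ranges are assumptions made for illustration, not the kernel's own code.

```python
import re
from itertools import combinations

# Early reservation lines from the working kernel, one entry per line.
LOG = """\
#0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
#2 [0000200000 - 0000d012b8]    TEXT DATA BSS ==> [0000200000 - 0000d012b8]
#3 [00037dc000 - 00040fe2d9]          RAMDISK ==> [00037dc000 - 00040fe2d9]
#4 [000009c800 - 0000100000]    BIOS reserved ==> [000009c800 - 0000100000]
#5 [0000008000 - 000000b000]          PGTABLE ==> [0000008000 - 000000b000]
"""

ENTRY = re.compile(r"#\d+ \[([0-9a-f]+) - ([0-9a-f]+)\]\s+(.+?)\s+==>")


def parse_reservations(log):
    """Return a list of (start, end, name) tuples parsed from the log text."""
    return [(int(m.group(1), 16), int(m.group(2), 16), m.group(3))
            for m in ENTRY.finditer(log)]


def find_overlaps(reservations):
    """Return (name_a, name_b) for every pair of half-open ranges that clash."""
    clashes = []
    for a, b in combinations(reservations, 2):
        if a[0] < b[1] and b[0] < a[1]:
            clashes.append((a[2], b[2]))
    return clashes


reservations = parse_reservations(LOG)
print(len(reservations), "reservations, clashes:", find_overlaps(reservations))
```

TRAMPOLINE ends exactly where PGTABLE starts, so with half-open ranges the two touch without overlapping.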

As a side note: I have bigger, older (2.6.26) kernels that boot fine,
and smaller 2.6.27 kernels that do not work, e.g. this one:

Overlapping early reservations b71000-effb43 RAMDISK to 200000-c84ecf
TEXT DATA BSS

Luca
--

From: Yinghai Lu
Date: Monday, September 8, 2008 - 12:35 pm

that could explain something. a big kernel uses more, and lilo puts the ramdisk

need to figure out why lilo puts the ramdisk so low...

need to know e820 table layout...

YH
--

From: Luca Tettamanti
Date: Monday, September 8, 2008 - 1:00 pm

BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009c800 (usable)
  BIOS-e820: 000000000009c800 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 00000000bbf90000 (usable)
  BIOS-e820: 00000000bbf90000 - 00000000bbf9e000 (ACPI data)
  BIOS-e820: 00000000bbf9e000 - 00000000bbfe0000 (ACPI NVS)
  BIOS-e820: 00000000bbfe0000 - 00000000bc000000 (reserved)
  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
 last_pfn = 0xbbf90 max_arch_pfn = 0x3ffffffff
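
A small sketch for working with the map above: it totals the usable RAM, derives the last usable page frame, and looks up which region an address falls in. The table is transcribed from this message; the helper names and the 4 KiB page size (PAGE_SHIFT = 12) are assumptions for illustration.

```python
# (start, end, type) entries transcribed from the BIOS-e820 lines, end exclusive.
E820 = [
    (0x0000000000000000, 0x000000000009c800, "usable"),
    (0x000000000009c800, 0x00000000000a0000, "reserved"),
    (0x00000000000e4000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x00000000bbf90000, "usable"),
    (0x00000000bbf90000, 0x00000000bbf9e000, "ACPI data"),
    (0x00000000bbf9e000, 0x00000000bbfe0000, "ACPI NVS"),
    (0x00000000bbfe0000, 0x00000000bc000000, "reserved"),
    (0x00000000fee00000, 0x00000000fee01000, "reserved"),
    (0x00000000ffb00000, 0x0000000100000000, "reserved"),
]

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def usable_bytes(entries):
    """Total size of all regions marked usable."""
    return sum(end - start for start, end, kind in entries if kind == "usable")


def last_usable_pfn(entries):
    """Page frame number just past the highest usable byte."""
    return max(end for _, end, kind in entries if kind == "usable") >> PAGE_SHIFT


def region_of(addr, entries):
    """Return the type of the region containing addr, or None for a hole."""
    for start, end, kind in entries:
        if start <= addr < end:
            return kind
    return None


print(f"usable: {usable_bytes(E820)} bytes, last pfn: {last_usable_pfn(E820):#x}")
print("0xb98000 lies in:", region_of(0xb98000, E820))
```

Both the failing ramdisk address (0xb98000) and the working one (0x37dc000) sit inside the large usable region, so the map alone does not rule out either placement.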

dmesg is attached, but I haven't rebooted yet.

Luca
--

From: Yinghai Lu
Date: Monday, September 8, 2008 - 1:46 pm

so some configs work, others not?

YH
--

From: Luca Tettamanti
Date: Monday, September 8, 2008 - 2:25 pm

Yes, that's correct, but it doesn't seem related to a particular
configuration item (moon phase maybe).
For example the kernel I'm using right now (-rc4-something) has the
same config as a non-working kernel minus LOCKDEP, but git-current
minus LOCKDEP does not work. On another kernel I got a working config
just enabling DEBUG_INFO, in another case I disabled MTD.

Luca
--

From: Luca Tettamanti
Date: Tuesday, September 9, 2008 - 2:48 pm

I mean git pulled from Linus' tree.

Btw, I'm attaching dmesg with debug.

Luca
--

From: Luca Tettamanti
Date: Saturday, September 6, 2008 - 10:20 am

The image is mostly modular (e.g. ahci drivers + LVM + fs are on the
ramdisk), see:

The compressed image (bzImage) is roughly the same size as older
kernels (2MB), the uncompressed vmlinux is slightly bigger, but not
much (~100k)

452K    ./mm/built-in.o
100K    ./ipc/built-in.o
8.0K    ./security/built-in.o
3.0M    ./drivers/built-in.o
4.0K    ./usr/built-in.o
216K    ./block/built-in.o
96K     ./init/built-in.o
292K    ./lib/built-in.o
1.3M    ./net/built-in.o
16K     ./crypto/built-in.o
1.1M    ./kernel/built-in.o
1.3M    ./fs/built-in.o
4.0K    ./sound/built-in.o

Luca
--

From: Ingo Molnar
Date: Saturday, September 6, 2008 - 10:25 am

hm, that doesn't seem to match the ranges that got printed:

| Kernel is loaded at the standard 2MB physical, and goes up to 13.6MB 
| physical. That's a tad large at 11.6 MB but still valid.

so how come a ~2MB vmlinux takes 11.6 MB? Is the bss that large for some 
reason perhaps?

	Ingo
--

From: Luca Tettamanti
Date: Saturday, September 6, 2008 - 10:29 am

Sorry, the sentence above is not very clear: I meant that with 2.6.27
the uncompressed vmlinux is only slightly bigger than older kernels;
as you stated the size is 11.6MB.

Luca
--

From: Ingo Molnar
Date: Saturday, September 6, 2008 - 6:41 am

yeah. Kernel is loaded at the standard 2MB physical, and goes up to 
13.6MB physical. That's a tad large at 11.6 MB but still valid.

ramdisk image goes from 11.6 MB to 14.9 MB - roughly standard size. That 
overlaps 2 MB into the kernel image so we have to panic. LILO should 
have loaded the ramdisk somewhere else. (or should have aborted the boot 
if it cannot do that)
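
For reference, a quick sketch that converts the addresses from the panic message into megabytes. It assumes binary units (1 MiB = 1024 * 1024 bytes), so the values it prints may differ slightly from the rounded figures quoted above; the variable and helper names are illustrative only.

```python
MIB = 1024 * 1024  # assumption: binary megabytes


def mib(value):
    """Convert a byte count or address to MiB."""
    return value / MIB


# Addresses from "Overlapping early reservations b98000-eff266 RAMDISK
# to 200000-d09cf7 TEXT DATA BSS".
kernel_start, kernel_end = 0x200000, 0xd09cf7
ramdisk_start, ramdisk_end = 0xb98000, 0xeff266

# Size of the region claimed by both, or 0 if they do not touch.
overlap_bytes = max(0, min(kernel_end, ramdisk_end) - max(kernel_start, ramdisk_start))

print(f"kernel : {mib(kernel_start):.2f} - {mib(kernel_end):.2f} MiB")
print(f"ramdisk: {mib(ramdisk_start):.2f} - {mib(ramdisk_end):.2f} MiB")
print(f"overlap: {mib(overlap_bytes):.2f} MiB")
```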

We could perhaps print a prominent warning, delay the boot for 5 seconds 
or so via mdelay(5000) and simply not load the ramdisk if this happens? 
The kernel is obviously still functional - and such a large vmlinuz 
likely has all the built-in drivers to boot up to user-space - the lack 
of the ramdisk does not necessarily hurt.

	Ingo
--

From: Yinghai Lu
Date: Saturday, September 6, 2008 - 9:14 am

wonder if lilo is fixing bzImage from 1M, and bases its calculation of
the ramdisk position on that...
later on, in-place uncompressing puts vmlinux from 2M...

it seems kexec is putting initrd as high as possible, or could specify
the ramdisk position ..

wonder if new lilo could help.

YH
--
