Re: [PATCH 5/5] lguest: loading bzImage directly

Previous thread: [PATCH] Handle errors in sync_sb_inodes() by Guillaume Chazarain on Tuesday, October 2, 2007 - 3:57 pm. (2 messages)

Next thread: Kernel 2.4 vs 2.6 Traffic Controller performance by Sonny on Tuesday, October 2, 2007 - 5:05 pm. (2 messages)
From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:34 pm

Hi all,

	Jeremy had some boot changes for bzImages, but buried in there was an
update to the boot protocol to support Xen and lguest (and kvm-lite).
I've copied those fairly simple patches, and if HPA is happy I'd like to
push them for 2.6.24 (after correcting for the Great Arch Merge of
course).

Thanks,
Rusty.

-

From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:35 pm

Proposed updates for version 2.07 of the boot protocol.  This includes:

load_flags.KEEP_SEGMENTS- flag to request/inhibit segment reloads
hardware_subarch	- what subarchitecture we're booting under
hardware_subarch_data	- per-architecture data

The intention of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.  The intention is that
the bzImage payload can be a properly formed ELF file, so that the
bootloader can use its ELF notes and Phdrs to get more metadata about
the kernel and its requirements.

The ELF file could be the uncompressed kernel vmlinux itself; it would
only take small buildsystem changes to implement this.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>

---
 Documentation/i386/boot.txt    |   34 +++++++++++++++++++++++++++++++++-
 arch/i386/kernel/asm-offsets.c |    7 +++++++
 include/asm-i386/bootparam.h   |    9 +++++++--
 3 files changed, 47 insertions(+), 3 deletions(-)

diff -r cff7afab3bac Documentation/i386/boot.txt
--- a/Documentation/i386/boot.txt	Tue Oct 02 22:05:10 2007 +1000
+++ b/Documentation/i386/boot.txt	Tue Oct 02 22:05:10 2007 +1000
@@ -168,6 +168,8 @@ 0234/1	2.05+	relocatable_kernel Whether 
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/3	N/A	pad2		Unused
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023C/4	2.07+	hardware_subarch Hardware subarchitecture
+0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -204,7 +206,7 @@ boot loaders can ignore those fields.
 
 The byte order of all fields is littleendian (this is x86, after all.)
 
-Field name:	setup_secs
+Field name:	setup_sects
 Type:		read
 Offset/size:	0x1f1/1
 ...
From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:35 pm

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

---
 include/linux/linkage.h |    6 ++++++
 1 file changed, 6 insertions(+)

===================================================================
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -34,6 +34,12 @@
   name:
 #endif
 
+#ifndef WEAK
+#define WEAK(name)	   \
+	.weak name;	   \
+	name:
+#endif
+
 #define KPROBE_ENTRY(name) \
   .pushsection .kprobes.text, "ax"; \
   ENTRY(name)


-

From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:36 pm

This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:

 1. Check the bootparams loadflags to see if we should reload the
    segment registers and clear interrupts.  This is appropriate
    for normal native boot and some paravirtualized environments, but
    inapproprate for others.

 2. Check the hardware architecture, and dispatch to the appropriate
    kernel entrypoint.  If the bootloader doesn't set this, then we
    simply do the normal boot sequence.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 arch/i386/boot/compressed/head.S |   14 +++++++++--
 arch/i386/boot/compressed/misc.c |    4 +++
 arch/i386/boot/header.S          |    7 ++++-
 arch/i386/kernel/head.S          |   47 ++++++++++++++++++++++++++++++++++----
 4 files changed, 65 insertions(+), 7 deletions(-)

diff -r 5d471e4c931d arch/i386/boot/compressed/head.S
--- a/arch/i386/boot/compressed/head.S	Tue Oct 02 22:13:34 2007 +1000
+++ b/arch/i386/boot/compressed/head.S	Tue Oct 02 22:20:25 2007 +1000
@@ -27,19 +27,30 @@
 #include <asm/segment.h>
 #include <asm/page.h>
 #include <asm/boot.h>
+#include <asm/asm-offsets.h>
 
 .section ".text.head","ax",@progbits
 	.globl startup_32
 
 startup_32:
-	cld
-	cli
+	/* check to see if KEEP_SEGMENTS flag is meaningful */
+	cmpw $0x207, BP_version(%esi)
+	jb 1f
+
+	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
+	 * us to not reload segments */
+	testb $(1<<6), BP_loadflags(%esi)
+	jnz 2f
+
+1:	cli
 	movl $(__BOOT_DS),%eax
 	movl %eax,%ds
 	movl %eax,%es
 	movl %eax,%fs
 	movl %eax,%gs
 	movl %eax,%ss
+
+2:	cld
 
 /* Calculate the delta between where we were compiled to run
  * at and where we were actually ...
From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:39 pm

Version 2.07 of the boot protocol uses 0x23C for the hardware_subarch
field, that for lguest is "1".  This allows us to use the standard
boot entry point rather than the "GenuineLguest" string hack.

This entry point also clears the BSS and copies the boot parameters
and commandline for us, saving more code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

---
 Documentation/lguest/lguest.c |   31 ++++---------------------------
 arch/i386/kernel/head.S       |    8 ++++++++
 drivers/lguest/lguest_asm.S   |    9 +++------
 3 files changed, 15 insertions(+), 33 deletions(-)

diff -r 2fdc577cfe5c Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Tue Oct 02 22:21:05 2007 +1000
+++ b/Documentation/lguest/lguest.c	Tue Oct 02 23:00:09 2007 +1000
@@ -251,23 +251,6 @@ static void *get_pages(unsigned int num)
 	return addr;
 }
 
-/* To find out where to start we look for the magic Guest string, which marks
- * the code we see in lguest_asm.S.  This is a hack which we are currently
- * plotting to replace with the normal Linux entry point. */
-static unsigned long entry_point(const void *start, const void *end)
-{
-	const void *p;
-
-	/* The scan gives us the physical starting address.  We boot with
-	 * pagetables set up with virtual and physical the same, so that's
-	 * OK. */
-	for (p = start; p < end; p++)
-		if (memcmp(p, "GenuineLguest", strlen("GenuineLguest")) == 0)
-			return to_guest_phys(p + strlen("GenuineLguest"));
-
-	errx(1, "Is this image a genuine lguest?");
-}
-
 /* This routine is used to load the kernel or initrd.  It tries mmap, but if
  * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
  * it falls back to reading the memory in. */
@@ -303,7 +286,6 @@ static void map_at(int fd, void *addr, u
  * We return the starting address. */
 static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
 {
-	void *start = (void *)-1, *end = NULL;
 	Elf32_Phdr phdr[ehdr->e_phnum];
 	unsigned int i;
 
@@ ...
From: Rusty Russell
Date: Tuesday, October 2, 2007 - 4:40 pm

Now arch/i386/boot/compressed/head.S understands the hardware_platform field,
we can directly execute bzImages.  No more horrific unpacking code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 Documentation/lguest/lguest.c    |   97 ++++++++++++--------------------------
 arch/i386/boot/compressed/head.S |    6 ++
 drivers/lguest/lguest.c          |    5 +
 3 files changed, 42 insertions(+), 66 deletions(-)

diff -r b0480fd71a72 Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Tue Oct 02 22:28:13 2007 +1000
+++ b/Documentation/lguest/lguest.c	Tue Oct 02 22:52:07 2007 +1000
@@ -326,74 +326,39 @@ static unsigned long map_elf(int elf_fd,
 	return ehdr->e_entry;
 }
 
-/*L:160 Unfortunately the entire ELF image isn't compressed: the segments
- * which need loading are extracted and compressed raw.  This denies us the
- * information we need to make a fully-general loader. */
-static unsigned long unpack_bzimage(int fd)
-{
-	gzFile f;
-	int ret, len = 0;
-	/* A bzImage always gets loaded at physical address 1M.  This is
-	 * actually configurable as CONFIG_PHYSICAL_START, but as the comment
-	 * there says, "Don't change this unless you know what you are doing".
-	 * Indeed. */
-	void *img = from_guest_phys(0x100000);
-
-	/* gzdopen takes our file descriptor (carefully placed at the start of
-	 * the GZIP header we found) and returns a gzFile. */
-	f = gzdopen(fd, "rb");
-	/* We read it into memory in 64k chunks until we hit the end. */
-	while ((ret = gzread(f, img + len, 65536)) > 0)
-		len += ret;
-	if (ret < 0)
-		err(1, "reading image from bzImage");
-
-	verbose("Unpacked size %i addr %p\n", len, img);
-
-	/* The entry point for a bzImage is always the first byte */
-	return (unsigned long)img;
-}
-
 /*L:150 A bzImage, unlike an ELF file, is not meant to be loaded.  You're
- * supposed to jump into it and it will unpack itself.  We can't do that
- * because the Guest can't run the unpacking code, and adding features to
- * lguest ...
From: Chris Malley
Date: Wednesday, October 3, 2007 - 2:37 am

Hi guys

Would it not be clearer to #include <asm/bootparam.h> and use 
the relevant named members of struct setup_header / struct boot_params
rather than the hard-coded values 0x202, 0x1F1, 0x214 ?

--
Chris


On Wed, 2007-10-03 at 09:40 +1000, Rusty Russell wrote:


-

From: H. Peter Anvin
Date: Wednesday, October 3, 2007 - 10:12 am

Yes, please.  I have a patch in -mm which already does that across the 
kernel.

	-hpa
-

From: Rusty Russell
Date: Wednesday, October 3, 2007 - 5:02 pm

Yes, but unfortunately bootparam.h wasn't designed to be included from
userspace.

The patch would look like this, but it makes me wonder if it'd be better
to put all these user-exposed types in bootparam.h and have the other
headers include them.  hpa?

diff -r 6bb527d113a8 include/asm-i386/Kbuild
--- a/include/asm-i386/Kbuild	Wed Oct 03 13:49:31 2007 +1000
+++ b/include/asm-i386/Kbuild	Thu Oct 04 09:53:08 2007 +1000
@@ -6,7 +6,10 @@ header-y += msr-index.h
 header-y += msr-index.h
 header-y += ptrace-abi.h
 header-y += ucontext.h
+header-y += bootparam.h
 
+unifdef-y += e820.h
+unifdef-y += ist.h
 unifdef-y += msr.h
 unifdef-y += mtrr.h
 unifdef-y += vm86.h
diff -r 6bb527d113a8 include/asm-i386/bootparam.h
--- a/include/asm-i386/bootparam.h	Wed Oct 03 13:49:31 2007 +1000
+++ b/include/asm-i386/bootparam.h	Thu Oct 04 09:45:12 2007 +1000
@@ -10,82 +10,82 @@
 #include <video/edid.h>
 
 struct setup_header {
-	u8	setup_sects;
-	u16	root_flags;
-	u32	syssize;
-	u16	ram_size;
-	u16	vid_mode;
-	u16	root_dev;
-	u16	boot_flag;
-	u16	jump;
-	u32	header;
-	u16	version;
-	u32	realmode_swtch;
-	u16	start_sys;
-	u16	kernel_version;
-	u8	type_of_loader;
-	u8	loadflags;
+	__u8	setup_sects;
+	__u16	root_flags;
+	__u32	syssize;
+	__u16	ram_size;
+	__u16	vid_mode;
+	__u16	root_dev;
+	__u16	boot_flag;
+	__u16	jump;
+	__u32	header;
+	__u16	version;
+	__u32	realmode_swtch;
+	__u16	start_sys;
+	__u16	kernel_version;
+	__u8	type_of_loader;
+	__u8	loadflags;
 #define LOADED_HIGH	(1<<0)
 #define KEEP_SEGMENTS	(1<<6)
 #define ...
From: H. Peter Anvin
Date: Wednesday, October 3, 2007 - 5:26 pm

I don't have a strong preference either way, but I think what you have 
here is fine.

	-hpa

-

From: Kjartan Maraas
Date: Saturday, November 24, 2007 - 2:54 pm

[snip]

This change seems to have broken build of the battstat applet in
gnome-applets or rather the included apmlib in there. Intended?


Cheers
Kjartan


-

From: H. Peter Anvin
Date: Saturday, November 24, 2007 - 3:14 pm

Perhaps you could actually give some detail how it broke the code?!

	-hpa
-

From: Kjartan Maraas
Date: Sunday, November 25, 2007 - 5:32 am

Sorry for being terse. It looks to me like apm_event_t is gone, and that
breaks this bit of code:

Making all in apmlib
make[1]: Entering directory
`/home/kmaraas/cvs/gnome/gnome-applets/battstat/apmlib'
gcc -DHAVE_CONFIG_H -I. -I../.. -I../.. -I../../apmlib -DORBIT2=1
-pthread -I/opt/gnome2/include/panel-2.0 -I/opt/gnome2/include/gtk-2.0
-I/opt/gnome2/include/libgnomeui-2.0
-I/opt/gnome2/include/libbonoboui-2.0 -I/opt/gnome2/lib/gtk-2.0/include
-I/opt/gnome2/include/atk-1.0 -I/opt/gnome2/include/cairo
-I/opt/gnome2/include/pango-1.0 -I/opt/gnome2/include/glib-2.0
-I/opt/gnome2/lib/glib-2.0/include -I/opt/gnome2/include/libgnome-2.0
-I/opt/gnome2/include/libgnomecanvas-2.0
-I/opt/gnome2/include/gnome-vfs-2.0
-I/opt/gnome2/lib/gnome-vfs-2.0/include
-I/opt/gnome2/include/libbonobo-2.0 -I/opt/gnome2/include/orbit-2.0
-I/opt/gnome2/include/bonobo-activation-2.0
-I/opt/gnome2/include/libart-2.0 -I/opt/gnome2/include/gconf/2
-DG_LOG_DOMAIN=\"battstat_applet\"  -Wall -g -O0 -D_FORTIFY_SOURCE=2
-Wall -g -O0 -D_FORTIFY_SOURCE=2 -MT apmlib.o -MD -MP
-MF .deps/apmlib.Tpo -c -o apmlib.o apmlib.c
In file included from apmlib.c:32:
apm.h:56: error: expected declaration specifiers or ‘...’ before
‘apm_event_t’
apm.h:63: error: expected ‘)’ before ‘event’
apmlib.c:243: error: expected declaration specifiers or ‘...’ before
‘apm_event_t’
apmlib.c: In function ‘apm_get_events’:
apmlib.c:257: error: ‘events’ undeclared (first use in this function)
apmlib.c:257: error: (Each undeclared identifier is reported only once
apmlib.c:257: error: for each function it appears in.)
apmlib.c:257: error: ‘apm_event_t’ undeclared (first use in this
function)
apmlib.c:258: warning: control reaches end of non-void function
apmlib.c: At top level:
apmlib.c:366: error: expected ‘)’ before ‘event’
make[1]: *** [apmlib.o] Error 1
make[1]: Leaving directory
`/home/kmaraas/cvs/gnome/gnome-applets/battstat/apmlib'
make: *** [all-recursive] Error 1

Cheers
Kjartan


-

From: rae l
Date: Monday, October 29, 2007 - 11:38 pm

Sugguestion is you also add an entry to the header of the file
(Documentation/i386/boot.txt):

something like this:

diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index fc49b79..645bbd7 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -42,6 +42,9 @@ Protocol 2.05:	(Kernel 2.6.20) Make protected mode
kernel relocatable.
 Protocol 2.06:	(Kernel 2.6.22) Added a field that contains the size of
 		the boot command line

+Protocol 2.07:	(Kernel 2.6.23) Added two fields hardware_subarch and
+		hardware_subarch_data to describe the paravirtualized
+		environment.

 **** MEMORY LAYOUT

-- 
Denis Cheng
Linux Application Developer

"One of my most productive days was throwing away 1000 lines of code."
 - Ken Thompson.
-

From: H. Peter Anvin
Date: Tuesday, October 2, 2007 - 4:44 pm

Acked-by: H. Peter Anvin <hpa@zytor.com>

	-hpa
-

From: Jeremy Fitzhardinge
Date: Tuesday, October 2, 2007 - 4:46 pm

Ah, good.  I was thinking about reviving this work.  The main problem is
that sticking an ELF header at the 1 meg mark (the address of the
bzImage "payload") breaks 32-bit bootloaders which think they can just
jump to 32-bit code there.  I started a conversation with Eric at KS
about it, but we didn't reach any conclusions.

This series looks like a good start for Xen, but we still need to work
out where to stash the metadata which normally lives in ELF notes.  
Using ELF is convenient for Xen because it lets a large chunk of domain
builder code be reused; on the other hand, loading a plain bzImage is
pretty simple, so maybe it isn't such a big deal.

HPA, Eric: if we don't go the "embed ELF" path, where's a good
backwards-compatible place to stash the note data?  If we do go with
"embed ELF", how should we go about doing it?  Arrange to put the ELF
headers before the 1M mark?

    J
-

From: H. Peter Anvin
Date: Tuesday, October 2, 2007 - 4:53 pm

This sounds like another good reason to do the ELF image as the 
postcompression image.  The interface to the embedded compression 
routine is then unchanged, and we get the "full vmlinux" with any notes 
that belongs there.

I'll try to get an implementation of that done -- it really shouldn't be 
very hard.

	-hpa

-

From: Jeremy Fitzhardinge
Date: Tuesday, October 2, 2007 - 4:56 pm

Please explain what you're proposing again, because my memory of your
plan from last time wouldn't help in this case.  Are you proposing that
the bzImage contains compressed data that its expecting the bootloader
to decompress?  Won't that completely break backwards compatibility?  If
we don't care about backwards compatibility with old bootloaders, then
it doesn't matter what we do one way or the other.

    J
-

From: H. Peter Anvin
Date: Tuesday, October 2, 2007 - 5:43 pm

No, not at all.

I'm proposing that the existing bzImage format be retained, but that the 
payload of the decompressor (already a gzip file) simply be vmlinux.gz 
-- i.e. a gzip compressed ELF file, notes and all.  A pointer in the 
header will point to the offset of the payload (this is new, obviously.)

The decompression stub is adjusted to expect an ELF image, instead of a 
raw binary.

Existing bootloaders (16- or 32-bit) simply load the bzImage the way 
they do now; new bootloaders have the option of accessing the vmlinux.gz 
directly if they either want to load it themselves or want to examine 
the notes.

	-hpa

-

From: H. Peter Anvin
Date: Tuesday, October 2, 2007 - 5:46 pm

Slight correction: it does, of course, break loaders which root through 
the bzImage for a gzip header and decode that themselves and place in 
memory.  These loaders are pretty broken, though; they can't deal with 
the fact that the physical address of the kernel is configurable, for 
one thing.

	-hpa
-

From: Jeremy Fitzhardinge
Date: Tuesday, October 2, 2007 - 5:58 pm

OK, but that has the same problem as making the payload an ELF file:
32-bit bootloaders which simply jump to 1M will be jumping into data
rather than code - and I got the impression from taking to Eric at KS
that there are such bootloaders.  

If that's not an issue, then I still think the payload should be a plain
ELF file (possibly self-decompressing, or just a plain uncompressed
vmlinux, if that's what's desired).  Still think making a protected-mode
bootloader do the decompression is the wrong way to go about this; ELF
is enough.

    J
-

From: H. Peter Anvin
Date: Tuesday, October 2, 2007 - 6:03 pm

It would be cleaner to actually parse the ELF; it's only a handful of 
lines of code (we don't have to support arbitrary placement of sections, 

Uhm, no it doesn't.  Those bootloaders jump to the decompressor, not to 

It doesn't have to if it doesn't want to; it only needs to do so if it 
wants to access the kernel as an ELF.  Again, it has the advantage that 
the ELF is the real vmlinux, no funnies.

	-hpa
-

Previous thread: [PATCH] Handle errors in sync_sb_inodes() by Guillaume Chazarain on Tuesday, October 2, 2007 - 3:57 pm. (2 messages)

Next thread: Kernel 2.4 vs 2.6 Traffic Controller performance by Sonny on Tuesday, October 2, 2007 - 5:05 pm. (2 messages)