Linux: Rewriting the x86 Setup Code

Submitted by Jeremy on July 15, 2007 - 2:50pm.
Linux news

H. Peter Anvin submitted a series of patches rewriting the x86 setup code: "this patch set replaces the x86 setup code, which is currently all in assembly, with a version written in C, using the '.code16gcc' feature of binutils (which has been present since at least 2001.)" He went on to explain why: "the new code is vastly easier to read, and, I hope, debug. It should be noted that I found a fair number of minor bugs while going through this code, and have attempted to correct them."

Linus Torvalds reacted favorably: "I can't really argue against this on any sane grounds - not only is it removing more lines than it adds, but moving from mostly unreadable assembly to C seems a good idea." He went on to ask, "so let's just get this merged. But the question is, do we put it in 2.6.23-rc1, or do we put it in -mm for a few weeks, which would imply waiting for the next merge window? Andrew?" Andrew Morton pointed out that the patches had already been in -mm for a couple of months: "this code has been in -mm since 11 May, as git-newsetup.patch. It has caused (for what it is) astonishingly few problems. Maybe a couple of build glitches and one runtime failure, all quickly fixed. I'd say it's ready." Linus agreed: "Ok. That makes it easy. I'll just merge it."


From:	H. Peter Anvin [email blocked]
Subject: x86 setup code rewrite in C - revised
Date:	Wed, 11 Jul 2007 12:18:25 -0700

This patch set replaces the x86 setup code, which is currently all in
assembly, with a version written in C, using the ".code16gcc" feature
of binutils (which has been present since at least 2001.)

The new code is vastly easier to read, and, I hope, debug.  It should
be noted that I found a fair number of minor bugs while going through
this code, and have attempted to correct them.

In the process of doing so, it introduces several cleanups, in
particular:

- Obsoletes the hd_info field in the boot_params structure; they are
  only ever used for ST-506 (pre-IDE) drives and are pretty much
  guaranteed to be wrong on current BIOSes;
- Unifies the CPU feature bits between i386 and x86-64.  In the
  future, it should be possible to use arch/i386/boot/cpucheck.c to do
  the post-invocation CPU check currently done in
  arch/x86_64/kernel/trampoline.S, although this patch set doesn't
  introduce that change.
- boot_params is now a proper structure.

This patchset incorporates all feedback received by 2007-07-11 12:00 PDT.

 arch/i386/boot/bootsect.S                     |   98 -
 arch/i386/boot/edd.S                          |  231 --
 arch/i386/boot/setup.S                        | 1075 -------------
 arch/i386/boot/video.S                        | 2043 --------------------------
 arch/i386/kernel/verify_cpu.S                 |   94 -
 arch/x86_64/boot/bootsect.S                   |   98 -
 arch/x86_64/boot/install.sh                   |    2 
 arch/x86_64/boot/mtools.conf.in               |   17 
 arch/x86_64/boot/setup.S                      |  826 ----------
 arch/x86_64/boot/tools/build.c                |  185 --
 b/Documentation/i386/zero-page.txt            |    1 
 b/MAINTAINERS                                 |    4 
 b/arch/i386/Kconfig.cpu                       |    6 
 b/arch/i386/boot/Makefile                     |   48 
 b/arch/i386/boot/a20.c                        |  161 ++
 b/arch/i386/boot/apm.c                        |   97 +
 b/arch/i386/boot/bitops.h                     |   45 
 b/arch/i386/boot/boot.h                       |  296 +++
 b/arch/i386/boot/cmdline.c                    |   97 +
 b/arch/i386/boot/code16gcc.h                  |   15 
 b/arch/i386/boot/compressed/Makefile          |    7 
 b/arch/i386/boot/compressed/head.S            |    6 
 b/arch/i386/boot/copy.S                       |  101 +
 b/arch/i386/boot/cpu.c                        |   69 
 b/arch/i386/boot/cpucheck.c                   |  267 +++
 b/arch/i386/boot/edd.c                        |  196 ++
 b/arch/i386/boot/header.S                     |  283 +++
 b/arch/i386/boot/main.c                       |  161 ++
 b/arch/i386/boot/mca.c                        |   43 
 b/arch/i386/boot/memory.c                     |   99 +
 b/arch/i386/boot/pm.c                         |  170 ++
 b/arch/i386/boot/pmjump.S                     |   54 
 b/arch/i386/boot/printf.c                     |  307 +++
 b/arch/i386/boot/setup.ld                     |   54 
 b/arch/i386/boot/string.c                     |   52 
 b/arch/i386/boot/tools/build.c                |  160 --
 b/arch/i386/boot/tty.c                        |  112 +
 b/arch/i386/boot/version.c                    |   23 
 b/arch/i386/boot/vesa.h                       |   79 +
 b/arch/i386/boot/video-bios.c                 |  125 +
 b/arch/i386/boot/video-vesa.c                 |  284 +++
 b/arch/i386/boot/video-vga.c                  |  260 +++
 b/arch/i386/boot/video.c                      |  456 +++++
 b/arch/i386/boot/video.h                      |  145 +
 b/arch/i386/boot/voyager.c                    |   46 
 b/arch/i386/kernel/cpu/Makefile               |    2 
 b/arch/i386/kernel/cpu/addon_cpuid_features.c |   50 
 b/arch/i386/kernel/cpu/common.c               |    2 
 b/arch/i386/kernel/cpu/proc.c                 |   21 
 b/arch/i386/kernel/e820.c                     |    2 
 b/arch/i386/kernel/setup.c                    |   12 
 b/arch/x86_64/Kconfig                         |    4 
 b/arch/x86_64/boot/Makefile                   |  136 -
 b/arch/x86_64/boot/compressed/Makefile        |    9 
 b/arch/x86_64/boot/compressed/head.S          |    6 
 b/arch/x86_64/kernel/Makefile                 |    2 
 b/arch/x86_64/kernel/setup.c                  |   21 
 b/arch/x86_64/kernel/verify_cpu.S             |   22 
 b/drivers/ide/legacy/hd.c                     |   73 
 b/include/asm-i386/boot.h                     |    6 
 b/include/asm-i386/bootparam.h                |   85 +
 b/include/asm-i386/cpufeature.h               |   26 
 b/include/asm-i386/e820.h                     |   14 
 b/include/asm-i386/processor.h                |    1 
 b/include/asm-i386/required-features.h        |   39 
 b/include/asm-i386/setup.h                    |   10 
 b/include/asm-x86_64/alternative.h            |   68 
 b/include/asm-x86_64/boot.h                   |   16 
 b/include/asm-x86_64/bootparam.h              |    1 
 b/include/asm-x86_64/cpufeature.h             |  115 -
 b/include/asm-x86_64/e820.h                   |    4 
 b/include/asm-x86_64/processor.h              |    3 
 b/include/asm-x86_64/required-features.h      |   46 
 b/include/asm-x86_64/segment.h                |    8 
 b/include/linux/edd.h                         |    4 
 b/include/linux/screen_info.h                 |    9 
 76 files changed, 4606 insertions(+), 5209 deletions(-)


From:	"H. Peter Anvin" [email blocked]
Cc:	"H. Peter Anvin" [email blocked]
Subject: [x86 setup 01/33] x86 setup: MAINTAINERS: formally take responsibility for the i386 boot code
Date:	Wed, 11 Jul 2007 12:18:26 -0700

From: H. Peter Anvin [email blocked]

Change MAINTAINERS to formally take responsibility for the i386 boot code.

Signed-off-by: H. Peter Anvin [email blocked]
---
 MAINTAINERS |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index df40a4e..7f92ce2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1750,8 +1750,8 @@ T: http://www.harbaum.org/till/i2c_tiny_usb
 S: Maintained
 
 i386 BOOT CODE
-P: Riley H. Williams
-M: [email blocked]
+P: H. Peter Anvin
+M: [email blocked]
 L: [email blocked]
 S: Maintained
-- 
1.5.2.2

From:	Linus Torvalds [email blocked]
To:	"H. Peter Anvin" [email blocked]
Subject: Re: x86 setup code rewrite in C - revised
Date:	Thu, 12 Jul 2007 10:24:48 -0700 (PDT)

On Wed, 11 Jul 2007, H. Peter Anvin wrote:
>
> This patch set replaces the x86 setup code, which is currently all in
> assembly, with a version written in C, using the ".code16gcc" feature
> of binutils (which has been present since at least 2001.)
>
>  76 files changed, 4606 insertions(+), 5209 deletions(-)

I can't really argue against this on any sane grounds - not only is it
removing more lines than it adds, but moving from mostly unreadable
assembly to C seems a good idea.

How does this impact the size of that code? Do we even care?

But as to how to integrate it, I'm not sure I really want to just merge
it. I suspect we would want to have it in some public tree that people
actually test at least to some degree first, and the -mm tree seems to
make most sense.

I didn't see anything objectionable in the series, although I do think the
explanations need to be re-done for a number of them. You seem to have
violated the "a single line to explain the patch at the top" rule, and as
a result they make no sense for some of them (the explanation for patch
05/33 doesn't parse for me and 07/33 seems to have the single-line
problem)

So let's just get this merged. But the question is, do we put it in
2.6.23-rc1, or do we put it in -mm for a few weeks, which would imply
waiting for the next merge window? Andrew?

		Linus

From:	Andrew Morton [email blocked]
To:	Linus Torvalds [email blocked]
Subject: Re: x86 setup code rewrite in C - revised
Date:	Thu, 12 Jul 2007 10:30:56 -0700

On Thu, 12 Jul 2007 10:24:48 -0700 (PDT) Linus Torvalds [email blocked] wrote:

>
>
> On Wed, 11 Jul 2007, H. Peter Anvin wrote:
> >
> > This patch set replaces the x86 setup code, which is currently all in
> > assembly, with a version written in C, using the ".code16gcc" feature
> > of binutils (which has been present since at least 2001.)
> >
> >  76 files changed, 4606 insertions(+), 5209 deletions(-)
>
> I can't really argue against this on any sane grounds - not only is it
> removing more lines than it adds, but moving from mostly unreadable
> assembly to C seems a good idea.
>
> How does this impact the size of that code? Do we even care?
>
> But as to how to integrate it, I'm not sure I really want to just merge
> it. I suspect we would want to have it in some public tree that people
> actually test at least to some degree first, and the -mm tree seems to
> make most sense.
>
> I didn't see anything objectionable in the series, although I do think the
> explanations need to be re-done for a number of them. You seem to have
> violated the "a single line to explain the patch at the top" rule, and as
> a result they make no sense for some of them (the explanation for patch
> 05/33 doesn't parse for me and 07/33 seems to have the single-line
> problem)
>
> So let's just get this merged. But the question is, do we put it in
> 2.6.23-rc1, or do we put it in -mm for a few weeks, which would imply
> waiting for the next merge window? Andrew?
>

This code has been in -mm since 11 May, as git-newsetup.patch. It has
caused (for what it is) astonishingly few problems. Maybe a couple of
build glitches and one runtime failure, all quickly fixed.

I'd say it's ready.

From:	Linus Torvalds [email blocked]
To:	Andrew Morton [email blocked]
Subject: Re: x86 setup code rewrite in C - revised
Date:	Thu, 12 Jul 2007 10:49:53 -0700 (PDT)

On Thu, 12 Jul 2007, Andrew Morton wrote:
>
> This code has been in -mm since 11 May, as git-newsetup.patch. It has
> caused (for what it is) astonishingly few problems. Maybe a couple of
> build glitches and one runtime failure, all quickly fixed.
>
> I'd say it's ready.

Ok. That makes it easy. I'll just merge it.

		Linus



What's it for?

July 15, 2007 - 8:19pm
Anonymous (not verified)

And what exactly does this x86 setup thing do?

x86 setup

July 15, 2007 - 8:48pm
fraggle (not verified)

This is the low level code that is run at the very start of OS boot. Currently, the code for x86 (PC/Intel) startup is written in assembler, which is hard to read. This is a rewrite that does it in C instead. C code is easier to read and debug, which is why the majority of Linux is written in it. Previously, this startup code had to be written in assembler because it runs in the processor's 16-bit mode. There is now gcc support for generating 16-bit code, so it can all now be done in C.

Assembly is the language.

July 16, 2007 - 1:28am
Anonymous (not verified)

Assembly is the language. Assembler is the application that translates it to machine code. But good explanation otherwise :)

Nitpicking

July 16, 2007 - 3:49am

In all my years coding, I've seen the phrases "in assembly language" and "in assembler" used to refer to the same stuff: Code meant to be assembled by an assembler. In fact, Dictionary.com lists assembler as a synonym for assembly language. (Take a look at definition 2b.)

Anyway, all that said, the ".code16gcc" directive does not instruct GCC to generate 16-bit code. GCC will still generate 32-bit code. The directive instructs the assembler (GAS) to put override prefixes on instructions that refer to 32-bit registers, because the code will run in a segment whose default argument width is 16.

See, on x86, "INC AX" and "INC EAX" both have the same opcode, but one of the two gets a prefix. Which one? The answer is: "It depends." If the code runs from a 16-bit segment, then the argument-width override prefix gives you the latter form. If you're running from a 32-bit segment, the prefix gives you the former form.

Fun fact: x86-64 reuses this mechanism to select between 32-bit and 64-bit argument widths. Once 64-bit support is enabled, though, one can only select between 32-bit and 64-bit segments. This is how 32-bit code can run under x86-64, and how you lose the ability to run 16-bit mode code (and thus things like Wine, which expect the old 16/32 behavior) on 64-bit Linux.

--
Program Intellivision and play Space Patrol!

An example...

July 16, 2007 - 3:29pm

If you'd like to prove to yourself the "INC EAX" example, do the following. Put this in a file named inc1.s:

    .text
    inc %eax
    inc %ax

Put this in a file named inc2.s:

    .text
    .code16gcc
    inc %ax
    inc %eax

Now compile (or, rather, assemble) them with:

gcc -c inc1.s
gcc -c inc2.s

If you do a byte-wise comparison of the two output files, inc1.o and inc2.o, you'll find they're identical. At least, they are on this machine, which is a RHEL 4 box.
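
A rough way to see why the two files come out the same is to model the one encoding rule involved. The sketch below is a toy, not how GAS works internally: it assumes a 32-bit (i386) assembler target, handles only the one-byte "inc" form (opcode 0x40 plus the register number), and the names encode_inc and assemble are made up for illustration.

```python
# Toy model of how an assembler encodes the one-byte "inc reg" form.
# Assumption: non-64-bit x86, where opcode 0x40+r means INC r16/r32 and
# 0x66 is the operand-size override prefix.

OPERAND_SIZE_PREFIX = 0x66
INC_BASE_OPCODE = 0x40  # 0x40 + register number; AX/EAX is register 0

# register name -> (register number, operand width in bits)
REGISTERS = {"ax": (0, 16), "eax": (0, 32)}


def encode_inc(reg, default_bits):
    """Encode "inc %reg" for a segment whose default operand width is default_bits."""
    number, width = REGISTERS[reg]
    opcode = bytes([INC_BASE_OPCODE + number])
    if width == default_bits:
        return opcode  # operand matches the segment default: no prefix
    return bytes([OPERAND_SIZE_PREFIX]) + opcode  # other width: prefixed


def assemble(regs, default_bits):
    """Concatenate the encodings of a sequence of inc instructions."""
    return b"".join(encode_inc(reg, default_bits) for reg in regs)


inc1 = assemble(["eax", "ax"], default_bits=32)  # inc1.s: plain 32-bit code
inc2 = assemble(["ax", "eax"], default_bits=16)  # inc2.s: after .code16gcc

print(inc1.hex(), inc2.hex())
print("identical:", inc1 == inc2)
```

Both sequences come out as the bytes 40 66 40 in this model, which matches the byte-wise comparison of the object files described above.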

--
Program Intellivision and play Space Patrol!

It's not a question of the

July 17, 2007 - 12:06am
Anonymous (not verified)

It's not a question of the object files being equal but rather of what happens when the code is executed. It's the same machine code, but it will have the semantics of the first snippet in a 32-bit code segment and the semantics of the second in a 16-bit code segment.
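
To make that concrete, here is a small sketch of how the same bytes read differently depending on the code segment's default operand width. It is a toy decoder that understands only the one-byte "inc" of register 0 (opcode 0x40) and the 0x66 operand-size prefix; the function name decode is invented for the example, and 64-bit mode is out of scope.

```python
def decode(code, default_bits):
    """List the inc instructions a CPU would see in a segment of the given default width."""
    if default_bits not in (16, 32):
        raise ValueError("default_bits must be 16 or 32")
    other_bits = 16 if default_bits == 32 else 32
    names = {16: "inc %ax", 32: "inc %eax"}

    instructions = []
    bits = default_bits
    for byte in code:
        if byte == 0x66:
            # Operand-size override: the next instruction uses the other width.
            bits = other_bits
        elif byte == 0x40:
            instructions.append(names[bits])
            bits = default_bits  # the prefix applies to one instruction only
        else:
            raise ValueError(f"byte {byte:#04x} is outside this toy decoder")
    return instructions


machine_code = bytes([0x40, 0x66, 0x40])

print(decode(machine_code, default_bits=32))  # read as 32-bit code
print(decode(machine_code, default_bits=16))  # read as 16-bit code
```

Read in a 32-bit segment the bytes are "inc %eax" then "inc %ax"; read in a 16-bit segment they are "inc %ax" then "inc %eax".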

We're talking past each other

July 17, 2007 - 12:48am

My point is that "INC EAX" in a 32-bit segment has precisely the same encoding as "INC AX" in a 16-bit segment; in particular, neither carries an operand-width override. GCC expects to run in a 32-bit segment and generates 32-bit code. The ".code16gcc" directive tells the assembler, "This is 32-bit code, but it will run in a 16-bit segment. Fix it up, please."

That's rather different from telling the compiler to produce 16-bit code (regardless of the default operand width in the segment the code ends up running in).

--
Program Intellivision and play Space Patrol!

German is easier (as

July 16, 2007 - 5:27pm
Anonymous (not verified)

German is easier (as usual):
The language is called assembler and the program to assemble assembler into machine code is called assembler too. Simple. ;)

who initiates this x86 code

July 16, 2007 - 7:58am
hscripts (not verified)

I am too new to this. Just wanted to know who initiates this x86 code?

Not sure which question you're asking

July 16, 2007 - 3:20pm

If you're asking who initially wrote this code, much of it is code that dates back to the earliest Linux kernels. It's obviously been updated multiple times over the years, but it's ancient code.

If you're asking what invokes this code, it's the code that gets run shortly after the bootloader has loaded the kernel image. This puts the core of the machine in a particular state, and then it jumps to the rest of the loader which unpacks the kernel and starts booting. I believe at the time this code's run, the CPU is still in 16-bit x86 real mode, although I could very well be mistaken. At least at the time this page was written, it describes setup.S as: "arrang[ing] the transition from the processor running in real mode (DOS mode) to protected mode (full 32-bit mode)."

I consider myself quite lucky to NOT need to know these details, but I do find it somewhat fascinating.

--
Program Intellivision and play Space Patrol!

The thing you write about

July 17, 2007 - 3:10am
Jezze (not verified)

The thing you write about setup.S is how I understand it as well. But why can't you instruct GRUB to set up protected mode and then load the Linux kernel? I guess it has something to do with the GDT (Global Descriptor Table)?

And if GRUB isn't the boot loader ...

July 17, 2007 - 12:08pm
Anonymous (not verified)

... what then? I don't think that relying on the bootloader is a good idea.

Legacy

July 16, 2007 - 9:44am
Anonymous (not verified)

It sucks that x86 is so legacy and that kernels and bootloaders are so full of "hacks" and workarounds.

Legacy - what it is

July 16, 2007 - 12:23pm
Anonymous (not verified)

A legacy is a legacy. It's cool that x86 became increasingly cheap so that
computers were sufficiently widespread to drive the networking infrastructure
that enabled projects like Linux to succeed. The corporations which drove this
were trying to guzzle from the money fountain of personal computing. From this
foundation a somewhat cleaner operating system mostly now based on a co-operative
meritocracy (whose merit scale is suggested by a few and promulgated by many) is
evolving. It would have been cooler if it had happened with (technically excellent
chip of choice), but this is our legacy.

Well, I guess you *could*

July 17, 2007 - 9:44pm
Anonymous (not verified)

Well, I guess you *could* save up for a power5 or an itanium 2 based machine... legacy free... and damned expensive :-/

Legacy

July 18, 2007 - 1:24am
Anonymous (not verified)

A legacy is a legacy. Bug Intel for building the chips this way. The x86 real mode and the 386 protected mode are their designs (it's hardware, not software). The BIOS looks for bootable partitions. It finds the first one it sees, loads it in low memory (64K), and starts it running. Usually that's a bootloader. The bootloader lets you select an operating system (e.g. the Linux kernel). This is all in x86 "real" mode. Real mode only lets you have 640K of memory.

Now say the bootloader picks Linux. The Linux kernel is bigger than 640K, so you have to use long jumps and move it into memory areas above what the current system can see. That is, load it into the low (640K) area, then blindly move it to somewhere the current system can't see it (keeping track of the blind move); remember, it is loading and moving in blocks it can handle. Then, as a last step after the bootloader has loaded and moved everything, put the system into 386 protected mode and give the new starting address of your Linux kernel (in the previously unseeable memory location) to the CPU program counter, essentially starting your new kernel.

The first thing to do with your new x86 kernel: move the whole kernel back to the low memory area and get some of that 640K back (in dmesg it says "xxx kilobytes freed"). Linux then tests for hardware, installs drivers, and starts init and then the user-space initialization routines. Some of that loading and shuffling around of the kernel is done by this code.
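
The memory limits mentioned above follow from how real mode forms addresses: a 16-bit segment is shifted left by four bits and added to a 16-bit offset. The sketch below only illustrates that arithmetic; the helper name real_mode_address is made up, and the 640K figure is taken as the end of conventional memory on a PC.

```python
def real_mode_address(segment, offset):
    """Linear address formed from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


CONVENTIONAL_MEMORY_END = 640 * 1024  # the "640K" of ordinary RAM below video memory
highest = real_mode_address(0xFFFF, 0xFFFF)  # the most real mode can ever address

print(hex(real_mode_address(0x07C0, 0x0000)))  # where a PC BIOS places a boot sector
print(hex(CONVENTIONAL_MEMORY_END))
print(hex(highest))
```

The highest reachable address is only a little past 1 MiB, so anything loaded above that has to be placed there by other means and entered after the switch out of real mode.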
