Hi Linus; please pull:
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git for-linus
H. Peter Anvin (2):
x86 setup: add a near jump to serialize %cr0 on 386/486
x86 setup: set %ebx == %ebp == %edi == 0 on protected mode entry
arch/x86/boot/pmjump.S | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
commit 142a92e61f9c405a114cb2bfaf3ce3f537a48a89
Author: H. Peter Anvin <hpa@zytor.com>
Date: Sun Nov 4 17:54:31 2007 -0800
x86 setup: set %ebx == %ebp == %edi == 0 on protected mode entry
In accordance with the newly formalized 32-bit boot protocol, set
%ebx == %ebp == %edi == 0 in order to support future extensions to the
protocol.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 18732f7..0d24e96 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump.S
@@ -28,13 +28,15 @@
* void protected_mode_jump(u32 entrypoint, u32 bootparams);
*/
protected_mode_jump:
- xorl %ebx, %ebx # Flag to indicate this is a boot
movl %edx, %esi # Pointer to boot_params table
movl %eax, 3f # Patch ljmpl instruction
jmp 1f # Short jump to flush instruction q.
1:
movw $__BOOT_DS, %cx
+ xorl %ebx, %ebx # Per the 32-bit boot protocol
+ xorl %ebp, %ebp # Per the 32-bit boot protocol
+ xorl %edi, %edi # Per the 32-bit boot protocol
movl %cr0, %edx
orb $1, %dl # Protected mode (PE) bit
commit ad676d0fdf2e59ccc28ee9f6f9593ff14a3d8a5a
Author: H. Peter Anvin <hpa@zytor.com>
Date: Sun Nov 4 17:50:12 2007 -0800
x86 setup: add a near jump to serialize %cr0 on 386/486
The 386 and 486 needs a jump immediately after setting %cr0 in order
to serialize the pipeline.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 2e55923..18732f7 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump...

Just for the record, I realized this patch could be done slightly
more cleanly, and cleaned it up accordingly.
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git for-linus
H. Peter Anvin (2):
x86 setup: add a near jump to serialize %cr0 on 386/486
x86 setup: set %ebx == %ebp == %edi == 0 on protected mode entry
arch/x86/boot/pmjump.S | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
commit 9f259cc59ba45b8db401d60be9700e275676fb15
Author: H. Peter Anvin <hpa@zytor.com>
Date: Sun Nov 4 17:54:31 2007 -0800
x86 setup: set %ebx == %ebp == %edi == 0 on protected mode entry
In accordance with the newly formalized 32-bit boot protocol, set
%ebx == %ebp == %edi == 0 in order to support future extensions to the
protocol.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 26baeab..fa6bed1 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump.S
@@ -28,11 +28,13 @@
* void protected_mode_jump(u32 entrypoint, u32 bootparams);
*/
protected_mode_jump:
- xorl %ebx, %ebx # Flag to indicate this is a boot
movl %edx, %esi # Pointer to boot_params table
movl %eax, 2f # Patch ljmpl instruction
movw $__BOOT_DS, %cx
+ xorl %ebx, %ebx # Per the 32-bit boot protocol
+ xorl %ebp, %ebp # Per the 32-bit boot protocol
+ xorl %edi, %edi # Per the 32-bit boot protocol
movl %cr0, %edx
orb $1, %dl # Protected mode (PE) bit
commit 7ed192906a2144ebc8ca2925a85d27b9c5355668
Author: H. Peter Anvin <hpa@zytor.com>
Date: Sun Nov 4 17:50:12 2007 -0800
x86 setup: add a near jump to serialize %cr0 on 386/486
The 386 and 486 needs a jump immediately after setting %cr0 in order
to serialize the pipeline.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 2e55923..26baeab 100644
--- a/arch/x86/boot/pmjump.S
+...

Ok, I'm obviously happier, but I have to admit that the original code was safer than the new code. It did both the short jump and the far jump before reloading any segments.

So I suspect the new code _works_ fine, but it's simply not as fundamentally safe as the old code was. The old code did do some instructions in between the short jump and the far jump, but they were all the kind of instructions that didn't care about the PE bit: there was a _read_ of the segment descriptor value, but that's mode-independent (it's only the writes that matter), and the other instructions were bog-standard integer instructions.

So I would actually prefer some additional safety, with something like the appended..

This is TOTALLY UNTESTED! I checked with objdump that the result looks roughly ok, but I didn't really think through the segment base address in that long jump thing. Do we have the difference between flat mode and the 16-bit bootup mode in some better way?

Hmm?

Linus
--
arch/x86/boot/pmjump.S | 25 +++++++++++++++++--------
1 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index fa6bed1..587dc04 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump.S
@@ -29,7 +29,11 @@
*/
protected_mode_jump:
movl %edx, %esi # Pointer to boot_params table
- movl %eax, 2f # Patch ljmpl instruction
+
+ xorl %ecx, %ecx # add data segment offset to
+ movw %ds, %cx # the "in_32_bit_mode" thing.
+ shll $4, %ecx
+ addl %ecx, 2f
movw $__BOOT_DS, %cx
xorl %ebx, %ebx # Per the 32-bit boot protocol
@@ -42,15 +46,20 @@ protected_mode_jump:
jmp 1f # Short jump to serialize on 386/486
1:
- movw %cx, %ds
- movw %cx, %es
- movw %cx, %fs
- movw %cx, %gs
- movw %cx, %ss
-
# Jump to the 32-bit entrypoint
.byte 0x66, 0xea # ljmpl opcode
-2: .long 0 # offset
+2: .long in_32_bit_mode # offset
.word __BOOT_CS # segment
.size protected_mode_jump, .-protected_m...
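For readers following the "segment base address" question: the shll $4 / addl pair in the patch above turns a real-mode segment:offset pair into a flat linear address, which is what the ljmpl offset has to be once %cs has a base of 0. A minimal C sketch of that arithmetic follows; the function name real_mode_linear is made up for illustration and is not anything in the kernel tree.

```c
#include <stdint.h>

/*
 * Real-mode address translation: linear = segment * 16 + offset.
 * This is the same computation as "shll $4, %ecx; addl %ecx, 2f"
 * in the patch, with %ds as the segment and the link-time address
 * of the 32-bit label as the offset.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t real_mode_linear(uint16_t seg, uint32_t offset)
{
    return ((uint32_t)seg << 4) + offset;
}
```

For example, code running with %ds == 0x9000 and a label at offset 0x1234 would need 0x91234 as the flat jump target.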
Well, we *could* do a 16-bit PM segment (and do two far jumps), but that seems rather silly. We'd have to patch the GDT for the base in that case, anyway.

This is more or less the same code I had for the first version of the patch, modulo moving the short jump of course. I do like making the 32-bit code a separate function, but it really should be "movl %ecx,..." in the 32-bit code.

I have to admit I agree with Eric that this is probably overkill, but hey, there is nothing like a bit of overkill to make sure something is really and truly dead.

Cooking up a tree now.

-hpa
Yeah, there is no point in having two far jumps. One is enough.

The point being that since apparently the new boot standards say that the 32-bit code is entered with segments etc set to specific values, then we shouldn't do the jump to that 32-bit standard with a far jump: we should do it as a regular jump, because we'd want to set up the segments etc.

At least my assembler does the right thing with just the plain "mov" for segments, but yes, there may be old assemblers that add a useless data size override. So "movl %ecx,%*s" is probably the right thing to do to make sure they don't do anything stupid..

Btw, on that same kind of thread: I think we should move the clearing of the registers into the 32-bit mode too, since that makes the instructions shorter (no operand size override), and makes more sense anyway (then we can also clean %edx/%ecx).

Final comment: shouldn't we set up %esp to be correct for the new %ss too?

Linus
Well, the 32-bit code needs to set up its own stack, and only it knows where it wants its stack; we don't guarantee that the stack is valid when we enter the 32-bit code and we're entering with both INT and NMI disabled (requiring a stack would probably break all existing users of the 32-bit entrypoint.)

However, that being said, doing so is trivial, and it might help some debugging hack; anything that makes debugging easier is a Good Thing[TM].

-hpa
I agree. But it would be nice if some basic instructions still worked: as is, you cannot even do things like reloading %eflags, because the only way

Yeah. Even if it was just re-using the boot-time stack area temporarily, just to give code the choice to use a common set of instructions.

Linus
If I had to do it from scratch today I would make the 32-bit entry point require a stack, segments and use C calling conventions to pass struct boot_params *.

Besides %esi I'm not really fond of requiring anything in the 32bit entrypoint. At the same time I totally agree that it is always nice to provide way more than you need.

Eric
Nailing down the interface as hard as possible is a good idea, to avoid tying your hands for the future. -hpa -
I'm just saying be liberal in what you accept and conservative in what you send. Making the entire process well defined is useful so things don't break unnecessarily, and the maintainers of the pieces of software that use the interface know what they can reliably get away with and what is just luck.

Currently using the 32-bit entry point reliably requires:
  %cs to be set.
  %esi to be set.
  %ebx be set to 0.
  %gdt to be set and have:
    0x10 a 32bit 4G code segment with base of 0
    0x18 a 32bit 4G data segment with base of 0

With the latest generation of the boot protocol if KEEP_SEGMENTS is set then it is only required that the data segments %ds, %es, %fs, %gs and %ss be initialized to a valid value.

I have no problem with code providing more than what is required above, and in fact I think it is likely a good thing. For future expansion of the protocol things will go easiest if we don't add additional requirements to the list above, as that is all that I think all current boot loaders provide.

Anyway this is getting off topic. So far the changes to pmjump.S look to be going well.

Eric
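To make the two GDT requirements above concrete, here is a small C sketch that packs a flat 4G code descriptor and a flat 4G data descriptor using the standard x86 descriptor bit layout. The function names and the flag values 0xc09b / 0xc093 (32-bit, 4K granularity, present, code or data) are illustrative choices for this sketch, not a quote of what any particular boot loader does.

```c
#include <stdint.h>

/*
 * Pack an x86 segment descriptor into its 64-bit in-memory form.
 * flags carries the access byte in bits 0-7 and the granularity/size
 * nibble in bits 12-15. (Hypothetical helper, for illustration.)
 */
uint64_t gdt_entry(uint16_t flags, uint32_t base, uint32_t limit)
{
    return (((uint64_t)base  & 0xff000000ULL) << 32) |  /* base 31:24  */
           (((uint64_t)flags & 0x0000f0ffULL) << 40) |  /* access + G/D */
           (((uint64_t)limit & 0x000f0000ULL) << 32) |  /* limit 19:16 */
           (((uint64_t)base  & 0x00ffffffULL) << 16) |  /* base 23:0   */
            ((uint64_t)limit & 0x0000ffffULL);          /* limit 15:0  */
}

/* A selector's GDT slot is the selector value divided by 8. */
unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}
```

With these, selector 0x10 is slot 2 and selector 0x18 is slot 3; gdt_entry(0xc09b, 0, 0xfffff) and gdt_entry(0xc093, 0, 0xfffff) give the flat code and data descriptors respectively.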
Actually, I suspect the current code will handle %ebx with any value,

Specifying now that unused GPRs should be zeroed will allow for changes if and when we need it. It's an easy requirement to fulfill, so boot loader authors can put it through the pipe now. Then, if we find

Thanks. I just pushed two more patches to the git tree; one to do the paranoia thing, and one to initialize LDTR and TR; the latter is for the benefit of Intel VT and is not required for correctness, but it should be able to speed up booting slightly on VT-based hardware.

See:
http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-x86setup.git;a=log;h=for-linus

-hpa
Correct. So a bootloader must set %ebx to zero to handle those older kernels. Sure it is reasonable to ask for. The bootloader pipe is awfully long though. But putting it in at the time we clean up the rules for the 32bit entry point is the best chance we are going to get to be able to change things. Eric -
Erm, I guess I see what you mean, but it comes to the effect of tying
your hands now in a specific way, rather than having them tied in an
unknown way later on...
But I hadn't noticed the 32-bit boot protocol spec go in. Unfortunately
it isn't useful for booting a pv Xen guest; I just mailed my comments.
I hope we can iterate this to something more generally useful before
getting too wedded to the current protocol.
J
I'm not so sure about that. Xen PV is rather fundamentally a different

This is addressed by the "don't reload segments" bit in LOADFLAGS.

-hpa
OK.
J
Yes; specifically, boot_params.hdr.hardware_subarch == 0 (as opposed to compile-time subarchitectures, like Voyager, which still boots the same way as far as I know.) It would definitely be good to document what other values in this field

Specifically, with this bit set the decompression code won't touch the segment registers at all, and it's up to the caller to have all code and data segments set up with suitable descriptors. The kernel will still try to install its own GDT when the kernel proper starts; this becomes a hardware_subarch issue.

-hpa
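As a rough illustration of the "don't reload segments" behaviour described above, the check amounts to testing one bit in the loadflags byte. The sketch below assumes the bit is KEEP_SEGMENTS at bit 6, as given in the x86 boot protocol documentation; the function name is made up, and the real decompressor does this test in assembly rather than C.

```c
#include <stdint.h>

/* Assumed from the boot protocol documentation: loadflags bit 6. */
#define KEEP_SEGMENTS (1u << 6)

/*
 * Returns 1 if the 32-bit entry code should load its own segment
 * registers, 0 if the caller asked for them to be left alone.
 * (Hypothetical helper, for illustration only.)
 */
int must_reload_segments(uint8_t loadflags)
{
    return (loadflags & KEEP_SEGMENTS) == 0;
}
```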
Yes, though it would be nice to use this mechanism to deal with voyager
booting too, so that it can be normalized as a pvop backend rather than
Yes, the "setup proper kernel gdt" is part of the hwsubarch-specific
startup code.
Another thing it would be nice to add is an elf-note-like notion so that
the kernel can export arbitrary key/value data to the bootloader (ie,
the converse of the bootloader->kernel value list). Xen currently does
this via ELF notes, but any semantically equivalent mechanism would do.
It's probably simpler than trying to work out how to mush bzimage and
ELF together.
J
I suspect all we need is an offset-pointer field pointing into the kernel image. As far as the kernel build process is concerned, it becomes a section in the boot/compressed link script. That offset then needs to get exported to the setup.elf link stage and there adjusted to become a file offset.

The ELF note format is sane enough, although it looks like it's not self-terminating, so we'd either need an offset and a length field, or adopt the convention that namesz = descsz = type = 0 terminates the block (I prefer the latter, myself.)

We also need the notes documented, obviously.

-hpa
Hm, I think offset+length would be better: it's how they're represented
in a normal ELF file, so you can just extract the length if you're
extracting the notes. Also, generating a terminating note with the
current linker-based notes machinery would be a bit of a pain.
J
.notes : {
	*(.note.*)
	. = ALIGN(4);
	LONG(0);
	LONG(0);
	LONG(0);
}
Am I missing something?
I don't think adding a length is any harder. The all zero note is reserved so using it this way should be ok.

Regardless this sounds like a sane thing to be looking at.

Eric
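For comparison, a boot loader could support both conventions discussed above (an explicit length, or an all-zero terminating note) with one loop. The sketch below assumes the usual ELF note layout of three 32-bit words (namesz, descsz, type) followed by the name and descriptor, each padded to 4 bytes, in host byte order. The struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF note header: three 32-bit words (names here are illustrative). */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/* Round up to the 4-byte padding used between note fields. */
#define PAD4(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Count the notes in buf[0..len). Stops at whichever comes first:
 * the end of the buffer, a truncated note, or the proposed
 * namesz == descsz == type == 0 terminator.
 */
size_t count_notes(const unsigned char *buf, size_t len)
{
    size_t off = 0, count = 0;

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        memcpy(&h, buf + off, sizeof h);

        if (h.namesz == 0 && h.descsz == 0 && h.type == 0)
            break;                      /* all-zero terminator */

        size_t body = PAD4(h.namesz) + PAD4(h.descsz);
        if (body > len - off - sizeof h)
            break;                      /* note runs past the buffer */

        off += sizeof h + body;
        count++;
    }
    return count;
}
```

A caller that only has an offset passes a generous upper bound for len and relies on the terminator; a caller that has offset+length passes the exact length and never needs the terminator.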
Oh, I suppose, but I never much liked putting data-definition into the
linker script.
J
I think it should be sparsely used, but stuff like simple end markers is pretty much what it's good for.

The main reason I want to avoid adding another header field is that the header is a finite resource; one of the many poor decisions in its original design was using a 2-byte jump at the top, so address 0x281 is the end of the universe.

-hpa
That was fixed long ago (by having a 4 byte reserved field in the middle) so that we can do a two byte jump and then do a farther jump from there to the 16bit code.

So as long as we actually use discipline and really reserve the field for a further jump there should be no need for 0x281 being the end of the universe.

Eric
That's not the only complication. The thing that concerns me more is boot loaders using the jump as a length indicator, and there is really very little chance to test that out safely, except perhaps by breaking it immediately (by adding a 16-byte jump at the end; that way we provide a minimum of overlap for boot loader authors.)

That being said, I don't see any such field (bootsect_kludge could be recycled, arguably, and pad2 is three bytes which is enough for a 16-bit jump.) At the moment, though, that would only push the maximum from 0x281 to 0x290, then we run into the next field in struct boot_params. Although this field can also be relocated over time, it once again shows that breaking this particular limit is nontrivial, and that we're better off trying to avoid pushing it.

However, with a little discipline I think we can make 0x281 last us for the usable lifetime of this format. In the 10 years since the 2.00 format was created, we have only added 36 bytes of header, and we have 57 bytes left (plus 5 bytes of pad and 6 bytes of recyclable field.) When we get closer to full, if we haven't already created a mechanism making field additions obsolete I think we would be better off creating a pointer to a secondary header than trying to break the limitations involved in the current header format.

-hpa
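The 0x281 figure falls out of the reach of a 2-byte short jump: its target is the address of the next instruction plus a signed 8-bit displacement. A small C sketch of that arithmetic is below, assuming (per the boot protocol documentation, not this thread) that the jump sits at offset 0x200; the function name is made up for illustration.

```c
#include <stdint.h>

/*
 * Target of a 2-byte short jump (opcode + rel8) located at insn_off:
 * the displacement is relative to the end of the instruction.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t short_jump_target(uint32_t insn_off, int8_t rel8)
{
    return insn_off + 2 + (int32_t)rel8;
}
```

With the jump at 0x200 and the largest forward displacement of +127, the farthest reachable target is 0x281, which is why a boot loader that treats the jump as a length indicator cannot see a header extending past that point.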
The old setup.S had that 16-byte jump in there. We actually goofed when we added the relocatable bzImage support and I have a hard time believing in discipline when I see the amount of not invented here and various oddball mistakes (caused by overlooking things) that seems to go on when extending the format. We never needed to change the way the command line was passed, and we should have kept the longer jump where we had it.

If we are going to go through and add an additional pointer to a notes section let's please put a jump in there so we can make the header longer as we choose.

Pointers really, really, really suck for maintenance of binary formats. Offsets against a known base are better, but better still is if you can avoid them entirely. For what we are doing allocating a contiguous piece of memory or file is not at all unreasonable.

Eric
The longer jump was never documented, and so didn't exist. There was definitely no way to rely on it.

The old command-line protocol had some really ugly interactions with the absolutely insane hoisting code from the pre-2.02 days. I didn't have enough guts back then to scream and just rip it out, mostly because it took me a long time to figure out what the heck it really did (as opposed to what it claimed it did.) That being said, we probably could have gotten away with leaving the protocol as-is while ripping out the guts (as I eventually did in the rewrite), even if the old protocol only

The problem is that that will only buy us 15 bytes, and eat up 3 (in practice, 4) of them... It might be worth doing anyway, as it'd only break the 32-bit entrypoint users to reorganize struct boot_params.

-hpa
