Hi there,
2.6.23-rc1 won't boot on my Asus M6N laptop; after grub loads it, the last thing it shows is
"No setup signature found...", which IMHO comes from
arch/i386/boot/header.S. I tried printing out the value of setup_sig like so:
<snip>
# Setup corrupt somehow...
setup_bad:
movl $setup_corrupt, %eax
calll puts
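movl setup_sig, %eax	# loads the 32-bit value stored at setup_sig...
calll puts		# ...but puts() takes a string pointer in %eax (as with
			# $setup_corrupt above), so this prints whatever bytes
			# that value happens to address, not the value itself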
# Fall through...
.globl die
.type die, @function
die:
</snip>
but I didn't have time to look up the proper way to print a variable's value in asm,
and I'm pretty sure the value printed during boot ("1-") is wrong. Config attached.
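For reference, here is a minimal C model of the check behind this message, plus one way to turn the value into something printable. It is a sketch, not the setup code itself: it assumes the signature is the 32-bit constant 0x5a5aaa55 stored little-endian at the end of the setup image (believed to be what the new setup's linker script places at setup_sig; verify against header.S/setup.ld in your tree), and the helper names are hypothetical. Since puts() only prints strings, a value has to be converted to hex digits before it can be printed that way.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: value expected at setup_sig; check setup.ld/header.S. */
#define SETUP_SIG_EXPECTED 0x5a5aaa55u

/* Hypothetical helper: read a little-endian 32-bit word. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* 1 if the word at sig_off matches the expected signature. Since the
 * signature sits at the end of setup, a mismatch suggests the setup
 * image was not (fully) loaded where the code expects it. */
static int setup_sig_ok(const unsigned char *setup, size_t len, size_t sig_off)
{
	if (sig_off > len || len - sig_off < 4)
		return 0;	/* signature lies beyond what was loaded */
	return read_le32(setup + sig_off) == SETUP_SIG_EXPECTED;
}

/* Format a 32-bit value as 8 lowercase hex digits plus a NUL, so the
 * result can be handed to a string-printing routine such as puts(). */
static void u32_to_hex(uint32_t v, char out[9])
{
	static const char digits[] = "0123456789abcdef";

	for (int i = 7; i >= 0; i--) {
		out[i] = digits[v & 0xf];
		v >>= 4;
	}
	out[8] = '\0';
}
```

With the bytes 55 aa 5a 5a at sig_off, setup_sig_ok() returns 1, and u32_to_hex(0x5a5aaa55, buf) fills buf with "5a5aaa55".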
--
Regards/Gruß,
Boris.
-I hit the same error message with v2.6.23-rc1-171-ge4903fb, but v2.6.21-3770-g01e73be has no problem. It seems there has been some cleanup of the i386/boot code. Can anyone give a quick clue, or should I do a bisect? Xudong -
A bisect will almost certainly point at the switch to the new setup code. The message means that the setup code wasn't loaded correctly into memory; the big question is *why*. What distro/version of grub are you running? I'm wondering if there are some old versions of grub out there that still do the way-anciently-obsolete "load four sectors" crap; the other possibility that comes to mind is setting up the stack in an invalid manner. -hpa -
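The "load four sectors" remark can be made concrete with a short C sketch of how a loader is supposed to size the setup code. It follows the x86 boot protocol as documented in Documentation/i386/boot.txt (setup_sects at offset 0x1f1, where 0 means 4; the "HdrS" magic at offset 0x202; the real-mode part being the boot sector plus setup_sects sectors). The function names are hypothetical. A loader hard-wired to four setup sectors would load too little of a longer setup image, so the signature at its end would be missing.

```c
#include <stddef.h>

#define SECTOR_SIZE      512
#define OFF_SETUP_SECTS  0x1f1	/* 1 byte: setup size in 512-byte sectors */
#define OFF_HDRS_MAGIC   0x202	/* 4 bytes: "HdrS" (protocol 2.00+) */

/* Bytes of real-mode code (boot sector + setup) a loader should load
 * from the start of the kernel image, or 0 if the image is too short
 * to contain the setup_sects field. */
static size_t real_mode_size(const unsigned char *img, size_t len)
{
	unsigned setup_sects;

	if (len <= OFF_SETUP_SECTS)
		return 0;
	setup_sects = img[OFF_SETUP_SECTS];
	if (setup_sects == 0)
		setup_sects = 4;	/* protocol: 0 means the legacy value 4 */
	return (size_t)(setup_sects + 1) * SECTOR_SIZE;	/* +1: boot sector */
}

/* 1 if the image carries the "HdrS" header magic. */
static int has_hdrs(const unsigned char *img, size_t len)
{
	return len >= OFF_HDRS_MAGIC + 4 &&
	       img[OFF_HDRS_MAGIC] == 'H' && img[OFF_HDRS_MAGIC + 1] == 'd' &&
	       img[OFF_HDRS_MAGIC + 2] == 'r' && img[OFF_HDRS_MAGIC + 3] == 'S';
}

/* What a loader hard-wired to four setup sectors loads; anything past
 * this point (such as a signature at the end of setup) never arrives. */
static size_t legacy_four_sector_size(void)
{
	return (size_t)(4 + 1) * SECTOR_SIZE;
}
```

For an image whose setup_sects is 12, real_mode_size() asks for 13 * 512 bytes, while the fixed four-sector loader stops after 5 * 512.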
I just finished bisecting between v2.6.22..v2.6.23-rc1 (13 kernels compiled, whew...) and Peter, you are
right, here's the evildoer:
4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
commit 4fd06960f120e02e9abc802a09f9511c400042a5
Author: H. Peter Anvin <hpa@zytor.com>
Date: Wed Jul 11 12:18:56 2007 -0700
Use the new x86 setup code for i386
This patch hooks the new x86 setup code into the Makefile machinery. It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
grub version:
[boris@gollum:18:17:27:-> apt-cache show grub
Package: grub
Priority: optional
Section: admin
Installed-Size: 708
Maintainer: Grub Maintainers <pkg-grub-devel@lists.alioth.debian.org>
Architecture: i386
Version: 0.97-29
Depends: libc6 (>= 2.5-5), libncurses5 (>= 5.4-5)
Suggests: grub-doc, mdadm
Filename: pool/main/g/grub/grub_0.97-29_i386.deb
Size: 366884
MD5sum: 2da7a5942db06eaba046dff4615bcce9
SHA1: 7f4da793da209d011ce94fceebaebe0e5f08790f
SHA256: 2596782c08f1f7365e9935f687fef74c67d8702503188f22448db9f0ac98e18e
Description: GRand Unified Bootloader
...
So any ideas or test patches for debugging this are welcome.
--
Regards/Gruß,
Boris.
-This concerns me deeply. This is a current version of Grub, which shouldn't have any silly 8K limitations. Yet it appears to have a pathology similar to the ancient version Xudong just described. The absolute best would be if we could replicate this in simulation (Bochs or Qemu); this would make it very simple to debug. Would you be willing to try to do that? -hpa -
Sure, will do. I'll be busy at work/travelling tomorrow, but as soon as I
get home I'll whip up qemu and run the kernel in question in it. However,
Xudong said that grub 0.97 boots just fine on his machine, and I think it'd be
better to debug this right on the bare hardware (i.e. my laptop)...?
Suggestions?
--
Regards/Gruß,
Boris.
-If we can't reproduce the problem in simulation, that itself will tell us something very important. If we *can* reproduce it in simulation, it will be vastly easier to debug. -hpa -
On Thu, Jul 26, 2007 at 09:31:54PM -0700, H. Peter Anvin wrote:
Hi Peter,
Sorry for the delay, here's my report. I got the qemu Linux image from the
qemu website and did the following:
[boris@gollum:10:34:25:qemu:9553)-> qemu -kernel /boot/2.6.22-4fd06960f120e02e9abc802a09f9511c400042a5-12 -append "root=/dev/hda" linux-0.2.img
and the kernel booted just fine, so the problem seems to occur only when
booting on the bare hardware...
--
Regards/Gruß,
Boris.
-You are using qemu itself as the kernel loader instead of the possibly problematic grub on your hard disk. To reproduce the problem, you need to copy your grub and the kernel into linux-0.2.img by hand and boot it with "qemu linux-0.2.img", although I am not sure exactly how to do that. Another way is to take the installation media containing the copy of grub you are using and use it to do an installation in qemu. Xudong -
Right, this was too easy to be true. I now did:
qemu -hda /dev/hda -snapshot
and booted from the hard disk using the installed grub and the same kernel, and it
_didn't_ boot, again showing "No setup signature found..."
--
Regards/Gruß,
Boris.
-Okay, so it's an algorithmic problem. This is quite important to know. Is /boot a separate partition on your disk by any chance? Either way, this means we can use qemu to debug this, which will make it a lot easier. This is what I'd like you to do next:
- run qemu in one window with the -S -s options.
- in another window, do:
gdb
target remote localhost:1234
set architecture i8086
disp/i ($cs << 4)+$eip
br *0x10200
br *0x20200
br *0x30200
br *0x40200
br *0x50200
br *0x60200
br *0x70200
br *0x80200
br *0x90200
c
# ... hopefully you're now stopped at a jump instruction
p/x $ds
# Hopefully this is showing, say, 0x9000 if you're stopped
# at 0x90200
# Where X is the first digit of the address stopped at:
dump memory setup.dump 0xX0000 0xX8000
Please send me setup.dump plus your vmlinuz file. Thanks, -hpa -
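The breakpoint and dump arithmetic in these instructions can be sketched in C. This is an illustration under stated assumptions, not code from the kernel or grub: it assumes real-mode addressing (linear = segment * 16 + offset, which is what ($cs << 4)+$eip computes) and that setup's entry jump sits 0x200 bytes past a 64K-aligned load segment, matching the boot protocol's segment+0x20:0000 entry convention. All function names are hypothetical.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset, i.e. what the gdb
 * expression ($cs << 4)+$eip computes. The offset is 32 bits wide only
 * because $eip is; in genuine real mode it stays below 0x10000. */
static uint32_t rm_linear(uint16_t seg, uint32_t off)
{
	return ((uint32_t)seg << 4) + off;
}

/* For a stop at 0xX0200 (64K-aligned base + 0x200, assumed to be the
 * setup entry jump), the segment setup was loaded at, expected in $ds. */
static uint16_t expected_ds(uint32_t stop_addr)
{
	return (uint16_t)((stop_addr & 0xf0000u) >> 4);
}

/* The dump range asked for: 0xX0000 .. 0xX8000 (32 KiB). */
static void dump_range(uint32_t stop_addr, uint32_t *start, uint32_t *end)
{
	*start = stop_addr & 0xf0000u;
	*end = *start + 0x8000u;
}

/* A stop is plausibly the setup entry only if $ds holds the real-mode
 * segment that matches the breakpoint address. */
static int plausible_setup_entry(uint16_t ds, uint32_t stop_addr)
{
	return ds == expected_ds(stop_addr);
}
```

Applied to the values reported later in this thread, $ds = 0x9000 at 0x90200 passes the check, while $ds = 0x18 at 0x40200 fails it. A $cs of 0x10 (an assumption, but consistent with the displayed 0x40300 and with a protected-mode selector such as 0x18 in $ds) would also explain why the display expression disagrees with the stop address there: rm_linear(0x10, 0x40200) is 0x40300.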
Here's a complete log of what I did:
qemu -hda /dev/hda -snapshot -S -s
and then in another window:
-------
[]# gdb
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) set architecture i8086
The target architecture is assumed to be i8086
(gdb) disp/i ($cs << 4)+$eip
1: x/i ($cs << 4) + $eip 0xffff0: ljmp $0xf000,$0xe05b
(gdb) br *0x10200
Breakpoint 1 at 0x10200
(gdb) br *0x20200
Breakpoint 2 at 0x20200
(gdb) br *0x30200
Breakpoint 3 at 0x30200
(gdb) br *0x40200
Breakpoint 4 at 0x40200
(gdb) br *0x50200
Breakpoint 5 at 0x50200
(gdb) br *0x60200
Breakpoint 6 at 0x60200
(gdb) br *0x70200
Breakpoint 7 at 0x70200
(gdb) br *0x80200
Breakpoint 8 at 0x80200
(gdb) br *0x90200
Breakpoint 9 at 0x90200
(gdb) c
Continuing.
Breakpoint 4, 0x00040200 in ?? ()
1: x/i ($cs << 4) + $eip 0x40300: lea (%si),%dx
(gdb) p/x $ds
$1 = 0x18
(gdb) dump memory setup.dump 0x40000 0x48000
(gdb) q
The program is running. Exit anyway? (y or n) y
---EOF---
Please find the setup.dump file attached. And by vmlinuz you meant bzImage, right? Anyway, since it won't fit within vger's size limits, I'm sending it to you in a private mail.
--
Regards/Gruß,
Boris.
This isn't the setup code; it's doing something else. Could you try this again, but when you get to this point, if the instruction displayed isn't a "jmp" instruction and $ds doesn't have the right value, enter "c" and see if you hit the proper breakpoint later. Sorry, -hpa -
Hi, I decided to cheat a bit :) and skipped the breakpoint where it used to stop (0x40200). (By the way, hitting 'c' wouldn't continue at all; it kept executing the same instruction over and over again.) This time it seems to behave as expected:
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) set architecture i8086
The target architecture is assumed to be i8086
(gdb) disp/i ($cs << 4)+$eip
1: x/i ($cs << 4) + $eip 0xffff0: ljmp $0xf000,$0xe05b
(gdb) br *0x10200
Breakpoint 1 at 0x10200
(gdb) br *0x20200
Breakpoint 2 at 0x20200
(gdb) br *0x30200
Breakpoint 3 at 0x30200
(gdb) br *0x50200
Breakpoint 4 at 0x50200
(gdb) br *0x60200
Breakpoint 5 at 0x60200
(gdb) br *0x70200
Breakpoint 6 at 0x70200
(gdb) br *0x80200
Breakpoint 7 at 0x80200
(gdb) br *0x90200
Breakpoint 8 at 0x90200
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000 in ?? ()
1: x/i ($cs << 4) + $eip 0x90200: jmp 0x9023c
(gdb) p/x $ds
$1 = 0x9000
(gdb) dump memory setup.dump 0x90000 0x98000
(gdb)
--
Regards/Gruß,
Boris.
Uhm, it looks to me like you ran qemu with the -kernel option again (I can tell because the dump exhibits a few bugs that are characteristic of the qemu loader). This makes qemu itself load the kernel rather than rely on the boot loader that's on your disk. I was expecting you to run "qemu -S -s -hda /dev/hda -snapshot", which you previously said reproduced the problem when run without the -S -s options. After taking the dump, please do:
delete
c
... to verify the problem is reproduced. Thanks, -hpa -
Oops, sorry about that; I mistakenly thought you wanted to debug the kernel.
In this case, we never land on a jump instruction:
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) set arch i8086
The target architecture is assumed to be i8086
(gdb) disp/i ($cs << 4)+$eip
1: x/i ($cs << 4) + $eip 0xffff0: ljmp $0xf000,$0xe05b
(gdb) br *0x10200
Breakpoint 1 at 0x10200
(gdb) br *0x20200
Breakpoint 2 at 0x20200
(gdb) br *0x30200
Breakpoint 3 at 0x30200
(gdb) br *0x40200
Breakpoint 4 at 0x40200
(gdb) br *0x50200
Breakpoint 5 at 0x50200
(gdb) br *0x60200
Breakpoint 6 at 0x60200
(gdb) br *0x70200
Breakpoint 7 at 0x70200
(gdb) br *0x80200
Breakpoint 8 at 0x80200
(gdb) br *0x90200
Breakpoint 9 at 0x90200
(gdb) c
Continuing.
Breakpoint 4, 0x00040200 in ?? ()
1: x/i ($cs << 4) + $eip 0x40300: lea (%si),%dx
(gdb) c
Continuing.
If I do "delete" here, it loads the second stage of grub and continues to load the
kernel. Is there another way to land on the jmp instruction instead of poking
around blindly, maybe by disassembling some parts of the initial code? /me reading
grub docs...
--
Regards/Gruß,
Boris.
-If you do "delete" without a breakpoint number, you're deleting all breakpoints. I just experimented with grub, and it looks like it should break at 0x90200, so just set that breakpoint and none of the others. -hpa -
Hi,
Now this is one of those cases where one tries to swat a small fly with a
nuclear missile. The initial assumption that something was wrong with the kernel
setup code was mistaken, and here's how I know:
The fact that my grub never hit the breakpoint at 0x90200 made me think that
something might be messed up in the grub part of the boot sequence. So I ran
the qemu simulation again and noticed that grub's initial boot screen said
"Grub version 0.91". However, as you may remember from an earlier post, the
grub package I have is the latest one in Debian unstable, 0.97-29, so I figured
something had to be wrong with the installation, especially with the grub stage
binaries (in my case in /boot/grub) that grub-install sets up. Checking their
timestamps revealed that the files were from 2004, so I thought, well, these are
OLD! :) After refreshing the grub installation and replacing the stage binaries
with fresh ones, the kernel booted just fine :), see:
[boris@gollum:07:02:07:~:9994)-> uname -a
Linux gollum 2.6.22-4fd06960f120e02e9abc802a09f9511c400042a5-12 #12 PREEMPT Thu Jul 26 18:08:34 CEST 2007 i686 GNU/Linux
So I guess the problem was the ancient parts of a grub installation I had
lying around, which weren't replaced by the apt-get update process and somehow
interfered with the newer grub version. Anyway, in the end one still learns a lot along the way.
Thanks for your help.
--
Regards/Gruß,
Boris.
-Very cool. I actually suspected that, but I wanted to explore all avenues. I'm glad this can be written off. -hpa -
Oh lovely. The purpose of this was to intercept the running of the kernel setup code. If grub doesn't load it at a 64K boundary, it is hard to guess what it would do. I'll do some experiments with qemu here and see if I can figure out a way to get it to trap at the right point. -hpa -
We're seeing boot failures when using isolinux with the latest kernels, but so far only on Dell machines. It loads the kernel and initrd, prints "Ready." and then just reboots. Adding "vga=ask" brings up the mode prompt and allows selecting a video mode, but it just reboots after that too. -
On my real machine with grub 0.97, there is no problem loading the same kernel. Xudong -
Do you mean the kernel with the 4fd06960f120e02e9abc802a09f9511c400042a5 commit
on top?
--
Regards/Gruß,
Boris.
-No, I mean v2.6.23-rc1-171-ge4903fb (HEAD of torvalds-linux-2.6.git as of this writing), which cannot be loaded by grub 0.91 but can be loaded by grub 0.97 (at least on my machine). Sorry for the confusion. Xudong -
My grub screen shows "GRUB version 0.91". I am playing with a qemu image containing this version of grub, which came from an old Damn Small Linux distribution. Xudong -
Hm, the oldest on GNU's website is 0.92, so I suspect 0.91 is the version number it had back when everyone was effectively using Grub snapshots. I'll try to see if I can pull Grub's CVS repository and do some archeology. -hpa -
