Hi,
Today's linux-next tree (commit
93847083e4791567931bd17c039cc35881cdad29) fails to boot:
[built with gcc-4.2.4-3]
BUG: Int 14: CR2 b0049dea
EDI 00000082 ESI 00000000 EBP c059be88 ESP c059be5c
EBX f000ec62 EDX 0000000e ECX c0595480 EAX f000ec62
err 00000000 EIP c0181ca0 CS 00000060 flg 00010082
Stack: 00000040 c06a2ba0 000080d0 c0595480 c0000f19c c000f180 c0581120 c059bea8
c02bf19b 00000000 00000080 c059beb8 c0000f194 c000f180 0000000a c059beb8
c03a1059 00000000 00000000 c059bed8 c05c4c7c 0009efff 00000000 c04f4df4
I get this as soon as I boot from grub2. Strangely, the error message is
at the bottom of the screen, and I can't see the full message (scrolling
won't work).
The last kernel I built and booted was 2.6.26-rc8 from Linus's tree. I
will try to build and boot 2.6.26-rc9, and then bisect.
This happens on a 32-bit Dell Inspiron 6400 (Intel Core Duo T2300 @ 1.66
GHz CPU), Intel ICH-7 chipset, and a Seagate SATA drive.
I will provide full hardware details once I have bisected the problem.
Meanwhile, does anybody have an idea as to what is wrong?
Best regards,
--Edwin
--
At Fri, 11 Jul 2008 14:12:11 +0300,
[Added Ingo to Cc]
I get the boot problem on i386 with the 2008-07-11 linux-next tree, too.
In my case, no error appears on the screen; it just stays blank and dead.
It seems to stop at the very beginning, soon after GRUB, so it could be
the same reason.
The same config worked fine with yesterday's tree (2008-07-10) on the
same machine. Also, today's tree works on x86-64 (but on another
machine).
My config is below.
thanks,
Takashi
---
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc9
# Tue Jul 8 11:37:08 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not ...
I don't see any boot messages on the screen, I get that BUG message as
soon as grub's menu disappears.
I have bisected it to this range so far:
git-bisect good aa03060a78c1aec53075a0c8ca7be19cedfbea8f
git-bisect bad b1611c0058bc6635e7257e755c3f194933a7a6df
Should I continue to bisect?
See git-bisect log, and .config below.
git-bisect start
# good: [e5a5816f7875207cb0a0a7032e39a4686c5e10a4] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git-bisect good e5a5816f7875207cb0a0a7032e39a4686c5e10a4
# bad: [93847083e4791567931bd17c039cc35881cdad29] Add linux-next specific files for 20080711
git-bisect bad 93847083e4791567931bd17c039cc35881cdad29
# bad: [5aeabb501abf4fa99a85c9dd347f2c3399545f01] Merge commit 'kvm/master'
git-bisect bad 5aeabb501abf4fa99a85c9dd347f2c3399545f01
# bad: [da1671f1e4d1c7baf81e938e1c20b99fa6a79982] Revert "kconfig: normalize int/hex values"
git-bisect bad da1671f1e4d1c7baf81e938e1c20b99fa6a79982
# good: [9e97638c0ab1588913e298b41fca68a593650058] Merge branch 'x86/core' into auto-x86-next
git-bisect good 9e97638c0ab1588913e298b41fca68a593650058
# good: [aa03060a78c1aec53075a0c8ca7be19cedfbea8f] Merge commit 'safe-poison-pointers/auto-safe-poison-pointers-next'
git-bisect good aa03060a78c1aec53075a0c8ca7be19cedfbea8f
# bad: [10889486f1de748096e999ee6b9d22890504cebf] Merge branch 'quilt/i2c'
git-bisect bad 10889486f1de748096e999ee6b9d22890504cebf
# bad: [b1611c0058bc6635e7257e755c3f194933a7a6df] Merge commit 'x86/auto-x86-next'
git-bisect bad b1611c0058bc6635e7257e755c3f194933a7a6df
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc9
# Fri Jul 11 13:08:25 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not ...
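For anyone picking up from the good/bad pair reported above, the remaining range could be narrowed roughly as follows. This is only a sketch: restart_bisect is a hypothetical helper name, not something from this thread, and it assumes a clone that contains both commits.

```shell
# Hypothetical helper (not from the thread): restart a bisection from a
# known good/bad pair, discarding any earlier bisect state.
restart_bisect() {
    good=$1
    bad=$2
    # Ignore the result: there may be no bisection in progress.
    git bisect reset >/dev/null 2>&1 || true
    # "git bisect start <bad> <good>" marks both ends and checks out
    # a commit roughly in the middle of the range.
    git bisect start "$bad" "$good"
}

# Usage (not run here), with the range reported above:
#   restart_bisect aa03060a78c1aec53075a0c8ca7be19cedfbea8f \
#                  b1611c0058bc6635e7257e755c3f194933a7a6df
# Then build and boot the checked-out tree and mark the result with
# "git bisect good" or "git bisect bad" until git names one commit.
# "git bisect log > bisect.log" saves progress and
# "git bisect replay bisect.log" restores it later.
```

Since both ends of the range are merge commits, the result may well be another merge, in which case the merged branch itself still has to be inspected.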
could you check latest tip/master, does it boot fine with the same
config?
Ingo
--
tip/master boots fine.
Thanks for the hint, I rebuilt a failing kernel, and this is what
addr2line says:
$ addr2line -e vmlinux -i c0181ca0
??:0
$ addr2line -e vmlinux -f c0181ca0
kmem_cache_alloc
??:0
Best regards,
--Edwin
--
to update linux-next to the latest bits in -tip, you can perhaps do
something like this:
git-merge tip/auto-x86-next
git-merge tip/auto-core-next
git-merge tip/auto-cpus4096-next
git-merge tip/auto-ftrace-next
git-merge tip/auto-generic-ipi-next
git-merge tip/auto-genirq-next
git-merge tip/auto-latest
git-merge tip/auto-safe-poison-pointers-next
git-merge tip/auto-sched-next
git-merge tip/auto-stackprotector-next
git-merge tip/auto-timers-next
but it's easily possible that the bug is in some other portion of
linux-next.
Ingo
--
Hm, I haven't tested the linux-next from today myself yet, but I have a
related question.
Namely, is there a way to get a log of commits that have been added since
the previous linux-next? That may help to find a guilty patch if
yesterday's linux-next works.
Thanks,
Rafael
--
at least in -tip it works like this:
git-shortlog tip-history-2008-07-10_09.58_Thu..
or, a more practical format with commit IDs on the same line:
git log --no-merges --pretty=format:"%h: %s" \
tip-history-2008-07-10_09.58_Thu..
you can restrict it to a given piece of code as well, say:
git log --no-merges --pretty=format:"%h: %s" \
tip-history-2008-07-10_09.58_Thu.. -- arch/x86/ include/asm-x86/
this doesn't work in linux-next nearly as well, due to the Quilt imported
trees. Every time a quilt queue is updated and reimported, there's a
stream of repeat commits.
Ingo
--
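The same idea can be tried between two daily linux-next trees. The sketch below assumes both are available as tags named like next-20080710 and next-20080711; commits_between is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper (not from the thread): list the non-merge commits
# reachable from $2 but not from $1, one "<short id>: <subject>" per
# line, optionally restricted to the paths given after the two revisions.
commits_between() {
    old=$1
    new=$2
    shift 2
    git log --no-merges --pretty=format:'%h: %s' "$old..$new" -- "$@"
}

# Usage (not run here), inside a clone that has both tags:
#   commits_between next-20080710 next-20080711
#   commits_between next-20080710 next-20080711 arch/x86/ include/asm-x86/
```

As noted above, re-imported quilt queues show up as repeat commits in such a listing, so restricting it to the paths of interest is probably the more useful form on linux-next.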
Thanks for the tips.
Well, it turns out that linux-next from today doesn't boot on my box
either (64-bit) and I don't see anything obviously suspicious.
Bisection time.
Thanks,
Rafael
--
Hi Ingo,
I have identified the source of the breakage on my box, but I don't really
think it's the same problem that Edwin is observing.
Namely, it turns out that some code in arch/x86/kernel/acpi/boot.c, as in
today's linux-next, doesn't really make sense, because we have two conflicting
DMA-based quirks in there for the same set of boxes (HP nx6325 and nx6125) and
one of them actually breaks my box.
I have reported that already, but it probably got lost somewhere.
Below is a patch that fixes things for me, on top of today's linux-next.
Please apply.
Thanks,
Rafael
---
Remove some code that breaks my HP nx6325 from arch/x86/kernel/acpi/boot.c.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/x86/kernel/acpi/boot.c | 47 --------------------------------------------
1 file changed, 47 deletions(-)
Index: linux-next/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/boot.c
+++ linux-next/arch/x86/kernel/acpi/boot.c
@@ -84,8 +84,6 @@ int acpi_lapic;
int acpi_ioapic;
int acpi_strict;
-static int disable_irq0_through_ioapic __initdata;
-
u8 acpi_sci_flags __initdata;
int acpi_sci_override_gsi __initdata;
int acpi_skip_timer_override __initdata;
@@ -982,10 +980,6 @@ void __init mp_override_legacy_irq(u8 bu
int pin;
struct mp_config_intsrc mp_irq;
- /* Skip the 8254 timer interrupt (IRQ 0) if requested. */
- if (bus_irq == 0 && disable_irq0_through_ioapic)
- return;
-
/*
* Convert 'gsi' to 'ioapic.pin'.
*/
@@ -1052,10 +1046,6 @@ void __init mp_config_acpi_legacy_irqs(v
for (i = 0; i < 16; i++) {
int idx;
- /* Skip the 8254 timer interrupt (IRQ 0) if requested. */
- if (i == 0 && disable_irq0_through_ioapic)
- continue;
-
for (idx = 0; idx < mp_irq_entries; idx++) {
struct mp_config_intsrc *irq = mp_irqs + idx;
@@ -1413,17 +1403,6 @@ static int __init force_acpi_ht(const st
}
/*
- * Don't register any ...
thanks Rafael, i have applied your fix to tip/x86/core and to
auto-x86-next as well.
what happened is a case of too many fixes for the same problem :-/
Ingo
--
Ouch, sorry. You need CONFIG_DEBUG_INFO for this to really make any
sense. My habits are deceiving me, because I always build with
DEBUG_INFO. Ingo claims that this slows down his builds, so he never
does it. In the end, it is up to each one of us to choose his options,
but in general, I think it should be used when testing kernels.
If you're tired of rebuilding kernels now, I don't blame you. Another
lesson learned for me :-(
Vegard
-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
BTW, did the new kernel fail in exactly the same place? If not, you
should also replace the EIP in the new crash report on the addr2line
command line, so in general:
addr2line -e vmlinux -f -i <EIP here>
(I don't think it really makes sense for the kernel to crash in
kmem_cache_alloc() this early in the boot process, so I'm guessing you
have a different EIP.)
(Also don't rebuild a bad kernel just to try this out again, but if you
happen to run across another bad one for example during bisection, you
can try it then.)
Vegard
-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
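The lookup described above can be wrapped in two small shell functions. This is a sketch only: extract_eip and resolve_eip are hypothetical names, the oops text is assumed to have been copied into a file by hand, and the vmlinux must be the one that was actually booted, built with CONFIG_DEBUG_INFO=y.

```shell
# Hypothetical helper (not from the thread): read oops text on stdin and
# print the 8-digit hex value that follows "EIP ", e.g. from a line like
# "err 00000000 EIP c0181ca0 CS 00000060 flg 00010082".
extract_eip() {
    sed -n 's/.*EIP \([0-9a-fA-F]\{8\}\).*/\1/p'
}

# Hypothetical helper: resolve an address against a given vmlinux.
# -f prints the enclosing function name, -i also prints the chain of
# inlined callers. Without debug info the location is reported as ??:0.
resolve_eip() {
    vmlinux=$1
    eip=$2
    if [ ! -f "$vmlinux" ]; then
        echo "no such vmlinux: $vmlinux" >&2
        return 1
    fi
    addr2line -e "$vmlinux" -f -i "$eip"
}

# Usage (not run here; oops.txt is an assumed file name):
#   eip=$(extract_eip < oops.txt)
#   resolve_eip ./vmlinux "$eip"
```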
Yep, I am using ccache, same sources -> same binary.
addr2line -i now says:
/var/local/src/linux-2.6.git/linux-2.6/mm/slub.c:1648
/var/local/src/linux-2.6.git/linux-2.6/mm/slub.c:1662
Strangely the EIP is the same even after rebuilding with debug info.
Since tip/master supports the latency tracing features, I won't dig
further into the linux-next problem now (we'll know tomorrow if the
commits in tip solve the boot problem).
Best regards,
--Edwin
--
I have got the same problem as Edwin TÖRÖK:
From next-20080710 to next-20080711 the kernel fails to boot. EIP seems
to be in function kmem_cache_alloc. This is also true for next-20080716.
I didn't try the trees between next-20080711 and next-20080716.
Any news on this bug?
My config is:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26
# Wed Jul 16 21:28:31 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General ...
Yes a fix has been released: http://article.gmane.org/gmane.linux.kernel.kexec/1882 But I don't know when it will be applied.... --
On Fri, Jul 11, 2008 at 1:12 PM, Török Edwin <edwintorok@gmail.com> wrote:
Hi,
One really simple way of getting some more info out of this is to take
the EIP value (here c0181ca0) and run it through addr2line:
    $ addr2line -e vmlinux -i c0181ca0
But you need to make sure that the bzImage/vmlinux you booted
corresponds to the vmlinux you are running addr2line against.
This will tell you the source line which produced the page fault and
will probably give a good clue as to what went wrong.
Thanks for reporting :-)
Vegard
-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--
At Tue, 15 Jul 2008 04:11:26 +0200, Confirmed that this fixes the boot problem on my machine, too. (It explains why this happens only on x86-32...) Added Bernhard to Cc. Maybe we should defer firmware_map_add*()? thanks, --
Yes. I already posted a patch for that. http://article.gmane.org/gmane.linux.kernel.kexec/1882 Bernhard -- Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development --
Which does not defer firmware_map_add*() but the kobject initialisation. Which also fixes the problem. Bernhard -- Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development --
At Tue, 15 Jul 2008 13:17:44 +0200, Thanks, it's good to know that the fix is already pending. Takashi --
What's the process to get that to linux-next? Ingo's tip tree. Bernhard -- Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development --
