Hi,

With 2.6.27-rc7 on qemu-x86_64, it seems that panic will trigger a General Protection Fault. I haven't seen it before.

[ 4.499793] VFS: Cannot open root device "hda1" or unknown-block(2,0)
[ 4.502747] Please append a correct "root=" boot option; here are the available partitions:
[ 4.506641] 0800 2048000 sda driver: sd
[ 4.508987] 0801 1895638 sda1
[ 4.511088] 0802 1 sda2
[ 4.512858] 0810 2048 sdb driver: sd
[ 4.514915] 0b00 1048575 sr0 driver: sr
[ 4.519074] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
[ 4.523477] general protection fault: fff2 [1] SMP
[ 4.523641] CPU 0
[ 4.523641] Modules linked in:
[ 4.523641] Pid: 1, comm: swapper Tainted: G W 2.6.27-rc7 #1
[ 4.523641] RIP: 0010:[<ffffffff81019d27>] [<ffffffff81019d27>] native_smp_send_stop+0x29/0x2d
[ 4.523641] RSP: 0018:ffff880007867d70 EFLAGS: 00000286
[ 4.523641] RAX: 00000000000000ff RBX: 0000000000000286 RCX: 0000000000000000
[ 4.523641] RDX: 0000000000000005 RSI: ffffffff81019ce1 RDI: 0000000000000000
[ 4.523641] RBP: ffff880007867d80 R08: 0000000000000000 R09: 0000000000002800
[ 4.523641] R10: 0000000000002800 R11: ffff880001020a40 R12: ffff88000705b018
[ 4.523641] R13: ffff88000705b000 R14: 0000000000008001 R15: ffffffff8159d550
[ 4.523641] FS: 0000000000000000(0000) GS:ffffffff816fae00(0000) knlGS:0000000000000000
[ 4.523641] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 4.523641] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006a0
[ 4.523641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.523641] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 4.523641] Process swapper (pid: 1, threadinfo ffff880007866000, task ffff880007868000)
[ 4.523641] Stack: 0000000000005131 ffffffff8159d52d ffff880007867e70 ffffffff810344a4
[ 4.523641] 0000003000000010 ffff880007867e80 ...
hm, 0x5a is a simple pop %rdx. A #GP there means the stack segment is bust?

so it's preceded by a popfq and on the next instruction we #GP. but the stack and flags state looks good:

[ 4.523641] RSP: 0018:ffff880007867d70 EFLAGS: 00000286

weird.

Ingo
--
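For readers following along, the EFLAGS value quoted above can be decoded by hand. The following is a minimal sketch in C that assumes only the architectural x86 bit positions; the helper name `eflags_interrupts_enabled` is made up for illustration and is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 EFLAGS bit masks (a subset). */
#define EFLAGS_CF (1u << 0)   /* carry flag */
#define EFLAGS_PF (1u << 2)   /* parity flag */
#define EFLAGS_ZF (1u << 6)   /* zero flag */
#define EFLAGS_SF (1u << 7)   /* sign flag */
#define EFLAGS_IF (1u << 9)   /* interrupt enable flag */
#define EFLAGS_DF (1u << 10)  /* direction flag */

/* Hypothetical helper: true if the given EFLAGS value has interrupts enabled. */
bool eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}
```

Applied to 0x00000286, the set bits are IF, SF, PF and the always-one reserved bit 1, so the value reported in the oops has interrupts enabled.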
No, that would be #SS (and segments don't really exist in 64-bit mode anyway.) In 32-bit mode it could mean a code segment overrun.

*However*...

[ 4.523477] general protection fault: fff2 [1] SMP

There is an error code attached to the #GP, which is supposed to mean that somehow a segment selector was involved. This doesn't look like a

My guess is that the popfq enables interrupts, and we try to take an interrupt through an IDT entry which isn't set up correctly.

-hpa
--
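To make the error-code discussion concrete, here is a small sketch in C that splits a selector-style exception error code into its architectural fields (EXT, IDT, TI, index). The struct and function names are hypothetical and chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 selector error code, as pushed for #GP, #SS, #NP, #TS. */
struct sel_error_code {
    bool ext;        /* bit 0: event was external to the program */
    bool idt;        /* bit 1: index refers to a gate in the IDT */
    bool ti;         /* bit 2: if idt is clear, 0 = GDT, 1 = LDT */
    unsigned index;  /* bits 3..15: selector or vector index */
};

/* Hypothetical helper: decode the low 16 bits of the error code. */
struct sel_error_code decode_sel_error_code(uint16_t code)
{
    struct sel_error_code out;

    out.ext = (code & 0x1) != 0;
    out.idt = (code & 0x2) != 0;
    out.ti = (code & 0x4) != 0;
    out.index = code >> 3;
    return out;
}
```

For the reported value 0xfff2 this yields EXT = 0, IDT = 1, TI = 0 and index = 0x1ffe. The IDT bit being set is consistent with the guess that an IDT entry is involved, although the index is far larger than the 256 vectors an IDT can hold, so the code should be read with some caution.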
I'm sorry for the false alarm. I discovered that it did not happen on
a clean kernel. My kernel was using this patch.
diff --git a/arch/x86/kernel/cpu/common_64.c b/arch/x86/kernel/cpu/common_64.c
index a11f5d4..abf5bc8 100644
--- a/arch/x86/kernel/cpu/common_64.c
+++ b/arch/x86/kernel/cpu/common_64.c
@@ -261,6 +261,8 @@ void __init early_cpu_init(void)
 		cpu_devs[cvdev->vendor] = cvdev->cpu_dev;
 	early_cpu_support_print();
 	early_identify_cpu(&boot_cpu_data);
+
+	setup_clear_cpu_cap(X86_FEATURE_PSE);
 }
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
No, I was wrong! It *does* happen for vanilla as well, but it doesn't happen reliably.

[ 4.043370] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
[ 4.048765] general protection fault: fff2 [1] SMP
[ 4.048765] CPU 0
[ 4.048765] Modules linked in:
[ 4.048765] Pid: 1, comm: swapper Tainted: G W 2.6.27-rc7 #8
[ 4.048765] RIP: 0010:[<ffffffff81019d27>] [<ffffffff81019d27>] native_smp_send_stop+0x29/0x2d
[ 4.048765] RSP: 0018:ffff880007867d70 EFLAGS: 00000286
[ 4.048765] RAX: 00000000000000ff RBX: 0000000000000286 RCX: 0000000000000000
[ 4.048765] RDX: 0000000000000005 RSI: ffffffff81019ce1 RDI: 0000000000000000
[ 4.048765] RBP: ffff880007867d80 R08: 0000000000000000 R09: ffff880087867bff
[ 4.048765] R10: ffff880087867bff R11: 000000000000000a R12: ffff88000707b018
[ 4.048765] R13: ffff88000707b000 R14: 0000000000008001 R15: ffffffff8159d550
[ 4.048765] FS: 0000000000000000(0000) GS:ffffffff816fae00(0000) knlGS:0000000000000000
[ 4.048765] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 4.048765] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006a0
[ 4.048765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.048765] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 4.048765] Process swapper (pid: 1, threadinfo ffff880007866000, task ffff880007868000)
[ 4.048765] Stack: 000000000000506f ffffffff8159d52d ffff880007867e70 ffffffff81034454
[ 4.048765] 0000003000000010 ffff880007867e80 ffff880007867db0 ffff880007867e80
[ 4.048765] ffff880007867dd0 ffff880007867e80 ffff880007899360 000000000000500e
[ 4.048765] Call Trace:
[ 4.048765] [<ffffffff81034454>] panic+0xe8/0x193
[ 4.048765] [<ffffffff8118ef5f>] ? kobject_put+0x44/0x49
[ 4.048765] [<ffffffff8121778e>] ? put_device+0x15/0x17
[ 4.048765] [<ffffffff8121ad49>] ? class_for_each_device+0xfe/0x10e
[ 4.048765] [<ffffffff81715059>] ...
Keeping it going also found this bootup failure:

[ 0.321423] Freeing SMP alternatives: 39k freed
[ 0.323950] ACPI: Core revision 20080609
[ 0.360390] divide error: 0000 [1] SMP
[ 0.360944] CPU 0
[ 0.360944] Modules linked in:
[ 0.360944] Pid: 1, comm: swapper Tainted: G W 2.6.27-rc7 #9
[ 0.360944] RIP: 0010:[<ffffffff81039193>] [<ffffffff81039193>] __do_softirq+0x49/0xc5
[ 0.360944] RSP: 0018:ffffffff81792f00 EFLAGS: 00000206
[ 0.360944] RAX: ffff880007867fd8 RBX: 0000000000000042 RCX: ffff880007867d90
[ 0.360944] RDX: ffff880007867d90 RSI: 0000000000000086 RDI: ffffffff817ac208
[ 0.360944] RBP: ffffffff81792f20 R08: ffff88000100d0b0 R09: ffff88000100d040
[ 0.360944] R10: ffff88000100d040 R11: ffffffff81646b40 R12: ffffffff816ec080
[ 0.360944] R13: 000000000000000a R14: 0000000000000000 R15: 0000000000000000
[ 0.360944] FS: 0000000000000000(0000) GS:ffffffff816fae00(0000) knlGS:0000000000000000
[ 0.360944] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 0.360944] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006a0
[ 0.360944] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.360944] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 0.360944] Process swapper (pid: 1, threadinfo ffff880007866000, task ffff880007868000)
[ 0.360944] Stack: 0000000000000046 0000000000000000 ffffffff817893e0 0000000000000030
[ 0.360944] ffffffff81792f38 ffffffff8100d24c ffffffff81792f38 ffffffff81792f58
[ 0.360944] ffffffff8100eb81 ffff880007867ce8 0000000000000000 ffffffff81792f68
[ 0.360944] Call Trace:
[ 0.360944] <IRQ> [<ffffffff8100d24c>] call_softirq+0x1c/0x28
[ 0.360944] [<ffffffff8100eb81>] do_softirq+0x32/0x89
[ 0.360944] [<ffffffff810392ad>] irq_exit+0x3f/0x82
[ 0.360944] [<ffffffff8100e9b3>] do_IRQ+0x147/0x166
[ 0.360944] [<ffffffff8100c5a1>] ret_from_intr+0x0/0xb
[ 0.360944] <EOI> [<ffffffff8107013f>] ? noop+0x0/0x6
[ ...
Yes, but there shouldn't be any external interrupts that could turn into a divide error. It really smells like a Qemu problem -- possibly even a Qemu miscompile -- to me.

Does it reproduce in KVM?

-hpa
--
I have no computer that can do KVM, sorry :-(

Stack trace contains IO_APIC functions, so it seems that maybe the emulated IOAPIC is trying to (erroneously) deliver an int 0 (for some reason)? But I don't know, that's just speculation which can be done better by others, so I will stop now :-)

Vegard
--
I suspect it's a problem in Qemu's IOAPIC model, but it's hard to know for sure.

-hpa
--
yes - it smells like it tries to deliver vector 0, after the panic code has deinitialized the lapic / ioapic.

Ingo
--
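The link between "vector 0" and the "divide error" oops follows from the fixed x86 exception vector assignments: whatever arrives on vector 0 is handled as #DE. A minimal lookup sketch in C for the vectors mentioned in this thread is below; the function name and the returned strings are illustrative only and are not taken from the kernel.

```c
#include <stddef.h>

/*
 * Hypothetical helper: name a few architecturally fixed x86 exception
 * vectors. Returns NULL for vectors this sketch does not cover.
 */
const char *exception_name(unsigned vector)
{
    switch (vector) {
    case 0:
        return "#DE divide error";
    case 12:
        return "#SS stack-segment fault";
    case 13:
        return "#GP general protection fault";
    default:
        return NULL;
    }
}
```

Under this mapping, an interrupt wrongly delivered on vector 0 would be reported as a divide error even though no division took place, which matches the second oops.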
