The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450. It panics early enough that some of what I'm sure would be useful has already scrolled off the screen, and there's no scrollback buffer at that point. If more detail is needed, I'll have to transcribe what I *can* see by hand. Here's /proc/cpuinfo on the chance it will help someone track this down:

processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 9
model name : AMD-K6(tm) 3D+ Processor
stepping : 1
cpu MHz : 451.032
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips : 902.76
clflush size : 32

--
-----------------------------------------------------------------------
Bob Tracy WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
Or take a photo of the screen with a digital camera, if you have one.

It's a good idea to increase the screen virtual size by decreasing

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
In short, boot with vga=6 (80x50), or even vga=ask and select a mode with lots of characters (132x60, for example).

Jan
That did it...
(...)
CPU: L1 I Cache: (...)
CPU: L2 Cache: (...)
Intel machine check architecture supported.
general protection fault: 0000 [#1]
PREEMPT
Modules linked in:
CPU: 0
EIP: 0060:[<c01079f4>] Not tainted VLI
EFLAGS: 00010286 (2.6.22-rc1 #1)
EIP is at amd_mcheck_init+0x2b/0xc3
eax: 0000002f ebx: 00000000 ecx: 00000179 edx: 00000001
esi: 000a0b00 edi: c03c1000 ebp: 00843007 esp: c03c7fb0
ds:007b es:007b fs:0000 gs:0000 ss:0068
Process swapper (pid:0, ti=c03c6000 task=c0399a20 task.ti=c03c6000)
Stack: c03569fc 00000000 00843007 c0189398 ffffffff 00000000 000a0600 c03cb34f
00000002 c03a2b40 00000000 c03cb79c 00040000 00000000 00000000 c03c8842
00000054 c03c8388 c03dfb40 00000000
Call Trace:
[<c0189398>] proc_register+0x3b/0xb7
[<c03cb34f>] identify_boot_cpu+0xd/0x1f
[<c03cb79c>] check_bugs+0x8/0x4e
[<c03c8842>] start_kernel+0x19a/0x1a3
[<c03c8388>] unknown_bootoption+0x0/0x191
==========================================
Code: 56 53 83 ec 14 c7 05 08 f1 39 c0 3c 78 10 c0 8b 40 0c a8 80 0f 84 a0 00 00 00 c7 04 24 fc 69 35 c0 e8 79 f3 00 00 b9 79 01 00 00 <0f> 32 f6 c4 01 89 c3 74 0a b1 7b 83 c8 ff 83 ca ff 0f 30 0f b6
EIP:[<c01079f4>] amd_mcheck_init+0x2b/0xc3 SS:ESP 0068:c03c7fb0
Kernel panic - not syncing: Attempted to kill the idle task!
rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register).

Probably K6 doesn't have that.

Caused by:

[PATCH] i386: check capability
On Wed, May 16, 2007 at 11:53:22AM -0400, Chuck Ebbert wrote:
> Bob Tracy wrote:
> > Jan Engelhardt wrote:
> >>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
> >>>
> >>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
>
> > Intel machine check architecture supported.
> > general protection fault: 0000 [#1]
> > PREEMPT
> > Modules linked in:
> > CPU: 0
> > EIP: 0060:[<c01079f4>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.22-rc1 #1)
> > EIP is at amd_mcheck_init+0x2b/0xc3
> >
>
> rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)
>
> Probably K6 doesn't have that.

sounds right. Intel style MCE capability was introduced with the Athlon on AMD systems iirc.

> Caused by:
>
> [PATCH] i386: check capability

Though this would imply that Bob's K6-3 is reporting that it does have that bit in its cpuid flags.
Bob, can you send your /proc/cpuinfo and dmesg |grep CPU ?

Dave

--
http://www.codemonkey.org.uk
/proc/cpuinfo was in the first message in this thread (I anticipated your request :-)), but it's small enough to repeat:

processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 9
model name : AMD-K6(tm) 3D+ Processor
stepping : 1
cpu MHz : 451.040
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips : 902.78
clflush size : 32

Here's the requested dmesg output (for 2.6.21):

Initializing CPU#0
CPU: After generic identify, caps: 008021bf 808029bf 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: L2 Cache: 256K (32 bytes/line)
CPU: After all inits, caps: 008021bf 808029bf 00000000 00000002 00000000 00000000 00000000
CPU: AMD-K6(tm) 3D+ Processor stepping 01
NVRM: CPU does not support the PAT, falling back to MTRRs.
On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:
> flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
> bogomips : 902.78
> clflush size : 32

Ah so it really does think it has mce.
I just dug out the datasheet for the K6-3, and true enough, it did have MCE, however,
it isn't intel compatible. It has two MSRs (MCAR at 0x0, and MCTR at 0x01).
Then the punchline..

"Because the processor does not support machine check exceptions, the contents of the
MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
asserted (where all bits in each register are reset to 0)."

In short, it's useless.
We could clear the capability bit and pretend it isn't there, at no loss of
functionality, or we could revert back to doing model checks instead of cpuid flag checks.

Dave
On Wed, May 16, 2007 at 03:22:48PM -0400, Dave Jones wrote:
> On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:
>
> > flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
> > bogomips : 902.78
> > clflush size : 32
>
> Ah so it really does think it has mce.
> I just dug out the datasheet for the K6-3, and true enough, it did have MCE, however,
> it isn't intel compatible. It has two MSRs (MCAR at 0x0, and MCTR at 0x01).
> Then the punchline..
>
> "Because the processor does not support machine check exceptions, the contents of the
> MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
> asserted (where all bits in each register are reset to 0)."
>
> In short, it's useless.
> We could clear the capability bit and pretend it isn't there, at no loss of
> functionality, or we could revert back to doing model checks instead of cpuid flag checks.

Bob, does this patch make it boot again for you?

Dave

Some AMD K6's advertise machine check capability, but don't actually
have an Intel compatible implementation. It also doesn't actually work,
so don't advertise it as being present.

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 4fec702..3a75c5b 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -197,7 +197,14 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 				/* placeholder for any needed mods */
 				break;
 			}
+
+			/*
+			 * Some K6's advertise MCE, but it's incompatible
+			 * to Intel style MCE, and also non-functional.
+			 */
+			clear_bit(X86_FEATURE_MCE, c->x86_capability);
 			break;
+
 		case 6: /* An Athlon/Duron */
 
 			/* Bit 15 of Athlon specific MSR 15, needs to be 0
NAK. No difference. Identical panic message. (Yes, I double-checked
to make sure I was booting the patched kernel :-)).
On Wed, May 16, 2007 at 11:36:46PM -0500, Bob Tracy wrote:
> Dave Jones wrote:
> > Bob, does this patch make it boot again for you?
> >
> > Dave
> >
> > Some AMD K6's advertise machine check capability, but don't actually
> > have an Intel compatible implementation. It also doesn't actually work,
> > so don't advertise it as being present.
> >
> > Signed-off-by: Dave Jones <davej@redhat.com>
>
> NAK. No difference. Identical panic message. (Yes, I double-checked
> to make sure I was booting the patched kernel :-)).

Hmm, odd. Does reverting the patch that Chuck fingered fix it?

Dave
ACK. I'm running 2.6.22-rc1 minus Joachim's patch as I type this.

Anticipating the question, here's the "Processor type and features" section from "linux/.config":

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc1
# Sun May 13 00:29:57 2007
#
(...)
#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not ...
Hmpf. We could either use rdmsr_safe or add a family check again or clear it in k6 setup.
I think clearing it in setup is cleanest.

Does this patch work?

-Andi

Clear MCE flag on AMD K6

It reports machine check capability in CPUID, but doesn't actually
implement all the necessary MSRs of the standard Intel machine check
architecture.

This fixes a boot failure recently introduced.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -280,6 +280,10 @@ static void __cpuinit init_amd(struct cp
 
 	if (c->x86 == 0x10 && !force_mwait)
 		clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+
+	/* K6s report MCEs but don't actually have all the MSRs */
+	if (c->x86 < 6)
+		clear_bit(X86_FEATURE_MCE, c->x86_capability);
 }
 
 static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
Index: linux/arch/i386/kernel/cpu/mcheck/k7.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/k7.c
+++ linux/arch/i386/kernel/cpu/mcheck/k7.c
@@ -72,12 +72,12 @@ void amd_mcheck_init(struct cpuinfo_x86
 	u32 l, h;
 	int i;
 
-	machine_check_vector = k7_machine_check;
-	wmb();
-
 	if (!cpu_has(c, X86_FEATURE_MCE))
 		return;
 
+	machine_check_vector = k7_machine_check;
+	wmb();
+
 	printk (KERN_INFO "Intel machine check architecture supported.\n");
 	rdmsr (MSR_IA32_MCG_CAP, l, h);
 	if (l & (1<<8))	/* Control register present ? */
I want to acknowledge receiving the above, but it arrived too late for me to test this morning (the work day intrudes). I'll get a new kernel built, test this, and report back in about 10-11 hours.
ACK. I reinstalled Joachim's patch (default 2.6.22-rc1 state), and added your patch. Life is good: we have a fix/workaround.
Great, thanks for finding this, and thanks to Andi for the patch. I'll talk to our labs about dusting off old systems for testing ;)

-Joachim
