Re: [BUG] (regression) AMD k6-III/450 won't boot w/2.6.22-rc1

From: Bob Tracy
Date: Tuesday, May 15, 2007 - 8:13 pm

The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
It panics early enough that some of what I'm sure would be useful has
already scrolled off the screen, and there's no scrollback buffer at
that point.  If more detail is needed, I'll have to transcribe what I
*can* see by hand.

Here's /proc/cpuinfo on the chance it will help someone track this down:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 5
model		: 9
model name	: AMD-K6(tm) 3D+ Processor
stepping	: 1
cpu MHz		: 451.032
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips	: 902.76
clflush size	: 32

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Randy Dunlap
Date: Tuesday, May 15, 2007 - 9:27 pm

or use digital camera photo of screen if you have such a camera.

It's a good idea to increase the screen virtual size by decreasing


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Jan Engelhardt
Date: Wednesday, May 16, 2007 - 12:03 am

In short, boot with vga=6 (80x50), or even vga=ask, selecting something with
lots of chars (132x60 for example)


	Jan
-- 
-

From: Bob Tracy
Date: Wednesday, May 16, 2007 - 6:15 am

That did it...

(...)
CPU: L1 I Cache: (...)
CPU: L2 Cache: (...)
Intel machine check architecture supported.
general protection fault: 0000 [#1]
PREEMPT
Modules linked in:
CPU:   0
EIP: 0060:[<c01079f4>]  Not tainted VLI
EFLAGS: 00010286   (2.6.22-rc1 #1)
EIP is at amd_mcheck_init+0x2b/0xc3
eax: 0000002f  ebx: 00000000  ecx: 00000179  edx: 00000001
esi: 000a0b00  edi: c03c1000  ebp: 00843007  esp: c03c7fb0
ds:007b  es:007b  fs:0000  gs:0000  ss:0068
Process swapper (pid:0, ti=c03c6000 task=c0399a20 task.ti=c03c6000)
Stack: c03569fc 00000000 00843007 c0189398 ffffffff 00000000 000a0600 c03cb34f
       00000002 c03a2b40 00000000 c03cb79c 00040000 00000000 00000000 c03c8842
       00000054 c03c8388 c03dfb40 00000000
Call Trace:
 [<c0189398>] proc_register+0x3b/0xb7
 [<c03cb34f>] identify_boot_cpu+0xd/0x1f
 [<c03cb79c>] check_bugs+0x8/0x4e
 [<c03c8842>] start_kernel+0x19a/0x1a3
 [<c03c8388>] unknown_bootoption+0x0/0x191
==========================================
Code: 56 53 83 ec 14 c7 05 08 f1 39 c0 3c 78 10 c0 8b 40 0c a8 80 0f 84 a0 00 00 00 c7 04 24 fc 69 35 c0 e8 79 f3 00 00 b9 79 01 00 00 <0f> 32 f6 c4 01 89 c3 74 0a b1 7b 83 c8 ff 83 ca ff 0f 30 0f b6
EIP:[<c01079f4>] amd_mcheck_init+0x2b/0xc3 SS:ESP 0068:c03c7fb0
Kernel panic - not syncing: Attempted to kill the idle task!

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Chuck Ebbert
Date: Wednesday, May 16, 2007 - 8:53 am

rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)

Probably K6 doesn't have that.

Caused by:

	[PATCH] i386: check capability

-

From: Dave Jones
Date: Wednesday, May 16, 2007 - 9:30 am

On Wed, May 16, 2007 at 11:53:22AM -0400, Chuck Ebbert wrote:
 > Bob Tracy wrote:
 > > Jan Engelhardt wrote:
 > >>> On Tue, 15 May 2007 22:13:14 -0500 (CDT) Bob Tracy wrote:
 > >>>
 > >>>> The 2.6.22-rc1 boot panics early in amd_mcheck_init() with my k6-III/450.
 > 
 > > Intel machine check architecture supported.
 > > general protection fault: 0000 [#1]
 > > PREEMPT
 > > Modules linked in:
 > > CPU:   0
 > > EIP: 0060:[<c01079f4>]  Not tainted VLI
 > > EFLAGS: 00010286   (2.6.22-rc1 #1)
 > > EIP is at amd_mcheck_init+0x2b/0xc3
 > > 
 > 
 > rdmsr with ecx == 0x179 (Machine Check Global Capabilities Register)
 > 
 > Probably K6 doesn't have that.

sounds right. Intel style MCE capability was introduced with the Athlon
on AMD systems iirc.

 > Caused by:
 > 
 > 	[PATCH] i386: check capability

Though this would imply that Bob's K6-3 is reporting that it does have
that bit in its cpuid flags.

Bob, can you send your /proc/cpuinfo and dmesg |grep CPU  ?

	Dave

-- 
http://www.codemonkey.org.uk
-

From: Bob Tracy
Date: Wednesday, May 16, 2007 - 12:11 pm

/proc/cpuinfo sent in the first message in this thread (anticipated your
request :-)), but it's small enough to repeat:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 5
model		: 9
model name	: AMD-K6(tm) 3D+ Processor
stepping	: 1
cpu MHz		: 451.040
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips	: 902.78
clflush size	: 32

Here's the requested dmesg output (for 2.6.21):

Initializing CPU#0
CPU: After generic identify, caps: 008021bf 808029bf 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: L2 Cache: 256K (32 bytes/line)
CPU: After all inits, caps: 008021bf 808029bf 00000000 00000002 00000000 00000000 00000000
CPU: AMD-K6(tm) 3D+ Processor stepping 01
NVRM: CPU does not support the PAT, falling back to MTRRs.

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Dave Jones
Date: Wednesday, May 16, 2007 - 12:22 pm

On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:

 > flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
 > bogomips	: 902.78
 > clflush size	: 32

Ah so it really does think it has mce.
I just dug out the datasheet for the K6-3, and true enough, it did have MCE; however,
it isn't Intel-compatible. It has two MSRs (MCAR at 0x0 and MCTR at 0x01).
Then the punchline..

"Because the processor does not support machine check exceptions, the contents of the
MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
asserted (where all bits in each register are reset to 0)."

In short, it's useless.
We could clear the capability bit and pretend it isn't there, at no loss of
functionality, or we could revert back to doing model checks instead of cpuid flag checks.

	Dave

-- 
http://www.codemonkey.org.uk
-

From: Dave Jones
Date: Wednesday, May 16, 2007 - 2:07 pm

On Wed, May 16, 2007 at 03:22:48PM -0400, Dave Jones wrote:
 > On Wed, May 16, 2007 at 02:11:56PM -0500, Bob Tracy wrote:
 > 
 >  > flags		: fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
 >  > bogomips	: 902.78
 >  > clflush size	: 32
 > 
 > Ah so it really does think it has mce.
 > I just dug out the datasheet for the K6-3, and true enough, it did have MCE, however,
 > it isn't intel compatible. It has two MSRs (MCAR at 0x0, and MCTR at 0x01).
 > Then the punchline..
 > 
 > "Because the processor does not support machine check exceptions, the contents of the
 > MCAR and MCTR are only affected by the WRMSR instruction and by RESET being sampled
 > asserted (where all bits in each register are reset to 0)."
 > 
 > In short, it's useless.
 > We could clear the capability bit and pretend it isn't there, at no loss of
 > functionality, or we could revert back to doing model checks instead of cpuid flag checks.

Bob, does this patch make it boot again for you?

	Dave

Some AMD K6's advertise machine check capability, but don't actually
have an Intel compatible implementation. It also doesn't actually work,
so don't advertise it as being present.

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 4fec702..3a75c5b 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -197,7 +197,14 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 				/* placeholder for any needed mods */
 				break;
 			}
+
+			/*
+			 * Some K6's advertise MCE, but it's incompatible
+			 * to Intel style MCE, and also non-functional.
+			 */
+			clear_bit(X86_FEATURE_MCE, c->x86_capability);
 			break;
+
 		case 6: /* An Athlon/Duron */
  
 			/* Bit 15 of Athlon specific MSR 15, needs to be 0
-- 
http://www.codemonkey.org.uk
-

From: Bob Tracy
Date: Wednesday, May 16, 2007 - 9:36 pm

NAK.  No difference.  Identical panic message.  (Yes, I double-checked
to make sure I was booting the patched kernel :-)).

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Dave Jones
Date: Wednesday, May 16, 2007 - 11:08 pm

On Wed, May 16, 2007 at 11:36:46PM -0500, Bob Tracy wrote:
 > Dave Jones wrote:
 > > Bob, does this patch make it boot again for you?
 > > 
 > > 	Dave
 > > 
 > > Some AMD K6's advertise machine check capability, but don't actually
 > > have an Intel compatible implementation. It also doesn't actually work,
 > > so don't advertise it as being present.
 > > 
 > > Signed-off-by: Dave Jones <davej@redhat.com>
 > 
 > NAK.  No difference.  Identical panic message.  (Yes, I double-checked
 > to make sure I was booting the patched kernel :-)).

Hmm, odd.
Does reverting the patch that Chuck fingered fix it? 

	Dave

-- 
http://www.codemonkey.org.uk
-

From: Bob Tracy
Date: Thursday, May 17, 2007 - 5:34 am

ACK.  I'm running 2.6.22-rc1 minus Joachim's patch as I type this.

Anticipating the question, here's the "Processor type and features"
section from "linux/.config":

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc1
# Sun May 13 00:29:57 2007
#
(...)
#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
CONFIG_MK6=y
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not ...
-

From: Andi Kleen
Date: Thursday, May 17, 2007 - 5:35 am

Hmpf.

We could either use rdmsr_safe or add a family check again or clear it 
in k6 setup.  I think clearing it in setup is cleanest.

Does this patch work?

-Andi

Clear MCE flag on AMD K6

It reports machine check capability in CPUID, but doesn't actually
implement all the necessary MSRs of the standard Intel machine
check architecture.

This fixes a boot failure recently introduced.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -280,6 +280,10 @@ static void __cpuinit init_amd(struct cp
 
 	if (c->x86 == 0x10 && !force_mwait)
 		clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+
+	/* K6s reports MCEs but don't actually have all the MSRs */
+	if (c->x86 < 6) 
+		clear_bit(X86_FEATURE_MCE, c->x86_capability);
 }
 
 static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
Index: linux/arch/i386/kernel/cpu/mcheck/k7.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/k7.c
+++ linux/arch/i386/kernel/cpu/mcheck/k7.c
@@ -72,12 +72,12 @@ void amd_mcheck_init(struct cpuinfo_x86 
 	u32 l, h;
 	int i;
 
-	machine_check_vector = k7_machine_check;
-	wmb();
-
 	if (!cpu_has(c, X86_FEATURE_MCE))
 		return;
 
+	machine_check_vector = k7_machine_check;
+	wmb();
+
 	printk (KERN_INFO "Intel machine check architecture supported.\n");
 	rdmsr (MSR_IA32_MCG_CAP, l, h);
 	if (l & (1<<8))	/* Control register present ? */
-

From: Bob Tracy
Date: Thursday, May 17, 2007 - 5:54 am

I want to acknowledge receiving the above, but it arrived too late for
me to test this morning (the work day intrudes).  I'll get a new kernel
built, test this, and report back in about 10-11 hours.

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Bob Tracy
Date: Thursday, May 17, 2007 - 4:38 pm

ACK.  I reinstalled Joachim's patch (default 2.6.22-rc1 state), and
added your patch.  Life is good: we have a fix/workaround.

-- 
-----------------------------------------------------------------------
Bob Tracy                   WTO + WIPO = DMCA? http://www.anti-dmca.org
rct@frus.com
-----------------------------------------------------------------------
-

From: Joachim Deguara
Date: Friday, May 18, 2007 - 7:38 am

Great, thanks for finding this and Andi for the patch.  I'll talk to our labs 
about dusting off old systems for testing ;)

-Joachim


-
