Re: [Bug 11388] New: 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable

Previous thread: [patch 4/4] x86: PAT documentation updates with debug info by venkatesh.pallipadi on Wednesday, August 20, 2008 - 4:45 pm. (1 message)

Next thread: [PATCH] setlocalversion: dont include svn change count by Mike Frysinger on Wednesday, August 20, 2008 - 6:25 pm. (1 message)
From: Andrew Morton
Date: Wednesday, August 20, 2008 - 6:04 pm

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 20 Aug 2008 17:38:59 -0700 (PDT)

Looks like a post-2.6.26 regression caused by
12031a624af7816ec7660b82be648aa3703b4ebe.

--

From: Yinghai Lu
Date: Wednesday, August 20, 2008 - 6:20 pm

On Wed, Aug 20, 2008 at 6:04 PM, Andrew Morton
reg00: base=0xd0000000 (3328MB), size=196864MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size=197120MB: uncachable, count=1
reg02: base=0x00000000 (   0MB), size=212992MB: write-back, count=1
reg03: base=0x400000000 (16384MB), size=197120MB: write-back, count=1
reg04: base=0x420000000 (16896MB), size=196864MB: write-back, count=1

the mtrr sizes look crazy.

YH
--

From: Yinghai Lu
Date: Wednesday, August 20, 2008 - 6:49 pm

please apply attached patch and boot with show_msr=1 to dump the msr
(including mtrr)

YH
From: Ingo Molnar
Date: Thursday, August 21, 2008 - 4:44 am

looks rather useful - added it to tip/x86/debug.

	Ingo
--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 4:56 am

fails to build with the attached config:

arch/x86/kernel/cpu/common_64.c: In function 
From: Yinghai Lu
Date: Thursday, August 21, 2008 - 8:39 am

that was a tool to verify whether the BIOS does the right thing with some special bits.

it seems it doesn't compile when xen etc. is enabled in the config.

YH
--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 8:51 pm

yeah - but would be nice to fix it, as it's a useful diagnostic patch. 
If people have similar problems in the future they can boot their distro 
kernels with show_mtrr=x to get a MTRR dump.

	Ingo
--

From: Yinghai Lu
Date: Thursday, August 21, 2008 - 9:45 pm

will look at it.

YH
--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 1:55 pm

Both hunks in the patch applied with an offset of -36 but the show_msr
flag doesn't seem to have any effect.  The dmesg is still the same
number of lines according to wc.

I'm attaching the new dmesg anyways.

$ cat /proc/cmdline 
root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc show_msr=1 console=tty0
console=ttyS0,115200n8 

-J

--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 4:33 pm

I'm doing that now.  Note that I was pulling from netdev-2.6 before.

-J

--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 5:10 pm

The dmesg from the tip kernel is attached and it's at least larger than
the last build's dmesg.

 $ wc -l *.dmesg
  683 2.6.27-r3.dmesg
  829 2.6.27-rc4-tip.dmesg

-J

--
From: Yinghai Lu
Date: Thursday, August 21, 2008 - 5:28 pm

did you apply my debug patch?
you should get msr print out...

YH
--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 5:29 pm

Lol.  No - I thought you implied it was in the tip tree.  Sigh.  I'll
try again.

-J

--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 6:00 pm

I have applied your patch to the tip tree and rebuilt.  Still no msr
dump.

-J

--
From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 6:10 pm

Ugh - I just realized I forgot to type "-dirty" into grub after
rebuilding the kernel.  Here is the new dmesg with the msr trace.

-J

--

From: Yinghai Lu
Date: Thursday, August 21, 2008 - 6:55 pm

[    0.429971]  MSR00000200: 00000000d0000000
[    0.433305]  MSR00000201: 0000000ff0000800

==> base: 0xd0000000 size: 0x10000000 UC

[    0.436638]  MSR00000202: 00000000e0000000
[    0.439971]  MSR00000203: 0000000fe0000800
==> base: 0xe0000000 size: 0x20000000 UC

[    0.443304]  MSR00000204: 0000000000000006
[    0.446637]  MSR00000205: 0000000c00000800

==> base: 0 size 16G WB

[    0.449970]  MSR00000206: 0000000400000006
[    0.453303]  MSR00000207: 0000000fe0000800

==> base: 16G, size: 512M WB

[    0.456636]  MSR00000208: 0000000420000006
[    0.459970]  MSR00000209: 0000000ff0000800
==> base: 16G+512M, size: 256M WB

[    0.463303]  MSR0000020a: 0000000000000000
[    0.466636]  MSR0000020b: 0000000000000000

[    0.469969]  MSR0000020c: 0000000000000000
[    0.473302]  MSR0000020d: 0000000000000000

it seems right.
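
The decode above can be reproduced with a small userspace sketch. It is not the kernel's code: the struct and function names are made up for illustration, and the physical address width is passed in as an assumption. With 36 bits the first pair decodes to the 256MB shown above; with the 38 bits this CPU reports, the same mask gives the 196864MB that /proc/mtrr printed.

```c
#include <stdint.h>

/* Decoded view of one variable-range MTRR pair (illustrative names). */
struct mtrr_range {
	int      valid;  /* bit 11 of the PhysMask MSR */
	uint64_t base;   /* physical base address in bytes */
	uint64_t size;   /* range size in bytes */
	unsigned type;   /* 0 = uncachable (UC), 6 = write-back (WB) */
};

/*
 * Decode a PhysBase/PhysMask MSR pair, assuming the mask is meant to
 * cover phys_bits of physical address space (phys_bits < 64).
 */
static struct mtrr_range mtrr_decode(uint64_t physbase, uint64_t physmask,
				     unsigned phys_bits)
{
	/* Address bits 12..phys_bits-1; the low 12 bits hold flags. */
	const uint64_t addr_bits = ((1ULL << phys_bits) - 1) & ~0xfffULL;
	struct mtrr_range r = { 0, 0, 0, 0 };

	r.valid = (int)((physmask >> 11) & 1);
	if (!r.valid)
		return r;

	r.type = (unsigned)(physbase & 0xff);
	r.base = physbase & addr_bits;
	/* Every address bit that is clear in the mask counts toward the size. */
	r.size = (~physmask & addr_bits) + 0x1000;
	return r;
}
```

For example, mtrr_decode(0xd0000000, 0xff0000800, 36) yields a size of 0x10000000, while the same call with 38 yields 0x3010000000.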

can you send out /proc/cpuinfo

YH
--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 7:15 pm

See below.  I should also add that this kernel correctly sets up the mtrrs
on an amd system.

--
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           X5482  @ 3.20GHz
stepping        : 6
cpu MHz         : 2400.000
cache size      : 6144 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni
monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips        : 6386.99
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           X5482  @ 3.20GHz
stepping        : 6
cpu MHz         : 2400.000
cache size      : 6144 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni
monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips        : 6386.12
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           X5482  @ 3.20GHz
stepping        : 6
cpu MHz       ...
From: Yinghai Lu
Date: Thursday, August 21, 2008 - 7:26 pm

good, the root cause is that your BIOS does not set the mask correctly...

it should set var mtrr like

[    0.429971]  MSR00000200: 00000000d0000000
[    0.433305]  MSR00000201: 0000000ff0000800
==> [    0.433305]  MSR00000201: 0000003ff0000800

[    0.436638]  MSR00000202: 00000000e0000000
[    0.439971]  MSR00000203: 0000000fe0000800
==> [    0.439971]  MSR00000203: 0000003fe0000800

[    0.443304]  MSR00000204: 0000000000000006
[    0.446637]  MSR00000205: 0000000c00000800
==> [    0.446637]  MSR00000205: 0000003c00000800

[    0.449970]  MSR00000206: 0000000400000006
[    0.453303]  MSR00000207: 0000000fe0000800
==> [    0.453303]  MSR00000207: 0000003fe0000800

[    0.456636]  MSR00000208: 0000000420000006
[    0.459970]  MSR00000209: 0000000ff0000800
==> [    0.459970]  MSR00000209: 0000003ff0000800

you may talk to your BIOS vendor or system vendor to request an
updated BIOS.
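
As a rough illustration of where the corrected values come from, here is a sketch that builds the mask for a power-of-two range. The function name is hypothetical and the address width is an input assumption (38 bits on this CPU according to /proc/cpuinfo); it is not taken from any BIOS or kernel source.

```c
#include <stdint.h>

/*
 * Build the PhysMask MSR value for a power-of-two range of `size` bytes
 * (size >= 4096) on a CPU with `phys_bits` physical address bits (< 64).
 */
static uint64_t mtrr_physmask(uint64_t size, unsigned phys_bits)
{
	/* Address bits 12..phys_bits-1 that the mask is expected to cover. */
	const uint64_t addr_bits = ((1ULL << phys_bits) - 1) & ~0xfffULL;

	return (~(size - 1) & addr_bits) | 0x800; /* bit 11: valid */
}
```

With phys_bits = 36 a 256MB range gives 0xff0000800, which is what the BIOS programmed; with phys_bits = 38 it gives 0x3ff0000800.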

YH
--

From: Yinghai Lu
Date: Thursday, August 21, 2008 - 8:24 pm

or please try attached workaround patch. hope it works.

Ingo,
if it works, we need to push it for 2.6.27

YH
From: Ingo Molnar
Date: Thursday, August 21, 2008 - 8:50 pm

i've tidied up your patch (see the commit below) and have queued it up 
in x86/urgent. It seems fairly safe and i guess we can push it to 
v2.6.27 if Joshua reports test success. Joshua, could you give it a go 
please?

	Ingo

-------------->
From 38cc1c3df77c1bb739a4766788eb9fa49f16ffdf Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Thu, 21 Aug 2008 20:24:24 -0700
Subject: [PATCH] x86: work around MTRR mask setting

Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
usable. Booting with mtrr_show showed us the BIOS-initialized
MTRR settings - which are all wrong.


So detect this borkage and add the prefix 111.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/mtrr/generic.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 509bd3d..43102e0 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -379,6 +379,7 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
 			     unsigned long *size, mtrr_type *type)
 {
 	unsigned int mask_lo, mask_hi, base_lo, base_hi;
+	unsigned int tmp, hi;
 
 	rdmsr(MTRRphysMask_MSR(reg), mask_lo, mask_hi);
 	if ((mask_lo & 0x800) == 0) {
@@ -392,8 +393,18 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
 	rdmsr(MTRRphysBase_MSR(reg), base_lo, base_hi);
 
 	/* Work out the shifted address mask. */
-	mask_lo = size_or_mask | mask_hi << (32 - PAGE_SHIFT)
-	    | mask_lo >> PAGE_SHIFT;
+	tmp = mask_hi << (32 - PAGE_SHIFT) | mask_lo >> PAGE_SHIFT;
+	mask_lo = size_or_mask | tmp;
+	/* Expand tmp with high bits to all 1s*/
+	hi = fls(tmp);
+	if (hi > 0) {
+		tmp |= ~((1<<(hi - 1)) - 1);
+
+		if (tmp != mask_lo) {
+			WARN_ON("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+			mask_lo = ...
From: Ingo Molnar
Date: Thursday, August 21, 2008 - 8:56 pm

or just try the latest tip/master - please check whether you are getting 

	Ingo
--

From: Yinghai Lu
Date: Thursday, August 21, 2008 - 9:48 pm

according to his dmesg, it works.

may need to change the WARN_ON so it only prints one time

YH
--

From: Joshua Hoblitt
Date: Friday, August 22, 2008 - 5:22 pm

I've confirmed that the boards in these systems are Tyan Tempest i5400PW
(S5397)s.  We've discovered a workload that will deadlock the system
under both 2.6.24.2 and -tip kernel with the mtrr masking patch.  The
only thing unusual about this workload is that one of the binaries in
it constantly segvs...  Is it possible that these deadlocks (no kernel
oops on console) are caused by MSR setup weirdness or is it likely unrelated?

-J

--

From: Yinghai Lu
Date: Friday, August 22, 2008 - 10:52 pm

could be other problem.

the cpu should be smart enough to understand the missing bits in the mask,
at least an amd cpu. remember that we didn't set the mask bits to 40 bits on
opteron with LinuxBIOS, and everything still worked well.

YH
--

From: Ingo Molnar
Date: Saturday, August 23, 2008 - 3:43 am

yeah. Is the deadlock debuggable? (does nmi_watchdog=1 produce anything 
useful, or does the enabling of CONFIG_PROVE_LOCKING=y show anything 
weird in the syslog during light, non-deadlocking use of this workload?)

	Ingo
--

From: Joshua Hoblitt
Date: Tuesday, August 26, 2008 - 1:35 am

Enabling the nmi_watchdog doesn't produce anything at all (I double
checked the .config... it should be working).  Rebuilding with
PROVE_LOCKING seems to have prevented the deadlock.  It used to take
30-45 mins to lock the system up under heavy load and we're going on 6
hours here with no issues.  Absolutely nothing in the dmesg.  Ugh.  Any
other suggestions?  How bad is it to leave PROVE_LOCKING enabled?

-J

--

From: Ingo Molnar
Date: Tuesday, August 26, 2008 - 1:42 am

do the NMI counts in /proc/interrupts increase about once per second, on 
every CPU? Do you wait for the deadlock on a text (VGA) console, to make 
sure you see any NMI watchdog printout?
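
One way to script that check is sketched below. It assumes the NMI line of /proc/interrupts has the form "NMI:" followed by one counter per CPU and a trailing description; the function names are hypothetical. Take two samples a few seconds apart and compare them.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse the per-CPU counters from an "NMI:" line of /proc/interrupts.
 * Returns the number of counters stored in counts[], or -1 if the line
 * is not an NMI line.
 */
static int parse_nmi_line(const char *line, unsigned long *counts, int max)
{
	const char *p = line;
	char *end;
	int n = 0;

	while (*p == ' ' || *p == '\t')
		p++;
	if (strncmp(p, "NMI:", 4) != 0)
		return -1;
	p += 4;

	while (n < max) {
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)	/* no more numbers: trailing description */
			break;
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Return 1 if every CPU's counter advanced between two samples. */
static int all_cpus_advanced(const unsigned long *before,
			     const unsigned long *after, int ncpus)
{
	for (int i = 0; i < ncpus; i++)
		if (after[i] <= before[i])
			return 0;
	return ncpus > 0;
}
```

A CPU whose counter stays flat between the two samples is one the watchdog is not ticking on.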

	Ingo
--

From: Joshua Hoblitt
Date: Monday, August 25, 2008 - 2:43 pm

I re-pulled and rebuilt the tip tree this morning.  I can confirm that the
show_msr patch is working and the mtrr masking patch is now only printing a
single warning in the dmesg.  The dmesg is attached.

As per usual, the kernel folks provide the best support of any software
in history.  Thanks guys!

-J

--
From: Yinghai Lu
Date: Thursday, August 21, 2008 - 11:16 pm

can you change WARN_ON to WARN_ON_ONCE ?

YH
--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 11:24 pm

the commit below does that. Note that the condition is 
WARN_ON(condition) or WARN(string) - WARN_ON(string) will just print a 
kernel stack unconditionally. Unfortunately there's no WARN_ONCE(). 
(Arjan?)
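
For illustration only, a print-once helper of the kind wished for here could look like the userspace sketch below. PRINT_ONCE, print_once_hits and check_mask are hypothetical names, not kernel interfaces, and the sketch is not thread-safe.

```c
#include <stdio.h>

/* Number of messages actually emitted; only here so the effect is observable. */
static int print_once_hits;

/* Each expansion gets its own static flag, so every call site prints at most once. */
#define PRINT_ONCE(...)					\
	do {						\
		static int printed_;			\
		if (!printed_) {			\
			printed_ = 1;			\
			print_once_hits++;		\
			fprintf(stderr, __VA_ARGS__);	\
		}					\
	} while (0)

/* Hypothetical caller: reports a bad mask, but only on the first occurrence. */
static void check_mask(unsigned int tmp, unsigned int mask_lo)
{
	if (tmp != mask_lo)
		PRINT_ONCE("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
}
```

Calling check_mask() with mismatching values any number of times emits the message once, because the flag lives at the call site inside check_mask().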

	Ingo

---------->
From 1c8aa33e17dc4aa68b329d262fff253648a98adb Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 22 Aug 2008 08:22:23 +0200
Subject: [PATCH] x86: work around MTRR mask setting, v2

improve the debug printout:

- make it actually display something
- print it only once

would be nice to have a WARN_ONCE() facility, to feed such things to
kerneloops.org.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/mtrr/generic.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 43102e0..cb7d3b6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -401,7 +401,12 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
 		tmp |= ~((1<<(hi - 1)) - 1);
 
 		if (tmp != mask_lo) {
-			WARN_ON("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+			static int once = 1;
+
+			if (once) {
+				printk(KERN_INFO "mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+				once = 0;
+			}
 			mask_lo = tmp;
 		}
 	}
--
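
The mask fix-up that these two patches add to generic_get_mtrr() can be tried outside the kernel with the sketch below. It mirrors the logic visible in the diffs, but fls32() is a portable stand-in for the kernel's fls(), mtrr_fixup_mask() is a hypothetical wrapper, and the size_or_mask value used in the example (0xfc000000) assumes 38-bit physical addresses and 4KB pages.

```c
#include <stdint.h>

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * tmp is the page-shifted mask read from the MSR; size_or_mask has all
 * bits above the CPU's physical address width set. Returns the mask to
 * use, with any gap of zero bits above tmp's highest set bit filled
 * with ones.
 */
static uint32_t mtrr_fixup_mask(uint32_t tmp, uint32_t size_or_mask)
{
	uint32_t mask_lo = size_or_mask | tmp;
	int hi = fls32(tmp);

	if (hi > 0) {
		/* Set every bit from the highest set bit of tmp upwards. */
		tmp |= ~((1u << (hi - 1)) - 1);
		if (tmp != mask_lo)
			mask_lo = tmp;	/* BIOS left a hole in the mask */
	}
	return mask_lo;
}
```

The BIOS value 0x0000000ff0000800 shifts down to tmp = 0x00ff0000; OR-ing in size_or_mask alone would give 0xfcff0000, while the fix-up returns 0xffff0000, which corresponds to 0x10000 pages, i.e. 256MB.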

From: Arjan van de Ven
Date: Saturday, August 23, 2008 - 4:53 pm

On Fri, 22 Aug 2008 08:24:59 +0200

Andrew removed that from the patches as "unused" :-(
oh well easy to add back ;-)


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Ingo Molnar
Date: Monday, August 25, 2008 - 2:17 am

please send a patch, with at least one user :-)

	Ingo
--

From: Joshua Hoblitt
Date: Thursday, August 21, 2008 - 8:26 pm

Thank you for diagnosing the problem so quickly!  I think those systems
(we have 16 in that set) are Tyan S539Xs, where X is a 6 or 7.  I'll
have to double check the exact model.

    http://www.tyan.com/product_board_detail.aspx?pid=562

I'll complain to Tyan about it during business hours tomorrow.  Any idea
as to why this issue didn't arise with earlier kernels?  I've had Tyan
technical support try to tell me to down rev. kernels in the past
instead of escalating the issue.  The more information I can give them
the better.

-J

--
