(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 20 Aug 2008 17:38:59 -0700 (PDT) Looks like a post-2.6.26 regression caused by 12031a624af7816ec7660b82be648aa3703b4ebe. --
On Wed, Aug 20, 2008 at 6:04 PM, Andrew Morton
reg00: base=0xd0000000 (3328MB), size=196864MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size=197120MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=212992MB: write-back, count=1
reg03: base=0x400000000 (16384MB), size=197120MB: write-back, count=1
reg04: base=0x420000000 (16896MB), size=196864MB: write-back, count=1
the MTRR sizes look crazy. YH --
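The sizes above can be reproduced if one assumes each size is computed as 2^phys_bits minus the PHYSMASK address bits. This Xeon reports 38 physical address bits (see the cpuinfo later in the thread), while the BIOS only filled the mask up to bit 35. A minimal userspace sketch of that arithmetic; `mtrr_size_from_mask()` is a hypothetical helper, not a kernel function, and the raw mask values are the MSR00000201/203/205 entries dumped later in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes covered by a variable-range MTRR whose
 * PHYSMASK MSR holds `physmask`, on a CPU with `phys_bits` physical
 * address bits. Only bits phys_bits-1..12 of the mask are address bits;
 * the size is 2^phys_bits minus that mask (two's complement within the
 * width). A mask with no address bits set yields 0 here. */
static uint64_t mtrr_size_from_mask(uint64_t physmask, unsigned phys_bits)
{
    uint64_t width, mask;

    assert(phys_bits > 12 && phys_bits < 64);
    width = (1ULL << phys_bits) - 1;      /* all valid address bits */
    mask = physmask & width & ~0xfffULL;  /* drop V bit and bits 11..0 */
    return (~mask + 1) & width;
}
```

With phys_bits = 38 the three BIOS masks give 196864MB, 197120MB and 212992MB, the sizes in the listing; with phys_bits = 36 the same masks give 256MB, 512MB and 16384MB.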
please apply the attached patch and boot with show_msr=1 to dump the MSRs (including the MTRRs). YH
looks rather useful - added it to tip/x86/debug. Ingo --
fails to build with the attached config: arch/x86/kernel/cpu/common_64.c: In function
that was one tool to verify whether the BIOS does the right thing with some special bits. It seems it doesn't compile when Xen etc. is enabled in the config. YH --
yeah - but it would be nice to fix it, as it's a useful diagnostic patch. If people have similar problems in the future they can boot their distro kernels with show_msr=x to get an MTRR dump. Ingo --
Both hunks in the patch applied with an offset of -36, but the show_msr flag doesn't seem to have any effect. The dmesg is still the same number of lines according to wc. I'm attaching the new dmesg anyway.
$ cat /proc/cmdline
root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc show_msr=1 console=tty0 console=ttyS0,115200n8
-J --
can you check that with tip/master? http://people.redhat.com/mingo/tip.git/readme.txt YH --
I'm doing that now. Note that I was pulling from netdev-2.6 before. -J -- --
The dmesg from the tip kernel is attached, and it's at least larger than the last build's dmesg.
$ wc -l *.dmesg
683 2.6.27-r3.dmesg
829 2.6.27-rc4-tip.dmesg
-J --
did you apply my debug patch? you should get msr print out... YH --
Lol. No - I thought you implied it was in the tip tree. Sigh. I'll try again. -J -- --
I have applied your patch to the tip tree and rebuilt. Still no msr dump. -J --
Ugh - I just realized I forgot to type "-dirty" into grub after rebuilding the kernel. Here is the new dmesg with the msr trace. -J --
[ 0.429971] MSR00000200: 00000000d0000000
[ 0.433305] MSR00000201: 0000000ff0000800 ==> base: 0xd0000000 size: 0x10000000 UC
[ 0.436638] MSR00000202: 00000000e0000000
[ 0.439971] MSR00000203: 0000000fe0000800 ==> base: 0xe0000000 size: 0x20000000 UC
[ 0.443304] MSR00000204: 0000000000000006
[ 0.446637] MSR00000205: 0000000c00000800 ==> base: 0 size 16G WB
[ 0.449970] MSR00000206: 0000000400000006
[ 0.453303] MSR00000207: 0000000fe0000800 ==> base: 16G, size: 512M WB
[ 0.456636] MSR00000208: 0000000420000006
[ 0.459970] MSR00000209: 0000000ff0000800 ==> base: 16G+512M, size 256M WB
[ 0.463303] MSR0000020a: 0000000000000000
[ 0.466636] MSR0000020b: 0000000000000000
[ 0.469969] MSR0000020c: 0000000000000000
[ 0.473302] MSR0000020d: 0000000000000000
it seems right. Can you send out /proc/cpuinfo? YH --
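The base/mask pairs in this dump can be decoded field by field. Bits 7:0 of PHYSBASE hold the memory type (0 = UC, 6 = WB), bit 11 of PHYSMASK is the valid bit, and the size is the two's complement of the mask within the address width. The sketch below assumes a 36-bit width, which appears to be what these masks were written for; `decode_var_range()` and `mtrr_type_name()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings found in the low byte of PHYSBASE. */
static const char *mtrr_type_name(unsigned type)
{
    switch (type) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    default: return "??";
    }
}

struct var_range {
    uint64_t base;  /* physical base address in bytes */
    uint64_t size;  /* length in bytes */
    unsigned type;  /* memory type from PHYSBASE bits 7:0 */
    int valid;      /* PHYSMASK bit 11 (V) */
};

/* Decode one PHYSBASE/PHYSMASK pair for a CPU with phys_bits address bits. */
static struct var_range decode_var_range(uint64_t physbase, uint64_t physmask,
                                         unsigned phys_bits)
{
    struct var_range r;
    uint64_t width, mask;

    assert(phys_bits > 12 && phys_bits < 64);
    width = (1ULL << phys_bits) - 1;
    mask = physmask & width & ~0xfffULL;

    r.base = physbase & width & ~0xfffULL;
    r.size = (~mask + 1) & width;   /* 2^phys_bits - mask */
    r.type = (unsigned)(physbase & 0xff);
    r.valid = (int)((physmask >> 11) & 1);
    return r;
}
```

Decoded this way the pairs give a 256MB UC range at 0xd0000000, a 512MB UC range at 0xe0000000, 16GB WB at 0, 512MB WB at 16GB and 256MB WB at 16GB+512MB.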
See below. I should also add that this kernel correctly sets up the MTRRs on an AMD system. --
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
stepping : 6
cpu MHz : 2400.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 6386.99
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
stepping : 6
cpu MHz : 2400.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 6386.12
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
stepping : 6
cpu MHz ...
good, the root cause is that your BIOS does not set the masks correctly... it should set the variable MTRRs like this:
[ 0.429971] MSR00000200: 00000000d0000000
[ 0.433305] MSR00000201: 0000000ff0000800 ==> [ 0.433305] MSR00000201: 0000003ff0000800
[ 0.436638] MSR00000202: 00000000e0000000
[ 0.439971] MSR00000203: 0000000fe0000800 ==> [ 0.439971] MSR00000203: 0000003fe0000800
[ 0.443304] MSR00000204: 0000000000000006
[ 0.446637] MSR00000205: 0000000c00000800 ==> [ 0.446637] MSR00000205: 0000003c00000800
[ 0.449970] MSR00000206: 0000000400000006
[ 0.453303] MSR00000207: 0000000fe0000800 ==> [ 0.453303] MSR00000207: 0000003fe0000800
[ 0.456636] MSR00000208: 0000000420000006
[ 0.459970] MSR00000209: 0000000ff0000800 ==> [ 0.459970] MSR00000209: 0000003ff0000800
You may want to ask your BIOS vendor or system vendor for an updated BIOS. YH --
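The corrected values differ from the BIOS ones only in the mask's high bits: every bit from the highest one the BIOS set up to bit 37 (the CPU reports 38 physical address bits) becomes 1. A small sketch of that transformation, assuming the BIOS mask is otherwise contiguous; `extend_mtrr_mask()` is a hypothetical helper, not something taken from the kernel or the BIOS:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return `physmask` with every address bit from the
 * highest bit the BIOS set up to bit (phys_bits - 1) forced to 1. The low
 * 12 bits (including the V bit, bit 11) are passed through unchanged. */
static uint64_t extend_mtrr_mask(uint64_t physmask, unsigned phys_bits)
{
    uint64_t width, addr;
    int top;

    assert(phys_bits > 12 && phys_bits < 64);
    width = (1ULL << phys_bits) - 1;
    addr = physmask & width & ~0xfffULL;
    if (addr == 0)
        return physmask;            /* no address bits: nothing to extend */

    top = (int)phys_bits - 1;
    while (!(addr & (1ULL << top)))
        top--;                      /* locate the highest set mask bit */

    /* Set bits top..phys_bits-1, i.e. all in-width bits from `top` up. */
    addr |= width & ~((1ULL << top) - 1);
    return addr | (physmask & 0xfffULL);
}
```

Applied with phys_bits = 38, this turns 0x0000000ff0000800 into 0x0000003ff0000800 and 0x0000000c00000800 into 0x0000003c00000800, matching the values listed above, and leaves an already-correct mask unchanged.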
Or please try the attached workaround patch; hopefully it works. Ingo, if it works, we need to push it for 2.6.27. YH
i've tidied up your patch (see the commit below) and have queued it up
in x86/urgent. It seems fairly safe and i guess we can push it to
v2.6.27 if Joshua reports test success. Joshua, could you give it a go
please?
Ingo
-------------->
From 38cc1c3df77c1bb739a4766788eb9fa49f16ffdf Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Thu, 21 Aug 2008 20:24:24 -0700
Subject: [PATCH] x86: work around MTRR mask setting
Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
usable. Booting with mtrr_show showed us the BIOS-initialized
MTRR settings - which are all wrong.
So detect this borkage and add the prefix 111.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/mtrr/generic.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 509bd3d..43102e0 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -379,6 +379,7 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
unsigned long *size, mtrr_type *type)
{
unsigned int mask_lo, mask_hi, base_lo, base_hi;
+ unsigned int tmp, hi;
rdmsr(MTRRphysMask_MSR(reg), mask_lo, mask_hi);
if ((mask_lo & 0x800) == 0) {
@@ -392,8 +393,18 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
rdmsr(MTRRphysBase_MSR(reg), base_lo, base_hi);
/* Work out the shifted address mask. */
- mask_lo = size_or_mask | mask_hi << (32 - PAGE_SHIFT)
- | mask_lo >> PAGE_SHIFT;
+ tmp = mask_hi << (32 - PAGE_SHIFT) | mask_lo >> PAGE_SHIFT;
+ mask_lo = size_or_mask | tmp;
+ /* Expand tmp with high bits to all 1s*/
+ hi = fls(tmp);
+ if (hi > 0) {
+ tmp |= ~((1<<(hi - 1)) - 1);
+
+ if (tmp != mask_lo) {
+ WARN_ON("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+ mask_lo = ...
or just try the latest tip/master - please check whether you are getting
Ingo --
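The fix-up in the patch above works on the mask in page-frame units (the MSR value shifted right by PAGE_SHIFT). The sketch below re-creates that arithmetic in userspace so it can be checked against the MSR00000201 value from Joshua's dump. Assumptions: 4 KiB pages; `size_or_mask` derived as all ones above the CPU's physical width (its real computation is not shown in this thread); a portable `fls_u32()` in place of the kernel's `fls()`; and a final `mask_lo = tmp` step as in the v2 patch further down, since this hunk is cut off. `fixup_mask_pfn()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Portable stand-in for the kernel's fls(): 1-based position of the most
 * significant set bit, or 0 when x == 0. */
static int fls_u32(uint32_t x)
{
    int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Userspace mirror of the patched mask handling. mask_lo/mask_hi are the
 * two halves of the PHYSMASK MSR as rdmsr() returns them; the result is
 * the mask in page-frame units, so the size in pages is (0u - result).
 * *fixed is set when the BIOS value had to be expanded. */
static uint32_t fixup_mask_pfn(uint32_t mask_lo, uint32_t mask_hi,
                               unsigned phys_bits, int *fixed)
{
    assert(phys_bits > SKETCH_PAGE_SHIFT && phys_bits <= 44);

    /* Assumed derivation of size_or_mask: ones above the CPU's width. */
    uint32_t size_or_mask =
        (uint32_t)~((1ULL << (phys_bits - SKETCH_PAGE_SHIFT)) - 1);
    uint32_t tmp = mask_hi << (32 - SKETCH_PAGE_SHIFT) |
                   mask_lo >> SKETCH_PAGE_SHIFT;
    uint32_t result = size_or_mask | tmp;
    int hi = fls_u32(tmp);

    *fixed = 0;
    if (hi > 0) {
        /* Expand tmp so that every bit above its top set bit is 1. */
        tmp |= ~((1u << (hi - 1)) - 1);
        if (tmp != result) {
            *fixed = 1;
            result = tmp;
        }
    }
    return result;
}
```

For mask_hi = 0xf and mask_lo = 0xf0000800 on a 38-bit CPU, the BIOS-derived mask is 0xfcff0000, the expanded one is 0xffff0000, and the resulting size is 0x10000 pages, i.e. 256MB. A mask that already covers bits up to 37 (mask_hi = 0x3f) is left alone.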
according to his dmesg, it works. We may need to change the WARN_ON so that it prints only once. YH --
I've confirmed that the boards in these systems are Tyan Tempest i5400PW (S5397)s. We've discovered a workload that will deadlock the system under both 2.6.24.2 and the -tip kernel with the MTRR masking patch. The only unusual thing about this workload is that one of its binaries constantly segfaults... Is it possible that these deadlocks (no kernel oops on the console) are caused by the MSR setup weirdness, or are they likely unrelated? -J -- --
could be another problem. The CPU should be smart enough to cope with the missing bits in the mask, at least AMD CPUs are. Remember that we didn't set the mask bits to 40 bits on Opteron with LinuxBIOS, and everything still worked well. YH --
yeah. Is the deadlock debuggable? (does nmi_watchdog=1 produce anything useful, or does the enabling of CONFIG_PROVE_LOCKING=y show anything weird in the syslog during light, non-deadlocking use of this workload?) Ingo --
Enabling the nmi_watchdog doesn't produce anything at all (I double checked the .config... it should be working). Rebuilding with PROVE_LOCKING seems to have prevented the deadlock. It used to take 30-45 mins to lock the system up under heavy load and we're going on 6 hours here with no issues. Absolutely nothing in the dmesg. Ugh. Any other suggestions? How bad is it to leave PROVE_LOCKING enabled? -J -- --
do the NMI counts in /proc/interrupts increase about once per second, on every CPU? Do you wait for the deadlock on a text (VGA) console, to make sure you see any NMI watchdog printout? Ingo --
I re-pulled and rebuilt the tip tree this morning. I can confirm that the show_msr patch is working and that the MTRR masking patch now prints only a single warning in the dmesg. The dmesg is attached. As usual, the kernel folks provide the best support of any software in history. Thanks guys! -J --
can you change WARN_ON to WARN_ON_ONCE ? YH --
the commit below does that. Note that the usage is
WARN_ON(condition) or WARN(condition, format...) - WARN_ON(string) will
just print a kernel stack unconditionally. Unfortunately there's no WARN_ONCE().
(Arjan?)
Ingo
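A userspace illustration of the pitfall. `SKETCH_WARN_ON()` below is a hypothetical stand-in that only imitates the relevant property of WARN_ON(): it fires whenever its argument is true. Passing a string literal makes the argument a non-NULL pointer, so it fires on every call and the text itself is never shown:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical imitation of a WARN_ON()-style macro: if `cond` is true it
 * reports the location and evaluates to 1, otherwise 0. Not the kernel's
 * implementation, only enough to show the pitfall. */
#define SKETCH_WARN_ON(cond)                                                  \
    ((cond) ? (fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__), 1) \
            : 0)

/* Misuse: a string literal is a non-NULL pointer, so the condition is
 * always true and the message text is never printed. */
static int warn_with_string(void)
{
    return SKETCH_WARN_ON("mtrr: your BIOS has set up an incorrect mask");
}

/* Intended use: pass the condition actually being checked. */
static int warn_with_condition(unsigned int tmp, unsigned int mask_lo)
{
    return SKETCH_WARN_ON(tmp != mask_lo);
}
```

Compilers may even flag the first form, since the address of a string literal can never be NULL.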
---------->
From 1c8aa33e17dc4aa68b329d262fff253648a98adb Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 22 Aug 2008 08:22:23 +0200
Subject: [PATCH] x86: work around MTRR mask setting, v2
improve the debug printout:
- make it actually display something
- print it only once
would be nice to have a WARN_ONCE() facility, to feed such things to
kerneloops.org.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/mtrr/generic.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 43102e0..cb7d3b6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -401,7 +401,12 @@ static void generic_get_mtrr(unsigned int reg, unsigned long *base,
tmp |= ~((1<<(hi - 1)) - 1);
if (tmp != mask_lo) {
- WARN_ON("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+ static int once = 1;
+
+ if (once) {
+ printk(KERN_INFO "mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
+ once = 0;
+ }
mask_lo = tmp;
}
}
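The v2 fix relies on a function-scope `static` flag that keeps its value across calls, so the notice appears once no matter how many times a bad mask is seen. A userspace sketch of the same pattern; `fixup_mask_report_once()` is a hypothetical name and the mask values are only examples:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Replace *mask_lo with the expanded mask when they differ, but print the
 * notice only on the first such call. Returns 1 if it printed this time. */
static int fixup_mask_report_once(uint32_t *mask_lo, uint32_t tmp)
{
    static int once = 1;   /* same flag pattern as the patch */
    int printed = 0;

    assert(mask_lo != NULL);
    if (tmp != *mask_lo) {
        if (once) {
            printf("mtrr: your BIOS has set up an incorrect mask, fixing it up.\n");
            once = 0;
            printed = 1;
        }
        *mask_lo = tmp;    /* the fix-up itself happens every time */
    }
    return printed;
}
```

Like the patch, the flag is not synchronized, so concurrent callers could in principle both print; that is harmless for a one-line diagnostic.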
--
On Fri, 22 Aug 2008 08:24:59 +0200 Andrew removed that from the patches as "unused" :-( oh well easy to add back ;-) -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
please send a patch, with at least one user :-) Ingo --
Thank you for diagnosing the problem so quickly! I think those systems
(we have 16 in that set) are Tyan S539Xs, where X is a 6 or 7. I'll
have to double check the exact model.
http://www.tyan.com/product_board_detail.aspx?pid=562
I'll complain to Tyan about it during business hours tomorrow. Any idea
as to why this issue didn't arise with earlier kernels? I've had Tyan
technical support try to tell me to down-rev kernels in the past
instead of escalating the issue. The more information I can give them
the better.
-J
--
