When booting a current x86.git kernel under kvm, I get this:
(qemu) Linux version 2.6.24-rc8 (jeremy@ezr) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1928 SMP PREEMPT Thu Jan 24 17:09:04 PST 2008
early_ioremap_init()
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
BIOS-e820: 000000001fff0000 - 0000000020000000 (ACPI data)
BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
console [earlyser0] enabled
0MB HIGHMEM available.
511MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
Scan SMP from c009fc00 for 1024 bytes.
***************
**** WARNING: likely BIOS bug
**** MTRRs don't cover all of memory, trimmed 131056 pages
***************
update e820 for mtrr
modified physical RAM map:
modified: 0000000000000000 - 000000001fff0000 (reserved)
modified: 000000001fff0000 - 0000000020000000 (ACPI data)
modified: 00000000fffc0000 - 0000000100000000 (reserved)
highmem size specified (0MB) is bigger than pages available (0MB)!.
0MB HIGHMEM available.
0MB LOWMEM available.
BUG: Int 6: CR2 00000000
EDI c055de40 ESI c04f7bec EBP c04cff68 ESP c04cff4c
EBX 00100000 EDX 00000006 ECX 0047efff EAX 00000100
err 00000000 EIP c04e5c7d CS 00000060 flg 00010006
Stack: 00000000 c04f7bec c051eb0c c04cff70 c04e5cd1 c04cff80 c04da360 00000010
c04f7bec c04cff98 c04da4e5 c042ac8d 00000000 c042ac70 00000000 c04cffd0
c04da7c0 c04cffe8 c04cb680 c051e920 c051e8e0 00000800 00099800 c04c1000
��Pid: 0, comm: swapper Not tainted 2.6.24-rc8 #1928
[<c04e5c7d>] ? reserve_bootmem_core+0x1e/0x61
[<c04e5cd1>] ? reserve_bootmem+0x11/0x13
[<c04da360>] ? setup_bootmem_allocator+0x42/0x141
[<c04da4e5>] ? setup_memory+0x86/0x8d
[<c04da7c0>] ? ...Looks like the code doesn't check that the CPU *has* MTRRs... -hpa --
on 32 bit, max_pfn is supposed to be 0? will look at it. YH --
please try this
[PATCH] x86: trim RAM need to check if mtrr is there
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index a1551d0..a0b6f55 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -646,9 +646,6 @@ static __init int amd_special_default_mtrr(unsigned long end_pfn)
{
u32 l, h;
- /* Doesn't apply to memory < 4GB */
- if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
- return 0;
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return 0;
if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -682,6 +679,12 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
mtrr_type type;
u64 trim_start, trim_size;
+ /* Doesn't apply to memory < 4GB */
+ if (end_pfn <= (0xffffffffUL >> PAGE_SHIFT))
+ return 0;
+
+ if (!cpu_has_mtrr)
+ return 0;
/*
* Make sure we only trim uncachable memory on machines that
* support the Intel MTRR architecture:
--
I don't think this makes sense to move outside the AMD workaround because the normal MTRRs don't have an implicit hidden 4GB semantics. -Andi --
so check it mtrr is there, also check if mem less 4G and is AMD as early
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -642,13 +642,10 @@ early_param("disable_mtrr_trim", disable
#define Tom2Enabled (1U << 21)
#define Tom2ForceMemTypeWB (1U << 22)
-static __init int amd_special_default_mtrr(unsigned long end_pfn)
+static __init int amd_special_default_mtrr(void)
{
u32 l, h;
- /* Doesn't apply to memory < 4GB */
- if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
- return 0;
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return 0;
if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -682,6 +679,13 @@ int __init mtrr_trim_uncached_memory(uns
mtrr_type type;
u64 trim_start, trim_size;
+ if (!cpu_has_mtrr)
+ return 0;
+
+ /* Doesn't apply to memory < 4GB */
+ if (end_pfn <= (0xffffffffUL >> PAGE_SHIFT))
+ return 0;
+
/*
* Make sure we only trim uncachable memory on machines that
* support the Intel MTRR architecture:
@@ -691,6 +695,9 @@ int __init mtrr_trim_uncached_memory(uns
if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)
return 0;
+ if (amd_special_default_mtrr())
+ return 0;
+
/* Find highest cached pfn */
for (i = 0; i < num_var_ranges; i++) {
mtrr_if->get(i, &base, &size, &type);
@@ -702,9 +709,6 @@ int __init mtrr_trim_uncached_memory(uns
highest_addr = base + size;
}
- if (amd_special_default_mtrr(end_pfn))
- return 0;
-
if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
printk(KERN_WARNING "***************\n");
printk(KERN_WARNING "**** WARNING: likely BIOS bug\n");
--
Why the check for < 4 GB? The same thing applies to memory below the 4 GB limit -- in fact, we've had a number of that kind of systems in the past. -hpa --
Thanks, this gets me to usermode under kvm.
J
--
i think the is 0xffffffff to 0xffffffffUL make the difference. the cpu mode in KVM/qemu may cpu_has_mtrr, but doesn't really have return mtrr msr with correct value. is_cpu(INTEL) already make sure we have mtrr_if got assigned and cpu_has_mtrr. may need to fix qemu instead... Can you try post /proc/mtrr for your guest in kvm/qemu? Thanks YH --
jeremy, can you try v3 patch? Thanks YH --
no ... lets not do non-sensical trimming of RAM, ok? Emit a warning but never trim all of RAM and make the system unbootable. Trimmed RAM is something that users can pester board/BIOS vendors with. Non-booting kernels is something _we_ get pestered with ;-) Ingo --
*And* let's push a fix to Qemu/KVM as appropriate. Last I checked Qemu never even turns caching on in %cr0, never mind gets the MTRRs right. If it's advertising MTRRs, this is a problem. -hpa --
so more strict check if mtrr is there really.
bail out if mtrr all blank when qemu cpu model is used
and check if is AMD as early
also remove 4G less check, according to hpa.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -642,13 +642,10 @@ early_param("disable_mtrr_trim", disable
#define Tom2Enabled (1U << 21)
#define Tom2ForceMemTypeWB (1U << 22)
-static __init int amd_special_default_mtrr(unsigned long end_pfn)
+static __init int amd_special_default_mtrr(void)
{
u32 l, h;
- /* Doesn't apply to memory < 4GB */
- if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
- return 0;
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return 0;
if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -686,9 +683,14 @@ int __init mtrr_trim_uncached_memory(uns
* Make sure we only trim uncachable memory on machines that
* support the Intel MTRR architecture:
*/
+ if (!is_cpu(INTEL) || disable_mtrr_trim)
+ return 0;
rdmsr(MTRRdefType_MSR, def, dummy);
def &= 0xff;
- if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)
+ if (def != MTRR_TYPE_UNCACHABLE)
+ return 0;
+
+ if (amd_special_default_mtrr())
return 0;
/* Find highest cached pfn */
@@ -702,8 +704,14 @@ int __init mtrr_trim_uncached_memory(uns
highest_addr = base + size;
}
- if (amd_special_default_mtrr(end_pfn))
+ /* kvm/qemu doesn't have mtrr set right, don't trim them all */
+ if (!highest_addr) {
+ printk(KERN_WARNING "***************\n");
+ printk(KERN_WARNING "**** WARNING: likely strange cpu\n");
+ printk(KERN_WARNING "**** MTRRs all blank, cpu in qemu?\n");
+ printk(KERN_WARNING "***************\n");
return 0;
+ }
if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
printk(KERN_WARNING "***************\n");
--
thanks, applied. Shouldnt we put in an exception for when there is MTRR support, but they dont cover anything. Still emit a warning - but booting up real slow is still better than losing all of RAM and crashing ... i also updated the messages, they now go like this: WARNING: strange, CPU MTRRs all blank? and: WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 45MB of RAM. Ingo --
The problem is re-occuring for me with current x86.git. Looks like v2
did the trick, and v3 is broken...
J
--
which git head is that? I'm pushing out the queue with v3 included this very minute, so i doubt you can have tested that already! :-) Ingo --
I was referring to:
commit 6a4544a9c8b54b82893044cb53695502cc386f00
Author: Yinghai Lu <Yinghai.Lu@Sun.COM>
Date: Fri Jan 25 00:15:29 2008 +0100
x86_32: trim memory by updating e820 v3
but you've answered the question...
J
--
yeah, i guess what i've pushed out just now is v3++. Does it work for you? Ingo --
Yes, it does. And for the record: [root@qemu ~]# cat /proc/mtrr [root@qemu ~]# J --
so the boot msg say: WARNING: strange, CPU MTRRs all blank? YH --
Yes. WARNING: strange, CPU MTRRs all blank? ------------[ cut here ]------------ WARNING: at /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/cpu/mtrr/main.c:710 mtrr_trim_uncached_memory+0xf8/0x17b() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.24 #1938 [<c012804c>] warn_on_slowpath+0x41/0x51 [<c01285d4>] ? __call_console_drivers+0x4e/0x5a [<c03c3e85>] ? _spin_unlock_irqrestore+0xf/0x2f [<c03c3e91>] ? _spin_unlock_irqrestore+0x1b/0x2f [<c012880c>] ? release_console_sem+0x1a0/0x1a8 [<c0128c1f>] ? vprintk+0x294/0x2ee [<c0128c2b>] ? vprintk+0x2a0/0x2ee [<c04e811b>] ? __alloc_bootmem_core+0x100/0x27a [<c0128c8e>] ? printk+0x15/0x17 [<c0128c8e>] ? printk+0x15/0x17 [<c04de51e>] mtrr_trim_uncached_memory+0xf8/0x17b [<c04de7ae>] ? mtrr_bp_init+0x20d/0x215 [<c04dc79b>] setup_arch+0x283/0x3c2 [<c0128c8e>] ? printk+0x15/0x17 [<c04d6689>] start_kernel+0x60/0x311 ======================= J --
so the ram size less 4g is way out for your case. again, can you post /proc/mtrrs with v2 patch? Thanks YH --
It was empty. But ignore this report; it wasn't against the right
version of x86.git.
J
--
Though I don't see this form of message; have you pushed your changes
out to the public x86.git#mm tree?
J
--
it's available ... 3 .. 2 ... 1 ... NOW! ;-) Ingo --
The logic we ideally would like to have is something like this: find _any_ RAM that is not mapped via any MTRRs (be that special MTRR extensions like Tom2) and clear that from the e820 maps. not just 'end of RAM'. And in that sense the amd_special_default_mtrr() check is wrong, because it just checks that Tom2 is set and then does no other checking. And the original MTRR check is wrong too because it just finds the highest cacheable MTRR-covered address and compares that to the kernel-known end of RAM. what we should probably do instead is to have a filter function: new_end = trim_range_to_mtrr_cached(start, end); and then we could iterate through every e820 map entry that is marked as usable RAM, and send it through this filter. If the filter returns the same value that got passed in, we keep the e820 entry unchanged. If the filter returns a new "end" value, we use that in the e820 map. that way, the current Tom2 hack is just a natural extension to the filter function: it would (on AMD CPUs) recognize (within trim_range_to_mtrr_cached filter) that all memory addresses above 4GB are marked as cacheable via Tom2. Or something like this. Hm? Ingo --
I agree that would be the correct way to do it. Later on with PAT that filter could also do PAT related checks and something like this will likely be needed anyways. -Andi --
no, to be fully generic it would have to be able to 'split' e820 entries up and punch holes into them - but we dont want to go that far i think. The most common problem is mismatch at the end of a range. but what matters more is to have full, generic _detection_ of the problem - and that's what we dont do right now. (and that's what my reply outlines) The _fixup_ which we base on this information can then be anything from "trivially trim the end" up to a complex "punch holes" solution or the a "what is the effective MTRR caching attribute of this physical address" type of function would benefit PAT too, yes. Ingo --
For what it's worth, I have a set of code to do this, written in order to canonicalize and modify e820 data structures for memdisk. The key to it is the observation that the e820-delivered (address, len, type) tuples isn't actually the data structure you want -- what you want is an ordered list of (address, type) tuples, where type may includes undefined (the e820 default type - i.e. unused, available address space). In addition to the attached code, to do this right, we probably want a function to do filtering (only clobber entries of a specfic type, in this case type 1) as opposed to blind clobber; with an ordered array this is quite trivial. -hpa
