Re: [PATCH] x86: trim ram need to check if mtrr is there v3

Previous thread: [patch] x86, i387: use convert_to_fxsr() in fpregs_set() by Siddha, Suresh B on Thursday, January 24, 2008 - 6:40 pm. (5 messages)

Next thread: [PATCH] x86: Remove nx_enabled from fault.c by Harvey Harrison on Thursday, January 24, 2008 - 9:41 pm. (2 messages)
From: Jeremy Fitzhardinge
Date: Thursday, January 24, 2008 - 6:44 pm

When booting a current x86.git kernel under kvm, I get this:

(qemu) Linux version 2.6.24-rc8 (jeremy@ezr) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1928 SMP PREEMPT Thu Jan 24 17:09:04 PST 2008
early_ioremap_init()
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
console [earlyser0] enabled
0MB HIGHMEM available.
511MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
Scan SMP from c009fc00 for 1024 bytes.
***************
**** WARNING: likely BIOS bug
**** MTRRs don't cover all of memory, trimmed 131056 pages
***************
update e820 for mtrr
modified physical RAM map:
 modified: 0000000000000000 - 000000001fff0000 (reserved)
 modified: 000000001fff0000 - 0000000020000000 (ACPI data)
 modified: 00000000fffc0000 - 0000000100000000 (reserved)
highmem size specified (0MB) is bigger than pages available (0MB)!.
0MB HIGHMEM available.
0MB LOWMEM available.
BUG: Int 6: CR2 00000000
     EDI c055de40  ESI c04f7bec  EBP c04cff68  ESP c04cff4c
     EBX 00100000  EDX 00000006  ECX 0047efff  EAX 00000100
     err 00000000  EIP c04e5c7d   CS 00000060  flg 00010006
Stack: 00000000 c04f7bec c051eb0c c04cff70 c04e5cd1 c04cff80 c04da360 00000010
       c04f7bec c04cff98 c04da4e5 c042ac8d 00000000 c042ac70 00000000 c04cffd0
       c04da7c0 c04cffe8 c04cb680 c051e920 c051e8e0 00000800 00099800 c04c1000
��Pid: 0, comm: swapper Not tainted 2.6.24-rc8 #1928
 [<c04e5c7d>] ? reserve_bootmem_core+0x1e/0x61
 [<c04e5cd1>] ? reserve_bootmem+0x11/0x13
 [<c04da360>] ? setup_bootmem_allocator+0x42/0x141
 [<c04da4e5>] ? setup_memory+0x86/0x8d
 [<c04da7c0>] ? ...
From: H. Peter Anvin
Date: Thursday, January 24, 2008 - 6:49 pm

Looks like the code doesn't check that the CPU *has* MTRRs...

	-hpa
--

From: Yinghai Lu
Date: Thursday, January 24, 2008 - 7:21 pm

on 32 bit, max_pfn is supposed to be 0?

will look at it.

YH

--

From: Yinghai Lu
Date: Thursday, January 24, 2008 - 7:32 pm

please try this

[PATCH] x86: trim RAM need to check if mtrr is there

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index a1551d0..a0b6f55 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -646,9 +646,6 @@ static __init int amd_special_default_mtrr(unsigned long end_pfn)
 {
 	u32 l, h;
 
-	/* Doesn't apply to memory < 4GB */
-	if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
-		return 0;
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
 		return 0;
 	if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -682,6 +679,12 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 	mtrr_type type;
 	u64 trim_start, trim_size;
 
+	/* Doesn't apply to memory < 4GB */
+	if (end_pfn <= (0xffffffffUL >> PAGE_SHIFT))
+		return 0;
+
+	if (!cpu_has_mtrr)
+		return 0;
 	/*
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
--

From: Andi Kleen
Date: Friday, January 25, 2008 - 3:52 am

I don't think this makes sense to move outside the AMD workaround
because the normal MTRRs don't have an implicit hidden 4GB semantics.

-Andi
--

From: Yinghai Lu
Date: Thursday, January 24, 2008 - 8:47 pm

so check it mtrr is there, also check if mem less 4G and is AMD as early

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -642,13 +642,10 @@ early_param("disable_mtrr_trim", disable
 #define Tom2Enabled (1U << 21)
 #define Tom2ForceMemTypeWB (1U << 22)
 
-static __init int amd_special_default_mtrr(unsigned long end_pfn)
+static __init int amd_special_default_mtrr(void)
 {
 	u32 l, h;
 
-	/* Doesn't apply to memory < 4GB */
-	if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
-		return 0;
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
 		return 0;
 	if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -682,6 +679,13 @@ int __init mtrr_trim_uncached_memory(uns
 	mtrr_type type;
 	u64 trim_start, trim_size;
 
+	if (!cpu_has_mtrr)
+		return 0;
+
+	/* Doesn't apply to memory < 4GB */
+	if (end_pfn <= (0xffffffffUL >> PAGE_SHIFT))
+		return 0;
+
 	/*
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
@@ -691,6 +695,9 @@ int __init mtrr_trim_uncached_memory(uns
 	if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)
 		return 0;
 
+	if (amd_special_default_mtrr())
+		return 0;
+
 	/* Find highest cached pfn */
 	for (i = 0; i < num_var_ranges; i++) {
 		mtrr_if->get(i, &base, &size, &type);
@@ -702,9 +709,6 @@ int __init mtrr_trim_uncached_memory(uns
 			highest_addr = base + size;
 	}
 
-	if (amd_special_default_mtrr(end_pfn))
-		return 0;
-
 	if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
 		printk(KERN_WARNING "***************\n");
 		printk(KERN_WARNING "**** WARNING: likely BIOS bug\n");
--

From: H. Peter Anvin
Date: Thursday, January 24, 2008 - 10:44 pm

Why the check for < 4 GB?  The same thing applies to memory below the 4 
GB limit -- in fact, we've had a number of that kind of systems in the past.

	-hpa
--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 12:50 am

then we could remove that.

YH
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 12:43 am

Thanks, this gets me to usermode under kvm.

    J
--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 1:13 am

i think the is 0xffffffff to 0xffffffffUL make the difference.

the cpu mode in KVM/qemu may cpu_has_mtrr, but doesn't really have
return mtrr msr with correct value.

is_cpu(INTEL) already make sure we have mtrr_if got assigned and cpu_has_mtrr.

may need to fix qemu instead...

Can you try post /proc/mtrr for your guest in kvm/qemu?

Thanks

YH
--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 1:39 am

jeremy,

can you try v3 patch?

Thanks

YH
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 4:12 am

no ...

lets not do non-sensical trimming of RAM, ok? Emit a warning but never 
trim all of RAM and make the system unbootable. Trimmed RAM is something 
that users can pester board/BIOS vendors with. Non-booting kernels is 
something _we_ get pestered with ;-)

	Ingo
--

From: H. Peter Anvin
Date: Friday, January 25, 2008 - 8:47 am

*And* let's push a fix to Qemu/KVM as appropriate.

Last I checked Qemu never even turns caching on in %cr0, never mind gets 
the MTRRs right.  If it's advertising MTRRs, this is a problem.

	-hpa
--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 1:42 am

so more strict check if mtrr is there really.
bail out if mtrr all blank when qemu cpu model is used

and check if is AMD as early
also remove 4G less check, according to hpa.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -642,13 +642,10 @@ early_param("disable_mtrr_trim", disable
 #define Tom2Enabled (1U << 21)
 #define Tom2ForceMemTypeWB (1U << 22)
 
-static __init int amd_special_default_mtrr(unsigned long end_pfn)
+static __init int amd_special_default_mtrr(void)
 {
 	u32 l, h;
 
-	/* Doesn't apply to memory < 4GB */
-	if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
-		return 0;
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
 		return 0;
 	if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
@@ -686,9 +683,14 @@ int __init mtrr_trim_uncached_memory(uns
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
 	 */
+	if (!is_cpu(INTEL) || disable_mtrr_trim)
+		return 0;
 	rdmsr(MTRRdefType_MSR, def, dummy);
 	def &= 0xff;
-	if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)
+	if (def != MTRR_TYPE_UNCACHABLE)
+		return 0;
+
+	if (amd_special_default_mtrr())
 		return 0;
 
 	/* Find highest cached pfn */
@@ -702,8 +704,14 @@ int __init mtrr_trim_uncached_memory(uns
 			highest_addr = base + size;
 	}
 
-	if (amd_special_default_mtrr(end_pfn))
+	/* kvm/qemu doesn't have mtrr set right, don't trim them all */
+	if (!highest_addr) {
+		printk(KERN_WARNING "***************\n");
+		printk(KERN_WARNING "**** WARNING: likely strange cpu\n");
+		printk(KERN_WARNING "**** MTRRs all blank, cpu in qemu?\n");
+		printk(KERN_WARNING "***************\n");
 		return 0;
+	}
 
 	if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
 		printk(KERN_WARNING "***************\n");
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 4:09 am

thanks, applied. Shouldnt we put in an exception for when there is MTRR 
support, but they dont cover anything. Still emit a warning - but 
booting up real slow is still better than losing all of RAM and crashing 
...

i also updated the messages, they now go like this:

  WARNING: strange, CPU MTRRs all blank?

and:

  WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 45MB of RAM.

	Ingo
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 11:55 am

The problem is re-occuring for me with current x86.git.  Looks like v2 
did the trick, and v3 is broken...

    J
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 11:59 am

which git head is that? I'm pushing out the queue with v3 included this 
very minute, so i doubt you can have tested that already! :-)

	Ingo
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 12:01 pm

I was referring to:

commit 6a4544a9c8b54b82893044cb53695502cc386f00
Author: Yinghai Lu <Yinghai.Lu@Sun.COM>
Date:   Fri Jan 25 00:15:29 2008 +0100

    x86_32: trim memory by updating e820 v3


but you've answered the question...

    J
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 12:04 pm

yeah, i guess what i've pushed out just now is v3++. Does it work for 
you?

	Ingo
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 12:27 pm

Yes, it does.  And for the record:

[root@qemu ~]# cat /proc/mtrr 
[root@qemu ~]# 

	J

--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 12:30 pm

great, thanks for checking.

	Ingo
--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 12:32 pm

so the boot msg say:
WARNING: strange, CPU MTRRs all blank?

YH
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 12:39 pm

Yes.

WARNING: strange, CPU MTRRs all blank?
------------[ cut here ]------------
WARNING: at /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/cpu/mtrr/main.c:710 mtrr_trim_uncached_memory+0xf8/0x17b()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.24 #1938
 [<c012804c>] warn_on_slowpath+0x41/0x51
 [<c01285d4>] ? __call_console_drivers+0x4e/0x5a
 [<c03c3e85>] ? _spin_unlock_irqrestore+0xf/0x2f
 [<c03c3e91>] ? _spin_unlock_irqrestore+0x1b/0x2f
 [<c012880c>] ? release_console_sem+0x1a0/0x1a8
 [<c0128c1f>] ? vprintk+0x294/0x2ee
 [<c0128c2b>] ? vprintk+0x2a0/0x2ee
 [<c04e811b>] ? __alloc_bootmem_core+0x100/0x27a
 [<c0128c8e>] ? printk+0x15/0x17
 [<c0128c8e>] ? printk+0x15/0x17
 [<c04de51e>] mtrr_trim_uncached_memory+0xf8/0x17b
 [<c04de7ae>] ? mtrr_bp_init+0x20d/0x215
 [<c04dc79b>] setup_arch+0x283/0x3c2
 [<c0128c8e>] ? printk+0x15/0x17
 [<c04d6689>] start_kernel+0x60/0x311
 =======================

	J


--

From: Yinghai Lu
Date: Friday, January 25, 2008 - 12:19 pm

so the ram size less 4g is way out for your case.

again, can you post /proc/mtrrs with v2 patch?

Thanks

YH
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 12:04 pm

It was empty.  But ignore this report; it wasn't against the right 
version of x86.git.

    J
--

From: Jeremy Fitzhardinge
Date: Friday, January 25, 2008 - 11:59 am

Though I don't see this form of message; have you pushed your changes 
out to the public x86.git#mm tree?

    J
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 12:02 pm

it's available ... 3 .. 2 ... 1 ... NOW! ;-)

	Ingo
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 7:21 am

The logic we ideally would like to have is something like this:

find _any_ RAM that is not mapped via any MTRRs (be that special MTRR 
extensions like Tom2) and clear that from the e820 maps.

not just 'end of RAM'.

And in that sense the amd_special_default_mtrr() check is wrong, because 
it just checks that Tom2 is set and then does no other checking. And the 
original MTRR check is wrong too because it just finds the highest 
cacheable MTRR-covered address and compares that to the kernel-known end 
of RAM.

what we should probably do instead is to have a filter function:

   new_end = trim_range_to_mtrr_cached(start, end);

and then we could iterate through every e820 map entry that is marked as 
usable RAM, and send it through this filter. If the filter returns the 
same value that got passed in, we keep the e820 entry unchanged. If the 
filter returns a new "end" value, we use that in the e820 map.

that way, the current Tom2 hack is just a natural extension to the 
filter function: it would (on AMD CPUs) recognize (within 
trim_range_to_mtrr_cached filter) that all memory addresses above 4GB 
are marked as cacheable via Tom2.

Or something like this. Hm?

	Ingo
--

From: Andi Kleen
Date: Friday, January 25, 2008 - 7:57 am

I agree that would be the correct way to do it.

Later on with PAT that filter could also do PAT related checks
and something like this will likely be needed anyways.

-Andi
--

From: Ingo Molnar
Date: Friday, January 25, 2008 - 8:10 am

no, to be fully generic it would have to be able to 'split' e820 entries 
up and punch holes into them - but we dont want to go that far i think. 
The most common problem is mismatch at the end of a range.

but what matters more is to have full, generic _detection_ of the 
problem - and that's what we dont do right now. (and that's what my 
reply outlines)

The _fixup_ which we base on this information can then be anything from 
"trivially trim the end" up to a complex "punch holes" solution or the 

a "what is the effective MTRR caching attribute of this physical 
address" type of function would benefit PAT too, yes.

	Ingo
--

From: H. Peter Anvin
Date: Friday, January 25, 2008 - 8:57 am

For what it's worth, I have a set of code to do this, written in order 
to canonicalize and modify e820 data structures for memdisk.

The key to it is the observation that the e820-delivered (address, len, 
type) tuples isn't actually the data structure you want -- what you want 
is an ordered list of (address, type) tuples, where type may includes 
undefined (the e820 default type - i.e. unused, available address space).

In addition to the attached code, to do this right, we probably want a 
function to do filtering (only clobber entries of a specfic type, in 
this case type 1) as opposed to blind clobber; with an ordered array 
this is quite trivial.

	-hpa
Previous thread: [patch] x86, i387: use convert_to_fxsr() in fpregs_set() by Siddha, Suresh B on Thursday, January 24, 2008 - 6:40 pm. (5 messages)

Next thread: [PATCH] x86: Remove nx_enabled from fault.c by Harvey Harrison on Thursday, January 24, 2008 - 9:41 pm. (2 messages)