Re: [PATCH] Fix left over EFI cache mapping problems

Previous thread: [PATCH] libata: Register for dock events when the drive is inside a dock station by Holger Macht on Thursday, February 14, 2008 - 5:40 am. (15 messages)

Next thread: linux-next: first tree by Stephen Rothwell on Thursday, February 14, 2008 - 6:35 am. (61 messages)
From: Andi Kleen
Date: Thursday, February 14, 2008 - 6:13 am

Fix left over EFI cache mapping problems

[History: this was originally part of a larger patch that got
partially added earlier. This patch fixes the left-over bugs
and also fixes another bug I noticed while revising this]

cpa/set_memory_* does not properly support ioremap or fixmap
(efi_ioremap on 64bit returns fixmap) addresses. In particular 
it will not correctly change the alias in the direct mapping
and then cause illegal cache attribute aliases.

So using it for efi_ioremap is wrong at the current time
(it might be possible to fix cpa in the future to handle
this correctly, but that requires much more work is probably not i
appropiate for .25)

Fix efi_ioremap() instead to directly set the correct caching
attributes and then set the direct mapping manually.

There was also another problem I noticed: the previous call to 
set_memory_uc() passed in bytes as size, but set_memory_* 
requires pages. So it would always set far more memory
to uncached than really needed. Fixed that one too.
[This problem was not fixed in the earlier patchkit, but is now]

I also commented a few more potential (but likely not urgent)
corner cases.

Requires the generic ioremap attribute fix I sent earlier
to work correctly on 32bit.

Cc: ying.huang@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86/kernel/efi.c    |   14 ++++++++------
 arch/x86/kernel/efi_64.c |    6 ++++--
 include/asm-x86/efi.h    |    5 +++--
 3 files changed, 15 insertions(+), 10 deletions(-)

Index: linux/arch/x86/kernel/efi.c
===================================================================
--- linux.orig/arch/x86/kernel/efi.c
+++ linux/arch/x86/kernel/efi.c
@@ -413,17 +413,31 @@ void __init efi_enter_virtual_mode(void)
 
 	efi.systab = NULL;
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+		int cached;
+		unsigned numpages;
+
 		md = p;
 		if (!(md->attribute & EFI_MEMORY_RUNTIME))
 			continue;
 
+		/*
+		 * RED-PEN The spec (and ia64) has a lot more ...
From: Ingo Molnar
Date: Thursday, February 14, 2008 - 9:12 am

this is indeed a bug (we change the attributes for a larger area than 
needed), but your fix is unclean. Find below a cleaner solution.

Ying, if you agree with this fix could you please test and ACK it before 
we push it to Linus? (this fix is also in the latest x86.git#mm)

Thanks,

	Ingo

----------------->
Subject: x86: EFI set_memory_x()/set_memory_uc() fixes
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Feb 14 14:21:32 CET 2008

The EFI-runtime mapping code changed a larger memory area than it
should have, due to a pages/bytes parameter mixup.

noticed by Andi Kleen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/efi.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-x86.q/arch/x86/kernel/efi.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/efi.c
+++ linux-x86.q/arch/x86/kernel/efi.c
@@ -391,7 +391,7 @@ static void __init runtime_code_page_mke
 		if (md->type != EFI_RUNTIME_SERVICES_CODE)
 			continue;
 
-		set_memory_x(md->virt_addr, md->num_pages << EFI_PAGE_SHIFT);
+		set_memory_x(md->virt_addr, md->num_pages);
 	}
 }
 
@@ -434,7 +434,7 @@ void __init efi_enter_virtual_mode(void)
 		}
 
 		if (!(md->attribute & EFI_MEMORY_WB))
-			set_memory_uc(md->virt_addr, size);
+			set_memory_uc(md->virt_addr, md->num_pages);
 
 		systab = (u64) (unsigned long) efi_phys.systab;
 		if (md->phys_addr <= systab && systab < end) {
--

From: Andi Kleen
Date: Thursday, February 14, 2008 - 10:16 am

You're still ignoring the other problem of set_memory_uc() not 
handling fixmap and ioremap correctly. I explained it to you
several times now and you have not replied to it once. If you
think my analysis of this is wrong please describe exactly
where it is wrong.

Sure if you're only fixing half the bugs you get a simpler patch,
but you also still got half the bugs.

-Andi
--

From: Ingo Molnar
Date: Thursday, February 14, 2008 - 11:38 am

No, we did not ignore it, and yes, you are wrong.

One thing that you miss is that the 64-bit EFI runtime has to be marked 
uncacheable only if it the EFI image attribute signals an uncacheable 
area:

                if (!(md->attribute & EFI_MEMORY_WB))
                        set_memory_uc(md->virt_addr, md->num_pages);

and Linux EFI does not support device EFI runtimes. So your observation, 
while correct for non-RAM 64-bit EFI images, is theoretical at the 
moment and has no practical relevance.

Of course, as we've stated it numerous times, we want this all fixed up, 
and we _have_ fixed it up already, but we wanted to do it properly. 
Right now we've got the fixes lined up and we are waiting for a test 
report and an Ack from Ying Huang. (he reported that current -git worked 
just fine for him)

Also note that 64-bit EFI runtime support (the ability to execute EFI 
code) is completely new - it got introduced 14 days ago. We only use 
fixmaps on 64-bit EFI.

32-bit EFI is more common (but still not very common, compared to other 
x86 platforms) and that is totally unaffected by secondary aliases. 
(which is a complication of the 64-bit kernel)

Thanks,

	Ingo
--

From: Andi Kleen
Date: Thursday, February 14, 2008 - 2:42 pm

Sorry I didn't get that (you were a bit terse). 

You're saying the EFI BIOSes will never set that flag ?

I'm reading page 123+ of UEFI 2.1 which describes GetMemoryMap() 
and these flags and I see nothing to that effect. I admit I didn't
read the full EFI bible so far so there are certainly EFI
aspects I don't understand.

Can you please clarify why EFI would not set that flag on Linux? 

On 32bit it is wrong too I think at least on non default __PAGE_OFFSET

Of course it is affected. set_memory_uc() will not fix up the 
direct mapping in this case either. Given the overlap of PCI hole
to direct mapping cases there are more seldom, but certainly
exist (e.g. consider 1:3 split and a 2GB PCI hole) 

And while given that's a relatively obscure case it's a valid regression.

-Andi

--

From: Arjan van de Ven
Date: Thursday, February 14, 2008 - 3:08 pm

On Thu, 14 Feb 2008 22:42:41 +0100

because it will only normally get set on EFI code that lives in device memory.
There's no reason to ever use non-cache for ram this way. Ever. Non-cached execution
is a TOTAL pain for anything, and will be avoided if at all possible.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Andi Kleen
Date: Thursday, February 14, 2008 - 4:01 pm

normally or guaranteed? 

e.g. possible scenario: one of the EFI runtime services needs
access to some device to do something (e.g. for rebooting
or putting information into the battery backed RAM). Wouldn't they likely 

Ok but what prevents the firmware from passing such memory areas?
I assume it doesn't know that Linux doesn't need such mappings, does it?

Anyways if all the questions above get firmly answered with no then the code 
setting mappings UC should be definitely removed.  I would welcome such
a patch.

-Andi

--

From: Huang, Ying
Date: Thursday, February 14, 2008 - 7:52 pm

On my test machine, there is EFI runtime memory area with EFI_MEMORY_UC
attribute. The following is cut from the dmesg:

EFI: mem75: type=4, attr=0xf, range=[0x000000007f4ff000-0x000000007f500000) (0MB)
EFI: mem76: type=11, attr=0x8000000000000001, range=[0x00000000fed1c000-0x00000000fed20000) (0MB)
EFI: mem77: type=11, attr=0x8000000000000001, range=[0x00000000fffb0000-0x00000000fffb4000) (0MB)

Where:

attr is md->attribute,
#define EFI_MEMORY_RUNTIME 0x8000000000000000
#define EFI_MEMORY_UC 0x0000000000000001

But because end_pfn_map contains the above UC memory area, efi_ioremap()
is not used on EFI 64.

Best Regards,
Huang Ying

--

From: Andi Kleen
Date: Friday, February 15, 2008 - 1:55 am

I see. It would be a good idea if you could test with a limited
memmap (mem=... command line option) just to make sure this path works

Right now I don't think pageattr.c will deal fully correct with 
the fixmap, but if the git-x86#mm changes are pushed for .25
it might actually work.

-Andi
--

From: Huang, Ying
Date: Friday, February 15, 2008 - 2:16 am

I tested efi_ioremap with "memmap=...", and it works fine on my 2G EFI
box.

Best Regards,
Huang Ying

--

From: Huang, Ying
Date: Thursday, February 14, 2008 - 9:48 pm

I think the patch following may be better, because it is possible that
the EFI_PAGE_SHIFT and PAGE_SHIFT are different.

Best Regards,
Huang Ying

-------------------------> 
The EFI-runtime mapping code changed a larger memory area than it
should have, due to a pages/bytes parameter mixup.

noticed by Andi Kleen.

This patch has been tested on Intel x86 platform with EFI 32/64.

Signed-off-by: Huang Ying <ying.huang@intel.com>

---
 arch/x86/kernel/efi.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -386,12 +386,13 @@ static void __init runtime_code_page_mke
 
 	/* Make EFI runtime service code area executable */
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
+		unsigned long num_pages;
 
+		md = p;
 		if (md->type != EFI_RUNTIME_SERVICES_CODE)
 			continue;
-
-		set_memory_x(md->virt_addr, md->num_pages << EFI_PAGE_SHIFT);
+		num_pages = (md->num_pages << EFI_PAGE_SHIFT) >> PAGE_SHIFT;
+		set_memory_x(md->virt_addr, num_pages);
 	}
 }
 
@@ -434,7 +435,7 @@ void __init efi_enter_virtual_mode(void)
 		}
 
 		if (!(md->attribute & EFI_MEMORY_WB))
-			set_memory_uc(md->virt_addr, size);
+			set_memory_uc(md->virt_addr, size >> PAGE_SHIFT);
 
 		systab = (u64) (unsigned long) efi_phys.systab;
 		if (md->phys_addr <= systab && systab < end) {

--

From: Linus Torvalds
Date: Thursday, February 14, 2008 - 10:44 pm

If this is a problem in practice, we'd be better off having a helper 

overflows at 4GB on x86-32. And maybe you never have areas that big, and 
people are moving over to 64-bit anyway, it still sounds like a bug 
waiting to happen.

So *if* we care (I doubt we do, since EFI_PAGE_SHIFT at least right now 
matches PAGE_SHIFT on x86), you'd probably want to do something like

  static inline unsigned long efi_pages_to_native_pages(unsigned long efi_pages)
  {
  #if EFI_PAGE_SHIFT > PAGE_SHIFT
	return efi_pages << (EFI_PAGE_SHIFT - PAGE_SHIFT);
  #else
	return efi_pages >> (PAGE_SHIFT - EFI_PAGE_SHIFT);
  #endif
  }

or whatever.

Otherwise, trying to avoid a bug with different page sizes is actually 
more likely to *introduce* one rather than fix one..

			Linus
--

From: Huang, Ying
Date: Thursday, February 14, 2008 - 11:24 pm

Yes, I will fix this.

Best Regards,
Huang Ying

--

From: Ingo Molnar
Date: Friday, February 15, 2008 - 12:30 am

yeah, this is even cleaner - i guess EFI_PAGE_SHIFT will be around in 
the future too because it's also used on ia64.

	Ingo
--

From: Ingo Molnar
Date: Friday, February 15, 2008 - 12:08 am

right now, EFI page size is 4096:

  include/linux/efi.h:#define EFI_PAGE_SHIFT          12

i doubt we'll ever change PAGE_SIZE on x86 - ia64's variable lowlevel 
pagesizes are not particularly useful IMO. I think we'll at most have 
some generic kernel feature that allows a larger PAGE_CACHE_SHIFT - but 
on the lowlevel MMU level we'll always stay at 4K.

and i doubt EFI_PAGE_SHIFT would want to (ever) go away from 12 either. 

So perhaps, at least as far as arch/x86/kernel/efi*.c files go, it would 
be cleaner to just replace EFI_PAGE_SHIFT with PAGE_SHIFT and 
EFI_PAGE_SIZE with PAGE_SIZE?

	Ingo
--

From: Huang, Ying
Date: Friday, February 15, 2008 - 12:32 am

Maybe. On x86, the only usage of EFI memory map (and
EFI_PAGE_SHIFT/EFI_PAGE_SIZE) is to map the EFI runtime memory area. So
I think either dealing with potential difference now or doing it in the
future when necessary is easy.

Best Regards,
Huang Ying

--

From: Huang, Ying
Date: Sunday, February 17, 2008 - 6:53 pm

After looking up in UEFI specification, I found that it is specified by
UEFI specification 2.1 (section 6.2, page 124) that the NumberOfPages
(num_pages) of EFI_MEMORY_DESCRIPTOR (efi_memory_desc_t) must be "Number
of 4KB pages in the memory region". So we need not worry about potential
EFI_PAGE_SHIFT changes.

Best Regards,

--

From: Andi Kleen
Date: Monday, February 18, 2008 - 4:26 am

> After looking up in UEFI specification, I found that it is specified by


Then it would be cleaner to just use PAGE_SIZE in the x86 specific EFI
code imho.

-Andi
--

From: Ingo Molnar
Date: Monday, February 18, 2008 - 7:05 am

(as i suggested in the mail Ying replied to.)

But ... as long as ia64 uses EFI too i dont mind if it's using 
EFI_PAGE_SHIFT consistently, even in the arch/x86-only files.

	Ingo
--

From: Andi Kleen
Date: Friday, February 15, 2008 - 1:48 am

Can EFI_PAGE_SIZE ever be < 4k? If yes you would need to round up
first to linux page size before shifting.

-Andi

--

From: Huang, Ying
Date: Friday, February 15, 2008 - 2:21 am

Yes. It is needed. And md->virt_addr should be processed as follow:

md->virt_addr & PAGE_MASK

before fed into set_memory_*.

Best Regards,
Huang Ying

--

From: Andi Kleen
Date: Friday, February 15, 2008 - 2:43 am

If you do that you also need to add the old offset into the page
to the size.

I just mention it because ioremap_change_attr() did that wrong
before my patch yesterday :)

-Andi

--

Previous thread: [PATCH] libata: Register for dock events when the drive is inside a dock station by Holger Macht on Thursday, February 14, 2008 - 5:40 am. (15 messages)

Next thread: linux-next: first tree by Stephen Rothwell on Thursday, February 14, 2008 - 6:35 am. (61 messages)