Fix left over EFI cache mapping problems [History: this was originally part of a larger patch that got partially added earlier. This patch fixes the left-over bugs and also fixes another bug I noticed while revising this] cpa/set_memory_* does not properly support ioremap or fixmap (efi_ioremap on 64bit returns fixmap) addresses. In particular it will not correctly change the alias in the direct mapping and then cause illegal cache attribute aliases. So using it for efi_ioremap is wrong at the current time (it might be possible to fix cpa in the future to handle this correctly, but that requires much more work is probably not i appropiate for .25) Fix efi_ioremap() instead to directly set the correct caching attributes and then set the direct mapping manually. There was also another problem I noticed: the previous call to set_memory_uc() passed in bytes as size, but set_memory_* requires pages. So it would always set far more memory to uncached than really needed. Fixed that one too. [This problem was not fixed in the earlier patchkit, but is now] I also commented a few more potential (but likely not urgent) corner cases. Requires the generic ioremap attribute fix I sent earlier to work correctly on 32bit. Cc: ying.huang@intel.com Signed-off-by: Andi Kleen <ak@suse.de> --- arch/x86/kernel/efi.c | 14 ++++++++------ arch/x86/kernel/efi_64.c | 6 ++++-- include/asm-x86/efi.h | 5 +++-- 3 files changed, 15 insertions(+), 10 deletions(-) Index: linux/arch/x86/kernel/efi.c =================================================================== --- linux.orig/arch/x86/kernel/efi.c +++ linux/arch/x86/kernel/efi.c @@ -413,17 +413,31 @@ void __init efi_enter_virtual_mode(void) efi.systab = NULL; for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { + int cached; + unsigned numpages; + md = p; if (!(md->attribute & EFI_MEMORY_RUNTIME)) continue; + /* + * RED-PEN The spec (and ia64) has a lot more ...
this is indeed a bug (we change the attributes for a larger area than
needed), but your fix is unclean. Find below a cleaner solution.
Ying, if you agree with this fix could you please test and ACK it before
we push it to Linus? (this fix is also in the latest x86.git#mm)
Thanks,
Ingo
----------------->
Subject: x86: EFI set_memory_x()/set_memory_uc() fixes
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Feb 14 14:21:32 CET 2008
The EFI-runtime mapping code changed a larger memory area than it
should have, due to a pages/bytes parameter mixup.
noticed by Andi Kleen.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/efi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-x86.q/arch/x86/kernel/efi.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/efi.c
+++ linux-x86.q/arch/x86/kernel/efi.c
@@ -391,7 +391,7 @@ static void __init runtime_code_page_mke
if (md->type != EFI_RUNTIME_SERVICES_CODE)
continue;
- set_memory_x(md->virt_addr, md->num_pages << EFI_PAGE_SHIFT);
+ set_memory_x(md->virt_addr, md->num_pages);
}
}
@@ -434,7 +434,7 @@ void __init efi_enter_virtual_mode(void)
}
if (!(md->attribute & EFI_MEMORY_WB))
- set_memory_uc(md->virt_addr, size);
+ set_memory_uc(md->virt_addr, md->num_pages);
systab = (u64) (unsigned long) efi_phys.systab;
if (md->phys_addr <= systab && systab < end) {
--
You're still ignoring the other problem of set_memory_uc() not handling fixmap and ioremap correctly. I explained it to you several times now and you have not replied to it once. If you think my analysis of this is wrong please describe exactly where it is wrong. Sure if you're only fixing half the bugs you get a simpler patch, but you also still got half the bugs. -Andi --
No, we did not ignore it, and yes, you are wrong.
One thing that you miss is that the 64-bit EFI runtime has to be marked
uncacheable only if it the EFI image attribute signals an uncacheable
area:
if (!(md->attribute & EFI_MEMORY_WB))
set_memory_uc(md->virt_addr, md->num_pages);
and Linux EFI does not support device EFI runtimes. So your observation,
while correct for non-RAM 64-bit EFI images, is theoretical at the
moment and has no practical relevance.
Of course, as we've stated it numerous times, we want this all fixed up,
and we _have_ fixed it up already, but we wanted to do it properly.
Right now we've got the fixes lined up and we are waiting for a test
report and an Ack from Ying Huang. (he reported that current -git worked
just fine for him)
Also note that 64-bit EFI runtime support (the ability to execute EFI
code) is completely new - it got introduced 14 days ago. We only use
fixmaps on 64-bit EFI.
32-bit EFI is more common (but still not very common, compared to other
x86 platforms) and that is totally unaffected by secondary aliases.
(which is a complication of the 64-bit kernel)
Thanks,
Ingo
--
Sorry I didn't get that (you were a bit terse). You're saying the EFI BIOSes will never set that flag ? I'm reading page 123+ of UEFI 2.1 which describes GetMemoryMap() and these flags and I see nothing to that effect. I admit I didn't read the full EFI bible so far so there are certainly EFI aspects I don't understand. Can you please clarify why EFI would not set that flag on Linux? On 32bit it is wrong too I think at least on non default __PAGE_OFFSET Of course it is affected. set_memory_uc() will not fix up the direct mapping in this case either. Given the overlap of PCI hole to direct mapping cases there are more seldom, but certainly exist (e.g. consider 1:3 split and a 2GB PCI hole) And while given that's a relatively obscure case it's a valid regression. -Andi --
On Thu, 14 Feb 2008 22:42:41 +0100 because it will only normally get set on EFI code that lives in device memory. There's no reason to ever use non-cache for ram this way. Ever. Non-cached execution is a TOTAL pain for anything, and will be avoided if at all possible. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org --
normally or guaranteed? e.g. possible scenario: one of the EFI runtime services needs access to some device to do something (e.g. for rebooting or putting information into the battery backed RAM). Wouldn't they likely Ok but what prevents the firmware from passing such memory areas? I assume it doesn't know that Linux doesn't need such mappings, does it? Anyways if all the questions above get firmly answered with no then the code setting mappings UC should be definitely removed. I would welcome such a patch. -Andi --
On my test machine, there is EFI runtime memory area with EFI_MEMORY_UC attribute. The following is cut from the dmesg: EFI: mem75: type=4, attr=0xf, range=[0x000000007f4ff000-0x000000007f500000) (0MB) EFI: mem76: type=11, attr=0x8000000000000001, range=[0x00000000fed1c000-0x00000000fed20000) (0MB) EFI: mem77: type=11, attr=0x8000000000000001, range=[0x00000000fffb0000-0x00000000fffb4000) (0MB) Where: attr is md->attribute, #define EFI_MEMORY_RUNTIME 0x8000000000000000 #define EFI_MEMORY_UC 0x0000000000000001 But because end_pfn_map contains the above UC memory area, efi_ioremap() is not used on EFI 64. Best Regards, Huang Ying --
I see. It would be a good idea if you could test with a limited memmap (mem=... command line option) just to make sure this path works Right now I don't think pageattr.c will deal fully correct with the fixmap, but if the git-x86#mm changes are pushed for .25 it might actually work. -Andi --
I tested efi_ioremap with "memmap=...", and it works fine on my 2G EFI box. Best Regards, Huang Ying --
I think the patch following may be better, because it is possible that
the EFI_PAGE_SHIFT and PAGE_SHIFT are different.
Best Regards,
Huang Ying
------------------------->
The EFI-runtime mapping code changed a larger memory area than it
should have, due to a pages/bytes parameter mixup.
noticed by Andi Kleen.
This patch has been tested on Intel x86 platform with EFI 32/64.
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
arch/x86/kernel/efi.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -386,12 +386,13 @@ static void __init runtime_code_page_mke
/* Make EFI runtime service code area executable */
for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
- md = p;
+ unsigned long num_pages;
+ md = p;
if (md->type != EFI_RUNTIME_SERVICES_CODE)
continue;
-
- set_memory_x(md->virt_addr, md->num_pages << EFI_PAGE_SHIFT);
+ num_pages = (md->num_pages << EFI_PAGE_SHIFT) >> PAGE_SHIFT;
+ set_memory_x(md->virt_addr, num_pages);
}
}
@@ -434,7 +435,7 @@ void __init efi_enter_virtual_mode(void)
}
if (!(md->attribute & EFI_MEMORY_WB))
- set_memory_uc(md->virt_addr, size);
+ set_memory_uc(md->virt_addr, size >> PAGE_SHIFT);
systab = (u64) (unsigned long) efi_phys.systab;
if (md->phys_addr <= systab && systab < end) {
--
If this is a problem in practice, we'd be better off having a helper
overflows at 4GB on x86-32. And maybe you never have areas that big, and
people are moving over to 64-bit anyway, it still sounds like a bug
waiting to happen.
So *if* we care (I doubt we do, since EFI_PAGE_SHIFT at least right now
matches PAGE_SHIFT on x86), you'd probably want to do something like
static inline unsigned long efi_pages_to_native_pages(unsigned long efi_pages)
{
#if EFI_PAGE_SHIFT > PAGE_SHIFT
return efi_pages << (EFI_PAGE_SHIFT - PAGE_SHIFT);
#else
return efi_pages >> (PAGE_SHIFT - EFI_PAGE_SHIFT);
#endif
}
or whatever.
Otherwise, trying to avoid a bug with different page sizes is actually
more likely to *introduce* one rather than fix one..
Linus
--
Yes, I will fix this. Best Regards, Huang Ying --
yeah, this is even cleaner - i guess EFI_PAGE_SHIFT will be around in the future too because it's also used on ia64. Ingo --
right now, EFI page size is 4096: include/linux/efi.h:#define EFI_PAGE_SHIFT 12 i doubt we'll ever change PAGE_SIZE on x86 - ia64's variable lowlevel pagesizes are not particularly useful IMO. I think we'll at most have some generic kernel feature that allows a larger PAGE_CACHE_SHIFT - but on the lowlevel MMU level we'll always stay at 4K. and i doubt EFI_PAGE_SHIFT would want to (ever) go away from 12 either. So perhaps, at least as far as arch/x86/kernel/efi*.c files go, it would be cleaner to just replace EFI_PAGE_SHIFT with PAGE_SHIFT and EFI_PAGE_SIZE with PAGE_SIZE? Ingo --
Maybe. On x86, the only usage of EFI memory map (and EFI_PAGE_SHIFT/EFI_PAGE_SIZE) is to map the EFI runtime memory area. So I think either dealing with potential difference now or doing it in the future when necessary is easy. Best Regards, Huang Ying --
After looking up in UEFI specification, I found that it is specified by UEFI specification 2.1 (section 6.2, page 124) that the NumberOfPages (num_pages) of EFI_MEMORY_DESCRIPTOR (efi_memory_desc_t) must be "Number of 4KB pages in the memory region". So we need not worry about potential EFI_PAGE_SHIFT changes. Best Regards, --
> After looking up in UEFI specification, I found that it is specified by Then it would be cleaner to just use PAGE_SIZE in the x86 specific EFI code imho. -Andi --
(as i suggested in the mail Ying replied to.) But ... as long as ia64 uses EFI too i dont mind if it's using EFI_PAGE_SHIFT consistently, even in the arch/x86-only files. Ingo --
Can EFI_PAGE_SIZE ever be < 4k? If yes you would need to round up first to linux page size before shifting. -Andi --
Yes. It is needed. And md->virt_addr should be processed as follow: md->virt_addr & PAGE_MASK before fed into set_memory_*. Best Regards, Huang Ying --
If you do that you also need to add the old offset into the page to the size. I just mention it because ioremap_change_attr() did that wrong before my patch yesterday :) -Andi --
