Hi all,

The attached patch enables EFI to run in physical mode. EFI starts out in physical mode and is switched to virtual mode once SetVirtualAddressMap is called. With this patch applied, EFI always runs in physical mode. You can still specify "virtefi" as a kernel boot parameter to run EFI in virtual mode as before. Note that this patch supports only x86_64.

This is needed to run kexec/kdump on an EFI-booted system. The following is the original discussion. In that thread, I explained that kdump does not work because the EFI system table is modified by SetVirtualAddressMap, and the idea of running EFI in physical mode was proposed. This patch implements it.

The basic idea of this patch is to create a page table of EFI's own. This page table maps each physical address of the EFI runtime to a virtual address with the same value, so that we can call it directly. For example, physical address 0x800000 is mapped to virtual address 0x800000. Before calling the EFI runtime, the cr3 register is switched to this page table, and it is restored when we come back from EFI.

Any comments would be appreciated.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 arch/x86/include/asm/efi.h |    3
 arch/x86/kernel/efi.c      |  142 ++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/efi_32.c   |    4
 arch/x86/kernel/efi_64.c   |   92 ++++++++++++++++++++++
 include/linux/efi.h        |    1
 include/linux/init.h       |    1
 init/main.c                |   16 +++
 7 files changed, 254 insertions(+), 5 deletions(-)

diff -Nurp linux-2.6.35.org/arch/x86/include/asm/efi.h linux-2.6.35/arch/x86/include/asm/efi.h
--- linux-2.6.35.org/arch/x86/include/asm/efi.h 2010-08-01 18:11:14.000000000 -0400
+++ linux-2.6.35/arch/x86/include/asm/efi.h 2010-08-13 14:39:25.817104994 -0400
@@ -93,6 +93,9 @@ extern int add_efi_memmap;
 extern void efi_reserve_early(void);
 extern void efi_call_phys_prelog(void);
 extern void efi_call_phys_epilog(void);
+extern void ...
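To make the 1:1 mapping idea concrete, here is a userspace simulation of an x86_64 identity map built from 2 MiB pages. It is only a sketch under stated assumptions: it is not the patch's code, the function names identity_map() and translate() are hypothetical, and the simulation keeps a single table per level, so only the first 1 GiB can be mapped and upper-level entries just mark "present" instead of holding real table addresses. The entry bits (present, RW, PSE) and the 9-bit index per level are the architectural x86_64 ones.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 4-level paging constants; entry bits are the architectural ones. */
#define ENTRIES   512
#define PRESENT   (1ULL << 0)
#define RW        (1ULL << 1)
#define PSE       (1ULL << 7)               /* 2 MiB page at the PMD level */
#define PMD_SHIFT 21
#define PUD_SHIFT 30
#define PGD_SHIFT 39
#define PMD_MASK  (~((1ULL << PMD_SHIFT) - 1))
#define PHYS_MASK 0x000FFFFFFFFFF000ULL

/* Simulation only: one table per level, so only the first 1 GiB can be mapped
 * and upper-level entries just mark "present" instead of holding addresses. */
static uint64_t pgd[ENTRIES], pud[ENTRIES], pmd[ENTRIES];

static unsigned idx(uint64_t va, int shift)
{
    return (unsigned)((va >> shift) & (ENTRIES - 1));
}

/* Hypothetical helper: identity-map [start, end) with 2 MiB pages, so the PMD
 * entry for a virtual address holds that same address as its physical frame. */
void identity_map(uint64_t start, uint64_t end)
{
    assert(start < end && end <= (1ULL << PUD_SHIFT));
    for (uint64_t a = start & PMD_MASK; a < end; a += 1ULL << PMD_SHIFT) {
        pgd[idx(a, PGD_SHIFT)] |= PRESENT | RW;
        pud[idx(a, PUD_SHIFT)] |= PRESENT | RW;
        pmd[idx(a, PMD_SHIFT)] = a | PRESENT | RW | PSE;   /* VA == PA */
    }
}

/* Hypothetical helper: walk the simulated tables; returns the physical
 * address, or -1 if the virtual address is not mapped. */
int64_t translate(uint64_t va)
{
    if (va >= (1ULL << PUD_SHIFT))
        return -1;
    if (!(pgd[idx(va, PGD_SHIFT)] & PRESENT) || !(pud[idx(va, PUD_SHIFT)] & PRESENT))
        return -1;
    uint64_t e = pmd[idx(va, PMD_SHIFT)];
    if (!(e & PRESENT))
        return -1;
    return (int64_t)((e & PHYS_MASK & PMD_MASK) | (va & ~PMD_MASK));
}
```

After identity_map(0x800000, 0xA00000), translate(0x800000) gives back 0x800000, matching the example in the mail, while an address outside the mapped 2 MiB page translates to -1.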
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> There is what appears to be unneeded redundancy (we need two implementations of physical calls into efi?), but that is confined to the weird efi state. It is a shame you haven't done the little bit extra to get efi_pagetable_init working on x86_32. Overall this seems sane and confined to the x86 efi, and it looks like further improvements could easily be layered on top of this one. Eric --
Unfortunately I don't have a machine to test it on. The machine I'm using does not support EFI on x86_32 :-( I'd appreciate it if anyone could try it... Thanks, --
Any hope of getting a followon patch for i386 as well? That would make it largely a no-brainer. Tony, does this affect ia64 in any way? -hpa --
> does this affect ia64 in any way? I remember Eric complaining that set_virtual_address_map() was a one-way trap door with no way to get back to physical mode ... and thus this was a big problem to support kexec on ia64. And yet we still call it, and ia64 can do kexec. So some other workaround must have been found. Can't immediately remember what it was though. -Tony --
There is a hack in the code someplace on ia64 to pass the virtual address efi was mapped at to the next kernel, and have that kernel make certain to use efi there, without calling set_virtual_address_map(). For similar kernels that is fine, but at some point I expect kernel divergence will make that scheme unworkable. Essentially this is the same as using physical addresses but starting with the virtual addresses. For ia64 I seem to recall some weird floating point fixup routines that benefited from the speed set_virtual_address_map() provided. For x86_64, where the primary (sole?) reason for enabling EFI handling is to set efi variables from linux, I don't see a case where enabling virtual mode makes sense. If EFI stays around on x86, always running the calls in physical mode and in other ways slowly decreasing our dependence on perfect efi implementations seems necessary. As to Peter's question, I did not see any of that code affecting anything that ia64 uses. Eric --
I guess my real question was "is this something IA64 could benefit from and/or could we make the IA64 code more similar to the x86 bits"? -hpa --
If Eric's recollection about the "weird floating point fixup routines"[1] performance issues is correct, then ia64 won't want to do this. -Tony [1] more usually called FPSWA (floating point software assist), which handles a bunch of corner cases in denormalized floating point values that the h/w doesn't cover. --
I proposed something similar to this for ia64 at one point to solve the problem of kexecing to Xen, which at that time mapped EFI to a different location than Linux. As I recall, the idea was shot down by SGI Altix people on the basis of potential performance problems. I don't recall any reasons more specific than that being given (and to be honest I was less than happy about it at the time). In the end I moved EFI in Xen to match Linux and have been able to ignore the problem ever since. Though as Eric pointed out elsewhere in this thread, there is ample scope for incompatibilities with future/other kernels. --
Another aspect of this... this plays well into the already-outstanding proposal to keep an identity-mapped set of page tables around at all times. Right now we do it ad hoc for 64 bits and not really for 32 bits, but that is being changed, see the thread starting at: http://marc.info/?i=1280940316-7966-1-git-send-email-bp@amd64.org This would definitely be better than keeping yet another private page table. -hpa --
efi_flags and save_cr3 should be per-CPU, because they will now be used after SMP is enabled. efi_pgd should be dynamically allocated instead of statically allocated, because EFI may not be enabled on some platforms. And I think it is better to unify early physical mode with run-time physical mode. Just allocate the page table with the early page allocator (lmb?). Best Regards, Huang Ying --
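Why a single global save_cr3 breaks once SMP is up can be shown with a small userspace simulation. This is a sketch under stated assumptions, not kernel code: sim_cr3, saved_cr3, efi_pgd_phys, efi_enter_phys(), efi_exit_phys() and efi_call_phys_on() are all hypothetical names, the CPU number is passed in explicitly, and nothing here touches a real CR3. In the kernel the save slot would be a per-CPU variable and CR3 would be accessed with something like read_cr3()/write_cr3().

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4                       /* simulation only */

/* Simulated per-CPU CR3 registers and per-CPU save slots (hypothetical). */
static uint64_t sim_cr3[NR_CPUS];
static uint64_t saved_cr3[NR_CPUS];

/* Hypothetical physical address of the 1:1 page table. */
static const uint64_t efi_pgd_phys = 0x1234000;

void efi_enter_phys(int cpu)
{
    saved_cr3[cpu] = sim_cr3[cpu];      /* remember this CPU's page table */
    sim_cr3[cpu] = efi_pgd_phys;        /* switch to the identity map */
}

void efi_exit_phys(int cpu)
{
    sim_cr3[cpu] = saved_cr3[cpu];      /* put this CPU's page table back */
}

/* Hypothetical wrapper: run an EFI runtime service with the 1:1 map loaded. */
long efi_call_phys_on(int cpu, long (*service)(void))
{
    efi_enter_phys(cpu);
    assert(sim_cr3[cpu] == efi_pgd_phys);
    long status = service();
    efi_exit_phys(cpu);
    return status;
}
```

If CPU 0 and CPU 1 both enter before either exits, each still gets its own CR3 back, which a single shared save slot could not guarantee. Per-CPU storage is only sufficient if the caller cannot migrate between enter and exit, which is one reason the saved interrupt flags travel with it.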
No, it should not be dynamic; rather we should unify all the users who need a 1:1 map and just keep that page table set around. -- Sent from my mobile phone. Please pardon any lack of formatting. --
Agree. One known issue with a global 1:1 map is that we need to make at least part of the page table PAGE_KERNEL_EXEC for EFI runtime code, and change_page_attr cannot be used before the page allocator is available. Best Regards, Huang Ying --
For the 1:1 map we probably should make all pages executable; other things need it too, but we shouldn't have it mapped in except when needed. --
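The "which pages are executable" question comes down to the NX bit (bit 63 of an x86_64 page-table entry). Below is a hedged userspace sketch of the narrower option Huang Ying describes, where the 1:1 map is non-executable by default and only the EFI runtime code range has NX cleared. The table, build_nx_identity(), allow_exec() and is_exec() are hypothetical, and it uses 2 MiB pages, so everything sharing a 2 MiB page with EFI code becomes executable too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES   512
#define PRESENT   (1ULL << 0)
#define RW        (1ULL << 1)
#define PSE       (1ULL << 7)           /* 2 MiB page */
#define NX        (1ULL << 63)          /* execute-disable, needs EFER.NXE */
#define PMD_SHIFT 21
#define PMD_SIZE  (1ULL << PMD_SHIFT)

/* Simulated PMD covering the first 1 GiB with 2 MiB identity pages. */
static uint64_t pmd[ENTRIES];

/* Build the 1:1 map with every page non-executable by default. */
void build_nx_identity(void)
{
    for (uint64_t i = 0; i < ENTRIES; i++)
        pmd[i] = (i << PMD_SHIFT) | PRESENT | RW | PSE | NX;
}

/* Clear NX on every 2 MiB page overlapping [start, end), e.g. EFI code. */
void allow_exec(uint64_t start, uint64_t end)
{
    assert(start < end && end <= ENTRIES * PMD_SIZE);
    for (uint64_t i = start >> PMD_SHIFT; i <= (end - 1) >> PMD_SHIFT; i++)
        pmd[i] &= ~NX;
}

bool is_exec(uint64_t va)
{
    if (va >= ENTRIES * PMD_SIZE)
        return false;
    return !(pmd[va >> PMD_SHIFT] & NX);
}
```

hpa's alternative of making all pages executable corresponds to never setting NX in build_nx_identity(), which is simpler but only acceptable if the table is loaded just for the duration of the call.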
We still want to restore cr3 from the local task structure as soon as is reasonable, as an identity mapped page table will have page 0 We need to be careful in the setup of the global page table so that it stays in sync with the PAT tracking of the attributes pages are mapped with, so that we don't map a page as cached and uncached at the same time. Otherwise we could accidentally get cache corruption. That would seem to mean change_page_attr is relevant, at least after we switch from our default set of page permissions. Eric --
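The "never cached and uncached at the same time" rule can be sketched as a conflict check over reserved physical ranges. This is a userspace illustration of the idea only, loosely in the spirit of the kernel's PAT memtype tracking; memtype_reserve_sim(), struct memtype_range and the fixed-size table are hypothetical and are not the kernel's API. A new mapping is accepted if every overlapping range already uses the same cache type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cache_type { WB, UC, WC };         /* subset of x86 PAT memory types */

struct memtype_range { uint64_t start, end; enum cache_type type; };

#define MAX_RANGES 16
static struct memtype_range ranges[MAX_RANGES];
static int nr_ranges;

/* Record [start, end) with 'type'; refuse if any overlapping range already
 * uses a different cache type, since aliasing a page as cached and uncached
 * can corrupt data. */
bool memtype_reserve_sim(uint64_t start, uint64_t end, enum cache_type type)
{
    assert(start < end);
    for (int i = 0; i < nr_ranges; i++) {
        bool overlap = start < ranges[i].end && ranges[i].start < end;
        if (overlap && ranges[i].type != type)
            return false;
    }
    if (nr_ranges == MAX_RANGES)
        return false;                   /* simulation: table full */
    ranges[nr_ranges++] = (struct memtype_range){ start, end, type };
    return true;
}
```

Under this rule, a global 1:1 table would have to consult the same records before choosing attributes for a page, rather than defaulting everything to write-back.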
Quite, which is yet another reason to have a common global page table for all the 1:1 users... right now this is all ad hoc. -hpa --
