When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").

This is a bugfix for two cases:
 1. running a 32-bit PAE kernel on a machine with more than 64GB RAM.
 2. running a 32-bit PAE Xen guest on a host machine with more than 64GB RAM

In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.

The 46-bit mask used in 64-bit seems pretty arbitrary. The physical
size could be between 40 and 52 bits. Setting the mask to 40 bits
would restrict the physical size to 1TB, which is definitely too
small. Setting it to 52 would be ridiculously large, and runs the
risk that one of the vendors may decide to put flags rather than
physical address in one of the upper reserved bits.

Doing it "properly" would require testing cpuid leaf 0x80000008, but
it would mean that we would lose the ability to make all these
compile-time constants.

So, stick with 46 bits. It's enough for now.

[ Ingo: This needs a test, but I think it should be fairly low-risk.
  If it checks out OK, it should be slipped to Linus fairly soon,
  since it is a bugfix. It's probably worth putting into stable too. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Stable Kernel <stable@kernel.org>

diff -r 0eebd30011dc include/asm-x86/page_32.h
--- a/include/asm-x86/page_32.h Wed Jun 04 10:32:01 2008 +0100
+++ b/include/asm-x86/page_32.h Thu Jun 05 16:09:53 2008 +0100
@@ -22,7 +22,7 @@
 #ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT 36
+#define __PHYSICAL_MASK_SHIFT 46
 #define __VIRTUAL_MASK_SHIFT 32
 #define PAGETABLE_LEVELS 3
--
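To make the failure mode in the patch description concrete, here is a minimal sketch of what a too-narrow mask does to a pte that points at or above 64GB. The helper names `phys_mask()` and `pte_phys()` are made up for illustration and are not the kernel's macros; the only assumptions are 4 KiB pages and a 64-bit pte.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages */

static_assert(PAGE_SHIFT < 36, "mask shift must cover more than the page offset");

/* Hypothetical helper: all-ones mask covering `shift` physical address bits. */
static inline uint64_t phys_mask(unsigned shift)
{
	return (UINT64_C(1) << shift) - 1;
}

/*
 * Hypothetical helper: extract the page frame address from a pte by
 * keeping only the address bits below `shift` and dropping the low
 * 12 flag bits.
 */
static inline uint64_t pte_phys(uint64_t pte, unsigned shift)
{
	uint64_t page_mask = ~((UINT64_C(1) << PAGE_SHIFT) - 1);

	return pte & phys_mask(shift) & page_mask;
}
```

With a pte of `(1ULL << 36) | 0x67` (a page at exactly 64GB plus some flag bits), `pte_phys(pte, 36)` yields 0, while `pte_phys(pte, 46)` yields the intended `1ULL << 36`. That silent truncation is the "havoc" case.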
Hmm? There's 11 bits available - why would anyone want to assign bits
from the sufficiently official (at least as far as AMD is concerned,
I'm not sure I saw a precise statement on Intel's side) frame number
bits? And even if they would, it would certainly take some control
register bit to enable the feature, so shrinking the mask if that
would ever happen would seem more appropriate.

Bottom line - I'd suggest pushing both 32- and 64-bits up to 52.

Jan
--
The Intel docs list those 11 bits as available to software, and are not
reserved for any future flags they may want to add. I was a bit
We could have an auction:
Do I hear 46? 47? 48? 50? 52! Going once, twice, 52 bits!
Anyway, we can fix it later in a separate patch. This is a
change-as-little-as-possible bugfix patch.
J
--
It should either be 52 bits or dynamic based on CPUID information.
The latter is very expensive.

If there end up being additional control bits assigned in this space
we won't use them since we know the size of the address space (which
won't include the control bits) and thus will leave them at zero.

It's largely theoretical, since I believe Linux on x86-64 relies on
virtual >= physical+N, where I believe N is about 3 bits, and the page
table format or page size need to change to support more than 48 bits
of virtual address space.

-hpa
--
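For reference, the "dynamic" option would look roughly like the sketch below. CPUID leaf 0x80000008 reports the physical address width in EAX bits 7:0 and the linear address width in EAX bits 15:8. The sketch only decodes an EAX value passed in by the caller, so it runs anywhere; the struct and function names are hypothetical. The cost hpa refers to is that the mask stops being a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two widths reported by the leaf. */
struct addr_sizes {
	unsigned phys_bits;	/* EAX[7:0]  */
	unsigned virt_bits;	/* EAX[15:8] */
};

/* Decode the EAX register returned by CPUID leaf 0x80000008. */
static inline struct addr_sizes decode_leaf_80000008(uint32_t eax)
{
	struct addr_sizes s;

	s.phys_bits = eax & 0xff;
	s.virt_bits = (eax >> 8) & 0xff;
	return s;
}

/* Build a physical mask at run time instead of from a constant shift. */
static inline uint64_t runtime_phys_mask(uint32_t eax)
{
	unsigned bits = decode_leaf_80000008(eax).phys_bits;

	if (bits >= 64)
		return ~UINT64_C(0);
	return (UINT64_C(1) << bits) - 1;
}
```

For example, an EAX value of `0x3028` decodes to 40 physical bits and 48 virtual bits, giving a mask of `0xFFFFFFFFFF`.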
I'm more concerned that it might not be possible. I'm trying to think
how many places have compile-time constants derived from this mask.
You mean, if new bits appear we can just adjust the mask accordingly to
I don't see any relationship between the physical and virtual size.
Certainly virtual is fixed at 48 bits (4*9+12), but I don't think
there's any deep reason why physical needs to be within 3 bits.
J
--
Correct. Remember, the page table entries come from the kernel - not

Identity-mapping. 1 bit goes to kernel/user split, then the kernel
area is split into multiple regions, one of which is identity-mapping.
It may be just 2.

-hpa
--
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide. Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
---
 include/asm-x86/page_32.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

===================================================================
--- a/include/asm-x86/page_32.h
+++ b/include/asm-x86/page_32.h
@@ -22,7 +22,8 @@
 #ifdef CONFIG_X86_PAE
-#define __PHYSICAL_MASK_SHIFT 36
+/* 44=32+12, the limit we can fit into an unsigned long pfn */
+#define __PHYSICAL_MASK_SHIFT 44
 #define __VIRTUAL_MASK_SHIFT 32
 #define PAGETABLE_LEVELS 3
--
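The 32+12=44 arithmetic can be checked with a small sketch. Here `pfn32_t` is a hypothetical typedef standing in for the 32-bit kernel's `unsigned long`, and the two conversion helpers are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages */

/* Stand-in for "unsigned long" on a 32-bit kernel. */
typedef uint32_t pfn32_t;

static_assert(8 * sizeof(pfn32_t) + PAGE_SHIFT == 44,
	      "a 32-bit pfn covers 44 bits of physical address");

/* Physical address -> page frame number, truncated to 32 bits. */
static inline pfn32_t phys_to_pfn32(uint64_t phys)
{
	return (pfn32_t)(phys >> PAGE_SHIFT);
}

/* Page frame number -> physical address of the start of the page. */
static inline uint64_t pfn32_to_phys(pfn32_t pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT;
}
```

An address of `1ULL << 43` survives the round trip through `phys_to_pfn32()` and `pfn32_to_phys()`, but `1ULL << 44` comes back as 0 because its pfn needs a 33rd bit.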
Ah, yes!
--
43 bits might actually be safer because of potential sign bugs. But of
course it likely won't work anyway.

-Andi
--
I thought about it, but I think we're pretty consistent about putting
pfns into unsigned types. But, yes, you're very marginal at that point.
J
--
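The kind of sign bug Andi has in mind can be sketched as follows. This is an illustration only, with hypothetical helper names: it compares a pfn widened through an unsigned type against the same bit pattern widened through a signed 32-bit type. The two agree as long as bit 31 of the pfn is clear, which is exactly what a 43-bit limit would guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages */

/* Correct path: the pfn is unsigned, so widening zero-extends. */
static inline uint64_t pfn_to_phys_unsigned(uint32_t pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT;
}

/*
 * Buggy path: the pfn passed through a signed 32-bit variable, so
 * widening sign-extends before the shift. The value is converted to
 * uint64_t before shifting to keep the shift itself well defined.
 */
static inline uint64_t pfn_to_phys_signed(int32_t pfn)
{
	return (uint64_t)(int64_t)pfn << PAGE_SHIFT;
}
```

With pfn `0x7FFFFFFF` both helpers return `0x7FFFFFFF000`. With the bit pattern `0x80000000` (physical address 2^43), the unsigned path returns `1ULL << 43`, while the signed path, fed `INT32_MIN`, returns `0xFFFFF80000000000`.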
applied to tip/x86/cleanups - thanks Jeremy. No urgency for v2.6.26,
right?

	Ingo
--
Not urgent, but it would be nice to have.
J
--
ok, cherry-picked it into x86/urgent. This aspect makes it eligible for
v2.6.26:

| This is a bugfix for two cases:
|  1. running a 32-bit PAE kernel on a machine with
|     more than 64GB RAM.
|  2. running a 32-bit PAE Xen guest on a host machine with
|     more than 64GB RAM
|
| In both cases, a pte could need to have more than 36 bits of physical,
| and masking it to 36-bits will cause fairly severe havoc.

also added a stable@kernel.org Cc: to the commit, so it will be picked
up in stable as well.

	Ingo
--
The rationale for the 46 bits is that the kernel needs roughly 4x as
much virtual space as physical space and the virtual space is limited
to 48 bits. To be exact, 47 bits is always user space and the 47 bits
remaining for the kernel are split in half, with one half for the
direct mapping and the other half for random mappings. With some
pushing you could extend it to 46.5 bits or so, but beyond that you'll
be in trouble.

It's not arbitrary at all.

-Andi
--
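Andi's budget can be written down as compile-time arithmetic. The enum names below are made up for illustration; the numbers come from the thread (4 levels of 9 bits plus a 12-bit page offset, one bit for the user/kernel split, one more bit for splitting the kernel half).

```c
#include <assert.h>
#include <stdint.h>

enum {
	VIRT_BITS       = 4 * 9 + 12,		/* 48: 4 table levels + page offset */
	USER_BITS       = VIRT_BITS - 1,	/* 47: lower half is user space */
	KERNEL_BITS     = VIRT_BITS - 1,	/* 47: upper half is kernel space */
	DIRECT_MAP_BITS = KERNEL_BITS - 1	/* 46: half of the kernel half */
};

static_assert(VIRT_BITS == 48, "4*9+12 virtual address bits");
static_assert(DIRECT_MAP_BITS == 46, "direct mapping gets 46 bits");

/* Size in bytes of the largest physical range the direct mapping can cover. */
static inline uint64_t direct_map_limit(void)
{
	return UINT64_C(1) << DIRECT_MAP_BITS;
}
```

`direct_map_limit()` is 2^46 bytes, i.e. 64 TiB, which is 4x smaller than the 2^48 bytes of virtual space, matching the "roughly 4x" rule of thumb.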
That is only half of it. Since PHYSICAL_MASK also controls other than
RAM mappings, there's really two constants that are needed here: One
(46) to indicate how large the 1:1 mapping can possibly get (and hence
what the upper boundary of usable RAM is - without introducing
highmem), and another (52) to indicate how wide a physical address
(perhaps from a 64-bit PCI BAR) can possibly be (i.e. used to validate
physical addresses / page table entries).

Jan
--
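A rough sketch of the two-constant split Jan describes might look like this. Both macro names and both helpers are hypothetical, invented here to show the distinction; nothing like them is proposed in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: upper bound on RAM reachable through the 1:1 mapping. */
#define DIRECT_MAP_SHIFT	46
/* Hypothetical: widest physical address a pte or device BAR may carry. */
#define PHYS_ADDR_SHIFT		52

static_assert(DIRECT_MAP_SHIFT <= PHYS_ADDR_SHIFT,
	      "the 1:1 mapping cannot exceed the physical address width");

/* Nonzero if the address could be covered by the 1:1 mapping. */
static inline int fits_direct_map(uint64_t phys)
{
	return (phys >> DIRECT_MAP_SHIFT) == 0;
}

/* Nonzero if the address is a well-formed physical address at all. */
static inline int is_valid_phys(uint64_t phys)
{
	return (phys >> PHYS_ADDR_SHIFT) == 0;
}
```

Under these assumptions a 64-bit BAR placed at `1ULL << 50` passes `is_valid_phys()` but fails `fits_direct_map()`, so it could be mapped explicitly without ever being treated as usable RAM.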
Why's that? Is the issue the amount of memory needed for pagetables and
I didn't say it was. That was the introduction to my explanation of why
I didn't think it was arbitrary. Of course, if there had been a comment
there explaining the rationale, I wouldn't have had to make one up...
J
--
No, it's the fact that the 1:1 mapping needs as much virtual space as
the physical range covered (including all holes).

Jan
--
Right, I see. And suddenly 64-bits seems... constrained. ;)
J
--
Not really. The vendors are aware of this constraint -- it's hardly
unique to Linux.

The reason for canonical addresses and all that jazz is to keep people
from doing stupid things like store stuff in the upper 16 bits of a
pointer (happened a lot on the 68000, where the first implementation
had only 24 address bits.) Thus, all changes needed to go to a larger
virtual address space are all internal to the kernel.

-hpa
--
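For readers unfamiliar with the canonical-address rule hpa mentions: with 48 implemented virtual address bits, bits 63:47 must all equal bit 47. A minimal check, with a hypothetical function name, could look like this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nonzero if `addr` is canonical for a 48-bit virtual address space:
 * bits 63:47 (17 bits) must be all zeros or all ones.
 */
static inline int is_canonical48(uint64_t addr)
{
	uint64_t upper = addr >> 47;

	return upper == 0 || upper == 0x1ffff;
}
```

`0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are canonical; `0x0000800000000000` is not, and neither is a pointer with a tag stashed in its top bits such as `0x1200000000001000`. This is what keeps software from hiding data in the upper 16 bits.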
On Thu, 05 Jun 2008 16:21:14 +0100

the problem on 32 bit is that if you have that much ram, you run out of
lowmem FAST.... so you have bigger problems.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
Sure, you'd have to be barking mad to give a 32-bit system 2^40 bytes of
RAM. But under Xen the host's physical addresses are used in guest
pagetables, so you could have a reasonably sized 32-bit PAE Xen guest be
exposed to huge host physical addresses.
But the basic point is that, given that Enhanced Legacy PAE Paging
exists, 36-bits is not correct, so we should fix it. And if the
platform allows addressable hardware to be physically discontiguous -
either memory or devices - then you may end up using large numbers of
physical bits without having a stupid amount of memory actually present.
J
--
