Do we actually want these DirectMap lines in the x86 /proc/meminfo?
I can see they're interesting to CPA developers and TLB optimizers,
but they don't fit its usual "where has all my memory gone?" usage.

If they are to stay, here are some fixes.

1. On x86_32 without PAE, they're not 2M but 4M pages: no need to
   mess with the internal enum, but show the right name to users.

2. Many machines can never show anything but 0 for DirectMap1G,
   so suppress that line unless direct_gbpages are really enabled.

3. The unit in /proc/meminfo is kB not number of pages: HugePages
   messed that up, but they're an example to regret, not to follow.

4. Once we use kB, it's easy to see that 1GB has gone missing (which
   explains why CONFIG_CPA_DEBUG=y soon wraps DirectMap2M negative):
   because head_64.S's level2_ident_pgt entries were not counted.
   My fix is not ideal, but works for more and for less than 1G,
   and avoids interfering with early bootup pagetable contortions.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
You might prefer me to split these up?

Should we really be using level2_ident_pgt (which needs to avoid NX)
for the final direct map (which wants to use NX)? But my attempt
to build up a fresh pagetable there failed miserably to boot!
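
Points 1 to 3 of the description above come down to a naming choice, a
conditional line, and a shift from page counts to kB. What follows is a
minimal user-space sketch of that logic, not the patch itself: enum
pg_level, struct directmap_cfg, kb_shift() and format_directmap() are
hypothetical names invented for illustration, and the column widths are
an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's per-level page counters. */
enum pg_level { LEVEL_4K, LEVEL_LARGE, LEVEL_1G, LEVEL_NUM };

struct directmap_cfg {
	int large_is_4m;	/* 1 on x86_32 without PAE (point 1) */
	int gbpages_enabled;	/* 1 only if 1G mappings are in use (point 2) */
};

/* log2 of the page size in kB: 4k = 4 kB, 2M = 2048 kB, 4M = 4096 kB,
 * 1G = 1048576 kB (point 3: report kB, not a number of pages). */
static unsigned int kb_shift(enum pg_level level,
			     const struct directmap_cfg *cfg)
{
	switch (level) {
	case LEVEL_4K:
		return 2;
	case LEVEL_LARGE:
		return cfg->large_is_4m ? 12 : 11;
	default:
		return 20;
	}
}

/* Write the DirectMap lines into buf.
 * Returns the number of lines written, or -1 if buf is too small. */
static int format_directmap(char *buf, size_t len,
			    const unsigned long count[LEVEL_NUM],
			    const struct directmap_cfg *cfg)
{
	const char *names[LEVEL_NUM] = {
		"DirectMap4k:",
		cfg->large_is_4m ? "DirectMap4M:" : "DirectMap2M:",
		"DirectMap1G:",
	};
	size_t off = 0;
	int lines = 0;

	assert(buf != NULL && len > 0);
	for (int level = LEVEL_4K; level < LEVEL_NUM; level++) {
		int n;

		/* Skip a line that could only ever show 0. */
		if (level == LEVEL_1G && !cfg->gbpages_enabled)
			continue;
		n = snprintf(buf + off, len - off, "%-16s%8lu kB\n",
			     names[level],
			     count[level] << kb_shift(level, cfg));
		if (n < 0 || (size_t)n >= len - off)
			return -1;	/* output would not fit */
		off += (size_t)n;
		lines++;
	}
	return lines;
}
```

With gbpages_enabled clear, only the 4k line and the 2M (or 4M) line are
produced; setting it adds the DirectMap1G line.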
 arch/x86/mm/init_64.c  |    6 +++++-
 arch/x86/mm/pageattr.c |   18 ++++++++++++------
 2 files changed, 17 insertions(+), 7 deletions(-)

--- 2.6.27-rc3/arch/x86/mm/init_64.c 2008-07-29 04:24:15.000000000 +0100
+++ linux/arch/x86/mm/init_64.c 2008-08-13 16:37:41.000000000 +0100
@@ -60,7 +60,7 @@ static unsigned long dma_reserve __initd
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-int direct_gbpages __meminitdata
+int direct_gbpages
 #ifdef CONFIG_DIRECT_GBPAGES
 				= 1
 #endif
@@ -314,6 +314,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
 {
 	unsigned long pages = 0;
 	unsigned long last_map_addr = end;
+	unsigned long start = address;
 
 	int i = pmd_index(address);
 
@@ -334,6 +335,9 @@ ...

I made them unconditional to minimize the risk of some dumb parser not
being able to deal with them. Longer term there will be more and more
machines that support them. Admittedly that's not a very strong argument.

-Andi
--
Yes, that's what I meant by the TLB optimizers. But it's going to be
a fractional effect, isn't it, when you're trying to get the last 1%
out of the machine? And in such a case, you might wonder more what
all the 4k ones are actually being used for (no problem at all if
they've ended up behind vmalloced module text).

Hugh
--
Depending on the workload it can be much more than that.

-Andi
--
i cannot see any performance difference myself between 2MB and 1GB TLBs.
There are measurements that Andi Kleen did originally in this commit:

commit 8346ea17aa20e9864b0f7dc03d55f3cd5620b8c1
Author: Andi Kleen <andi@firstfloor.org>
Date:   Wed Mar 12 03:53:32 2008 +0100

    x86: split large page mapping for AMD TSEG

    [lower is better]
                  no split  stddev      split     stddev      delta
    Elapsed Time  87.146    (0.727516)  84.296    (1.09098)   -3.2%
    User Time     274.537   (4.05226)   273.692   (3.34344)   -0.3%
    System Time   34.907    (0.42492)   34.508    (0.26832)   -1.1%
    Percent CPU   322.5     (38.3007)   326.5     (44.5128)   +1.2%

    => About 3.2% improvement in elapsed time for kernbench.

[...]
meanwhile i have Barcelona class hardware myself and i cannot reproduce
these claimed improvements in kernbench performance. gbpages versus
no-gbpages results are dead on the same, within statistical noise.
( i'm sure it could make some difference in synthetic user-space
workloads - but gbpages are not exposed to user-space anyway. )
Ingo
--
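
The delta column in the kernbench table quoted above reads as the
relative change from "no split" to "split". A minimal sketch of that
arithmetic follows, assuming that reading; struct kernbench_row,
kernbench_rows and delta_percent() are hypothetical names, and only the
figures come from the quoted commit. Computed this way the elapsed-time
change is about -3.27%, so the quoted -3.2% appears to be truncated
rather than rounded.

```c
#include <assert.h>

/* One row of the quoted kernbench table (stddev columns omitted). */
struct kernbench_row {
	const char *name;
	double no_split;
	double split;
};

/* Values copied from the quoted commit message. */
static const struct kernbench_row kernbench_rows[] = {
	{ "Elapsed Time",  87.146,  84.296 },
	{ "User Time",    274.537, 273.692 },
	{ "System Time",   34.907,  34.508 },
	{ "Percent CPU",  322.5,   326.5   },
};

/* Relative change from baseline to candidate, in percent.
 * Negative means the candidate value is lower than the baseline. */
static double delta_percent(double baseline, double candidate)
{
	assert(baseline != 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

This only reproduces the delta column; it says nothing about whether a
given difference is larger than the run-to-run noise, which is the point
under dispute in the thread.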
i think they are borderline useful - so i've applied your fixes to [...]

hm, exactly what change have you tried? (patch?)

Ingo
--
As soon as that kernel failed to boot, I chucked the patch away and
erased it from my mind: much better to leave such a change to the
people who are intimate with this sequence and can debug it.

It wasn't anything much: the page to use has already been set aside
for alloc_low_page, and I thought it was just a matter of breaking the
association with level2_ident_pgt at the right level, then letting
phys_pmd_init do its usual setup from scratch.

Maybe it didn't work because I got it slightly wrong, or maybe it
didn't work for more subtle reasons, e.g. I was then building up the
first 1GB of direct map 2MB by 2MB: if direct map is actually used in
there and falls out of TLB, I'd certainly be in trouble.

Hugh
--
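
For scale, the "first 1GB of direct map 2MB by 2MB" is one full PMD page
on x86_64: 512 entries of 2MB each, which is also the 1GB that point 4 of
the patch description reports as uncounted. The sketch below works
through that arithmetic and shows how a 2M counter that never saw those
entries goes below zero as soon as one of them is split. It is an
illustration only: struct directmap_count and the helper names are
hypothetical, and the counters are made signed here purely so the
underflow is easy to see, which is an assumption and not a statement
about the kernel's own types.

```c
#include <assert.h>

#define ENTRIES_PER_PMD_PAGE	512UL	/* 4096-byte page / 8-byte entry */
#define KB_PER_2M_PAGE		2048UL	/* 2MB expressed in kB */
#define PTES_PER_2M_PAGE	512L	/* 4k pages covered by one 2M mapping */

/* Hypothetical counters, signed so that an underflow is visible. */
struct directmap_count {
	long pages_4k;
	long pages_2m;
};

/* kB mapped by one PMD page full of 2M entries: 512 * 2048 kB = 1GB. */
static unsigned long pmd_page_coverage_kb(void)
{
	return ENTRIES_PER_PMD_PAGE * KB_PER_2M_PAGE;
}

/* Account for splitting one 2M mapping into 4k mappings:
 * one fewer 2M page, 512 more 4k pages. */
static void account_split_2m(struct directmap_count *c)
{
	assert(c != 0);
	c->pages_2m -= 1;
	c->pages_4k += PTES_PER_2M_PAGE;
}

/* Total kB described by the counters. */
static long total_kb(const struct directmap_count *c)
{
	return c->pages_4k * 4 + c->pages_2m * (long)KB_PER_2M_PAGE;
}
```

A split leaves the kB total unchanged, so when the counters start from
zero for mappings that were never counted, the 4k figure grows while the
2M figure turns negative.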
