i cannot see any performance difference myself between 2MB and 1GB TLB entries.
There are measurements that Andi Kleen did originally in this commit:

  commit 8346ea17aa20e9864b0f7dc03d55f3cd5620b8c1
  Author: Andi Kleen <andi@firstfloor.org>
  Date:   Wed Mar 12 03:53:32 2008 +0100

    x86: split large page mapping for AMD TSEG

    [lower is better]

                 no split  stddev        split  stddev      delta
    Elapsed Time   87.146  (0.727516)   84.296  (1.09098)   -3.2%
    User Time     274.537  (4.05226)   273.692  (3.34344)   -0.3%
    System Time    34.907  (0.42492)    34.508  (0.26832)   -1.1%
    Percent CPU     322.5  (38.3007)     326.5  (44.5128)   +1.2%

    => About 3.2% improvement in elapsed time for kernbench.

    [...]

meanwhile i have Barcelona class hardware myself and i cannot reproduce
these claimed improvements in kernbench performance. gbpages versus
no-gbpages results are dead on the same, within statistical noise.
( i'm sure it could make some difference in synthetic user-space
workloads - but gbpages are not exposed to user-space anyway. )
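
A rough way to judge whether a delta like the ones in the quoted table
stands out from run-to-run noise is to compare the difference of the means
against the combined standard deviation. The sketch below does that for the
numbers copied from the commit message. The function names and the
"combined stddev" heuristic are assumptions made for illustration; the
number of runs behind each mean is not given, so this is not a proper
significance test.

```python
import math

# Means and standard deviations copied from the kernbench table in the
# quoted commit: (no-split mean, no-split stddev, split mean, split stddev).
RESULTS = {
    "Elapsed Time": (87.146, 0.727516, 84.296, 1.09098),
    "User Time": (274.537, 4.05226, 273.692, 3.34344),
    "System Time": (34.907, 0.42492, 34.508, 0.26832),
    "Percent CPU": (322.5, 38.3007, 326.5, 44.5128),
}


def delta_percent(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def sigma_ratio(base, base_sd, new, new_sd):
    """Difference of the means in units of the combined stddev.

    Assumption: the two sets of runs are independent, so the spreads
    are combined as sqrt(base_sd**2 + new_sd**2).
    """
    return abs(new - base) / math.hypot(base_sd, new_sd)


if __name__ == "__main__":
    for name, (base, base_sd, new, new_sd) in RESULTS.items():
        print(f"{name:<13} delta {delta_percent(base, new):+6.2f}%  "
              f"|diff| / combined stddev = "
              f"{sigma_ratio(base, base_sd, new, new_sd):.2f}")
```

A ratio well below 1 means the difference is smaller than the spread of the
runs themselves. With the per-run samples available, a proper two-sample
test would be the better tool.
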
Ingo