With the kernel configured for a 64k page size, but using 4k pages in
the hardware page table, I get:
64k/4k: 441.723s user + 27.258s system time
So the improvement in the user time is almost all due to the reduced
TLB misses (as one would expect). For the system time, using 64k
pages in the VM reduces it by about 21%, and using 64k hardware pages
reduces it by another 30%. So the reduction in kernel overhead is
significant but not as large as the impact of reducing TLB misses.
Paul.