Linus Torvalds writes:

You have misunderstood the 21% number. That number has *nothing* to do with hardware TLB miss handling, and everything to do with how long the generic Linux virtual memory code spends doing its thing (page faults, setting up and tearing down Linux page tables, etc.).

It doesn't even have anything to do with the hash table (hardware page table), because both cases are using 4k hardware pages. Thus in both cases the TLB misses and hash-table misses would have been the same.

The *only* difference between the cases is the page size that the generic Linux virtual memory code is using. With the 64k page size our architecture-independent kernel code runs 21% faster. Thus the 21% is not about the TLB or any hardware thing at all, it's about the larger per-byte overhead of our kernel code when using the smaller page size.

The thing you were ranting about -- hardware TLB handling overhead -- comes in at 5%, comparing 4k hardware pages to 64k hardware pages (444 seconds vs. 420 seconds user time for the kernel compile).

And yes, it's a POWER6.

Paul.
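
As a quick check of the 5% figure, the two user times quoted above can be compared directly. This is only a sketch: the helper name percent_faster and the choice to express the saving relative to the slower 4k run are assumptions, not something stated in the message.

```python
def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """Time saved, as a percentage of the slower run (assumed convention)."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100.0


# User times for the kernel compile, as quoted in the message.
user_4k_hw = 444.0   # 4k hardware pages
user_64k_hw = 420.0  # 64k hardware pages

saving = percent_faster(user_4k_hw, user_64k_hw)
print(f"{saving:.1f}%")  # 24 / 444, i.e. roughly 5%
```

Measured against the faster run instead (24 / 420), the figure is slightly higher, but it still rounds to about 5%.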
