It's also worth emphasizing that 1.5% of the total time, or 21% of the
system time, is pure software overhead in the Linux kernel that has
nothing to do with the TLB or with gcc's memory access patterns.
That overhead is the cost of managing memory in small (i.e., 4kB)
pages inside the generic Linux VM code, rather than in bigger chunks.
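
As a quick consistency check on the two percentages above, here is a
small sketch of the arithmetic. It uses only the 1.5% and 21% figures
quoted in this message; the variable names are illustrative, and the
derived system-time share is an inference from those two numbers, not
a separately measured value.

```python
# Figures quoted in the message (rounded as given there).
overhead_share_of_total = 0.015   # overhead as a fraction of total time
overhead_share_of_system = 0.21   # same overhead as a fraction of system time

# Both fractions describe the same overhead, so their ratio gives the
# fraction of total time that is spent in the system (kernel).
system_share_of_total = overhead_share_of_total / overhead_share_of_system

print(f"system time is about {system_share_of_total:.1%} of total time")
```

So the two figures together imply that system time is roughly 7% of
the total run time, subject to the rounding in the quoted percentages.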
Paul.
--