On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote:
The range invalidates have a performance advantage for the GRU. TLB invalidates
on the GRU are relatively slow (usec) and interfere somewhat with the performance
of other active GRU instructions. Invalidating a large chunk of addresses with
a single GRU TLBINVAL operation is must faster than issuing a stream of single
page TLBINVALs.
I expect this performance advantage will also apply to other users of mmuops.
Functionally, the GRU is very close to what I would consider to be the
"standard TLB" model. Dropins and flushs map closely to processor dropins
and flushes for cpus. The internal structure of the GRU TLB is identical to
the TLB of existing cpus. Requiring the GRU driver to track dropins with
long term page references seems to me a deviation from having the basic
mmuops support a "standard TLB" model. AFAIK, no other processor requires
this.
Tracking TLB dropins (and long term page references) could be done but it
adds significant complexity and scaling issues. The size of the tables to
track many TB (to PB) of memory can get large. If the memory is being
referenced by highly threaded applications, then the problem becomes even
more complex. Either tables must be replicated per-thread (and require even
more memory), or the table structure becomes even more complex to deal with
node locality, cacheline bouncing, etc.
Try to avoid a requirement to track dropins with long term page references.
Agree.
--