This series:
- adds a simple debugfs profiling entry for cross-cpu tlb flushes
- converts those flushes to use smp_call_function_mask
- unifies 32 and 64-bit tlb flushes
- converts smp_call_function to use multiple queues (using the
  now-freed vectors)
- allows config-time adjustment of the number of queues
- adds a kernel parameter to disable multi-queue in case it causes
  problems
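
To make the last three items concrete, here is a minimal userspace
sketch of how a compile-time queue count and a runtime "disable" switch
could interact. It is not the code from the patches: NR_CALL_QUEUES,
single_queue, and the "cpu modulo queue count" policy are all
hypothetical names and assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the config-time number of queues. */
#define NR_CALL_QUEUES 4

/* Hypothetical stand-in for the kernel parameter that disables multi-queue. */
static bool single_queue = false;

/* Number of queues actually in use: one if multi-queue is disabled. */
static unsigned int active_queues(void)
{
	return single_queue ? 1 : NR_CALL_QUEUES;
}

/* Assumed policy: spread requests across queues by CPU number. */
static unsigned int queue_for_cpu(unsigned int cpu)
{
	unsigned int n = active_queues();

	assert(n > 0);
	return cpu % n;
}
```

With the switch set, every CPU falls back to queue 0, which is the
single-queue behaviour the disable parameter is meant to restore.
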
The main concern is whether using smp_call_function adds an
unacceptable performance hit to cross-cpu tlb flushes. My limited
measurements show a ~35% regression in latency for a particular flush;
it would be interesting to try this on a wider range of hardware. I
gather the effect of tlb flush performance is very application-specific
as well, but I'm not sure which benchmarks show which effects.
Trading off against the latency of a given flush, the smp_call_function
mechanism allows multiple requests to be queued, and so may improve
throughput on a system-wide basis.
So, I'd like people to try this out and see what performance effects it
has.
Thanks,
J
--