Hi,
In workload with heavy page reclaim, flush_tlb_page() is frequently
used. We currently have 8 vectors for tlb flush, which is fine for small
machines. But for big machines with a lot of CPUs, the 8 vectors are
shared by all CPUs and we need lock to protect them. This will cause a
lot of lock contentions. please see the patch 3 for detailed number of
the lock contention.
Andi Kleen suggests we can use 32 vectors for tlb flush, which should be
fine for even 8 socket machines. Test shows this reduces lock contention
dramatically (see patch 3 for number).
One might argue if this will waste too many vectors and leave less
vectors for devices. This could be a problem. But even we use 32
vectors, we still leave 78 vectors for devices. And we now have per-cpu
vector, vector isn't scarce any more, but I'm open if anybody has
objections.
Thanks,
Shaohua
--