Anton Salikhmetov <salikhmetov@gmail.com> writes:
You should probably put your design document somewhere in Documentation
with a patch.
This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.
The performance critical address space walkers use a different design
pattern that avoids this.
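
To illustrate the difference, here is a rough userspace model, not kernel
code. The toy_* helpers and struct walk_stats are made up for this sketch
and only count operations; 512 is the entries-per-leaf-page figure from
above. In the kernel the map-once pattern is normally written with
pte_offset_map_lock() before the loop and pte_unmap_unlock() after it.

```c
#include <stddef.h>

#define PTRS_PER_PTE 512      /* entries per PTE leaf page (figure from above) */
#define TOY_DIRTY    0x1UL    /* hypothetical dirty bit for this model */

/* Counters only; nothing here touches real page tables. */
struct walk_stats {
    unsigned long maps;       /* times the PTE page was mapped */
    unsigned long unmaps;     /* times the PTE page was unmapped */
    unsigned long cleaned;    /* dirty entries that were cleared */
};

/* Stand-in for temporarily mapping a highmem PTE page. */
static unsigned long *toy_map(unsigned long *pte_page, struct walk_stats *st)
{
    st->maps++;
    return pte_page;
}

/* Stand-in for dropping that temporary mapping again. */
static void toy_unmap(struct walk_stats *st)
{
    st->unmaps++;
}

/* Slow variant: map and unmap the leaf page around every single entry. */
static void walk_per_pte(unsigned long *pte_page, struct walk_stats *st)
{
    for (size_t i = 0; i < PTRS_PER_PTE; i++) {
        unsigned long *pte = toy_map(pte_page, st) + i;

        if (*pte & TOY_DIRTY) {
            *pte &= ~TOY_DIRTY;
            st->cleaned++;
        }
        toy_unmap(st);
    }
}

/* Walker-style variant: map once, visit all entries, unmap once. */
static void walk_per_leaf(unsigned long *pte_page, struct walk_stats *st)
{
    unsigned long *pte = toy_map(pte_page, st);

    for (size_t i = 0; i < PTRS_PER_PTE; i++, pte++) {
        if (*pte & TOY_DIRTY) {
            *pte &= ~TOY_DIRTY;
            st->cleaned++;
        }
    }
    toy_unmap(st);
}
```

Both variants clean the same entries; only the number of map/unmap pairs
per leaf page differs (512 versus 1 in this model).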
Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for
each iteration. They are quite costly even on smaller systems.
It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but that's
still better than really slowing down in the heavily multithreaded case.
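
A small model of the flush cost, again only a sketch: struct toy_mm,
toy_flush() and TOY_DIRTY are invented for illustration, and the model
simply assumes that every flush interrupts each other CPU sharing the MM
once (and that at least one CPU uses the MM). The batched variant
corresponds to a single flush_tlb_range() after the walk.

```c
#include <stddef.h>

#define TOY_DIRTY 0x1UL       /* hypothetical dirty bit for this model */

struct toy_mm {
    unsigned int  cpus;       /* CPUs sharing this address space, >= 1 */
    unsigned long ipis;       /* inter-processor interrupts sent so far */
};

/* Assumption: one flush interrupts every other CPU using the mm. */
static void toy_flush(struct toy_mm *mm)
{
    mm->ipis += mm->cpus - 1;
}

/* Unbatched: flush right after each entry that was cleaned. */
static void clean_unbatched(struct toy_mm *mm, unsigned long *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & TOY_DIRTY) {
            ptes[i] &= ~TOY_DIRTY;
            toy_flush(mm);
        }
    }
}

/* Batched: remember that something changed, flush once at the end. */
static void clean_batched(struct toy_mm *mm, unsigned long *ptes, size_t n)
{
    int need_flush = 0;

    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & TOY_DIRTY) {
            ptes[i] &= ~TOY_DIRTY;
            need_flush = 1;
        }
    }
    if (need_flush)
        toy_flush(mm);
}
```

In this model the unbatched cost grows with (dirty entries) x (other CPUs),
while the batched cost stays at one round of interrupts per walk.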
-Andi