The problem is that the code in rmap.c, try_to_unmap() and friends, loops
over the reverse maps after taking a spinlock. The mm_struct is only known
after the rmap has been accessed, which means *inside* the spinlock.
That is why I tried to convert the locks used to scan the reverse maps to
semaphores. If that is done then one can indeed do the callouts outside of
atomic context.
With a larger number of processors, semaphores make a lot of sense since
the holdoff times on spinlocks will increase. If we go to sleep then the
processor can do something useful instead of hogging a cacheline.
A rw lock there can also increase concurrency during reclaim, especially if
the anon_vma chains are long and the number of address spaces mapping a
page is high.
--