V2 -> V3:
+ rebase to 23-mm1 atop RvR's split lru series [no change]
+ fix function return types [void -> int] to fix build when
not configured.
New in V2.
We need to hold the mmap_sem for write to initiate mlock()/munlock()
because we may need to merge/split vmas. However, this can lead to
very long lock hold times attempting to fault in a large memory region
to mlock it into memory. This can hold off other faults against the
mm [multithreaded tasks] and other scans of the mm, such as via /proc.
To alleviate this, downgrade the mmap_sem to read mode during the
population of the region for locking. The long hold times are
especially likely if we need to reclaim memory to lock down the
region. We [probably?] don't need to downgrade for unlocking, as all
of the pages should be resident--they're already mlocked.
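For illustration only, the lock-mode sequence described above can be
modelled in userspace roughly as follows. This is a sketch, not the
patch itself: struct toy_rwsem, the toy_*() helpers and
toy_populate_range() are hypothetical stand-ins that only track the
lock mode and do not block. In the kernel the corresponding
primitives are down_write(), downgrade_write() and up_read() on
mm->mmap_sem.

```c
#include <assert.h>

/* Toy stand-in for an rwsem; it only tracks the mode, it does not block. */
enum toy_mode { TOY_UNLOCKED, TOY_READ, TOY_WRITE };

struct toy_rwsem {
	enum toy_mode mode;
	unsigned long releases;	/* times the lock was dropped entirely */
};

static void toy_down_write(struct toy_rwsem *sem)
{
	assert(sem->mode == TOY_UNLOCKED);
	sem->mode = TOY_WRITE;
}

static void toy_downgrade_write(struct toy_rwsem *sem)
{
	assert(sem->mode == TOY_WRITE);
	sem->mode = TOY_READ;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->mode == TOY_READ);
	sem->mode = TOY_UNLOCKED;
	sem->releases++;
}

/* Hypothetical helper: caller holds sem for write on entry and on return. */
static int toy_populate_range(struct toy_rwsem *sem, int lock)
{
	assert(sem->mode == TOY_WRITE);
	if (!lock)
		return 0;	/* munlock: pages already resident, stay in write mode */

	toy_downgrade_write(sem);	/* other readers may now proceed */
	/* ... fault in the pages here, possibly reclaiming memory ... */
	toy_up_read(sem);		/* no direct read -> write upgrade */
	toy_down_write(sem);		/* restore the mode the caller expects */
	return 0;
}
```

The releases counter marks the point at which the semaphore is not
held at all, which is where another thread could modify the vma list.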
Now, the callers of the mlock functions [mlock_fixup() and
mlock_vma_pages_range()] expect the mmap_sem to be returned in write
mode. Changing all callers appears to be way too much effort at this
point. So, restore write mode before returning. Note that this opens
a window where the mmap list could change in a multithreaded process.
So, at least for mlock_fixup(), where we could be called in a loop over
multiple vmas, we check that a vma still exists at the start address
and that vma still covers the page range [start,end). If not, we return
an error, -EAGAIN, and let the caller deal with it.
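A minimal userspace sketch of that revalidation check, assuming a
simplified vma list. struct toy_vma, toy_find_vma() and
toy_revalidate_range() are hypothetical names used for illustration;
the real code operates on struct vm_area_struct and the details in the
patch may differ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;		/* first address covered */
	unsigned long vm_end;		/* one past the last address covered */
	struct toy_vma *vm_next;	/* next vma, sorted by address */
};

/* Simplified lookup: first vma whose vm_end lies above addr, or NULL. */
static struct toy_vma *toy_find_vma(struct toy_vma *head, unsigned long addr)
{
	struct toy_vma *vma;

	for (vma = head; vma; vma = vma->vm_next)
		if (vma->vm_end > addr)
			return vma;
	return NULL;
}

/*
 * Called once write mode has been restored: return 0 if a single vma
 * still covers [start, end), otherwise -EAGAIN for the caller to handle.
 */
static int toy_revalidate_range(struct toy_vma *head,
				unsigned long start, unsigned long end)
{
	struct toy_vma *vma = toy_find_vma(head, start);

	if (!vma || vma->vm_start > start || vma->vm_end < end)
		return -EAGAIN;
	return 0;
}
```

With vmas at [0x1000,0x2000) and [0x3000,0x5000), a range of
[0x1000,0x2000) passes, while [0x1000,0x3000) and [0x2000,0x3000)
both yield -EAGAIN because no single vma covers them.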
Return -EAGAIN from both mlock_vma_pages_range() and mlock_fixup()
if the vma at 'start' disappears or changes so that the page range
[start,end) is no longer contained in the vma. Again, let the caller
deal with it. Looks like only sys_remap_file_pages() [via mmap_region()]
should actually care.
With this patch, I no longer see processes like ps(1) blocked for seconds
or minutes at a time waiting for a large [multiple gigabyte] region to be
locked down.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redha...