Thanks to you both for looking into this. I far prefer this general approach
than cleaning up the migration PTEs as the page tables get copied. While it
might "work", it's sloppy in the same way as having migration_entry_wait()
do the cleanup was sloppy. It's far preferable to make the VMA move and
page table copy atomic with anon_vma->lock.
On Wed, Apr 28, 2010 at 04:28:38PM +0900, KAMEZAWA Hiroyuki wrote:
The biggest problem is that the reverse mapping is temporarily out of
sync until do_exit gets rid of the mess, but how serious is that really?
If there is a migration entry in there, the mapcount should already be zero and
migration holds a reference to the page to prevent it going away. rmap_walk()
may then miss the migration_pte so it gets left behind. Ordinarily this
would be bad but in exec(), we cannot be faulting this page so we won't
trigger the bug in swapops. Instead, do_exit ultimately will skip over the
migration PTE doing nothing with the page but as the mapcount is still zero,
the page won't leak.
I see the point of your patch but I'm not yet seeing why it is
necessary to back out if move_page_tables fails.
That said, both patches have a greater problem. Both of them hold a spinlock
(anon_vma->lock) while calling into the page allocator with GFP_KERNEL (to
allocate the page tables). We don't want to change that to GFP_ATOMIC so
either we need to allocate the pages in advance or special case rmap_walk()
to not walk processes that are in exec.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--