Re: [PATCH 01 of 12] Core of mmu notifiers

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Hugh Dickins
Date: Tuesday, April 29, 2008 - 3:49 am

On Tue, 29 Apr 2008, Andrea Arcangeli wrote:

[I'm scarcely following the mmu notifiers to-and-fro, which seems
to be in good hands, amongst faster thinkers than me: who actually
need and can test this stuff.  Don't let me slow you down; but I
can quickly clarify on this history.]

No, the locking was different as you had it, Andrea: there was an extra
bitspin lock, carried over from the pte_chains days (maybe we changed
the name, maybe we disagreed over the name, I forget), which mainly
guarded the page->mapcount.  I thought that was one lock more than we
needed, and eliminated it in favour of atomic page->mapcount in 2.6.9.

Here's the relevant extracts from ChangeLog-2.6.9:

[PATCH] rmaplock: PageAnon in mapping

First of a batch of five patches to eliminate rmap's page_map_lock, replace
its trylocking by spinlocking, and use anon_vma to speed up swapoff.

Patches updated from the originals against 2.6.7-mm7: nothing new so I won't
spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting
the unuse_process mmap_sem fix already in 2.6.8-rc3.

This patch:

Replace the PG_anon page->flags bit by setting the lower bit of the pointer in
page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit.

We're about to eliminate the locking which kept the flags and mapping in
synch: it's much easier to work on a local copy of page->mapping, than worry
about whether flags and mapping are in synch (though I imagine it could be
done, at greater cost, with some barriers).

[PATCH] rmaplock: kill page_map_lock

The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to
lock its pte_chains.  We kept this (as page_map_lock: bit_spin_lock on
PG_maplock) when we moved to objrmap.  But the file objrmap locks its vma tree
with mapping->i_mmap_lock, and the anon objrmap locks its vma list with
anon_vma->lock: so isn't the page_map_lock superfluous?

Pretty much, yes.  The mapcount was protected by it, and needs to become an
atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely
up and down.  The last page_remove_rmap can't clear anon page mapping any
more, because of races with page_add_rmap; from which some BUG_ONs must go for
the same reason, but they've served their purpose.

vmscan decisions are naturally racy, little change there beyond removing
page_map_lock/unlock.  But to stabilize the file-backed page->mapping against
truncation while acquiring i_mmap_lock, page_referenced_file now needs page
lock to be held even for refill_inactive_zone.  There's a similar issue in
acquiring anon_vma->lock, where page lock doesn't help: which this patch
pretends to handle, but actually it needs the next.

Roughly 10% cut off lmbench fork numbers on my 2*HT*P4.  Must confess my
testing failed to show the races even while they were knowingly exposed: would
benefit from testing on racier equipment.

[PATCH] rmaplock: SLAB_DESTROY_BY_RCU

With page_map_lock gone, how to stabilize page->mapping's anon_vma while
acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon?

The page cannot actually be freed (vmscan holds reference), but however much
we check page_mapped (which guarantees that anon_vma is in use - or would
guarantee that if we added suitable barriers), there's no locking against page
becoming unmapped the instant after, then anon_vma freed.

It's okay to take anon_vma->lock after it's freed, so long as it remains a
struct anon_vma (its list would become empty, or perhaps reused for an
unrelated anon_vma: but no problem since we always check that the page located
is the right one); but corruption if that memory gets reused for some other
purpose.

This is not unique: it's liable to be problem whenever the kernel tries to
approach a structure obliquely.  It's generally solved with an atomic
reference count; but one advantage of anon_vma over anonmm is that it does not
have such a count, and it would be a backward step to add one.

Therefore...  implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a
kmem_cache_alloc'ed structure cannot get freed to other use while the
rcu_read_lock is held i.e.  preempt disabled; and use that for anon_vma.

Fix concerns raised by Manfred: this flag is incompatible with poisoning and
destructor, and kmem_cache_destroy needs to synchronize_kernel.

I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for
little anon_vma, I'd be reluctant to use it on any caches whose immediate
shrinkage under pressure is important to the system.

[PATCH] rmaplock: mm lock ordering

With page_map_lock out of the way, there's no need for page_referenced and
try_to_unmap to use trylocks - provided we switch anon_vma->lock and
mm->page_table_lock around in anon_vma_prepare.  Though I suppose it's
possible that we'll find that vmscan makes better progress with trylocks than
spinning - we're free to choose trylocks again if so.

Try to update the mm lock ordering documentation in filemap.c.  But I still
find it confusing, and I've no idea of where to stop.  So add an mm lock
ordering list I can understand to rmap.c.

[The fifth patch was about using anon_vma in swapoff, not relevant here.]

So, going back to what you wrote: holding the page lock there is
not enough to prevent the struct anon_vma going away beneath us.

Hugh
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Tue Apr 22, 6:51 am)
[PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 6:51 am)
[PATCH 07 of 12] Add a function to rw_semaphores to check ..., Andrea Arcangeli, (Tue Apr 22, 6:51 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Eric Dumazet, (Tue Apr 22, 7:56 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 8:15 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Avi Kivity, (Tue Apr 22, 8:24 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Eric Dumazet, (Tue Apr 22, 8:37 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 9:46 am)
Re: [PATCH 00 of 12] mmu notifier #v13, Robin Holt, (Tue Apr 22, 11:22 am)
Re: [PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Tue Apr 22, 11:43 am)
Re: [PATCH 00 of 12] mmu notifier #v13, Robin Holt, (Tue Apr 22, 12:42 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Tue Apr 22, 1:19 pm)
Re: [PATCH 02 of 12] Fix ia64 compilation failure because ..., Christoph Lameter, (Tue Apr 22, 1:22 pm)
Re: [PATCH 03 of 12] get_task_mm should not succeed if mmp ..., Christoph Lameter, (Tue Apr 22, 1:23 pm)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outsid ..., Christoph Lameter, (Tue Apr 22, 1:24 pm)
Re: [PATCH 05 of 12] Move the tlb flushing into free_pgtab ..., Christoph Lameter, (Tue Apr 22, 1:25 pm)
Re: [PATCH 10 of 12] Convert mm_lock to use semaphores aft ..., Christoph Lameter, (Tue Apr 22, 1:26 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Christoph Lameter, (Tue Apr 22, 1:28 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Christoph Lameter, (Tue Apr 22, 1:30 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 22, 1:31 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 22, 3:35 pm)
Re: [PATCH 02 of 12] Fix ia64 compilation failure because ..., Andrea Arcangeli, (Tue Apr 22, 3:43 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 22, 4:07 pm)
Re: [PATCH 03 of 12] get_task_mm should not succeed if mmp ..., Christoph Lameter, (Tue Apr 22, 4:13 pm)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outsid ..., Christoph Lameter, (Tue Apr 22, 4:14 pm)
Re: [PATCH 10 of 12] Convert mm_lock to use semaphores aft ..., Christoph Lameter, (Tue Apr 22, 4:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Tue Apr 22, 4:20 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Jack Steiner, (Tue Apr 22, 5:28 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Jack Steiner, (Tue Apr 22, 5:31 pm)
Re: [PATCH 00 of 12] mmu notifier #v13, Andrea Arcangeli, (Wed Apr 23, 6:33 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 6:36 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Wed Apr 23, 7:47 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 8:59 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 9:26 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 9:37 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Jack Steiner, (Wed Apr 23, 10:09 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 10:24 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 10:45 am)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outsid ..., Christoph Lameter, (Wed Apr 23, 11:02 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:09 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:15 am)
Re: [PATCH 04 of 12] Moves all mmu notifier methods outsid ..., Andrea Arcangeli, (Wed Apr 23, 11:16 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:19 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:19 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:21 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:25 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:27 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:34 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:37 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Wed Apr 23, 11:46 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 3:19 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed Apr 23, 11:49 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Thu Apr 24, 2:51 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Thu Apr 24, 8:39 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Thu Apr 24, 10:41 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Sat Apr 26, 6:17 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Sat Apr 26, 7:04 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Sun Apr 27, 5:27 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Mon Apr 28, 1:34 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Mon Apr 28, 5:10 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Christoph Lameter, (Mon Apr 28, 6:28 pm)
Re: [PATCH 01 of 12] Core of mmu notifiers, Hugh Dickins, (Tue Apr 29, 3:49 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 6:32 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 8:30 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Robin Holt, (Tue Apr 29, 8:50 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Tue Apr 29, 9:03 am)
Re: [PATCH 01 of 12] Core of mmu notifiers, Andrea Arcangeli, (Wed May 7, 8:00 am)