Re: [PATCH 4/4] anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Monday, April 12, 2010 - 6:08 pm

On Tue, 13 Apr 2010, Johannes Weiner wrote:

It now looks like this..

		Linus
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 12 Apr 2010 12:44:29 -0700
Subject: [PATCH 4/4] anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma

Otherwise we might be mapping in a page in a new mapping, but that page
(through the swapcache) would later be mapped into an old mapping too.
The page->mapping must be the case that works for everybody, not just
the mapping that happened to page it in first.

Here's the scenario:

 - page gets allocated/mapped by process A. Let's call the anon_vma we
   associate the page with 'A' to keep it easy to track.

 - Process A forks, creating process B. The anon_vma in B is 'B', and has
   a chain that looks like 'B' -> 'A'. Everything is fine.

 - Swapping happens. The page (with mapping pointing to 'A') gets swapped
   out (perhaps not to disk - it's enough to assume that it's just not
   mapped any more, and lives entirely in the swap-cache)

 - Process B pages it in, which goes like this:

        do_swap_page ->
          page = lookup_swap_cache(entry);
         ...
          set_pte_at(mm, address, page_table, pte);
          page_add_anon_rmap(page, vma, address);

   And think about what happens here!

   In particular, what happens is that this will now be the "first"
   mapping of that page, so page_add_anon_rmap() used to do

        if (first)
                __page_set_anon_rmap(page, vma, address);

   and notice what anon_vma it will use? It will use the anon_vma for
   process B!

   What happens then? Trivial: process 'A' also pages it in (nothing
   happens, it's not the first mapping), and then process 'B' execve's
   or exits or unmaps, making anon_vma B go away.

   End result: process A has a page that points to anon_vma B, but
   anon_vma B does not exist any more.  This can go on forever.  Forget
   about RCU grace periods, forget about locking, forget anything like
   that.  The bug is simply that page->mapping points to an anon_vma
   that was correct at one point, but was _not_ the one that was shared
   by all users of that possible mapping.

Changing it to always use the deepest anon_vma in the anonvma chain gets
us to the safest model.

This can be improved in certain cases: if we know the page is private to
just this particular mapping (for example, it's a new page, or it is the
only swapcache entry), we could pick the top (most specific) anon_vma.

But that's a future optimization. Make it _work_ reliably first.

Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/rmap.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ee97d38..4bad326 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -734,9 +734,20 @@ void page_move_anon_rmap(struct page *page,
 static void __page_set_anon_rmap(struct page *page,
 	struct vm_area_struct *vma, unsigned long address)
 {
-	struct anon_vma *anon_vma = vma->anon_vma;
+	struct anon_vma_chain *avc;
+	struct anon_vma *anon_vma;
+
+	BUG_ON(!vma->anon_vma);
+
+	/*
+	 * We must use the _oldest_ possible anon_vma for the page mapping!
+	 *
+	 * So take the last AVC chain entry in the vma, which is the deepest
+	 * ancestor, and use the anon_vma from that.
+	 */
+	avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma);
+	anon_vma = avc->anon_vma;
 
-	BUG_ON(!anon_vma);
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
 	page->mapping = (struct address_space *) anon_vma;
 	page->index = linear_page_index(vma, address);
-- 
1.7.1.rc1.dirty

--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Linux 2.6.34-rc3, Linus Torvalds, (Tue Mar 30, 10:50 am)
[Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on ..., Rafael J. Wysocki, (Tue Mar 30, 2:16 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Wed Mar 31, 6:13 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Thu Apr 1, 12:46 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Sat Apr 3, 12:33 pm)
[PATCH] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Sun Apr 4, 4:09 pm)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Minchan Kim, (Sun Apr 4, 4:56 pm)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Linus Torvalds, (Mon Apr 5, 8:37 am)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Minchan Kim, (Mon Apr 5, 8:48 am)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Mon Apr 5, 9:04 am)
[PATCH -v2] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Mon Apr 5, 9:13 am)
[No subject], Rik van Riel, (Tue Apr 6, 7:34 am)
[No subject], Rik van Riel, (Tue Apr 6, 7:38 am)
[No subject], Minchan Kim, (Tue Apr 6, 8:34 am)
[No subject], Rik van Riel, (Tue Apr 6, 8:40 am)
[No subject], Linus Torvalds, (Tue Apr 6, 8:55 am)
[No subject], Minchan Kim, (Tue Apr 6, 8:58 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:23 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:28 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:32 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:45 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:53 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:54 am)
[No subject], Rik van Riel, (Tue Apr 6, 10:04 am)
[No subject], Borislav Petkov, (Tue Apr 6, 10:05 am)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 12:10 pm)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 1:46 pm)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 2:05 pm)
[PATCH 1/3] mm: make page freeing path RCU-safe, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 3/3] mm: fixup vma_adjust, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity, Borislav Petkov, (Sun Apr 11, 6:25 am)
[PATCH 2/4] vma_adjust: fix the copying of anon_vma chains, Linus Torvalds, (Mon Apr 12, 1:23 pm)
Re: [PATCH 4/4] anonvma: when setting up page->mapping, we ..., Linus Torvalds, (Mon Apr 12, 6:08 pm)