Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Wednesday, April 7, 2010 - 4:37 pm

On Wed, 7 Apr 2010, Linus Torvalds wrote:

Not SIGBUS, but VM_FAULT_OOM, of course.

IOW, something like this should be no worse than what we have now, and has 
the much nicer locking semantics.

Having done some more digging, I can point to a downside: we do end up 
having about twice as many anon_vma entries. It seems about half of the 
vma's never need an anon_vma entry, probably because they end up being 
read-only file mappings, and thus never trigger the anonvma case.

That said:

 - I don't really think you can fix the locking problem you have in a 
   saner way

 - the anon_vma entry is much smaller than the vm_area_struct, so we're 
   still using much less memory for them than for vma's.

 - We _could_ avoid allocating anonvma entries for shared mappings or for 
   mappings that are read-only. That might force us to allocate some of 
   them at mprotect time, and/or when doing a forced COW event with 
   ptrace, but we have the mmap_sem for writing for the one case, and we 
   could decide to get it for the other.

   So it's not a _fundamental_ problem if we decide we want to recover 
   most of the memory lost by doing unconditional allocations.

There are alternative models. For example, the VM layer _could_ decide to 
just release the mmap_sem, and re-do it and take it for writing if the vma 
doesn't have an anon_vma. 

I dunno. I like how this patch makes things so much less subtle, though.  
For example: with this in place, we could further simplify 
anon_vma_prepare(), since it would now never have the re-entrancy issue 
and wouldn't need to worry about taking that page_table_lock and 
re-testing vma->anon_vma for races.

		Linus

---
 mm/memory.c |   12 +++---------
 mm/mmap.c   |   17 ++++-------------
 2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 833952d..b5efe76 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2223,9 +2223,6 @@ reuse:
 gotten:
 	pte_unmap_unlock(page_table, ptl);
 
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
-
 	if (is_zero_pfn(pte_pfn(orig_pte))) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
@@ -2766,8 +2763,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	/* Allocate our own private page. */
 	pte_unmap(page_table);
 
-	if (unlikely(anon_vma_prepare(vma)))
-		goto oom;
 	page = alloc_zeroed_user_highpage_movable(vma, address);
 	if (!page)
 		goto oom;
@@ -2863,10 +2858,6 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!(vma->vm_flags & VM_SHARED)) {
 			anon = 1;
-			if (unlikely(anon_vma_prepare(vma))) {
-				ret = VM_FAULT_OOM;
-				goto out;
-			}
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
 						vma, address);
 			if (!page) {
@@ -3115,6 +3106,9 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	pmd_t *pmd;
 	pte_t *pte;
 
+	if (!vma->anon_vma)
+		return VM_FAULT_OOM;
+
 	__set_current_state(TASK_RUNNING);
 
 	count_vm_event(PGFAULT);
diff --git a/mm/mmap.c b/mm/mmap.c
index 75557c6..c14284b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -463,6 +463,8 @@ static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	mm->map_count++;
 	validate_mm(mm);
+
+	anon_vma_prepare(vma);
 }
 
 /*
@@ -479,6 +481,8 @@ static void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	BUG_ON(__vma && __vma->vm_start < vma->vm_end);
 	__vma_link(mm, vma, prev, rb_link, rb_parent);
 	mm->map_count++;
+
+	anon_vma_prepare(vma);
 }
 
 static inline void
@@ -1674,12 +1678,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 	if (!(vma->vm_flags & VM_GROWSUP))
 		return -EFAULT;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
 	anon_vma_lock(vma);
 
 	/*
@@ -1720,13 +1718,6 @@ static int expand_downwards(struct vm_area_struct *vma,
 {
 	int error;
 
-	/*
-	 * We must make sure the anon_vma is allocated
-	 * so that the anon_vma locking is not a noop.
-	 */
-	if (unlikely(anon_vma_prepare(vma)))
-		return -ENOMEM;
-
 	address &= PAGE_MASK;
 	error = security_file_mmap(NULL, 0, 0, 0, address, 1);
 	if (error)
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Linux 2.6.34-rc3, Linus Torvalds, (Tue Mar 30, 10:50 am)
[Regression, post-rc2] Commit a5ee4eb7541 breaks OpenGL on ..., Rafael J. Wysocki, (Tue Mar 30, 2:16 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Wed Mar 31, 6:13 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Thu Apr 1, 12:46 pm)
Re: [Regression, post-rc2] Commit a5ee4eb7541 breaks OpenG ..., Rafael J. Wysocki, (Sat Apr 3, 12:33 pm)
[PATCH] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Sun Apr 4, 4:09 pm)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Minchan Kim, (Sun Apr 4, 4:56 pm)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Linus Torvalds, (Mon Apr 5, 8:37 am)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Minchan Kim, (Mon Apr 5, 8:48 am)
Re: [PATCH] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Mon Apr 5, 9:04 am)
[PATCH -v2] rmap: fix anon_vma_fork() memory leak, Rik van Riel, (Mon Apr 5, 9:13 am)
[No subject], Rik van Riel, (Tue Apr 6, 7:34 am)
[No subject], Rik van Riel, (Tue Apr 6, 7:38 am)
[No subject], Minchan Kim, (Tue Apr 6, 8:34 am)
[No subject], Rik van Riel, (Tue Apr 6, 8:40 am)
[No subject], Linus Torvalds, (Tue Apr 6, 8:55 am)
[No subject], Minchan Kim, (Tue Apr 6, 8:58 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:23 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:28 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:32 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:45 am)
[No subject], Linus Torvalds, (Tue Apr 6, 9:53 am)
[No subject], Minchan Kim, (Tue Apr 6, 9:54 am)
[No subject], Rik van Riel, (Tue Apr 6, 10:04 am)
[No subject], Borislav Petkov, (Tue Apr 6, 10:05 am)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 12:10 pm)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 1:46 pm)
Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linu ..., Steinar H. Gunderson, (Tue Apr 6, 2:05 pm)
Re: [PATCH -v2] rmap: make anon_vma_prepare link in all th ..., Linus Torvalds, (Wed Apr 7, 4:37 pm)
[PATCH 1/3] mm: make page freeing path RCU-safe, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 3/3] mm: fixup vma_adjust, Borislav Petkov, (Sun Apr 11, 6:19 am)
[PATCH 2/3] mm: cleanup find_mergeable_anon_vma complexity, Borislav Petkov, (Sun Apr 11, 6:25 am)
[PATCH 2/4] vma_adjust: fix the copying of anon_vma chains, Linus Torvalds, (Mon Apr 12, 1:23 pm)