Re: remove zero_page (was Re: -mm merge plans for 2.6.24)

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Hugh Dickins <hugh@...>
Cc: Nick Piggin <nickpiggin@...>, Andrew Morton <akpm@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, October 10, 2007 - 1:20 am

On Wed, 10 Oct 2007, Hugh Dickins wrote:


The problem is, those first "remove ref-counting" patches were ugly 
*regardless* of ZERO_PAGE.

We (yes, largely I) fixed up the mess since. The whole vm_normal_page() 
and the magic PFN_REMAP thing got rid of a lot of the problems.

And I bet that we could do something very similar wrt the zero page too.

Basically, the ZERO page could act pretty much exactly like a PFN_REMAP 
page: the VM would not touch it. No rmap, no page refcounting, no nothing.

This following patch is not meant to be even half-way correct (it's not 
even _remotely_ tested), but is just meant to be a rough "grep for 
ZERO_PAGE in the VM, and see what happens if you don't ref-count it".

Would something like the below work? I dunno. But I suspect it would. I 
doubt anybody has the energy to actually try to actually follow through on 
it, which is why I'm not pushing on it any more, and why I'll accept 
Nick's patch to just remove ZERO_PAGE, but I really *am* very unhappy 
about this.

The "page refcounting cleanups" in the VM back when were really painful. 
And dammit, I felt like I was the one who had to clean them up after you 
guys. Which makes me really testy on this subject.

And yes, I also admit that the vm_normal_page() and the PFN_REMAP thing 
ended up really improving the VM, and we're pretty much certainly better 
off now than we were before - but I also think that ZERO_PAGE etc could 
easily be handled with the same model. After all, if we can make 
"mmap(/dev/mem)" work with COW and everything, I'd argue that ZERO_PAGE 
really is just a very very small special case of that!

Totally half-assed untested patch to follow, not meant for anything but a 
"I think this kind of approach should have worked too" comment.

So I'm not pushing the patch below, I'm just fighting for people realizing 
that

 - the kernel has *always* (since pretty much day 1) done that ZERO_PAGE 
   thing. This means that I would not be at all surprised if some 
   application basically depends on it. I've written test-programs that 
   depends on it - maybe people have written other code that basically has 
   been written for and tested with a kernel that has basically always 
   made read-only zero pages extra cheap.

   So while it may be true that removing ZERO_PAGE won't affect anybody, I 
   don't think it's a given, and I also don't think it's sane calling 
   people "crazy" for depending on something that has always been true 
   under Linux for the last 15+ years. There are few behaviors that have 
   been around for that long.

 - make sure the commit message is accurate as to need for this (ie not 
   claim that the ZERO_PAGE itself was the problem, and give some actual 
   performance numbers on what is going on)

that's all.

		Linus

---
 mm/memory.c  |   17 ++++++++---------
 mm/migrate.c |    2 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f82b359..0a8cc88 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -386,6 +386,7 @@ static inline int is_cow_mapping(unsigned int flags)
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
+	struct page *page;
 
 	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
 		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
@@ -413,7 +414,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_
 	 * The PAGE_ZERO() pages and various VDSO mappings can
 	 * cause them to exist.
 	 */
-	return pfn_to_page(pfn);
+	page = pfn_to_page(pfn);
+	if (PageReserved(page))
+		page = NULL;
+
+	return page;
 }
 
 /*
@@ -968,7 +973,7 @@ no_page_table:
 	if (flags & FOLL_ANON) {
 		page = ZERO_PAGE(address);
 		if (flags & FOLL_GET)
-			get_page(page);
+			page = alloc_page(GFP_KERNEL | GFP_ZERO);
 		BUG_ON(flags & FOLL_WRITE);
 	}
 	return page;
@@ -1131,9 +1136,6 @@ static int zeromap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			pte++;
 			break;
 		}
-		page_cache_get(page);
-		page_add_file_rmap(page);
-		inc_mm_counter(mm, file_rss);
 		set_pte_at(mm, addr, pte, zero_pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -1717,7 +1719,7 @@ gotten:
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	if (old_page == ZERO_PAGE(address)) {
+	if (!old_page) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
 			goto oom;
@@ -2274,15 +2276,12 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		/* Map the ZERO_PAGE - vm_page_prot is readonly */
 		page = ZERO_PAGE(address);
-		page_cache_get(page);
 		entry = mk_pte(page, vma->vm_page_prot);
 
 		ptl = pte_lockptr(mm, pmd);
 		spin_lock(ptl);
 		if (!pte_none(*page_table))
 			goto release;
-		inc_mm_counter(mm, file_rss);
-		page_add_file_rmap(page);
 	}
 
 	set_pte_at(mm, address, page_table, entry);
diff --git a/mm/migrate.c b/mm/migrate.c
index e2fdbce..8d2e110 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -827,7 +827,7 @@ static int do_move_pages(struct mm_struct *mm, struct page_to_node *pm,
 			goto set_status;
 
 		if (PageReserved(page))		/* Check for zero page */
-			goto put_and_set;
+			goto set_status;
 
 		pp->page = page;
 		err = page_to_nid(page);
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
-mm merge plans for 2.6.24, Andrew Morton, (Mon Oct 1, 5:22 pm)
new aops merge [was Re: -mm merge plans for 2.6.24], Hugh Dickins, (Tue Oct 2, 12:21 pm)
x86 patches was Re: -mm merge plans for 2.6.24, Andi Kleen, (Tue Oct 2, 2:18 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andrew Morton, (Tue Oct 2, 2:32 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Ingo Molnar, (Tue Oct 2, 3:37 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andi Kleen, (Tue Oct 2, 3:46 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Thomas Gleixner, (Tue Oct 2, 3:58 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andi Kleen, (Tue Oct 2, 3:01 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andy Whitcroft, (Tue Oct 2, 5:26 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andrew Morton, (Tue Oct 2, 3:18 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, KAMEZAWA Hiroyuki, (Tue Oct 2, 3:36 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Christoph Lameter, (Tue Oct 2, 2:16 pm)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Nish Aravamudan, (Tue Oct 2, 12:40 pm)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andrew Morton, (Tue Oct 2, 3:43 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, KAMEZAWA Hiroyuki, (Tue Oct 2, 4:16 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Christoph Lameter, (Tue Oct 2, 2:18 pm)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Yasunori Goto, (Tue Oct 2, 6:48 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Lee Schermerhorn, (Tue Oct 2, 1:25 pm)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Lee Schermerhorn, (Tue Oct 2, 1:17 pm)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Matt Mackall, (Tue Oct 2, 3:55 am)
Re: x86 patches was Re: -mm merge plans for 2.6.24, Andi Kleen, (Tue Oct 2, 3:59 am)
Re: -mm merge plans for 2.6.24, Pekka Enberg, (Tue Oct 2, 12:12 pm)
v4l-stk11xx* [Was: -mm merge plans for 2.6.24], Jiri Slaby, (Tue Oct 2, 3:59 am)
writeback fixes, Fengguang Wu, (Tue Oct 2, 4:39 am)
Re: -mm merge plans for 2.6.24, Borislav Petkov, (Sat Oct 13, 4:44 am)
Re: -mm merge plans for 2.6.24, Andrew Morton, (Sat Oct 13, 4:52 am)
Re: -mm merge plans for 2.6.24, Borislav Petkov, (Sat Oct 13, 7:45 am)
r/o bind mounts, was Re: -mm merge plans for 2.6.24, Christoph Hellwig, (Tue Oct 9, 5:19 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Wed Oct 3, 11:21 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Tue Oct 9, 10:52 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Tue Oct 9, 10:22 pm)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Hugh Dickins, (Wed Oct 10, 12:06 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Wed Oct 10, 1:20 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Wed Oct 10, 11:04 am)
Re: remove zero_page (was Re: -mm merge plans for 2.6.24), Linus Torvalds, (Tue Oct 9, 11:06 pm)
wibbling over the cpuset shed domain connnection, Paul Jackson, (Mon Oct 1, 5:34 pm)
Re: wibbling over the cpuset shed domain connnection, Nick Piggin, (Tue Oct 2, 8:36 am)
Re: wibbling over the cpuset shed domain connnection, Paul Jackson, (Wed Oct 3, 1:21 am)
Re: wibbling over the cpuset shed domain connnection, Nick Piggin, (Tue Oct 2, 9:12 am)
Re: wibbling over the cpuset shed domain connnection, Paul Jackson, (Wed Oct 3, 3:00 am)
Re: wibbling over the cpuset shed domain connnection, Andrew Morton, (Wed Oct 3, 6:57 am)
per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Tue Oct 2, 4:17 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Martin Knoblauch, (Wed Oct 3, 7:00 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Fri Oct 26, 10:48 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Trond Myklebust, (Fri Oct 26, 12:37 pm)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Fri Dec 14, 10:50 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Miklos Szeredi, (Fri Dec 14, 11:14 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Fri Dec 14, 11:54 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Miklos Szeredi, (Fri Oct 26, 11:06 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Fri Oct 26, 11:22 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Fri Oct 26, 11:33 am)
[PATCH] mm: sysfs: expose the BDI object in sysfs, Peter Zijlstra, (Fri Nov 2, 10:59 am)
Re: [PATCH] mm: sysfs: expose the BDI object in sysfs, Kay Sievers, (Fri Nov 2, 11:13 am)
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24), Peter Zijlstra, (Sat Oct 27, 12:07 pm)