[PATCH] hugepage: support ZERO_PAGE()

From: KOSAKI Motohiro
Date: Monday, September 1, 2008 - 6:21 pm

CCed Mel Gorman


Adam, thank you for the valuable explanation.

Honestly, I can't imagine how the lack of zero page support could cause terrible things.
Can you explain when the terrible things happen?
I don't know whether this problem is a big issue or not.


Anyway, I made a zero page patch for hugepages.
Could you please take a look at it?



=======================================================================================
Subject: hugepage: support ZERO_PAGE()

Currently, hugepage doesn't use the zero page at all, because the zero page is used almost
exclusively for coredumping and hugepage coredumping wasn't supported before.

But now that we have implemented hugepage coredumping, we should implement the zero page for hugepage too.
This patch does that.


Implementation note:
-------------------------------------------------------------
o Why do we only check VM_SHARED for the zero page?
  Normal pages are checked as follows:

	static inline int use_zero_page(struct vm_area_struct *vma)
	{
	        if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
	                return 0;
	
	        return !vma->vm_ops || !vma->vm_ops->fault;
	}

First, hugepages are never mlock()ed, so we don't need to be concerned with VM_LOCKED.

Second, hugetlbfs is a pseudo filesystem, not a real filesystem, and it doesn't have any file backing.
Therefore, checking ops->fault is meaningless.


o Why don't we use the zero page if !pte?

!pte indicates that {pud, pmd} doesn't exist or that an error happened.
So we shouldn't return the zero page in that case.
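
As an illustration only (not part of the patch), the rules in this note can be
modelled in user space. This is a minimal sketch: struct model_pte and
model_zeropage_ok() are hypothetical names, and the "present" flag merely
stands in for the result of huge_pte_none(); no kernel API is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a huge PTE slot; not a kernel type. */
struct model_pte {
	bool present;	/* false models huge_pte_none() returning true */
};

/*
 * Follows the same order of checks as huge_zeropage_ok() in the patch:
 * a missing pte, a write access, or a shared mapping each rule out the
 * zero page; otherwise it is used only when the slot is still empty.
 */
static bool model_zeropage_ok(const struct model_pte *ptep, bool write, bool shared)
{
	if (!ptep)
		return false;	/* {pud, pmd} missing or an error happened */
	if (write)
		return false;	/* a writer needs a real page */
	if (shared)
		return false;	/* VM_SHARED mapping */
	return !ptep->present;
}
```

Only a read access to an unpopulated entry of a private mapping yields true
in this model; every other combination falls back to the normal path.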



test method
-------------------------------------------------------
console 1:

	# su
	# echo 100 >/proc/sys/vm/nr_hugepages
	# mount -t hugetlbfs none /hugetlbfs/
	# watch -n1 cat /proc/meminfo

console 2:
	% gcc -g -Wall crash_hugepage.c -o crash_hugepage -lhugetlbfs
	% ulimit -c unlimited
	% echo 0x23 >/proc/self/coredump_filter
	% HUGETLB_MORECORE=yes ./crash_hugepage 50
		-> segmentation fault
	% gdb

crash_hugepage.c
----------------------
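#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define HUGEPAGE_SIZE (2*1024*1024)

int main(int argc, char** argv){
	char* p;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <nr_hugepages>\n", argv[0]);
		return 1;
	}

	/* With HUGETLB_MORECORE=yes the heap is backed by hugepages. */
	p = malloc((size_t)atoi(argv[1]) * HUGEPAGE_SIZE);
	if (!p) {
		perror("malloc");
		return 1;
	}
	sleep(2);

	/* Write one byte, one hugepage into the buffer; the rest stays untouched. */
	*(p + HUGEPAGE_SIZE) = 1;
	sleep(2);

	/* Deliberately crash to trigger the core dump. */
	*(volatile int*)0 = 1;

	return 0;
}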
--------------------------------
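
For reference, the 0x23 written to /proc/self/coredump_filter in the test
method above can be decoded with a small helper. This is a sketch, not part
of the patch: the bit names follow the core(5) manual page and are assumed to
match the companion coredump_filter patch from this thread, and
filter_bit_name() is a hypothetical helper name.

```c
#include <stddef.h>

/* Bit meanings as listed in core(5); index == bit number. */
static const char *const filter_bit_names[] = {
	"anonymous private",
	"anonymous shared",
	"file-backed private",
	"file-backed shared",
	"ELF headers",
	"hugetlb private",
	"hugetlb shared",
};

#define NR_FILTER_BITS \
	(sizeof(filter_bit_names) / sizeof(filter_bit_names[0]))

/* Returns the name of a bit that is set in mask, or NULL if clear or unknown. */
static const char *filter_bit_name(unsigned long mask, unsigned int bit)
{
	if (bit >= NR_FILTER_BITS || !(mask & (1UL << bit)))
		return NULL;
	return filter_bit_names[bit];
}
```

Under that assumption, 0x23 sets bits 0, 1 and 5, so the dump includes
anonymous private, anonymous shared and hugetlb private memory.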


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Adam Litke <agl@us.ibm.com>
CC: Hugh Dickins <hugh@veritas.com>
CC: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
CC: William Irwin <wli@holomorphy.com>
CC: Mel Gorman <mel@skynet.ie>

---
 include/linux/hugetlb.h |    6 ++++--
 mm/hugetlb.c            |   29 +++++++++++++++++++++++++----
 mm/memory.c             |    3 ++-
 3 files changed, 31 insertions(+), 7 deletions(-)

Index: b/mm/hugetlb.c
===================================================================
--- a/mm/hugetlb.c	2008-08-31 01:57:36.000000000 +0900
+++ b/mm/hugetlb.c	2008-09-02 08:39:31.000000000 +0900
@@ -2022,15 +2022,30 @@ follow_huge_pud(struct mm_struct *mm, un
 	return NULL;
 }
 
+static int huge_zeropage_ok(pte_t *ptep, int write, int shared)
+{
+	if (!ptep)
+		return 0;
+
+	if (write)
+		return 0;
+
+	if (shared)
+		return 0;
+
+	return huge_pte_none(huge_ptep_get(ptep));
+}
+
 int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			struct page **pages, struct vm_area_struct **vmas,
 			unsigned long *position, int *length, int i,
-			int write)
+			int write, int shared)
 {
 	unsigned long pfn_offset;
 	unsigned long vaddr = *position;
 	int remainder = *length;
 	struct hstate *h = hstate_vma(vma);
+	int zeropage_ok = 0;
 
 	spin_lock(&mm->page_table_lock);
 	while (vaddr < vma->vm_end && remainder) {
@@ -2043,8 +2058,11 @@ int follow_hugetlb_page(struct mm_struct
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+		if (huge_zeropage_ok(pte, write, shared))
+			zeropage_ok = 1;
 
-		if (!pte || huge_pte_none(huge_ptep_get(pte)) ||
+		if (!pte ||
+		    (huge_pte_none(huge_ptep_get(pte)) && !zeropage_ok) ||
 		    (write && !pte_write(huge_ptep_get(pte)))) {
 			int ret;
 
@@ -2061,11 +2079,14 @@ int follow_hugetlb_page(struct mm_struct
 		}
 
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		if (zeropage_ok)
+			page = ZERO_PAGE(0);
+		else
+			page = pte_page(huge_ptep_get(pte));
 same_page:
 		if (pages) {
 			get_page(page);
-			pages[i] = page + pfn_offset;
+			pages[i] = page + (zeropage_ok ? 0 : pfn_offset);
 		}
 
 		if (vmas)
Index: b/include/linux/hugetlb.h
===================================================================
--- a/include/linux/hugetlb.h	2008-09-02 08:05:46.000000000 +0900
+++ b/include/linux/hugetlb.h	2008-09-02 08:40:46.000000000 +0900
@@ -21,7 +21,9 @@ int hugetlb_sysctl_handler(struct ctl_ta
 int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
-int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int, int);
+int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
+			struct page **, struct vm_area_struct **,
+			unsigned long *, int *, int, int, int);
 void unmap_hugepage_range(struct vm_area_struct *,
 			unsigned long, unsigned long, struct page *);
 void __unmap_hugepage_range(struct vm_area_struct *,
@@ -74,7 +76,7 @@ static inline unsigned long hugetlb_tota
 	return 0;
 }
 
-#define follow_hugetlb_page(m,v,p,vs,a,b,i,w)	({ BUG(); 0; })
+#define follow_hugetlb_page(m, v, p, vs, a, b, i, w, s)	({ BUG(); 0; })
 #define follow_huge_addr(mm, addr, write)	ERR_PTR(-EINVAL)
 #define copy_hugetlb_page_range(src, dst, vma)	({ BUG(); 0; })
 #define hugetlb_prefault(mapping, vma)		({ BUG(); 0; })
Index: b/mm/memory.c
===================================================================
--- a/mm/memory.c	2008-08-30 11:31:53.000000000 +0900
+++ b/mm/memory.c	2008-09-02 08:41:12.000000000 +0900
@@ -1208,7 +1208,8 @@ int __get_user_pages(struct task_struct 
 
 		if (is_vm_hugetlb_page(vma)) {
 			i = follow_hugetlb_page(mm, vma, pages, vmas,
-						&start, &len, i, write);
+						&start, &len, i, write,
+						vma->vm_flags & VM_SHARED);
 			continue;
 		}
 


