From: Andrea Arcangeli <aarcange@redhat.com>
It's mostly a matter of replacing alloc_pages with alloc_pages_vma after
introducing alloc_pages_vma. khugepaged needs special handling as the
allocation has to happen inside collapse_huge_page where the vma is known and
an error has to be returned to the outer loop to sleep alloc_sleep_millisecs in
case of failure. But it retains the more efficient logic of handling allocation
failures in khugepaged in case of CONFIG_NUMA=n.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -330,14 +330,17 @@ alloc_pages(gfp_t gfp_mask, unsigned int
{
return alloc_pages_current(gfp_mask, order);
}
-extern struct page *alloc_page_vma(gfp_t gfp_mask,
+extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
+#define alloc_pages_vma(gfp_mask, order, vma, addr) \
+ alloc_pages(gfp_mask, order)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
+#define alloc_page_vma(gfp_mask, vma, addr) \
+ alloc_pages_vma(gfp_mask, 0, vma, addr)
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -620,11 +620,26 @@ static int __do_huge_pmd_anonymous_page(
return ret;
}
+static inline gfp_t alloc_hugepage_gfpmask(int defrag)
+{
+ return GFP_TRANSHUGE & ~(defrag ? 0 : __GFP_WAIT);
+}
+
+static inline struct page *alloc_hugepage_vma(int defrag,
+ struct vm_area_struct *vma,
+ unsigned long haddr)
+{
+ return alloc_pages_vma(alloc_hugepage_gfpmask(defrag),
+ HPAGE_PMD_ORDER, vma, ...I think this should be:
if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
#ifdef CONFIG_NUMA
put_page(new_page);
#endif
goto out;
}
Thanks,
--
Hmm no, the change you suggest would generate memory corruption with use after free. --
On Mon, 29 Nov 2010 17:11:03 +0100
I'm sorry if I miss something, "new_page" will be reused in !CONFIG_NUMA case
as you say, but, in CONFIG_NUMA case, it is allocated in this function
(collapse_huge_page()) by alloc_hugepage_vma(), and is not freed when memcg's
charge failed.
Actually, we do in collapse_huge_page():
if (unlikely(!isolated)) {
...
#ifdef CONFIG_NUMA
put_page(new_page);
#endif
goto out;
}
later. I think we need a similar logic in memcg's failure path too.
Thanks,
Daisuke Nishimura.
--
Apologies, you really found a minor memleak in case of memcg accounting failure. diff --git a/mm/huge_memory.c b/mm/huge_memory.c --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1726,7 +1726,7 @@ static void collapse_huge_page(struct mm } #endif if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) - goto out; + goto out_put_page; anon_vma_lock(vma->anon_vma); @@ -1755,10 +1755,7 @@ static void collapse_huge_page(struct mm spin_unlock(&mm->page_table_lock); anon_vma_unlock(vma->anon_vma); mem_cgroup_uncharge_page(new_page); -#ifdef CONFIG_NUMA - put_page(new_page); -#endif - goto out; + goto out_put_page; } /* @@ -1799,6 +1796,13 @@ static void collapse_huge_page(struct mm khugepaged_pages_collapsed++; out: up_write(&mm->mmap_sem); + return; + +out_put_page: +#ifdef CONFIG_NUMA + put_page(new_page); +#endif + goto out; } static int khugepaged_scan_pmd(struct mm_struct *mm, I was too optimistic that there wasn't really a bug, I thought it was some confusion about the hpage usage that differs with numa and not numa. On a side note, the CONFIG_NUMA case will later change further to move the allocation out of the mmap_sem write mode to make the fs submitting I/O from userland and doing memory allocations in the I/O paths happier. --
