Hi Andrew,
Here is a restacked version of the grouping pages by mobility patches based on the patches currently in your tree. It should be a drop-in replacement for what is in 2.6.23-rc4-mm1 and is what I propose for merging to mainline.
The change from what you have already is that the redundant patches are removed. For example, the patches that made grouping pages by mobility configurable and later removed that ability do not exist in this set. Similarly, the patches for grouping high-order atomic allocations together do not exist.
Also note that the first patch in this set, related to IA-64, appears unrelated, but it's required by later patches and having the change at the start makes the patchset more comprehensible in terms of dependencies.
This rebasing work is largely the work of Andy Whitcroft. Thanks Andy.
The patches replaced in -mm are as ...
Subject: ia64: parse kernel parameter hugepagesz= in early boot
Parse hugepagesz with early_param() instead of __setup(). __setup() is called after the memory allocator has been initialised and the pageblock bitmaps already set up. In tests on one IA64 machine there did not seem to be any problem with using early_param(), and in fact it may be more correct as it guarantees the parameter is handled before the parsing of hugepages=.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/ia64/Kconfig | 5 +++++
arch/ia64/mm/hugetlbpage.c | 4 ++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-clean/arch/ia64/Kconfig linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/arch/ia64/Kconfig
--- linux-2.6.23-rc5-clean/arch/ia64/Kconfig 2007-09-01 07:08:24.000000000 +0100
+++ linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/arch/ia64/Kconfig 2007-09-02 16:18:48.000000000 +0100
@@ -54,6 +54,11 @@ config ARCH_HAS_ILOG2_U64
bool
default n
+config HUGETLB_PAGE_SIZE_VARIABLE
+ bool
+ depends on HUGETLB_PAGE
+ default y
+
config GENERIC_FIND_NEXT_BIT
bool
default y
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-clean/arch/ia64/mm/hugetlbpage.c linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/arch/ia64/mm/hugetlbpage.c
--- linux-2.6.23-rc5-clean/arch/ia64/mm/hugetlbpage.c 2007-09-01 07:08:24.000000000 +0100
+++ linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/arch/ia64/mm/hugetlbpage.c 2007-09-02 16:18:48.000000000 +0100
@@ -194,6 +194,6 @@ static int __init hugetlb_setup_sz(char
* override here with new page shift.
*/
ia64_set_rr(HPAGE_REGION_BASE, hpage_shift << 2);
- return 1;
+ return 0;
}
-__setup("hugepagesz=", ...
Subject: Add a bitmap that is used to track flags affecting a block of pages
The grouping pages by mobility patchset needs to track if pages within a block
can be moved or reclaimed so that pages are freed to the appropriate list.
This patch adds a bitmap for flags affecting a whole pageblock_nr_pages
block of pages.
In non-SPARSEMEM configurations, the bitmap is stored in the struct zone
and allocated during initialisation. SPARSEMEM dynamically allocates the
bitmap in a struct mem_section as required.
Additional credit to Andy Whitcroft who reviewed an earlier implementation
of the mechanism and suggested how to make it a *lot* cleaner.
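As an illustration of the idea described above, here is a minimal user-space sketch of a per-block flag bitmap. It is not the code from this patch: the names pb_get_flags() and pb_set_flags(), the block order of 10, the three bits per block and the fixed-size array are all assumptions made for the example.

```c
#include <limits.h>

/* All constants below are assumptions chosen for the sketch. */
#define SK_PAGEBLOCK_ORDER 10   /* 1024 pages per block */
#define SK_BITS_PER_BLOCK  3    /* a few flag bits per block */
#define SK_NR_BLOCKS       64
#define SK_BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define SK_NR_WORDS \
	((SK_NR_BLOCKS * SK_BITS_PER_BLOCK + SK_BITS_PER_WORD - 1) / SK_BITS_PER_WORD)

/* One small group of bits per block, packed into an array of words. */
static unsigned long sk_blockflags[SK_NR_WORDS];

/* Index of the first bit belonging to the block that contains pfn. */
static unsigned long sk_first_bit(unsigned long pfn)
{
	return (pfn >> SK_PAGEBLOCK_ORDER) * SK_BITS_PER_BLOCK;
}

/* Read the flag bits of the block containing pfn (hypothetical helper). */
static unsigned int pb_get_flags(unsigned long pfn)
{
	unsigned long bit = sk_first_bit(pfn);
	unsigned int value = 0;
	int i;

	for (i = 0; i < SK_BITS_PER_BLOCK; i++, bit++) {
		unsigned long mask = 1UL << (bit % SK_BITS_PER_WORD);

		if (sk_blockflags[bit / SK_BITS_PER_WORD] & mask)
			value |= 1U << i;
	}
	return value;
}

/* Write the flag bits of the block containing pfn (hypothetical helper). */
static void pb_set_flags(unsigned long pfn, unsigned int value)
{
	unsigned long bit = sk_first_bit(pfn);
	int i;

	for (i = 0; i < SK_BITS_PER_BLOCK; i++, bit++) {
		unsigned long mask = 1UL << (bit % SK_BITS_PER_WORD);

		if (value & (1U << i))
			sk_blockflags[bit / SK_BITS_PER_WORD] |= mask;
		else
			sk_blockflags[bit / SK_BITS_PER_WORD] &= ~mask;
	}
}
```

Every pfn inside the same block maps to the same bits, so a flag set through one page of a block is visible through any other page of that block. Handling the bits one at a time keeps the sketch correct when a block's bits straddle a word boundary.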
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 13 +++
include/linux/pageblock-flags.h | 74 ++++++++++++++++++
mm/page_alloc.c | 137 +++++++++++++++++++++++++++++++++++
3 files changed, 224 insertions(+)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/include/linux/mmzone.h linux-2.6.23-rc5-002-add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages/include/linux/mmzone.h
--- linux-2.6.23-rc5-001-ia64-parse-kernel-parameter-hugepagesz=-in-early-boot/include/linux/mmzone.h 2007-09-02 16:18:27.000000000 +0100
+++ linux-2.6.23-rc5-002-add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages/include/linux/mmzone.h 2007-09-02 16:19:05.000000000 +0100
@@ -13,6 +13,7 @@
#include <linux/init.h>
#include <linux/seqlock.h>
#include <linux/nodemask.h>
+#include <linux/pageblock-flags.h>
#include <asm/atomic.h>
#include <asm/page.h>
@@ -222,6 +223,14 @@ struct zone {
#endif
struct free_area free_area[MAX_ORDER];
+#ifndef CONFIG_SPARSEMEM
+ /*
+ * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
+ * In SPARSEMEM, this map is stored in struct mem_section
+ ...
Subject: Fix corruption of memmap on ia64-sparsemem when mem_section is not a power of 2
There are problems in the use of SPARSEMEM and pageblock flags that cause problems on ia64.
The first part of the problem is that units are incorrect in the SECTION_BLOCKFLAGS_BITS computation. This results in a mem_section's section_mem_map being treated as part of a bitmap, which isn't good. This was evident with an invalid virtual address when mem_init attempted to free bootmem pages while relinquishing control from the bootmem allocator.
The second part of the problem occurs because the pageblock flags bitmap is located with the mem_section. The SECTIONS_PER_ROOT computation using sizeof (mem_section) may not be a power of 2 depending on the size of the bitmap. This renders masks and other such things not power of 2 base. This issue was seen with SPARSEMEM_EXTREME on ia64.
This patch moves the bitmap outside of mem_section and uses a pointer instead in the mem_section. The bitmaps are allocated when the section is being initialised.
Note that sparse_early_usemap_alloc() does not use alloc_remap() like sparse_early_mem_map_alloc(). The allocation required for the bitmap on x86, the only architecture that uses alloc_remap, is typically smaller than a cache line. alloc_remap() pads out allocations to the cache size which would be a needless waste.
Credit to Bob Picco for identifying the original problem and effecting a fix for the SECTION_BLOCKFLAGS_BITS calculation. Credit to Andy Whitcroft for devising the best way of allocating the bitmaps only when required for the section.
From: Bob Picco <bob.picco@hp.com>
[wli@holomorphy.com: warning fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 4 ++-
mm/sparse.c | 54 ...
Subject: Split the free lists for movable and unmovable allocations
This patch adds the core of the fragmentation reduction strategy. It works by grouping pages together based on their ability to move. Basically, it works by splitting each zone->free_area list into MIGRATE_TYPES lists.
Mobility grouping works at an arbitrary order less than or equal to MAX_ORDER. Generally this is a fixed size defined at compile time. However, on platforms like ia64 where the huge page size is runtime configurable it is desirable to group at this order. On x86_64 and occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES. This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It uses a compile-time constant if possible and a variable where the huge page size is runtime configurable.
It is assumed that grouping should be done at the lowest sensible order and that the user would not want to override this. If this is not true, page_block order could be forced to a variable initialised via a boot-time kernel parameter.
Note that many allocations are already flagged as __GFP_MOVABLE which is re-used by this patch to determine how pages should be grouped.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 10 ++
include/linux/pageblock-flags.h | 1
mm/page_alloc.c | 143 +++++++++++++++++++++++++++++------
3 files changed, 129 insertions(+), 25 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-003-fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2/include/linux/mmzone.h linux-2.6.23-rc5-004-split-the-free-lists-for-movable-and-unmovable-allocations/include/linux/mmzone.h
--- ...
Subject: Choose pages from the per-cpu list based on migration type
The freelists for each migrate type can slowly become polluted due to the per-cpu list. Consider the following sequence of events:
1. A 2^pageblock_order list is reserved for __GFP_MOVABLE pages
2. An order-0 page is allocated from the newly reserved block
3. The page is freed and placed on the per-cpu list
4. alloc_page() is called with GFP_KERNEL as the gfp_mask
5. The per-cpu list is used to satisfy the allocation
This results in a kernel page being in the middle of a migratable region. This patch prevents this leak occurring by storing the MIGRATE_ type of the page in page->private. On allocation, only a page of the desired type will be returned, else more pages will be allocated. This may temporarily allow a per-cpu list to go over the pcp->high limit but it'll be corrected on the next free. Care is taken to preserve the hotness of pages recently freed.
The additional code is not measurably slower for the workloads we've tested.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-004-split-the-free-lists-for-movable-and-unmovable-allocations/mm/page_alloc.c linux-2.6.23-rc5-005-choose-pages-from-the-per-cpu-list-based-on-migration-type/mm/page_alloc.c
--- linux-2.6.23-rc5-004-split-the-free-lists-for-movable-and-unmovable-allocations/mm/page_alloc.c 2007-09-02 16:19:34.000000000 +0100
+++ linux-2.6.23-rc5-005-choose-pages-from-the-per-cpu-list-based-on-migration-type/mm/page_alloc.c 2007-09-02 16:20:09.000000000 +0100
@@ -757,7 +757,8 @@ static int rmqueue_bulk(struct zone *zon
struct page *page = __rmqueue(zone, order, migratetype);
if (unlikely(page == NULL))
break;
- list_add_tail(&page->lru, list);
+ list_add(&page->lru, list);
+ set_page_private(page, migratetype);
...
On Mon, 10 Sep 2007 12:21:51 +0100 (IST) We're doing a linear search through the per-cpu magazines right there in the page allocator hot path. Even if the search matches the first element, the setup costs will matter. Surely we can make this search go away with a better choice of data --
I have a patch that expands the per-cpu structure and eliminates the search and I made various attempts at reducing the setup cost (e.g. checking if the first element suited before starting the search). However, I wasn't able to show definitively that it made anything faster but it did increase -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
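To make the trade-off in this exchange concrete, here is a small user-space sketch contrasting the two layouts being discussed: a single per-cpu list that has to be searched for a page of the right type, and a per-cpu structure expanded to one list per type. It is only an illustration. The patch Mel refers to is not shown in this thread, and struct sk_page, struct sk_pcp_split and every function name below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
enum { SK_UNMOVABLE, SK_MOVABLE, SK_NR_TYPES };

struct sk_page {
	int migratetype;        /* plays the role of page->private */
	struct sk_page *next;
};

/*
 * Layout 1: one list per CPU. Taking a page of a given type means
 * walking the list until a match is found.
 */
static struct sk_page *sk_pcp_take_search(struct sk_page **head, int type)
{
	struct sk_page **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->migratetype == type) {
			struct sk_page *page = *pp;

			*pp = page->next;   /* unlink the match */
			page->next = NULL;
			return page;
		}
	}
	return NULL;                /* caller would refill from the buddy lists */
}

/*
 * Layout 2: one list per type per CPU. No search is needed, at the
 * cost of a larger per-cpu structure.
 */
struct sk_pcp_split {
	struct sk_page *lists[SK_NR_TYPES];
};

static void sk_pcp_split_free(struct sk_pcp_split *pcp, struct sk_page *page)
{
	/* Push at the head so recently freed pages are reused first. */
	page->next = pcp->lists[page->migratetype];
	pcp->lists[page->migratetype] = page;
}

static struct sk_page *sk_pcp_split_take(struct sk_pcp_split *pcp, int type)
{
	struct sk_page *page = pcp->lists[type];

	if (page != NULL) {
		pcp->lists[type] = page->next;
		page->next = NULL;
	}
	return page;
}
```

The second layout removes the loop from the allocation path entirely, which is the property Andrew asks for, while the first keeps the per-cpu structure small.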
Subject: Group short-lived and reclaimable kernel allocations
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.
This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved. i.e. they can be migrated by deleting
them and re-reading the information from elsewhere.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/buffer.c | 3 ++-
fs/jbd/journal.c | 4 ++--
fs/jbd/revoke.c | 6 ++++--
fs/proc/base.c | 13 +++++++------
fs/proc/generic.c | 2 +-
include/linux/gfp.h | 15 ++++++++++++---
include/linux/mmzone.h | 5 +++--
include/linux/pageblock-flags.h | 2 +-
include/linux/slab.h | 4 +++-
kernel/cpuset.c | 2 +-
lib/radix-tree.c | 6 ++++--
mm/page_alloc.c | 10 +++++++---
mm/shmem.c | 4 ++--
mm/slab.c | 2 ++
mm/slub.c | 3 +++
15 files changed, 54 insertions(+), 27 deletions(-)
Index: linux-2.6.23-rc4-mm1-redropped/fs/buffer.c
===================================================================
--- linux-2.6.23-rc4-mm1-redropped.orig/fs/buffer.c 2007-09-09 18:23:34.000000000 +0100
+++ linux-2.6.23-rc4-mm1-redropped/fs/buffer.c 2007-09-09 18:26:16.000000000 +0100
@@ -3100,7 +3100,8 @@
struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
{
- struct buffer_head *ret = ...
Minor nit, Mel.
It's easier to read patches if you use the diff -p option:
-p --show-c-function
Show which C function each change is in.
Thanks.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
-
That's a fair comment. I normally make sure it's there but it got missed in a few patches in this set which is awkward. Sorry about that. -- Mel Gorman -
Subject: Drain per-cpu lists when high-order allocations fail
Per-cpu pages can accidentally cause fragmentation because they are free, but
pinned pages in an otherwise contiguous block. When this patch is applied,
the per-cpu caches are drained after the direct-reclaim is entered if the
requested order is greater than 0. It simply reuses the code used by suspend
and hotplug.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-006-group-short-lived-and-reclaimable-kernel-allocations/mm/page_alloc.c linux-2.6.23-rc5-007-drain-per-cpu-lists-when-high-order-allocations-fail/mm/page_alloc.c
--- linux-2.6.23-rc5-006-group-short-lived-and-reclaimable-kernel-allocations/mm/page_alloc.c 2007-09-02 16:20:31.000000000 +0100
+++ linux-2.6.23-rc5-007-drain-per-cpu-lists-when-high-order-allocations-fail/mm/page_alloc.c 2007-09-02 16:20:48.000000000 +0100
@@ -852,6 +852,7 @@ void mark_free_pages(struct zone *zone)
}
spin_unlock_irqrestore(&zone->lock, flags);
}
+#endif /* CONFIG_PM */
/*
* Spill all of this CPU's per-cpu pages back into the buddy allocator.
@@ -864,7 +865,25 @@ void drain_local_pages(void)
__drain_pages(smp_processor_id());
local_irq_restore(flags);
}
-#endif /* CONFIG_HIBERNATION */
+
+void smp_drain_local_pages(void *arg)
+{
+ drain_local_pages();
+}
+
+/*
+ * Spill all the per-cpu pages from all CPUs back into the buddy allocator
+ */
+void drain_all_local_pages(void)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ __drain_pages(smp_processor_id());
+ local_irq_restore(flags);
+
+ smp_call_function(smp_drain_local_pages, NULL, 0, 1);
+}
/*
* Free a 0-order page
@@ -1452,6 +1471,9 @@ nofail_alloc:
cond_resched();
+ if (order != 0)
+ drain_all_local_pages();
+
if (likely(did_some_progress)) ...
Does this help? I have a more general version which could go in -
Yes, it does help. It's noticeable when one is trying to get as much memory in hugepages as possible. It reaches a certain point where hugepages are free but pinned due to per-cpu pages. This "certain point" depends on the number of CPUs as a ratio to the size of physical memory as well as a certain degree of randomness as the location of per-cpu pages is not predictable. Worst case is not being able to allocate something like (NR_CPUS * pcp->high * 2) hugepages even if they are otherwise free. By all means if you have a general version, send it and I'll take a look. If it's more general and nicer but still can be used to drain the per-cpu lists when high-order allocations fail, I'm all for it. -
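The worst-case figure quoted above is simple arithmetic, shown here as a tiny helper. The function name is hypothetical and the formula is taken directly from the message; it assumes every page held on a per-cpu list sits in a different huge-page-sized block.

```c
/*
 * Rough upper bound on the number of otherwise-free huge-page-sized
 * blocks that per-cpu pages could pin, using the estimate from the
 * message above: NR_CPUS * pcp->high * 2.
 */
static unsigned long sk_worst_case_pinned_blocks(unsigned long nr_cpus,
						 unsigned long pcp_high)
{
	return nr_cpus * pcp_high * 2;
}
```

For example, with 4 CPUs and an assumed pcp->high of 186 (a made-up value, not one taken from this thread), the bound is 4 * 186 * 2 = 1488 blocks.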
Subject: Move free pages between lists on steal
When a fallback is forced to steal a page from a block of a different
type and more than half of the block is free, reassign that block to the
new type and move the free pages over to the new type's free lists.
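The rule described above can be sketched in user space as follows. This is a simplified model and not the patch's move_freepages(): a block is an array of 8 pages rather than buddy chunks of varying order, and struct sk_block, sk_count_free() and sk_steal() are hypothetical names.

```c
#include <stdbool.h>

#define SK_PAGES_PER_BLOCK 8    /* assumed, tiny block for illustration */

struct sk_block {
	int migratetype;                        /* owner of the block */
	bool page_free[SK_PAGES_PER_BLOCK];     /* is page i free? */
	int page_list[SK_PAGES_PER_BLOCK];      /* free list type page i is on */
};

static int sk_count_free(const struct sk_block *b)
{
	int i, nr_free = 0;

	for (i = 0; i < SK_PAGES_PER_BLOCK; i++)
		if (b->page_free[i])
			nr_free++;
	return nr_free;
}

/*
 * Called when an allocation of new_type falls back to this block.
 * If more than half of the block is free, the block changes owner and
 * its free pages move to new_type's free lists. Returns the number of
 * pages moved, or 0 if the block was left alone.
 */
static int sk_steal(struct sk_block *b, int new_type)
{
	int i, moved = 0;

	if (sk_count_free(b) * 2 <= SK_PAGES_PER_BLOCK)
		return 0;

	b->migratetype = new_type;
	for (i = 0; i < SK_PAGES_PER_BLOCK; i++) {
		if (b->page_free[i] && b->page_list[i] != new_type) {
			b->page_list[i] = new_type;
			moved++;
		}
	}
	return moved;
}
```

Reassigning a mostly free block in one go means later allocations of the new type are served from the same block instead of scattering across further blocks of the old type.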
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
[y-goto@jp.fujitsu.com: fix BUG_ON check at move_freepages()]
[apw@shadowen.org: Move to using pfn_valid_within()]
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andy Whitcroft <andyw@uk.ibm.com>
Cc: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 70 insertions(+), 2 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-007-drain-per-cpu-lists-when-high-order-allocations-fail/mm/page_alloc.c linux-2.6.23-rc5-008-move-free-pages-between-lists-on-steal/mm/page_alloc.c
--- linux-2.6.23-rc5-007-drain-per-cpu-lists-when-high-order-allocations-fail/mm/page_alloc.c 2007-09-02 16:20:48.000000000 +0100
+++ linux-2.6.23-rc5-008-move-free-pages-between-lists-on-steal/mm/page_alloc.c 2007-09-02 16:21:09.000000000 +0100
@@ -662,6 +662,72 @@ static int fallbacks[MIGRATE_TYPES][MIGR
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
};
+/*
+ * Move the free pages in a range to the free lists of the requested type.
+ * Note that start_page and end_pages are not aligned on a pageblock
+ * boundary. If alignment is required, use move_freepages_block()
+ */
+int move_freepages(struct zone *zone,
+ struct page *start_page, struct page *end_page,
+ int migratetype)
+{
+ struct page *page;
+ unsigned long order;
+ int blocks_moved = 0;
+
+#ifndef CONFIG_HOLES_IN_ZONE
+ /*
+ * page_zone is not safe to call in this context when
+ * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
+ * ...
Subject: Do not group pages by mobility type on low memory systems
Where there is less than one pageblock in the system per mobility
type, mixing is inevitable and any attempt to prevent it will fail
in a costly manner. This patch checks the size of vm_total_pages in
build_all_zonelists(). If there are not enough areas, mobility grouping is
effectively disabled by considering all allocations as the same type
(UNMOVABLE). This is achieved via a __read_mostly flag.
This patch removes any need to disable grouping pages by mobility at
compile time.
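A minimal user-space sketch of the behaviour described above follows. The threshold (total pages compared against one block per type) is one reading of the description and is an assumption, as are the SK_ prefixed names and flag values; the type numbering follows the (movable << 1) | reclaimable encoding visible in the diff below, with three types assumed at this point in the series.

```c
/* Assumed type numbering: matches (movable << 1) | reclaimable. */
enum {
	SK_MIGRATE_UNMOVABLE,    /* 0 */
	SK_MIGRATE_RECLAIMABLE,  /* 1 */
	SK_MIGRATE_MOVABLE,      /* 2 */
	SK_MIGRATE_TYPES
};

/* Hypothetical flag bits standing in for the __GFP_* mobility flags. */
#define SK_GFP_RECLAIMABLE 0x1u
#define SK_GFP_MOVABLE     0x2u

static int sk_group_by_mobility_disabled;

/*
 * Disable grouping when memory is too small to give each type at
 * least one block. Would be called once at zonelist build time.
 */
static void sk_check_mobility_grouping(unsigned long total_pages,
				       unsigned long pages_per_block)
{
	sk_group_by_mobility_disabled =
		total_pages < pages_per_block * SK_MIGRATE_TYPES;
}

/* Map allocation flags to a type; everything is UNMOVABLE when disabled. */
static int sk_flags_to_migratetype(unsigned int gfp_flags)
{
	if (sk_group_by_mobility_disabled)
		return SK_MIGRATE_UNMOVABLE;

	return (((gfp_flags & SK_GFP_MOVABLE) != 0) << 1) |
	       ((gfp_flags & SK_GFP_RECLAIMABLE) != 0);
}
```

Because every allocation collapses to one type when the flag is set, the free lists behave like those of an allocator without grouping, and the only runtime cost is a single flag test.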
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-008-move-free-pages-between-lists-on-steal/mm/page_alloc.c linux-2.6.23-rc5-009-do-not-group-pages-by-mobility-type-on-low-memory-systems/mm/page_alloc.c
--- linux-2.6.23-rc5-008-move-free-pages-between-lists-on-steal/mm/page_alloc.c 2007-09-02 16:21:09.000000000 +0100
+++ linux-2.6.23-rc5-009-do-not-group-pages-by-mobility-type-on-low-memory-systems/mm/page_alloc.c 2007-09-02 16:21:30.000000000 +0100
@@ -154,8 +154,13 @@ int nr_node_ids __read_mostly = MAX_NUMN
EXPORT_SYMBOL(nr_node_ids);
#endif
+int page_group_by_mobility_disabled __read_mostly;
+
static inline int get_pageblock_migratetype(struct page *page)
{
+ if (unlikely(page_group_by_mobility_disabled))
+ return MIGRATE_UNMOVABLE;
+
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}
@@ -169,6 +174,10 @@ static inline int allocflags_to_migratet
{
WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
+ if (unlikely(page_group_by_mobility_disabled))
+ return MIGRATE_UNMOVABLE;
+
+ /* Cluster based on mobility */
return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
((gfp_flags & __GFP_RECLAIMABLE) != 0);
}
@@ -2294,9 ...
Subject: Bias the location of pages freed for min_free_kbytes in the same pageblock_nr_pages areas
The standard buddy allocator always favours splitting the smallest block of pages. The effect of this is that the pages kept free to satisfy min_free_kbytes tend to stay at the same location in memory from boot time for a very long time, remaining contiguous. When an administrator sets the reserve at 16384 at boot time, it tends to be the same MAX_ORDER blocks that remain free. This allows the occasional high atomic allocation to succeed up until the point the blocks are split. In practice, it is difficult to split these blocks but when they do split, the benefit of having min_free_kbytes for contiguous blocks disappears. Additionally, increasing min_free_kbytes once the system has been running for some time has no guarantee of creating contiguous blocks.
On the other hand, grouping pages by mobility favours the splitting of large blocks when there are no free pages of the appropriate type available. A side-effect of this is that all blocks in memory tend to be used up and the contiguous free blocks from boot time are not preserved like in the vanilla allocator. This can cause a problem if a new caller is unwilling to reclaim or does not reclaim for long enough.
A failure scenario was found for a wireless network device allocating order-1 atomic allocations but the allocations were not intense or frequent enough for a whole block of pages to be preserved for MIGRATE_HIGHALLOC. This was reproduced on a desktop by booting with mem=256mb, forcing the driver to allocate at order-1, running a bittorrent client (downloading a debian ISO) and building a kernel with -j2.
This patch addresses the problem on the desktop machine booted with mem=256mb. It works by setting aside a reserve of pageblock_nr_pages blocks, the number of which depends on the value of min_free_kbytes. These blocks are only fallen back to when there are no other free pages.
Then the smallest possible ...
Subject: Bias the placement of kernel pages at lower pfns
This patch chooses blocks with lower PFNs when placing kernel allocations.
This is particularly important during fallback in low memory situations to
stop unmovable pages being placed throughout the entire address space.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-010-bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks/mm/page_alloc.c linux-2.6.23-rc5-011-bias-the-placement-of-kernel-pages-at-lower-pfns/mm/page_alloc.c
--- linux-2.6.23-rc5-010-bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks/mm/page_alloc.c 2007-09-02 16:22:04.000000000 +0100
+++ linux-2.6.23-rc5-011-bias-the-placement-of-kernel-pages-at-lower-pfns/mm/page_alloc.c 2007-09-02 16:22:27.000000000 +0100
@@ -768,6 +768,23 @@ int move_freepages_block(struct zone *zo
return move_freepages(zone, start_page, end_page, migratetype);
}
+/* Return the page with the lowest PFN in the list */
+static struct page *min_page(struct list_head *list)
+{
+ unsigned long min_pfn = -1UL;
+ struct page *min_page = NULL, *page;
+
+ list_for_each_entry(page, list, lru) {
+ unsigned long pfn = page_to_pfn(page);
+ if (pfn < min_pfn) {
+ min_pfn = pfn;
+ min_page = page;
+ }
+ }
+
+ return min_page;
+}
+
/* Remove an element from the buddy allocator from the fallback list */
static struct page *__rmqueue_fallback(struct zone *zone, int order,
int start_migratetype)
@@ -791,8 +808,11 @@ static struct page *__rmqueue_fallback(s
if (list_empty(&area->free_list[migratetype]))
continue;
+ /* Bias kernel allocations towards low pfns */
page = list_entry(area->free_list[migratetype].next,
struct page, lru);
+ if (unlikely(start_migratetype != ...
Subject: Be more aggressive about stealing when MIGRATE_RECLAIMABLE allocations fallback
MIGRATE_RECLAIMABLE allocations tend to be very bursty in nature like when
updatedb starts. It is likely this will occur in situations where MAX_ORDER
blocks of pages are not free. This means that updatedb can scatter
MIGRATE_RECLAIMABLE pages throughout the address space. This patch is more
aggressive about stealing blocks of pages for MIGRATE_RECLAIMABLE.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc5-011-bias-the-placement-of-kernel-pages-at-lower-pfns/mm/page_alloc.c linux-2.6.23-rc5-012-be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback/mm/page_alloc.c
--- linux-2.6.23-rc5-011-bias-the-placement-of-kernel-pages-at-lower-pfns/mm/page_alloc.c 2007-09-02 16:22:27.000000000 +0100
+++ linux-2.6.23-rc5-012-be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback/mm/page_alloc.c 2007-09-02 16:22:47.000000000 +0100
@@ -713,7 +713,7 @@ int move_freepages(struct zone *zone,
{
struct page *page;
unsigned long order;
- int blocks_moved = 0;
+ int pages_moved = 0;
#ifndef CONFIG_HOLES_IN_ZONE
/*
@@ -742,10 +742,10 @@ int move_freepages(struct zone *zone,
list_add(&page->lru,
&zone->free_area[order].free_list[migratetype]);
page += 1 << order;
- blocks_moved++;
+ pages_moved += 1 << order;
}
- return blocks_moved;
+ return pages_moved;
}
int move_freepages_block(struct zone *zone, struct page *page, int migratetype)
@@ -817,11 +817,22 @@ static struct page *__rmqueue_fallback(s
/*
* If breaking a large block of pages, move all free
- * pages to the preferred allocation list
+ * pages to the preferred allocation list. If falling
+ * back for a reclaimable kernel ...
Subject: Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo
This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo. The information is collected only on request so there is no runtime overhead. The statistics are in three parts:
The first part prints information on the size of blocks that pages are being grouped on and looks like
Page block order: 10
Pages per block: 1024
The second part is a more detailed version of /proc/buddyinfo and looks like
Free pages count per migrate type at order    0    1    2    3    4    5    6    7    8    9   10
Node 0, zone    DMA, type Unmovable           0    0    0    0    0    0    0    0    0    0    0
Node 0, zone    DMA, type Reclaimable         1    0    0    0    0    0    0    0    0    0    0
Node 0, zone    DMA, type Movable             0    0    0    0    0    0    0    0    0    0    0
Node 0, zone    DMA, type Reserve             0    4    4    0    0    0    0    1    0    1    0
Node 0, zone Normal, type Unmovable         111    8    4    4    2    3    1    0    0    0    0
Node 0, zone Normal, type Reclaimable       293   89    8    0    0    0    0    0    0    0    0
Node 0, zone Normal, type Movable             1    6   13    9    7    6    3    0    0    0    0
Node 0, zone Normal, type Reserve             0    0    0    0    0    0    0    0    0    0    4
The third part looks like
Number of blocks type   Unmovable Reclaimable     Movable     Reserve
Node 0, zone    DMA             0           1           2           1
Node 0, zone Normal             3          17          94           4
To walk the zones within a node with interrupts disabled, walk_zones_in_node() is introduced and shared between /proc/buddyinfo, /proc/zoneinfo ...
It really gives me the creeps to throw away a large set of large patches and to then introduce a new set. What would go wrong if we just merged the patches I already have? -
Nothing, the end result is more or less the same. There are three style cleanups in the restack and for some reason, one of the functions moved but otherwise they are identical. The restacked version was provided to illustrate what the final stack really looks like and because I thought you would prefer it over a stack that had one patch introducing a change and a later patch removing it (like making it configurable for example). It also allowed us to test against mainline to make sure everything was ok prior to the merge. Go ahead with the patches you already have if you prefer. Just make sure not to include breakout-page_order-to-internalh-to-avoid-special-knowledge-of-the-buddy-allocator.patch as it's only required for page-owner-tracking. Thanks Andrew. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -
memory-unplug-v7-page-isolation.patch uses page_order() also, so I brought this patch back. -
