[PATCH 0/3] Arbitrary grouping and statistics for grouping pages by mobility v4

Previous thread: [patch] Move led attributes out of device name and into sysfs attributes, was Re: LED devices by Richard Hughes on Friday, June 1, 2007 - 8:04 am. (15 messages)

Next thread: Re: [1/3] 2.6.22-rc3: known regressions v2 by Jeff Chua on Friday, June 1, 2007 - 9:43 am. (3 messages)
From: Mel Gorman
Date: Friday, June 1, 2007 - 9:30 am

Hi Andrew,

This is a resend of the arbitrary grouping and statistics patches that
were removed from -mm due to build failures. They have been successfully
tested with allnoconfig, allmodconfig and defconfig on x86 and x86_64 using
2.6.22-rc3-mm1 as a base. On ppc64, allnoconfig passes, as does a standard
kernel build suitable for booting the machine, but allmodconfig fails with
the stock -mm kernel in arch/powerpc/platforms/cell/spufs/file.c, so I did
not test there further. On ia64, allnoconfig fails on the vanilla kernel,
but a standard boot test was fine.

Changelog since v3
o Ensure that HUGETLB_PAGE_ORDER is not referenced when it is not available.
  Previous versions failed to compile if HUGETLB_PAGE was not set.

Changelog since v2
o Patches acked by Christoph

Changelog since v1 of statistics and grouping by arbitrary order
o Fix a bug in move_freepages_block() calculations
o Make page_order available in internal.h for PageBuddy pages
o Rename fragavoidance to pagetypeinfo for both code and proc filename
o Rename nr_pages_pageblock to pageblock_nr_pages for consistency
o Print out pageblock_nr_pages and pageblock_order in proc output
o Print out the orders in the header for /proc/pagetypeinfo
o The order being grouped at is no longer printed to the kernel log. The
  necessary information is available in /proc/pagetypeinfo
o Break out page_order so that statistics do not require special knowledge
  of the buddy allocator

The first patch allows grouping by mobility at sizes other than
MAX_ORDER_NR_PAGES. The size is based on the order of the system hugepage
where that is defined. When possible, this is specified as a compile-time
constant to help the optimiser. It also changes the handling of hugepagesz=
on ia64 from __setup() to early_param(), which needs review.

The second and third patches provide statistics related to fragmentation
avoidance. The statistics are split across two patches because the second
set (the third patch) depends on information from PAGE_OWNER when it is
available.
-- ...
From: Mel Gorman
Date: Friday, June 1, 2007 - 9:30 am

Currently, mobility grouping works at the MAX_ORDER_NR_PAGES level.
This makes sense for the majority of users, for whom this is also the huge
page size. However, on platforms like ia64, where the huge page size is
runtime configurable, it is desirable to group at a lower order. On x86_64,
and occasionally on x86, the hugepage size may not be MAX_ORDER_NR_PAGES.

This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It
uses a compile-time constant if possible and a variable where the huge page
size is runtime configurable.
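To make the compile-time versus runtime choice concrete, here is a minimal userspace sketch of how a grouping order could be selected with the preprocessor. PAGE_SHIFT, MAX_ORDER and the SKETCH_* switches are assumed values for illustration, not the kernel's configuration symbols, and pageblock_start_pfn() is a hypothetical helper. When the order is a macro, masks such as pfn & ~(pageblock_nr_pages - 1) fold into constants; only the variable-size case has to load the order from memory.

```c
#include <assert.h>

/*
 * Userspace sketch of choosing the grouping order at compile time.
 * PAGE_SHIFT, MAX_ORDER and the SKETCH_* switches are assumed values,
 * not the kernel's real configuration symbols.
 */
#define PAGE_SHIFT 12   /* assumed 4 KiB base pages */
#define MAX_ORDER  11   /* assumed buddy allocator limit: orders 0..10 */

#if defined(SKETCH_HUGETLB_PAGE) && defined(SKETCH_HUGETLB_SIZE_VARIABLE)
/* Huge page size picked at boot: the order must live in a variable. */
unsigned int pageblock_order = MAX_ORDER - 1;
#elif defined(SKETCH_HUGETLB_PAGE)
/* Fixed huge page size: the order folds into a compile-time constant. */
#define SKETCH_HPAGE_SHIFT 22   /* assumed 4 MiB huge pages */
#define pageblock_order (SKETCH_HPAGE_SHIFT - PAGE_SHIFT)
#else
/* No huge pages: fall back to the largest buddy block. */
#define pageblock_order (MAX_ORDER - 1)
#endif

#define pageblock_nr_pages (1UL << pageblock_order)

/* Hypothetical helper: first PFN of the pageblock that contains pfn. */
unsigned long pageblock_start_pfn(unsigned long pfn)
{
	unsigned long start = pfn & ~(pageblock_nr_pages - 1);

	assert(pfn - start < pageblock_nr_pages);
	return start;
}
```

With none of the SKETCH_* switches defined, pageblock_order evaluates to 10 and pageblock_nr_pages to 1024, the same values shown in the sample /proc/pagetypeinfo header later in this series.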

It is assumed that grouping should be done at the lowest sensible order
and that the user would not want to override this. If this is not true,
pageblock_order could be made a variable initialised via a boot-time
kernel parameter.

One potential issue with this patch is that IA64 now parses hugepagesz
with early_param() instead of __setup(). __setup() is called after the
memory allocator has been initialised and the pageblock bitmaps have already
been set up. In tests on one IA64 machine, there did not seem to be any
problem with using early_param(), and it may in fact be more correct, as it
guarantees the parameter is handled before hugepages= is parsed.
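The ordering matters because the grouping order has to be known before the pageblock bitmaps are set up. As a rough illustration of the work a hugepagesz= handler does, the sketch below turns a size string into an order. It is not the ia64 handler: the parser is a hypothetical stand-in loosely modelled on the kernel's memparse(), and PAGE_SHIFT and MAX_ORDER are assumed values.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB base pages */
#define MAX_ORDER  11   /* assumed buddy allocator limit: orders 0..10 */

/*
 * Hypothetical size parser loosely modelled on memparse(): accepts a
 * number with an optional K/M/G suffix. Returns 0 on malformed input.
 */
static unsigned long long sketch_parse_size(const char *s)
{
	char *end;
	unsigned long long size = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		size <<= 10;
		/* fall through */
	case 'M': case 'm':
		size <<= 10;
		/* fall through */
	case 'K': case 'k':
		size <<= 10;
		end++;
		break;
	default:
		break;
	}
	return *end == '\0' ? size : 0;
}

/*
 * Hypothetical: turn a hugepagesz= argument into a grouping order.
 * Returns -1 unless the size is a power of two larger than a base page
 * and no larger than the biggest buddy block.
 */
int sketch_hugepagesz_to_order(const char *arg)
{
	unsigned long long size = sketch_parse_size(arg);
	int shift = 0;

	if (size == 0 || (size & (size - 1)) != 0)
		return -1;
	while ((1ULL << shift) < size)
		shift++;
	assert((1ULL << shift) == size);

	if (shift <= PAGE_SHIFT || shift - PAGE_SHIFT >= MAX_ORDER)
		return -1;
	return shift - PAGE_SHIFT;
}
```

Under these assumed values, "2M" maps to order 9 and "4M" to order 10, while "16M" is rejected as larger than the biggest buddy block.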

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
---

 arch/ia64/Kconfig               |    5 ++
 arch/ia64/mm/hugetlbpage.c      |    4 +-
 include/linux/mmzone.h          |    4 +-
 include/linux/pageblock-flags.h |   25 ++++++++++++-
 mm/page_alloc.c                 |   67 ++++++++++++++++++++++++-----------
 5 files changed, 80 insertions(+), 25 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc3-mm1-clean/arch/ia64/Kconfig linux-2.6.22-rc3-mm1-004_group_arbitrary/arch/ia64/Kconfig
--- linux-2.6.22-rc3-mm1-clean/arch/ia64/Kconfig	2007-06-01 09:24:34.000000000 +0100
+++ linux-2.6.22-rc3-mm1-004_group_arbitrary/arch/ia64/Kconfig	2007-06-01 10:16:26.000000000 +0100
@@ -54,6 +54,11 ...
From: Mel Gorman
Date: Friday, June 1, 2007 - 9:30 am

This patch provides fragmentation avoidance statistics via
/proc/pagetypeinfo. The information is collected only on request, so there
is no runtime overhead. The statistics are in three parts:

The first part prints the size of the blocks that pages are grouped into
and looks like

Page block order: 10
Pages per block:  1024

The second part is a more detailed version of /proc/buddyinfo, broken down
by migrate type, and looks like

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
Node    0, zone      DMA, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type  Reclaimable      1      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Reserve      0      4      4      0      0      0      0      1      0      1      0
Node    0, zone   Normal, type    Unmovable    111      8      4      4      2      3      1      0      0      0      0
Node    0, zone   Normal, type  Reclaimable    293     89      8      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Movable      1      6     13      9      7      6      3      0      0      0      0
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      4
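Each row comes from counting the entries on one free list per migrate type and order. A minimal sketch of that counting follows, assuming a simplified singly linked free list in place of the kernel's struct list_head; all sketch_* names are hypothetical. It also assumes, as with /proc/buddyinfo, that each number counts free blocks of 2^order pages rather than individual pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER     11  /* assumed: orders 0..10 as in the sample */
#define MIGRATE_TYPES 4   /* Unmovable, Reclaimable, Movable, Reserve */

/* Hypothetical free block, standing in for an entry on a struct list_head. */
struct sketch_free_block {
	struct sketch_free_block *next;
};

/* One free list per migrate type at a given order. */
struct sketch_free_area {
	struct sketch_free_block *free_list[MIGRATE_TYPES];
};

/*
 * Fill counts[type][order] with the number of free blocks (each 2^order
 * pages) on that type's list, by walking every list.
 */
void sketch_count_free(const struct sketch_free_area area[MAX_ORDER],
		       unsigned long counts[MIGRATE_TYPES][MAX_ORDER])
{
	assert(area != NULL && counts != NULL);
	for (int order = 0; order < MAX_ORDER; order++) {
		for (int type = 0; type < MIGRATE_TYPES; type++) {
			unsigned long n = 0;
			const struct sketch_free_block *b;

			for (b = area[order].free_list[type]; b != NULL; b = b->next)
				n++;
			counts[type][order] = n;
		}
	}
}
```

Keeping a separate list per type is what makes this breakdown possible; a single list per order, as /proc/buddyinfo reports, cannot distinguish the types.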

The third part counts the pageblocks of each migrate type in each zone and
looks like

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve
Node 0, zone      DMA            0            1            2            1
Node 0, zone   Normal            3           17           94            4
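These block counts amount to reading the migrate type recorded for each pageblock and tallying them. A sketch, assuming the per-block types are already available as an array (a hypothetical stand-in for the migrate type the kernel stores per pageblock):

```c
#include <assert.h>
#include <stddef.h>

#define MIGRATE_TYPES 4   /* Unmovable, Reclaimable, Movable, Reserve */

/*
 * Tally pageblocks by migrate type. block_types[i] is a hypothetical
 * stand-in for the migrate type stored for pageblock i of a zone.
 */
void sketch_count_blocks(const unsigned char *block_types, size_t nr_blocks,
			 unsigned long counts[MIGRATE_TYPES])
{
	for (int t = 0; t < MIGRATE_TYPES; t++)
		counts[t] = 0;
	for (size_t i = 0; i < nr_blocks; i++) {
		assert(block_types[i] < MIGRATE_TYPES);
		counts[block_types[i]]++;
	}
}
```

Feeding it the types {1, 2, 2, 3} gives 0, 1, 2 and 1, the same counts as the DMA row above.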

To walk the zones within a node with interrupts disabled, walk_zones_in_node()
is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and
/proc/pagetypeinfo to reduce code duplication. It seems specific to what
vmstat.c requires but could ...
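The walker follows a simple callback pattern: iterate over a node's zones, skip empty ones, and call a printer with the zone lock held. Below is a userspace sketch with hypothetical sketch_* types, where a bool flag only models taking the zone lock with interrupts disabled; it is not the vmstat.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_ZONES 3   /* assumed: e.g. DMA, Normal, HighMem */

/* Hypothetical zone: only what the walker needs. */
struct sketch_zone {
	const char *name;
	unsigned long present_pages;  /* 0 means the zone is unpopulated */
	bool locked;                  /* models the zone lock being held */
};

struct sketch_node {
	int id;
	struct sketch_zone zones[SKETCH_NR_ZONES];
};

typedef void (*sketch_print_fn)(void *out, const struct sketch_node *node,
				const struct sketch_zone *zone);

/* Visit each populated zone with its (modelled) lock held. */
void sketch_walk_zones_in_node(void *out, struct sketch_node *node,
			       sketch_print_fn print)
{
	for (size_t i = 0; i < SKETCH_NR_ZONES; i++) {
		struct sketch_zone *zone = &node->zones[i];

		if (zone->present_pages == 0)
			continue;               /* skip empty zones */
		zone->locked = true;            /* stands in for spin_lock_irqsave() */
		print(out, node, zone);
		zone->locked = false;           /* stands in for spin_unlock_irqrestore() */
	}
}

/* Example printer: counts visited zones and checks the lock is held. */
void sketch_count_zone(void *out, const struct sketch_node *node,
		       const struct sketch_zone *zone)
{
	(void)node;
	assert(zone->locked);
	(*(int *)out)++;
}
```

Each /proc file then supplies only its own printer, which is where the reduction in duplicated code comes from.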
From: Mel Gorman
Date: Friday, June 1, 2007 - 9:31 am

When PAGE_OWNER is set, more information relevant to fragmentation
avoidance is available. A second line is added to /proc/page_owner
showing the PFN, the pageblock number, the mobility type of the page based
on its allocation flags, whether the allocation is improperly placed, and
the flags. A sample entry looks like

Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags      LA     
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114

This information can be used to identify pages that are improperly placed. As
the format of PAGE_OWNER data is now different, the comment at the top of
Documentation/page_owner.c is updated with new instructions.
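The block number is the PFN shifted right by the grouping order: with the order of 10 from the earlier /proc/pagetypeinfo sample, PFN 7355 falls in block 7355 >> 10 = 7, matching the entry above. Here is a hedged sketch of the two derived fields. It assumes "Fallback" means the allocation's mobility type differs from the type of its pageblock, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: the pageblock a PFN belongs to. The grouping
 * order is passed in instead of being read from the kernel.
 */
unsigned long sketch_pfn_to_block(unsigned long pfn, unsigned int pageblock_order)
{
	assert(pageblock_order < 8 * sizeof(unsigned long));
	return pfn >> pageblock_order;
}

/*
 * Assumed meaning of "Fallback": the mobility type derived from the
 * allocation flags differs from the type of the pageblock holding the page.
 */
bool sketch_is_fallback(int alloc_type, int block_type)
{
	return alloc_type != block_type;
}
```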

As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/pagetypeinfo is enhanced to report how many mixed blocks exist, that
is, blocks containing pages allocated with a mobility type other than the
block's own. The additional output looks like

Number of mixed blocks    Unmovable  Reclaimable      Movable      Reserve
Node 0, zone      DMA            0            1            2            1
Node 0, zone   Normal            2           11           33            0
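Counting mixed blocks combines the two pieces of information: a block counts as mixed when at least one allocated page in it has a mobility type other than the block's own. This definition and the flat arrays below are assumptions for illustration, not the page_owner implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MIGRATE_TYPES 4      /* Unmovable, Reclaimable, Movable, Reserve */
#define SKETCH_FREE   (-1)   /* hypothetical marker for a free page */

/*
 * page_types[]: per page, the mobility type derived from its allocation
 * flags, or SKETCH_FREE. block_types[]: per pageblock, the block's type.
 * Fills mixed[type] with the number of mixed blocks of each block type.
 */
void sketch_count_mixed(const int *page_types, const int *block_types,
			size_t nr_blocks, size_t pages_per_block,
			unsigned long mixed[MIGRATE_TYPES])
{
	for (int t = 0; t < MIGRATE_TYPES; t++)
		mixed[t] = 0;

	for (size_t b = 0; b < nr_blocks; b++) {
		int btype = block_types[b];
		const int *page = &page_types[b * pages_per_block];

		assert(btype >= 0 && btype < MIGRATE_TYPES);
		for (size_t i = 0; i < pages_per_block; i++) {
			if (page[i] != SKETCH_FREE && page[i] != btype) {
				mixed[btype]++;
				break;  /* one misplaced page makes the block mixed */
			}
		}
	}
}
```

Stopping at the first misplaced page keeps the cost per block low, since a block only needs to be counted once.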

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
---

 Documentation/page_owner.c |    3 -
 fs/proc/proc_misc.c        |   28 ++++++++++++
 mm/vmstat.c                |   93 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 123 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc3-mm1-005_statistics/Documentation/page_owner.c linux-2.6.22-rc3-mm1-006_statistics_owner/Documentation/page_owner.c
--- linux-2.6.22-rc3-mm1-005_statistics/Documentation/page_owner.c	2007-06-01 09:24:34.000000000 +0100
+++ linux-2.6.22-rc3-mm1-006_statistics_owner/Documentation/page_owner.c	2007-06-01 10:38:14.000000000 +0100
@@ -2,7 +2,8 @@
  * User-space helper to sort the output of /proc/page_owner
  *
  * Example use:
- * cat /proc/page_owner > ...