Hi Andrew,

This is a resend of the arbitrary grouping and statistics patches that were removed from -mm due to build failures. They have been successfully tested with allnoconfig, allmodconfig and defconfig on x86 and x86_64 using 2.6.21-rc3-mm1 as a base. On ppc64, allnoconfig passes, as does a standard kernel build suitable for booting the machine, but allmodconfig failed with the stock -mm kernel in arch/powerpc/platforms/cell/spufs/file.c, so I didn't test there further. ia64 fails allnoconfig on the vanilla kernel but a standard boot test was fine.

Changelog since v3
o Ensure that HUGETLB_PAGE_ORDER is not referenced when it is not available. Previous versions failed to compile if HUGETLB_PAGE was not set

Changelog since v2
o Patches acked by Christoph

Changelog since v1 of statistics and grouping by arbitrary order
o Fix a bug in move_freepages_block() calculations
o Make page_order available in internal.h for PageBuddy pages
o Rename fragavoidance to pagetypeinfo for both code and proc filename
o Rename nr_pages_pageblock to pageblock_nr_pages for consistency
o Print out pageblock_nr_pages and pageblock_order in proc output
o Print out the orders in the header for /proc/pagetypeinfo
o The order being grouped at is no longer printed to the kernel log. The necessary information is available in /proc/pagetypeinfo
o Breakout page_order so that statistics do not require special knowledge of the buddy allocator

The first patch allows grouping by mobility at sizes other than MAX_ORDER_NR_PAGES. The size is based on the order of the system hugepage where that is defined. When possible this is specified as a compile-time constant to help the optimiser. It does change the handling of hugepagesz from __setup() to early_param(), which needs looking at.

The second and third patches provide some statistics in relation to fragmentation avoidance. The statistics patches are split as the second of them depends on information from PAGE_OWNER when it's available.

-- ...
Currently mobility grouping works at the MAX_ORDER_NR_PAGES level. This makes sense for the majority of users where this is also the huge page size. However, on platforms like ia64 where the huge page size is runtime configurable it is desirable to group at a lower order. On x86_64 and occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES.

This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It uses a compile-time constant if possible and a variable where the huge page size is runtime configurable.

It is assumed that grouping should be done at the lowest sensible order and that the user would not want to override this. If this is not true, pageblock_order could be forced to a variable initialised via a boot-time kernel parameter.

One potential issue with this patch is that IA64 now parses hugepagesz with early_param() instead of __setup(). __setup() is called after the memory allocator has been initialised and the pageblock bitmaps have already been set up. In tests on one IA64 there did not seem to be any problem with using early_param(), and in fact it may be more correct as it guarantees the parameter is handled before the parsing of hugepages=.

Signed-off-by: Mel Gorman <firstname.lastname@example.org>
Acked-by: Andy Whitcroft <email@example.com>
Acked-by: Christoph Lameter <firstname.lastname@example.org>
---

 arch/ia64/Kconfig               |    5 ++
 arch/ia64/mm/hugetlbpage.c      |    4 +-
 include/linux/mmzone.h          |    4 +-
 include/linux/pageblock-flags.h |   25 ++++++++++++-
 mm/page_alloc.c                 |   67 ++++++++++++++++++++++++-----------
 5 files changed, 80 insertions(+), 25 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc3-mm1-clean/arch/ia64/Kconfig linux-2.6.22-rc3-mm1-004_group_arbitrary/arch/ia64/Kconfig
--- linux-2.6.22-rc3-mm1-clean/arch/ia64/Kconfig 2007-06-01 09:24:34.000000000 +0100
+++ linux-2.6.22-rc3-mm1-004_group_arbitrary/arch/ia64/Kconfig 2007-06-01 10:16:26.000000000 +0100
@@ -54,6 +54,11 ...
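
As a rough illustration of the grouping-order choice described in the changelog above, the user-space sketch below picks the huge page order when huge pages are configured and falls back to MAX_ORDER-1 otherwise. It is not the code from the patch: the sketch_* names, the SKETCH_PAGE_SHIFT value of 12, the SKETCH_MAX_ORDER value of 11 and the clamp to MAX_ORDER-1 are all assumptions made for the example.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for kernel constants. The values assume
 * 4K base pages and MAX_ORDER of 11; they are not from the patch.
 */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* Order of a huge page relative to the base page, given log2 of its size. */
static unsigned int sketch_hugepage_order(unsigned int hpage_shift)
{
	assert(hpage_shift >= SKETCH_PAGE_SHIFT);
	return hpage_shift - SKETCH_PAGE_SHIFT;
}

/*
 * Choose the order to group pages at. Without huge pages, group at
 * MAX_ORDER-1 (that is, MAX_ORDER_NR_PAGES). With huge pages, group at
 * the huge page order. Clamping to MAX_ORDER-1 is an assumption of
 * this sketch.
 */
static unsigned int sketch_pageblock_order(int have_hugepages,
					   unsigned int hpage_shift)
{
	const unsigned int max_order = SKETCH_MAX_ORDER - 1;
	unsigned int order;

	if (!have_hugepages)
		return max_order;

	order = sketch_hugepage_order(hpage_shift);
	return order < max_order ? order : max_order;
}

/* Number of base pages in one block of the given order. */
static unsigned long sketch_pageblock_nr_pages(unsigned int order)
{
	return 1UL << order;
}
```

Under these assumed values, 2MB huge pages (a shift of 21) give a grouping order of 9, or 512 pages per block, while a configuration without huge pages groups at order 10, or 1024 pages per block.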
This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo. The information is collected only on request so there is no runtime overhead. The statistics are in three parts:

The first part prints information on the size of blocks that pages are being grouped on and looks like

Page block order: 10
Pages per block: 1024

The second part is a more detailed version of /proc/buddyinfo and looks like

Free pages count per migrate type at order    0    1    2    3    4    5    6    7    8    9   10
Node 0, zone DMA, type Unmovable              0    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Reclaimable            1    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Movable                0    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Reserve                0    4    4    0    0    0    0    1    0    1    0
Node 0, zone Normal, type Unmovable         111    8    4    4    2    3    1    0    0    0    0
Node 0, zone Normal, type Reclaimable       293   89    8    0    0    0    0    0    0    0    0
Node 0, zone Normal, type Movable             1    6   13    9    7    6    3    0    0    0    0
Node 0, zone Normal, type Reserve             0    0    0    0    0    0    0    0    0    0    4

The third part looks like

Number of blocks type   Unmovable Reclaimable     Movable     Reserve
Node 0, zone DMA                0           1           2           1
Node 0, zone Normal             3          17          94           4

To walk the zones within a node with interrupts disabled, walk_zones_in_node() is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and /proc/pagetypeinfo to reduce code duplication. It seems specific to what vmstat.c requires but could ...
When PAGE_OWNER is set, more information relevant to fragmentation avoidance is available. A second line is added to /proc/page_owner showing the PFN, the pageblock number, the mobility type of the page based on its allocation flags, whether the allocation is improperly placed and the flags. A sample entry looks like

Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114

This information can be used to identify pages that are improperly placed. As the format of PAGE_OWNER data is now different, the comment at the top of Documentation/page_owner.c is updated with new instructions.

As PAGE_OWNER tracks the GFP flags used to allocate the pages, /proc/pagetypeinfo is enhanced to contain how many mixed blocks exist. The additional output looks like

Number of mixed blocks   Unmovable Reclaimable     Movable     Reserve
Node 0, zone DMA                 0           1           2           1
Node 0, zone Normal              2          11          33           0

Signed-off-by: Mel Gorman <email@example.com>
Acked-by: Andy Whitcroft <firstname.lastname@example.org>
Acked-by: Christoph Lameter <email@example.com>
---

 Documentation/page_owner.c |    3 -
 fs/proc/proc_misc.c        |   28 ++++++++++++
 mm/vmstat.c                |   93 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 123 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc3-mm1-005_statistics/Documentation/page_owner.c linux-2.6.22-rc3-mm1-006_statistics_owner/Documentation/page_owner.c
--- linux-2.6.22-rc3-mm1-005_statistics/Documentation/page_owner.c 2007-06-01 09:24:34.000000000 +0100
+++ linux-2.6.22-rc3-mm1-006_statistics_owner/Documentation/page_owner.c 2007-06-01 10:38:14.000000000 +0100
@@ -2,7 +2,8 @@
  * User-space helper to sort the output of /proc/page_owner
  *
  * Example use:
- * cat /proc/page_owner > ...
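
To make the second line of the sample /proc/page_owner entry above easier to read, here is a small user-space sketch that parses it and recomputes the block number from the PFN. The struct and the sketch_* function names are hypothetical, and deriving the block number as the PFN shifted right by the pageblock order is an assumption of this example rather than something taken from the patch.

```c
#include <stdio.h>

/* Hypothetical container for the fields of the added page_owner line. */
struct sketch_owner_info {
	unsigned long pfn;	/* page frame number */
	unsigned long block;	/* pageblock number as printed */
	int type;		/* mobility type as printed */
};

/*
 * Assumed relationship between a PFN and its pageblock number:
 * each block covers (1 << pageblock_order) pages.
 */
static unsigned long sketch_pfn_to_block(unsigned long pfn,
					 unsigned int pageblock_order)
{
	return pfn >> pageblock_order;
}

/*
 * Parse a line of the form "PFN 7355 Block 7 type 3 ...".
 * Returns 0 on success and -1 if the three fields were not all found.
 * Trailing text such as "Fallback Flags LA" is ignored.
 */
static int sketch_parse_owner_line(const char *line,
				   struct sketch_owner_info *out)
{
	int n = sscanf(line, "PFN %lu Block %lu type %d",
		       &out->pfn, &out->block, &out->type);

	return n == 3 ? 0 : -1;
}
```

With a pageblock order of 10 (1024 pages per block), PFN 7355 lies between 7 * 1024 = 7168 and 8 * 1024 = 8192, so it falls in block 7, which agrees with the sample entry.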