Hi Lee,
This is the patchset I would like tested. It takes Kamezawa-san's approach of
using a structure instead of pointer packing. While it consumes more cache,
as Christoph pointed out, it should be an easier starting point to optimise
once workloads are identified that can show performance gains/regressions.
Pointer packing is a potential optimisation but, once in place, it is
difficult to alter again.

Please let me know how it works out for you.

Changelog since V7
o Fix build bug in relation to memory controller combined with one-zonelist
o Use while() instead of a stupid looking for()

Changelog since V6
o Instead of encoding zone index information in a pointer, this version
  introduces a structure that stores a zone pointer and its index

Changelog since V5
o Rebase to 2.6.23-rc4-mm1
o Drop patch that replaces inline functions with macros

Changelog since V4
o Rebase to -mm kernel. Host of memoryless patch collisions dealt with
o Do not call wakeup_kswapd() for every zone in a zonelist
o Dropped the FASTCALL removal
o Have cursor in iterator advance earlier
o Use nodes_and in cpuset_nodes_valid_mems_allowed()
o Use defines instead of inlines; noticeably better performance on gcc-3.4.
  No difference on later compilers such as gcc 4.1
o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
  currently inconclusive but it definitely consumes at least one cache
  line

Changelog since V3
o Fix compile error in the parisc change
o Calculate gfp_zone only once in __alloc_pages
o Calculate classzone_idx properly in get_page_from_freelist
o Alter check so that zone id embedded may still be used on UP
o Use Kamezawa-san's suggestion for skipping zones in zonelist
o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
  removes the need for MPOL_BIND to have a custom zonelist
o Move zonelist iterators and helpers to mm.h
o Change _zones from struct zone * to unsigned long

Changelog...
Two zonelists exist so that GFP_THISNODE allocations will be guaranteed
to use memory only from a node local to the CPU. As we can now filter the
zonelist based on a nodemask, we can filter the nodes slightly differently
when GFP_THISNODE is specified.

When GFP_THISNODE is used, a temporary nodemask is created with only the
node local to the CPU set. This allows us to eliminate the second zonelist.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
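
For readers following along, the temporary-nodemask idea can be sketched in
user-space C roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: every sketch_* name, the one-word
bitmask and SKETCH_THISNODE are hypothetical stand-ins. How the temporary
mask combines with a caller-supplied mask is not described in the text above,
so the sketch simply lets the temporary mask take precedence.

```c
#include <stddef.h>

/* Hypothetical stand-ins: one bit per node, and a flag playing GFP_THISNODE. */
typedef unsigned long sketch_nodemask_t;
#define SKETCH_THISNODE 0x1u

/*
 * Choose the nodemask used to filter the single zonelist. With the
 * THISNODE flag, build a temporary mask holding only the local node;
 * otherwise use whatever mask the caller supplied (NULL = no filter).
 */
static const sketch_nodemask_t *
sketch_pick_nodemask(unsigned int flags, int local_node,
                     const sketch_nodemask_t *caller_mask,
                     sketch_nodemask_t *tmp)
{
    if (flags & SKETCH_THISNODE) {
        *tmp = 1UL << local_node;
        return tmp;
    }
    return caller_mask;
}

/* A NULL mask allows every node; otherwise the node's bit must be set. */
static int sketch_node_allowed(const sketch_nodemask_t *mask, int node)
{
    return mask == NULL || (*mask & (1UL << node)) != 0;
}

/* Usage: may an allocation made on local_node fall back to node? */
static int sketch_demo_allowed(unsigned int flags, int local_node, int node)
{
    sketch_nodemask_t tmp;
    const sketch_nodemask_t *mask =
        sketch_pick_nodemask(flags, local_node, NULL, &tmp);

    return sketch_node_allowed(mask, node);
}
```

With the flag set, only the local node passes the filter, which is the
guarantee the second zonelist used to provide.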
---
drivers/char/sysrq.c | 2 -
fs/buffer.c | 5 +--
include/linux/gfp.h | 23 +++------------
include/linux/mempolicy.h | 2 -
include/linux/mmzone.h | 14 ---------
mm/mempolicy.c | 8 ++---
mm/page_alloc.c | 61 ++++++++++++++++++++++-------------------
mm/slab.c | 2 -
mm/slub.c | 2 -
mm/vmscan.c | 2 -
10 files changed, 51 insertions(+), 70 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-030_filter_nodemask/drivers/char/sysrq.c linux-2.6.23-rc4-mm1-040_use_one_zonelist/drivers/char/sysrq.c
--- linux-2.6.23-rc4-mm1-030_filter_nodemask/drivers/char/sysrq.c 2007-09-13 11:57:27.000000000 +0100
+++ linux-2.6.23-rc4-mm1-040_use_one_zonelist/drivers/char/sysrq.c 2007-09-13 13:44:23.000000000 +0100
@@ -270,7 +270,7 @@ static struct sysrq_key_op sysrq_term_op

static void moom_callback(struct work_struct *ignored)
{
- out_of_memory(node_zonelist(0, GFP_KERNEL), GFP_KERNEL, 0);
+ out_of_memory(node_zonelist(0), GFP_KERNEL, 0);
}

static DECLARE_WORK(moom_work, moom_callback);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-030_filter_nodemask/fs/buffer.c linux-2.6.23-rc4-mm1-040_use_one_zonelist/fs/buffer.c
--- linux-2.6.23-rc4-mm1-030_filter_nodemask/fs/buffer.c 2007-09-13 11:57:52.000000000 +0100
+++ linux-2.6.23-rc4-mm1-040_use_one_zonelist/fs/buffer.c 2007-09-13 13:44:23.000000000 +0100
@@ -375,11 +375,10 @@ static void free_more_memory(void)
yie...
The MPOL_BIND policy creates a zonelist that is used for allocations belonging
to that thread that can use the policy_zone. As the per-node zonelist is
already being filtered based on a zone id, this patch adds a version of
__alloc_pages() that takes a nodemask for further filtering. This eliminates
the need for MPOL_BIND to create a custom zonelist. A further benefit of
this is that allocations using MPOL_BIND now use the local-node-ordered
zonelist instead of a custom node-id-ordered zonelist.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
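
As a rough illustration of the filtering described above, the sketch below
walks one local-node-ordered zonelist and returns the first entry that passes
both the zone index limit and an optional nodemask. It is a user-space
approximation under assumptions: the sketch_* names, the bitmask type and the
example zonelist are hypothetical, and the function only mimics the
(zonelist, nodemask, highest zone index) argument order visible in the
first_zones_zonelist() call in the diff, where NULL means "no nodemask".

```c
#include <stddef.h>

typedef unsigned long sketch_nodemask_t;   /* hypothetical: one bit per node */

/* Hypothetical zonelist entry: owning node and zone index. */
struct sketch_zoneref {
    int node;
    int zone_idx;
};

/* Return the first entry at or below highest_zoneidx whose node is in
 * *nodes. A NULL nodes pointer disables the node filter. */
static const struct sketch_zoneref *
sketch_first_zones_zonelist(const struct sketch_zoneref *refs, size_t n,
                            const sketch_nodemask_t *nodes,
                            int highest_zoneidx)
{
    for (size_t i = 0; i < n; i++) {
        if (refs[i].zone_idx > highest_zoneidx)
            continue;
        if (nodes && !(*nodes & (1UL << refs[i].node)))
            continue;
        return &refs[i];
    }
    return NULL;
}

/* Assumed zonelist for a CPU on node 1 of a 3-node machine: local node
 * first, then node 2, then node 0. */
static const struct sketch_zoneref sketch_node1_zonelist[] = {
    { 1, 1 }, { 1, 0 }, { 2, 1 }, { 2, 0 }, { 0, 1 }, { 0, 0 },
};
#define SKETCH_NR_REFS \
    (sizeof(sketch_node1_zonelist) / sizeof(sketch_node1_zonelist[0]))

/* Usage: which node serves the allocation first? Returns -1 if none. */
static int sketch_demo_first_node(int use_mask, sketch_nodemask_t mask,
                                  int highest_zoneidx)
{
    const struct sketch_zoneref *ref =
        sketch_first_zones_zonelist(sketch_node1_zonelist, SKETCH_NR_REFS,
                                    use_mask ? &mask : NULL, highest_zoneidx);

    return ref ? ref->node : -1;
}
```

With a bind mask of nodes 0 and 2, the walk picks node 2 because it comes
first in this CPU's list. A node-id-ordered list would have tried node 0
first.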
---
fs/buffer.c | 2
include/linux/cpuset.h | 4 -
include/linux/gfp.h | 4 +
include/linux/mempolicy.h | 3
include/linux/mmzone.h | 62 +++++++++++++----
kernel/cpuset.c | 18 +----
mm/mempolicy.c | 145 ++++++++++++-----------------------------
mm/page_alloc.c | 40 +++++++----
8 files changed, 133 insertions(+), 145 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-020_zoneid_zonelist/fs/buffer.c linux-2.6.23-rc4-mm1-030_filter_nodemask/fs/buffer.c
--- linux-2.6.23-rc4-mm1-020_zoneid_zonelist/fs/buffer.c 2007-09-13 11:57:44.000000000 +0100
+++ linux-2.6.23-rc4-mm1-030_filter_nodemask/fs/buffer.c 2007-09-13 11:57:52.000000000 +0100
@@ -376,7 +376,7 @@ static void free_more_memory(void)

for_each_online_node(nid) {
zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
- gfp_zone(GFP_NOFS));
+ NULL, gfp_zone(GFP_NOFS));
if (zrefs->zone)
try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
GFP_NOFS);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-020_zoneid_zonelist/include/linux/cpuset.h linux-2.6.23-rc4-mm1-030_filter_nodemask/include/linux/cpuset.h
--- linux-2.6.23-rc4-mm1-020_zoneid_zonelist/include/linux/cpuset.h 2007-09-10 09:29:13.000000000 +0100
+++ linux-2.6.23-rc4-mm1-030_filter_nodemask/include/linux/cpuset.h 2007-09-13 11:57:52.000000000 +0100
@@ -28,7 +...
Using two zonelists per node requires very frequent use of zone_idx(). This
is costly as it involves a lookup of another structure and a subtraction
operation. As the zone_idx is often required, it should be quickly accessible.
The node idx could also be stored here if it was found that accessing
zone->node is significant, which may be the case on workloads where nodemasks
are heavily used.

This patch introduces a struct zoneref to store a zone pointer and a zone
index. The zonelist then consists of an array of struct zonerefs which
are looked up as necessary. Helpers are given for accessing the zone index
as well as the node index.

[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
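
To make the trade-off concrete, here is a minimal user-space sketch of the
idea, assuming simplified stand-ins for the kernel structures. All sketch_*
names are hypothetical and the helpers take the entry by value purely to keep
the example short. It contrasts computing the index by pointer subtraction
with reading an index cached next to the zone pointer.

```c
#include <stddef.h>

#define SKETCH_NR_ZONES 3

struct sketch_pgdat;

/* Simplified zone: a back-pointer to its node data and its node id. */
struct sketch_zone {
    struct sketch_pgdat *pgdat;
    int node;
};

struct sketch_pgdat {
    struct sketch_zone node_zones[SKETCH_NR_ZONES];
};

/* The costly form: touch another structure, then subtract pointers. */
static int sketch_zone_idx(const struct sketch_zone *z)
{
    return (int)(z - z->pgdat->node_zones);
}

/* One zonelist entry: the zone pointer plus its index, cached once. */
struct sketch_zoneref {
    struct sketch_zone *zone;
    int zone_idx;
};

static struct sketch_zoneref sketch_make_zoneref(struct sketch_zone *z)
{
    struct sketch_zoneref ref = { z, sketch_zone_idx(z) };
    return ref;
}

/* Helpers: the zone index is a plain field read; the node index still
 * goes through the zone, as the description above notes. */
static int sketch_zoneref_zone_idx(struct sketch_zoneref ref)
{
    return ref.zone_idx;
}

static int sketch_zoneref_node_idx(struct sketch_zoneref ref)
{
    return ref.zone->node;
}

/* Example node data for a hypothetical node 3. */
static struct sketch_pgdat sketch_node3 = {
    .node_zones = {
        { &sketch_node3, 3 }, { &sketch_node3, 3 }, { &sketch_node3, 3 },
    },
};
```

The subtraction is paid once when the zonelist is built rather than on every
walk, at the price of a larger entry, which is the cache cost mentioned in
the cover letter.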
---
arch/parisc/mm/init.c | 2 -
fs/buffer.c | 6 ++--
include/linux/mmzone.h | 64 +++++++++++++++++++++++++++++++++++++-------
kernel/cpuset.c | 4 +-
mm/hugetlb.c | 3 +-
mm/mempolicy.c | 35 ++++++++++++++----------
mm/oom_kill.c | 2 -
mm/page_alloc.c | 51 +++++++++++++++++------------------
mm/slab.c | 2 -
mm/slub.c | 2 -
mm/vmscan.c | 7 ++--
mm/vmstat.c | 5 ++-
12 files changed, 118 insertions(+), 65 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-010_use_two_zonelists/arch/parisc/mm/init.c linux-2.6.23-rc4-mm1-020_zoneid_zonelist/arch/parisc/mm/init.c
--- linux-2.6.23-rc4-mm1-010_use_two_zonelists/arch/parisc/mm/init.c 2007-09-13 11:57:36.000000000 +0100
+++ linux-2.6.23-rc4-mm1-020_zoneid_zonelist/arch/parisc/mm/init.c 2007-09-13 11:57:44.000000000 +0100
@@ -604,7 +604,7 @@ void show_mem(void)
for (i = 0; i < npmem_ranges; i++) {
zl = node_zonelist(i);
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone **z;
+ struct zoneref *z;...
Currently a node has a number of zonelists, one for each zone type in the
system and a second set for THISNODE allocations. Based on the zones allowed
by a gfp mask, one of these zonelists is selected. All of these zonelists
occupy memory and consume cache lines.

This patch replaces the multiple zonelists per-node with two zonelists. The
first contains all populated zones in the system and the second contains all
populated zones in the node suitable for GFP_THISNODE allocations. An iterator
macro called for_each_zone_zonelist() is introduced that iterates through each
zone in the zonelist that is allowed by the GFP flags.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
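
The iterator described above can be pictured with the user-space sketch
below. It is an approximation under assumptions, not the macro from this
patch: the sketch_* names, the zone types and the fixed-size NULL-terminated
array are all hypothetical. The point it shows is that one list holds every
zone and the walk skips zones above the highest type the GFP flags allow.

```c
#include <stddef.h>

/* Hypothetical zone types, lowest to highest. */
enum sketch_zone_type {
    SKETCH_ZONE_DMA,
    SKETCH_ZONE_NORMAL,
    SKETCH_ZONE_HIGHMEM
};

struct sketch_zone {
    enum sketch_zone_type type;
    int node;
};

/* One zonelist holding every populated zone, NULL-terminated. */
struct sketch_zonelist {
    struct sketch_zone *zones[8];
};

/* Advance the cursor past zones the allocation may not use. */
static struct sketch_zone **
sketch_next_zone(struct sketch_zone **z, enum sketch_zone_type highest)
{
    while (*z && (*z)->type > highest)
        z++;
    return z;
}

/* Visit each usable zone in list order; z is the cursor. */
#define sketch_for_each_zone_zonelist(zone, z, zlist, highest)          \
    for (z = sketch_next_zone((zlist)->zones, (highest)), zone = *z;    \
         zone != NULL;                                                  \
         z = sketch_next_zone(z + 1, (highest)), zone = *z)

/* Usage: count the zones an allocation limited to 'highest' may use. */
static int sketch_count_usable(enum sketch_zone_type highest)
{
    static struct sketch_zone zones[] = {
        { SKETCH_ZONE_HIGHMEM, 0 },
        { SKETCH_ZONE_NORMAL, 0 },
        { SKETCH_ZONE_DMA, 0 },
    };
    struct sketch_zonelist zl = { { &zones[0], &zones[1], &zones[2], NULL } };
    struct sketch_zone **z;
    struct sketch_zone *zone;
    int n = 0;

    sketch_for_each_zone_zonelist(zone, z, &zl, highest)
        n++;
    return n;
}
```

A DMA-only request walks the same list as a highmem-capable one and simply
skips the first two entries, so no per-zone-type copy of the list is needed.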
---
arch/parisc/mm/init.c | 11 +-
fs/buffer.c | 6 +
include/linux/gfp.h | 17 +---
include/linux/mmzone.h | 65 +++++++++++-----
mm/hugetlb.c | 8 +-
mm/oom_kill.c | 8 +-
mm/page_alloc.c | 169 +++++++++++++++++++-------------------------
mm/slab.c | 8 +-
mm/slub.c | 8 +-
mm/vmscan.c | 22 ++---
10 files changed, 160 insertions(+), 162 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-007_node_zonelist/arch/parisc/mm/init.c linux-2.6.23-rc4-mm1-010_use_two_zonelists/arch/parisc/mm/init.c
--- linux-2.6.23-rc4-mm1-007_node_zonelist/arch/parisc/mm/init.c 2007-08-28 02:32:35.000000000 +0100
+++ linux-2.6.23-rc4-mm1-010_use_two_zonelists/arch/parisc/mm/init.c 2007-09-13 11:57:36.000000000 +0100
@@ -599,15 +599,18 @@ void show_mem(void)
#ifdef CONFIG_DISCONTIGMEM
{
struct zonelist *zl;
- int i, j, k;
+ int i, j;

for (i = 0; i < npmem_ranges; i++) {
+ zl = node_zonelist(i);
for (j = 0; j < MAX_NR_ZONES; j++) {
- zl = NODE_DATA(i)->node_zonelists + j;
+ struct zone **z;
+ struct zone *zone;

printk("Zone list for zone %d on node %d: ", j, i);
- for (k = 0; zl->zones[k] != NUL...
This patch introduces a node_zonelist() helper function. It is used to look up
the appropriate zonelist given a node and a GFP mask. The patch on its own is
a cleanup but it helps clarify parts of the one-zonelist-per-node patchset. If
necessary, it can be merged with the next patch in this set without problems.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
drivers/char/sysrq.c | 3 +--
fs/buffer.c | 6 +++---
include/linux/gfp.h | 22 +++++++++++++++++++---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 6 +++---
mm/page_alloc.c | 3 +--
mm/slab.c | 3 +--
mm/slub.c | 3 +--
8 files changed, 30 insertions(+), 18 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-005_freepages_zonelist/drivers/char/sysrq.c linux-2.6.23-rc4-mm1-007_node_zonelist/drivers/char/sysrq.c
--- linux-2.6.23-rc4-mm1-005_freepages_zonelist/drivers/char/sysrq.c 2007-09-10 09:29:11.000000000 +0100
+++ linux-2.6.23-rc4-mm1-007_node_zonelist/drivers/char/sysrq.c 2007-09-13 11:57:27.000000000 +0100
@@ -270,8 +270,7 @@ static struct sysrq_key_op sysrq_term_op

static void moom_callback(struct work_struct *ignored)
{
- out_of_memory(&NODE_DATA(0)->node_zonelists[ZONE_NORMAL],
- GFP_KERNEL, 0);
+ out_of_memory(node_zonelist(0, GFP_KERNEL), GFP_KERNEL, 0);
}

static DECLARE_WORK(moom_work, moom_callback);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-005_freepages_zonelist/fs/buffer.c linux-2.6.23-rc4-mm1-007_node_zonelist/fs/buffer.c
--- linux-2.6.23-rc4-mm1-005_freepages_zonelist/fs/buffer.c 2007-09-10 09:29:13.000000000 +0100
+++ linux-2.6.23-rc4-mm1-007_node_zonelist/fs/buffer.c 2007-09-13 11:57:27.000000000 +0100
@@ -369,13 +369,13 @@ void invalidate_bdev(struct block_device
static void free_more_memory(void)
{
struct zone **zones;
- pg_data_t *pgdat;
+ int nid;

wakeup_pdflush(1024);
yield();

- for_each_on...
The allocator deals with zonelists which indicate the order in which zones
should be targeted for an allocation. Similarly, direct reclaim of pages
iterates over an array of zones. For consistency, this patch converts direct
reclaim to use a zonelist. No functionality is changed by this patch. This
simplifies zonelist iterators in the next patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
---
include/linux/swap.h | 2 +-
mm/page_alloc.c | 2 +-
mm/vmscan.c | 19 +++++++++++--------
3 files changed, 13 insertions(+), 10 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-fix-pcnet32/include/linux/swap.h linux-2.6.23-rc4-mm1-005_freepages_zonelist/include/linux/swap.h
--- linux-2.6.23-rc4-mm1-fix-pcnet32/include/linux/swap.h 2007-09-10 09:29:14.000000000 +0100
+++ linux-2.6.23-rc4-mm1-005_freepages_zonelist/include/linux/swap.h 2007-09-13 11:57:20.000000000 +0100
@@ -189,7 +189,7 @@ extern int rotate_reclaimable_page(struc
extern void swap_setup(void);

/* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zone **zones, int order,
+extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask);
extern unsigned long try_to_free_mem_container_pages(struct mem_container *mem);
extern int __isolate_lru_page(struct page *page, int mode);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc4-mm1-fix-pcnet32/mm/page_alloc.c linux-2.6.23-rc4-mm1-005_freepages_zonelist/mm/page_alloc.c
--- linux-2.6.23-rc4-mm1-fix-pcnet32/mm/page_alloc.c 2007-09-10 09:29:14.000000000 +0100
+++ linux-2.6.23-rc4-mm1-005_freepages_zonelist/mm/page_alloc.c 2007-09-13 11:57:20.000000000 +0100
@@ -1667,7 +1667,7 @@ nofail_alloc:
reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;

- did_some_progress = try_to_free_pages(zonelist->zones, order, gfp_mask);
+ did_some_progress = try_to_free_pages(zone...