Kamezawa-san,
This version implements your idea for storing a zone pointer and zone_idx
in a structure within the zonelist instead of encoding information in a
pointer. It has worked out quite well. The performance is comparable on
the tests I've run with similar gains/losses as I've seen with but pointer
packing but this code may be easier to understand. However, the zonelist
has doubled in size and consumes more cache lines.
I did not put the node_idx into the structure as it was not clear that there
was a real gain from doing that as the node ID is no rarely used. However,
it would be trivial to add if it could be demonstrated to be of real benefit
on workloads that make heavy use of nodemasks. I do not have an appropriate
test environment for measuring that but prehaps someone else. If they are
willing to check it out, I'll roll a suitable patch.
Any opinions on whether the slight gain in apparent performance in kernbench
worth the cacheline? It's very difficult to craft a benchmark that notices
the extra line being used so this could be a hand-waving issue.
Changelog since V6
o Instead of encoding zone index information in a pointer, this version
introduces a structure that stores a zone pointer and its index
Changelog since V5
o Rebase to 2.6.23-rc4-mm1
o Drop patch that replaces inline functions with macros
Changelog since V4
o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
o Do not call wakeup_kswapd() for every zone in a zonelist
o Dropped the FASTCALL removal
o Have cursor in iterator advance earlier
o Use nodes_and in cpuset_nodes_valid_mems_allowed()
o Use defines instead of inlines, noticably better performance on gcc-3.4
No difference on later compilers such as gcc 4.1
o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
currently inconclusive but it definitly consumes at least one cache
line
Changelog since V3
o Fix compile error in the parisc change
o Calculate gfp_zone only once in __alloc_pages
o Calculate classzone_idx properly in get_page_from_freelist
o Alter check so that zone id embedded may still be used on UP
o Use Kamezawa-sans suggestion for skipping zones in zonelist
o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
removes the need for MPOL_BIND to have a custom zonelist
o Move zonelist iterators and helpers to mm.h
o Change _zones from struct zone * to unsigned long
Changelog since V2
o shrink_zones() uses zonelist instead of zonelist->zones
o hugetlb uses zonelist iterator
o zone_idx information is embedded in zonelist pointers
o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)
Changelog since V1
o Break up the patch into 3 patches
o Introduce iterators for zonelists
o Performance regression test
The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.
The first patch cleans up an inconsitency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.
The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.
The fourth patch introduces filtering of the zonelists based on a nodemask.
The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.
Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.
These are the range of performance losses/gains when running against
2.6.23-rc3-mm1. The set and these machines are a mix of i386, x86_64 and
ppc64 both NUMA and non-NUMA.
Total CPU time on Kernbench: -0.67% to 3.05%
Elapsed time on Kernbench: -0.25% to 2.96%
page_test from aim9: -6.98% to 5.60%
brk_test from aim9: -3.94% to 4.11%
fork_test from aim9: -5.72% to 4.14%
exec_test from aim9: -1.02% to 1.56%
The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
| Eric Sandeen | Re: [RFC] Heads up on sys_fallocate() |
| Linus Torvalds | Linux 2.6.27 |
| Cornelia Huck | Re: 2.6.22-rc3-mm1 |
| Andi Kleen | [PATCH for review] [6/48] x86: trim memory not covered by WB MTRRs |
| Linux Kernel Mailing List | i.MX3: make SoC devices globally available |
| Linux Kernel Mailing List | MXC: Remove WD IRQ priority setting |
| Linux Kernel Mailing List | ARM: DaVinci: i2c setup |
| Linux Kernel Mailing List | [MACVLAN]: Update Kconfig to refer to iproute |
git: | |
| Sverre Rabbelier | Git vs Monotone |
| Jakub Narebski | Re: [RFC] origin link for cherry-pick and revert |
| Jan-Benedict Glaw | Re: Errors GITtifying GCC and Binutils |
| H. Peter Anvin | Re: tip tree clone fail |
| jamal | Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000 |
| KOVACS Krisztian | [net-next PATCH 01/16] Loosen source address check on IPv4 output |
| Ilpo Järvinen | Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ |
| Andrew Bird (Sphere Systems) | Re: [RFC] Patch to option HSO driver to the kernel |
| sata/ide timeout errors on asus server-mb | 2 hours ago | Linux kernel |
| Shared swap partition | 2 hours ago | Linux general |
| usb mic not detected | 6 hours ago | Applications and Utilities |
| Problem in Inserting a module | 7 hours ago | Linux kernel |
| Treason Uncloaked | 13 hours ago | Linux kernel |
| high memory | 2 days ago | Linux kernel |
| semaphore access speed | 2 days ago | Applications and Utilities |
| the kernel how to power off the machine | 2 days ago | Linux kernel |
| Easter Eggs in windows XP | 3 days ago | Windows |
| Root password | 3 days ago | Linux general |
