Andrew, this is the promised drop-in replacement for the patch series,
with all the cleanups you requested since Friday as well as the bugfixes
I came up with this morning.

You can drop vmscan-add-some-sanity-checks-to-get_scan_ratio.patch
and the incremental changes - those are all folded into these patches.

On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory. Not
only does it use up CPU time, but it also provokes lock contention
and can leave large systems under memory pressure in a catatonic state.

This patch series improves VM scalability by:
1) putting filesystem backed, swap backed and unevictable pages
   onto their own LRUs, so the system only scans the pages that it
   can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
   so the number of pages that need to be scanned when the system
   starts swapping is bound to a reasonable number

3) keeping unevictable pages off the LRU completely, so the
   VM does not waste CPU time scanning them. ramfs, ramdisk,
   SHM_LOCKED shared memory segments and mlock()ed VMA pages
   are kept on the unevictable list.

More info on the overall design can be found at:
http://linux-mm.org/PageReplacementDesign
An all-in-one patch can be found at:
http://people.redhat.com/riel/splitvm/
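
As a rough illustration of points 1) and 3) above, here is a minimal user-space sketch of sorting pages onto separate lists so that reclaim only has to look at the evictable ones. It is a toy model under stated assumptions: every name in it (toy_lru, toy_page, toy_zone and the helpers) is hypothetical and is not taken from the patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list indices; not the identifiers used in the patches. */
enum toy_lru {
    TOY_ANON,         /* swap backed pages */
    TOY_FILE,         /* filesystem backed pages */
    TOY_UNEVICTABLE,  /* mlock()ed, ramfs, SHM_LOCKED and similar pages */
    TOY_NR_LRU
};

/* Only the two properties needed to pick a list are modelled. */
struct toy_page {
    bool swap_backed;
    bool unevictable;
};

/* Per-list page counts for one (hypothetical) zone. */
struct toy_zone {
    size_t nr[TOY_NR_LRU];
};

/* Pick the list a page belongs on; unevictable wins over backing type. */
static enum toy_lru toy_page_lru(const struct toy_page *p)
{
    if (p->unevictable)
        return TOY_UNEVICTABLE;
    return p->swap_backed ? TOY_ANON : TOY_FILE;
}

/* Account a page on its list. */
static void toy_add_page(struct toy_zone *z, const struct toy_page *p)
{
    z->nr[toy_page_lru(p)]++;
}

/* Pages reclaim has to consider: the unevictable list is never scanned. */
static size_t toy_scan_candidates(const struct toy_zone *z)
{
    return z->nr[TOY_ANON] + z->nr[TOY_FILE];
}
```

In this model, adding an mlock()ed page increases nr[TOY_UNEVICTABLE] but leaves toy_scan_candidates() unchanged, which is the scanning saving the series is after.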
Changelog:
- fix the merge bugs
- leave swappiness at 60, if only to demonstrate why that value is
  wrong with the new code (hi Andrew)
- update Documentation/vm/unevictable-lru.txt until my hands hurt
  from typing
- rename try_to_unlock to try_to_munlock
- remove CONFIG_NORECLAIM_MLOCK, only use CONFIG_UNEVICTABLE_LRU
- Aunt Tillified the CONFIG_UNEVICTABLE_LRU description
- make CONFIG_NORECLAIM_LRU no longer depend on 64BIT and default y
- rename NORECLAIM to UNEVICTABLE as suggested by Andrew Morton
- fix vmscan-fix-pagecache-reclaim-referenced-bit-check.patch so
  the referenced bit set test is the same...
Hey, I did some MM testing!
On a 900MB 2-way, allocate and memset 1000MB.
mainline:
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.10s user 10.27s system 62% cpu 16.567 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.12s user 10.23s system 63% cpu 16.234 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.13s user 9.90s system 63% cpu 15.812 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.11s user 9.98s system 65% cpu 15.494 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.12s user 9.94s system 62% cpu 16.000 total

2.6.26-rc5-mm3:
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.15s user 9.81s system 52% cpu 19.117 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.14s user 9.07s system 45% cpu 20.403 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.25s user 9.63s system 34% cpu 28.533 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.15s user 9.35s system 49% cpu 19.196 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.13s user 8.79s system 49% cpu 17.993 total

Seems to have saved a little CPU but the IO patterns got worse.
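
For readers without usemem at hand, the "allocate and memset" test above amounts to something like the following sketch. This is an assumption about what such a tool does, not the usemem source (which is not shown in this thread); fill_mem is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate `mb` megabytes and write to every byte so that each page is
 * actually touched. Returns 0 on success, -1 on overflow or allocation
 * failure. Hypothetical helper, not taken from usemem.
 */
static int fill_mem(size_t mb)
{
    size_t bytes = mb << 20;
    char *buf;
    volatile char *check;
    int ok;

    if (mb == 0)
        return 0;               /* nothing to do */
    if ((bytes >> 20) != mb)
        return -1;              /* size computation overflowed */

    buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    memset(buf, 0x5a, bytes);   /* dirty every page of the buffer */

    /* Read one byte back through a volatile pointer to discourage the
     * compiler from discarding the writes. */
    check = buf;
    ok = (check[bytes - 1] == 0x5a);

    free(buf);
    return ok ? 0 : -1;
}
```

Calling fill_mem(1000) on a machine with 900MB of RAM would push the system into swap in the same spirit as the runs above; timing it with time(1) is what produces the user/system/total figures.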
qsbench, 4 processes, memory size tuned to threshold-of-swapping*1.1:
Mainline:
vmm:/home/akpm/qsbench> time ./qsbench -p 4 -m 230
./qsbench -p 4 -m 230 175.45s user 45.67s system 60% cpu 6:08.40 total

2.6.26-rc5-mm3:
vmm:/home/akpm/qsbench> time ./qsbench -p 4 -m 230
./qsbench -p 4 -m 230 178.21s user 28.49s system 99% cpu 3:27.14 total

So woot! Professional qsbench users will be pleased ;) It could have
been a fluke though - iirc qsbench is pretty unstable, especially on
the threshold.

Main thing is: it seems stable. Old LTP ran for an hour or so before I
hit the msgctl08 crash (which is a regression in current mainline).
--
Where can I get this benchmark?
I found the following URL, but it doesn't have the -m option.
I guess it is too old ;)

http://lkml.org/lkml/2001/10/9/90
--
I might have added it - I forget.
--
Thanks.
I'll test this benchmark :)
--
On Wed, 11 Jun 2008 22:34:30 -0700
In previous tests on my 16GB system, a 16GB fillmem (goes into swap)
saves enough CPU time to make up for potentially worse detection of
the working set (well, not like this program really has a working set).

I'll try this out myself on a smaller system and see if there's
Ignoring references that happen on the active list, only acting
on re-references that happen on the inactive list, gives anonymous
memory something that more closely resembles the use-once policy.

Better for some workloads, but potentially worse for others.
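
The policy described above can be sketched as a toy state machine. This is one reading of the paragraph, assuming a single referenced bit per page that is tested and cleared on each scan; the type and function names are hypothetical and do not come from the patch series.

```c
#include <stdbool.h>

/* Hypothetical two-list model of the anonymous LRU. */
enum toy_list { TOY_INACTIVE, TOY_ACTIVE };

struct toy_anon_page {
    enum toy_list list;
    bool referenced;    /* set when the page was accessed since last scan */
};

/*
 * Scan one page and return the list it ends up on.
 *
 * Active list: the referenced bit is cleared and otherwise ignored, and
 * the page is moved to the inactive list.
 * Inactive list: a page referenced again is promoted to the active list;
 * an unreferenced page stays inactive, where it is an eviction candidate.
 */
static enum toy_list toy_scan_page(struct toy_anon_page *p)
{
    bool was_referenced = p->referenced;

    p->referenced = false;              /* test and clear */

    if (p->list == TOY_ACTIVE)
        p->list = TOY_INACTIVE;         /* reference ignored here */
    else if (was_referenced)
        p->list = TOY_ACTIVE;           /* re-reference while inactive */

    return p->list;
}
```

A page that is touched once and never again drifts to the inactive list and stays there, which is the use-once behaviour; only a page touched again after being deactivated earns its way back.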
Definitely worth tweaking the system though, to get performance
Our main focus has been on stability for the past few months,
trying to get the whole series integrated.
--
All rights reversed.
--
