Re: [PATCH -mm 00/24] VM pageout scalability improvements (V12)

Previous thread: [PATCH -mm 08/24] vmscan: fix pagecache reclaim referenced bit check by Rik van Riel on Wednesday, June 11, 2008 - 2:42 pm. (1 message)

Next thread: [PATCH -mm 05/24] define page_file_cache() function by Rik van Riel on Wednesday, June 11, 2008 - 2:42 pm. (1 message)
To: <linux-kernel@...>
Cc: Andrew Morton <akpm@...>, Lee Schermerhorn <lee.schermerhorn@...>, Kosaki Motohiro <kosaki.motohiro@...>
Date: Wednesday, June 11, 2008 - 2:42 pm

Andrew, this is the promised drop-in replacement for the patch series,
with all the cleanups you requested since Friday as well as the bugfixes
I came up with this morning.

You can drop vmscan-add-some-sanity-checks-to-get_scan_ratio.patch
and the incremental changes - those are all folded in to these patches.

On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory. Not
only does it use up CPU time, but it also provokes lock contention
and can leave large systems under memory pressure in a catatonic state.

This patch series improves VM scalability by:

1) putting filesystem backed, swap backed and unevictable pages
onto their own LRUs, so the system only scans the pages that it
can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
so the number of pages that need to be scanned when the system
starts swapping is bounded by a reasonable number

3) keeping unevictable pages off the LRU completely, so the
VM does not waste CPU time scanning them. ramfs, ramdisk,
SHM_LOCKED shared memory segments and mlock()ed VMA pages
are kept on the unevictable list.
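
The idea behind points 1) and 3) can be illustrated with a small toy model. This is only a sketch, not kernel code: the names Kind, Page, classify and SplitLRU are made up for the example, and the single "locked" flag stands in for every reason a page might be unevictable (mlock()ed VMA, ramfs, ramdisk, SHM_LOCKED). The point it shows is that a scan of one evictable list never has to walk past pages that belong on another list.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FILE = "file"                # filesystem backed
    ANON = "anon"                # swap backed
    UNEVICTABLE = "unevictable"  # mlock()ed, ramfs, SHM_LOCKED, ...


@dataclass
class Page:
    pfn: int
    swap_backed: bool = False
    locked: bool = False  # hypothetical flag: any reason the page cannot be evicted


def classify(page: Page) -> Kind:
    """Pick the list a page belongs on (toy rule, not the kernel's logic)."""
    if page.locked:
        return Kind.UNEVICTABLE
    return Kind.ANON if page.swap_backed else Kind.FILE


class SplitLRU:
    """One queue per page kind; the oldest page sits at the left end."""

    def __init__(self):
        self.lists = {kind: deque() for kind in Kind}

    def add(self, page: Page) -> None:
        self.lists[classify(page)].append(page)

    def scan(self, kind: Kind, nr_to_scan: int) -> list:
        """Take up to nr_to_scan pages from the old end of one evictable list."""
        if kind is Kind.UNEVICTABLE:
            raise ValueError("unevictable pages are never scanned")
        lru = self.lists[kind]
        return [lru.popleft() for _ in range(min(nr_to_scan, len(lru)))]
```

With a single shared list, a scan looking for file pages would have to skip over every anonymous and locked page in its way; here the cost of scan() depends only on the list it was asked to look at.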

More info on the overall design can be found at:

http://linux-mm.org/PageReplacementDesign

An all-in-one patch can be found at:

http://people.redhat.com/riel/splitvm/

Changelog:
- fix the merge bugs
- leave swappiness at 60, if only to demonstrate why that value is
wrong with the new code (hi Andrew)
- update Documentation/vm/unevictable-lru.txt until my hands hurt
from typing
- rename try_to_unlock to try_to_munlock
- remove CONFIG_NORECLAIM_MLOCK, only use CONFIG_UNEVICTABLE_LRU
- Aunt Tillified the CONFIG_UNEVICTABLE_LRU description
- make CONFIG_NORECLAIM_LRU no longer depend on 64BIT and default y
- rename NORECLAIM to UNEVICTABLE as suggested by Andrew Morton
- fix vmscan-fix-pagecache-reclaim-referenced-bit-check.patch so
the referenced bit set test is the same...

To: Rik van Riel <riel@...>
Cc: <linux-kernel@...>, Lee Schermerhorn <lee.schermerhorn@...>, Kosaki Motohiro <kosaki.motohiro@...>
Date: Thursday, June 12, 2008 - 1:34 am

Hey, I did some MM testing!

On a 900MB 2-way, allocate and memset 1000MB.

mainline:

vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.10s user 10.27s system 62% cpu 16.567 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.12s user 10.23s system 63% cpu 16.234 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.13s user 9.90s system 63% cpu 15.812 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.11s user 9.98s system 65% cpu 15.494 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.12s user 9.94s system 62% cpu 16.000 total

2.6.26-rc5-mm3:

vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.15s user 9.81s system 52% cpu 19.117 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.14s user 9.07s system 45% cpu 20.403 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.25s user 9.63s system 34% cpu 28.533 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.15s user 9.35s system 49% cpu 19.196 total
vmm:/home/akpm> time usemem -m 1000
usemem -m 1000 0.13s user 8.79s system 49% cpu 17.993 total

Seems to have saved a little CPU but the IO patterns got worse.
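
As a rough check on that reading, the ten timing lines above can be averaged. The sketch below assumes the zsh-style "time" output format shown in the transcripts; the regular expression and the summarize() helper are made up for this example and are not part of usemem.

```python
import re
from statistics import mean

# Timing lines copied from the runs above.
MAINLINE = """\
usemem -m 1000 0.10s user 10.27s system 62% cpu 16.567 total
usemem -m 1000 0.12s user 10.23s system 63% cpu 16.234 total
usemem -m 1000 0.13s user 9.90s system 63% cpu 15.812 total
usemem -m 1000 0.11s user 9.98s system 65% cpu 15.494 total
usemem -m 1000 0.12s user 9.94s system 62% cpu 16.000 total
"""

MM3 = """\
usemem -m 1000 0.15s user 9.81s system 52% cpu 19.117 total
usemem -m 1000 0.14s user 9.07s system 45% cpu 20.403 total
usemem -m 1000 0.25s user 9.63s system 34% cpu 28.533 total
usemem -m 1000 0.15s user 9.35s system 49% cpu 19.196 total
usemem -m 1000 0.13s user 8.79s system 49% cpu 17.993 total
"""

LINE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system \d+% cpu (?P<total>[\d.]+) total"
)


def summarize(text):
    """Return mean system time and mean wall-clock time, in seconds."""
    rows = [LINE.search(line) for line in text.splitlines()]
    return {
        "system": mean(float(m["system"]) for m in rows),
        "total": mean(float(m["total"]) for m in rows),
    }


mainline = summarize(MAINLINE)
mm3 = summarize(MM3)
print(f"system: {mainline['system']:.2f}s -> {mm3['system']:.2f}s")
print(f"total:  {mainline['total']:.2f}s -> {mm3['total']:.2f}s")
```

Mean system time drops from about 10.06s to 9.33s, while mean wall-clock time rises from about 16.0s to 21.0s. The 28.533s run pulls the -mm mean up noticeably, so five runs per kernel is not much to go on.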

qsbench, 4 processes, memory size tuned to threshold-of-swapping*1.1:

Mainline:

vmm:/home/akpm/qsbench> time ./qsbench -p 4 -m 230
./qsbench -p 4 -m 230 175.45s user 45.67s system 60% cpu 6:08.40 total

2.6.26-rc5-mm3:

vmm:/home/akpm/qsbench> time ./qsbench -p 4 -m 230
./qsbench -p 4 -m 230 178.21s user 28.49s system 99% cpu 3:27.14 total

So woot! Professional qsbench users will be pleased ;) It could have
been a fluke though - iirc qsbench is pretty unstable, especially on
the threshold.
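
For scale, the two elapsed times can be converted to seconds and compared. This is just arithmetic on the single pair of runs above; to_seconds() is a helper written for this example.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an 'm:ss.ss' (or 'h:mm:ss.ss') elapsed string to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


mainline = to_seconds("6:08.40")  # mainline qsbench run
mm3 = to_seconds("3:27.14")       # 2.6.26-rc5-mm3 qsbench run
speedup = mainline / mm3

print(f"{mainline:.2f}s vs {mm3:.2f}s, ratio {speedup:.2f}")
```

That is 368.40s against 207.14s, a ratio of roughly 1.78, with system time going from 45.67s to 28.49s. One run per kernel says nothing about variance.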

Main thing is: it seems stable. Old LTP ran for an hour or so before I
hit the msgctl08 crash (which is a regression in current mainline).

--

To: Andrew Morton <akpm@...>
Cc: <kosaki.motohiro@...>, Rik van Riel <riel@...>, <linux-kernel@...>, Lee Schermerhorn <lee.schermerhorn@...>
Date: Monday, June 16, 2008 - 1:32 am

Where can I get this benchmark?

I found the following URL, but it doesn't have the -m option.
I guess it is too old ;)

http://lkml.org/lkml/2001/10/9/90

--

To: KOSAKI Motohiro <kosaki.motohiro@...>
Cc: Rik van Riel <riel@...>, <linux-kernel@...>, Lee Schermerhorn <lee.schermerhorn@...>
Date: Monday, June 16, 2008 - 2:20 am

I might have added it - I forget.

--

To: Andrew Morton <akpm@...>
Cc: <kosaki.motohiro@...>, Rik van Riel <riel@...>, <linux-kernel@...>, Lee Schermerhorn <lee.schermerhorn@...>
Date: Monday, June 16, 2008 - 2:22 am

Thanks.
I'll test this benchmark :)

--

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, Lee Schermerhorn <lee.schermerhorn@...>, Kosaki Motohiro <kosaki.motohiro@...>
Date: Thursday, June 12, 2008 - 9:31 am

On Wed, 11 Jun 2008 22:34:30 -0700

In previous tests on my 16GB system, a 16GB fillmem (goes into swap)
saves enough CPU time to make up for potentially worse detection of
the working set (well, not like this program really has a working set).

I'll try this out myself on a smaller system and see if there's

Ignoring references that happen on the active list, only acting
on re-references that happen on the inactive list, gives anonymous
memory something that more closely resembles the use-once policy.

Better for some workloads, but potentially worse for others.
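
A toy model may make that policy easier to picture. This is a sketch under simplifying assumptions, not the vmscan code: the UseOnceLRU class is invented for the example, and it acts on each access directly, whereas the kernel only notices references later, when it checks the referenced bit during a scan.

```python
from collections import OrderedDict


class UseOnceLRU:
    """Toy active/inactive list pair; keys are page ids, values are unused."""

    def __init__(self):
        self.inactive = OrderedDict()
        self.active = OrderedDict()

    def touch(self, page):
        if page in self.active:
            return  # references on the active list are ignored
        if page in self.inactive:
            # re-reference while on the inactive list: promote to active
            del self.inactive[page]
            self.active[page] = None
        else:
            # first use: the page starts out on the inactive list
            self.inactive[page] = None

    def evict(self):
        """Evict the oldest inactive page; return None if the list is empty."""
        if not self.inactive:
            return None
        page, _ = self.inactive.popitem(last=False)
        return page


lru = UseOnceLRU()
for page in ["a", "b", "a", "c", "a"]:
    lru.touch(page)
```

After this sequence "a" is on the active list, because it was touched again while still inactive, and "b" and "c", each used once, are the eviction candidates. A page touched a thousand times while active gains nothing over one touched twice, which is where the "potentially worse for others" comes from.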

Definitely worth tweaking the system though, to get performance

Our main focus has been on stability for the past few months,
trying to get the whole series integrated.

--
All rights reversed.
--
