2.6.24-rc1: First impressions

Previous thread: Interrupt Latency test for intel machine by Jaswinder Singh on Friday, October 26, 2007 - 7:13 am. (1 message)

Next thread: Re: Is gcc thread-unsafe? by Bart Van Assche on Friday, October 26, 2007 - 7:40 am. (22 messages)
From: Martin Knoblauch
Date: Friday, October 26, 2007 - 7:18 am

Hi,

 just to give some feedback on 2.6.24-rc1. For some time I have been tracking IO/writeback problems that hurt system responsiveness big-time. I tested Peter's stuff together with Fengguang's additions and it looked promising. Therefore I was very happy to see Peter's stuff going into 2.6.24 and waited eagerly for rc1. In short, I am impressed. This really looks good. IO throughput is great and I could not reproduce the responsiveness problems so far.

 Below are some numbers from the brute-force I/O tests that I can use to bring responsiveness down. My platform is a HP/DL380g4, dual CPUs, HT-enabled, 8 GB memory, SmartArray6i controller with 4x72GB SCSI disks as RAID5 (battery-protected writeback cache enabled) and gigabit networking (tg3). User space is 64-bit RHEL4.3.

 I am basically doing copies using "dd" with 1MB blocksize. Local filesystem is ext2 (noatime). IO scheduler is deadline, as it tends to give the best results. NFS3 server is a Sun/T2000/Solaris10. The tests are:

dd1 - copy 16 GB from /dev/zero to local FS
dd1-dir - same, but using O_DIRECT for output
dd2/dd2-dir - copy 2x7.6 GB in parallel from /dev/zero to local FS
dd3/dd3-dir - copy 3x5.2 GB in parallel from /dev/zero to local FS
net1 - copy 5.2 GB from NFS3 share to local FS
mix3 - copy 3x5.2 GB from /dev/zero to local disk and two NFS3 shares
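The thread does not show the exact command lines, so the following is only a sketch of how the dd1 and dd1-dir cases could be expressed. It assumes current GNU dd (where "oflag=direct" requests O_DIRECT on the output), and the helper name dd_command and the target path are made up for illustration.

```python
import shlex

def dd_command(dest, size_gb, direct=False, src="/dev/zero"):
    """Build a GNU dd argv list that copies size_gb gigabytes in 1 MB blocks.

    Hypothetical helper: the original post only says "dd with 1MB blocksize".
    """
    count = round(size_gb * 1024)  # number of 1 MB blocks to copy
    argv = ["dd", f"if={src}", f"of={dest}", "bs=1M", f"count={count}"]
    if direct:
        # GNU dd flag that opens the output file with O_DIRECT
        argv.append("oflag=direct")
    return argv

if __name__ == "__main__":
    # dd1 and dd1-dir; /mnt/test/dd1.out is a placeholder path
    print(shlex.join(dd_command("/mnt/test/dd1.out", 16)))
    print(shlex.join(dd_command("/mnt/test/dd1.out", 16, direct=True)))
```

The parallel cases (dd2, dd3) would start two or three such commands at once with the smaller sizes listed above.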

 I did the numbers for 2.6.19.2, 2.6.22.6 and 2.6.24-rc1. All units are MB/sec.

test       2.6.19.2    2.6.22.6    2.6.24-rc1
----------------------------------------------------------------
dd1        28          50          96
dd1-dir    88          88          86
dd2        2x16.5      2x11        2x44.5
dd2-dir    2x44        2x44        2x43
dd3        3x9.8       3x8.7       3x30
dd3-dir    3x29.5      3x29.5      3x28.5
net1       30-33       50-55       37-52
mix3       17/32       25/50       96/35 (disk/combined-network)


 Some observations:

- single threaded disk speed really went up wit ...
From: Ingo Molnar
Date: Friday, October 26, 2007 - 8:22 am

wow, really nice results! Peter does know how to make stuff fast :) Now 

Such as the rewritten reclaim (clockpro) patches:

  http://programming.kicks-ass.net/kernel-patches/page-replace/

The improve-swap-performance (swap-token) patches:

  http://programming.kicks-ass.net/kernel-patches/swap_token/

His enable-swap-over-NFS [and other complex IO transports] patches:

  http://programming.kicks-ass.net/kernel-patches/vm_deadlock/

And the concurrent pagecache patches:

  http://programming.kicks-ass.net/kernel-patches/concurrent-pagecache/

as a starter :-) I think the MM should get out of deep-feature-freeze 
mode - there's tons of room to improve :-/

	Ingo "runs and hides" Molnar
-

From: Peter Zijlstra
Date: Friday, October 26, 2007 - 8:29 am

I think riel is taking over that stuff with his split vm and policies


Will post that one again, soonish.... Esp. after Linus professed liking
to have swap over NFS.

I've been working on improving the changelogs and comments in that code.

latest code (somewhat raw, as rushed by ingo posting this) in:

Yeah, that one would be cool, but it depends on Nick getting his
lockless pagecache upstream. For those who don't know, both are in -rt
(and have been for some time) so it's not unproven code.
From: Rik van Riel
Date: Friday, October 26, 2007 - 8:49 am

On Fri, 26 Oct 2007 17:29:00 +0200

I am.  Taking every single reference to a page into account simply
won't scale to systems with 1TB of RAM.  This is why I am working
on implementing:

http://linux-mm.org/PageReplacementDesign

At the moment I only have the basic "plumbing" of the split VM
working and am fixing some bugs in that.  Expect a patch series
with that soon, so you guys can review that code and tell me
where to beat it into shape some more :)

After that I will work on the policy bits, where we can really
get performance benefits.  The patch series should be mergeable
in smaller increments, so we can take things slowly if desired.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
-

From: Andrew Morton
Date: Friday, October 26, 2007 - 12:21 pm

On Fri, 26 Oct 2007 17:22:21 +0200

Those changes seem suspiciously large to me.  I wonder if there's less
physical IO happening during the timed run, and correspondingly more

Kidding.  We merged about 265 MM patches in 2.6.24-rc1:

 482 files changed, 8071 insertions(+), 5142 deletions(-)

-

From: Ingo Molnar
Date: Friday, October 26, 2007 - 12:33 pm

so a final 'sync' should be added to the test too, and the time it takes 

impressive :)

	Ingo
-

From: Andrew Morton
Date: Friday, October 26, 2007 - 12:42 pm

On Fri, 26 Oct 2007 21:33:40 +0200

That's one way of doing it.  Or just run the test for a "long" time.  ie:
much longer than (total-memory / disk-bandwidth).  Probably the latter

A lot of that was new functionality.  That's easier to add than things
which change long-standing functionality.
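As a rough illustration of the (total-memory / disk-bandwidth) rule of thumb above, the sketch below plugs in figures reported earlier in the thread: 8 GB of memory, 96 MB/sec for dd1 on 2.6.24-rc1, and a 16 GB copy. Treating the measured dd1 rate as the disk bandwidth is an assumption made only for this estimate, and the function names are illustrative.

```python
def memory_drain_seconds(memory_gb, disk_mb_per_s):
    """Time to write a memory-sized amount of data at the given disk rate."""
    return memory_gb * 1024 / disk_mb_per_s

def run_seconds(size_gb, mb_per_s):
    """Duration of a copy of size_gb at the reported throughput."""
    return size_gb * 1024 / mb_per_s

if __name__ == "__main__":
    drain = memory_drain_seconds(8, 96)   # 8 GB RAM, 96 MB/sec (assumed disk rate)
    run = run_seconds(16, 96)             # dd1: 16 GB at 96 MB/sec
    print(f"memory/bandwidth: {drain:.0f} s, dd1 run: {run:.0f} s, "
          f"ratio: {run / drain:.1f}")
```

With these inputs the dd1 run lasts only about twice the memory/bandwidth time, so data still sitting in the page cache at the end could account for a visible share of the reported rate.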
-

From: Bill Davidsen
Date: Saturday, October 27, 2007 - 12:14 pm

Longer might be less inaccurate, but without flushing the last data you 
really don't get best accuracy, you just reduce the error. Clearly doing 
fdatasync() is best, since other i/o caused by sync() can skew the results.
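A minimal sketch of the fdatasync() approach described above: the timer stops only after the file's own data has been flushed, so cached but unwritten data does not inflate the result. This is not the test that was actually run in this thread; the function name timed_write and the sizes are illustrative, and os.fdatasync() is assumed to be available (it is on Linux).

```python
import os
import time

def timed_write(path, total_mb):
    """Write total_mb of zeros in 1 MB blocks; return MB/sec including the flush."""
    block = bytes(1024 * 1024)  # one 1 MB block of zeros
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(block)
        f.flush()                 # push Python's userspace buffer to the kernel
        os.fdatasync(f.fileno())  # flush only this file's data, unlike sync()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

if __name__ == "__main__":
    # Placeholder path and a small size; a real test would write far more.
    print(f"{timed_write('/tmp/fdatasync-test.out', 64):.1f} MB/sec")
```

Because fdatasync() applies to a single file descriptor, writeback caused by other processes does not count toward the measured time the way a global sync() would.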

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

-

From: Arjan van de Ven
Date: Friday, October 26, 2007 - 10:46 pm

On Fri, 26 Oct 2007 12:21:55 -0700

another option... this is ext2.. didn't the ext2 reservation stuff get
merged into -rc1? for ext3 that gave a 4x or so speed boost (much
better sequential allocation pattern)

(or maybe I'm just wrong)
-

From: Andrew Morton
Date: Friday, October 26, 2007 - 10:59 pm

Yes, one would expect that to make a large difference in dd2/dd2-dir and
dd3/dd3-dir - but only on SMP.  On UP there's not enough concurrency in the
fs block allocator for any damage to occur.

Reservations won't affect dd1 though, and that went faster too.
-
