Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority

Previous thread: [PATCH] CRISv10 fasttimer: Scrap INLINE and name timeval_cmp better by Jesper Nilsson on Wednesday, November 14, 2007 - 1:08 pm. (3 messages)

Next thread: [PATCH] mm: Don't allow ioremapping of ranges larger than vmalloc space by Robert Bragg on Wednesday, November 14, 2007 - 2:31 pm. (2 messages)
To: Alan D. Brunelle <Alan.Brunelle@...>
Cc: Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 1:14 pm

(cc lkml restored, with permission)

I'd consider its status to be "might be a good idea, more performance

These are *large* differences, making this a very significant patch. Much
care is needed now.

Could you expand a bit on what you're testing here? I think that in one
process you're doing a continuous copy-a-kernel-tree and in the other
process you're doing the above three things, yes?

I guess the other things we should look at are the impact on the
continuously-copy-a-kernel-tree process and also the overall IO throughput.
These things will of course be related. If the overall system-wide IO
throughput increases with the patch then we probably have a no-brainer. If
(as I suspect) the overall IO throughput is decreased then this will be a

hm, yes. Back in the days when I used to do useful things I'd do most
testing of this sort on 256MB, 128MB or even 64MB machines. So that data
would get tossed out of cache quickly so that I could use smaller working

Sure, it hasn't been ruled out. Especially as those time deltas you're
measuring are so large. We haven't seen changes in IO throughput like that
in years. We just need to work out if they're net-positive or net-negative ;)

This will end up being a pretty large hunk of work I expect.
-

To: Andrew Morton <akpm@...>
Cc: Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, November 16, 2007 - 12:25 pm

Here are the results for the latest tests, some notes:

o The machine actually has 8GiB of RAM, so the tests still may end up

o Sorry the results took so long - the updated tree size caused the
runs to take > 12 hours...

o The longer runs seemed to bring down the standard deviation a bit,
although they are still quite large.

o 10 runs per test (read large file, read a tree, overwrite large
file), with averages presented.

o 1st 4 columns (min, avg, max, std dev) refer to the average run
lengths for the tests - real time, in seconds

o The last 3 columns are extracted from iostat results over the course
of the whole run.

o The read a tree test certainly stands out - the other 2 large file
manipulations have the two kernels within a couple of percent, but the
read a tree test has Arjan's patch taking about 47%(!) longer on
average. The increased %iowait & %system time in all 3 cases is interesting.
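For readers who want to reproduce the first four columns, a minimal sketch of the per-test summary follows. The run times in it are hypothetical placeholders, not data from these runs, and the use of the sample (n-1) standard deviation is an assumption, since the notes do not say which form was used.

```python
import statistics


def summarize(run_times):
    """Return (min, avg, max, std dev) for a list of run times in seconds."""
    return (
        min(run_times),
        statistics.mean(run_times),
        max(run_times),
        statistics.stdev(run_times),  # sample std dev; an assumption
    )


# Hypothetical run times in seconds, NOT taken from the actual tests.
example_runs = [200.0, 210.0, 220.0, 230.0, 240.0]
lo, avg, hi, sd = summarize(example_runs)
print(f"min={lo:.1f} avg={avg:.1f} max={hi:.1f} stddev={sd:.1f}")
```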

Read large file:

Kernel     Min     Avg     Max  Std Dev   %user  %system  %iowait
-----------------------------------------------------------------
base :   201.6   215.1   275.5     22.8   0.26%    4.69%   33.54%
arjan:   198.0   210.3   261.5     18.5   0.33%   10.24%   54.00%

Read a tree:

Kernel     Min     Avg     Max  Std Dev   %user  %system  %iowait
-----------------------------------------------------------------
base :  3518.2  4631.3  5991.3    784.6   0.19%    3.29%   23.56%
arjan:  5731.6  6849.8  7777.4    731.6   0.32%    9.90%   52.70%

Overwrite large file:

Kernel     Min     Avg     Max  Std Dev   %user  %system  %iowait
-----------------------------------------------------------------
base :   104.2   147.7   239.5     38.4   0.02%    0.05%    1.08%
arjan:   106.2   149.7   239.2     38.4   0.25%    0.79%   14.97%
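
The relative difference between the two kernels can be worked out directly from the Avg column of the tables above. The sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices, and the numbers are copied from the tables.

```python
# Average run times in seconds, copied from the Avg column of the tables
# above as (base, arjan) pairs.
averages = {
    "Read large file": (215.1, 210.3),
    "Read a tree": (4631.3, 6849.8),
    "Overwrite large file": (147.7, 149.7),
}


def percent_change(base, patched):
    """Relative change of the patched kernel versus base, in percent."""
    return (patched - base) / base * 100.0


changes = {name: percent_change(b, p) for name, (b, p) in averages.items()}
for name, change in changes.items():
    print(f"{name:22s} {change:+6.1f}%")
```

A positive value means the patched kernel took longer. Given the large standard deviations, only the read-a-tree difference is clearly outside run-to-run noise.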

Let me know if there is anything else I can do to elaborate, or if you
have suggestions for further testing.

Alan
-

To: Alan D. Brunelle <alan.brunelle@...>
Cc: Andrew Morton <akpm@...>, Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, November 16, 2007 - 2:35 pm

Out of curiosity, what are the mount options for the freshly created
ext3 fs? In particular, are you using noatime, nodiratime?

Ray
-

To: Ray Lee <ray-lk@...>
Cc: Andrew Morton <akpm@...>, Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, November 16, 2007 - 2:39 pm

Nope, just mount. However, the tool I'm using to read the large file &
overwrite the large file does open with O_NOATIME for reads...

The tool used to read the files in the read-a-tree test is dd, and I
doubt(?) it opens with O_NOATIME...
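
For reference, this is roughly what opening a file for reading with O_NOATIME looks like. It is a sketch only, not the actual test tool: the function name is hypothetical, O_NOATIME is Linux-specific, and the fallback to a plain open is an assumption about how one might handle files the caller does not own.

```python
import os


def open_noatime(path):
    """Open path read-only, asking the kernel not to update atime.

    Falls back to a plain read-only open when O_NOATIME is unavailable
    or not permitted (the flag requires owning the file or CAP_FOWNER).
    """
    flags = os.O_RDONLY | getattr(os, "O_NOATIME", 0)
    try:
        return os.open(path, flags)
    except PermissionError:
        return os.open(path, os.O_RDONLY)
```

The returned value is a raw file descriptor, so the caller reads with os.read() and must os.close() it when done.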

Alan
-

To: Andrew Morton <akpm@...>
Cc: Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Friday, November 16, 2007 - 12:40 pm

I'm going to try and do some clean up work on the iostat CPU results -
the reason %user & %system are so low is (I think) because they also
include a lot of 0% results from the tail of the runs (as the unmount is
going on I think). I'm going to try and extract results for just the
"meat" of the runs.
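
One way to do that kind of clean-up is to drop the trailing idle samples before averaging. The sketch below assumes the iostat output has already been parsed into a list of per-interval percentages; the sample values and the zero threshold are hypothetical.

```python
def trim_idle_tail(samples, threshold=0.0):
    """Drop trailing samples whose value is at or below the threshold."""
    end = len(samples)
    while end > 0 and samples[end - 1] <= threshold:
        end -= 1
    return samples[:end]


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical %system samples: four busy intervals, then an idle tail
# such as might be seen while the unmount is in progress.
system_pct = [4.0, 6.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0]

whole_run = mean(system_pct)
meat_only = mean(trim_idle_tail(system_pct))
print(f"whole run: {whole_run:.2f}%  meat only: {meat_only:.2f}%")
```

Only the tail is trimmed, so idle intervals in the middle of a run still count toward the average.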

Alan
-

To: Andrew Morton <akpm@...>
Cc: Rik van Riel <riel@...>, <arjan@...>, Jens Axboe <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 3:24 pm

The test works like this:

1. I ensure that the device under test (DUT) is set to run the CFQ
   scheduler.
   1. It is a Fibre Channel 72GiB disk
   2. Single partition...
2. Put an Ext3 FS on the partition (mkfs.ext3 -b 4096)
3. Mount the device, and then:
   1. Put an 8GiB file on the new FS
   2. Put 3 copies of a Linux tree (w/ objs & kernel & such) onto
      the FS in separate directories
      1. Note: I'm going to do runs with 6 copies to each
         directory tree to get to about 4.2GiB per directory tree
4. Then, for each of the tests:
   1. Remount the device (purge page cache by umount & then mount)
   2. Start up a copy of 1 kernel tree to another tree (you hadn't
      specified if the copy in the background should be to a new
      area or not, so I'm just re-using the same area so we don't
      have to worry about removing the old). I keep doing the copy
      as long as the tests are going
   3. Perform the test (10 times)

The tests are:

* Linear read of a large file (8GiB)
* Tree read (foreach file in the tree, dd it to /dev/null)
* Overwrite of that large file: was doing 256KiB random&direct
read/writes, will go down to 4KiB read/writes as that is more
realistic I'd guess
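
As an illustration of the tree-read test, the sketch below walks a directory and reads every regular file to the end, discarding the data, which is roughly what running dd to /dev/null on each file does. It is not the script used for these runs; the function name and the 1MiB block size are assumptions.

```python
import os
import time


def read_tree(root, block_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    files = 0
    total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(block_size)
                    if not chunk:
                        break
                    total += len(chunk)
            files += 1
    return files, total, time.monotonic() - start
```

Calling read_tree() on one of the copied kernel trees right after a remount would give one timing sample. The page cache must be cold for the number to mean anything.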

I'm going to try and get the comparisons done by tomorrow, the results
should be very different due to the changes noted above (going to 4.2GiB
trees instead of 700MiB, going to 4K instead of 256K read/writes). This
may cause the runs to be much longer, and then I won't get it done as

I'll add in continuous 'iostat' grabs, and present data on that too - it
would contain both generic IO information as well as grabbing
I'll get results out when I have the changes made to the script
(outlined above), and the runs done.

Alan
-

To: Andrew Morton <akpm@...>
Cc: Rik van Riel <riel@...>, <arjan@...>, Jens Axboe <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 3:56 pm

Oh, and the runs were done in single-user mode...

Alan
-

To: Alan D. Brunelle <Alan.Brunelle@...>
Cc: Andrew Morton <akpm@...>, Rik van Riel <riel@...>, Jens Axboe <jens.axboe@...>, <mingo@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 3:50 pm

On Wed, 14 Nov 2007 14:24:03 -0500

ok so the obvious meta-question is this: what does it mean that your
test takes longer or shorter? I can see IO "capacity" (trying to avoid
the use of "bandwidth" here) moves from the foreground test to the
background test (and/or other way around)... but if that was starved
previously... it could or could not be the right result.
What do you think the measure of "it's at least not worse" is? Is there
any way to get to that concept? (and then looking at if that got met is
the second step ;( )

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
-

To: Andrew Morton <akpm@...>
Cc: Alan D. Brunelle <Alan.Brunelle@...>, Rik van Riel <riel@...>, <arjan@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 1:18 pm

and the numbers suggest it's mostly a severe performance regression.
That's not what i have expected - ho hum. Apologies for my earlier
"please merge it already!" whining.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, Alan D. Brunelle <Alan.Brunelle@...>, Rik van Riel <riel@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 1:51 pm

On Wed, 14 Nov 2007 18:18:05 +0100

that's.. not automatic; it depends on what the right thing is :-(
What changes for sure is who gets to do IO. Some of the
tests we ran internally (we didn't publish yet because we saw REALLY
large variations for most of them even without any patch) show for
example that "dbench" got slower. But.. dbench gets slower when things
get more fair, and faster when things get unfair. What conclusion you
draw out of that is a whole different matter and depends on exactly
what the test is doing, and what is the right thing for the OS to do in
terms of who gets to do the IO.

This makes the patch more tricky than the one line change suggests, and
this is also why I haven't published a ton of data yet; it's hard to
get useful tests for this (and the variation of the 2.6.23+ kernels
makes it even harder to do anything meaningful ;-( )

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
-

To: Arjan van de Ven <arjan@...>
Cc: Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, Rik van Riel <riel@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 3:43 pm

I'd also like to point out here that the run-to-run deviation was indeed
quite large for both the unpatched- and patched-kernels, I'll report on
that information with the next set of results...

Alan
-

To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Alan D. Brunelle <Alan.Brunelle@...>, Rik van Riel <riel@...>, <jens.axboe@...>, <linux-kernel@...>
Date: Wednesday, November 14, 2007 - 2:55 pm

yeah, i'd agree to not put too much faith into dbench results.

Ingo
-
