Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority

Previous thread: [PATCH] tgafb: Remove a redundant non-mono test in mono imageblit by Maciej W. Rozycki on Monday, October 15, 2007 - 1:42 pm. (1 message)

Next thread: none
To: <linux-kernel@...>
Cc: <akpm@...>, <jens.axboe@...>, <mingo@...>
Date: Monday, October 15, 2007 - 1:46 pm

Subject: Give kjournald a IOPRIO_CLASS_RT io priority
From: Arjan van de Ven <arjan@linux.intel.com>

With latencytop, I noticed that the (in memory) atime updates during a
kernel build had latencies of 600 msec or longer; this is obviously not so
nice behavior. Other EXT3 journal related operations had similar or even
longer latencies.

Digging into this a bit more, it appears to be an interaction between EXT3
and CFQ in that CFQ tries to be fair to everyone, including kjournald.
However, in reality, kjournald is "special" in that it does a lot of journal
work and effectively this leads to a twisted kind of "mass priority
inversion" type of behavior.

The good news is that CFQ already has the infrastructure to make certain
processes special... JBD just wasn't using that quite yet.

The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break
this priority inversion behavior. With this patch, the latencies for atime
updates (and similar operation) go down by a factor of 3x to 4x !

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c
--- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200
+++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200
@@ -35,6 +35,7 @@
#include <linux/kthread.h>
#include <linux/poison.h>
#include <linux/proc_fs.h>
+#include <linux/ioprio.h>

#include <asm/uaccess.h>
#include <asm/page.h>
@@ -131,6 +132,8 @@ static int kjournald(void *arg)
printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n",
journal->j_commit_interval / HZ);

+ current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4;
+
/*
* And now, wait forever for commit wakeup events.
*/
-

To: Arjan van de Ven <arjan@...>
Cc: <linux-kernel@...>, <jens.axboe@...>, <mingo@...>
Date: Monday, October 15, 2007 - 2:47 pm

On Mon, 15 Oct 2007 10:46:47 -0700

Seems a pretty fundamental change which could do with some careful
benchmarking, methinks.

See, your patch amounts to "do more seeks to improve one test case".

Might be worth a code comment?
-

To: Andrew Morton <akpm@...>
Cc: Arjan van de Ven <arjan@...>, <linux-kernel@...>, <jens.axboe@...>, <mingo@...>
Date: Monday, October 15, 2007 - 4:13 pm

On Mon, 15 Oct 2007 11:47:38 -0700

FWIW, I have marked the kjournald processes on my system
realtime with "rtprio -c 1 `pidof kjournald`" and the
usual desktop stalls that plague my system have not yet

The big problem I have seen here is that processes end up
waiting on kjournald to do something, and kjournald is
waiting due to the IO scheduler.

This can cause a lot of low IO (high IO priority) processes
to indirectly get stuck behind a few high IO (low priority)
processes.

Since you have been involved a lot with ext3 development,
which kinds of workloads do you think will show a performance
degradation with Arjan's patch? What should I test?

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
-

To: Rik van Riel <riel@...>
Cc: <arjan@...>, <linux-kernel@...>, <jens.axboe@...>, <mingo@...>
Date: Monday, October 15, 2007 - 5:12 pm

On Mon, 15 Oct 2007 16:13:15 -0400

Gee. Expect the unexpected ;)

One problem might be when kjournald is doing its ordered-mode data
writeback at the start of commit. That writeout will now be
higher-priority and might impact other tasks which are doing synchronous
file overwrites (ie: no commits) or O_DIRECT reads or writes or just plain
old reads.

If the aggregate number of seeks over the long term is the same as before
then of course the overall throughput should be the same, in which case the
impact might only be upon latency. However if for some reason the total
number of seeks is increased then there will be throughput impacts as well.

So as a starting point I guess one could set up a
copy-a-kernel-tree-in-a-loop running in the background and then see what
impact that has upon a large-linear-read, upon a
read-a-different-kernel-tree and upon some database-style multithreaded
O_DIRECT reader/overwriter.
-

To: Andrew Morton <akpm@...>
Cc: Arjan van de Ven <arjan@...>, <linux-kernel@...>, <mingo@...>
Date: Monday, October 15, 2007 - 3:28 pm

Yes, completely agree! I think Arjans patch makes a heap of sense, but

It should not be merged as-is, instead I'll provide a function to do
this. It should also set current->io_context->ioprio_changed. Since no
IO has been done yet at this point it doesn't matter. But we should cut
a piece of set_task_ioprio() out and provide that as a kernel helper for
this sort of thing.

Even just writing it as:

current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | IOPRIO_NORM;

would be more readable. Or perhaps this would suffice, given the above
restriction that IO hasn't been done yet:

current->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, IOPRIO_NORM);

--
Jens Axboe

-

To: Jens Axboe <jens.axboe@...>
Cc: Andrew Morton <akpm@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>
Date: Monday, October 22, 2007 - 5:10 am

Arjan gave the relevant hard numbers:

| With latencytop, I noticed that the (in memory) atime updates during a
| kernel build had latencies of 600 msec or longer [...]
|
| With this patch, the latencies for atime updates (and similar
| operation) go down by a factor of 3x to 4x !

atime update latencies went down by a factor of 3x-4x ...

but what bothers me even more is the large picture. Linux's development
is still fundamentally skewed towards bandwidth (which goes up with
hardware advances anyway), while the focus on latencies is very lacking
(which users do care about much more and which usually does _not_
improve with improved hardware), so i cannot see why we shouldnt apply
this. Reminds me of the illogical, almost superstitious resistence
against the relatime patch. (which is not in 2.6.24 mind you - killed
for good)

if bandwidth hurts anywhere, it will be pointed out and fixed, we've got
like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth
problems. But _finally_ we now have desktop latency tools, hard numbers
and patches that fix them, but what do we do ... we put up extra
roadblocks??

so lets just goddamn apply this _trivial_ patch. This isnt an intrusive
1000 line rewrite that is hard to revert. If it causes any bandwidth
problems, it will be just as trivial to undo. If we do anything else we
just stiffle the still young and very much under-represented "lets fix
latencies that bothers people" movement. If anything we need _positive_
discrimination for latency related fixes (which treatment this fix does
not need at all - all it needs is _equal_ footing with the countless
bandwidth patches that go into the kernel all the time), otherwise it
will never take off and become as healthy as bandwidth optimizations.
Ok?

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Jens Axboe <jens.axboe@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>
Date: Monday, October 22, 2007 - 5:23 am

I think the situation is that we've asked for some additional
what-can-be-hurt-by-this testing.

Yes, we could sling it out there and wait for the reports. But often
that's a pretty painful process and regressions can be discovered too late
for us to do anything about them.

-

To: Andrew Morton <akpm@...>
Cc: Jens Axboe <jens.axboe@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>
Date: Monday, October 22, 2007 - 5:40 am

reverting this oneliner is trivial. Finding bandwidth problems and
tracking them down to this oneliner change is relatively easy too.
Finding latency problems and fixing them is _not_ trivial.

Boot up a Linux desktop and start OOo or firefox, and measure the time
it takes to start the app up. 10-20 seconds on a top-of-the-line
quad-core 3.2 GHz system - which is a shame. Same box can do in excess
of 1GB/sec block IO. Yes, one could blame the apps but in reality most
of the blame is mostly on the kernel side. We do not make bloat and
latency suckage apparent enough to user-space (due to lack of
intelligent instrumentation), we make latencies hard to fix, we have an
acceptance bias towards bandwidth fixes (because they are easier to
measure and justify) - and that's all what it takes to let such a
situation get out of control.

and i can bring up the scheduler as an example. CFS broke the bandwidth
performance of one particular app and it took only a few days to get it
back under control. But it was months to get good latency behavior out
of the scheduler. And that is with the help of excellent scheduler
instrumentation. In the IO space the latency situation is much, much
worse. Really.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Jens Axboe <jens.axboe@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>
Date: Monday, October 22, 2007 - 5:49 am

None of which is an argument for simply not bothering to do a bit more
developer testing before merging.

To: Andrew Morton <akpm@...>
Cc: Jens Axboe <jens.axboe@...>, Arjan van de Ven <arjan@...>, <linux-kernel@...>
Date: Monday, October 22, 2007 - 5:27 am

i mean this one:

http://people.redhat.com/mingo/relatime-patches/improve-relatime.patch

this actually makes relatime practical.

Ingo
-

Previous thread: [PATCH] tgafb: Remove a redundant non-mono test in mono imageblit by Maciej W. Rozycki on Monday, October 15, 2007 - 1:42 pm. (1 message)

Next thread: none