From: Arjan van de Ven <arjan@...>
Subject: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 15, 1:46 pm 2007
Subject: Give kjournald a IOPRIO_CLASS_RT io priority
From: Arjan van de Ven
With latencytop, I noticed that the (in memory) atime updates during a
kernel build had latencies of 600 msec or longer; this is obviously not so
nice behavior. Other EXT3 journal related operations had similar or even
longer latencies.
Digging into this a bit more, it appears to be an interaction between EXT3
and CFQ in that CFQ tries to be fair to everyone, including kjournald.
However, in reality, kjournald is "special" in that it does a lot of journal
work and effectively this leads to a twisted kind of "mass priority
inversion" type of behavior.
The good news is that CFQ already has the infrastructure to make certain
processes special... JBD just wasn't using that quite yet.
The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break
this priority inversion behavior. With this patch, the latencies for atime
updates (and similar operation) go down by a factor of 3x to 4x !
Signed-off-by: Arjan van de Ven
diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c
--- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200
+++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200
@@ -35,6 +35,7 @@
#include
#include
#include
+#include
#include
#include
@@ -131,6 +132,8 @@ static int kjournald(void *arg)
printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n",
journal->j_commit_interval / HZ);
+ current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4;
+
/*
* And now, wait forever for commit wakeup events.
*/
-
From: Andrew Morton <akpm@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 15, 2:47 pm 2007
On Mon, 15 Oct 2007 10:46:47 -0700
Arjan van de Ven wrote:
>
> Subject: Give kjournald a IOPRIO_CLASS_RT io priority
> From: Arjan van de Ven
>
> With latencytop, I noticed that the (in memory) atime updates during a
> kernel build had latencies of 600 msec or longer; this is obviously not so
> nice behavior. Other EXT3 journal related operations had similar or even
> longer latencies.
>
> Digging into this a bit more, it appears to be an interaction between EXT3
> and CFQ in that CFQ tries to be fair to everyone, including kjournald.
> However, in reality, kjournald is "special" in that it does a lot of journal
> work and effectively this leads to a twisted kind of "mass priority
> inversion" type of behavior.
>
> The good news is that CFQ already has the infrastructure to make certain
> processes special... JBD just wasn't using that quite yet.
>
> The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break
> this priority inversion behavior. With this patch, the latencies for atime
> updates (and similar operation) go down by a factor of 3x to 4x !
>
Seems a pretty fundamental change which could do with some careful
benchmarking, methinks.
See, your patch amounts to "do more seeks to improve one test case".
Surely other testcases will worsen. What are they?
> Signed-off-by: Arjan van de Ven
>
>
> diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c
> --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200
> +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200
> @@ -35,6 +35,7 @@
> #include
> #include
> #include
> +#include
>
> #include
> #include
> @@ -131,6 +132,8 @@ static int kjournald(void *arg)
> printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n",
> journal->j_commit_interval / HZ);
>
> + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4;
> +
Might be worth a code comment?
-
From: Jens Axboe <jens.axboe@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 15, 3:28 pm 2007
On Mon, Oct 15 2007, Andrew Morton wrote:
> On Mon, 15 Oct 2007 10:46:47 -0700
> Arjan van de Ven wrote:
>
> >
> > Subject: Give kjournald a IOPRIO_CLASS_RT io priority
> > From: Arjan van de Ven
> >
> > With latencytop, I noticed that the (in memory) atime updates during a
> > kernel build had latencies of 600 msec or longer; this is obviously not so
> > nice behavior. Other EXT3 journal related operations had similar or even
> > longer latencies.
> >
> > Digging into this a bit more, it appears to be an interaction between EXT3
> > and CFQ in that CFQ tries to be fair to everyone, including kjournald.
> > However, in reality, kjournald is "special" in that it does a lot of journal
> > work and effectively this leads to a twisted kind of "mass priority
> > inversion" type of behavior.
> >
> > The good news is that CFQ already has the infrastructure to make certain
> > processes special... JBD just wasn't using that quite yet.
> >
> > The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break
> > this priority inversion behavior. With this patch, the latencies for atime
> > updates (and similar operation) go down by a factor of 3x to 4x !
> >
>
> Seems a pretty fundamental change which could do with some careful
> benchmarking, methinks.
>
> See, your patch amounts to "do more seeks to improve one test case".
> Surely other testcases will worsen. What are they?
Yes, completely agree! I think Arjans patch makes a heap of sense, but
some numbers would be great to see.
> > Signed-off-by: Arjan van de Ven
> >
> >
> > diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c
> > --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200
> > +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200
> > @@ -35,6 +35,7 @@
> > #include
> > #include
> > #include
> > +#include
> >
> > #include
> > #include
> > @@ -131,6 +132,8 @@ static int kjournald(void *arg)
> > printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n",
> > journal->j_commit_interval / HZ);
> >
> > + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4;
> > +
>
> Might be worth a code comment?
It should not be merged as-is, instead I'll provide a function to do
this. It should also set current->io_context->ioprio_changed. Since no
IO has been done yet at this point it doesn't matter. But we should cut
a piece of set_task_ioprio() out and provide that as a kernel helper for
this sort of thing.
Even just writing it as:
current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | IOPRIO_NORM;
would be more readable. Or perhaps this would suffice, given the above
restriction that IO hasn't been done yet:
current->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, IOPRIO_NORM);
--
Jens Axboe
-
From: Ingo Molnar <mingo@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 22, 5:10 am 2007
* Jens Axboe wrote:
> > Seems a pretty fundamental change which could do with some careful
> > benchmarking, methinks.
> >
> > See, your patch amounts to "do more seeks to improve one test case".
> > Surely other testcases will worsen. What are they?
>
> Yes, completely agree! I think Arjans patch makes a heap of sense, but
> some numbers would be great to see.
Arjan gave the relevant hard numbers:
| With latencytop, I noticed that the (in memory) atime updates during a
| kernel build had latencies of 600 msec or longer [...]
|
| With this patch, the latencies for atime updates (and similar
| operation) go down by a factor of 3x to 4x !
atime update latencies went down by a factor of 3x-4x ...
but what bothers me even more is the large picture. Linux's development
is still fundamentally skewed towards bandwidth (which goes up with
hardware advances anyway), while the focus on latencies is very lacking
(which users do care about much more and which usually does _not_
improve with improved hardware), so i cannot see why we shouldnt apply
this. Reminds me of the illogical, almost superstitious resistence
against the relatime patch. (which is not in 2.6.24 mind you - killed
for good)
if bandwidth hurts anywhere, it will be pointed out and fixed, we've got
like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth
problems. But _finally_ we now have desktop latency tools, hard numbers
and patches that fix them, but what do we do ... we put up extra
roadblocks??
so lets just goddamn apply this _trivial_ patch. This isnt an intrusive
1000 line rewrite that is hard to revert. If it causes any bandwidth
problems, it will be just as trivial to undo. If we do anything else we
just stiffle the still young and very much under-represented "lets fix
latencies that bothers people" movement. If anything we need _positive_
discrimination for latency related fixes (which treatment this fix does
not need at all - all it needs is _equal_ footing with the countless
bandwidth patches that go into the kernel all the time), otherwise it
will never take off and become as healthy as bandwidth optimizations.
Ok?
Ingo
-
From: Andrew Morton <akpm@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 22, 5:23 am 2007
On Mon, 22 Oct 2007 11:10:57 +0200 Ingo Molnar wrote:
>
> * Jens Axboe wrote:
>
> > > Seems a pretty fundamental change which could do with some careful
> > > benchmarking, methinks.
> > >
> > > See, your patch amounts to "do more seeks to improve one test case".
> > > Surely other testcases will worsen. What are they?
> >
> > Yes, completely agree! I think Arjans patch makes a heap of sense, but
> > some numbers would be great to see.
>
> Arjan gave the relevant hard numbers:
>
> | With latencytop, I noticed that the (in memory) atime updates during a
> | kernel build had latencies of 600 msec or longer [...]
> |
> | With this patch, the latencies for atime updates (and similar
> | operation) go down by a factor of 3x to 4x !
>
> atime update latencies went down by a factor of 3x-4x ...
>
> but what bothers me even more is the large picture. Linux's development
> is still fundamentally skewed towards bandwidth (which goes up with
> hardware advances anyway), while the focus on latencies is very lacking
> (which users do care about much more and which usually does _not_
> improve with improved hardware), so i cannot see why we shouldnt apply
> this. Reminds me of the illogical, almost superstitious resistence
> against the relatime patch. (which is not in 2.6.24 mind you - killed
> for good)
Try `mount -o relatime' and prepare to be surprised ;)
> if bandwidth hurts anywhere, it will be pointed out and fixed, we've got
> like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth
> problems. But _finally_ we now have desktop latency tools, hard numbers
> and patches that fix them, but what do we do ... we put up extra
> roadblocks??
>
> so lets just goddamn apply this _trivial_ patch. This isnt an intrusive
> 1000 line rewrite that is hard to revert. If it causes any bandwidth
> problems, it will be just as trivial to undo. If we do anything else we
> just stiffle the still young and very much under-represented "lets fix
> latencies that bothers people" movement. If anything we need _positive_
> discrimination for latency related fixes (which treatment this fix does
> not need at all - all it needs is _equal_ footing with the countless
> bandwidth patches that go into the kernel all the time), otherwise it
> will never take off and become as healthy as bandwidth optimizations.
> Ok?
>
I think the situation is that we've asked for some additional
what-can-be-hurt-by-this testing.
Yes, we could sling it out there and wait for the reports. But often
that's a pretty painful process and regressions can be discovered too late
for us to do anything about them.
-
From: Ingo Molnar <mingo@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 22, 5:27 am 2007
* Andrew Morton wrote:
> > but what bothers me even more is the large picture. Linux's
> > development is still fundamentally skewed towards bandwidth (which
> > goes up with hardware advances anyway), while the focus on latencies
> > is very lacking (which users do care about much more and which
> > usually does _not_ improve with improved hardware), so i cannot see
> > why we shouldnt apply this. Reminds me of the illogical, almost
> > superstitious resistence against the relatime patch. (which is not
> > in 2.6.24 mind you - killed for good)
>
> Try `mount -o relatime' and prepare to be surprised ;)
i mean this one:
http://people.redhat.com/mingo/relatime-patches/improve-relatime.patch
this actually makes relatime practical.
Ingo
-
From: Ingo Molnar <mingo@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 22, 5:40 am 2007
* Andrew Morton wrote:
> > so lets just goddamn apply this _trivial_ patch. This isnt an
> > intrusive 1000 line rewrite that is hard to revert. If it causes any
> > bandwidth problems, it will be just as trivial to undo. If we do
> > anything else we just stiffle the still young and very much
> > under-represented "lets fix latencies that bothers people" movement.
> > If anything we need _positive_ discrimination for latency related
> > fixes (which treatment this fix does not need at all - all it needs
> > is _equal_ footing with the countless bandwidth patches that go into
> > the kernel all the time), otherwise it will never take off and
> > become as healthy as bandwidth optimizations. Ok?
>
> I think the situation is that we've asked for some additional
> what-can-be-hurt-by-this testing.
>
> Yes, we could sling it out there and wait for the reports. But often
> that's a pretty painful process and regressions can be discovered too
> late for us to do anything about them.
reverting this oneliner is trivial. Finding bandwidth problems and
tracking them down to this oneliner change is relatively easy too.
Finding latency problems and fixing them is _not_ trivial.
Boot up a Linux desktop and start OOo or firefox, and measure the time
it takes to start the app up. 10-20 seconds on a top-of-the-line
quad-core 3.2 GHz system - which is a shame. Same box can do in excess
of 1GB/sec block IO. Yes, one could blame the apps but in reality most
of the blame is mostly on the kernel side. We do not make bloat and
latency suckage apparent enough to user-space (due to lack of
intelligent instrumentation), we make latencies hard to fix, we have an
acceptance bias towards bandwidth fixes (because they are easier to
measure and justify) - and that's all what it takes to let such a
situation get out of control.
and i can bring up the scheduler as an example. CFS broke the bandwidth
performance of one particular app and it took only a few days to get it
back under control. But it was months to get good latency behavior out
of the scheduler. And that is with the help of excellent scheduler
instrumentation. In the IO space the latency situation is much, much
worse. Really.
Ingo
-
From: Andrew Morton <akpm@...>
Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 22, 5:49 am 2007
None of which is an argument for simply not bothering to do a bit more
developer testing before merging.
Where is latencytop?
Can't find latencytop with google. Where is it?
It's probably a homegrown
It's probably a homegrown tool by Ingo Molnar. I am confident that he plans to release it sooner or later; but for now, it seems that it's not public yet.
According to a LWN comment of his, he wants to make tracking latency problems as easy as Intel did with PowerTop. (PowerTop is considered a great success of kernel instrumentation)
Waiting for the "official"
Waiting for the "official" latencytop you can always use a simple script like this:
#!/bin/bash function Field() { if [ ! -f "$2" ]; then return 1 fi grep -m1 -e "$1" "$2" | cut -d: -f2- | tr -d '[[:space:]]' } function print_stat() { for task in /proc/*/task/*; do name=$(Field "^Name" $task/status) pid=$(basename $task) value=$(Field "^$1" $task/sched) if [ -z "$pid" ] || [ -z "$value" ]; then continue fi echo -e "$value\t$pid\t$name" done } echo "### BLOCK MAX ###" print_stat "se.block_max" | sort -k1nr | head -n10 echo echo "### WAIT MAX ###" print_stat "se.wait_max" | sort -k1nr | head -n10atime updates should be left
atime updates should be left aside from the main filesystem traffic. Kept in separate queue or so. And not cause write-outs as often as other inode modifications cause it. 'relatime' is a step in the research, good but not final.