"With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies," Arjan van de Ven reported [1], describing a "mass priority inversion" caused by, "an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is 'special' in that it does a lot of journal work". Finally, he offered a tiny patch to resolve the issue, "the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !"
Andrew Morton took a cautious stance, "seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to 'do more seeks to improve one test case'. Surely other testcases will worsen. What are they?" CFQ author Jens Axboe agreed, "It should not be merged as-is, instead I'll provide a function to do this." Ingo Molnar wasn't convinced, "atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this." He added, "if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??" Andrew calmy replied, "I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them."
From: Arjan van de Ven <arjan@...> Subject: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 15, 1:46 pm 2007 Subject: Give kjournald a IOPRIO_CLASS_RT io priority From: Arjan van de Ven <arjan@linux.intel.com> With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies. Digging into this a bit more, it appears to be an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is "special" in that it does a lot of journal work and effectively this leads to a twisted kind of "mass priority inversion" type of behavior. The good news is that CFQ already has the infrastructure to make certain processes special... JBD just wasn't using that quite yet. The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x ! Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200 +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200 @@ -35,6 +35,7 @@ #include <linux/kthread.h> #include <linux/poison.h> #include <linux/proc_fs.h> +#include <linux/ioprio.h> #include <asm/uaccess.h> #include <asm/page.h> @@ -131,6 +132,8 @@ static int kjournald(void *arg) printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n", journal->j_commit_interval / HZ); + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4; + /* * And now, wait forever for commit wakeup events. */ -
From: Andrew Morton <akpm@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 15, 2:47 pm 2007 On Mon, 15 Oct 2007 10:46:47 -0700 Arjan van de Ven <arjan@infradead.org> wrote: > > Subject: Give kjournald a IOPRIO_CLASS_RT io priority > From: Arjan van de Ven <arjan@linux.intel.com> > > With latencytop, I noticed that the (in memory) atime updates during a > kernel build had latencies of 600 msec or longer; this is obviously not so > nice behavior. Other EXT3 journal related operations had similar or even > longer latencies. > > Digging into this a bit more, it appears to be an interaction between EXT3 > and CFQ in that CFQ tries to be fair to everyone, including kjournald. > However, in reality, kjournald is "special" in that it does a lot of journal > work and effectively this leads to a twisted kind of "mass priority > inversion" type of behavior. > > The good news is that CFQ already has the infrastructure to make certain > processes special... JBD just wasn't using that quite yet. > > The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break > this priority inversion behavior. With this patch, the latencies for atime > updates (and similar operation) go down by a factor of 3x to 4x ! > Seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to "do more seeks to improve one test case". Surely other testcases will worsen. What are they? > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > > > diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c > --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200 > +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200 > @@ -35,6 +35,7 @@ > #include <linux/kthread.h> > #include <linux/poison.h> > #include <linux/proc_fs.h> > +#include <linux/ioprio.h> > > #include <asm/uaccess.h> > #include <asm/page.h> > @@ -131,6 +132,8 @@ static int kjournald(void *arg) > printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n", > journal->j_commit_interval / HZ); > > + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4; > + Might be worth a code comment? -
From: Jens Axboe <jens.axboe@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 15, 3:28 pm 2007 On Mon, Oct 15 2007, Andrew Morton wrote: > On Mon, 15 Oct 2007 10:46:47 -0700 > Arjan van de Ven <arjan@infradead.org> wrote: > > > > > Subject: Give kjournald a IOPRIO_CLASS_RT io priority > > From: Arjan van de Ven <arjan@linux.intel.com> > > > > With latencytop, I noticed that the (in memory) atime updates during a > > kernel build had latencies of 600 msec or longer; this is obviously not so > > nice behavior. Other EXT3 journal related operations had similar or even > > longer latencies. > > > > Digging into this a bit more, it appears to be an interaction between EXT3 > > and CFQ in that CFQ tries to be fair to everyone, including kjournald. > > However, in reality, kjournald is "special" in that it does a lot of journal > > work and effectively this leads to a twisted kind of "mass priority > > inversion" type of behavior. > > > > The good news is that CFQ already has the infrastructure to make certain > > processes special... JBD just wasn't using that quite yet. > > > > The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break > > this priority inversion behavior. With this patch, the latencies for atime > > updates (and similar operation) go down by a factor of 3x to 4x ! > > > > Seems a pretty fundamental change which could do with some careful > benchmarking, methinks. > > See, your patch amounts to "do more seeks to improve one test case". > Surely other testcases will worsen. What are they? Yes, completely agree! I think Arjans patch makes a heap of sense, but some numbers would be great to see. > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > > > > > > diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c > > --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200 > > +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200 > > @@ -35,6 +35,7 @@ > > #include <linux/kthread.h> > > #include <linux/poison.h> > > #include <linux/proc_fs.h> > > +#include <linux/ioprio.h> > > > > #include <asm/uaccess.h> > > #include <asm/page.h> > > @@ -131,6 +132,8 @@ static int kjournald(void *arg) > > printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n", > > journal->j_commit_interval / HZ); > > > > + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4; > > + > > Might be worth a code comment? It should not be merged as-is, instead I'll provide a function to do this. It should also set current->io_context->ioprio_changed. Since no IO has been done yet at this point it doesn't matter. But we should cut a piece of set_task_ioprio() out and provide that as a kernel helper for this sort of thing. Even just writing it as: current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | IOPRIO_NORM; would be more readable. Or perhaps this would suffice, given the above restriction that IO hasn't been done yet: current->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, IOPRIO_NORM); -- Jens Axboe -
From: Ingo Molnar <mingo@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 22, 5:10 am 2007 * Jens Axboe <jens.axboe@oracle.com> wrote: > > Seems a pretty fundamental change which could do with some careful > > benchmarking, methinks. > > > > See, your patch amounts to "do more seeks to improve one test case". > > Surely other testcases will worsen. What are they? > > Yes, completely agree! I think Arjans patch makes a heap of sense, but > some numbers would be great to see. Arjan gave the relevant hard numbers: | With latencytop, I noticed that the (in memory) atime updates during a | kernel build had latencies of 600 msec or longer [...] | | With this patch, the latencies for atime updates (and similar | operation) go down by a factor of 3x to 4x ! atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this. Reminds me of the illogical, almost superstitious resistence against the relatime patch. (which is not in 2.6.24 mind you - killed for good) if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks?? so lets just goddamn apply this _trivial_ patch. This isnt an intrusive 1000 line rewrite that is hard to revert. If it causes any bandwidth problems, it will be just as trivial to undo. If we do anything else we just stiffle the still young and very much under-represented "lets fix latencies that bothers people" movement. If anything we need _positive_ discrimination for latency related fixes (which treatment this fix does not need at all - all it needs is _equal_ footing with the countless bandwidth patches that go into the kernel all the time), otherwise it will never take off and become as healthy as bandwidth optimizations. Ok? Ingo -
From: Andrew Morton <akpm@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 22, 5:23 am 2007 On Mon, 22 Oct 2007 11:10:57 +0200 Ingo Molnar <mingo@elte.hu> wrote: > > * Jens Axboe <jens.axboe@oracle.com> wrote: > > > > Seems a pretty fundamental change which could do with some careful > > > benchmarking, methinks. > > > > > > See, your patch amounts to "do more seeks to improve one test case". > > > Surely other testcases will worsen. What are they? > > > > Yes, completely agree! I think Arjans patch makes a heap of sense, but > > some numbers would be great to see. > > Arjan gave the relevant hard numbers: > > | With latencytop, I noticed that the (in memory) atime updates during a > | kernel build had latencies of 600 msec or longer [...] > | > | With this patch, the latencies for atime updates (and similar > | operation) go down by a factor of 3x to 4x ! > > atime update latencies went down by a factor of 3x-4x ... > > but what bothers me even more is the large picture. Linux's development > is still fundamentally skewed towards bandwidth (which goes up with > hardware advances anyway), while the focus on latencies is very lacking > (which users do care about much more and which usually does _not_ > improve with improved hardware), so i cannot see why we shouldnt apply > this. Reminds me of the illogical, almost superstitious resistence > against the relatime patch. (which is not in 2.6.24 mind you - killed > for good) Try `mount -o relatime' and prepare to be surprised ;) > if bandwidth hurts anywhere, it will be pointed out and fixed, we've got > like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth > problems. But _finally_ we now have desktop latency tools, hard numbers > and patches that fix them, but what do we do ... we put up extra > roadblocks?? > > so lets just goddamn apply this _trivial_ patch. This isnt an intrusive > 1000 line rewrite that is hard to revert. If it causes any bandwidth > problems, it will be just as trivial to undo. If we do anything else we > just stiffle the still young and very much under-represented "lets fix > latencies that bothers people" movement. If anything we need _positive_ > discrimination for latency related fixes (which treatment this fix does > not need at all - all it needs is _equal_ footing with the countless > bandwidth patches that go into the kernel all the time), otherwise it > will never take off and become as healthy as bandwidth optimizations. > Ok? > I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them. -
From: Ingo Molnar <mingo@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [1]Date: Oct 22, 5:27 am 2007 * Andrew Morton <akpm@linux-foundation.org> wrote: > > but what bothers me even more is the large picture. Linux's > > development is still fundamentally skewed towards bandwidth (which > > goes up with hardware advances anyway), while the focus on latencies > > is very lacking (which users do care about much more and which > > usually does _not_ improve with improved hardware), so i cannot see > > why we shouldnt apply this. Reminds me of the illogical, almost > > superstitious resistence against the relatime patch. (which is not > > in 2.6.24 mind you - killed for good) > > Try `mount -o relatime' and prepare to be surprised ;) i mean this one: http://people.redhat.com/mingo/relatime-patches/improve-relatime.patch [2] this actually makes relatime practical. Ingo -
From: Ingo Molnar <mingo@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [2]Date: Oct 22, 5:40 am 2007 * Andrew Morton <akpm@linux-foundation.org> wrote: > > so lets just goddamn apply this _trivial_ patch. This isnt an > > intrusive 1000 line rewrite that is hard to revert. If it causes any > > bandwidth problems, it will be just as trivial to undo. If we do > > anything else we just stiffle the still young and very much > > under-represented "lets fix latencies that bothers people" movement. > > If anything we need _positive_ discrimination for latency related > > fixes (which treatment this fix does not need at all - all it needs > > is _equal_ footing with the countless bandwidth patches that go into > > the kernel all the time), otherwise it will never take off and > > become as healthy as bandwidth optimizations. Ok? > > I think the situation is that we've asked for some additional > what-can-be-hurt-by-this testing. > > Yes, we could sling it out there and wait for the reports. But often > that's a pretty painful process and regressions can be discovered too > late for us to do anything about them. reverting this oneliner is trivial. Finding bandwidth problems and tracking them down to this oneliner change is relatively easy too. Finding latency problems and fixing them is _not_ trivial. Boot up a Linux desktop and start OOo or firefox, and measure the time it takes to start the app up. 10-20 seconds on a top-of-the-line quad-core 3.2 GHz system - which is a shame. Same box can do in excess of 1GB/sec block IO. Yes, one could blame the apps but in reality most of the blame is mostly on the kernel side. We do not make bloat and latency suckage apparent enough to user-space (due to lack of intelligent instrumentation), we make latencies hard to fix, we have an acceptance bias towards bandwidth fixes (because they are easier to measure and justify) - and that's all what it takes to let such a situation get out of control. and i can bring up the scheduler as an example. CFS broke the bandwidth performance of one particular app and it took only a few days to get it back under control. But it was months to get good latency behavior out of the scheduler. And that is with the help of excellent scheduler instrumentation. In the IO space the latency situation is much, much worse. Really. Ingo -
From: Andrew Morton <akpm@...> Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority [2]Date: Oct 22, 5:49 am 2007 None of which is an argument for simply not bothering to do a bit more developer testing before merging.
Related links:
- Archive of above thread [2]