On 2.6.24, top started showing 100% iowait on one CPU when a UML
instance was running (but completely idle). I've traced it to this
commit. Reverting it cures the problem.
Miklos
commit 41d10da3717409de33d5441f2f6d8f072ab3fbb6
Author: Jeff Moyer <jmoyer@redhat.com>
Date: Tue Oct 16 23:27:20 2007 -0700
aio: account I/O wait time properly
Some months back I proposed changing the schedule() call in
read_events to an io_schedule():
http://osdir.com/ml/linux.kernel.aio.general/2006-10/msg00024.html
This was rejected as there are AIO operations that do not initiate
disk I/O. I've had another look at the problem, and the only AIO
operation that will not initiate disk I/O is IOCB_CMD_NOOP. However,
this command isn't even wired up!
Given that it doesn't work, and hasn't for *years*, I'm going to
suggest again that we do proper I/O accounting when using AIO.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/fs/aio.c b/fs/aio.c
index ea2e198..d02f43b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -303,7 +303,7 @@ static void wait_for_all_aios(struct kioctx *ctx)
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
while (ctx->reqs_active) {
spin_unlock_irq(&ctx->ctx_lock);
- schedule();
+ io_schedule();
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
spin_lock_irq(&ctx->ctx_lock);
}
@@ -323,7 +323,7 @@ ssize_t fastcall wait_on_sync_kiocb(struct kiocb *iocb)
set_current_state(TASK_UNINTERRUPTIBLE);
if (!iocb->ki_users)
break;
- schedule();
+ io_schedule();
}
__set_current_state(TASK_RUNNING);
return iocb->ki_user_data;
@@ -1170,7 +1170,7 @@ retry:
ret = 0;
...Hi, The UML code sits in io_getevents waiting for an event to be submitted and completed. This is a case I completely overlooked when putting together the referenced patch. We could check ctx->reqs_active before scheduling to determine whether or not we are waiting for I/O, but this would require taking the context lock in order to be accurate. Given that the test would be only for the sake of book keeping, it might be okay to do it outside of the lock. Zach, what are your thoughts on this? -Jeff [1] commit 41d10da3717409de33d5441f2f6d8f072ab3fbb6 --
I agree that it'd be OK to test it outside the lock, though we'll want some commentary: /* Try to only show up in io wait if there are ops in flight */ if (ctx->reqs_active) io_schedule(); else schedule(); It's cheap, safe, and accurate the overwhelming majority of the time :). We only need it in read_events(). The other two io_schedule() calls are only reached to wait on pending reqs specifically. It still won't make sense for iocbs which aren't performing IO, but I guess that's one more bridge to cross when we come to it. Do you want to throw this tiny patch together and submit it? - z --
Sure. I tested this on a system that I used to reproduce the problem,
and it shows I/O Wait back at normal levels on an idle system with 1
uml guest running.
Andrew, do you need a separate email with a [patch] heading or will
this do?
Cheers,
Jeff
Only account I/O wait time in read_events if there are active
requests.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
diff --git a/fs/aio.c b/fs/aio.c
index f12db41..9dec7d2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1161,7 +1161,12 @@ retry:
ret = 0;
if (to.timed_out) /* Only check after read evt */
break;
- io_schedule();
+ /* Try to only show up in io wait if there are ops
+ * in flight */
+ if (ctx->reqs_active)
+ io_schedule();
+ else
+ schedule();
if (signal_pending(tsk)) {
ret = -EINTR;
break;
--
