From Leonid Ananiev The patch fixes oops because of extra IO control block freeing. IO is retried if page could not be invalidated. Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com> The patch fixes oops "Kernel BUG at fs/aio.c:509" archived at http://lkml.org/lkml/2007/2/8/337 The number of IO control block (iocb)users < 0. If page could not be invalidated by invalidate_inode_pages2_range() than EIO is returned. It happens if journal_try_to_free_buffers() fails to drop_buffers(). This EIO is not differing from real IO competition with EIO and aio_complete() is called. Later aio_complete() is called from dio_bio_end_aio() and frees iocb once more. After patch generic_file_direct_IO() sets PgBusy flag in iocb if page could not be invalidated. iocb is retried after IO competition. The process is waked up if IO is SYNC else iocb is kicked. The lines “if (ret != -EIOCBRETRY)” is deleted because nothing set to EIOCBRETRY. Next patches 2/3 and 3/3 do cleanup only. The patch is applied and tested with aio-stress on 2.6.20 and 2.6.21-rc2 diff -uprNX linux-2.6.21-rc2/Documentation/dontdiff linux-2.6.21-rc2/fs/aio.c linux-2.6.21-rc2-aio31/fs/aio.c --- linux-2.6.21-rc2/fs/aio.c 2007-03-05 00:29:58.000000000 -0800 +++ linux-2.6.21-rc2-aio31/fs/aio.c 2007-03-05 00:38:11.000000000 -0800 @@ -723,37 +723,12 @@ static ssize_t aio_run_iocb(struct kiocb ret = retry(iocb); current->io_wait = NULL; - if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) { + if (!kiocbIsPgBusy(iocb) && ret != -EIOCBQUEUED) { BUG_ON(!list_empty(&iocb->ki_wait.task_list)); aio_complete(iocb, ret, 0); } out: spin_lock_irq(&ctx->ctx_lock); - - if (-EIOCBRETRY == ret) { - /* - * OK, now that we are done with this iteration - * and know that there is more left to go, - * this is where we let go so that a subsequent - * "kick" can start the next iteration - */ - - /* will make __queue_kicked_iocb succeed from here on ...
On Mon, 05 Mar 2007 17:23:33 +0300 Your email client performs space-stuffing, which makes things harder on the receiving end. http://mbligh.org/linuxdocs/Email/Clients/Thunderbird might -
(I'm going to read this as "This EIO is misinterpreted as real IO This analysis is correct. Nothing can clobber -EIOCBQUEUED as it is returned up from fs/direct-io.c to fs/aio.c. This -EIO from invalidation is one of the two places that currently break this rule. The fix I had hoped for, invalidating down in fs/direct-io.c before returning -EIOCBQUEUED, doesn't work because it ends up getting the ordering between the journal lock and the page lock backwards. True, but this is gratuitously cruel to external users of - EIOCBRETRY. Silently breaking them doesn't sound like a great plan. If we really decide to remove EIOCBRETRY support we'd get rid of all the retry infrastructure and remove the EIOCBRETRY errno so their builds failed. That's a separate issue that shouldn't be confused with this EIOCBQUEUED clobbering. Just removing some pieces of the There are two problems behind this bug: - ext3_releasepage() returns failure if it races with kjournald holding a reference while it waits for a transaction to commit. - generic_file_direct_IO() causes an oops if it clobbers -EIOCBQUEUED with the return code from invalidate_inode_pages2_range()..- >releasepage(). This fix makes the incorrect assertion that *any* failure from invalidate_inode_pages2_range(), which might not have anything to do with this race you're currently seeing, is transitory and should trigger a retry. That's wrong, it should be returning the error. Now, getting ext3_releasepage() to not fail if this race hits to begin with is another story. Chris has some ideas about reworking the page laundering helper to make that more reliable. - z -
ZB>If we really decide to remove EIOCBRETRY support we'd get rid of all ZB>the retry infrastructure and remove the EIOCBRETRY errno so their ZB>builds failed. Originally EIOCBRETRY was used in fs/read_write.c for vector IO. And EIOCBRETRY was deleted from it after. Now EIOCBRETRY is used in drivers/usb/gadget/inode.c only. And the patch 2/3 proposes to set iocb flag instead of EIOCBRETRY errno in inode.c. Agree that the name kiocbSetPgBusy() could be better chosen if kiocbSetPgBusy() is just instead for vector IO. But why other than DIO developers must continue using EIOCBRETRY? Thay could use the same way as fs/read_write.c uses? ZB> I was waiting for them to arrive before responding. Sorry. I've listed "linux-kernel@vger.kernel.org; stern@rowland.harvard.edu" in a single (that is incorrect) thunderbird 'To' line; I have got response from Stern; and thought that mailing was OK. ZB>- ext3_releasepage() returns failure if it races with kjournald ZB>holding a reference while it waits for a transaction to commit. The patch proposes iterative synchronization way to solve this problem. ZB>- generic_file_direct_IO() causes an oops if it clobbers -EIOCBQUEUED ZB>with the return code from invalidate_inode_pages2_range()..- ZB>releasepage(). After patching invalidate_inode_pages2_range() does not change return value. ZB>This fix makes the incorrect assertion that *any* failure from ZB>invalidate_inode_pages2_range(), which might not have anything to do ZB>with this race you're currently seeing, is transitory and should ZB>trigger a retry. That's wrong, it should be returning the error. Just patch makes to retry iocb if page is transitory busy for any reason. If busy -> do retry IO operation later. We will get correct errno if mapping was deleted or retry success if page is dirty or committed. Leonid -
