Re: [PATCH 1/3] aio: fix oops because of extra IO control block freeing.

Previous thread: Re: [lm-sensors] Could the k8temp driver be interfering with ACPI? by Bodo Eggert on Monday, March 5, 2007 - 6:56 am. (6 messages)

Next thread: [PATCH 0/2] add reporting of SVM flags to /proc/cpuinfo by Joerg Roedel on Monday, March 5, 2007 - 7:36 am. (8 messages)
From: Leonid Ananiev
Date: Monday, March 5, 2007 - 7:23 am

From Leonid Ananiev

The patch fixes oops because of extra IO control block freeing.
IO is retried if page could not be invalidated.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>

The patch fixes oops "Kernel BUG at fs/aio.c:509" archived at 
http://lkml.org/lkml/2007/2/8/337
The number of IO control block (iocb)users < 0.
If page could not be invalidated by invalidate_inode_pages2_range()
than EIO is returned. It happens if journal_try_to_free_buffers() fails 
to drop_buffers().
This EIO is not differing from real IO competition with EIO and 
aio_complete() is called.
Later aio_complete() is called from dio_bio_end_aio() and frees iocb 
once more.

After patch generic_file_direct_IO() sets PgBusy flag in iocb
if page could not be invalidated. iocb is retried after IO competition.
The process is waked up if IO is SYNC else iocb is kicked.
The lines “if (ret != -EIOCBRETRY)” is deleted because
nothing set to EIOCBRETRY.

Next patches 2/3 and 3/3 do cleanup only.
The patch is applied and tested with aio-stress on 2.6.20 and 2.6.21-rc2
diff -uprNX linux-2.6.21-rc2/Documentation/dontdiff 
linux-2.6.21-rc2/fs/aio.c linux-2.6.21-rc2-aio31/fs/aio.c
--- linux-2.6.21-rc2/fs/aio.c	2007-03-05 00:29:58.000000000 -0800
+++ linux-2.6.21-rc2-aio31/fs/aio.c	2007-03-05 00:38:11.000000000 -0800
@@ -723,37 +723,12 @@ static ssize_t aio_run_iocb(struct kiocb
  	ret = retry(iocb);
  	current->io_wait = NULL;

-	if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
+	if (!kiocbIsPgBusy(iocb) && ret != -EIOCBQUEUED) {
  		BUG_ON(!list_empty(&iocb->ki_wait.task_list));
  		aio_complete(iocb, ret, 0);
  	}
  out:
  	spin_lock_irq(&ctx->ctx_lock);
-
-	if (-EIOCBRETRY == ret) {
-		/*
-		 * OK, now that we are done with this iteration
-		 * and know that there is more left to go,
-		 * this is where we let go so that a subsequent
-		 * "kick" can start the next iteration
-		 */
-
-		/* will make __queue_kicked_iocb succeed from here on ...
From: Andrew Morton
Date: Wednesday, March 7, 2007 - 3:14 pm

On Mon, 05 Mar 2007 17:23:33 +0300



Your email client performs space-stuffing, which makes things harder on the
receiving end.  http://mbligh.org/linuxdocs/Email/Clients/Thunderbird might
-

From: Zach Brown
Date: Wednesday, March 7, 2007 - 3:48 pm

(I'm going to read this as "This EIO is misinterpreted as real IO  

This analysis is correct.  Nothing can clobber -EIOCBQUEUED as it is  
returned up from fs/direct-io.c to fs/aio.c.  This -EIO from  
invalidation is one of the two places that currently break this rule.

The fix I had hoped for, invalidating down in fs/direct-io.c before  
returning -EIOCBQUEUED, doesn't work because it ends up getting the  
ordering between the journal lock and the page lock backwards.   

True, but this is gratuitously cruel to external users of - 
EIOCBRETRY.  Silently breaking them doesn't sound like a great plan.   
If we really decide to remove EIOCBRETRY support we'd get rid of all  
the retry infrastructure and remove the EIOCBRETRY errno so their  
builds failed.  That's a separate issue that shouldn't be confused  
with this EIOCBQUEUED clobbering.  Just removing some pieces of the  


There are two problems behind this bug:

- ext3_releasepage() returns failure if it races with kjournald  
holding a reference while it waits for a transaction to commit.

- generic_file_direct_IO() causes an oops if it clobbers -EIOCBQUEUED  
with the return code from invalidate_inode_pages2_range()..- 
 >releasepage().

This fix makes the incorrect assertion that *any* failure from  
invalidate_inode_pages2_range(), which might not have anything to do  
with this race you're currently seeing, is transitory and should  
trigger a retry.  That's wrong, it should be returning the error.

Now, getting ext3_releasepage() to not fail if this race hits to  
begin with is another story.  Chris has some ideas about reworking  
the page laundering helper to make that more reliable.

- z
-

From: Ananiev, Leonid I
Date: Thursday, March 8, 2007 - 1:19 pm

ZB>If we really decide to remove EIOCBRETRY support we'd get rid of all
ZB>the retry infrastructure and remove the EIOCBRETRY errno so their
ZB>builds failed.

Originally EIOCBRETRY was used in fs/read_write.c for vector IO.
And EIOCBRETRY was deleted from it after.
Now EIOCBRETRY is used in drivers/usb/gadget/inode.c only. And the patch
2/3 proposes to set iocb flag instead of EIOCBRETRY errno in inode.c.
Agree that the name kiocbSetPgBusy() could be better chosen if
kiocbSetPgBusy() is just instead for vector IO. But why other than DIO
developers must continue using EIOCBRETRY? Thay could use the same way
as fs/read_write.c uses?

ZB> I was waiting for them to arrive before responding.
Sorry. I've listed "linux-kernel@vger.kernel.org;
stern@rowland.harvard.edu" in a single (that is incorrect) thunderbird
'To' line; I have got response from Stern; and thought that mailing was
OK.

ZB>- ext3_releasepage() returns failure if it races with kjournald
ZB>holding a reference while it waits for a transaction to commit.

The patch proposes iterative synchronization way to solve this problem.

ZB>- generic_file_direct_IO() causes an oops if it clobbers -EIOCBQUEUED
ZB>with the return code from invalidate_inode_pages2_range()..-
ZB>releasepage().

After patching invalidate_inode_pages2_range() does not change return
value.

ZB>This fix makes the incorrect assertion that *any* failure from
ZB>invalidate_inode_pages2_range(), which might not have anything to do
ZB>with this race you're currently seeing, is transitory and should
ZB>trigger a retry.  That's wrong, it should be returning the error.

Just patch makes to retry iocb if page is transitory busy for any
reason.
If busy -> do retry IO operation later.
We will get correct errno if mapping was deleted or retry success if
page is dirty or committed.

Leonid
-

Previous thread: Re: [lm-sensors] Could the k8temp driver be interfering with ACPI? by Bodo Eggert on Monday, March 5, 2007 - 6:56 am. (6 messages)

Next thread: [PATCH 0/2] add reporting of SVM flags to /proc/cpuinfo by Joerg Roedel on Monday, March 5, 2007 - 7:36 am. (8 messages)