Re: [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Hello,

On Fri 09-05-08 15:27:52, Mingming Cao wrote:
  Hmm, you are right. It is not enough to just wait in ext3_launder_page()
because we don't have a transaction for direct_IO started yet. But if we
actually released buffers from the page there, it should be fine.

  Not to me, really. The scheme for buffer references you are trying to
impose is awkward to say the least. First, it is completely
counter-intuitive (at least to me ;), second, it is impractical as well.
For example in your scheme, you have no sensible way of locking ordered
data mode buffer - you cannot just do: get the reference and do
lock_buffer() because that violates your requirements.  The only reasonable
way you could do that is to lock the page to make sure buffer won't go away
from you - but you cannot currently do that in journal commit code because
of lock ordering. So the only way I can see which is left is: get some jbd
spin lock to serialize with journal_try_to_free_buffers(), get the buffer
reference, try to lock buffer, if it fails, drop everything and restart.
And this is IMO no-go...
  And BTW even if you fix such races, I think you'll still have races like:
CPU1:                                      CPU2:
  filemap_write_and_wait()
                                           dirty a page
                                           msync() (dirties buffers)
  invalidate_inode_page2_range() -> -EIO

  The code could historically always return EIO when mixing buffered and
unbuffered accesses and the question is, under which circumstances is this
acceptable?  I agree that the current state when if you do "buffered write,
DIO write" in sequence and you'll possibly get EIO is bad and we should fix
it. But I'm not sure we should fix the EIO return under all possible
circumstances at all costs...

  ^^^ Here you can see what I wrote above. Basically you just busy-loop
wait for buffer lock. You should at least put schedule() there so that you
don't lockup the CPU but it's ugly anyway.

  ^^^ And here you add a place where we are not guaranteed to make any
progress... If someone intensively spins on that buffer, commit code could
cycle here forever (or at least for quite a long time).


									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[RFC] JBD ordered mode rewrite, Jan Kara, (Thu Mar 6, 10:42 am)
Re: [RFC] JBD ordered mode rewrite, Josef Bacik, (Thu Mar 6, 12:05 pm)
Re: [RFC] JBD ordered mode rewrite, Andrew Morton, (Thu Mar 6, 4:53 pm)
Re: [RFC] JBD ordered mode rewrite, Mark Fasheh, (Thu Mar 6, 6:34 pm)
Re: [RFC] JBD ordered mode rewrite, Mingming Cao, (Fri Mar 7, 3:55 am)
Re: [RFC] JBD ordered mode rewrite, Andreas Dilger, (Fri Mar 7, 4:52 pm)
Re: [RFC] JBD ordered mode rewrite, Mingming Cao, (Fri Mar 7, 5:08 pm)
Re: [RFC] JBD ordered mode rewrite, Christoph Hellwig, (Sat Mar 8, 5:14 am)
Re: [RFC] JBD ordered mode rewrite, Jan Kara, (Mon Mar 10, 9:30 am)
Re: [RFC] JBD ordered mode rewrite, Jan Kara, (Mon Mar 10, 10:38 am)
Re: [RFC] JBD ordered mode rewrite, Jan Kara, (Mon Mar 10, 11:00 am)
Re: [RFC] JBD ordered mode rewrite, Jan Kara, (Mon Mar 10, 11:29 am)
Re: [RFC] JBD ordered mode rewrite, Jan Kara, (Mon Mar 10, 12:54 pm)
Re: [RFC] JBD ordered mode rewrite, Andreas Dilger, (Mon Mar 10, 2:37 pm)
Possible race between direct IO and JBD?, Mingming Cao, (Fri Apr 25, 4:38 pm)
Re: Possible race between direct IO and JBD?, Andrew Morton, (Sat Apr 26, 3:41 am)
Re: Possible race between direct IO and JBD?, Jan Kara, (Mon Apr 28, 5:26 am)
Re: Possible race between direct IO and JBD?, Badari Pulavarty, (Mon Apr 28, 10:11 am)
Re: Possible race between direct IO and JBD?, Jan Kara, (Mon Apr 28, 11:09 am)
Re: Possible race between direct IO and JBD?, Mingming Cao, (Mon Apr 28, 12:09 pm)
Re: Possible race between direct IO and JBD?, Jan Kara, (Tue Apr 29, 5:43 am)
Re: Possible race between direct IO and JBD?, Mingming Cao, (Tue Apr 29, 10:49 am)
Re: [PATCH] jbd_commit_transaction() races with journal_tr ..., Jan Kara, (Mon May 12, 8:54 am)
Re: [PATCH] Fix DIO EIO error caused by race between jbd_c ..., Badari Pulavarty, (Fri May 16, 10:12 am)
Re: [PATCH] Fix DIO EIO error caused by race between jbd_c ..., Badari Pulavarty, (Fri May 16, 10:17 am)
Re: [PATCH][take 5] JBD2: Fix race between free buffer and ..., Aneesh Kumar K.V, (Thu May 29, 11:24 pm)