Re: Mount ext3 with barrier=1 doesn't send real barrier bio?

Previous thread: [PATCH] mm: mminit_loglevel cannot be __meminitdata anymore by Marcin Slusarz on Friday, August 15, 2008 - 2:02 pm. (2 messages)

Next thread: [BUG] __GFP_THISNODE is not always honored by Adam Litke on Friday, August 15, 2008 - 3:01 pm. (12 messages)
From: Milan Broz
Date: Friday, August 15, 2008 - 2:31 pm

Hi,

I run some barrier tests over device-mapper (which currently doesn't
support barrier bio at all) and even if I set barrier=1 in ext3 mount,
there is never any bio with barrier flag... (in 2.6.27-rc)

How is the barrier=1 flag supposed to work in ext3 (JBD) now?

See:
If you specify barrier=1, JFS_BARRIER flag is set in ext3_init_journal_params
	journal->j_flags |= JFS_BARRIER;

Now, journal_write_commit_record is called and this happens:

	if (journal->j_flags & JFS_BARRIER) {
		set_buffer_ordered(bh);
		barrier_done = 1;
	}
	ret = sync_dirty_buffer(bh);

	if (barrier_done)
		clear_buffer_ordered(bh);

	if (ret == -EOPNOTSUPP && barrier_done) {
	...

From this code I expect that EOPNOTSUPP is returned if barrier is not
supported (yes, that exactly does device-mapper now without barrier patches).

But it *never* happens because:

sync_dirty_buffer always calls 
	submit_bh(WRITE_SYNC, bh)

and in submit_bh is this test:

	if (buffer_ordered(bh) && (rw == WRITE))
		rw = WRITE_BARRIER;

but there is rw == WRITE_SYNC, not WRITE !

So the barrier flag for bio is never set and normal sync write
is performed.

Why it isn't done like in attached patch? Is it intentional or it is bug?

I think it was caused by change in this commit:

commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb
Author: Jens Axboe <jens.axboe@oracle.com>
Date:   Tue Jul 1 09:07:34 2008 +0200

    Properly notify block layer of sync writes

Milan
--

Set BIO_RW_BARRIER flag even for submit_bh sync write request.

Signed-off-by: Milan Broz <mbroz@redhat.com>
---
 fs/buffer.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2926,16 +2926,16 @@ int submit_bh(int rw, struct buffer_head * bh)
 	BUG_ON(!buffer_mapped(bh));
 	BUG_ON(!bh->b_end_io);
 
-	if (buffer_ordered(bh) && (rw == WRITE))
-		rw = WRITE_BARRIER;
-
 	/*
 	 * Only clear out a write error when rewriting, should this
 	 * include WRITE_SYNC as well?
 	 ...
From: Eric Sandeen
Date: Wednesday, August 20, 2008 - 4:38 pm

Milan, you're right.  Ric saw this same strange behavior when doing some
benchmarking with and without barriers; Chris noticed the change in
submit_bh; I was about to write up a similar patch to what you've sent
already.  Jens, does Milan's fix look good to you?

Incidentally, I ran Ric's test on ext3 on a sata drive:

# fs_mark -d /mnt/test -n 1600 -t 1 -s 20480

cfq:
                        files/s
           2.6.25   2.6.26.2  2.6.26.2+patch
barrier=0    169      127       126
barrier=1     33      126        33

noop:
                        files/s
           2.6.25   2.6.26.2  2.6.26.2+patch
barrier=0    191      184       185
barrier=1     33      180        33

deadline:
                        files/s
           2.6.25   2.6.26.2  2.6.26.2+patch
barrier=0    181      182       185
barrier=1     33      185        33

anticipatory:
                        files/s
           2.6.25   2.6.26.2  2.6.26.2+patch
barrier=0    187      133       132
barrier=1     34      134        33


--

From: Jens Axboe
Date: Wednesday, August 20, 2008 - 10:26 pm

Yep looks good, thanks a lot Milan! I'll send in the patch, unless I'm
badly mistaken we need it for 2.6.26-stable as well.

-- 
Jens Axboe

--

From: Ric Wheeler
Date: Thursday, August 21, 2008 - 3:43 am

I think we definitely need it there as well, thanks!

Ric

--

From: OGAWA Hirofumi
Date: Thursday, August 21, 2008 - 3:23 pm

This should be ((rw & RW_MASK) == WRITE) too?  Anyway, this seems change
behavior of submit_bh(WRITE_BARRIER) (maybe reiserfs only), it wouldn't
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
--

From: Jens Axboe
Date: Thursday, August 21, 2008 - 11:38 pm

Yes, I believe the simpler and more correct fix is:

diff --git a/fs/buffer.c b/fs/buffer.c
index 38653e3..16b2263 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2926,14 +2926,13 @@ int submit_bh(int rw, struct buffer_head * bh)
 	BUG_ON(!buffer_mapped(bh));
 	BUG_ON(!bh->b_end_io);
 
-	if (buffer_ordered(bh) && (rw == WRITE))
+	if (buffer_ordered(bh) && (rw & WRITE))
 		rw = WRITE_BARRIER;
 
 	/*
-	 * Only clear out a write error when rewriting, should this
-	 * include WRITE_SYNC as well?
+	 * Only clear out a write error when rewriting
 	 */
-	if (test_set_buffer_req(bh) && (rw == WRITE || rw == WRITE_BARRIER))
+	if (test_set_buffer_req(bh) && (rw & WRITE))
 		clear_buffer_write_io_error(bh);
 
 	/*

-- 
Jens Axboe

--

From: OGAWA Hirofumi
Date: Friday, August 22, 2008 - 12:45 am

I see. But, umm..., this means WRITE_SYNC with barrier was deprecated?
Or typo?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
--

From: Jens Axboe
Date: Friday, August 22, 2008 - 12:58 am

It was supposed to read rw |= WRITE_BARRIER :-)

-- 
Jens Axboe

--

Previous thread: [PATCH] mm: mminit_loglevel cannot be __meminitdata anymore by Marcin Slusarz on Friday, August 15, 2008 - 2:02 pm. (2 messages)

Next thread: [BUG] __GFP_THISNODE is not always honored by Adam Litke on Friday, August 15, 2008 - 3:01 pm. (12 messages)