Re: PATCH 3/6 - direct-io: do not merge logically non-contiguous requests

Previous thread: [PATCH 4/5] ARM: config U300 PL180 PL011 PL022 for DMA v9 by Linus Walleij on Friday, August 6, 2010 - 4:50 am. (1 message)

Next thread: Hello, newbie wants help. by dingshaoheng on Friday, August 6, 2010 - 5:16 am. (2 messages)
From: Christoph Hellwig
Date: Friday, August 6, 2010 - 5:03 am

Something is deeply wrong here.  Raw block device access has a 1:1
mapping between logical and physical block numbers.  They really should
never be non-contiguous.
--

From: Christian Ehrhardt
Date: Saturday, August 7, 2010 - 12:31 am

At least I did nothing I know about to break it :-)

As I mentioned just iozone using direct I/O (-I flag of iozone then 
using O_DIRECT for the file) on a ext2 file-system.
The file system was coming clean out of mkfs the file was written with 
iozone one step before the traced read run.

The only uncommon thing here might be the block device, which is a scsi 
disk on our SAN servers (I'm running on s390) - so the driver in charge 
is zfcp (drivers/s390/scsi/).
I could use dasd (drivers/s390/block) disks as well, but I have no 
blktrace of them yet - what I already know is that they show a similar 
cost increase. On monday I should be able to get machine resources to 
verify that both disk types are affected.

Let me know if I can do anything else on my system to shed some light on 
the matter.



-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
--

From: Josef Bacik
Date: Tuesday, August 10, 2010 - 6:55 pm

So above we have this

        loff_t cur_offset = dio->block_in_file << dio->blkbits;
        loff_t bio_next_offset = dio->logical_offset_in_bio +
                dio->bio->bi_size;


block_in_file is the logical offset of the current page we are working on.
logical_offset_in_bio is the logical offset of the first page in the bio, plus
bi_size gives us the logical offset that would come next for a contiguous page,
so if cur_offset != bio_next_offset then the range in the bio and the current
page we have are not right next to eachother, so it works just fine.  It's a
little tricky, but

dio->block_in_file - logical offset of current page
dio->logical_offset_in_bio - logical offset of first page added to the bio

So say blocksize of 4k, we do dio to 12k, the first time around
dio->block_in_file is 0, we set dio->cur_page, and move on to the next page, and
bio->block_in_file is set to 1.  We find that dio->cur_page is set, so we do
dio_send_cur_page().  Since !dio->bio we create a new bio, and set
dio->logical_offset_in_bio to 0, since thats the offset of dio->cur_page.  Then
we setup the next cur_page as the page for logical block 1, and
dio->block_in_file gets bumped to 2.  We map the next block and come into
dio_send_cur_page() again.  At this point cur_offset would be 8192...and shit I
just realized what was wrong.  If you change

loff_t cur_offset = dio->block_in_file << dio->blkbits;

to

loff_t cur_offset = dio->cur_page_fs_offset << dio->blkbits;

That should fix the problem.  Sorry guys, I screwed that up.  I'll look at this
again tomorrow after I've had my 2 hours of sleep and see if this all still
makes sense, but I think the above should fixe the performance thing.  As for
the dio->boundary thing, dio_bio_submit() sets dio->boundary to 0, so the same
bio won't be submitted twice.  Thanks,

Josef
--

From: Jeff Moyer
Date: Wednesday, August 11, 2010 - 6:27 am

Sorry, I wasn't very clear in my description, but you figured it out.
;-)  Of course, cur_page_fs_offset is already in bytes, so that left

While I don't doubt that you are right, I will sleep better at night if
we do an else if.  (To be fair, this ambiguity was not introduced by
you).

I've tested this patch, added printk's and watched blktrace to verify
that we don't split up I/Os.  So long as no one objects, I'll post this
for inclusion in a new thread.

Thanks for looking into it, Josef.

Cheers,
Jeff

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7600aac..445901c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -632,7 +632,7 @@ static int dio_send_cur_page(struct dio *dio)
 	int ret = 0;
 
 	if (dio->bio) {
-		loff_t cur_offset = dio->block_in_file << dio->blkbits;
+		loff_t cur_offset = dio->cur_page_fs_offset;
 		loff_t bio_next_offset = dio->logical_offset_in_bio +
 			dio->bio->bi_size;
 
@@ -657,7 +657,7 @@ static int dio_send_cur_page(struct dio *dio)
 		 * Submit now if the underlying fs is about to perform a
 		 * metadata read
 		 */
-		if (dio->boundary)
+		else if (dio->boundary)
 			dio_bio_submit(dio);
 	}
 
--

From: Josef Bacik
Date: Wednesday, August 11, 2010 - 7:08 am

No problem, thanks for making me look at it again.  You can add

Acked-by: Josef Bacik <josef@redhat.com>

Thanks,

Josef 
--

Previous thread: [PATCH 4/5] ARM: config U300 PL180 PL011 PL022 for DMA v9 by Linus Walleij on Friday, August 6, 2010 - 4:50 am. (1 message)

Next thread: Hello, newbie wants help. by dingshaoheng on Friday, August 6, 2010 - 5:16 am. (2 messages)