Re: Correct behavior on O_DIRECT sparse file writes

Previous thread: [PATCH 00/31] Remove iget() and read_inode() [try #4] by David Howells on Friday, October 12, 2007 - 2:07 am. (34 messages)

Next thread: Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs by Laurent Riffard on Sunday, October 14, 2007 - 3:34 pm. (9 messages)
From: Chris Mason
Date: Friday, October 12, 2007 - 1:39 pm

Hello everyone,

The test below creates a sparse file and then fills a hole with
O_DIRECT.  As far as I can tell from reading generic_osync_inode, the
filesystem metadata is only forced to disk if i_size changes during the
file write.  I've tested ext3, xfs and reiserfs and they all skip the
commit when filling holes.

I would argue that filling holes via O_DIRECT is supposed to commit the
metadata required to find those file blocks later.  At least on ext3,
O_SYNC does force a commit when filling holes (I haven't tested the others).

So, is the current behavior a bug or a feature?

dd if=/dev/zero of=foo bs=1M seek=1 count=1 oflag=direct
dd if=/dev/urandom of=foo bs=4k count=1 conv=notrunc oflag=direct

hexdump foo | head -n 2
0000000 62b1 ea2d 73e8 c64f f5ef 1af5 dd09 8ccd
0000010 75ec 9581 e0ea ae9b e28f b76d a700 4d5b

reboot -nf

(after reboot)

hexdump foo
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000
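
A rough way to tell a lost block mapping apart from a block that was written with zeros is to compare the file's apparent size with its allocated bytes before and after the reboot. The sketch below assumes GNU stat; alloc_report is a hypothetical helper name, and the allocated figure may include filesystem-specific overhead such as indirect blocks, so treat it as approximate.

```shell
#!/bin/sh
# Hypothetical helper (not from the original test): print apparent size
# and allocated bytes for a file. Assumes GNU stat format sequences.
alloc_report() {
    size=$(stat -c %s "$1")     # apparent size in bytes
    blocks=$(stat -c %b "$1")   # number of blocks allocated
    bsize=$(stat -c %B "$1")    # size in bytes of each block counted by %b
    echo "size=$size allocated=$((blocks * bsize))"
}
```

Running "alloc_report foo" right after the 4k write and again after the reboot should show whether the allocation for the filled hole survived.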

-chris


-

From: Andrew Morton
Date: Friday, October 12, 2007 - 2:02 pm

On Fri, 12 Oct 2007 16:39:27 -0400

I don't think it's a bug.  Sure, O_DIRECT is synchronous, but that's
because it is, err, direct.  Not because it provides extra data-integrity
-

From: Florian Weimer
Date: Saturday, October 13, 2007 - 4:24 am

This needs to be prominently documented.  Right now, it's far from clear
that you need both O_DIRECT and O_SYNC.
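
For illustration, a minimal sketch of the same hole-filling test with both flags requested. It assumes GNU dd, whose oflag= option accepts a comma-separated list such as direct,sync; make_sparse and fill_hole are hypothetical helper names. Whether this forces the metadata commit on a given filesystem is exactly what would need testing.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original test). GNU dd assumed.

# Create a 2M file whose first 1M is a hole.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1M seek=1 count=1 2>/dev/null
}

# Overwrite the first 4k of the file without truncating it.
# $2 is the oflag list passed to dd, e.g. "direct" or "direct,sync".
fill_hole() {
    dd if=/dev/urandom of="$1" bs=4k count=1 conv=notrunc oflag="$2" 2>/dev/null
}
```

Usage would be "make_sparse foo" followed by "fill_hole foo direct,sync", then the same "reboot -nf" and hexdump check as in the original test.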
-

From: Chuck Lever
Date: Monday, October 15, 2007 - 8:36 am

It's certainly not a requirement for NFS.  O_DIRECT on NFS forces data 
to the server, which always updates a file's metadata on each write, 
including indirect blocks.
-

From: Bryan Henderson
Date: Monday, October 15, 2007 - 9:53 am

That makes sense, but how do you explain the committing of the size change 
without O_SYNC?  That seems wrong to me.

This does need to be documented carefully, because a person could easily
believe, even subconsciously, that O_DIRECT makes the entire file write
direct, and sloppy documentation might actually use words to that effect.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

-
