Then I'll restore lkml to the Cc list.
It should return the number of bytes successfully written before the
error, giving you the location of the first error. Using smaller
individual writes (preferably issued in parallel) also allows the
problem spot to be isolated.
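To make the error-location point concrete, here is a minimal sketch in C.
The helper name write_chunked and its err out-parameter are made up for
illustration, and it issues the chunks one after another rather than in
parallel. It only relies on write() returning the count of bytes it
accepted, or -1 with errno set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any library): write len bytes from buf
 * to fd in pieces of at most chunk bytes.
 *
 * Returns the number of bytes written before the first failure. When
 * the return value is less than len, it is the offset of the first byte
 * that was not written, and *err holds the errno of the failing write()
 * (or 0 if write() simply made no progress).
 */
size_t write_chunked(int fd, const char *buf, size_t len, size_t chunk,
                     int *err)
{
    size_t done = 0;

    assert(chunk > 0);
    *err = 0;

    while (done < len) {
        size_t want = (len - done < chunk) ? len - done : chunk;
        ssize_t n = write(fd, buf + done, want);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            *err = errno;       /* first real error; stop here */
            break;
        }
        if (n == 0)
            break;              /* no progress; avoid spinning forever */

        done += (size_t)n;      /* a short write just advances the offset */
    }
    return done;
}
```

With a smaller chunk the returned offset narrows down the failing region
more tightly, at the cost of more system calls.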
When you are writing a transaction log, you do; you don't need much
data, but you do need to be sure it has hit the disk before continuing.
You certainly aren't writing many MB across a dozen write() calls and
only then caring to make sure it is all flushed in an unknown order. When
order matters, you cannot use fsync, which is one of the reasons why
databases use O_DIRECT; they care about the ordering.
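As a rough sketch of the transaction log case, the C fragment below opens
the log with O_SYNC so that each write() returns only once the kernel
reports that record as written out, which keeps the records ordered
without a later fsync. The names log_open and log_append are hypothetical,
and O_SYNC is used instead of O_DIRECT only to avoid the buffer alignment
that O_DIRECT requires.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a log file for synchronous appends. */
int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/*
 * Hypothetical helper: append one record and report whether all of it
 * was written. Returns 0 on success and -1 on failure with errno set.
 * Because the descriptor was opened with O_SYNC, a successful return
 * means this record was written out before the caller goes on to the
 * next one.
 */
int log_append(int fd, const void *rec, size_t len)
{
    ssize_t n;

    assert(rec != NULL);

    n = write(fd, rec, len);
    if (n < 0)
        return -1;              /* errno already set by write() */
    if ((size_t)n != len) {
        errno = EIO;            /* treat a partial record as a failure */
        return -1;
    }
    return 0;
}
```

A caller would append the record, check the return value, and only then
go on with the operation that the record describes.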
I meant it is not a good idea to use fsync as you can't properly handle
errors.
Throughput is nowhere near perfect, as the pipeline is stalled for quite
some time. The pipe fills up quickly while dd is blocked on the sync
write, which then blocks tar until all 16 MB have hit the disk. Only
then does dd go back to reading from the tar pipe, allowing it to
continue. During the time it takes tar to archive another 16 MB of
data, the write queue is empty. The only time that the tar process gets
to continue running while data is written to disk is the short time
it takes for the pipe (4 KB, isn't it?) to fill up.
No, semantics have nothing to do with performance. Semantics deals with
the state of the machine after the call, not how quickly it got there.
Semantics is a question of correct operation, not optimal.
With both O_DIRECT and O_SYNC, the machine state is essentially the same
after the call: the data has hit the disk. Aside from the performance
difference, the application cannot tell O_DIRECT and O_SYNC apart, so if
that performance difference can be resolved by changing the
implementation, Linus can be happy and get rid of O_DIRECT.