Re: [patch] fs: truncate introduce new sequence

Previous thread: Re: [PATCH] improve the performance of large sequential write NFS workloads by Wu Fengguang on Monday, December 21, 2009 - 6:59 pm. (28 messages)

Next thread: ditt gyllene mynt har blivit krediterat ("your golden coin has been credited") by Hot Ruby Royale on Tuesday, December 22, 2009 - 10:19 am. (1 message)
From: Nick Piggin
Date: Thursday, December 17, 2009 - 11:51 pm

I wonder if you could consider merging at least this patch upstream?

In this form, it basically leaves existing code and filesystems
unchanged. The reason to merge it would be to help filesystem
maintainers put conversions into their queues. A few other changes
depend on it (like Jan's sub-page block improvements).

---
From: Nick Piggin <npiggin@suse.de>
Subject: [PATCH] truncate: introduce new sequence

Introduce a new truncate calling sequence into the fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem-specific operations are required. vmtruncate is
deprecated; the previously introduced truncate_pagecache and inode_newsize_ok
helpers should be used instead.

simple_setattr is introduced for simple in-RAM filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (i.e., changing i_size and trimming the pagecache).
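The division of labour between notify_change, simple_setattr and simple_setsize can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: model_inode, model_iattr, model_notify_change, model_simple_setattr, model_simple_setsize, model_newsize_ok and MODEL_ATTR_SIZE are hypothetical stand-ins, the page cache is reduced to a page count, and the size-limit check is a simplified analogue of inode_newsize_ok.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only: hypothetical stand-ins, NOT kernel structures or APIs. */
#define MODEL_PAGE_SIZE 4096L
#define MODEL_ATTR_SIZE 0x1u          /* hypothetical analogue of ATTR_SIZE */

struct model_inode;

struct model_iattr {
    unsigned int ia_valid;            /* which attributes are being changed */
    long ia_size;                     /* requested new size */
};

struct model_inode_ops {
    /* NULL would mean "no ->setattr": the legacy default path */
    int (*setattr)(struct model_inode *inode, const struct model_iattr *attr);
};

struct model_inode {
    long i_size;
    long max_bytes;                   /* simplified size limit */
    long cached_pages;                /* pages currently in the page cache */
    const struct model_inode_ops *ops;
};

/* Simplified analogue of inode_newsize_ok(): reject sizes over the limit. */
static int model_newsize_ok(const struct model_inode *inode, long newsize)
{
    return newsize > inode->max_bytes ? -EFBIG : 0;
}

/* Analogue of simple_setsize(): check first, then update i_size, then
 * drop cached pages that now lie wholly past EOF. */
static int model_simple_setsize(struct model_inode *inode, long newsize)
{
    int err = model_newsize_ok(inode, newsize);
    if (err)
        return err;                   /* nothing has been changed yet */

    inode->i_size = newsize;
    long pages_needed = (newsize + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
    if (inode->cached_pages > pages_needed)
        inode->cached_pages = pages_needed;
    assert(inode->cached_pages <= pages_needed);
    return 0;
}

/* Analogue of simple_setattr(): only the ATTR_SIZE part is modelled. */
static int model_simple_setattr(struct model_inode *inode,
                                const struct model_iattr *attr)
{
    if (attr->ia_valid & MODEL_ATTR_SIZE)
        return model_simple_setsize(inode, attr->ia_size);
    return 0;
}

/* Analogue of notify_change(): prefer the filesystem's own ->setattr. */
static int model_notify_change(struct model_inode *inode,
                               const struct model_iattr *attr)
{
    if (inode->ops && inode->ops->setattr)
        return inode->ops->setattr(inode, attr);
    return -ENOSYS;                   /* legacy default path not modelled */
}
```

A filesystem opting in would point its ops at model_simple_setattr; a failed size check returns an error with i_size and the cached pages untouched, because nothing is modified before the check passes.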

To implement the new truncate sequence:
- filesystem-specific manipulations (e.g. freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate cannot be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of the helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to the _newtrunc-suffixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity the new sequence provides to handle errors.
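The first two requirements above (fs-specific block freeing in ->setattr, and the fs trimming blocks itself after a failed write) can be sketched as a userspace model of a block-based filesystem. Everything here is hypothetical: model_blk_inode, model_free_blocks_past, model_fs_setsize and model_fs_write are illustrative names, blocks are just a counter, and the fail_free flag is fault injection added for demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; all names are hypothetical, not kernel APIs. */
#define MODEL_BLOCK_SIZE 4096L

struct model_blk_inode {
    long i_size;
    long nr_blocks;                   /* blocks allocated on "disk" */
    long max_bytes;
    bool fail_free;                   /* fault injection: freeing fails */
};

static long blocks_for(long size)
{
    return (size + MODEL_BLOCK_SIZE - 1) / MODEL_BLOCK_SIZE;
}

/* The filesystem-specific step that used to live in ->truncate. */
static int model_free_blocks_past(struct model_blk_inode *inode, long size)
{
    if (inode->fail_free)
        return -EIO;                  /* the error can now be reported */
    if (inode->nr_blocks > blocks_for(size))
        inode->nr_blocks = blocks_for(size);
    return 0;
}

/* ATTR_SIZE handling in ->setattr: the old i_size is still visible,
 * and the new size is committed only after the fs work succeeds. */
static int model_fs_setsize(struct model_blk_inode *inode, long newsize)
{
    if (newsize > inode->max_bytes)
        return -EFBIG;                /* inode_newsize_ok() analogue */
    if (newsize < inode->i_size) {
        int err = model_free_blocks_past(inode, newsize);
        if (err)
            return err;               /* i_size left untouched */
    }
    inode->i_size = newsize;          /* page cache trimming omitted */
    return 0;
}

/* write_begin-style path: allocate blocks for [pos, pos + len). If the
 * copy fails, the fs trims blocks past i_size itself instead of relying
 * on core code calling vmtruncate. */
static int model_fs_write(struct model_blk_inode *inode, long pos, long len,
                          bool copy_fails)
{
    long end = pos + len;
    if (blocks_for(end) > inode->nr_blocks)
        inode->nr_blocks = blocks_for(end);       /* allocation */
    if (copy_fails) {
        int err = model_free_blocks_past(inode, inode->i_size);
        return err ? err : -EFAULT;   /* trim failure wins if there is one */
    }
    if (end > inode->i_size)
        inode->i_size = end;
    return 0;
}
```

Because the trim runs inside the filesystem, the caller gets a real error code back, and a failed block free leaves i_size consistent with the blocks still on disk.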

The big problem with the previous calling sequence is that the filesystem is
not called until i_size has already changed. This means it is not allowed to ...
From: Dave Chinner
Date: Tuesday, December 22, 2009 - 5:24 am

__blockdev_direct_IO_newtrunc()?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, December 23, 2009 - 12:08 am

Thanks. I mustn't have retested dio after merging the latest
direct IO simplifications :(

Will test everything and resend it.

--

From: Nick Piggin
Date: Wednesday, December 23, 2009 - 2:29 am

Here is an updated patch.
--

From: Nick Piggin <npiggin@suse.de>
Subject: [PATCH] truncate: introduce new sequence

Introduce a new truncate calling sequence into the fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem-specific operations are required. vmtruncate is
deprecated; the previously introduced truncate_pagecache and inode_newsize_ok
helpers should be used instead.

simple_setattr is introduced for simple in-RAM filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (i.e., changing i_size and trimming the pagecache).

To implement the new truncate sequence:
- filesystem-specific manipulations (e.g. freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate cannot be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of the helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to the _newtrunc-suffixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity the new sequence provides to handle errors.

The big problem with the previous calling sequence is that the filesystem is
not called until i_size has already changed. This means it is not allowed to
fail the call, and it does not know what the previous i_size was. Also,
generic code calling vmtruncate to truncate allocated blocks in case of error
had no good way to return a meaningful error (or, for example, to atomically
handle block deallocation).
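The ordering problem described above can be made concrete with a tiny userspace model that contrasts the two sequences. It is a hypothetical sketch: seq_inode, old_sequence, old_fs_truncate and new_fs_setattr are illustrative names, and the simulated_err parameter stands in for whatever failure a real filesystem might hit while releasing blocks.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model contrasting the two orderings; names are hypothetical. */
struct seq_inode {
    long i_size;
    long last_old_size_seen;          /* what the fs believed the old size was */
};

/* Old order: i_size is committed first, then a void ->truncate runs.
 * The fs sees only the new size and has no way to report failure. */
static void old_fs_truncate(struct seq_inode *inode)
{
    inode->last_old_size_seen = inode->i_size;   /* already the NEW size */
}

static int old_sequence(struct seq_inode *inode, long newsize)
{
    inode->i_size = newsize;          /* changed before the fs is called */
    old_fs_truncate(inode);
    return 0;                         /* nothing to propagate */
}

/* New order: ->setattr runs while the old i_size is still in place and
 * can return an error, leaving the inode unchanged. */
static int new_fs_setattr(struct seq_inode *inode, long newsize,
                          int simulated_err)
{
    inode->last_old_size_seen = inode->i_size;   /* genuine old size */
    if (simulated_err)
        return simulated_err;
    inode->i_size = newsize;
    return 0;
}
```

In the old model the filesystem records 100 as the "old" size after a 5000 to 100 truncate; in the new model it records 5000 and can veto the change.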

Cc: Christoph Hellwig ...