Re: [patch] Converting writeback linked lists to a tree based data structure

To: Fengguang Wu <wfg@...>
Cc: Andrew Morton <akpm@...>, Michael Rubin <mrubin@...>, Peter Zijlstra <a.p.zijlstra@...>, <linux-kernel@...>, <linux-mm@...>
Date: Wednesday, January 16, 2008 - 6:35 pm

On Wed, Jan 16, 2008 at 05:07:20PM +0800, Fengguang Wu wrote:

Note that data writeback may be adversely affected by location
based writeback rather than time based writeback - think of
the effect of location based data writeback on an app that
creates lots of short term (<30s) temp files and then removes
them before they are written back.

Also, data writeback location cannot be easily derived from
the inode number in pretty much all cases. "near" in terms
of XFS means the same AG, which means the data could be up to
a TB away from the inode, and if you have >1TB filesystems
using the default inode32 allocator, file data is *never*
placed near the inode - the inodes are in the first TB of
the filesystem, the data is rotored around the rest of the
filesystem.

And with delayed allocation, you don't know where the data is even
going to be written ahead of the filesystem ->writepage call, so you
can't do optimal location ordering for data in this case.


Makes sense for location based writeback of the inodes themselves,
but not for data.

Hmmmm - I'm wondering if we'd do better to split data writeback from
inode writeback. i.e. we do two passes.  The first pass writes all
the data back in time order, the second pass writes all the inodes
back in location order.
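
A rough userspace sketch of the two-pass idea, in case it helps make
the ordering concrete. This is not kernel code: struct dirty_inode,
two_pass_writeback() and the two output arrays are made up for
illustration, and the inode number is used as a stand-in for inode
location, which is only an assumption about the filesystem layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dirty inode; not the kernel's struct inode. */
struct dirty_inode {
	unsigned long ino;          /* assumed proxy for on-disk inode location */
	unsigned long dirtied_when; /* time the inode was first dirtied */
};

/* Order by dirtied time, oldest first. */
static int cmp_time(const void *a, const void *b)
{
	const struct dirty_inode *x = a, *y = b;

	return (x->dirtied_when > y->dirtied_when) -
	       (x->dirtied_when < y->dirtied_when);
}

/* Order by inode number, lowest first. */
static int cmp_location(const void *a, const void *b)
{
	const struct dirty_inode *x = a, *y = b;

	return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Pass 1 visits every inode in time order (where data would be written);
 * pass 2 visits every inode in location order (where the inode itself
 * would be written). The visit order is recorded as inode numbers.
 */
static void two_pass_writeback(struct dirty_inode *v, size_t n,
			       unsigned long *data_order,
			       unsigned long *inode_order)
{
	size_t i;

	assert(data_order != NULL && inode_order != NULL);

	qsort(v, n, sizeof(*v), cmp_time);
	for (i = 0; i < n; i++)
		data_order[i] = v[i].ino;	/* data writeback goes here */

	qsort(v, n, sizeof(*v), cmp_location);
	for (i = 0; i < n; i++)
		inode_order[i] = v[i].ino;	/* inode writeback goes here */
}
```

A real implementation would presumably keep two indexes rather than
re-sorting, but the sort keeps the sketch short.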

Right now we interleave data and inode writeback (i.e. we do data,
inode, data, inode, data, inode, ....). I'd much prefer to see all
data written out first, then the inodes. ->writepage often dirties
the inode and hence if we need to do multiple do_writepages() calls
on an inode to flush all the data (e.g. congestion, large amounts of
data to be written, etc), we really shouldn't be calling
write_inode() after every do_writepages() call. The inode
should not be written until all the data is written....
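
To put a number on that, here is a toy model of the two behaviours.
It is a sketch under stated assumptions, not a description of the
actual writeback code: the function names are hypothetical, and
"chunk" stands for however many pages a single do_writepages() call
manages to flush before it has to stop.

```c
#include <assert.h>

/*
 * Interleaved: the inode is written after every data chunk, so the
 * number of inode writes equals the number of data passes needed.
 */
static unsigned long interleaved_inode_writes(unsigned long pages,
					      unsigned long chunk)
{
	unsigned long calls = 0;

	assert(chunk > 0);
	while (pages > 0) {
		pages -= pages < chunk ? pages : chunk;	/* flush one chunk */
		calls++;				/* then write the inode */
	}
	return calls;
}

/*
 * Deferred: flush all the data first, then write the inode once,
 * and only if there was any data to flush.
 */
static unsigned long deferred_inode_writes(unsigned long pages,
					   unsigned long chunk)
{
	unsigned long had_data = pages > 0;

	assert(chunk > 0);
	while (pages > 0)
		pages -= pages < chunk ? pages : chunk;	/* flush one chunk */
	return had_data;
}
```

With 10000 dirty pages and 1024 pages flushed per call, the
interleaved model writes the inode 10 times and the deferred model
writes it once.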

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
