Re: [RFC] fsblock

From: Christoph Hellwig
Date: Saturday, June 30, 2007 - 4:05 am

Warning ahead:  I've only briefly skimmed over the pages, so the comments
in this mail are very high-level.

On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote:

The traditional Unix buffer cache is always physical-block indexed and
used for all data/metadata/block device node access.  There have been
a lot of variants of schemes where data, or some data, is in a separate
(inode, logical block) indexed scheme.  Most modern OSes, including Linux,
now always use the (inode, logical block) index, with some no-op substitute
for the metadata and block device node variants of operation.
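
To make the two indexing schemes concrete, here is a minimal userspace
sketch in C.  All names in it (phys_key, logical_key, toy_inode,
toy_bmap) are hypothetical and not taken from the kernel or from the
fsblock patches; it assumes a toy file whose blocks are laid out
contiguously on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Traditional buffer cache key: a physical block on a device. */
struct phys_key {
	uint32_t dev;
	uint64_t phys_block;
};

/* Pagecache-style key: a logical block within a file (inode). */
struct logical_key {
	uint64_t ino;
	uint64_t logical_block;
};

/* Hypothetical toy inode: its blocks sit contiguously from first_phys. */
struct toy_inode {
	uint32_t dev;
	uint64_t ino;
	uint64_t first_phys;
	uint64_t nblocks;
};

/*
 * Translate a logical key into a physical key.  Returns false if the
 * key belongs to another inode or lies beyond the end of the file.
 */
bool toy_bmap(const struct toy_inode *ip, struct logical_key lk,
	      struct phys_key *out)
{
	assert(ip != NULL && out != NULL);

	if (lk.ino != ip->ino || lk.logical_block >= ip->nblocks)
		return false;

	out->dev = ip->dev;
	out->phys_block = ip->first_phys + lk.logical_block;
	return true;
}
```

The point of the sketch is only that a logically indexed cache needs a
mapping step before I/O can be issued, while a physically indexed one
does not.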

Now what you replace is a really crappy hybrid of a traditional
Unix buffer cache implemented on top of the pagecache for the block
device node (for metadata), plus a lot of abuse of the same data
structure as used in the buffer cache for keeping metainformation
about the actual data mapping.


Actually, most of the code is no older than 10 years.  Just compare
fs/buffer.c in 2.2 and 2.6.  buffer_head is a perfectly fine name
for one of its uses in the traditional buffer cache.

I also think there is little to no reason to get rid of that use:
This buffer cache is what most Linux block-based filesystems (except
XFS and JFS, most notably) are written to, and it fits them very nicely.

What I'd really like to see is to get rid of the abuse of struct buffer_head
in the data path, and the sometimes too intimate coupling of the buffer cache
with page cache internals.



That's what I mean.  And from a quick glimpse at your code, they're still
far too deeply coupled in fsblock.  We really don't want to share
anything between the buffer cache and data mapping operations: they are
so deeply different that this sharing is what creates the enormous complexity
we have to deal with.


The whole concept of delayed allocation requires page allocations at
writeout time, as do various network protocols or even storage drivers.


Not really something that is the block layer's fault, but rather the laziness
of the filesystem maintainers.


See now why people like large order page cache so much :)


And this is a complete pain in the ass.  XFS uses vmap in its metadata buffer
cache due to requirements carried over from IRIX (in fact that's why I implemented
vmap in its current form).  This works okay most of the time, but there are
a lot of scenarios where you run out of vmalloc space, as you mention.  What's
also nasty is that you can't call vunmap from irq context, and that vunmap is
rather bad for system performance due to the TLB flushing overhead.


So as a closing comment, I'd say I'd rather keep buffer_heads for metadata
for now and try to decouple the data path from them.  Your fsblock patches
are a very nice start for this, but I'd rather skip the intermediate step
and go towards the extent-based API Dave has been outlining.  Having dealt
with the I/O path of a high-performance filesystem for a while, I find per-page
or sub-page structures a real pain to deal with, and I'd really prefer to have
data structures that each cover as many blocks with the same state as possible.
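
As a rough illustration of why one structure per run of same-state
blocks is attractive, here is a minimal userspace sketch in C.  It is
not the extent-based API Dave has been outlining; the names blk_state,
struct extent and blocks_to_extents are hypothetical, and the set of
states is an assumption made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block states, for illustration only. */
enum blk_state { BLK_HOLE, BLK_MAPPED, BLK_DELALLOC, BLK_UNWRITTEN };

/* One record describes a whole run of logical blocks in the same state. */
struct extent {
	uint64_t start;		/* first logical block of the run */
	uint64_t len;		/* number of blocks in the run */
	enum blk_state state;	/* state shared by every block in the run */
};

/*
 * Coalesce an array of per-block states into extents.  Writes at most
 * max extents into out and returns how many were written.
 */
size_t blocks_to_extents(const enum blk_state *blocks, size_t nblocks,
			 struct extent *out, size_t max)
{
	size_t n = 0;

	assert(blocks != NULL || nblocks == 0);
	assert(out != NULL || max == 0);

	for (size_t i = 0; i < nblocks; i++) {
		/* Same state as the previous run: just grow that run. */
		if (n > 0 && out[n - 1].state == blocks[i]) {
			out[n - 1].len++;
			continue;
		}
		if (n == max)
			break;	/* out of room for a new run */
		out[n].start = i;
		out[n].len = 1;
		out[n].state = blocks[i];
		n++;
	}
	return n;
}
```

With eight blocks in the states mapped, mapped, mapped, delalloc,
delalloc, hole, mapped, mapped, the per-block view needs eight records
while the extent view needs four.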
