Re: [RFC] fsblock

From: David Chinner
Date: Tuesday, June 26, 2007 - 2:23 am

On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:

Sure, but it's not a "filesystem block" which is what you are
calling it. IMO, it's overloading a well known term with something
different, and that's just confusing.

Can we call it a block mapping layer or something like that?
e.g. struct blkmap?


Extent based block mapping is entirely independent of block size.
Please don't confuse the two....


Yes. Block based is simple, but has flexibility and scalability
problems, e.g. the number of fsblocks that are required to map large
files.  It's not uncommon for us to have millions of bufferheads
lying around after writing a single large file that only has a
handful of extents. That's 5-6 orders of magnitude difference there
in memory usage and as memory and disk sizes get larger, this will
become more of a problem....
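To put rough numbers on that gap, here is a back-of-the-envelope sketch. The helper names and the example figures (a 16 GiB file, 4 KiB blocks, 4 extents) are illustrative assumptions, not measurements from any real system:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many per-block mapping objects (bufferheads or
 * fsblocks) are needed when every block of the file gets its own object. */
static uint64_t per_block_objects(uint64_t file_bytes, uint32_t block_bytes)
{
	assert(block_bytes != 0);
	/* round up so a partial tail block is still counted */
	return (file_bytes + block_bytes - 1) / block_bytes;
}

/* Hypothetical helper: how many per-block objects exist for each extent
 * record that would describe the same file. */
static uint64_t object_ratio(uint64_t file_bytes, uint32_t block_bytes,
			     uint64_t nr_extents)
{
	assert(nr_extents != 0);
	return per_block_objects(file_bytes, block_bytes) / nr_extents;
}
```

With those assumed figures, per_block_objects(16ULL << 30, 4096) comes to 4194304 objects, and object_ratio(16ULL << 30, 4096, 4) comes to 1048576, i.e. about six orders of magnitude more objects than extent records.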


For VM operations, no, but they would continue to be locked on a
per-page basis. However, we can do filesystem block operations
without needing to hold page locks. e.g. space reservation and
allocation......


No, that's wrong. I'm not talking about VM parallelisation,
I want to be able to support multiple writers to a single file.
i.e. removing the i_mutex restriction on writes. To do that
you've got to have a range locking scheme integrated into
the block map for the file so that concurrent lookups and
allocations don't trip over each other.

iomaps can double as range locks simply because iomaps are
expressions of ranges within the file.  Seeing as you can only
access a given range exclusively to modify it, inserting an empty
mapping into the tree as a range lock gives an effective method of
allowing safe parallel reads, writes and allocation into the file.
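A minimal sketch of the idea, assuming a flat array in place of a real tree and leaving out the internal locking that the map itself would need. All names here (range_map, range_trylock, range_unlock) are hypothetical and not taken from any patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 64	/* arbitrary capacity for the sketch */

/* One held range: bytes [start, end) of the file. */
struct range_lock {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical per-file map; a real one would be a tree, not an array. */
struct range_map {
	struct range_lock held[MAX_RANGES];
	size_t nr;
};

/* Two half-open ranges overlap if each starts before the other ends. */
static bool ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 < e2 && s2 < e1;
}

/* Insert an empty placeholder for [start, end). Fails if the range
 * overlaps one that is already held, or if the map is full. */
static bool range_trylock(struct range_map *m, uint64_t start, uint64_t end)
{
	assert(start < end);
	for (size_t i = 0; i < m->nr; i++)
		if (ranges_overlap(start, end, m->held[i].start, m->held[i].end))
			return false;
	if (m->nr == MAX_RANGES)
		return false;
	m->held[m->nr].start = start;
	m->held[m->nr].end = end;
	m->nr++;
	return true;
}

/* Remove the placeholder for exactly [start, end). Returns false if no
 * such range is held. */
static bool range_unlock(struct range_map *m, uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < m->nr; i++) {
		if (m->held[i].start == start && m->held[i].end == end) {
			m->held[i] = m->held[m->nr - 1];	/* fill the hole */
			m->nr--;
			return true;
		}
	}
	return false;
}
```

Two writers working on disjoint ranges both succeed in range_trylock() and can proceed in parallel; a writer whose range overlaps a held one is refused until the holder calls range_unlock().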

The fsblocks and the vm page cache interface cannot be used to
facilitate this because a radix tree is the wrong type of tree to
store this information in. A sparse, range based tree (e.g. btree)
is the right way to do this and it matches very well with
a range based API.

None of what I'm talking about requires any changes to the existing
page cache or VM address space. I'm proposing that we should
treat the block mapping as an address space in its own right, i.e.
perhaps the struct page should not have block mapping objects
attached to it at all.

By separating out the block mapping from the page cache, we make the
page cache completely independent of filesystem block size, and it
can just operate wholly on pages. We can implement a generic extent
mapping tree instead of every filesystem having to (re)implement
their own. And if the filesystem does its job of preventing
fragmentation, the amount of memory consumed by the tree will
be orders of magnitude lower than any fsblock based indexing.

I also like what this implies for keeping track of sub-block dirty
ranges, i.e. no need for RMW cycles if we are doing sector sized
and aligned I/O - we can keep track of sub-block dirty state in the
block mapping tree easily *and* we know exactly what sector on disk
it maps to. That means we don't care about filesystem block size
as it no longer has any influence on RMW boundaries.
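As a rough illustration of why sector granularity is enough, here is a sketch of a mapping record that translates file offsets straight to disk sectors. The struct layout, the function names and the 512 byte sector size are all assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed sector size for the sketch */

/* Hypothetical mapping record: a file byte range and where it starts
 * on disk. file_off is assumed to be sector aligned. */
struct blkmap_extent {
	uint64_t file_off;	/* byte offset of the range in the file */
	uint64_t len;		/* length of the range in bytes */
	uint64_t disk_sector;	/* first sector of the range on disk */
};

/* A write needs a read-modify-write cycle only if it is not sector
 * sized and aligned; filesystem block size does not enter into it. */
static bool write_needs_rmw(uint64_t off, uint64_t len)
{
	return (off % SECTOR_SIZE) != 0 || (len % SECTOR_SIZE) != 0;
}

/* Translate a file offset inside the extent to its sector on disk. */
static uint64_t extent_sector(const struct blkmap_extent *e, uint64_t off)
{
	assert(off >= e->file_off && off < e->file_off + e->len);
	return e->disk_sector + (off - e->file_off) / SECTOR_SIZE;
}
```

For an extent with file_off 4096, len 8192 and disk_sector 1000, a 512 byte write at offset 4608 needs no RMW and lands on sector 1001, whatever the filesystem block size is.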

None of this is possible with fsblocks, so I really think that
fsblocks are not the step forward we need. They are just bufferheads
under another name and hence have all the same restrictions that
bufferheads imply. We should be looking to eliminate bufferheads
entirely rather than perpetuating them as fsblocks.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group