On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
Well it is the metadata used to manage the filesystem block for the
given bit of pagecache (even if the block is not actually allocated
or even a hole, it is deemed to be so by the filesystem).
I'm not fixed on fsblock, but blkmap doesn't grab me either. It
is a map from the pagecache to the block layer, but blkmap sounds
like it is a map from the block to somewhere.
fsblkmap ;)
I'm not, but it seemed like you were confused that fsblock is tied
to changing the aops APIs. It is not, but they can be changed to
give improvements in a good number of areas (*including* better
large block support).
I guess fsblock is 3 times smaller and you would probably have 16
times fewer of them for such a filesystem (given a 4K page size),
but that still leaves a few orders of magnitude ;)
However, fsblock has this nice feature where it can drop the blocks
when the last reference goes away, so you really only have fsblocks
around for dirty or currently-being-read blocks...
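
The "drop on last reference" behaviour can be sketched roughly as
below. This is only a userspace illustration, not the fsblock code:
struct blk_meta, blk_meta_new(), blk_meta_get(), blk_meta_put() and
the live_metas counter are all hypothetical names, and the sketch
assumes that a dirty block or an in-flight read simply holds one
reference for as long as it needs the metadata.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-block metadata; not the real struct fsblock. */
struct blk_meta {
	unsigned int count;		/* references held by current users */
	unsigned long long blocknr;	/* block this metadata describes */
};

/* Number of metadata objects currently allocated (for illustration only). */
static int live_metas;

/* Allocate metadata with one reference owned by the caller. */
static struct blk_meta *blk_meta_new(unsigned long long blocknr)
{
	struct blk_meta *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->count = 1;
	m->blocknr = blocknr;
	live_metas++;
	return m;
}

/* Take an extra reference, e.g. while the block is dirty or being read. */
static void blk_meta_get(struct blk_meta *m)
{
	m->count++;
}

/*
 * Drop a reference. The metadata is freed as soon as the last
 * reference goes away. Returns 1 if it was freed, 0 otherwise.
 */
static int blk_meta_put(struct blk_meta *m)
{
	assert(m->count > 0);
	if (--m->count == 0) {
		free(m);
		live_metas--;
		return 1;
	}
	return 0;
}
```

With this shape, clean and idle blocks carry no metadata at all, so
the steady-state count follows the amount of dirty or in-flight data
rather than the size of the pagecache.
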
But you give me a good idea: I'll gear the filesystem-side APIs to
be more extent based as well (eg. fsblock's get_block equivalent).
That way it should be much easier to change over to such extents in
future or even have an extent based representation sitting in front
of the fsblock one and acting as a high density cache in your above
situation.
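
As a rough idea of what an extent-based get_block equivalent might
look like, here is a minimal userspace sketch. Everything in it is an
assumption made for illustration: struct extent, map_extent_fn,
map_range(), BLK_HOLE and the toy filesystem are hypothetical and are
not taken from the fsblock patches. The point is only that one
callback can describe a whole run of blocks, so the generic side asks
the filesystem once per extent instead of once per block.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long blk_t;

/* File blocks [file_start, file_start + len) live at disk_start onwards. */
struct extent {
	blk_t file_start;
	blk_t disk_start;
	blk_t len;
};

/*
 * Hypothetical filesystem callback: on success return 0 and fill *out
 * with the extent covering file_block; return -1 for a hole.
 */
typedef int (*map_extent_fn)(void *fs_private, blk_t file_block,
			     struct extent *out);

#define BLK_HOLE (~0ULL)

/*
 * Fill disk[0..nr) with the disk block of each file block starting at
 * 'first'. Returns how many times the filesystem callback was invoked.
 */
static size_t map_range(map_extent_fn map, void *fs_private,
			blk_t first, size_t nr, blk_t *disk)
{
	size_t calls = 0, i = 0;

	assert(disk != NULL);
	while (i < nr) {
		struct extent e;
		blk_t fb = first + i;

		calls++;
		/* Treat a failed or non-covering answer as a hole. */
		if (map(fs_private, fb, &e) != 0 ||
		    fb < e.file_start || fb >= e.file_start + e.len) {
			disk[i++] = BLK_HOLE;
			continue;
		}
		/* Reuse this extent for every following block it covers. */
		while (i < nr && first + i < e.file_start + e.len) {
			disk[i] = e.disk_start + (first + i - e.file_start);
			i++;
		}
	}
	return calls;
}

/* Toy filesystem: file blocks 0..15 are contiguous from disk block 1000. */
static int toy_map_extent(void *fs_private, blk_t file_block,
			  struct extent *out)
{
	(void)fs_private;
	if (file_block >= 16)
		return -1;
	out->file_start = 0;
	out->disk_start = 1000;
	out->len = 16;
	return 0;
}
```

Mapping 16 contiguous blocks through toy_map_extent() costs a single
callback here, where a per-block interface would need 16.
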
You could do that without holding the page locks as well AFAIKS.
Actually again it might be a bit troublesome with the current
aops APIs, but I don't think fsblock stands in your way there
either.
The independent mapping tree is something I have been thinking
about, but you still need to tie the page to the block at some
point and you need to track IO details and such.
The problem with implementing it in generic code is that it
will add another layer of locking and data structure that may
be better done in the filesystem (because you _do_ already
need to do all the per-page stuff as well). This was my point
about overengineering: fsblock is supposed to be just a very
light layer.
I don't know why you think none of that is possible with fsblocks.
You could easily keep an in-memory btree or similar as the
authoritative block management structure and feed the fsblock
layer from that.
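
To make that concrete, here is a small userspace sketch of a
filesystem-private index feeding a per-block record on demand. It
assumes a sorted array of non-overlapping extents in place of a real
btree, and struct fs_extent, struct fs_index, struct blk_rec,
fs_index_lookup() and blk_rec_fill() are all hypothetical names, not
part of fsblock.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long blk_t;

/* File blocks [file_start, file_start + len) live at disk_start onwards. */
struct fs_extent {
	blk_t file_start;
	blk_t disk_start;
	blk_t len;
};

/*
 * Authoritative, filesystem-private index: extents sorted by
 * file_start and non-overlapping. A real filesystem might keep a
 * btree here; a sorted array keeps the sketch short.
 */
struct fs_index {
	const struct fs_extent *ext;
	size_t nr;
};

/* Hypothetical per-block record kept by the light generic layer. */
struct blk_rec {
	blk_t disk_block;
	int mapped;	/* 0 means the block is a hole */
};

/* Binary search for the extent covering fb, or NULL if there is none. */
static const struct fs_extent *fs_index_lookup(const struct fs_index *ix,
					       blk_t fb)
{
	size_t lo = 0, hi;

	assert(ix != NULL);
	hi = ix->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct fs_extent *e = &ix->ext[mid];

		if (fb < e->file_start)
			hi = mid;
		else if (fb >= e->file_start + e->len)
			lo = mid + 1;
		else
			return e;
	}
	return NULL;
}

/* Populate one per-block record from the authoritative index. */
static void blk_rec_fill(const struct fs_index *ix, blk_t fb,
			 struct blk_rec *rec)
{
	const struct fs_extent *e = fs_index_lookup(ix, fb);

	rec->mapped = (e != NULL);
	rec->disk_block = e ? e->disk_start + (fb - e->file_start) : 0;
}
```

The per-block record stays a cache of what the index says, so it can
be thrown away and refilled at any time without losing information.
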
There is nothing about fsblock that is tied to i_mutex, and all
its locking basically comes for free on top of the page based
locking that's already required in the VM.