On Wed, Sep 19, 2007 at 04:04:30PM +0200, Andrea Arcangeli wrote:
Sure, and that's what I meant when I said VPC + large pages was
a means to the end, not the only solution to the problem.
No, I don't like fsblock because it is inherently a "structure
per filesystem block" construct, just like buggerheads. You
still need to allocate millions of them when you have millions of
dirty pages around. Rather than type it all out again, read
the fsblocks thread from here:
http://marc.info/?l=linux-fsdevel&m=118284983925719&w=2
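To put rough numbers on the "millions of them" point, here is a
back-of-the-envelope sketch. The 4 KiB page size and the 104-byte
per-block structure are assumptions for illustration, not measured
sizes for fsblock or buffer_heads, and the helper name is made up:

```python
# Assumed sizes, for illustration only; real values depend on the
# architecture and kernel config.
PAGE_SIZE = 4096          # assumed base page size
PER_BLOCK_STRUCT = 104    # assumed bytes per per-block tracking structure


def per_block_overhead(dirty_pages, block_size,
                       page_size=PAGE_SIZE, struct_size=PER_BLOCK_STRUCT):
    """Return (structures, bytes) needed to track dirty_pages pages
    when one structure is allocated per filesystem block."""
    if block_size <= page_size:
        # several blocks (and so several structures) per page
        structs = dirty_pages * (page_size // block_size)
    else:
        # one structure spans several pages; round up
        pages_per_block = block_size // page_size
        structs = -(-dirty_pages // pages_per_block)
    return structs, structs * struct_size
```

With those assumed sizes, 2,000,000 dirty 4 KiB pages (about 8 GB)
on a 4 KiB-block filesystem need 2,000,000 structures, roughly
200 MB of them; drop the block size to 1 KiB and it is four times that.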
FWIW, with Chris Mason's extent-based block mapping (which btrfs
is using and Christoph Hellwig is porting XFS over to) we completely
remove buggerheads from XFS and so fsblock would be a pretty
major step backwards for us if Chris's work goes into mainline.
That's not in the filesystem, though. ;)
However, I agree that if you don't have mmap then it's not
worthwhile and the changes for VPC aren't trivial.
We currently support metadata blocks larger than page size for
certain types of metadata in XFS, e.g. directory blocks.
This, however, requires vmap()ing a bunch of individual,
non-contiguous pages out of a block device address space
in exactly the fashion that was proposed by Nick with fsblock
originally.
vmap() has severe scalability problems - read this subthread
of this discussion between Nick and myself:
http://lkml.org/lkml/2007/9/11/508
<sigh>
There we go - back to the bloody I/O devices. Can people please stop
bringing this up because it *is not an issue any more*.
Hmm - so you'll need page cache tail packing as well in that case
to prevent memory being wasted on small files. That means any way
we look at it (VPC+mmap or config-page-shift+fsblock+pctails)
we've got some non-trivial VM modifications to make.
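As a rough illustration of the small-file problem, this sketch
computes how much page cache a file wastes when it is cached in
large pages with no tail packing. The page sizes used below are
assumptions, and the helper is hypothetical:

```python
def tail_waste(file_size, page_size):
    """Bytes of page cache allocated but unused when a file of
    file_size bytes is cached in page_size-byte pages with no
    tail packing."""
    if file_size == 0:
        return 0
    pages = -(-file_size // page_size)   # round up to whole pages
    return pages * page_size - file_size
```

e.g. a 1 KiB file cached in a 64 KiB page wastes 63 KiB (64512
bytes), versus 3 KiB in a 4 KiB page; across 100,000 such files
that is roughly 6 GiB of page cache holding nothing.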
If VPC can be separated from the large contiguous page requirement
(i.e. virtually mapped compound page support), I still think it
comes out on top because it doesn't require every filesystem to be
modified and you can use standard pages where they are optimal
(i.e. on filesystems where block size <= PAGE_SIZE).
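A minimal sketch of that per-mapping choice, assuming 4 KiB base
pages and power-of-two block sizes. mapping_order() is a
hypothetical helper, not a function from the VPC patches:

```python
PAGE_SHIFT = 12  # assumed 4 KiB base pages


def mapping_order(block_size, page_shift=PAGE_SHIFT):
    """Smallest compound-page order whose size covers one filesystem
    block. Order 0 (a standard page) is used whenever
    block_size <= PAGE_SIZE."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    order = block_size.bit_length() - 1 - page_shift
    return max(order, 0)
```

A 512 B or 4 KiB block size gets order 0 (a standard page), while a
64 KiB block size gets an order-4 compound page.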
But I'm not going to argue endlessly for one solution or another;
I'm happy to see different solutions being chased, so may the
best VM win ;)
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group