On Tue, Jul 10, 2007 at 12:11:48PM +0200, Andrea Arcangeli wrote:
Sure, but now we waste more memory on small files....
The difference is that we can use different blocksizes even
within the one filesystem for small and large files with a
variable page cache. We can't do that with a fixed page size.
Not really. If I want a inode cache, it always needs to be 8k based.
If I want a directory cache, it needs to be one of 4k, 8k, 16k, 32k or 64k.
in the same filesystem. the data block size is different again to the
directory block size and the inode block size.
This is where the variable page cache wins hands down. I don't need
to care what page size someone built their kernel with, the file
system can be moved between different page size kernels and *still work*.
FWIW, I don't really care all that much about huge HPC machines. Most
of the systems I deal with are 4-8 socket machines with tens to hundreds of
TB of disk. i.e. small CPU count, relatively small memory (64-128GB RAM)
but really large storage subsystems.
I need really large filesystems that contain both small and large files to
work more efficiently on small boxes where we can't throw endless amounts of
RAM and CPUs at the problem. Hence things like 64k page size are just not an
option because of the wastage that it entails.
See, that's where variable page cache is so good - I don't need to
move everything to 64k block size. We can *easily* do variable
data block size in the filesystem because it's all extent based,
so this really isn't an issue for us on disk. Just changing the
base page size doesn't give us the flexibility we need in memory
to do this....
Me? I'm a just filesystems weenie, not a vm guru. I don't care about
academic mathematical proof for a defrag algorithm - I just want
something that works. It's the "something that works" that has been
done before....
i.e. I'm not wedded to large pages in the page cache - what I
really, really want is an efficient variable order page cache that
doesn't have any vmap overhead. I don't really care how it is
implemented, but changing the base page size doesn't meet the
"efficiency" or "flexibility" requirement I have.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-