On (12/09/07 16:17), Christoph Lameter didst pronounce:
One of Nick's points is that to have a 100% reliable solution, that is
what is required. We already have a layering between the VM and the FS
but my understanding is that fsblock replaces rather than adds to it.
Surely, we'll be able to detect the situation where the memory is really
contiguous as a fast path and have a slower path where fragmentation was
a problem.
This is going in circles.
His point is that we also cannot prove it is 100% correct in all
situations. Without taking additional (expensive) steps, there will be a
workload that fragments physical memory. He doesn't know what it is and neither
do we, but that does not mean that someone else won't find it. He also has a
point about the slow degradation from fragmentation that is woefully difficult
to reproduce. We've had this provability of correctness problem before.
His initial problem was not with the patches as such but the fact that they
seemed to be presented as a 1st class feature that we fully support and
as a potential solution for some VM and IO scalability problems. This is
not the case; we have to treat it as a 2nd class feature until we *know* no
situation exists where it breaks down. These patches on their own would have
to run for months if not a year or so before we could be really sure about it.
The only implementation question about these patches that hasn't been addressed
is the mmap() support. What's wrong with it in its current form? Can it be
fixed, or is it fundamentally screwed? That has fallen by the
wayside.
Unless callers always use an iterator for blocks that is optimised in the
physically linear case to be a simple array offset and when not physically
linear it either walks chains (complex) or uses vmap (must deal with TLB
flushes among other things). If it optimistically uses physically contiguous
memory, we may find a way to use only one page struct as well.
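
To make the iterator idea concrete, here is a rough userspace sketch of what
I mean. It is not taken from the patches or from fsblock; struct blk_buf,
blk_buf_init(), blk_buf_at() and CHUNK_SIZE are all names made up for
illustration, and CHUNK_SIZE merely stands in for the page size. Contiguity
is detected once at setup, so the common case is a plain array offset and
only the fragmented case pays for a per-chunk lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u	/* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical descriptor for a block made of CHUNK_SIZE pieces. */
struct blk_buf {
	uint8_t **chunks;	/* one pointer per CHUNK_SIZE piece */
	size_t nr_chunks;
	uint8_t *linear;	/* non-NULL only if all pieces are contiguous */
};

/* Detect contiguity once, so lookups can take the fast path later. */
static void blk_buf_init(struct blk_buf *b, uint8_t **chunks, size_t n)
{
	bool contig = n > 0;
	size_t i;

	for (i = 1; i < n; i++) {
		if ((uintptr_t)chunks[i] !=
		    (uintptr_t)chunks[i - 1] + CHUNK_SIZE) {
			contig = false;
			break;
		}
	}

	b->chunks = chunks;
	b->nr_chunks = n;
	b->linear = contig ? chunks[0] : NULL;
}

/* Return the address of byte 'off' in the block, or NULL if out of range. */
static uint8_t *blk_buf_at(const struct blk_buf *b, size_t off)
{
	if (off >= b->nr_chunks * CHUNK_SIZE)
		return NULL;

	/* Fast path: physically linear, a simple array offset. */
	if (b->linear)
		return b->linear + off;

	/* Slow path: find the owning piece, then index within it. */
	return b->chunks[off / CHUNK_SIZE] + off % CHUNK_SIZE;
}
```

The slow path here is an indexed table rather than a chain walk or vmap,
purely to keep the sketch short. The point is only that callers go through
blk_buf_at() and never need to know which path was taken.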
Or gee whiz, I don't know. Start with your patches as a strictly 2nd class
citizen and build fsblock in while trying to keep use of physically contiguous
memory where possible and it makes sense.
I am *very* wary of using reserve pools for anything other than
emergency situations. If nothing else pools == wasted memory + a sizing
problem. But hey, it is one option.
Are we going to agree on some sort of plan or are we just going to
handwave ourselves to death?
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab