On Tue, 2007-09-11 at 18:47 +0200, Andrea Arcangeli wrote:
Hi,
In practice, it's pretty difficult to trigger. Buddy allocators always
try to use the smallest possible sized buddy to split. Once a 64K block
is split for a 4K or 8K allocation, the remainder of that block will be
used for other 4K, 8K, 16K, 32K allocations. The situation where
multiple 64K blocks get split does not occur.
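
To make the splitting behaviour concrete, here is a minimal toy sketch of
a buddy allocator. It is not the kernel's implementation: the class name
ToyBuddy, the helper intact_64k() and the cap at order 4 (64K) are all
assumptions made for illustration, and freeing/merging is left out.

```python
MAX_ORDER = 4  # assumption: order 4 = 16 pages of 4K = one 64K block


class ToyBuddy:
    """Hypothetical toy allocator; tracks free blocks by start page."""

    def __init__(self, num_blocks):
        # free[order] holds the start page of each free block of that order
        self.free = {o: [] for o in range(MAX_ORDER + 1)}
        self.free[MAX_ORDER] = [i * 16 for i in range(num_blocks)]

    def alloc(self, order):
        # Search upward for the smallest free block that fits the request.
        for o in range(order, MAX_ORDER + 1):
            if self.free[o]:
                start = self.free[o].pop(0)
                # Split down, putting the upper half back on the free
                # list at each step so later requests can reuse it.
                while o > order:
                    o -= 1
                    self.free[o].append(start + (1 << o))
                return start
        raise MemoryError("no free block large enough")

    def intact_64k(self):
        # Number of 64K blocks that have never been split.
        return len(self.free[MAX_ORDER])


buddy = ToyBuddy(4)
# 4K, 4K, 8K, 16K, 32K requests, in that order
starts = [buddy.alloc(order) for order in (0, 0, 1, 2, 3)]
print(starts, buddy.intact_64k())
```

In this toy, all five allocations land inside the first 64K block and the
other three blocks stay whole, which is the behaviour described above.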
Now, the worst case scenario for your patch is that a hostile process
allocates a large amount of memory and mlock()s one 4K page per 64K chunk
(this is unlikely in practice, I know). The end result is you have many
64K regions that are now unusable because 4K is pinned in each of them.
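
As a rough back-of-the-envelope sketch of that worst case, the arithmetic
can be written down as below. The names hostile_pattern() and
chunks_blocked() are hypothetical, and the 1 GiB figure is only an example
size, not something measured.

```python
PAGE = 4 * 1024            # 4K base page
CHUNK = 64 * 1024          # 64K large page
PAGES_PER_CHUNK = CHUNK // PAGE


def hostile_pattern(total_pages):
    # Assumed attack: pin the first 4K page of every 64K chunk.
    return list(range(0, total_pages, PAGES_PER_CHUNK))


def chunks_blocked(pinned_pages):
    # A 64K chunk is unusable as a whole if any page inside it is pinned.
    return len({page // PAGES_PER_CHUNK for page in pinned_pages})


total_pages = (1 << 30) // PAGE          # example: 1 GiB of memory
pinned = hostile_pattern(total_pages)

locked_bytes = len(pinned) * PAGE
blocked_bytes = chunks_blocked(pinned) * CHUNK
print(locked_bytes >> 20, "MiB locked,", blocked_bytes >> 20, "MiB blocked")
```

With these assumptions, locking one sixteenth of the memory is enough to
leave every 64K chunk with a pinned 4K page in it.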
Your approach is not immune from problems either. To me, only Nick's
approach is bullet-proof in the long run.
This is true. SLUB targeted reclaim (Christoph's work) is useful
independent of this current problem.
I agree with this. It's why I thought Nick's approach was where we were
going to finish up ultimately.
I have never stated that the SGI design is immune from this problem.
You will need some sort of defragmentation to deal with internal
fragmentation. It's a very similar problem to blasting away at slab
pages and still not being able to free them because objects are in use.
Replace "slab" with "large page" and "object" with "4k page" and the
issues are similar.
If it's my fault, sorry about that. It wasn't my intention.
Who said it was off-topic? Again, if this was me, sorry - you should
have chucked something at my head to shut me up.
Right, clearly we failed or at least had sub-optimal results discussing
this one at the VM Summit. Good job we have mail to pick up the stick with.
heh. Well, we need to come to some sort of conclusion here or this will
go around the merry-go-round till we're all bald.
Ok, I'm ok with that.
heh, I suggested printing the warning because I knew it had this
problem. The purpose in my mind was to see how far the design could be
brought before fs-block had to fill in the holes.
It should be able to stack on top of either approach and arguably
setting slub_min_order=large_block_order with large block filesystems is
90% of your approach anyway.
I am still failing to see what happens when there are pagetable pages,
slab objects or mlocked 4k pages pinning the 64K pages and you need to
allocate another 64K page for the filesystem. I *think* you deadlock in
a similar fashion to Christoph's approach but the shape of the problem
is different because we are dealing with internal instead of external
fragmentation. Am I wrong?
Yes. I just think you have a different worst case that is just as bad.
Small files (you need something like Shaggy's page tail packing),
pagetable pages, pte pages all have to be dealt with. These are the
things I think will cause us internal fragmentation problems.
I'd rather avoid depending on it for the system to work 100% of the
time. Hence I've been saying that we need fsblock ultimately for this to
be a 100% supported feature.
Quite possibly.
--
Mel Gorman