Cc: Goswin von Brederlow <brederlo@...>, Nick Piggin <nickpiggin@...>, Christoph Lameter <clameter@...>, <andrea@...>, <torvalds@...>, <linux-fsdevel@...>, <linux-kernel@...>, Christoph Hellwig <hch@...>, William Lee Irwin III <wli@...>, David Chinner <dgc@...>, Jens Axboe <jens.axboe@...>, Badari Pulavarty <pbadari@...>, Maxim Levitsky <maximlevitsky@...>, Fengguang Wu <fengguang.wu@...>, swin wang <wangswin@...>, <totty.lu@...>, <hugh@...>, <joern@...>
Why is it difficult?
When user space allocates memory wouldn't it get it contiguously? I mean
that is one of the goals, to use larger contiguous allocations and map
them with a single page table entry where possible, right? And then
you can roughly predict where an munmap() would free a page.
Say the application maps a few GB of a file, uses madvise() to tell
the kernel it needs a 2MB block (to get a contiguous 2MB chunk
mapped), waits for it and then munmaps 4K in there. That leaves a 4K
hole for some unmovable object to fill. If you can then trigger the
creation of an unmovable object as well (stat some file?) and loop,
you will fill the RAM quickly. Maybe it only works 10% of the time,
but then you just do it 10 times as often.
Over a long enough time it could occur naturally. This is just to
demonstrate it with malice.
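To put a rough number on the "10 times as often" argument, here is a
back-of-the-envelope sketch in Python. The 4 GiB mapping size and the
helper name expected_attempts are my own assumptions for illustration;
only the 2MB block size and the one-in-ten success rate come from the
scenario above.

```python
GROUP_BYTES = 2 * 1024 * 1024  # 2MB block from the scenario above


def expected_attempts(mapped_bytes, one_in):
    """Average number of hole-punching attempts needed to leave one
    unmovable object in every 2MB block of the mapping.

    one_in: the trick works once in this many attempts (10 means 10%).
    """
    groups = mapped_bytes // GROUP_BYTES
    return groups * one_in


# Assumed example: a 4 GiB mapping and a 10% success rate.
print(expected_attempts(4 * 1024**3, 10))
```

So a low success rate only scales the work linearly; it does not make
the attack impractical.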
Do you have a graphic like
http://www.skynet.ie/~mel/anti-frag/2007-02-28/page_type_distribution.jpg
for that case?
That assumes that the number of groups allocated for unmovable objects
will continuously grow and shrink. I'm assuming it will level off at
some size for long periods (hours) under normal operation. There should
be some buffering, with a few groups held back in reserve when it
shrinks, to prevent the scenario where the size is just at a group
boundary and always grows/shrinks by 1 group.
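The buffering I have in mind is plain hysteresis. A minimal sketch in
Python, assuming a made-up demand trace and a hypothetical
reserve_groups knob (neither exists in the kernel under these names):

```python
def count_resizes(demand, reserve_groups):
    """Count how often the number of unmovable groups changes.

    demand: groups actually needed at each step (hypothetical trace).
    reserve_groups: spare groups held back before shrinking (assumed knob).
    """
    allocated = 0
    resizes = 0
    for needed in demand:
        if needed > allocated:
            # Grow immediately: the allocation cannot wait.
            allocated = needed
            resizes += 1
        elif allocated - needed > reserve_groups:
            # Shrink only once the surplus exceeds the reserve.
            allocated = needed + reserve_groups
            resizes += 1
    return resizes


# Demand sitting right at a group boundary, flipping between 10 and 11.
trace = [10, 11] * 50
print(count_resizes(trace, reserve_groups=0))
print(count_resizes(trace, reserve_groups=2))
```

Without a reserve every step of that trace resizes; with two groups
held back the count stops changing after the initial growth.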
To copy you need a free page as destination. That's all I
meant. Hopefully there will always be a free one and the actual freeing
is done asynchronously from the copying.
I'm more concerned with keeping the little unmovable things out of the
way. Those are the things that will fragment the memory and prevent
any huge pages from being available even when other stuff is moved out
of the way.
It would also already be a big plus to have 64K contiguous chunks for
many operations. Guarantee that the filesystem and block layers can
always get such a page (by means of copying pages out of the way when
needed) and do even larger pages speculatively.
But as you say that is where min_free_kbytes comes in. To have the
chance of a 2MB contiguous free chunk you need to have much more
reserved free. Something I would certainly do on a 16GB RAM
server. Note that the buddy system will be better at having large free
chunks if user space allocates and frees large chunks as well. It is
much easier to make a 128K chunk out of 64K chunks than out of 4K
chunks. It is much more probable to get two adjacent 64K chunks than
thirty-two 4K chunks in series.
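A toy calculation in Python shows how steep that difference is. It
assumes every chunk is free independently with the same probability
and ignores buddy alignment, which the real allocator does not, so
take the numbers as an illustration only. The helper name
prob_all_free and the 50% figure are mine.

```python
def prob_all_free(p_free, units):
    """Chance that `units` neighbouring chunks are all free at once,
    assuming each one is free independently with probability p_free."""
    return p_free ** units


p = 0.5  # assumed: any given chunk is free half of the time
print(prob_all_free(p, 2))   # two 64K chunks forming one 128K chunk
print(prob_all_free(p, 32))  # thirty-two 4K chunks forming one 128K chunk
```

The probability drops exponentially with the number of pieces that
have to line up, which is why large allocations and frees from user
space help.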
Can you tell me how? I would like to do the same.
That might explain a lot. I was thinking that was with grouping. But
without grouping such a rather random distribution of unmovable
objects doesn't sound uncommon.
I meant on-demand eviction. Not evicting the whole group just because
we need 4K of it. Even looser would be to allow, say, 16 mixed groups
before moving pages out of the way when needed for an alloc. Best to
make it a proc entry and then change the value once a day and see how
it behaves. :)
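As a sketch of that policy in Python (not kernel code): the function
name, the return values and the max_mixed_groups tunable are all
hypothetical, with the tunable standing in for the proc entry.

```python
def on_unmovable_fallback(mixed_groups, max_mixed_groups=16):
    """Decide what to do when an unmovable allocation finds no room
    in the existing unmovable groups.

    mixed_groups: groups currently holding both movable and unmovable pages.
    max_mixed_groups: assumed tunable, the value the proc entry would set.
    """
    if mixed_groups < max_mixed_groups:
        # Still under the limit: let one more movable group become mixed.
        return "mix"
    # Limit reached: reuse an already mixed group and move out only the
    # movable pages in the way of this allocation, not the whole group.
    return "evict_on_demand"


print(on_unmovable_fallback(3))
print(on_unmovable_fallback(16))
```

Setting the limit to 0 gives the strict variant, where pages are
always moved on demand.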
MfG
Goswin