> On Tue, 13 Mar 2007 13:19:53 +0300 Kirill Korotaev <dev@sw.ru> wrote:
Not really - I mean "first allocated the page", i.e. major fault, read(),
write(), etc.
I'm not sure that we need to account for pages at all, nor care about rss.
If we use a physical zone-based containment scheme: fake-numa,
variable-sized zones, etc then it all becomes moot. You set up a container
which has 1.5GB of physical memory then toss processes into it. As that
process set increases in size it will toss out stray pages which shouldn't
be there, then it will start reclaiming and swapping out its own pages and
eventually it'll get an oom-killing.
No RSS accounting or page accounting in sight, because we already *have* that
stuff, at the physical level, in the zone.
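To make the point concrete, here is a toy model of that behaviour. It is a
sketch only, not kernel code: ToyZone, its fields and the page counts are all
made up for illustration. The thing to notice is that it never records which
process owns which page; the only state is the zone's own counters.

```python
from dataclasses import dataclass


@dataclass
class ToyZone:
    """Hypothetical stand-in for a container's physical zone."""
    total_pages: int   # fixed physical size of the zone
    swap_pages: int    # swap available to back reclaimed pages
    used: int = 0      # pages currently resident in the zone
    swapped: int = 0   # pages already pushed out to swap

    def allocate(self, pages: int) -> str:
        """Try to make `pages` resident; return "ok", "reclaimed" or "oom"."""
        if pages > self.total_pages:
            return "oom"  # can never fit, however much is reclaimed
        shortfall = pages - (self.total_pages - self.used)
        if shortfall <= 0:
            self.used += pages
            return "ok"
        if shortfall > self.swap_pages - self.swapped:
            return "oom"  # nowhere left to put reclaimed pages
        # Reclaim from the container's own resident pages, then allocate.
        self.used -= shortfall
        self.swapped += shortfall
        self.used += pages
        return "reclaimed"
```

Repeated allocate() calls against a small zone walk through the same sequence
as above: plain allocation, then reclaim and swap, then oom once swap runs out.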
Overcommitment can be performed by allowing different containers to share
the same zone set, or by dynamically increasing or decreasing the size of
a physical container.
This all works today with fake-numa and cpusets, no kernel changes needed.
It could be made to work fairly simply with a multi-zone approach, or with
resizeable zones.
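As a rough sketch of the fake-numa plus cpusets setup, the helper below only
builds the list of steps and touches nothing on disk. Assumptions, none of
which come from this mail: the machine was booted with numa=fake=<N>, the
cpuset filesystem is mounted at /dev/cpuset, and the control files are named
mems, cpus and tasks (cgroup-style mounts prefix them with "cpuset."). The
helper name, the container name, the node list and the pid are hypothetical.

```python
from pathlib import PurePosixPath


def cpuset_plan(name, mems, cpus, pids, root="/dev/cpuset"):
    """Return the steps needed to build a memory container from fake nodes.

    Each step is (action, path, value). Nothing is executed here; the caller
    decides whether to apply the steps as root.
    """
    base = PurePosixPath(root) / name
    steps = [("mkdir", str(base), "")]
    # Restrict the container to a set of (fake) memory nodes.
    steps.append(("write", str(base / "mems"), mems))
    # A cpuset needs CPUs assigned before tasks can be attached to it.
    steps.append(("write", str(base / "cpus"), cpus))
    # Attach each process by writing its pid to the tasks file, one at a time.
    for pid in pids:
        steps.append(("write", str(base / "tasks"), str(pid)))
    return steps


if __name__ == "__main__":
    # Example values only: with 512MB fake nodes, nodes 1-3 would give 1.5GB.
    for step in cpuset_plan("box", mems="1-3", cpus="0-1", pids=[1234]):
        print(step)
```

Two containers given overlapping mems values share those nodes, which is the
overcommitment case; rewriting mems later resizes the container.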
I'd be interested in knowing what you think the shortcomings of this are
likely to be.