This is a totally experimental patch against 2.6.27-rc5-mm1.
It allows one to control how many dirty file pages a cgroup can have at
any given time. This feature is meant to work closely with a generic
cgroup IO controller (see below).
Interface: a new entry "filedirty" is added to the file memory.stat,
reporting the number of dirty file pages (in pages), and a new file
memory.file_dirty_limit_in_pages is added to the cgroup filesystem to
show/set the current limit.
The overall design is the following.
Dirty file pages are accounted per cgroup using the memory controller's
statistics. The memory controller also allows defining an upper bound on
the number of dirty file pages. When this upper bound is exceeded, the
tasks in the cgroup are forced to write back dirty pages until the
cgroup is back within its limit.
With this functionality, a generic cgroup IO controller can apply any
kind of limitation or shaping policy directly to the IO requests
(elevator, IO scheduler, ...) without having to care about how fast the
userspace applications dirty pages in memory, generating lots of
hard/slow-to-reclaim pages (or even potential OOM conditions), because
the applications will actually be throttled by the IO controller when
they start to write back pages.
[ Honestly, I don't like this implementation in memcgroup. I'm using the
memcgroup statistics to account for the dirty file pages, but I'm also
adding a variable to struct mem_cgroup to implement the dirty file page
limit. A struct res_counter would seem more appropriate, but I can't use
the res_counter_[un]charge() interface, because the dirty file page
limit is a soft limit and can be exceeded without any problem; when it
is exceeded, the task is simply forced to write back dirty pages until
it is back within the allowed dirty limit.
Suggestions are welcome. ]
Signed-off-by: Andrea Righi <email@example.com>
fs/buffer.c | 1 +
fs/reiser4/as_ops.c | 4 ++-
mmmh.. maybe it's a bit more complex (would it add some overhead?) to
translate the limit from dirty_ratio into pages or bytes, because we
need to evaluate it as a function of the per-cgroup dirtyable memory
(LRU pages plus free pages, I suppose). Maybe it's enough to implement
it directly in determine_dirtyable_memory().
I can try to implement it and post a new patch.
Correct, it's the same functionality provided by vm.dirty_ratio and
vm.dirty_background_ratio, except that it is intended to be per-cgroup.
Without this functionality, a cgroup can dirty even all of the memory
allocated to it by the memory controller, since the writeback statistics
and configuration are global.