Hmm, maybe worth trying. We may be able to set/clear all DIRTY/WRITEBACK bits
on page_cgroup without mapping->tree_lock.
In that case, of course, the page itself should be locked by lock_page().
But, hmm... for example:
account_page_dirtied() is the best place to mark page_cgroup dirty. But
it's called under mapping->tree_lock.
Another thought:
I wonder whether we have to change our approach to dirty page accounting.
Please see task_dirty_inc(). It's for per-task dirty limiting.
And you'll soon notice that there is no task_dirty_dec().
Making use of lib/proportions.c's proportion calculation, as the per-task dirty
limit and the per-bdi dirty limit do, is worth considering.
This is very simple and can be implemented without the problems we have now.
(We need to think about the algorithm itself, but it's already in use and
works well.)
We'll never see complicated race conditions.
I know some people want "accurate" accounting, but I myself don't want too much.
Using proportions.c can offer us a unified approach with per-task dirty
accounting or per-bdi dirty accounting.
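
To show why an increment-only counter is enough, here is a minimal user-space
sketch. It is not the lib/proportions.c code; struct prop_sketch, prop_inc()
and prop_share_percent() are names I made up, and the aging rule (halve
everything once the total reaches a fixed period) is a simplification. The
point is only that old events fade out by aging, so no "dec" is ever needed.

```c
#include <assert.h>

#define NGROUPS 2   /* hypothetical: number of groups being compared */

/* Hypothetical stand-in for a proportion counter; not the kernel struct. */
struct prop_sketch {
	unsigned long events[NGROUPS];  /* per-group count, only ever incremented */
	unsigned long total;            /* sum of all per-group counts */
	unsigned long period;           /* age the counters when total reaches this */
};

/* Record one dirtying event for group g. There is no matching decrement. */
static void prop_inc(struct prop_sketch *p, int g)
{
	int i;

	assert(g >= 0 && g < NGROUPS);
	p->events[g]++;
	p->total++;

	if (p->total >= p->period) {
		/* Aging: halve every counter so that recent events dominate. */
		p->total = 0;
		for (i = 0; i < NGROUPS; i++) {
			p->events[i] /= 2;
			p->total += p->events[i];
		}
	}
}

/* Share of recent events that belongs to group g, in percent. */
static unsigned long prop_share_percent(const struct prop_sketch *p, int g)
{
	assert(g >= 0 && g < NGROUPS);
	if (p->total == 0)
		return 0;
	return p->events[g] * 100 / p->total;
}
```

Nothing here has to be updated when a page is cleaned, so there is no
per-page state to keep in sync.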
If we do so, memcg will have an interface like the per-bdi dirty ratio (see below):
[kamezawa@bluextal kvm2]$ ls /sys/block/dm-0/bdi/
max_ratio min_ratio power read_ahead_kb subsystem uevent
Maybe
memory.min_ratio
memory.max_ratio
And use this instead of task_dirty_limit(current, pbdi_dirty), like:
	if (mem_cgroup_dirty_ratio_support(current)) /* returns 0 for the root cgroup */
		memcg_dirty_limit(current, pbdi_dirty, xxxx?);
	else
		task_dirty_limit(current, pbdi_dirty);
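
As a rough idea of how a min/max ratio pair could shape such a limit, here is
a user-space sketch. It assumes the memcg knobs would behave like the per-bdi
min_ratio/max_ratio (a guaranteed floor plus a cap, both in percent of the
overall dirty limit). memcg_dirty_limit_sketch() and its arguments are
hypothetical and are not an existing kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale the overall dirty limit by a group's recent
 * share (numerator/denominator), then apply a floor and a cap.
 *
 *   dirty      overall dirty limit, in pages
 *   min_ratio  percent of 'dirty' the group always gets
 *   max_ratio  percent of 'dirty' the group can never exceed
 */
static unsigned long memcg_dirty_limit_sketch(unsigned long dirty,
					      unsigned long numerator,
					      unsigned long denominator,
					      unsigned int min_ratio,
					      unsigned int max_ratio)
{
	unsigned long limit, cap;

	assert(denominator > 0 && numerator <= denominator);
	assert(min_ratio <= max_ratio && max_ratio <= 100);

	/* Distribute what is left after the reserved floor by proportion. */
	limit = dirty * (100 - min_ratio) / 100;
	limit = limit * numerator / denominator;

	/* Add the guaranteed floor back. */
	limit += dirty * min_ratio / 100;

	/* Never exceed the cap. */
	cap = dirty * max_ratio / 100;
	return limit > cap ? cap : limit;
}
```

For example, with dirty = 1000 pages, min_ratio = 10 and max_ratio = 50, a
group with a 1/4 share gets 325 pages, and a group with a 3/4 share is capped
at 500.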
To be honest, I don't want to increase the number of callers of
lock_page_cgroup(), and I don't want to see complicated race conditions.
Thanks,
-Kame
--