Vivek, thanks for the detailed explanation. Only a comment. I guess, if
we don't change also the per-process optimizations/improvements made by
some IO scheduler, I think we can have undesirable behaviours.
For example: CFQ uses the per-process iocontext to improve fairness
between *all* the processes in a system. But it doesn't have the concept
that there's a cgroup context on-top-of the processes.
So, some optimizations made to guarantee fairness among processes could
conflict with algorithms implemented at the cgroup layer. And
potentially lead to undesirable behaviours.
For example an issue I'm experiencing with my cgroup-io-throttle
patchset is that a cgroup can consistently increase the IO rate (always
respecting the max limits), simply increasing the number of IO worker
tasks respect to another cgroup with a lower number of IO workers. This
is probably due to the fact the CFQ tries to give the same amount of
"IO time" to all the tasks, without considering that they're organized
in cgroup.
I don't see this behaviour with noop or deadline, because they don't
have the concept of iocontext.
-Andrea
--