On Wed, Apr 08, 2009 at 10:37:59PM +0200, Andrea Righi wrote:
[..]
I agree here. In some scenarios people might want to put an upper cap on BW
even if more BW is available, and in other scenarios people would like to do
proportional distribution and let a group get a larger share of the disk if
it is free.
Can you please elaborate a bit on this? Are you concerned that the data
structures created to solve the problem will consume a lot of memory?
I think setting a maximum limit on dirty pages is an interesting thought.
It sounds as if the memory controller could handle it?
I guess currently the memory controller puts a limit on the total amount of
memory consumed by a cgroup and there are no knobs for the type of memory
consumed. So if one can limit the amount of dirty page cache memory per
cgroup, it automatically throttles the async writes at the input itself.
So I agree that if we can limit a process from dirtying too much memory,
then an IO scheduler level controller should be able to do both
proportional weight and max BW control.
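
To illustrate what I mean by throttling at the input, here is a toy
sketch in Python. It is not how the memory controller works today; the
class and method names are made up, and a real implementation would put
the writer to sleep instead of returning False.

```python
class DirtyLimit:
    """Toy per-cgroup cap on dirty page cache pages (hypothetical names)."""

    def __init__(self, limit_pages):
        self.limit = limit_pages   # max dirty pages allowed for this cgroup
        self.dirty = 0             # pages currently dirty, not yet written back

    def try_dirty(self, pages):
        """Return True if the writer may dirty `pages` more pages.

        False means the writer has to wait for writeback, which is what
        slows down async writes before they ever reach the IO scheduler.
        """
        if self.dirty + pages > self.limit:
            return False
        self.dirty += pages
        return True

    def writeback_done(self, pages):
        """Called when `pages` dirty pages have been written to disk."""
        self.dirty = max(0, self.dirty - pages)


group = DirtyLimit(limit_pages=100)
first = group.try_dirty(80)     # fits under the limit
second = group.try_dirty(40)    # would exceed the limit, writer must wait
group.writeback_done(50)        # writeback frees room
third = group.try_dirty(40)     # now it fits again
```

The point is only that the writer gets blocked at dirtying time, so the
amount of async IO queued per cgroup stays bounded.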
Currently doing proportional weight control for async writes is very
tricky. I am not seeing constantly backlogged traffic at the IO scheduler
level and hence two processes with different weights seem to be getting
the same BW.
I will dive deeper into the dm-ioband patches to see how they have
solved this issue. It looks like they just wait longer for the slowest
group to consume its tokens, and that will keep the disk idle. Extended
delays might not show up immediately as a performance hog, because they
might also promote increased merging, but they should lead to increased
response latency. And proving latency issues is hard. :-)
IIUC, you are saying that we allow a hierarchy in user space and then
flatten it out and pass it to the kernel?
Hmm.., agree that handling hierarchies is hard and expensive. But at the
same time the rest of the controllers, like cpu and memory, are handling
it in the kernel, so it probably makes sense to keep the IO controller in
line with them. In practice I am not expecting deep hierarchies. Maybe 2-3
levels would be good for most people.
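
Just to check my understanding of the flattening idea, here is a rough
sketch in Python of one way user space could turn a weight hierarchy into
flat per-leaf shares. The tree layout, the group names and the flatten()
helper are all hypothetical, not taken from any patch.

```python
def flatten(node, share=1.0, path=""):
    """Return {leaf_path: effective share} for a nested weight tree.

    node is a dict with an optional "children" mapping of name -> child,
    and every child carries an integer "weight". A group's share is split
    among its children in proportion to their weights.
    """
    children = node.get("children", {})
    if not children:
        return {path or "/": share}
    total = sum(child["weight"] for child in children.values())
    flat = {}
    for name, child in children.items():
        child_share = share * child["weight"] / total
        flat.update(flatten(child, child_share, f"{path}/{name}"))
    return flat


# Two levels: A and B at the top, B split further into B1 and B2.
tree = {
    "children": {
        "A": {"weight": 1},
        "B": {"weight": 2, "children": {
            "B1": {"weight": 1},
            "B2": {"weight": 3},
        }},
    }
}
flat = flatten(tree)   # /A gets 1/3, /B/B1 gets 1/6, /B/B2 gets 1/2
```

One thing a static flattening like this loses: in a real hierarchy, if B1
goes idle its share stays inside B and goes to B2, whereas with flat
weights it gets redistributed among all the remaining groups, A included.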
Hmm.., I think that should work. I have yet to look at your patches in
detail, but it looks like an unlimited BW group will not be throttled at
all, hence RT tasks can just go right through without being impacted.
Hmm..., I am not very sure here. When the admin is allocating the weights,
he has the whole picture. He knows how many groups are contending for the
disk and what the worst case scenario could be. So if I have two groups,
A and B, with weights 1 and 2, and both are contending, then as an
admin one would expect to get 33% of BW for group A in the worst case (if
group B is continuously backlogged). If B is not contending, then A can
get 100% of BW. So while configuring the system, will one not plan for the
worst case (33% for A, and 66% for B)?
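
To make the arithmetic concrete, here is a small sketch in Python. The
function names are mine and this only models the ideal split, assuming
every active group is continuously backlogged.

```python
def worst_case_shares(weights):
    """Share of disk BW per group when every group is backlogged."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def shares_when_active(weights, active):
    """Share of disk BW among only the groups that have IO queued."""
    contending = {name: w for name, w in weights.items() if name in active}
    return worst_case_shares(contending)


weights = {"A": 1, "B": 2}                    # weights from the example above
both = worst_case_shares(weights)             # A gets 1/3, B gets 2/3
alone = shares_when_active(weights, {"A"})    # B idle, A gets the whole disk
```

So the worst case number is simply weight / sum of all weights, and
anything a group gets beyond that is a bonus from others being idle.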
Will the same thing not happen in proportional weight? If it is an RT
application, one can put it in RT groups to make sure it always gets
the BW first even if there is contention.
Even in a regular group, the moment you issue the IO and the IO scheduler
sees it, you will start getting your reserved share according to your
weight. How will it be different in the case of io throttling? Even if I
don't utilize the disk fully, cfq will still put the new guy in the queue
and then try to give it its share (based on prio).
Are you saying that by keeping the disk relatively free, the response
latency for soft real time applications will become better? In that
case can't one simply underprovision the disk?
But having said that, I am not disputing the need for a max BW controller,
as some people have expressed the need for a constant BW view and don't
want large fluctuations even if BW is available. A max BW controller
can't guarantee the minimum BW and hence can't avoid the fluctuations
completely, but it can still help in smoothing the traffic because
other competitors will be stopped from doing too much IO.
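
As an illustration of why a cap limits the top but guarantees nothing at
the bottom, here is a minimal token bucket sketch in Python. This is a
generic textbook scheme, not the code from the io-throttle patches; the
class, its parameters and the byte-based accounting are my assumptions.

```python
class TokenBucket:
    """Caps throughput at `rate` bytes/sec with bursts up to `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the last refill, in seconds

    def delay_for(self, nbytes, now):
        """Charge `nbytes` and return how long (seconds) the IO must wait."""
        # Refill for the time elapsed since the last request, up to `burst`.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        # Charge the request; a negative balance is debt to be waited out.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


bucket = TokenBucket(rate=1000, burst=1000)
d1 = bucket.delay_for(1000, now=0.0)   # within the burst, no delay
d2 = bucket.delay_for(500, now=0.0)    # over the cap, has to wait
d3 = bucket.delay_for(500, now=1.0)    # debt repaid by then, no delay
```

Nothing in there looks at what the disk can actually deliver, so the
group can be delayed but never helped if the disk itself is the
bottleneck.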
Thanks
Vivek
--