On Wed, Sep 24, 2008 at 05:29:37PM +0900, Hirokazu Takahashi wrote:
Ok, I will give more details of the thought process.
I was thinking of maintaing an rb-tree per request queue and not an
rb-tree per cgroup. This tree can contain all the bios submitted to that
request queue through __make_request(). Every node in the tree will represent
one cgroup and will contain a list of bios issued from the tasks from that
cgroup.
Every bio entering the request queue through __make_request() function
first will be queued in one of the nodes in this rb-tree, depending on which
cgroup that bio belongs to.
Once the bios are buffered in rb-tree, we release these to underlying
elevator depending on the proportionate weight of the nodes/cgroups.
Some more details which I was trying to implement yesterday.
There will be one bio_cgroup object per cgroup. This object will contain
many bio_group objects. Each bio_group object will be created for each
request queue where a bio from bio_cgroup is queued. Essentially the idea
is that bios belonging to a cgroup can be on various request queues in the
system. So a single object can not serve the purpose as it can not be on
many rb-trees at the same time. Hence create one sub object which will keep
track of bios belonging to one cgroup on a particular request queue.
Each bio_group will contain a list of bios and this bio_group object will
be a node in the rb-tree of request queue. For example. Lets say there are
two request queues in the system q1 and q2 (lets say they belong to /dev/sda
and /dev/sdb). Let say a task t1 in /cgroup/io/test1 is issueing io both
for /dev/sda and /dev/sdb.
bio_cgroup belonging to /cgroup/io/test1 will have two sub bio_group
objects, say bio_group1 and bio_group2. bio_group1 will be in q1's rb-tree
and bio_group2 will be in q2's rb-tree. bio_group1 will contain a list of
bios issued by task t1 for /dev/sda and bio_group2 will contain a list of
bios issued by task t1 for /dev/sdb. I thought the same can be extended
for stacked devices also.
I am still trying to implementing it and hopefully this is doable idea.
I think at the end of the day it will be something very close to dm-ioband
algorithm just that there will be no lvm driver and no notion of separate
dm-ioband device.
In my initial implementation I was queuing the request descriptors. Then
you mentioned that it is not a good idea because potentially a cgroup
issuing more requests might win the race.
Yesterday night I thought, then why not start queuing the bios as they
are submitted to the request_queue, using __make_request() and then
release these to underlying elevator or underlying request queue (in case
of stacked device). This will remove few issues.
- All the layers can uniformly queue bios and no intermixing of queuing
bios and request descriptors.
- Will get rid of issue of one cgroup winning the race because of limited
number of request descriptors.
Now I am planning to queue bios and probably there is no need to queue
request descriptors. I think that's what dm-ioband is doing. Queueing
bios for cgroups per io-band device.
Thinking more about it, In dm-ioband case, you seem to be buffering bios
from various cgroups on a separate request queue belonging to dm-ioband
device. I was thinking of moving all that buffering logic to existing
request queues instead of creating another request queue on top of request
queue I want to control (dm-ioband device).
That's true. rb-tree or list is just data structure detail. It is not
important. The core thing I am trying to achive is that is there a way that
I can get rid of notion of creating a separate dm-ioband device for every
device I want to control.
Is it just me who finds creation of dm-ioband devices odd and difficult to
manage or there are other people who think that it would be nice if we can get
rid of it?
Thanks
Vivek
--