On Wed, 2008-08-06 at 22:12 +0530, Balbir Singh wrote:
As Dave pointed out, I just think that we should allow each disk to be
treated separately. To avoid the administration nightmare you mention,
adding block device grouping capabilities should suffice to solve most
of the issues.
That is a really good question. The I/O tracking patches split the
memory controller in two functional parts: (1) page tracking and (2)
memory accounting/cgroup policy enforcement. By doing so, the memory
controller specific code can be separated from the rest. Admittedly,
this will not benefit the memory controller a great deal but,
hopefully, we can get cleaner code that is easier to maintain.
The important thing, though, is that with this separation the page
tracking bits can be easily reused by any subsystem that needs to keep
track of pages, and the I/O controller is certainly one such candidate.
Synchronous I/O is easy to deal with because everything is done in the
context of the task that generated the I/O, but buffered I/O and
asynchronous I/O are problematic. However, with the observation that the
owner of an I/O request happens to be the owner of the pages the I/O
buffers of that request reside in, it becomes clear that pdflush and
friends could use that information to determine who the originator of
the I/O is and handle the I/O request accordingly.
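
To make the idea concrete, here is a rough userspace model of it (not
kernel code, and not taken from the patches). All of the names below
(page_model, io_request_model, dirty_page, submit_writeback) are made up
for illustration. The only point is that the owner is recorded when the
page is dirtied, and the flusher charges the request to that owner
rather than to itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names do not come from the patches. */
typedef int owner_id;

struct page_model {
    owner_id owner;   /* group that last dirtied the page */
    int dirty;        /* non-zero while the page awaits writeback */
};

struct io_request_model {
    const struct page_model *page;
    owner_id charged_to;  /* who the I/O is accounted to */
};

/* Runs in the context of the task that writes to the page,
 * so the owner is known at this point. */
static void dirty_page(struct page_model *pg, owner_id current_owner)
{
    assert(pg != NULL);
    pg->owner = current_owner;
    pg->dirty = 1;
}

/* Runs later in a pdflush-like thread. The flusher's own id is
 * deliberately ignored: the request is charged to the page owner. */
static struct io_request_model submit_writeback(struct page_model *pg,
                                                owner_id flusher)
{
    struct io_request_model req;

    assert(pg != NULL);
    (void)flusher;
    req.page = pg;
    req.charged_to = pg->owner;
    pg->dirty = 0;
    return req;
}
```

In this model a page dirtied by owner 7 and written back by a flusher
with id 1 yields a request charged to 7, which is what an I/O
controller would want to see.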
Going back to your question, with the current I/O tracking patches the
I/O controller would be bound to the page tracking functionality of
cgroups (page_cgroup), not the memory controller. We would not even need to
compile the memory controller. The dependency on cgroups would still be
there though.
As an aside, I guess that with some effort we could get rid of this
dependency by providing some basic tracking capabilities even when the
cgroups infrastructure is not being used. By doing so traditional I/O
schedulers such as CFQ could benefit from proper I/O tracking
capabilities without using cgroups. Of course, if the kernel has cgroups
support compiled in, the cgroups I/O tracking would be used instead (this
idea was inspired by CFS' group scheduling, which works both with and
without cgroups support). I am currently trying to implement this.
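
Just to illustrate what I have in mind, a very rough sketch follows.
It is a userspace model, not the implementation: MODEL_HAVE_CGROUPS is
a made-up switch standing in for a kernel config option, and falling
back to the pid as the owner is only my assumption about what "basic
tracking" could mean:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch standing in for a kernel config option. */
#ifndef MODEL_HAVE_CGROUPS
#define MODEL_HAVE_CGROUPS 0
#endif

struct task_model {
    int pid;        /* always available */
    int cgroup_id;  /* only meaningful when cgroups are compiled in */
};

/* Return the id an I/O scheduler would treat as the owner of the I/O.
 * With cgroups the group id is used; without them we fall back to a
 * per-task id so that tracking still works. */
static int io_owner_of(const struct task_model *t)
{
    assert(t != NULL);
#if MODEL_HAVE_CGROUPS
    return t->cgroup_id;
#else
    return t->pid;
#endif
}
```

The caller sees the same interface in both configurations, which is
roughly how a scheduler such as CFQ could stay unaware of whether
cgroups are in use.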
Yes, makes sense.
Thank you!
--