Documentation of the block device I/O controller: description, usage,
advantages and design.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 Documentation/controllers/io-throttle.txt |  377 +++++++++++++++++++++++++++++
 1 files changed, 377 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/controllers/io-throttle.txt

diff --git a/Documentation/controllers/io-throttle.txt b/Documentation/controllers/io-throttle.txt
new file mode 100644
index 0000000..09df0af
--- /dev/null
+++ b/Documentation/controllers/io-throttle.txt
@@ -0,0 +1,377 @@
+
+               Block device I/O bandwidth controller
+
+----------------------------------------------------------------------
+1. DESCRIPTION
+
+This controller allows to limit the I/O bandwidth of specific block devices for
+specific process containers (cgroups) imposing additional delays on I/O
+requests for those processes that exceed the limits defined in the control
+group filesystem.
+
+Bandwidth limiting rules offer better control over QoS with respect to priority
+or weight-based solutions that only give information about applications'
+relative performance requirements. Nevertheless, priority based solutions are
+affected by performance bursts, when only low-priority requests are submitted
+to a general purpose resource dispatcher.
+
+The goal of the I/O bandwidth controller is to improve performance
+predictability from the applications' point of view and provide performance
+isolation of different control groups sharing the same block devices.
+
+NOTE #1: If you're looking for a way to improve the overall throughput of the
+system probably you should use a different solution.
+
+NOTE #2: The current implementation does not guarantee minimum bandwidth
+levels, the QoS is implemented only slowing down I/O "traffic" that exceeds the
+limits specified by the user; minimum I/O rate thresholds are supposed to be
+guaranteed if the user configures a proper I/O bandwidth partitioning of the
+block ...
Hi Andrea,

I have a question: what's your use case for capping max bandwidth? I was
wondering whether proportional bandwidth would not cover it. That is, we
allocate a weight/share to every cgroup and limit bandwidth based on shares
only in case of contention; otherwise applications get unlimited bandwidth.
This is much like what the cpu controller does, and for that matter dm-ioband
seems to be doing the same thing.

Will you not get the same kind of QoS here, compared to max bandwidth? The
only thing probably missing is what we call a hard limit: BW is available, but
you don't want a user to use that BW unless the user has paid for it.

Thanks
Vivek
--
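To make the contrast in the question above concrete, here is a minimal userspace sketch in Python. It is not kernel code and not taken from the patch or from dm-ioband; the function names, group names, and MB/s figures are all hypothetical. It models a work-conserving weighted split, where shares only matter under contention, next to a hard per-group cap that applies even when the device is idle.

```python
def proportional_share(capacity, demands, weights):
    """Work-conserving weighted split of `capacity` among groups.

    Each group gets at most its demand; bandwidth left unused by one
    group is redistributed to the others in proportion to their weights.
    (Hypothetical model, for illustration only.)
    """
    alloc = {g: 0.0 for g in demands}
    active = {g for g in demands if demands[g] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[g] for g in active)
        # Tentative weighted share of what is still unallocated.
        share = {g: remaining * weights[g] / total_w for g in active}
        satisfied = {g for g in active if demands[g] - alloc[g] <= share[g]}
        if not satisfied:
            # Every active group wants more than its share: contention.
            for g in active:
                alloc[g] += share[g]
            break
        # Fully serve groups whose demand fits, then redistribute the rest.
        for g in satisfied:
            remaining -= demands[g] - alloc[g]
            alloc[g] = float(demands[g])
        active -= satisfied
    return alloc


def hard_cap(demands, limits):
    """Max-bandwidth policy: each group is clipped to its own limit,
    no matter how idle the device is."""
    return {g: float(min(demands[g], limits[g])) for g in demands}
```

With a 100 MB/s device and weights 3:1, two groups that both ask for 100 MB/s get 75 and 25. If the first group goes idle, the proportional policy lets the second group take the full 100 MB/s, while `hard_cap` with a 25 MB/s limit still holds it at 25. That difference is the "hard limit" case mentioned above.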
At the beginning my use case was to guarantee a certain level of performance
_predictability_. That means no more and no less than the specified threshold
(should I say this would be useful for real-time apps? Maybe yes).

But at this stage of development IMHO it's worth implementing a more generic
solution, able to guarantee both min/max thresholds (to cover my original use
case) as well as the weight/share functionality, to cover a wider range of use
cases (QoS for massive shared environments).

-Andrea
--
Is "no more" harmful for a real-time env? Which RT application hates getting
more bandwidth than it asked for? I could understand "no less", but you
mentioned in the past that implementing minimum guarantees is a lot harder.

I was thinking: what if we stick to the current policy of letting RT requests
go first and use disk BW first? cfq first dispatches requests of the RT class
(based on their priority). So in a simple implementation, the IO controller
will let all RT class requests go directly to the elevator and then let the
elevator dispatch these requests based on their RT prio. The IO controller
will only buffer and control requests of non-RT classes.

This will make sure that we don't break existing working RT applications and
are still able to divide the remaining disk BW among other non-RT tasks. IMHO,
once the above simple scheme is working, we can probably extend it to provide
additional levels of control.

Thanks
Vivek
--
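The scheme proposed above can be sketched as a toy userspace model in Python. This is only an illustration under stated assumptions: the `Elevator` and `IOController` classes, the request dictionaries, and the "lower number means higher priority" convention are hypothetical stand-ins, not cfq or any kernel interface. RT requests bypass the controller and are ordered by priority in the elevator; everything else is buffered until the controller releases it.

```python
import heapq
from collections import deque

RT, BE = "rt", "be"  # hypothetical labels for I/O scheduling classes


class Elevator:
    """Toy stand-in for an elevator that always serves RT requests first."""

    def __init__(self):
        self._rt = []           # heap of (prio, arrival order, request)
        self._other = deque()   # non-RT requests, FIFO
        self._seq = 0

    def add(self, req):
        self._seq += 1
        if req["class"] == RT:
            # Assumption: a lower prio value means a higher priority.
            heapq.heappush(self._rt, (req["prio"], self._seq, req))
        else:
            self._other.append(req)

    def dispatch(self):
        """Return the next request to serve, or None if nothing is queued."""
        if self._rt:
            return heapq.heappop(self._rt)[2]
        if self._other:
            return self._other.popleft()
        return None


class IOController:
    """Buffers and controls non-RT requests only; RT goes straight through."""

    def __init__(self, elevator):
        self.elevator = elevator
        self.buffered = deque()

    def submit(self, req):
        if req["class"] == RT:
            self.elevator.add(req)      # bypass: no buffering, no throttling
        else:
            self.buffered.append(req)   # held until the policy releases it

    def release(self, n):
        """Hand up to n buffered requests to the elevator.

        How n is chosen (shares, max bandwidth, ...) is the policy being
        debated in this thread and is deliberately left out of the sketch.
        """
        released = 0
        while self.buffered and released < n:
            self.elevator.add(self.buffered.popleft())
            released += 1
        return released
```

In this model an RT request never waits behind the controller's buffer, so existing RT behaviour is unchanged, and whatever bandwidth policy is chosen only has to reason about the non-RT queue.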
RT doesn't mean as fast as possible; the objective of RT is to meet the
individual timing requirement. So the most important property for RT should be
predictability. If you know that an application requires exactly T seconds to
read a block from a device (no more, no less), well... in this case you're not
introducing uncertainty in your RT task.

And I agree on the "no-less" part. It's difficult, but there's surely [...]

Sounds reasonable, since we want to give more guarantees to respect minimum bw
requirements for RT tasks.

-Andrea
--
