Re: [PATCH 0/2] dm-band: The I/O bandwidth controller: Overview

From: Ryo Tsuruta
Date: Wednesday, January 23, 2008 - 5:53 am

Hi everyone,

I'm happy to announce that I've implemented a block I/O bandwidth controller.
The controller is designed for use in cgroup or virtual machine environments.
It is currently implemented as a device-mapper driver.

What's dm-band all about?
========================
Dm-band is an I/O bandwidth controller implemented as a device-mapper driver.
Several jobs using the same physical device have to share the bandwidth of
the device. Dm-band gives each job a share of that bandwidth in proportion to
its weight, and the weight can be set individually for each job.

At present, a job is a group of processes that share the same pid, pgrp or
uid. Support for cgroups is also planned. A job can also be a virtual machine
such as KVM or Xen.

  +------+ +------+ +------+   +------+ +------+ +------+ 
  |cgroup| |cgroup| | the  |   | pid  | | pid  | | the  |  jobs
  |  A   | |  B   | |others|   |  X   | |  Y   | |others| 
  +--|---+ +--|---+ +--|---+   +--|---+ +--|---+ +--|---+   
  +--V----+---V---+----V---+   +--V----+---V---+----V---+   
  | group | group | default|   | group | group | default|  band groups
  |       |       |  group |   |       |       |  group | 
  +-------+-------+--------+   +-------+-------+--------+
  |         band1          |   |         band2          |  band devices
  +-----------|------------+   +-----------|------------+
  +-----------V--------------+-------------V------------+
  |                          |                          |
  |          sdb1            |           sdb2           |  physical devices
  +--------------------------+--------------------------+
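
The mapping in the diagram (jobs to band groups to band devices) can be
illustrated with a small user-space model. This is only a sketch and not code
from the patch: the Task and BandDevice classes, the classify() method and the
group names are all hypothetical, chosen to mirror the right-hand side of the
diagram, where pids X and Y have their own groups and all other processes fall
into the default group.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Minimal stand-in for a process; the fields are illustrative only."""
    pid: int
    pgrp: int
    uid: int


@dataclass
class BandDevice:
    """Hypothetical model of one band device and its band groups."""
    name: str
    key: str  # attribute that defines a job: "pid", "pgrp" or "uid"
    groups: dict = field(default_factory=dict)  # job id -> group name

    def add_group(self, job_id: int, group_name: str) -> None:
        self.groups[job_id] = group_name

    def classify(self, task: Task) -> str:
        """Return the band group for a task; unmatched tasks use the default group."""
        job_id = getattr(task, self.key)
        return self.groups.get(job_id, "default")


# Mirror of "band2" in the diagram: two pids with their own groups.
band2 = BandDevice(name="band2", key="pid")
band2.add_group(100, "group-X")
band2.add_group(200, "group-Y")
```

Calling band2.classify() with a task whose pid is 100 returns "group-X", while
a task with any unregistered pid lands in "default".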


How dm-band works
========================
Every band device has one band group, which by default is called the default
group.

Band devices can also have extra band groups in them. Each band group
serves one job and has a weight. Dm-band gives tokens to each group in
proportion to its weight.
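
As a rough illustration of weight-proportional token distribution, here is a
minimal sketch. It is not the algorithm used in the patch: the function name
allocate_tokens, the fixed token budget and the use of integer division are
assumptions made purely for the example.

```python
def allocate_tokens(weights: dict, total_tokens: int) -> dict:
    """Split a token budget among band groups in proportion to their weights.

    Integer division rounds down, so a few tokens may remain unassigned.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one group must have a positive weight")
    return {
        name: total_tokens * weight // total_weight
        for name, weight in weights.items()
    }


# Hypothetical weights for three groups sharing 700 tokens.
tokens = allocate_tokens({"group-A": 40, "group-B": 20, "default": 10}, 700)
```

With these example weights a group with weight 40 receives four times as many
tokens as a group with weight 10.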

A group passes on I/O requests that its job issues to the ...
From: Ryo Tsuruta
Date: Wednesday, January 23, 2008 - 5:56 am

Here is the patch of dm-band.

Based on 2.6.23.14
Signed-off-by: Ryo Tsuruta <ryov@valinux.co.jp>
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>

diff -uprN linux-2.6.23.14.orig/drivers/md/Kconfig linux-2.6.23.14/drivers/md/Kconfig
--- linux-2.6.23.14.orig/drivers/md/Kconfig	2008-01-15 05:49:56.000000000 +0900
+++ linux-2.6.23.14/drivers/md/Kconfig	2008-01-21 16:09:41.000000000 +0900
@@ -276,4 +276,13 @@ config DM_DELAY
 
 	If unsure, say N.
 
+config DM_BAND
+	tristate "I/O band width control "
+	depends on BLK_DEV_DM
+	---help---
+	Any processes or cgroups can use the same storage
+	with its band-width fairly shared.
+
+	If unsure, say N.
+
 endif # MD
diff -uprN linux-2.6.23.14.orig/drivers/md/Makefile linux-2.6.23.14/drivers/md/Makefile
--- linux-2.6.23.14.orig/drivers/md/Makefile	2008-01-15 05:49:56.000000000 +0900
+++ linux-2.6.23.14/drivers/md/Makefile	2008-01-21 20:45:03.000000000 +0900
@@ -8,6 +8,7 @@ dm-multipath-objs := dm-hw-handler.o dm-
 dm-snapshot-objs := dm-snap.o dm-exception-store.o
 dm-mirror-objs	:= dm-log.o dm-raid1.o
 dm-rdac-objs	:= dm-mpath-rdac.o
+dm-band-objs	:= dm-bandctl.o dm-band-policy.o dm-band-type.o
 md-mod-objs     := md.o bitmap.o
 raid456-objs	:= raid5.o raid6algos.o raid6recov.o raid6tables.o \
 		   raid6int1.o raid6int2.o raid6int4.o \
@@ -39,6 +40,7 @@ obj-$(CONFIG_DM_MULTIPATH_RDAC)	+= dm-rd
 obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
+obj-$(CONFIG_DM_BAND)		+= dm-band.o
 
 quiet_cmd_unroll = UNROLL  $@
       cmd_unroll = $(PERL) $(srctree)/$(src)/unroll.pl $(UNROLL) \
diff -uprN linux-2.6.23.14.orig/drivers/md/dm-band-policy.c linux-2.6.23.14/drivers/md/dm-band-policy.c
--- linux-2.6.23.14.orig/drivers/md/dm-band-policy.c	1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.23.14/drivers/md/dm-band-policy.c	2008-01-21 20:31:14.000000000 +0900
@@ -0,0 +1,185 @@
+/*
+ * Copyright (C) 2008 VA Linux Systems Japan K.K.
+ *
+ *  I/O bandwidth ...
From: Frans Pop
Date: Wednesday, January 23, 2008 - 6:33 am

Hi,

I'm not qualified to comment on the code, but here are some suggestions on
the config option and the comments.

Cheers,
FJP


s/band width/bandwidth/

s/band-width/bandwidth/

The help should probably be a bit more verbose as this does not tell anybody
much who has not already read the documentation.

Maybe something like:
<snip>
This device-mapper target allows you to define how the
available bandwidth of a storage device should be
shared between processes or cgroups.

Information on how to use dm-band is available in:
   Documentation/device-mapper/band.txt



s/when there exist some BIOs blocked/if some BIOs exist that are blocked/ ?

"none of them can't" : the double negative looks incorrect (and should be

s/have/has/

"has to do something" : that's rather vague...
--

From: Ryo Tsuruta
Date: Wednesday, January 23, 2008 - 8:48 am

Thank you for your suggestions.
I will correct those mistakes.

--
Ryo Tsuruta <ryov@valinux.co.jp>
--

From: Frans Pop
Date: Sunday, January 27, 2008 - 8:44 am

I just saw in other Kconfig files that the last line should be:
   <file:Documentation/device-mapper/band.txt>.

Cheers,
FJP
--

From: Ryo Tsuruta
Date: Wednesday, January 23, 2008 - 5:58 am

Here is the documentation for dm-band.

Based on 2.6.23.14
Signed-off-by: Ryo Tsuruta <ryov@valinux.co.jp>
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>

diff -uprN linux-2.6.23.14.orig/Documentation/device-mapper/band.txt linux-2.6.23.14/Documentation/device-mapper/band.txt
--- linux-2.6.23.14.orig/Documentation/device-mapper/band.txt	1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.23.14/Documentation/device-mapper/band.txt	2008-01-23 21:48:46.000000000 +0900
@@ -0,0 +1,431 @@
+====================
+Document for dm-band
+====================
+
+Contents:
+  What's dm-band all about?
+  How dm-band works
+  Setup and Installation
+  Command Reference
+  TODO
+
+
+What's dm-band all about?
+========================
+Dm-band is an I/O bandwidth controller implemented as a device-mapper driver.
+Several jobs using the same physical device have to share the bandwidth of
+the device. Dm-band gives bandwidth to each job according to its weight, 
+which each job can set its own value to.
+
+At this time, a job is a group of processes with the same pid or pgrp or uid.
+There is also a plan to make it support cgroup. A job can also be a virtual
+machine such as KVM or Xen.
+
+  +------+ +------+ +------+   +------+ +------+ +------+ 
+  |cgroup| |cgroup| | the  |   | pid  | | pid  | | the  |  jobs
+  |  A   | |  B   | |others|   |  X   | |  Y   | |others| 
+  +--|---+ +--|---+ +--|---+   +--|---+ +--|---+ +--|---+   
+  +--V----+---V---+----V---+   +--V----+---V---+----V---+   
+  | group | group | default|   | group | group | default|  band groups
+  |       |       |  group |   |       |       |  group | 
+  +-------+-------+--------+   +-------+-------+--------+
+  |         band1          |   |         band2          |  band devices
+  +-----------|------------+   +-----------|------------+
+  +-----------V--------------+-------------V------------+
+  |                          |                          |
+  |          sdb1            |           sdb2           |  ...
From: Andi Kleen
Date: Wednesday, January 23, 2008 - 12:57 pm

Could you please address in the document how the intended use
cases, feature set, etc. differ from CFQ2 I/O priorities?

Thanks,

-Andi
--

From: Ryo Tsuruta
Date: Thursday, January 24, 2008 - 3:32 am

Thank you for your suggestion. I'll do that step by step.

Thanks,
Ryo Tsuruta
--

From: Peter Zijlstra
Date: Wednesday, January 23, 2008 - 7:32 am

What definition of bandwidth does it use? Does it for example account
for seek latency?


--

From: Ryo Tsuruta
Date: Wednesday, January 23, 2008 - 10:25 am

The bandwidth in dm-band is determined by the ratio of the processing
time of each device's tokens (I/Os) to the processing time of all
devices' tokens (I/Os).
The processing time of one token (I/O) is one full I/O cycle,
including seek latency, interrupt latency, etc.

Thanks,
Ryo Tsuruta <ryov@valinux.co.jp>
--

From: Alasdair G Kergon
Date: Wednesday, January 23, 2008 - 7:47 am

It seems to rely on 'current' to classify bios and doesn't do it until the map
function is called, possibly in a different process context, so it won't
always identify the original source of the I/O correctly: people need to take
this into account when designing their group configuration and so this should
be mentioned in the documentation.
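
The effect described above can be shown with a toy user-space model. It is
not kernel code and does not reproduce the patch: the global variable
current, the functions submit_bio() and map_bio(), and the context names are
hypothetical stand-ins. The point is only that a classifier which looks at
whoever is running at map time can attribute an I/O to the wrong source.

```python
from collections import deque

# Stand-in for the kernel's notion of the currently running task.
current = None


def submit_bio(queue, payload):
    """Queue an I/O request, recording the real issuer for comparison only."""
    queue.append({"payload": payload, "submitter": current})


def map_bio(bio):
    """Classify at map time by looking at whoever is running right now."""
    return current


queue = deque()

# An application issues the request...
current = "app-A"
submit_bio(queue, "write block 7")

# ...but the map step runs later in a different context.
current = "worker-thread"
bio = queue.popleft()
classified_as = map_bio(bio)
```

Here the request issued by "app-A" is classified as belonging to
"worker-thread", which is the kind of misattribution a group configuration
has to allow for.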

I've uploaded it here while we consider ways we might refine the architecture and
interfaces etc.:

  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-add-band-targ...

Alasdair
-- 
agk@redhat.com
--

From: Hirokazu Takahashi
Date: Wednesday, January 23, 2008 - 9:21 am

Yes, as you pointed out, this should be mentioned in the documentation for
the current implementation.

By the way, I think once a memory controller of cgroup is introduced, it will

Thank you,
Hirokazu Takahashi.
--

From: YAMAMOTO Takashi
Date: Wednesday, January 23, 2008 - 8:38 pm

do you mean to make this a part of the memory subsystem?

YAMAMOTO Takashi
--

From: Hirokazu Takahashi
Date: Thursday, January 24, 2008 - 3:14 am

I just think that if the memory subsystem is already there for us, we don't
need to reinvent the wheel.

But I don't yet have a concrete picture of how the interface between dm-band
and the memory subsystem should be designed. I'd appreciate it if some of
the cgroup developers could share some ideas about it.

Thanks,

--

From: YAMAMOTO Takashi
Date: Thursday, January 24, 2008 - 11:26 pm

the current implementation of memory subsystem associates pages to
cgroups directly, rather than via tasks.  so it isn't straightforward to
use the information for other classification mechanisms like yours which
might not share the view of "hierarchy" with the memory subsystem.

--

From: Ryo Tsuruta
Date: Friday, January 25, 2008 - 12:07 am

Hi,

Here are the results of the dm-band bandwidth control test I ran yesterday.
The results are really good: dm-band works as I expected. I created
several band groups on several disk partitions and put them under heavy I/O load.

Hardware Spec.
==============
  DELL Dimension E521:

  Linux kappa.local.valinux.co.jp 2.6.23.14 #1 SMP
    Thu Jan 24 17:24:59 JST 2008 i686 athlon i386 GNU/Linux
  Detected 2004.217 MHz processor.
  CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
  Memory: 966240k/981888k available (2102k kernel code, 14932k reserved,
    890k data, 216k init, 64384k highmem)
  scsi 2:0:0:0: Direct-Access     ATA      ST3250620AS     3.AA PQ: 0 ANSI: 5
  sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
  sd 2:0:0:0: [sdb] Write Protect is off
  sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled,
    doesn't support DPO or FUA
  sdb: sdb1 sdb2 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13 sdb14
    sdb15 >

The results of bandwidth control test on partitions
===================================================

The configuration of test #1:
   o Prepare three partitions sdb5, sdb6 and sdb7.
   o Give weights of 40, 20 and 10 to sdb5, sdb6 and sdb7 respectively.
   o Run 128 processes issuing random read/write direct I/O with 4KB data
     on each device at the same time.
   o Count the number of I/Os and sectors completed in 60 seconds.

                The result of the test #1
 ---------------------------------------------------------------------------
|     device      |       sdb5        |       sdb6        |      sdb7       |
|     weight      |    40 (57.0%)     |     20 (29.0%)    |    10 (14.0%)   |
|-----------------+-------------------+-------------------+-----------------|
|   I/Os (r/w)    |  6640( 3272/ 3368)|  3434( 1719/ 1715)|  1689( 857/ 832)|
|  sectors (r/w)  | 53120(26176/26944)| 27472(13752/13720)| 13512(6856/6656)|
|  ratio to total ...
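
As a quick cross-check of the rows that are visible above, the following
sketch compares each device's share of the total weight with its share of the
completed I/Os. The weights and I/O counts are copied from the table; the
shares() helper and the variable names are illustrative, and the sketch does
not attempt to reconstruct the truncated part of the table.

```python
# Figures copied from the visible rows of the test #1 table.
weights = {"sdb5": 40, "sdb6": 20, "sdb7": 10}
ios = {"sdb5": 6640, "sdb6": 3434, "sdb7": 1689}


def shares(values):
    """Return each entry as a percentage of the sum of all entries."""
    total = sum(values.values())
    return {key: 100.0 * value / total for key, value in values.items()}


expected = shares(weights)  # share implied by the configured weights
measured = shares(ios)      # share of I/Os actually completed

for dev in weights:
    print(f"{dev}: weight share {expected[dev]:.1f}%, "
          f"measured I/O share {measured[dev]:.1f}%")
```

For all three devices the measured share of I/Os is within one percentage
point of the share implied by the weights.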
From: INAKOSHI Hiroya
Date: Monday, January 28, 2008 - 11:42 pm

Hi,


Do you mean that you ran 128 processes on each user-device pair?  Namely,
I guess that

  user1: 128 processes on sdb5,
  user2: 128 processes on sdb5,
  another: 128 processes on sdb5,

A second round of preliminary studies might be:

- What if you use a different I/O size on each device (or device-user pair)?
- What if you use a different number of processes on each device (or
device-user pair)?


And my impression is that it's natural for dm-band to live in device-mapper,
separate from the I/O scheduler.  Because bandwidth control and I/O
scheduling are two different things, it may be simpler to implement them
in different layers.

Regards,


--

From: Ryo Tsuruta
Date: Tuesday, January 29, 2008 - 8:32 pm

"User-device pairs" means "band groups", right?
What I actually did is the following:

  user1: 128 processes on sdb5,
  user2: 128 processes on sdb5,
  user3: 128 processes on sdb5,

There are other ideas for controlling bandwidth, such as limiting bytes per
second or latency. I think they could be implemented if a lot of people
really need them. I feel there is no single correct answer to this issue.
Posting good ideas how it should work

I would like to know how dm-band works in various configurations on
various types of hardware. I'll try running dm-band with other
configurations. Any reports or impressions of dm-band on your machines
are also welcome.

Thanks,
Ryo Tsuruta
--
