Hi Rusty,
Would it make sense to use noop by default? After all we do not know
what is behind the backend driver and the hypervisor is likely to do its
own scheduling anyway. I guess this is the reason the Xen guys took this
approach.
What do you think about the patch below?
- Fernando
---
From: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Subject: [PATCH] virtio_blk: use noop elevator by default
Using the noop elevator by default seems to be safest bet because we do
not know what is behind the backend driver and the hypervisor is likely
to do its own scheduling anyway.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.27-rc4/drivers/block/virtio_blk.c linux-2.6.27-rc4-fixes/drivers/block/virtio_blk.c
--- linux-2.6.27-rc4/drivers/block/virtio_blk.c 2008-08-26 21:26:01.000000000 +0900
+++ linux-2.6.27-rc4-fixes/drivers/block/virtio_blk.c 2008-08-26 21:22:03.000000000 +0900
@@ -237,6 +237,8 @@ static int virtblk_probe(struct virtio_d
goto out_put_disk;
}
+ elevator_init(rq, "noop");
+
if (index < 26) {
sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
} else if (index < (26 + 1) * 26) {
--
I plan to include some variant of disk profiling for 2.6.28 which will let eg CFQ turn off idling for such device types, I think that is a better solution. -- Jens Axboe --
Hi Jens, That is good news. With the proliferation of intelligent disk controllers and SSDs, the disk profiling approach seems to be the right controller-specific auto-tuning feature, not a new functionality of the generic elevator layer. Is this interpretation correct? Would it make sense in some cases to change elevators automatically depending on the characteristics of the underlying device instead (e.g. we might not need any of the extra features CFQ provides, for example)? I would like to take a look at those patches so I peeked into your git tree, but I could not find them (I probably chose the wrong branches). Are they accessible through your kernel.org's git repository? Thanks! Fernando --
As is the case with SSD devices, we do not want to idle in AS/CFQ when
the block device is a paravirt front-end driver. This patch adds a flag
(QUEUE_FLAG_VIRT) which should be used by front-end drivers such as
virtio_blk and xen-blkfront to indicate a paravirtualized device.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.28-rc2-orig/include/linux/blkdev.h linux-2.6.28-rc2/include/linux/blkdev.h
--- linux-2.6.28-rc2-orig/include/linux/blkdev.h 2008-10-27 17:41:58.000000000 +0900
+++ linux-2.6.28-rc2/include/linux/blkdev.h 2008-10-27 17:34:49.000000000 +0900
@@ -449,6 +449,7 @@ struct request_queue
#define QUEUE_FLAG_FAIL_IO 12 /* fake timeout */
#define QUEUE_FLAG_STACKABLE 13 /* supports request stacking */
#define QUEUE_FLAG_NONROT 14 /* non-rotational device (SSD) */
+#define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */
static inline int queue_is_locked(struct request_queue *q)
{
--
All three patches look fine, although we could just reuse QUEUE_FLAG_NONROT directly. But I agree it makes sense to make the distinction, so I've just applied 1-3. -- Jens Axboe --
I guess in theory you could imagine that the virtual device is mapped
directly onto a physical device, and the host OS does no scheduling, in
which case it would be appropriate for the guest do the work. But I
think otherwise this makes sense.
J
--
For that specific case, it should just not set the flag. -- Jens Axboe --
When the virtual device is mapped directly onto a physical device the host OS or the hypervisor could notify this fact to the guest using the PCI configuration space (the bytes reserved for vendor-defined purposes look like a good candidate to me). In such a case, on the guest side the driver should just check the configuration space and set/unset the flag accordingly. The good thing about this approach is that it can be used with both paravirt drivers and regular drivers (which is important for fully virtualized guests). I have just started to implement this idea but I would be great it you let me know your take on this issue. - Fernando --
As a paravirt front-end driver, virtio_blk is not a rotational device so
we want do avoid idling in AS/CFQ. Tell the block layer about this.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.28-rc2-orig/drivers/block/virtio_blk.c linux-2.6.28-rc2/drivers/block/virtio_blk.c
--- linux-2.6.28-rc2-orig/drivers/block/virtio_blk.c 2008-10-27 17:41:53.000000000 +0900
+++ linux-2.6.28-rc2/drivers/block/virtio_blk.c 2008-10-27 17:34:32.000000000 +0900
@@ -237,6 +237,8 @@ static int virtblk_probe(struct virtio_d
goto out_put_disk;
}
+ queue_flag_set_unlocked(QUEUE_FLAG_VIRT, vblk->disk->queue);
+
if (index < 26) {
sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
} else if (index < (26 + 1) * 26) {
--
Xen's blkfront sets noop as the default I/O scheduler at initialization time to avoid elevator overheads such as idling, but with the advent of basic disk profiling capabilities this is not necessary anymore. We should just tell the block layer that we are a paravirt front-end driver and the elevator will automatically make the necessary adjustments. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> --- diff -urNp linux-2.6.28-rc2-orig/drivers/block/xen-blkfront.c linux-2.6.28-rc2/drivers/block/xen-blkfront.c --- linux-2.6.28-rc2-orig/drivers/block/xen-blkfront.c 2008-10-27 17:41:53.000000000 +0900 +++ linux-2.6.28-rc2/drivers/block/xen-blkfront.c 2008-10-27 17:38:59.000000000 +0900 @@ -343,7 +343,7 @@ static int xlvbd_init_blk_queue(struct g if (rq == NULL) return -1; - elevator_init(rq, "noop"); + queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); /* Hard sector size and max sectors impersonate the equiv. hardware. */ blk_queue_hardsect_size(rq, sector_size); --
Thank you for the advice. I will be sending patches that take advantage of the profiling capability merged for 2.6.28. --
