[PATCH 0/3] virtio PCI driver

Previous thread: Re: [PATCH] virtio config_ops refactoring by Anthony Liguori on Wednesday, November 7, 2007 - 10:41 pm. (9 messages)

Next thread: [PATCH] sysctl: Check length at deprecated_sysctl_warning. by Tetsuo Handa on Wednesday, November 7, 2007 - 10:57 pm. (9 messages)
To: <linux-kernel@...>
Cc: Anthony Liguori <aliguori@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Wednesday, November 7, 2007 - 10:46 pm

This patch series implements a PCI driver for virtio. This allows virtio
devices (like block and network) to be used in QEMU/KVM. I'll post a very
early KVM userspace backend in kvm-devel for those that are interested.

This series depends on the two virtio fixes I've posted and Rusty's config_ops
refactoring. I've tested with these patches on Rusty's experimental virtio
tree.
-

To: <linux-kernel@...>
Cc: Anthony Liguori <aliguori@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Wednesday, November 7, 2007 - 10:46 pm

This is needed for the virtio PCI device to be compiled as a module.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0e1bf05..3f28b47 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -260,6 +260,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
return IRQ_HANDLED;
}

+EXPORT_SYMBOL_GPL(vring_interrupt);
+
static struct virtqueue_ops vring_vq_ops = {
.add_buf = vring_add_buf,
.get_buf = vring_get_buf,
@@ -306,8 +308,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
return &vq->vq;
}

+EXPORT_SYMBOL_GPL(vring_new_virtqueue);
+
void vring_del_virtqueue(struct virtqueue *vq)
{
kfree(to_vvq(vq));
}

+EXPORT_SYMBOL_GPL(vring_del_virtqueue);
+
-

To: <linux-kernel@...>
Cc: Anthony Liguori <aliguori@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Wednesday, November 7, 2007 - 10:46 pm

This patch moves virtio under the virtualization menu and changes virtio
devices to not claim to only be for lguest.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/Kconfig b/drivers/Kconfig
index f4076d9..d945ffc 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -93,6 +93,4 @@ source "drivers/auxdisplay/Kconfig"
source "drivers/kvm/Kconfig"

source "drivers/uio/Kconfig"
-
-source "drivers/virtio/Kconfig"
endmenu
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 4d0119e..be4b224 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -429,6 +429,7 @@ config VIRTIO_BLK
tristate "Virtio block driver (EXPERIMENTAL)"
depends on EXPERIMENTAL && VIRTIO
---help---
- This is the virtual block driver for lguest. Say Y or M.
+ This is the virtual block driver for virtio. It can be used with
+ lguest or QEMU based VMMs (like KVM or Xen). Say Y or M.

endif # BLK_DEV
diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..ac4bcdf 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -50,5 +50,6 @@ config KVM_AMD
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/lguest/Kconfig
+source drivers/virtio/Kconfig

endif # VIRTUALIZATION
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 86b8641..e66aec4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3107,6 +3107,7 @@ config VIRTIO_NET
tristate "Virtio network driver (EXPERIMENTAL)"
depends on EXPERIMENTAL && VIRTIO
---help---
- This is the virtual network driver for lguest. Say Y or M.
+ This is the virtual network driver for virtio. It can be used with
+ lguest or QEMU based VMMs (like KVM or Xen). Say Y or M.

endif # NETDEVICES
-

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 2:49 am

Perhaps the virt menu needs to be split into a host-side support menu
and guest-side support menu.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: <linux-kernel@...>
Cc: Anthony Liguori <aliguori@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Wednesday, November 7, 2007 - 10:46 pm

This is a PCI device that implements a transport for virtio. It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 9e33fc4..c81e0f3 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -6,3 +6,20 @@ config VIRTIO
config VIRTIO_RING
bool
depends on VIRTIO
+
+config VIRTIO_PCI
+ tristate "PCI driver for virtio devices (EXPERIMENTAL)"
+ depends on PCI && EXPERIMENTAL
+ select VIRTIO
+ select VIRTIO_RING
+ ---help---
+ This drivers provides support for virtio based paravirtual device
+ drivers over PCI. This requires that your VMM has appropriate PCI
+ virtio backends. Most QEMU based VMMs should support these devices
+ (like KVM or Xen).
+
+ Currently, the ABI is not considered stable so there is no guarantee
+ that this version of the driver will work with your VMM.
+
+ If unsure, say M.
+
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index f70e409..cc84999 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,2 +1,3 @@
obj-$(CONFIG_VIRTIO) += virtio.o
obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
+obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
new file mode 100644
index 0000000..85ae096
--- /dev/null
+++ b/drivers/virtio/virtio_pci.c
@@ -0,0 +1,469 @@
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_pci.h>
+#include <linux/highmem.h>
+#include <linux/spinlock.h>
+
+MODULE_AUTHOR("Anthony Liguori <aliguori@us.ibm.com>");
+MODULE_DESCRIPTION("virtio-pci");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+/* Our device structur...

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Tuesday, November 20, 2007 - 11:01 am

This means we can't kick multiple queues with one exit.

I'd also like to see a hypercall-capable version of this (but that can

Can this be implemented via shared memory? We're exiting now on every

I would really like to see this implemented as pci config space, with no
tricks like multiplexing several virtqueues on one register. Something
like the PCI BARs where you have all the register numbers allocated

Is this run only on init? If so the lock isn't needed.

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Tuesday, November 20, 2007 - 11:43 am

There is no interface in virtio currently to batch multiple queue
notifications so the only way one could do this AFAICT is to use a timer

I don't think so. A vmexit is required to lower the IRQ line. It may
be possible to do something clever like set a shared memory value that's
checked on every vmexit. I think it's very unlikely that it's worth it

My first implementation did that. I switched to using a selector
because it reduces the amount of PCI config space used and does not

Yes, it's also not strictly needed on cleanup I think. I left it there
though for clarity. I can remove it.

Regards,

Anthony Liguori

-

To: Anthony Liguori <aliguori@...>
Cc: Rusty Russell <rusty@...>, <virtualization@...>, <linux-kernel@...>, <kvm-devel@...>
Date: Tuesday, November 20, 2007 - 12:12 pm

That means the user has to select which device to expose. With feature
bits, the hypervisor advertises both pio and hypercalls, the guest picks

But... it's tricky, and it's nonstandard. With pci config, you can do
live migration by shipping the pci config space to the other side. With
the special iospace, you need to encode/decode it.

Not much of an argument, I know.

wrt. number of queues, 8 queues will consume 32 bytes of pci space if
all you store is the ring pfn.

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: Rusty Russell <rusty@...>, <virtualization@...>, <linux-kernel@...>, <kvm-devel@...>, Eric Van Hensbergen <ericvanhensbergen@...>
Date: Tuesday, November 20, 2007 - 6:16 pm

Well please propose the virtio API first and then I'll adjust the PCI
ABI. I don't want to build things into the ABI that we never actually

I was thinking more along the lines that a hypercall-based device would
certainly be implemented in-kernel whereas the current device is
naturally implemented in userspace. We can simply use a different
device for in-kernel drivers than for userspace drivers. There's no

It's pretty invasive. I think a more paravirt device that expected an
edge triggered interrupt would be a better solution for those types of

None of the PCI devices currently work like that in QEMU. It would be
very hard to make a device that worked this way because the order
in which values are written matters a whole lot. For instance, if you
wrote the status register before the queue information, the driver could
get into a funky state.

We'll still need save/restore routines for virtio devices. I don't

You also at least need a num argument which takes you to 48 or 64
depending on whether you care about strange formatting. 8 queues may
not be enough either. Eric and I have discussed whether the 9p virtio
device should support multiple mounts per-virtio device and if so,
whether each one should have its own queue. Any device that supports
this sort of multiplexing will very quickly start using a lot of queues.
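
The register-space figures quoted in this exchange (32 bytes for 8 queues storing only a ring pfn, 48 or 64 once a num field is added) can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes a 32-bit pfn and a 16-bit num, and the helper names are hypothetical, not part of the posted driver.

```c
#include <stddef.h>

/* Assumed field widths (not taken from the patch): 32-bit pfn, 16-bit num. */
enum { SKETCH_PFN_BYTES = 4, SKETCH_NUM_BYTES = 2 };

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/*
 * Bytes of register space needed when every queue gets its own slot of
 * entry_size bytes, with each slot padded to slot_align bytes.
 */
static size_t sketch_queue_regs_bytes(size_t nqueues, size_t entry_size,
                                      size_t slot_align)
{
    return nqueues * sketch_align_up(entry_size, slot_align);
}
```

With 8 queues this gives 32 bytes for pfn-only slots, 48 bytes for tightly packed pfn+num slots, and 64 bytes if each slot is padded to 8 bytes.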

I think most types of hardware have some notion of a selector or mode.
Take a look at the LSI adapter or even VGA.
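
To make the selector idea concrete, here is a minimal userspace mock of a device that exposes one pfn register and one selector register rather than a register per queue. All names, field widths, and the queue limit are assumptions made for illustration; this is not the register layout of the posted driver.

```c
#include <stdint.h>

#define MOCK_MAX_QUEUES 16  /* arbitrary limit for the sketch */

/* Hypothetical device state: one selector, many per-queue values behind it. */
struct mock_dev {
    uint16_t queue_sel;                  /* which queue the pfn register addresses */
    uint32_t queue_pfn[MOCK_MAX_QUEUES]; /* ring pfn for each queue */
};

/* Writing the selector chooses the queue; out-of-range writes are ignored. */
static void mock_write_sel(struct mock_dev *d, uint16_t idx)
{
    if (idx < MOCK_MAX_QUEUES)
        d->queue_sel = idx;
}

/* The single pfn register always refers to the currently selected queue. */
static void mock_write_pfn(struct mock_dev *d, uint32_t pfn)
{
    d->queue_pfn[d->queue_sel] = pfn;
}

static uint32_t mock_read_pfn(const struct mock_dev *d)
{
    return d->queue_pfn[d->queue_sel];
}
```

The register footprint stays constant no matter how many queues exist, at the cost of an extra write per access and a dependence on write ordering.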

Regards,

Anthony Liguori

-

To: Anthony Liguori <aliguori@...>
Cc: <kvm-devel@...>, <virtualization@...>, <linux-kernel@...>, Eric Van Hensbergen <ericvanhensbergen@...>
Date: Wednesday, November 21, 2007 - 3:13 am

Move ->kick() to virtio_driver.

I believe Xen networking uses the same event channel for both rx and tx,

Where the device is implemented is an implementation detail that should
be hidden from the guest, isn't that one of the strengths of
virtualization? Two examples: a file-based block device implemented in
qemu gives you fancy file formats with encryption and compression, while
the same device implemented in the kernel gives you a low-overhead path
directly to a zillion-disk SAN volume. Or a user-level network device
capable of running with the slirp stack and no permissions vs. the
kernel device running copyless most of the time and using a dma engine
for the rest but requiring you to be good friends with the admin.

The user should expect zero reconfigurations moving a VM from one model

We abstract this away by having a "channel signalled" API (both at the
kernel for kernel devices and as a kvm.h exit reason / libkvm callback).

I was thinking it could be useful mostly in the context of a paravirt

Make it appear as a pci function? (though my feeling is that multiple

True. They aren't fun to use, though.

--
Any sufficiently difficult bug is indistinguishable from a feature.

-

To: Avi Kivity <avi@...>
Cc: <kvm-devel@...>, <virtualization@...>, <linux-kernel@...>, Eric Van Hensbergen <ericvanhensbergen@...>, Rusty Russell <rusty@...>, lguest <lguest@...>
Date: Friday, November 23, 2007 - 12:51 pm

Then on each kick, all queues have to be checked for processing? What

I would have to look, but since rx/tx are rather independent actions,
I'm not sure that you would really save that much. You still end up

I'm wary of introducing the notion of hypercalls to this device because
it makes the device VMM specific. Maybe we could have the device
provide an option ROM that was treated as the device "BIOS" that we
could use for kicking and interrupt acking? Any idea of how that would
map to Windows? Are there real PCI devices that use the option ROM
space to provide what's essentially firmware? Unfortunately, I don't

If you're doing restore by passing the PCI config blob to a registered
routine, then sure, but that doesn't seem much better to me than just
having the device generate that blob in the first place (which is what
we have today). I was assuming that you would want to use the existing

I don't think they're really any worse :-)

Regards,

Anthony Liguori

-

To: Anthony Liguori <aliguori@...>
Cc: Eric Van Hensbergen <ericvanhensbergen@...>, lguest <lguest@...>, <kvm-devel@...>, <linux-kernel@...>, <virtualization@...>
Date: Friday, November 23, 2007 - 1:47 pm

rx and tx are closely related. You rarely have one without the other.

In fact, a tuned implementation should have zero kicks or interrupts
for bulk transfers. The rx interrupt on the host will process new tx
descriptors and fill the guest's rx queue; the guest's transmit function
can also check the receive queue. I don't know if that's achievable for
Linux guests currently, but we should aim to make it possible.

Another point is that virtio still has a lot of leading zeros in its
mileage counter. We need to keep things flexible and learn from others

The BIOS wouldn't work even on x86 because it isn't mapped to the guest
address space (at least not consistently), and doesn't know the guest's
programming model (16, 32, or 64-bits? segmented or flat?)

Xen uses a hypercall page to abstract these details out. However, I'm
not proposing that. Simply indicate that we support hypercalls, and use
some layer below to actually send them. It is the responsibility of this
layer to detect if hypercalls are present and how to call them.

Hey, I think the best place for it is in paravirt_ops. We can even patch
the hypercall instruction inline, and the driver doesn't need to know

Then we can start selling virtio extension chassis.

--
Any sufficiently difficult bug is indistinguishable from a feature.

-

To: Avi Kivity <avi@...>
Cc: Eric Van Hensbergen <ericvanhensbergen@...>, lguest <lguest@...>, <kvm-devel@...>, <linux-kernel@...>, <virtualization@...>
Date: Monday, November 26, 2007 - 3:18 pm

ATM, the net driver does a pretty good job of disabling kicks/interrupts
unless they are needed. Checking for rx on tx and vice versa is a good

Yes, after thinking about it over holiday, I agree that we should at
least introduce a virtio-pci feature bitmask. I'm not inclined to
attempt to define a hypercall ABI or anything like that right now but
having the feature bitmask will at least make it possible to do such a

Yes, paravirt_ops is attractive for abstracting the hypercall calling
mechanism but it's still necessary to figure out how hypercalls would be
identified. I think it would be necessary to define a virtio specific
hypercall space and use the virtio device ID to claim subspaces.

For instance, the hypercall number could be (virtio_devid << 16) | (call
number). How that translates into a hypercall would then be part of the
paravirt_ops abstraction. In KVM, we may have a single virtio hypercall
where we pass the virtio hypercall number as one of the arguments or
PCI bus? My concern was that it was limited by something stupid like an
8-bit identifier.
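
The numbering scheme suggested above, (virtio_devid << 16) | call number, could be sketched as follows. The 16-bit width of each half and the function names are assumptions for illustration; no such ABI is defined in this thread.

```c
#include <stdint.h>

/* Pack a device ID and a per-device call number into one 32-bit value. */
static uint32_t sketch_virtio_hcall_nr(uint16_t devid, uint16_t call)
{
    return ((uint32_t)devid << 16) | call;
}

/* Recover the device ID (upper 16 bits). */
static uint16_t sketch_virtio_hcall_devid(uint32_t nr)
{
    return (uint16_t)(nr >> 16);
}

/* Recover the call number (lower 16 bits). */
static uint16_t sketch_virtio_hcall_call(uint32_t nr)
{
    return (uint16_t)(nr & 0xffffu);
}
```

Each device ID then owns a subspace of 65536 call numbers, and how the packed value reaches the hypervisor is left to the layer below.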

Regards,

Anthony Liguori

-

To: Anthony Liguori <aliguori@...>
Cc: Eric Van Hensbergen <ericvanhensbergen@...>, lguest <lguest@...>, <kvm-devel@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 5:02 am

No, definitely not define a hypercall ABI. The feature bit should say
"this device understands a hypervisor-specific way of kicking. consult
your hypervisor manual and cpuid bits for further details. should you

If we don't call it a hypercall, but a virtio kick operation, we don't
need to worry about the hypercall number or ABI. It's just a function
that takes an argument that's implemented differently by every

IIRC pci slots are 8-bit, but you can have multiple buses, so
effectively 16 bits of device address space (discounting functions which
are likely not hot-pluggable).

--
error compiling committee.c: too many arguments to function

-

To: <kvm-devel@...>
Cc: Anthony Liguori <aliguori@...>, Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 5:25 am

You have an 8 bit bus number and an 8 bit device/function number.
The function number is 3 bits, so if you want to use only function 0
for everything, you are limited to a little under 8192 (2^(8+5)) devices
per PCI domain. PC style hardware cannot easily address multiple PCI
domains, but I think you can have them if you assume that the guest is
using mmconfig.
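
For reference, the address arithmetic above can be written out as a small sketch. The 8/5/3 bit split is the standard PCI bus/device/function layout; the macro and function names here are made up for the example and are not kernel APIs.

```c
#include <stdint.h>

#define SKETCH_BUS_BITS 8
#define SKETCH_DEV_BITS 5
#define SKETCH_FN_BITS  3

/* Pack bus, device and function numbers into a 16-bit address. */
static uint16_t sketch_pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)(((unsigned)bus << (SKETCH_DEV_BITS + SKETCH_FN_BITS)) |
                      ((unsigned)(dev & 0x1f) << SKETCH_FN_BITS) |
                      (unsigned)(fn & 0x7));
}

/* Number of bus/device slots in one domain when only function 0 is used. */
static unsigned sketch_function0_slots(void)
{
    return 1u << (SKETCH_BUS_BITS + SKETCH_DEV_BITS);
}
```

The raw count is 2^(8+5) = 8192; the usable number is somewhat lower because some slots are taken by bridges and the host controller.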

For using multiple buses, the easiest way could be to have every
device/function on bus 0 be a bridge by itself, so you end up with a
flat number space for the actual devices,

$ lspci -t
[0000:00]-+-00.0-[0000:01]--+-00.0
          |                 +-01.0
          |                 +-02.0
          |                 + ...
          |                 \-3f.0
          +-00.1-[0000:02]--+-00.0
          |                 +-01.0
          |                 +-02.0
          |                 + ...
          |                 \-3f.0
          + ...
          |
          +-3f.6-[0000:ff]--+-00.0
                            +-01.0
                            +-02.0
                            + ...
                            \-3f.0

Arnd <><
-

To: Avi Kivity <avi@...>
Cc: Anthony Liguori <aliguori@...>, Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 5:09 am

...unless you're lucky enough to be on s390 where pio is not available.
I don't see why we'd have two different ways to talk to a virtio
device. I think we should use a hypercall for interrupt injection,
without support for grumpy old soldered pci features other than
HPA-style Lguest PCI bus organization. There are no devices that we
want to be backward compatible with.
-

To: <carsteno@...>
Cc: Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 5:27 am

pio is useful for qemu, for example, and as a fallback against changing
hypervisor calling conventions. As Anthony points out, it makes a
qemu-implemented device instantly available to Xen at no extra charge.

My wording was inappropriate for s390, though. The politically correct
version reads "this device understands a hypervisor-specific way of
kicking. consult your hypervisor manual and platform-specific way of
querying hypervisor information for further details. should you not be
satisfied with this method, the standard method of kicking virtio
devices on your platform is still available".
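
The semantics of such a feature bit can be illustrated with a short sketch: the bit only advertises an optional faster path, and the standard kick remains a valid fallback. The bit position, enum, and function name below are hypothetical; nothing here is defined by the posted patches.

```c
#include <stdint.h>

/* Hypothetical feature bit: "device understands a hypervisor-specific kick". */
#define SKETCH_F_HV_KICK (1u << 0)

enum sketch_kick_method {
    SKETCH_KICK_STANDARD,    /* the platform's normal method, e.g. pio */
    SKETCH_KICK_HV_SPECIFIC  /* hypervisor-specific method, e.g. a hypercall */
};

/*
 * Pick a kick method. The hypervisor-specific path is used only when the
 * device advertises it AND the guest knows how to issue it on this
 * hypervisor; otherwise the standard method is used.
 */
static enum sketch_kick_method sketch_choose_kick(uint32_t dev_features,
                                                  int guest_supports_hv_kick)
{
    if ((dev_features & SKETCH_F_HV_KICK) && guest_supports_hv_kick)
        return SKETCH_KICK_HV_SPECIFIC;
    return SKETCH_KICK_STANDARD;
}
```

A guest that ignores the bit entirely still works, which is the point of keeping the standard method available.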

On s390, I imagine that "the standard method" is the fabled diag
instruction (which, with the proper arguments, will cook your steak to
the exact shade of medium-rare you desire). So you will never need to
set the "hypervisor-specific way of kicking" bit, as your standard
method is already optimal.

Unfortunately, we have to care for platform differences, subarch
differences (vmx/svm), hypervisor differences (with virtio), and guest
differences (Linux/Windows/pvLinux, 32/64). Much care is needed when
designing the ABI here.

[actually thinking a bit, this is specific to the virtio pci binding;
s390 will never see any of it]

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: <carsteno@...>, Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 6:12 am

You remember that we've lost the big debate around virtio in Tucson?
We intend to bind our virtio devices to PCI too, so that they look the
same in Linux userland across architectures.
-

To: <carsteno@...>
Cc: Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 6:19 am

Ouch.

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: <carsteno@...>, Eric Van Hensbergen <ericvanhensbergen@...>, <kvm-devel@...>, lguest <lguest@...>, <linux-kernel@...>, <virtualization@...>
Date: Tuesday, November 27, 2007 - 6:28 am

That was my initial opinion too, but HPA has come up with a lean and
clean PCI binding for lguest. I think we should seriously consider
using that over the current qemu device emulation based thing.
-

To: Avi Kivity <avi@...>
Cc: Anthony Liguori <aliguori@...>, <kvm-devel@...>, <virtualization@...>, <linux-kernel@...>, Eric Van Hensbergen <ericvanhensbergen@...>
Date: Wednesday, November 21, 2007 - 2:22 pm

I think that is pretty insightful, and indeed, is probably the only
reason we would ever consider using a virtio based driver.

But is this really a virtualization problem, and is virtio the right
place to solve it? Doesn't I/O hotplug with multipathing or NIC teaming
provide the same infrastructure in a way that is useful in more than
just a virtualization context?

Zach

-

To: Zachary Amsden <zach@...>
Cc: Anthony Liguori <aliguori@...>, <kvm-devel@...>, <virtualization@...>, <linux-kernel@...>, Eric Van Hensbergen <ericvanhensbergen@...>
Date: Thursday, November 22, 2007 - 3:32 am

With the aid of a dictionary I was able to understand about half the
words in the last sentence. Moving from device to device using
hotplug+multipath is complex to configure, available on only some
guests, uses rarely-exercised paths in the guest OS, and only works for
a few types of devices (network and block). Having host independence in
the device means you can change the device implementation for, say, a
display driver (consider, for example, a vmgl+virtio driver, which can
be implemented in userspace or tunneled via virtio-over-tcp to some
remote display without going through userspace, without the guest
knowing about it).

--
error compiling committee.c: too many arguments to function

-

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 8:39 pm

While it's a little premature, we can start thinking of irq path
improvements.
The current patch acks a private isr, and afterwards the apic eoi will also
be hit since it's a level-triggered irq. This means 2 vmexits per irq.
We can start with regular pci irqs and move afterwards to msi.
Some other ugly hack options [we'd be better off using msi]:
- Read the eoi directly from the apic and save the first private isr ack
- Convert the specific irq line to edge triggered and don't share it

-

To: <dor.laor@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 10:17 pm

I must admit that I don't know a whole lot about interrupt delivery.
If we can avoid the private ISR ack then that would certainly be a good
thing to do! I think that would involve adding another bit to the
virtqueues to indicate whether or not there is work to be handled. It's
really just moving the ISR to shared memory so that there's no penalty
for accessing it.

Regards,

-

To: <kvm-devel@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, Anthony Liguori <aliguori@...>
Date: Thursday, November 8, 2007 - 1:46 pm

If you use

vp_dev->vdev.dev.parent = &pci_dev->dev;

Then there is no need for the special kvm root device, and the actual
virtio device shows up in a more logical place, under where it is
really (virtually) attached.
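
As a rough illustration of what setting the parent pointer does to the hierarchy, here is a userspace mock of a device tree built from parent links. The struct and helpers are stand-ins invented for this sketch; they are not the kernel's struct device API, and the device names are examples only.

```c
#include <stddef.h>

/* Minimal stand-in for a device node: a name and a link to its parent. */
struct mock_device {
    const char *name;
    struct mock_device *parent;
};

/* Number of ancestors between a device and the root of the tree. */
static int mock_depth(const struct mock_device *d)
{
    int depth = 0;

    while (d->parent) {
        d = d->parent;
        depth++;
    }
    return depth;
}

/* Returns 1 if anc appears anywhere on d's chain of parents, else 0. */
static int mock_is_ancestor(const struct mock_device *anc,
                            const struct mock_device *d)
{
    for (d = d->parent; d; d = d->parent)
        if (d == anc)
            return 1;
    return 0;
}
```

With the virtio device parented to the PCI device, the virtio node sits one level below the PCI function rather than directly under a separate root.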

Arnd <><
-

To: Arnd Bergmann <arnd@...>
Cc: <kvm-devel@...>, <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>
Date: Thursday, November 8, 2007 - 3:04 pm

They already show up underneath of the PCI bus. The issue is that there
are two separate 'struct device's for each virtio device. There's the
PCI device (that's part of the pci_dev structure) and then there's the
virtio_device one. I thought that setting the dev.parent of the
virtio_device struct device would result in having two separate entries
under the PCI bus directory which would be pretty confusing :-)

Regards,

-

To: <virtualization@...>
Cc: Anthony Liguori <aliguori@...>, <kvm-devel@...>, <virtualization@...>, <linux-kernel@...>
Date: Friday, November 9, 2007 - 7:03 am

But that's what a device tree means. Think about a USB disk drive: The drive
shows up as a child of the USB controller, which in turn is a child of
the PCI bridge. Note that I did not suggest having the virtio parent set to
the parent of the PCI device, but to the PCI device itself.

I find it more confusing to have a device just hanging off the root when
it is actually handled by the PCI subsystem.

Arnd <><
-

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 2:12 am

Didn't see support for dma. I think that with Amit's pvdma patches you
can support dma-capable devices as well without too much fuss.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: Avi Kivity <avi@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 9:54 am

What is the use case you're thinking of? A semi-paravirt driver that
does dma directly to a device?

Regards,

Anthony Liguori

-

To: Anthony Liguori <aliguori@...>
Cc: Avi Kivity <avi@...>, Rusty Russell <rusty@...>, <virtualization@...>, <linux-kernel@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 7:43 pm

You would also lose performance since pv-dma will trigger an exit for
each virtio io while

-

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 10:37 am

If a pci device is capable of dma (or issuing interrupts), it will be

No, an unmodified driver that, by using clever tricks with dma_ops, can
do dma directly to guest memory. See Amit's patches.

In fact, why do a virtio transport at all? It can be done either with
trap'n'emulate, or by directly mapping the device mmio space into the guest.

(what use case are you considering? devices without interrupts and dma?
pci door stoppers?)

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 11:06 am

Hrm, I think we may be talking about different things. Are you thinking
that the driver I posted allows you to do PCI pass-through over virtio?
That's not what it is.

The driver I posted is a virtio implementation that uses a PCI device.
This lets you use virtio-blk and virtio-net under KVM. The alternative
to this virtio PCI device would be a virtio transport built with
hypercalls like lguest has. I chose a PCI device because it ensured
that each virtio device showed up like a normal PCI device.

Am I misunderstanding what you're asking about?

Regards,

-

To: Anthony Liguori <aliguori@...>
Cc: <linux-kernel@...>, Rusty Russell <rusty@...>, <virtualization@...>, <kvm-devel@...>
Date: Thursday, November 8, 2007 - 11:13 am

No, I completely misunderstood the patch. Should review complete
patches rather than random hunks.

Sorry for the noise.

--
error compiling committee.c: too many arguments to function

-
