Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

From: Avi Kivity
Date: Thursday, October 1, 2009 - 1:34 am

On 09/30/2009 10:04 PM, Gregory Haskins wrote:



Virtualization is about not doing that.  Sometimes it's necessary (when 
you have made unfixable design mistakes), but not just to replace a bus, 
with no advantages to the guest that has to be changed (other 
hypervisors or hypervisorless deployment scenarios aren't advantages to 
that guest).


Well, Xen requires pre-translation (since the guest has to give the host 
(which is just another guest) permissions to access the data).  So 
neither is a superset of the other, they're just different.

It doesn't really matter since Xen is unlikely to adopt virtio.


You can simply use the same vector for both rx and tx and poll both at 
every interrupt.
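
A minimal sketch of that idea, as a toy model rather than driver code: one 
handler serves the single shared vector and drains both queues each time it 
fires.  The class and field names (SharedVectorNic, rx_ring, tx_done) are 
hypothetical and only illustrate the polling pattern.

```python
from collections import deque


class SharedVectorNic:
    """Toy model: one interrupt vector shared by an rx and a tx queue."""

    def __init__(self):
        self.rx_ring = deque()   # received packets waiting for the driver
        self.tx_done = deque()   # transmitted buffers waiting to be reclaimed
        self.received = []
        self.reclaimed = []

    def interrupt(self):
        """Handler for the shared vector: poll both queues on every interrupt."""
        while self.rx_ring:
            self.received.append(self.rx_ring.popleft())
        while self.tx_done:
            self.reclaimed.append(self.tx_done.popleft())


nic = SharedVectorNic()
nic.rx_ring.extend(["pkt0", "pkt1"])
nic.tx_done.append("buf0")
nic.interrupt()   # a single interrupt services both directions
```

The handler does not need to know which direction raised the interrupt; 
the cost is one extra (usually empty) queue check per interrupt.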


(irq window exits should only be required on a small percentage of 
interrupt injections, since the guest will try to disable interrupts for 
short periods only)


Can you please stop comparing userspace-based virtio hosts to 
kernel-based venet hosts?  We know the userspace implementation sucks.


Requiring all three exits means the guest is spending most of its time 
with interrupts disabled; that's unlikely.

Thanks for the numbers.  Are those 11% attributable to rx/tx 
piggybacking from the same interface?

Also, 170K interrupts -> 17K interrupts/sec -> 55kbit/interrupt -> 
6.8kB/interrupt.  Ignoring interrupt merging and assuming equal rx/tx 
distribution, that's about 13kB/interrupt.  Seems rather low for a 
saturated link.
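
The arithmetic above can be replayed as a short script.  The 10-second 
window is an assumption, inferred only from the step 170K -> 17K/sec; the 
other inputs are the figures quoted in the message.

```python
interrupts = 170_000        # total interrupts, as quoted above
window_s = 10               # assumption: implied by 170K -> 17K interrupts/sec
kbit_per_interrupt = 55     # as quoted above

rate = interrupts / window_s                     # interrupts per second
kbytes_per_interrupt = kbit_per_interrupt / 8    # kbit -> kB
# Equal rx/tx split, no merging: double the per-interrupt figure,
# as the message does.
kbytes_rx_tx_split = 2 * kbytes_per_interrupt

print(rate, kbytes_per_interrupt, kbytes_rx_tx_split)
```

The message rounds the last two values down to 6.8kB and about 13kB.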


With standard PCI, they do not.  But all modern host adapters support 
MSI and they will happily give you one interrupt per queue.


Look at the vmxnet3 submission (recently posted on virtualization@).  
It's a perfectly ordinary PCI NIC driver, apart from having so many 'V's 
in the code.  16 rx queues, 8 tx queues, 25 MSIs, BARs for the 
registers.  So while the industry as a whole might disagree with me, it 
seems VMware does not.



Let's do that then.  Please reserve the corresponding comparisons from 
your side as well.


What are scheduler coordination and non-802.x fabrics?


(avoiding infinite loop)


I think Ira said he can make vhost work?


virtio-net over pci is deployed.  Replacing the backend with vhost-net 
will require no guest modifications.  Replacing the frontend with venet 
or virt-net/vbus-pci will require guest modifications.

Obviously virtio-net isn't deployed in non-virt.  But if we adopt vbus, 
we have to migrate guests.




But we have to implement vbus for each guest we want to support.  That 
includes Windows and older Linux which has a different internal API, so 
we have to port the code multiple times, to get existing functionality.


virtio-net doesn't use any pv layer.


virtio-net doesn't modify the PCI model.  And if you look at vmxnet3, 
they mention that it conforms to something called UPT, which allows 
hardware vendors to implement parts of their NIC model.  So vmxnet3 is 
apparently suitable to both hardware and software implementations.


You can have dynamic MSI/queue routing with virtio, and each MSI can be 
routed to a vcpu at will.


Do you mean interrupt priority?  Well, apic allows interrupt priorities 
and Windows uses them; Linux doesn't.  I don't see a reason to provide 
more than native hardware.


N:1 breaks down on large guests since one vcpu will have to process all 
events.  You could do N:M, with commands to change routings, but where's 
your userspace interface?  You can't tell from /proc/interrupts which 
vbus interrupts are active, and irqbalance can't steer them towards less 
busy cpus since they're invisible to the interrupt controller.
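
To illustrate the N:1 versus N:M point, here is a rough sketch that only 
counts how many events each vcpu ends up handling.  The source names and 
the routing table are hypothetical; this is not how vbus or kvm route 
interrupts, just the load-distribution argument in numbers.

```python
from collections import Counter


def route_n_to_1(events, target_vcpu=0):
    """N:1 routing: every event lands on a single vcpu."""
    return Counter({target_vcpu: len(events)})


def route_n_to_m(events, routing):
    """N:M routing: each source has its own entry pointing at some vcpu."""
    load = Counter()
    for source in events:
        load[routing[source]] += 1
    return load


events = ["rx0", "tx0", "rx1", "tx1"] * 1000
routing = {"rx0": 0, "tx0": 1, "rx1": 2, "tx1": 3}   # hypothetical table

n1 = route_n_to_1(events)
nm = route_n_to_m(events, routing)
print(dict(n1), dict(nm))
```

With N:1 a single vcpu absorbs every event; with N:M the same events are 
spread out, but only if something visible to the administrator can change 
the routing table.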



The larger your installed base, the more difficult it is.  Of course 
it's doable, but I prefer not doing it and instead improving things in a 
binary backwards compatible manner.  If there is no choice we will bow 
to the inevitable and make our users upgrade.  But at this point there 
is a choice, and I prefer to stick with vhost-net until it is proven 
that it won't work.


One of the benefits of virtualization is that the guest model is 
stable.  You can live-migrate guests and upgrade the hardware 
underneath.  You can have a single guest image that you clone to 
provision new guests.  If you switch to a new model, you give up those 
benefits, or you support both models indefinitely.

Note even hardware nowadays is binary compatible.  One e1000 driver 
supports a ton of different cards, and I think (not sure) newer cards 
will work with older drivers, just without all their features.


For a new install, sure.  I'm talking about existing deployments (and 
those that will exist by the time vbus is ready for roll out).


virtio was certainly not pain free, needing Windows drivers, updates to 
management tools (you can't enable it by default, so you have to offer 
it as a choice), mkinitrd, etc.  I'd rather not have to go through that 
again.


No, you have to update the driver in your initrd (for Linux) or properly 
install the new driver (for Windows).  It's especially difficult for 
Windows.


I don't want to support both virtio and vbus in parallel.  There's 
enough work already.  If we adopt vbus, we'll have to deprecate and 
eventually kill off virtio.


PCI is continuously updated, with MSI, MSI-X, and IOMMU support being 
some recent updates.  I'd like to ride on top of that instead of having 
to clone it for every guest I support.


Right, it means you can hand off those eventfds to other qemus or other 
pure userspace servers.  It's more flexible.


No kvm feature will ever be exposed to a guest without userspace 
intervention.  It's a basic requirement.  If it causes complexity (and 
it does) we have to live with it.


Ah, you have a Windows venet driver?



It's the compare venet-in-kernel to virtio-in-userspace thing again.  
Let's defer that until mst completes vhost-net mergeable buffers, at which 
time we can compare vhost-net to venet and see how much vbus contributes 
to performance and how much of it comes from being in-kernel.


Since this is getting confusing to me, I'll start from scratch looking 
at the vbus layers, top to bottom:

Guest side:
1. venet guest kernel driver - AFAICT, duplicates the virtio-net guest 
driver functionality
2. vbus guest driver (config and hotplug) - duplicates pci, or if you 
need non-pci support, virtio config and its pci bindings; needs 
reimplementation for all supported guests
3. vbus guest driver (interrupt coalescing, priority) - if needed, 
should be implemented as an irqchip (and be totally orthogonal to the 
driver); needs reimplementation for all supported guests
4. vbus guest driver (shm/ioq) - finer-grained layering than virtio 
(which only supports the combination, due to the need for Xen support); 
can be retrofitted to virtio at some cost

Host side:
1. venet host kernel driver - is duplicated by vhost-net; doesn't 
support live migration, unprivileged users, or slirp
2. vbus host driver (config and hotplug) - duplicates pci support in 
userspace (which will need to be kept in any case); already has two 
userspace interfaces
3. vbus host driver (interrupt coalescing, priority) - if we think we 
need it (and I don't), should be part of kvm core, not a bus
4. vbus host driver (shm) - partially duplicated by vhost memory slots
5. vbus host driver (ioq) - duplicates userspace virtio, duplicated by vhost


To me, compatible means I can live migrate an image to a new system 
without the user knowing about the change.  You'll be able to do that 
with vhost-net.


You'll probably need to change that as you start running smp guests.


Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.


I'm missing something.  Where's the pv layer for virtio-net?

Linux drivers have an abstraction layer to deal with non-pci.  But the 
Windows drivers are ordinary pci drivers with nothing that looks 
pv-ish.  You could implement virtio-net hardware if you wanted to.


(and selinux label)


It always begins with a 119-line patch and then grows, that's life.


For virt uses, I don't see the need.  For non-virt, I have no opinion.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
