There is a need for a communication channel between the host and various
agents that are running inside a VM guest. The channel will be used
for statistics gathering, logging, cut & paste, host screen resolution
change notification, guest configuration etc.
It is undesirable to use TCP/IP for this purpose since network
connectivity may not exist between host and guest, and if it exists the
traffic may not be routable between host and guest for security reasons,
or TCP/IP traffic can be firewalled (by mistake) by an unsuspecting VM user.
The patch implements a separate PCI device for this type of communication.
To create a channel, the "-vmchannel channel:dev" option should be specified
on the qemu command line during VM launch.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
Makefile.target | 2
hw/pc.c | 8 +
hw/virtio-vmchannel.c | 283 +++++++++++++++++++++++++++++++++++++++++++++++++
hw/virtio-vmchannel.h | 19 +++
sysemu.h | 4 +
vl.c | 35 ++++++
6 files changed, 344 insertions(+), 7 deletions(-)
create mode 100644 hw/virtio-vmchannel.c
create mode 100644 hw/virtio-vmchannel.h
diff --git a/Makefile.target b/Makefile.target
index 8c649be..d9f5aad 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -637,7 +637,7 @@ OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
# virtio support
-OBJS+= virtio.o virtio-blk.o virtio-balloon.o
+OBJS+= virtio.o virtio-blk.o virtio-balloon.o virtio-vmchannel.o
CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
endif
ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/pc.c b/hw/pc.c
index 73dd8bc..57e3b1d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1095,7 +1095,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
}
}
- /* Add virtio block devices */
+ /* Add virtio devices */
if (pci_enabled) {
int index;
...
Isn't this exactly what the firmware configuration device was supposed to be used for? In the list of use cases you gave, I don't see anything that could not be done with it. So, to avoid duplicated functionality, I'd add the missing pieces to the configuration device and if PCI compatibility is desired, the firmware configuration device IO port could be handled by a wrapper PCI device much like what you proposed. --
The requirement for the firmware configuration interface was different. We wanted something simple that we could use as early as possible in CPU init code, and performance was not considered at all. Obviously a PCI device doesn't fit this. We don't want to write a PCI driver inside a BIOS, and PCI initialization is too late in the HW initialization sequence.

The requirement for vmchannel was that it should allow a guest to communicate with a process external to qemu, and with reasonable performance too. A firmware interface that copies data a byte at a time does not fit. And obviously the firmware interface lacks interrupts, we don't [...] vmchannel code uses the virtio subsystem (which was not present in qemu when the firmware interface was added, BTW). Theoretically we can use virtio for the FW interface too, but the in-guest part of virtio is too complex to be added to firmware IMO. Let's keep simple things simple. -- Gleb. --
This is not a requirement that I think is important. It's only a requirement for you because you have closed code that you want to implement the backend with. I would personally be more interested in vmchannel backends in QEMU and I think there will be a lot of them.

But the firmware config interface is different from what is proposed here in a number of important ways. The first is that this is a streaming communication mechanism versus a key/value store. It maps naturally to userspace via a socket abstraction and is present in a number of other hypervisors (XenSocket in Xen, VMCI in VMware, etc.). I see the firmware config as more akin to a device tree or CMOS than a generic guest<=>host transport. Regards, Anthony Liguori --
I don't know why you think that we are going to use that for closed code or something. It will be used by libvirt, and it is open source last I checked. -- Gleb. --
For what? vmchannel was developed for SPICE, is this not right? That's where my assumption comes from. If there's another use case, please describe it. Regards, --
No, spice does its own thing. It's dma intensive, so it isn't a good fit for vmchannel. -- error compiling committee.c: too many arguments to function --
Our management system uses vmchannel to communicate with an agent running on the guest. We use this agent to collect information about the guest OS: e.g., installed applications, who's logged in, whether anything's running, or the guest is rebooting. The agent is capable of performing operations on the guest, too. We use this to log a user in (for single sign-on), to log a user out before migrating to file, to renew the guest's dhcp lease if the guest is migrated to another subnet, to name a few uses. Dan. --
One non-QEMU backend I can see being implemented is a DBus daemon, providing a simple bus for RPC calls between guests & host. Or on a similar theme, perhaps a QPid message broker in the host OS. Yet another backend is a clustering service providing a virtual fence device to VMs. All of these would live outside QEMU, and as such exposing the backend using the character device infrastructure is a natural fit. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| --
The main problem with "external" backends is that they cannot easily participate in save/restore or live migration. If you want to have an RPC mechanism, I would suggest implementing the backend in QEMU and [...]

Why not use virtual networking for a clustering service (as you would in [...]

If you don't have QEMU as a broker, it makes it very hard for QEMU to virtualize all of the resources exposed to the guest. This complicates things like save/restore and complicates security policies since you now have things being done on behalf of a guest originating from another process. It generally breaks the model of guest-as-a-process.

What's the argument to do these things external to QEMU? Regards, --
DBus is a general purpose RPC service, which has little-to-no knowledge of the semantics of application services running over it. Simply pushing a backend into QEMU can't magically make sure all the application level state is preserved across save/restore/migrate. For some protocols the only viable option may be to explicitly give the equivalent of -EPIPE / POLLHUP to the guest and have it explicitly re-establish connectivity.

It imposes a configuration & authentication burden on the guest to use networking. When a virtual fence device is provided directly from the host OS, you can get zero-config deployment of clustering without the need to configure any authentication credentials in the guest.

This really depends on what you define the semantics of the vmchannel protocol to be, specifically whether you want save/restore/migrate to be totally opaque to the guest or not. I could imagine one option is to have the guest end of the device be given -EPIPE when the backend is restarted for restore/migrate, and choose to re-establish its connection if so desired. This would not require QEMU to maintain any backend state. For stateless datagram (UDP-like) application protocols there's nothing [...]

There are many potential use cases for VMchannel, not all are going to be general purpose things that everyone wants to use. Forcing a lot of application specific backend code into QEMU is not a good way to approach this from a maintenance point of view. Some backends may well be well suited to living inside QEMU, while others may be better suited as external services. Daniel --
Daniel P. Berrange wrote:

Could you describe a practical use case of VMchannel in Qemu? I think I [...]

If it is only good for specialized esoteric stuff, why should it be in Qemu? Thiemo --
In the case of dbus, you actually have a shot at making save/restore transparent. If you send the RPCs, you can parse the messages in QEMU and know when you have a complete buffer. You can then dispatch the RPC from QEMU (and BTW, perfect example of security, you want the RPCs to originate from the QEMU process). When you get the RPC response, you can marshal it and make it available to the guest. If you ever have a partial request or response, you should save the partial results as part of save/restore. You could use the live feature of savevm to attempt to wait until there are no pending RPCs. In fact, you have to do this because otherwise, the save/restore would be broken.

This example is particularly bad for EPIPE. If the guest sends an RPC, what happens if it gets EPIPE? Has it been completed? It would make it very difficult to program for this model.

EPIPE is the model Xen used for guest save/restore and it's been a huge hassle. You don't want guests involved in save/restore because it adds a combinatorial factor to your test matrix. You now have to test every host combination with every supported guest combination to ensure that save/restore has not regressed. It's a huge burden and IMHO is never [...]

If you just want to use vmchannel for networking without the "configuration" burden then someone heavily involved with a distro should just preconfigure, say Fedora, to create a private network on a dedicated network interface as soon as the system starts. Then you have a dedicated, never disappearing network interface you can use for all of [...]

It's a losing proposition because it explodes the test matrix to build [...]

I think VMchannel is a useful concept but not for the same reasons you do :-) Regards, --
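The message-boundary tracking described above (QEMU parsing the stream so it knows when it holds a complete buffer and what is only a partial request) might be sketched as follows. This is only an illustration under stated assumptions: the 4-byte big-endian length prefix is a hypothetical framing chosen for brevity and is not the DBus wire format, and `vmch_complete_prefix` is an invented name, not a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: each message is a 4-byte big-endian body length
 * followed by the body. This is NOT the DBus wire format. */
enum { VMCH_HDR_LEN = 4 };

/* Return how many bytes at the start of buf belong to complete messages.
 * Bytes from that offset to len form a partial message, which is the part
 * that would have to be stored as device state across save/restore. */
size_t vmch_complete_prefix(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (len - off >= VMCH_HDR_LEN) {
        uint32_t body = ((uint32_t)buf[off] << 24) |
                        ((uint32_t)buf[off + 1] << 16) |
                        ((uint32_t)buf[off + 2] << 8) |
                        (uint32_t)buf[off + 3];

        if (len - off - VMCH_HDR_LEN < body)
            break; /* header seen, body still incomplete */
        off += (size_t)VMCH_HDR_LEN + body;
    }
    return off;
}
```

With a buffer holding one whole two-byte message followed by the start of a second one, the function returns 6: the first six bytes could be dispatched, and the remainder would be saved as the partial request.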
This is missing the point I was trying to make. Sure you can parse the
RPC calls and know where the boundaries are so you can ensure a consistent
RPC stream, but that's not the state problem I was trying to describe.
There is state in the higher level application relating to the RPC calls.
So, consider two RPC calls made by say NetworkManager
- Create FOO()
- Run FOO.Bar()
We save/restore half-way through the 'Run FOO.Bar()' method. QEMU can see
this, and so it replays the interrupted 'Run FOO.Bar()' method call upon
restore. This is useless because there is nothing that says object 'FOO()'
even exists on the host the VM is being restored on.
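The failure mode in this example can be shown with a toy model. The sketch below is purely illustrative: `struct host_service`, `host_create` and `host_run` are invented names standing in for whatever service sits behind the RPC bus, and nothing here is DBus or QEMU code.

```c
#include <assert.h>
#include <string.h>

enum { MAX_OBJECTS = 8, NAME_LEN = 16 };

/* Toy stand-in for per-host service state that QEMU knows nothing about. */
struct host_service {
    char objects[MAX_OBJECTS][NAME_LEN];
    int count;
};

/* "Create FOO()": register an object on this host. Returns 0 or -1. */
int host_create(struct host_service *h, const char *name)
{
    assert(h != NULL && name != NULL);
    if (h->count >= MAX_OBJECTS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(h->objects[h->count++], name);
    return 0;
}

/* "Run FOO.Bar()": succeeds only if the object exists on this host. */
int host_run(const struct host_service *h, const char *name)
{
    assert(h != NULL && name != NULL);
    for (int i = 0; i < h->count; i++) {
        if (strcmp(h->objects[i], name) == 0)
            return 0;
    }
    return -1; /* the replayed call has nothing to act on */
}
```

Creating "FOO" on one `host_service` and then replaying the run against a second, empty `host_service` returns -1, even though the byte stream of the replayed call is perfectly well-formed.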
Thus for this kind of RPC service I believe it is better not to try to
be transparent in save/restore. Explicitly break the communication channel
and allow the guest VM to re-discover the relevant services it wants to [...]
This is really a question of what semantics the application wants to provide.
DBus is not attempting to provide a guaranteed reliable message delivery
service, and its specification does not require any particular semantics
upon connection failure. This is already true of the existing TCP-based
impl of DBus. So whether in-flight RPCs complete or not is 'undefined',
and thus upon re-connection the client in the guest needs to re-discover
whatever it was talking to and verify what state it's in.
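On the guest side, the "break the channel and re-discover" model could look roughly like the sketch below. It assumes the guest agent sees the channel as an ordinary file descriptor and that a restarted backend surfaces as EPIPE on write, as suggested earlier in the thread; `channel_send`, `channel_ignore_sigpipe` and `enum send_status` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

enum send_status { SEND_ERROR = -1, SEND_OK = 0, SEND_PEER_GONE = 1 };

/* Writing to a broken channel raises SIGPIPE by default, which would kill
 * the agent before it ever sees EPIPE; ignore the signal once at startup. */
void channel_ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Send the whole buffer. SEND_PEER_GONE means the backend went away: the
 * message may or may not have been processed, so the caller has to
 * reconnect and re-discover state rather than blindly resend. */
enum send_status channel_send(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            return errno == EPIPE ? SEND_PEER_GONE : SEND_ERROR;
        }
        p += n;
        len -= (size_t)n;
    }
    return SEND_OK;
}
```

Returning a distinct status instead of reconnecting inside the helper keeps the decision with the application, which matches the point that the outcome of in-flight RPCs is undefined.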
A service like AMQP/QPid does provide a guaranteed reliable message stream
with the necessary protocol handshakes/acknowledgments to be able to
implement clearly defined semantics upon connection failure, i.e. the app is
able to get a guaranteed response as to whether the message has been
delivered or not, even in the face of network comms failure.
Daniel
--
...
Please don't make comments less specific.

We probably want to go in the [...]

Did you intend for GPLv2 or GPLv2+? There's no requirement either way [...]

I very much like just naming these things dprintf() but this is not a [...]

No need to strdup(). optarg is good for the duration of execution.

I've only done a light review but things mostly look good. I'd like to wait a bit to see what the reaction is on netdev before applying. Regards, Anthony Liguori --
I changed the comment because I also changed the code it describes. Previously it registered only block devices, now it registers balloon [...]

optarg is const and I change the string during parsing (call strsep on it). -- Gleb. --
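The reason for the strdup() can be illustrated with a small sketch. This is not the code from the patch: `struct vmchannel_arg`, `vmchannel_arg_parse` and `vmchannel_arg_free` are hypothetical names, and splitting at the first ':' is an assumption about how "channel:dev" is interpreted. The point is only that strsep() overwrites the delimiter in place, so it needs a writable private copy of the const option string.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes strsep() and strdup() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a parsed "-vmchannel channel:dev" argument. */
struct vmchannel_arg {
    char *copy;          /* private, writable copy of the argument */
    const char *channel; /* text before the first ':' */
    const char *dev;     /* everything after the first ':' */
};

/* Returns 0 on success, -1 on allocation failure or a missing ':'. */
int vmchannel_arg_parse(const char *arg_ro, struct vmchannel_arg *out)
{
    char *cursor;

    assert(arg_ro != NULL && out != NULL);
    out->copy = strdup(arg_ro); /* strsep() must not touch arg_ro */
    if (out->copy == NULL)
        return -1;

    cursor = out->copy;
    out->channel = strsep(&cursor, ":"); /* replaces ':' with '\0' */
    out->dev = cursor;                   /* NULL when no ':' was found */
    if (out->dev == NULL) {
        free(out->copy);
        out->copy = NULL;
        return -1;
    }
    return 0;
}

void vmchannel_arg_free(struct vmchannel_arg *arg)
{
    assert(arg != NULL);
    free(arg->copy);
    arg->copy = NULL;
}
```

Because the copy is owned by the struct, `channel` and `dev` stay valid until `vmchannel_arg_free()` is called, and the caller's original string is left unmodified.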
Gee, that sounds awfully familiar ;-) Regards, --
