[PATCH] Vmchannel PCI device.

Previous thread: [PATCH 02/36] KVM: x86 emulator: consolidate emulation of two operand instructions by Avi Kivity on Sunday, December 14, 2008 - 1:06 am. (37 messages)

Next thread: [PATCH] AF_VMCHANNEL address family for guest<->host communication. by Gleb Natapov on Sunday, December 14, 2008 - 4:50 am. (29 messages)
From: Gleb Natapov
Date: Sunday, December 14, 2008 - 4:50 am

There is a need for a communication channel between the host and various
agents that are running inside a VM guest. The channel will be used
for statistics gathering, logging, cut & paste, host screen resolution
change notification, guest configuration etc.

It is undesirable to use TCP/IP for this purpose since network
connectivity may not exist between host and guest, and if it exists the
traffic may not be routable between host and guest for security reasons,
or TCP/IP traffic may be firewalled (by mistake) by an unsuspecting VM user.

The patch implements a separate PCI device for this type of communication.
To create a channel, the "-vmchannel channel:dev" option should be specified
on the qemu command line during VM launch.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---

 Makefile.target       |    2 
 hw/pc.c               |    8 +
 hw/virtio-vmchannel.c |  283 +++++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio-vmchannel.h |   19 +++
 sysemu.h              |    4 +
 vl.c                  |   35 ++++++
 6 files changed, 344 insertions(+), 7 deletions(-)
 create mode 100644 hw/virtio-vmchannel.c
 create mode 100644 hw/virtio-vmchannel.h

diff --git a/Makefile.target b/Makefile.target
index 8c649be..d9f5aad 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -637,7 +637,7 @@ OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
 # virtio support
-OBJS+= virtio.o virtio-blk.o virtio-balloon.o
+OBJS+= virtio.o virtio-blk.o virtio-balloon.o virtio-vmchannel.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/pc.c b/hw/pc.c
index 73dd8bc..57e3b1d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1095,7 +1095,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
         }
     }
 
-    /* Add virtio block devices */
+    /* Add virtio devices */
     if (pci_enabled) {
         int index;
         ...
From: Blue Swirl
Date: Sunday, December 14, 2008 - 5:28 am

Isn't this exactly what the firmware configuration device was supposed
to be used for? In the list of use cases you gave, I don't see
anything that could not be done with it.

So, to avoid duplicated functionality, I'd add the missing pieces to
the configuration device and if PCI compatibility is desired, the
firmware configuration device IO port could be handled by a wrapper
PCI device much like what you proposed.
--

From: Gleb Natapov
Date: Sunday, December 14, 2008 - 6:12 am

The requirements for the firmware configuration interface were different.
We wanted something simple that we could use as early as possible in the
CPU init code, and performance was not considered at all. Obviously a PCI
device doesn't fit this. We don't want to write a PCI driver inside a BIOS,
and PCI initialization comes too late in the HW initialization sequence.

The requirement for vmchannel was that it should allow a guest
to communicate with an external (to qemu) process, and with reasonable
performance too. A firmware interface that copies data a byte at a time
does not fit.  And obviously the firmware interface lacks interrupts, we don't
vmchannel code uses the virtio subsystem (which was not present in qemu when
the firmware interface was added BTW). Theoretically we can use virtio for
the FW interface too, but the in-guest part of virtio is too complex to be
added to firmware IMO. Let's keep simple things simple.

--
			Gleb.
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 12:15 pm

This is not a requirement that I think is important.  It's only a 
requirement for you because you have closed code that you want to 
implement the backend with.  I would personally be more interested in 
vmchannel backends in QEMU and I think there will be a lot of them.

But the firmware config interface is different than what is proposed 
here in a number of important ways.  The first is that this is a 
streaming communication mechanism versus a key/value store.  It maps 
naturally to userspace via a socket abstraction and is present in a 
number of other hypervisors (XenSocket in Xen, VMCI in VMware, etc.).

I see the firmware config as more akin to a device tree or CMOS than a 
generic guest<=>host transport.
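
To make the contrast concrete, here is a rough sketch in C. It is not taken from the patch or from the firmware config code: the `cfg_entry` table, `cfg_lookup()` and `stream_transfer()` are made-up names, and a Unix `socketpair()` only stands in for whatever transport a real channel would use. The first half is a read-only lookup by key, the second half is an ordered byte stream that the reader drains in a loop.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Config-store style: fixed keys, fixed values, read-only lookups. */
struct cfg_entry {
    const char *key;
    const char *value;
};

static const char *cfg_lookup(const struct cfg_entry *tbl, size_t n,
                              const char *key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tbl[i].key, key) == 0)
            return tbl[i].value;
    return NULL;
}

/*
 * Channel style: push msg through a connected stream socket pair and read
 * it back on the other end. out must have room for strlen(msg) + 1 bytes.
 * Returns 0 on success, -1 on error.
 */
static int stream_transfer(const char *msg, char *out, size_t out_sz)
{
    int sv[2];
    size_t len = strlen(msg), got = 0;
    int rc = -1;

    assert(len < out_sz);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (write(sv[0], msg, len) == (ssize_t)len) {
        /* A stream has no message boundaries, so read until we have len. */
        while (got < len) {
            ssize_t n = read(sv[1], out + got, len - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        if (got == len) {
            out[len] = '\0';
            rc = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The lookup has no notion of ordering or of a peer that answers back; the stream has both, which is why it fits a socket-like interface in the guest.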

Regards,

Anthony Liguori
--

From: Gleb Natapov
Date: Sunday, December 14, 2008 - 12:37 pm

I don't know why you think that we are going to use this for closed
code or something. It will be used by libvirt, and that is open source,
last I checked.

--
			Gleb.
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 3:52 pm

For what?

vmchannel was developed for SPICE, is this not right?  That's where my 
assumption comes from.  If there's another use case, please describe it.

Regards,


--

From: Avi Kivity
Date: Monday, December 15, 2008 - 2:20 am

No, spice does its own thing.  It's dma intensive, so it isn't a good 
fit for vmchannel.

-- 
error compiling committee.c: too many arguments to function

--

From: Dan Kenigsberg
Date: Monday, December 15, 2008 - 2:25 am

Our management system uses vmchannel to communicate with an agent
running on the guest.

We use this agent to collect information about the guest OS: e.g.,
installed applications, who's logged in, whether anything's running, or
the guest is rebooting.

The agent is capable of performing operations on the guest, too. We use
this to log a user in (for single sign-on), to log a user out before
migrating to file, to renew the guest's dhcp lease if the guest is
migrated to another subnet, to name a few uses.

Dan.
--

From: Daniel P. Berrange
Date: Sunday, December 14, 2008 - 3:13 pm

One non-QEMU backend I can see being implemented is a DBus daemon,
providing a simple bus for RPC calls between guests & host. Or on
a similar theme, perhaps a QPid message broker in the host OS. Yet
another backend is a clustering service providing a virtual fence
device to VMs. All of these would live outside QEMU, and as such
exposing the backend using the character device infrastructure 
is a natural fit.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 3:56 pm

The main problem with "external" backends is that they cannot easily 
participate in save/restore or live migration.  If you want to have an 
RPC mechanism, I would suggest implementing the backend in QEMU and 

Why not use virtual networking for a clustering service (as you would in 

If you don't have QEMU as a broker, it makes it very hard for QEMU to 
virtualize all of the resources exposed to the guest.  This 
complicates things like save/restore and complicates security policies 
since you now have things being done on behalf of a guest originating 
from another process.  It generally breaks the model of guest-as-a-process.

What's the argument to do these things external to QEMU?

Regards,


--

From: Daniel P. Berrange
Date: Sunday, December 14, 2008 - 4:33 pm

DBus is a general purpose RPC service, which has little-to-no knowledge
of the semantics of application services running over it. Simply pushing
a backend into QEMU can't magically make sure all the application level
state is preserved across save/restore/migrate. For some protocols the
only viable option may be to explicitly give the equivalent of -EPIPE 
/ POLLHUP to the guest and have it explicitly re-establish connectivity 

It imposes a configuration & authentication burden on the guest to
use networking. When a virtual fence device is provided directly from
the host OS, you can get zero-config deployment of clustering without
the need to configure any authentication credentials in the guest.

This really depends on what you define the semantics of the vmchannel
protocol to be - specifically whether you want save/restore/migrate to
be totally opaque to the guest or not. I could imagine one option is to
have the guest end of the device be given -EPIPE when the backend is
restarted for restore/migrate, and choose to re-establish its connection
if so desired. This would not require QEMU to maintain any backend state.
For stateless datagram (UDP-like) application protocols there's nothing 

There are many potential use cases for VMchannel, not all are going
to be general purpose things that everyone wants to use. Forcing a lot
of application specific backend code into QEMU is not a good way to 
approach this from a maintenance point of view. Some backends may well
be well suited to living inside QEMU, while others may be better suited
as external services. 

Daniel
--

From: Thiemo Seufer
Date: Sunday, December 14, 2008 - 6:18 pm

Daniel P. Berrange wrote:

Could you describe a practical use case of VMchannel in Qemu? I think I

If it is only good for specialized esoteric stuff, why should it be in
Qemu?


Thiemo
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 7:03 pm

In the case of dbus, you actually have a shot of making save/restore 
transparent.  If you send the RPCs, you can parse the messages in QEMU 
and know when you have a complete buffer.  You can then dispatch the RPC 
from QEMU (and BTW, perfect example of security, you want the RPCs to 
originate from the QEMU process).  When you get the RPC response, you 
can marshal it and make it available to the guest.

If you ever have a request or response, you should save the partial 
results as part of save/restore.  You could use the live feature of 
savevm to attempt to wait until there are no pending RPCs.  In fact, you 
have to do this because otherwise, the save/restore would be broken.
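
A minimal sketch of what "know when you have a complete buffer" and "save the partial results" could look like. It assumes a made-up framing (4-byte big-endian length followed by the payload), which is not the DBus wire format, and the `framer` names are invented for illustration. The point is only that all in-flight state is plain data that a save handler could write out and a load handler could read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_MAX 256   /* arbitrary payload cap for this sketch */

/* Plain data only, so the whole struct could be saved and restored. */
struct framer {
    uint8_t hdr[4];          /* big-endian payload length, maybe partial */
    size_t hdr_have;         /* header bytes received so far */
    uint8_t body[FRAME_MAX];
    size_t body_have;        /* payload bytes received so far */
    size_t body_len;         /* valid once hdr_have == 4 */
};

static void framer_reset(struct framer *f)
{
    memset(f, 0, sizeof(*f));
}

/* Non-zero while a message is partly received, i.e. there is state to save. */
static int framer_in_flight(const struct framer *f)
{
    return f->hdr_have != 0;
}

/*
 * Feed one byte from the guest. Returns 1 when a complete message is in
 * f->body (f->body_len bytes), 0 when more data is needed, and -1 if the
 * announced length exceeds FRAME_MAX. The caller must call framer_reset()
 * after a 1 or -1 before pushing more bytes.
 */
static int framer_push(struct framer *f, uint8_t byte)
{
    if (f->hdr_have < 4) {
        f->hdr[f->hdr_have++] = byte;
        if (f->hdr_have < 4)
            return 0;
        f->body_len = ((size_t)f->hdr[0] << 24) | ((size_t)f->hdr[1] << 16) |
                      ((size_t)f->hdr[2] << 8) | (size_t)f->hdr[3];
        if (f->body_len > FRAME_MAX)
            return -1;
        return f->body_len == 0;
    }
    assert(f->body_have < f->body_len);
    f->body[f->body_have++] = byte;
    return f->body_have == f->body_len;
}
```

A save handler would check `framer_in_flight()` and either wait for it to clear or serialize the struct as is; only a message for which `framer_push()` returned 1 would be dispatched as an RPC.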

This example is particularly bad for EPIPE.  If the guest sends an RPC, 
what happens if it gets EPIPE?  Has it been completed?  It would make it 
very difficult to program for this model.

EPIPE is the model Xen used for guest save/restore and it's been a huge 
hassle.  You don't want guests involved in save/restore because it adds 
a combinatorial factor to your test matrix.  You have to now test every 
host combination with every supported guest combination to ensure that 
save/restore has not regressed.  It's a huge burden and IMHO is never 

If you just want to use vmchannel for networking without the 
"configuration" burden then someone heavily involved with a distro 
should just preconfigure, say Fedora, to create a private network on a 
dedicated network interface as soon as the system starts.  Then you have 
a dedicated, never disappearing network interface you can use for all of 

It's a losing proposition because it explodes the test matrix to build 

I think VMchannel is a useful concept but not for the same reasons you 
do :-)

Regards,


--

From: Daniel P. Berrange
Date: Monday, December 15, 2008 - 2:47 am

This is missing the point I was trying to make. Sure you can parse the
RPC calls and know where the boundaries are so you can ensure a consistent
RPC stream, but that's not the state problem I was trying to describe.

There is state in the higher level application relating to the RPC calls.
So, consider two RPC calls made by say NetworkManager

      - Create FOO()
      - Run FOO.Bar()

We save/restore half-way through the 'Run FOO.Bar()' method, QEMU can see
this and so it replays the interrupted 'Run FOO.Bar()' method call upon
restore. This is useless because there is nothing that says object 'FOO()'
even exists on the host the VM is being restored on.

Thus for this kind of RPC service I believe it is better not to try and
be transparent in save/restore. Explicitly break the communication channel
and allow the guest VM to re-discover the relative services it wants to

This is really a question of what semantics the application wants to provide.
DBus is not attempting to provide a guaranteed reliable message delivery
service, and its specification does not require any particular semantics
upon connection failure. This is already true of the existing TCP based
impl of DBus. So whether in-flight RPCs complete or not is 'undefined',
and thus upon re-connection the client in the guest needs to re-discover
whatever it was talking to and verify what state it's in.
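
Roughly, the guest-side handling described here might look like the sketch below. Everything in it is hypothetical: `fake_chan` is an in-memory stand-in for the guest end of a channel, and none of the function names come from the vmchannel patch or from DBus. What it tries to show is that an -EPIPE is answered by reconnecting and rediscovering, not by replaying the request, and that the caller is told the outcome of the interrupted call is unknown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* In-memory stand-in for the guest end of a channel (illustrative only). */
struct fake_chan {
    int broken;        /* set when the backend was restarted */
    int sessions;      /* number of connections made so far */
    int discoveries;   /* number of service lookups made so far */
};

/* Returns 0 on success or -EPIPE once the backend has gone away. */
static int chan_send(struct fake_chan *c, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return c->broken ? -EPIPE : 0;
}

static int chan_reconnect(struct fake_chan *c)
{
    c->broken = 0;
    c->sessions++;
    return 0;
}

/* Look the remote service up again; old object handles are assumed stale. */
static int chan_rediscover(struct fake_chan *c)
{
    c->discoveries++;
    return 0;
}

enum rpc_outcome {
    RPC_SENT,      /* request was handed to the backend */
    RPC_UNKNOWN,   /* channel broke: the request may or may not have run */
    RPC_FAILED     /* any other error */
};

static enum rpc_outcome rpc_send(struct fake_chan *c, const void *buf,
                                 size_t len)
{
    int rc;

    assert(c != NULL);
    rc = chan_send(c, buf, len);
    if (rc == 0)
        return RPC_SENT;
    if (rc != -EPIPE)
        return RPC_FAILED;

    /* Rebuild the session, but do not replay the interrupted request. */
    if (chan_reconnect(c) != 0 || chan_rediscover(c) != 0)
        return RPC_FAILED;
    return RPC_UNKNOWN;
}
```

On RPC_UNKNOWN the application has to query the rediscovered service for its current state before deciding whether to issue the call again.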

A service like AMQP/QPid does provide a guaranteed reliable message stream
with the necessary protocol handshakes/acknowledgments to be able to
implement clearly defined semantics upon connection failure, i.e. the app is
able to get a guaranteed response as to whether the message has been
delivered or not, even in the face of network comms failure.

Daniel
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 12:24 pm

Please don't make comments less specific.  We probably want to go in the 


Did you intend for GPLv2 or GPLv2+?  There's no requirement either way 

I very much like just naming these things dprintf() but this is not a 


No need to strdup().  optarg is good for the duration of execution.

I've only done a light review but things mostly look good.  I'd like to 
wait a bit to see what the reaction is on netdev before applying.

Regards,

Anthony Liguori
--

From: Gleb Natapov
Date: Sunday, December 14, 2008 - 12:44 pm

I changed the comment because I also changed the code it describes.
Previously it registered only block device, now it registers balloon
optarg is const and I change the string during parsing (call strsep on it).
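
For readers without the full patch, a small sketch of the point being made. It assumes the option value has the form "name:dev" split at the first ':' (the exact syntax in the patch may differ), and `vmchannel_opt` / `vmchannel_parse_opt()` are invented names. strsep() overwrites the delimiter with a NUL, so it cannot be run on a const string directly; the copy made by strdup() is what gets modified.

```c
#define _DEFAULT_SOURCE   /* strsep() and strdup() declarations on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the field names are not taken from the patch. */
struct vmchannel_opt {
    char *buf;          /* private copy that strsep() is allowed to modify */
    const char *name;   /* text before the first ':' */
    const char *dev;    /* text after the first ':' */
};

/* Split "name:dev" into its two parts. Returns 0 on success, -1 on error. */
static int vmchannel_parse_opt(const char *optarg, struct vmchannel_opt *out)
{
    char *cursor;

    assert(optarg != NULL && out != NULL);

    /* strsep() writes a NUL over the delimiter, so work on a copy. */
    out->buf = strdup(optarg);
    if (out->buf == NULL)
        return -1;

    cursor = out->buf;
    out->name = strsep(&cursor, ":");   /* cursor now points past the ':' */
    out->dev = cursor;                  /* NULL if there was no ':' */

    if (out->dev == NULL || *out->name == '\0' || *out->dev == '\0') {
        free(out->buf);
        out->buf = NULL;
        return -1;
    }
    return 0;
}
```

The caller owns `out->buf` on success and frees it when `name` and `dev` are no longer needed; the original option string is left untouched.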

--
			Gleb.
--

From: Paul Brook
Date: Sunday, December 14, 2008 - 5:41 pm

Needs documentation.

Paul
--

From: Anthony Liguori
Date: Sunday, December 14, 2008 - 6:50 pm

Gee, that sounds awfully familiar ;-)

Regards,


--
