Re: Use of virtio device IDs

Previous thread: [RFC] [PATCH 0/3] Recursive mtime for ext3 by Jan Kara on Tuesday, November 6, 2007 - 10:15 am. (15 messages)

Next thread: virtio config_ops refactoring by Anthony Liguori on Tuesday, November 6, 2007 - 10:48 am. (2 messages)
From: Anthony Liguori
Date: Tuesday, November 6, 2007 - 10:16 am

Hi Rusty,

I've written a PCI virtio transport and noticed something strange.  All 
current in-tree virtio devices register ID tables that match a specific 
device ID, but any vendor ID.

This is incompatible with using PCI vendor/device IDs for virtio 
vendor/device IDs since vendors control what device IDs mean.  A simple 
solution would be to assign a fixed vendor ID to all current virtio 
devices.  This doesn't solve the problem completely though since you 
would create a conflict between the PCI vendor ID space and the virtio 
vendor ID space.

The only solutions seem to be virtualizing the virtio vendor/device IDs 
(which is what I'm currently doing) or to mandate that the virtio vendor 
ID be within the PCI vendor ID space.  It's probably not necessary to 
make the same requirement for device IDs though.
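
To make the mismatch concrete, here is a minimal userspace sketch of an ID table that matches on device ID and accepts any vendor.  The struct layout, the ANY_ID wildcard value and the function names are assumptions for illustration, not the in-tree definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard value: an entry with this vendor matches any vendor. */
#define ANY_ID 0xffffffffu

/* Illustrative ID pair; the field names are assumptions for this sketch. */
struct dev_id {
    uint32_t device;
    uint32_t vendor;
};

/* Return 1 if a table entry matches the device being probed, 0 otherwise. */
static int id_matches(const struct dev_id *entry, const struct dev_id *dev)
{
    if (entry->device != dev->device)
        return 0;
    return entry->vendor == ANY_ID || entry->vendor == dev->vendor;
}

/*
 * Walk a table terminated by an all-zero entry.
 * Returns the index of the first matching entry, or -1 if none matches.
 */
static int match_table(const struct dev_id *table, const struct dev_id *dev)
{
    assert(table != NULL && dev != NULL);
    for (int i = 0; table[i].device != 0 || table[i].vendor != 0; i++) {
        if (id_matches(&table[i], dev))
            return i;
    }
    return -1;
}
```

With an entry such as { 1, ANY_ID }, a device with ID 1 from one vendor and a device with ID 1 from an unrelated vendor both match.  That is harmless while device IDs are global, but wrong once they are taken from PCI, where each vendor assigns its own device IDs.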

What are your thoughts?

Regards,

Anthony Liguori
-

From: Anthony Liguori
Date: Tuesday, November 6, 2007 - 11:49 am

There's another ugly bit in the current implementation.

Right now, we would have to have every PCI vendor/device ID pair in the 
virtio PCI driver ID table for every virtio device.

This means every time a virtio device is added to Linux, the virtio PCI 
driver has to be modified (assuming that each virtio device uses a 
unique PCI vendor/device ID) :-/

Regards,


-

From: Gregory Haskins
Date: Tuesday, November 6, 2007 - 8:38 pm

I realize you guys are probably far down this road in the design
process, but FWIW: This is a major reason why the
IOQ stuff I posted a while back used strings for device identification
instead of a fixed length, centrally managed namespace like PCI
vendor/dev-id.  Then you can just name your device something reasonably
unique (e.g. "qumranet::veth", or "ibm-pvirt-clock").

(I realize that if you are going to do PCI, you need to make it
PCI-like.  But I think using PCI in the first place is probably the
wrong direction.  IMHO, there's really not a lot of reason to be
constrained by a hardware specification once you decide to go PV.  This
is even more true if you want to support as many platforms as possible
(i.e. platforms that don't have PCI natively).)

Regards,
-Greg

-

From: Avi Kivity
Date: Tuesday, November 6, 2007 - 10:40 pm

I dislike strings.  They make it look as if you have a nice extensible
interface, where in reality you have a poorly documented interface which
leads to poor interoperability.


PCI means that you can reuse all of the platform's infrastructure for
irq allocation, discovery, device hotplug, and management.  You can
write it for new guests but backporting it to older guests will be a
huge task.

We will support non-PCI for s390, but in order to support Windows and
older Linux, PCI is necessary.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

From: Rusty Russell
Date: Tuesday, November 6, 2007 - 11:09 pm

Yes, you end up with names exactly like "qumranet::veth" 
and "ibm-pvirt-clock".  I would recommend looking very hard at /proc, Open 
Firmware on a modern system, or the Xen store, to see what a lack of 

The aim is that PCI support is clean, but that we're not really tied to PCI.  
I think we're getting closer with the recent config changes.

Cheers,
Rusty.
-

From: Anthony Liguori
Date: Tuesday, November 6, 2007 - 11:29 pm

Yes, my main desire was to ensure that we had a clean PCI ABI that would 
be natural to implement on a platform like Windows.  I think with the 
recent config_ops refactoring, we can now do that.

Regards,


-

From: Anthony Liguori
Date: Wednesday, November 7, 2007 - 10:33 am

FWIW, I've switched to using the PCI subsystem vendor/device IDs for 
virtio, as Rusty suggested.  I think this makes even more sense than 
using the main vendor/device ID, since I do think that we should use 
only a single vendor/device ID for all virtio PCI devices and then 
differentiate based on the subsystem IDs.
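
A rough userspace sketch of that scheme follows.  The transport claims exactly one PCI vendor/device pair and reads the virtio identity from the subsystem fields.  The numeric values and all names below are placeholders made up for this example, not assigned IDs or the actual driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for this sketch only; these are not assigned IDs. */
#define EXAMPLE_TRANSPORT_VENDOR 0x1234u
#define EXAMPLE_TRANSPORT_DEVICE 0x0001u

/* The four ID fields a PCI function exposes in its configuration space. */
struct pci_ids {
    uint16_t vendor;
    uint16_t device;
    uint16_t subsystem_vendor;
    uint16_t subsystem_device;
};

/* Illustrative virtio-level ID pair. */
struct virtio_ids {
    uint32_t device;
    uint32_t vendor;
};

/*
 * Return 1 and fill *out if the PCI function belongs to the transport,
 * 0 (leaving *out untouched) otherwise.
 */
static int virtio_ids_from_pci(const struct pci_ids *pci, struct virtio_ids *out)
{
    assert(pci != NULL && out != NULL);
    if (pci->vendor != EXAMPLE_TRANSPORT_VENDOR ||
        pci->device != EXAMPLE_TRANSPORT_DEVICE)
        return 0;
    /* The subsystem IDs carry the virtio identity. */
    out->device = pci->subsystem_device;
    out->vendor = pci->subsystem_vendor;
    return 1;
}
```

Because the transport's own ID table then needs a single entry, adding a new virtio device type no longer means touching the PCI driver.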

Regards,


-

From: Gregory Haskins
Date: Wednesday, November 7, 2007 - 1:38 pm

It's not really a full-fledged interface, but rather just a simple id
mechanism.  A decentralized id mechanism with less administrative burden.

On the flip side, a centralized namespace has the advantage of
controlling collisions at the expense of administrative overhead.  After
designing systems both ways in the past, I prefer to reduce the admin

It's tempting to use, yes.  However, most of that infrastructure is
completely inappropriate for a PV implementation, IMHO.  You are
probably better off designing something that is PV specific instead of
shoehorning it in to fit a different model (at least for the things I
have in mind).  It's not a heck of a lot of code to write a pv-centric

I don't know if I would agree with "necessary".  "Easier" perhaps. ;) By
definition once you are PV you are hypervisor aware.  Now it's just a
matter of plugging in the appropriate plumbing to bridge the hypervisor
to the guest-os.  Some might be easier than others, sure.  But all
should be extensible to a degree.

But I digress.  I haven't really had much of a chance to follow the
latest developments here as I have been lost in -rt land for a few
months now.  But I know Anthony and Rusty are top-notch, so I'm sure you
guys have it under control.  Hopefully, one day soon I will be able to
join you guys again (perhaps to the KVM team's dismay ;).

Regards,
-Greg


-

From: Avi Kivity
Date: Wednesday, November 7, 2007 - 11:37 pm

Well, if we design our pv devices to look like hardware, they will fit

It is.  Especially if you consider Windows and a gazillion versions of
deployed, non-pv-capable Linux systems.  For pv-friendly newer Linux,
it's probably doable, but why?


It's "necessary" in a pragmatic sense: we want to deliver drivers that
provide features for a wide variety of guests in a reasonable
timeframe.  And that means no rewriting guest OS infrastructure.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

From: Gerd Hoffmann
Date: Thursday, November 8, 2007 - 2:17 am

Disclaimer: Haven't looked at the virtio code much.

I think we should keep the door open for both models and not nail the
virtio infrastructure to one of them.

For pure pv devices I don't see the point in trying to squeeze it into
the PCI model.  Also s390 has no PCI, so there effectively is no way
around that, we must be able to have some pure virtual bus like xenbus.

IMHO the PCI model is most useful for emulated devices with an optional
pv path.  Say the usual emulated piix3 IDE controller, give it an
additional pv mode, so the guest can drive it in pv mode if it knows
about it, and use the generic piix ide driver if it doesn't.  That kind

Uhm, well, yea.  Guess you are referring to the pv-on-hvm drivers.  Been
there, dealt with it.  What exactly do you think is messy there?

IMHO the most messy thing is the boot problem.  hvm bios can't deal with
pv disks, so you can't boot with pv disks only.  "fixed" by having the
(boot) disk twice in the system, once via emulated ide, once as pv disk.
Ouch.


At least for any udev-based linux distro there is no need to rewrite any
guest os infrastructure.  Your virtual bus driver needs a proper uevent
callback, the virtual device drivers a module alias, and you are done.
udev autoloads your virtual device driver modules nicely, without any
distro tool hacking or config file writing ...
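
As a minimal userspace sketch of the two pieces involved: the bus side exports a modalias string built from the device IDs, and the driver side carries an alias pattern that is compared against it with shell-style globbing.  The "examplebus" prefix, the field layout and both function names are assumptions made up for illustration.  Only snprintf() and POSIX fnmatch() are real library calls here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build the modalias string a bus driver's uevent callback might export.
 * The "examplebus" prefix and layout are assumptions for this sketch.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_modalias(char *buf, size_t len, uint32_t device, uint32_t vendor)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "examplebus:d%08Xv%08X",
                 (unsigned int)device, (unsigned int)vendor);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if a driver's alias pattern (a shell-style glob) matches. */
static int alias_matches(const char *pattern, const char *modalias)
{
    return fnmatch(pattern, modalias, 0) == 0;
}
```

A driver handling device ID 1 from any vendor would carry a pattern like "examplebus:d00000001v*".  When the bus announces a new device, the exported string is matched against such patterns to pick the module to load.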

cheers,
  Gerd

-

From: Anthony Liguori
Date: Thursday, November 8, 2007 - 9:40 am

I don't really agree with this assessment.  There is no performance 
advantage to using a pure virtual bus.  If you have a pure pv device 
that looks and acts like a PCI device, besides the obvious advantage of 
easy portability to other guest OSes (since everything supports PCI, but 
porting XenBus, even to Linux 2.4.x, was a royal pain), it is also very 
easy to support the device on other VMMs.

For instance, the PCI device that I just posted would allow virtio 
devices to be used trivially with HVM on Xen.  In fact, once the 
backends are complete and merged into QEMU, the next time Xen rebases 
QEMU they'll get the virtio PV-on-HVM drivers for free.  To me, that's a 

I have actually addressed this problem with a PV option rom for QEMU.  I 
expect to get time to submit the QEMU patches by the end of the year.  
See http://hg.codemonkey.ws/extboot

Regards,

Anthony Liguori
-

From: Gregory Haskins
Date: Tuesday, November 13, 2007 - 6:18 am

(Sorry for the delay)

Since PCI was designed as a hardware solution it has all kinds of stuff
specifically geared towards hardware constraints.   Those constraints
are different in a virtualized platform, so some things do not translate
well to an optimal solution.  Half of the stuff wouldn't be used, and
the other half has some nasty baggage associated with it (like still
requiring partial emulation in a PV environment).

The point of PV, of course, is high performance guest/host interfaces,
and PCI baggage just gets in the way in many respects.  Once a
particular guest's subsystem is HV aware we no longer strictly require
legacy emulation.  It should know how to talk to the host using whatever
is appropriate.  I aim to strip out all of those emulation points
whenever possible.  Once you do that, the number of PCI features that we
would use drops to zero.

On the flip side, we have full emulation if you want broader

Like what hardware?  Like PCI hardware?  What if the platform in
question doesn't have PCI?  Also note that devices don't have to look
like emulated PCI devices per se to look like hardware to the

After having done it in the past, I disagree.   But it sounds like you
are lumping core-pv and io-pv together.  To be clear, I am not.  I agree
that core-pv is invasive and not legacy friendly.  io-pv is different,
however, and generally can be retrofitted to an OS in the same way that
support for an arbitrary device X over subsystem Y (PCI, usb, pv, etc)



I guess what I am really trying to say in all this is: I would be
careful about painting KVM into a PCI corner.  If you want to "render" a
view of PV devices as PCI for platforms that can utilize it, there is
probably not any harm in that.

However, I believe having things be PCI centric, especially in the long
term, will easily turn into a mistake for the project.  I don't think
it's going to be as useful as you think it is, and then we might find
ourselves in a "backwards compatibility maintenance" situation ...

-

From: Zachary Amsden
Date: Tuesday, November 13, 2007 - 6:59 am

I would tend to disagree with that statement.  The point of PV is a
simpler-to-implement guest/host interface, which sometimes results in a
higher performance interface.  PV does not always mean high performance,

Device discovery, bus enumeration and shared memory configuration space
mapping are all very useful features that require complex negotiation
with the hypervisor.  Those are provided implicitly in a standardized

There is no reason you need to sacrifice performance for broader

What if the platform in question does have PCI?  How are you going to
write drivers for non-Linux guests?  Design a new bus protocol and
driver system for pv-only devices which can't run anywhere except for a
couple selected guests, or design along the lines of what the physical
hardware on your platform actually looks like?  It's not like you can
construct a full-featured virtualization of x86 without implementing PCI
at some level anyway.

Note this is not an argument for PCI.  It is an argument for devices
that look like hardware on whatever platform you are virtualizing.  On
s390, that might be a bit different than on x86.  But the key idea is
re-use the platform architecture as much as possible.  This gets you far
more code sharing and interoperability for devices.

Just because you can paravirtualize everything does not mean it is a
good idea or more efficient.  A good high performance "paravirtualized"
network driver only needs one efficient place to trap to the hypervisor,
that is to kick off a TX queue.  Nothing else needs to be
paravirtualized to make this efficient, and now you have a driver that
is fairly easily ported among different operating systems because it
reuses many architectural primitives.
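
A toy userspace sketch of that idea: queuing packets touches only ordinary memory, and a single kick function stands in for the one trap to the hypervisor.  The ring layout, the names and the counter used in place of a real trap are all assumptions for illustration.  For simplicity the sketch treats everything that has been kicked as already consumed by the host.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u  /* arbitrary power of two for this sketch */

struct tx_ring {
    const void *bufs[RING_SIZE];
    unsigned int head;    /* next slot the guest fills */
    unsigned int kicked;  /* value of head at the last kick */
    unsigned int kicks;   /* number of simulated hypervisor traps */
};

/* Queue a packet using plain memory writes only; 0 on success, -1 if full. */
static int tx_add(struct tx_ring *r, const void *pkt)
{
    assert(r != NULL);
    if (r->head - r->kicked >= RING_SIZE)
        return -1;
    r->bufs[r->head % RING_SIZE] = pkt;
    r->head++;
    return 0;
}

/*
 * The one place that would trap: tell the host about everything queued
 * since the last kick.  Returns the number of packets handed over.
 */
static unsigned int tx_kick(struct tx_ring *r)
{
    unsigned int n;

    assert(r != NULL);
    n = r->head - r->kicked;
    if (n == 0)
        return 0;      /* nothing new, so no trap is needed */
    r->kicked = r->head;
    r->kicks++;        /* stands in for the real trap */
    return n;
}
```

Queuing three packets and kicking once hands all three to the host at the cost of a single simulated trap.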

So I think it is a good thing for virtio to allow coupling to a PCI
device backend for x86, but be generic enough to allow coupling to other
backends for non-PCI architectures.  Perhaps the top-level device code
and TX/RX queues can be re-used, although I'm not convinced sharing ...