Create a how-to for SR-IOV users and device driver developers.
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Alex Chiang <achiang@hp.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
---
Documentation/DocBook/kernel-api.tmpl | 2 +
Documentation/PCI/pci-iov-howto.txt | 227 +++++++++++++++++++++++++++++++++
2 files changed, 229 insertions(+), 0 deletions(-)
create mode 100644 Documentation/PCI/pci-iov-howto.txt
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index b7b1482..c6ceb39 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -239,6 +239,7 @@ X!Ekernel/module.c
</sect1>
<sect1><title>PCI Support Library</title>
+!Iinclude/linux/pci.h
!Edrivers/pci/pci.c
!Edrivers/pci/pci-driver.c
!Edrivers/pci/remove.c
@@ -251,6 +252,7 @@ X!Edrivers/pci/hotplug.c
-->
!Edrivers/pci/probe.c
!Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
</sect1>
<sect1><title>PCI Hotplug Support Library</title>
!Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 0000000..ff1969e
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,227 @@
+ PCI Express Single Root I/O Virtualization HOWTO
+ Copyright (C) 2008 Intel Corporation
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as Physical Function while
+the virtual devices are referred to as Virtual Functions. Allocation
+of Virtual Functions can be dynamically controlled by Physical Function
via ...

Why do you need to do this? Thus far, all the documentation has been

I don't think this section actually helps a software developer use

Wouldn't it be more useful to have the iov/N directories be a symlink to

We already have tools to set the MAC and VLAN parameters for network

I think a better interface would put the 'notify' into the struct
pci_driver. That would make 'notify' a bad name .... how about
'virtual'? There's also no documentation for the second parameter to

I'm not 100% convinced about this API. The assumption here is that the
driver will do it, but I think it should probably be in the core. The
driver probably wants to be notified that the PCI core is going to
create a virtual function, and would it please prepare to do so, but
I'm not convinced this should be triggered by the driver. How would the

I think we'd be better off having the driver create its own sysfs

From my reading of the SR-IOV spec, this isn't how it's supposed to
work. The device is supposed to be a fully functional PCI device that
on demand can start peeling off virtual functions; it's not supposed to
boot up and initialise all its virtual functions at once.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

The main concern here is that a VF may be disabled, such as when the PF
enters the D3 state or undergoes a reset, and thus be unplugged, but
the user won't

Do you mean Ethtool? If yes, it is impossible for SR-IOV since the

Our concern is that the PF driver may set a default state when it is
loaded so that SR-IOV can work without any user-level configuration,
but of course the driver won't dynamically change it.

The spec says we either enable all VFs or disable them; per-VF enabling
is not supported. Is this your concern?

Thanks, eddie

If we're relying on the user to reconfigure virtual functions on return

I don't think ethtool has that ability; ip(8) can set mac addresses and
vconfig(8) sets vlan parameters. The device driver already has to be
aware of SR-IOV. If it's going to support the standard tools (and it
damn well ought to), then it should

Let me try to explain this a bit better.

The user decides they want a new ethernet virtual function. In the
scheme as you have set up:

1. User communicates to ethernet driver "I want a new VF"
2. Ethernet driver tells PCI core "create new VF".

I propose:

1. User tells PCI core "I want a new VF on PCI device 0000:01:03.0"
2. PCI core tells driver "User wants a new VF"

My scheme gives us a unified way of creating new VFs, yours requires
each driver to invent a way for the user to tell them to create a new
VF.

I don't think that's true. The spec requires you to enable all the VFs
from 0 to NumVFs, but NumVFs can be lower than TotalVFs. At least,
that's how I read it.

But no, that isn't my concern. My concern is that you've written a
driver here that seems to be a stub driver. That doesn't seem to be how
SR-IOV is supposed to work; it's supposed to be a fully-functional
driver that has SR-IOV knowledge added to it.

-- 
Matthew Wilcox
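
For illustration only, the core-driven flow proposed above (user asks
the PCI core, the core asks the driver) could be modelled in plain
userspace C roughly as follows. This is a sketch under stated
assumptions, not kernel code: every name here (mock_pci_dev,
mock_pci_driver, virtfn_notify, mock_core_request_vfs) is hypothetical
and does not come from the patch or from any existing kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mock_pci_dev;

struct mock_pci_driver {
	/* Called by the core before it changes the number of VFs. */
	int (*virtfn_notify)(struct mock_pci_dev *dev, int nr_virtfn);
};

struct mock_pci_dev {
	const struct mock_pci_driver *driver;
	int total_vfs;	/* upper bound reported by the device */
	int nr_virtfn;	/* VFs currently enabled */
};

/*
 * Single entry point shared by every device: the user asks the core,
 * and the core asks the bound driver to prepare. Returns 0 on success
 * and -1 on failure; nr_virtfn is left unchanged on failure.
 */
int mock_core_request_vfs(struct mock_pci_dev *dev, int nr_virtfn)
{
	assert(dev != NULL);
	if (nr_virtfn < 0 || nr_virtfn > dev->total_vfs)
		return -1;	/* more VFs than the device offers */
	if (dev->driver == NULL || dev->driver->virtfn_notify == NULL)
		return -1;	/* bound driver has no SR-IOV support */
	if (dev->driver->virtfn_notify(dev, nr_virtfn) != 0)
		return -1;	/* driver refused the request */
	dev->nr_virtfn = nr_virtfn;
	return 0;
}

/* Example driver callback: accepts any request for at most 4 VFs. */
int example_virtfn_notify(struct mock_pci_dev *dev, int nr_virtfn)
{
	(void)dev;
	return nr_virtfn <= 4 ? 0 : -1;
}
```

The point of the design is that the request path lives in one place,
so a driver only supplies the callback and never has to invent its own
user interface for creating VFs.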

No. That is the concern: we don't put that configuration under the VF
nodes because it will disappear.

OK, as long as it has the VF parameter; we will look into the details.
BTW, the SR-IOV patch is not only for network; some other devices such
as IDE will use the same code base as well, and we imagine it could
have other

If the user needs a new VF, the VF must already be enabled or exist in
the OS. Otherwise, we need to disable all VFs first and then change
NumVFs to re-enable the VFs.

Yes, but setting NumVFs can only occur when VFs are disabled. The
following is from the spec:

  NumVFs may only be written while VF Enable is Clear. If NumVFs is
  written when VF Enable is Set, the results are undefined.

Yes, it is a full-featured driver as long as the PF has resources in
it, for example when not all queues are assigned to VFs.

Thx, eddie
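
The NumVFs rule quoted above implies a disable / rewrite / re-enable
sequence whenever the VF count changes. A minimal userspace sketch of
that sequence, assuming a hypothetical in-memory model of the three
capability fields involved (mock_sriov_cap and both functions are
made-up names, and no real config-space access is performed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant SR-IOV capability fields. */
struct mock_sriov_cap {
	unsigned int total_vfs;	/* TotalVFs: read-only maximum */
	unsigned int num_vfs;	/* NumVFs: VFs exposed by VF Enable */
	bool vf_enable;		/* VF Enable bit */
};

/*
 * Write NumVFs. The write is rejected while VF Enable is set, since
 * the spec text quoted above calls the result undefined in that case.
 */
int mock_write_num_vfs(struct mock_sriov_cap *cap, unsigned int n)
{
	assert(cap != NULL);
	if (cap->vf_enable)
		return -1;
	if (n > cap->total_vfs)
		return -1;
	cap->num_vfs = n;
	return 0;
}

/* Change the VF count: disable all VFs, rewrite NumVFs, re-enable. */
int mock_resize_vfs(struct mock_sriov_cap *cap, unsigned int n)
{
	assert(cap != NULL);
	if (n > cap->total_vfs)
		return -1;	/* reject before tearing anything down */
	cap->vf_enable = false;	/* every existing VF goes away here */
	if (mock_write_num_vfs(cap, n) != 0)
		return -1;
	cap->vf_enable = (n > 0);
	return 0;
}
```

Note that growing from 4 to 5 VFs in this model still destroys the
original 4 for a moment, which is the cost being discussed in this
thread.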

Neither ip(8) nor vconfig(8) can set MAC and VLAN address for VF when

As Eddie said, we have two problems here:

1) User has to set device specific parameters of a VF when he wants to
   use this VF with KVM (assign this device to KVM guest). In this
   case, VF driver is not loaded in the host environment. So operations
   which are implemented as driver callback (e.g. set_mac_address())
   are not supported.

2) For security reason, some SR-IOV devices prohibit the VF driver
   configuring the VF via its own register space. Instead, the
   configurations must be done through the PF which the VF is
   associated with. This means PF driver has to receive parameters that
   are used to configure its VFs. These parameters obviously can be
   passed by traditional tools, if without modification for SR-IOV.
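
To make problem 2) concrete, here is a rough userspace sketch of a PF
that owns the per-VF parameters, so that they can be set while no VF
driver is loaded in the host. All names (mock_pf, mock_vf_config,
mock_pf_set_vf_mac, mock_pf_set_vf_vlan) and the table size are
hypothetical and only illustrate the idea; a real device would also
have to push these values into hardware.

```c
#include <assert.h>
#include <string.h>

#define MOCK_MAX_VFS	8	/* arbitrary table size for this sketch */
#define MOCK_ETH_ALEN	6	/* bytes in an Ethernet MAC address */

/* Per-VF settings kept by the PF, not by the VF itself. */
struct mock_vf_config {
	unsigned char mac[MOCK_ETH_ALEN];
	unsigned short vlan;	/* 0 means no VLAN assigned */
};

struct mock_pf {
	unsigned int num_vfs;	/* VFs currently enabled */
	struct mock_vf_config vf[MOCK_MAX_VFS];
};

/* Set the MAC address of one VF through its PF. */
int mock_pf_set_vf_mac(struct mock_pf *pf, unsigned int vf_id,
		       const unsigned char mac[MOCK_ETH_ALEN])
{
	if (vf_id >= pf->num_vfs || vf_id >= MOCK_MAX_VFS)
		return -1;	/* no such VF */
	if (mac[0] & 0x01)
		return -1;	/* group address cannot be a station MAC */
	memcpy(pf->vf[vf_id].mac, mac, MOCK_ETH_ALEN);
	return 0;
}

/* Set the VLAN ID of one VF through its PF (valid IDs are 1..4094). */
int mock_pf_set_vf_vlan(struct mock_pf *pf, unsigned int vf_id,
			unsigned short vlan)
{
	if (vf_id >= pf->num_vfs || vf_id >= MOCK_MAX_VFS)
		return -1;	/* no such VF */
	if (vlan == 0 || vlan > 4094)
		return -1;	/* reserved or out-of-range VLAN ID */
	pf->vf[vf_id].vlan = vlan;
	return 0;
}
```

Because the table lives in the PF, the settings can be applied before
the VF is assigned to a guest, which is the first problem listed above.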

I suspect what you want to do is create, then configure the device in

I think that idea also covers this point.

-- 
Matthew Wilcox

That is not true. Remember that the created VFs will be destroyed
whether by a PF power event or by a reset performed for error recovery.
So what we want is: config, create, assign, and then deassign and
destroy and then

Sorry, can you explain a little bit more? The SR-IOV patch won't define
what kind of entries should or should not be created; we leave it to
the network subsystem to decide what to do. Same for the disk
subsystem, etc.

Thx, eddie

No entries should be created. This needs to be not SR-IOV specific.

-- 
Matthew Wilcox

I think we need to cover both scenarios here, virtualization and
non-virtualization.

In the absence of virtualization, the VF and PF drivers should be
identical. In this context, how does the PF driver allocate a VF? Is
dynamic allocation of VFs possible, or does it have to allocate all the
VFs that the device supports when the PF driver loads? Also, will the
probe function be called for the VFs, or does the PF driver handle only
the probe for the physical function?

In the virtualization context things get a bit more complex, as the VF
driver in the guest would like to treat the VF as a physical function,
but that may not be possible from the device perspective, as the
control registers may well be shared between VF and PF.

I would think that VF allocation is the job of the SR PCIM. The PCIM
may well ask the PF driver to configure a VF upon user request.

Thanks much,

Yes, putting the callback function into 'pci_driver' is better.

'virtual' does not look very descriptive, though (and it's an
adjective, while the other callbacks are verbs). Any other candidates?

Thanks,
Yu
