Re: [ANNOUNCE] New driver vxge for Neterion's X3100 series 10 GbE PCIe adapter

Previous thread: [net-next PATCH 1/8] igb: switch to new dca API by Jeff Kirsher on Friday, March 13, 2009 - 11:40 pm. (9 messages)

Next thread: [net-2.6 PATCH 5/10] Neterion: New driver: register set - vxge-reg.h by Ramkrishna Vepa on Saturday, March 14, 2009 - 1:21 am. (2 messages)
From: Ramkrishna Vepa
Date: Saturday, March 14, 2009 - 1:20 am

This is a release of a new network driver, "vxge", for our latest PCIe-based
hardware, the X3100 10GbE Server/Storage Adapter. The X3100 ASIC supports
four modes of operation, configurable via firmware -
	Single function mode
	Multi function mode
	SRIOV mode
	MRIOV mode
	
The driver patch series will follow this email. This driver has undergone
significant testing for the past six months in all four modes of operation,
and is very stable. We would appreciate community review and comments
on this driver.
	
The modes, besides single function mode, are oriented towards Server
I/O virtualization and/or I/O sharing (see PCI SIG SR IOV and MR IOV
specs for reference), although they can be used on a single
non-virtualized server as well, for instance to run workloads that would
typically benefit from using separate network cards. In these scenarios,
X3100 can replace a large number of GbE NICs without any system or network
changes (outside of L2 driver/hardware), while each physical NIC will be
able to run up to 10 GbE instead of 1 GbE.

Major features include -
	Virtual ethernet bridge 
	Multiqueue enabled
	Service level guarantees per queue
	Dual port with integrated IEEE 802.3ad link aggregation
	TCP/UDP/IP stateless offloads -
		TCP/UDP/IPv4/IPv6 checksum offload, TSO.
	Large receive offload
	Receive traffic hashing
	MSI-X interrupts
	Multiple tx and rx queues with a number of steering options

A note on the different modes of operation -

Single-function mode: From the Linux stack perspective, the adapter is a
typical multi-queue 10GbE pci-e netdev interface (driven by the
submitted vxge driver).
 
Multi-function mode: From the Linux stack perspective, the adapter is a
multi-function pci-e device where each function is a multi-queue pci-e
netdev interface. This mode has some applications in native Linux
environments, but it is primarily designed for use in hypervisors that
do not yet support SR IOV pci-e extensions. In fact, the functionality
in this mode is virtually ...
From: David Miller
Date: Saturday, March 14, 2009 - 1:57 pm

Please resubmit this with the feedback you've received addressed.

Also, no need to CC: jgarzik, he doesn't handle network driver
patch submissions any more.
--

From: Ramkrishna Vepa
Date: Saturday, March 14, 2009 - 6:06 pm

Thanks for the feedback. We'll make the changes, retest and then
resubmit.

--

From: Bill Fink
Date: Sunday, March 15, 2009 - 4:29 pm

Hi Ram,


BTW I got the following mail delivery error on my emails to you:

From: postmaster@pc.s2io.com
Date: Sat, 14 Mar 2009 15:23:49 -0400
Subject: Delivery Status Notification (Failure)

Unable to deliver message to the following recipients, because the message was
forwarded more than the maximum allowed times. This could indicate a mail loop.

       ram.vepa@neterion.com

						-Bill
--

From: Ramkrishna Vepa
Date: Sunday, March 15, 2009 - 4:32 pm

Hmmm... Not sure why you got this error message. Will check with our IT
guys.

Thanks,
--

From: Yu Zhao
Date: Monday, March 30, 2009 - 11:13 pm

Xen upstream already supports the SR-IOV, and the native Linux and KVM
will be supporting it too when 2.6.30 comes out.

Intel 82576 driver has been patched to enable the SR-IOV capability:
  http://patchwork.kernel.org/patch/8063/
  http://patchwork.kernel.org/patch/8064/
  http://patchwork.kernel.org/patch/8065/
  http://patchwork.kernel.org/patch/8066/
Though one of these patches uses the sysfs interface to receive the NumVFs
from user space, which is deprecated, the rest of them still clearly demonstrate
how to convert the traditional PCI NIC driver to the `Physical Function'
--

From: Leonid Grossman
Date: Tuesday, March 31, 2009 - 7:38 am

Agreed - once SR-IOV support ships in Linux and Xen, using X3100
Multi-function mode becomes optional and the device can/will be used in
SR IOV mode. In other hypervisors, the transition to SR IOV will take longer
and Multi-function mode will be used for a while.

Enabling SR IOV mode should be transparent to vxge driver - the driver
has no SR IOV specific code, and we plan to use the same netdev driver
in both Linux and DomU Linux guest. Also (an optional) Xen Dom0
privileged vxge driver stays the same in Multi-function mode and SR IOV
mode.

We will look at 82576 patches to understand the changes better, but (at
least conceptually :-)) SR-IOV should not require "traditional PCI NIC
driver" to change. Some new "knobs" for VF bandwidth allocation, etc.
could be optionally added but these are applicable to multi-port or
multi-function devices and not SR IOV specific.
The main job of SR IOV support is arguably to translate (reduced) VF PCI
config space to full "traditional" PCI space, so networking (or storage
or any other subsystem) doesn't know the difference. 
What networking resources are implemented behind SR IOV VF is a
different question; in x3100 a VF has the same set of NIC resources as a
legacy pci function, so a netdev driver can stay the same.

Please let us know if this addresses the comment - alternatively, we can
start a different thread since current vxge driver submission doesn't
claim SR IOV support. Once SR IOV is supported in the kernel, we will
--

From: Alexander Duyck
Date: Tuesday, March 31, 2009 - 10:50 am

For the most part I think the bit you would be interested in is the 
"sysfs" patch, http://patchwork.kernel.org/patch/8066/, which is what I 
had used in the original implementation to enable the VFs.  I am going 
to push this to a module parameter similar to your max_config_dev.  The 
rest of the patches handle PF to VF communications and configuration 
which it sounds like is handled via firmware for your adapter.

Most of the changes you would probably need to make would be in 
vxge_probe/vxge_remove.  All you would end up needing to do is call 
pci_enable_sriov(pdev, max_config_dev - 1) on your physical function 
devices and then you would end up getting exactly as many VFs as you 
need.  The call should be safe since I am assuming your VFs don't 
implement their own SR-IOV capability structures.  The cleanup would be 
pretty straightforward as well since you would just need to call 
pci_disable_sriov in remove.

Thanks,

Alex
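
A minimal sketch of the probe/remove change described above, written as
user-space C so that it compiles outside the kernel. The struct pci_dev
fields, the stub bodies, the value of max_config_dev and the
sketch_probe/sketch_remove names are assumptions made for illustration.
Only the pci_enable_sriov/pci_disable_sriov signatures and the
max_config_dev parameter name are taken from the kernel API and this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pci_dev; both fields are hypothetical. */
struct pci_dev {
	bool has_sriov_cap;   /* true only if the function has an SR-IOV capability */
	int  num_vfs_enabled; /* state the real PCI core would track internally */
};

/* Stub with the same signature as the kernel call. */
static int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
{
	if (!dev->has_sriov_cap)
		return -1; /* the real call returns a negative errno here */
	dev->num_vfs_enabled = nr_virtfn;
	return 0;
}

/* Stub with the same signature as the kernel call. */
static void pci_disable_sriov(struct pci_dev *dev)
{
	dev->num_vfs_enabled = 0;
}

/* Module parameter named in the thread; the value 4 is arbitrary. */
static int max_config_dev = 4;

/* Hypothetical tail of a probe routine. */
static int sketch_probe(struct pci_dev *pdev)
{
	int num_vfs = max_config_dev - 1;
	int err = 0;

	assert(pdev != NULL);
	/* ... normal device setup would happen here ... */
	if (num_vfs > 0)
		err = pci_enable_sriov(pdev, num_vfs);
	/* A failure is expected when probing a VF, so it is not treated as fatal. */
	(void)err;
	return 0;
}

/* Hypothetical head of a remove routine. */
static void sketch_remove(struct pci_dev *pdev)
{
	assert(pdev != NULL);
	pci_disable_sriov(pdev);
	/* ... normal device teardown would happen here ... */
}
```

Because the same probe routine runs for both PF and VF, the sketch ignores
the error from the enable call rather than failing the probe; whether a real
driver should log or propagate it is a design choice left open here.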





--

From: Ramkrishna Vepa
Date: Tuesday, March 31, 2009 - 7:38 pm

[Ram]Currently, the messaging interface is not part of this driver
submission and each function driver is independent of the other - there
[Ram] When the device indicates that it is SRIOV capable in the pci
config space, why not create the VF pci config spaces as part of the
enumeration process? This way, there would be no change required for the
network drivers. 

Ram
--

From: Yu Zhao
Date: Tuesday, March 31, 2009 - 7:53 pm

Yes, and that's what the PCI subsystem does. If the vxge VF is identical
to its PF, then vxge should be able to drive both PF and VF without any
modification.

Thanks,
Yu
--

From: Ramkrishna Vepa
Date: Tuesday, March 31, 2009 - 8:36 pm

> Yes, and that's what the PCI subsystem does. If the vxge VF is
[Ram] Ok. In that case, is the call to pci_enable/disable_sriov still
required for vxge?

Ram
--

From: Yu Zhao
Date: Tuesday, March 31, 2009 - 10:09 pm

Yes, the vxge driver first binds the PF once it's loaded (VF doesn't
exist at this time) and calls the SR-IOV API. The VF appears after the
SR-IOV is enabled and then the same copy of the vxge driver can bind
the VF too if you want to use the VF in the native Linux. Though the
hardware is in the SR-IOV mode in this case, it would be equal to the
multi-function mode. Or you can assign the VF to the Xen/KVM guest and
let another copy of vxge driver (maybe vxge for Windows, Solaris, BSD,
etc.) running in the guest bind it.

Thanks,
Yu
--

From: Leonid Grossman
Date: Tuesday, March 31, 2009 - 10:44 pm

Yu, could you pl. explain why this call is not optional - SR-IOV pci-e
code should be able to find SR-IOV capable device and enable all VFs
based upon pci-e config space alone, without any help from
device-specific PF driver.
Once VFs appear, vxge or any other native netdev driver should be able
to bind a VF regardless of PF driver being loaded first (or at all) -
there are some use cases that do not assume PF driver presence...
--

From: Alexander Duyck
Date: Tuesday, March 31, 2009 - 11:14 pm

On Tue, Mar 31, 2009 at 10:44 PM, Leonid Grossman

The fact is not all drivers will want to enable all of the VFs just
because they can.  I know in the case of igb I actually prefer to have
the default as not enabling all the VFs because as soon as we do the
PF itself is no longer multiqueue capable due to the fact that the
resources are distributed between the PF and the VFs.

The actual calls themselves are fairly small in terms of enabling
things.  The pci_enable_sriov call will check beforehand if the PCI
Config space for SR-IOV exists so it is safe to call in either the VFs
or on the PF.  Also from our testing internally we prefer to limit the
number of VFs allocated to just what is needed since we have seen it
can be difficult to sort out PFs and VFs because they can be allocated
in such a way that you have to go through with ethtool -i to identify
the pci bus/device/function in order to even figure out which port any
given interface belongs to.
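
As a rough illustration of the sorting problem described above, the sketch
below parses a PCI address string of the form printed in the bus-info field
of ethtool -i (for example 0000:01:10.2) and orders addresses by
domain, bus, device and function. The struct pci_addr type and the
parse_bus_info/pci_addr_cmp names are hypothetical helpers, not part of
ethtool or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for a parsed PCI address. */
struct pci_addr {
	unsigned int domain, bus, dev, fn;
};

/*
 * Parse "dddd:bb:dd.f" (all fields hexadecimal).
 * Returns 0 on success, -1 if the string is malformed or has trailing text.
 * On failure the contents of *out are unspecified.
 */
static int parse_bus_info(const char *s, struct pci_addr *out)
{
	char extra;

	assert(s != NULL && out != NULL);
	if (sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
		   &out->dev, &out->fn, &extra) != 4)
		return -1;
	return 0;
}

/* Order by domain, then bus, then device, then function. */
static int pci_addr_cmp(const struct pci_addr *a, const struct pci_addr *b)
{
	if (a->domain != b->domain)
		return a->domain < b->domain ? -1 : 1;
	if (a->bus != b->bus)
		return a->bus < b->bus ? -1 : 1;
	if (a->dev != b->dev)
		return a->dev < b->dev ? -1 : 1;
	if (a->fn != b->fn)
		return a->fn < b->fn ? -1 : 1;
	return 0;
}
```

Sorting interfaces with such a comparison only groups them by PCI address;
mapping a VF back to its port still depends on how the device lays out its
functions.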
--

From: Yu Zhao
Date: Tuesday, March 31, 2009 - 11:55 pm

Yes, this is true in certain cases. However, there are several things
that prevent us from enabling the SR-IOV implicitly in the PCI subsystem.

First, the SR-IOV spec says "Once the SRIOV capability is configured
enabling VF to be assigned to individual SI, the PF takes on a more
supervisory role. For example, the PF can be used to manage device
specific functionality such as internal resource allocation to each
VF, VF arbitration to shared resources such as the PCIe Link or the
Function-specific Link (e.g., a network or storage Link), etc." And
some SR-IOV devices follow this suggestion, so their VFs cannot work
without a pre-existing PF driver. Intel 82576 is an example -- it requires
the PF driver to allocate tx/rx queues for the VF so the VF can be
functional. Only enabling the SR-IOV in the PF PCI config space will
end up with the VF appearing useless even though its PCI config space is
visible.

Second, the SR-IOV depends on some things that are not available before
the PCI subsystem is fully initialized. This means we cannot enable the
SR-IOV capability before all PCI devices are enumerated. For example,
if the VF resides on a bus different from the PF bus, then we can't
enable the VF before the bus used by the VF is scanned because we don't
know if the bus is reserved by the BIOS for the VF or not. Another
example is the dependency link used by the PF -- we can't create the
sysfs symbolic link indicating the dependency before all PFs in a
device are enumerated.

And some SR-IOV devices can support multiple modes at the same time.
The 82576 can support N VFs + M VMDq modes (N + M = 8), which means
sometimes people may want to enable only an arbitrary number of VFs.
The PCI subsystem can't get value to config the NumVFs unless some

So the PF will not be bound to any driver in these use cases? Can you
please elaborate?

Thanks,
Yu
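
The N VFs + M VMDq split mentioned above (N + M = 8) can be written down as
a small check. This is only a sketch of the arithmetic stated in the
message; the TOTAL_SPLIT constant name and the vmdq_count_for helper are
hypothetical and do not come from igb or the PCI core.

```c
#include <assert.h>

/* N + M = 8, as stated in the message above. */
#define TOTAL_SPLIT 8

/*
 * Given a requested number of VFs (N), return the number left for
 * VMDq (M), or -1 if the request does not fit the N + M = 8 split.
 */
static int vmdq_count_for(int num_vfs)
{
	int m;

	if (num_vfs < 0 || num_vfs > TOTAL_SPLIT)
		return -1;
	m = TOTAL_SPLIT - num_vfs;
	assert(num_vfs + m == TOTAL_SPLIT);
	return m;
}
```

The value of N has to come from somewhere outside the PCI config space,
which is the point being made about a driver or user having to supply NumVFs.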
--

From: Leonid Grossman
Date: Wednesday, April 1, 2009 - 7:30 am

Correct, PF driver can "optionally" manage device specific resources
like queue pairs, etc. - this is not the reason though to mandate a PF
driver presence in order for VFs to operate. 
Arguably PCI code (perhaps with the help of SR PCIM) should be
responsible for pci-e resource configuration, and networking driver
should be responsible for network resources configuration. 
If a device like 82576 is implemented in a way that VFs can't operate
without a PF driver present, it's a reasonable design trade-off and this
case should be supported - but other devices like x3100 do not have this
restriction, so PF driver presence should not be a "must have" (pl. see

Have you considered using SR PCIM for this, instead of using a NIC (or

Yes. For example one of the common use cases for SR IOV is to replace a
large number of "legacy" GbE interfaces (as transparently as possible). 
Assume a customer wants to replace four quad GbE NICs (perhaps running
16 Vlans) with one SR IOV 10GbE card. He considers a PF driver to be a
potential security hole and a configuration overhead, and prevents it
from loading (perhaps via fw option). His expectation is that 16 VFs
will come up and bind to a driver, very much like 16 GbE interfaces did
in the original configuration.

So, there should arguably be a way to enable VFs based upon the
information in device configuration space alone, without requiring a PF
driver to be loaded.
--
