[net-next PATCH v3] igbvf: add new driver to support 82576 virtual functions

Previous thread: [patch 0/1] ctnetlink: allocation improvements by Holger Eitzenberger on Wednesday, March 25, 2009 - 2:25 pm. (1 message)

Next thread: Re: 2.6.29 forcedeth hang W/O NAPI enabled by Adam Richter on Wednesday, March 25, 2009 - 4:24 pm. (8 messages)
From: Jeff Kirsher
Date: Wednesday, March 25, 2009 - 2:52 pm

From: Alexander Duyck <alexander.h.duyck@intel.com>

This adds an igbvf driver to handle virtual functions provided by the
igb driver when SR-IOV has been enabled.  A virtual function is a
lightweight pci-e function that supports a single queue and shares
resources with the 82576 physical function contained within the igb
driver.

To spawn virtual functions from the igb driver all that is needed is to
issue a echo X > /sys/class/ethY/num_vfs.  X can be a value between 0 and
7, 0 will disable SR-IOV functionality for the port and is the default.  If
the num_vfs sysfs entry is not present then the device does not have SR-IOV
capability enabled either in software or hardware.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/Kconfig         |   21 
 drivers/net/Makefile        |    1 
 drivers/net/igbvf/Makefile  |   38 +
 drivers/net/igbvf/defines.h |  125 ++
 drivers/net/igbvf/ethtool.c |  540 ++++++++
 drivers/net/igbvf/igbvf.h   |  335 +++++
 drivers/net/igbvf/mbx.c     |  350 +++++
 drivers/net/igbvf/mbx.h     |   75 +
 drivers/net/igbvf/netdev.c  | 2917 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/igbvf/regs.h    |  108 ++
 drivers/net/igbvf/vf.c      |  398 ++++++
 drivers/net/igbvf/vf.h      |  265 ++++
 12 files changed, 5173 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/igbvf/Makefile
 create mode 100644 drivers/net/igbvf/defines.h
 create mode 100644 drivers/net/igbvf/ethtool.c
 create mode 100644 drivers/net/igbvf/igbvf.h
 create mode 100644 drivers/net/igbvf/mbx.c
 create mode 100644 drivers/net/igbvf/mbx.h
 create mode 100644 drivers/net/igbvf/netdev.c
 create mode 100644 drivers/net/igbvf/regs.h
 create mode 100644 drivers/net/igbvf/vf.c
 create mode 100644 drivers/net/igbvf/vf.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index e5ffc1c..1f3d29d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2048,6 +2048,27 @@ ...
From: Jeff Kirsher
Date: Wednesday, March 25, 2009 - 3:00 pm

On Wed, Mar 25, 2009 at 2:52 PM, Jeff Kirsher


into drivers/net/igbvf/ethtool.c.  We verified that this fixes the
compilation issues on PPC.

-- 
Cheers,
Jeff
--

From: Stephen Hemminger
Date: Wednesday, March 25, 2009 - 3:03 pm

On Wed, 25 Mar 2009 14:52:48 -0700

Please don't use sysfs for this.
Instead build a proper rtnl_link netlink interface (see macvlan).

Intel doesn't invent unique hardware, other vendors will do the same
thing (or follow your lead), so doing it right the first time for these
kind of devices makes sense.
--

From: Alexander Duyck
Date: Wednesday, March 25, 2009 - 3:33 pm

The sysfs part of this is already in net-next as it isn't part of the 
igbvf driver it is part of igb.  It was added in over a month ago:
http://patchwork.ozlabs.org/patch/23471/

I will see what we can do to make use of a netlink interface.  I'm 
honestly not even sure if it will work well for this kind of setup since 
the VF driver won't even be running on the same OS in most cases.  For 
now the sysfs code in igb is actually quite minimal, and the main goal 
was to keep the interface as simple as possible which is what I have done.

We are already planning to make a number of changes to support Host 
configuration of the guest interfaces for 2.6.31, and a number of those 
included sysfs entries that I had rejected for this release.  What I can 
probably do is see about trying to make all of those changes be pushed 
over to a netlink interface since it would make much more sense when we 
have multiple items to configure other than just setting the number of VFs.

Thanks,

Alex
--

From: David Miller
Date: Wednesday, March 25, 2009 - 4:58 pm

From: Alexander Duyck <alexander.h.duyck@intel.com>

I have to agree with Stephen that the way to create new links
of virtual and similar devices is to use rtnl_link.

We have very good infrastructure for this, and if I understand
things correctly iproute2 might even work with an igb using
rtnl_link with zero modifications to the tool sources.

Please fix this up, it's a very reasonable request for such
a large new piece of driver infrastructure.  And since this is
a new driver, you have no time constraints for submission either.

--

From: Alexander Duyck
Date: Wednesday, March 25, 2009 - 5:54 pm

Thanks for allowing me more time, but the issue isn't the igbvf driver. 
  The part that Stephen is referencing will be an issue in the igb 
driver.  That is why I pointed him to 
http://patchwork.ozlabs.org/patch/23471/ since it is the part that 
contains the code that he has issues with.

Since the issue isn't the igbvf driver there is no reason for it to be 
held up.  If you want to revert the offending patch feel free to and 
what I will do is send a patch out that just statically set the number 
of VFs to 7 if the kernel and device supports enabling SR-IOV.  I didn't 
want to take that approach since it means that it disables multi-queue 
for igb on those devices but I feel like I have been backed into a 
corner on this and don't have many other options.

Thanks,

Alex











--

From: Yu Zhao
Date: Wednesday, March 25, 2009 - 7:33 pm

We do have some powerful tools for network device configuration, but they
still can't cover the multi queue and SR-IOV NICs (also David Woodhouse's
Solos DSL, etc.) configuration requirement. Unless someone improves the
existing tools to support configuring these new hardware features, people
would keep trying to use the sysfs which is the easiest way so far.

Thanks,
Yu
--

From: David Miller
Date: Wednesday, March 25, 2009 - 8:16 pm

From: Yu Zhao <yu.zhao@intel.com>

We have a generic link creation mechanism in rtnl_link so I think
this claim wrt. SR-IOV is false.

Believe it or not, somebody took the time to make a generic mechanism
for things many virtual network device drivers have to do, and
igb/igbvf can use it too.
--

From: Patrick McHardy
Date: Thursday, March 26, 2009 - 6:05 am

So what are these missing features in the existing interfaces?
--

From: David Miller
Date: Wednesday, March 25, 2009 - 8:12 pm

From: Alexander Duyck <alexander.h.duyck@intel.com>

I disagree, I think both cases should be fixed.

Just because we do something already never means that it's
ok to proliferate the mistake further.
--

From: Roland Dreier
Date: Wednesday, March 25, 2009 - 8:21 pm

> > Since the issue isn't the igbvf driver there is no reason for it to
 > > be held up.
 > 
 > I disagree, I think both cases should be fixed.
 > 
 > Just because we do something already never means that it's
 > ok to proliferate the mistake further.

The igbvf driver doesn't proliferate any mistake.  The num_vfs sysfs
attribute that everyone finds so objectionable is in the igb driver, and
if you follow the patchwork URL that was provided, you would see that
you applied the patch adding it back in February.  The only thing the
igbvf driver does relating to this is bind to the PCI ID of the virtual
functions created by the igb driver.

 - R.
--

From: David Miller
Date: Wednesday, March 25, 2009 - 8:33 pm

From: Roland Dreier <rdreier@cisco.com>

I am saying I made a mistake when merging that.

Can we continue forward now?

Thanks.
--

From: Alexander Duyck
Date: Wednesday, March 25, 2009 - 8:27 pm

That isn't what I mean.  The code he is referring to exists nowhere in
the igbvf driver.  I suppose I can edit the igbvf commit comments so
that they don't mention the sysfs entry, but the code is in the igb
driver.

Also I just want to clarify.  It isn't fair to compare igbvf to
macvlan.  A better comparison would be virtio since it is meant to run
on a guest, not on the hypervisor/DOM0 OS.  I have also been looking
all over for an example that is even close to what we are looking for
and the sad fact is that the closest thing I can find in the kernel is
the DCB netlink code since it effects the number of queues in use. ,
and I don't have the time to implement anything like that before the
merge window is closed.

What I want to know is if I get rid of the sysfs entry in igb and go
with a static number of VFs, and remove the commit comment referencing
the sysfs entry for igbvf will that be enough to get this accepted for
2.6.30?

Thanks,

Alex
--

From: David Miller
Date: Wednesday, March 25, 2009 - 8:34 pm

From: Alexander Duyck <alexander.duyck@gmail.com>

I know it's in the igb driver, I fully understand that.

And I'm saying it was a mistake to merge that, it slipped past
me in my review of that change, and we need to move forward
and get rid of this thing.
--


Ok, I have already submitted the patches to remove the sysfs entry, and the 
comments about it from the commit in igbvf to Jeff.  Hopefully we can get 
the pushed through sometime soon.

In the meantime I have been working on the rtnl_link_ops approach and I 
think I have a few things going but I wanted to get some input before I go 
much further.

First, is it ok for me to call rtnl_unlock prior to doing my settings 
changes on the sriov config space, followed by rtnl_lock afterwards in my 
newlink and dellink operations?  I ask because I had to do this in order to 
prevent a deadlock when the pci-hotplug events fired for the vfs and called 
unregister/register_netdev.

Second is it acceptable for me to just free the netdev at the end of 
newlink and call delete on the PF interface directly?  I ask because I 
don't have any use for the netdevs that are generated and I cannot call 
delete on specific VFs anyway since they are allocated/freed in LIFO order 
so I would always have to free the last one I allocated.

I have included a patch for review below that implements these changes 
against the current driver.  Please feel free to comment.

Thanks,

Alex

--

This patch converts igb from using a sysfs interface to allocate vfs to 
using the rtnl_link interface.

Basically the new interface work like this to add a VF:
ip link add eth3 type igbpf

To delete a VF it would be:
ip link del eth3 type igbpf

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

---

 drivers/net/igb/igb_main.c |  120 +++++++++++++++++++++++++++++---------------
 1 files changed, 80 insertions(+), 40 deletions(-)


diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 7d3ac76..d1e9425 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -154,11 +154,17 @@ static void igb_netpoll(struct net_device *);
 #endif
 
 #ifdef CONFIG_PCI_IOV
-static ssize_t igb_set_num_vfs(struct device *, struct device_attribute *,
-                  ...

No, both functions are called with the RTNL already held. I'm not
sure I understand what kind of potential deadlock you're trying
to avoid. The ->newlink and ->dellink functions are called (mainly)
in response to userspace netlink messages and there should never
be a need to change anything related to rtnl locking.

A deadlock can happen when you call rtnl_link_unregister() while
holding the RTNL. There's an unlocked version (__rtnl_link_unregister)
for this case.


No, the newly created netdev is freed when returning an error, other

Thats not really how this is supposed to work. Every device is an
independant instance, so you can delete them in arbitrary order.
If you need to assign them some device resources, you need to do
this mapping internally.
--

From: Alexander Duyck
Date: Friday, March 27, 2009 - 8:22 am

So what I was seeing prior to changing the locking is that if I had the 
igbvf driver loaded and enabled a vf the operation would hang, and 
anything that tried to configure a network interface would hang as well.

The call to enable SR-IOV is contained within the newlink and dellink 
calls with this patch.  When I change the number of VFs it will trigger 
PCI hotplug events where it will remove all the VFs and then add them 
back.  As a result there are a number of register/unregister_netdev 
calls that are triggered by the igbvf_probe/remove calls in the igbvf 

The problem is I have to alloc/free VFs in order.  See the rest of my 

This is where it gets messy and where we don't really have any good 
tools for this.  The problem is each VF is not independent.   If I 
remove VFs it has to be in LIFO ordering.  This is due to the fact that 
SR-IOV config space only allows you to specify a number of VFs, not the 
ordering of them, so they cannot be enabled/disabled individually.

Thanks,

Alex
--

Previous thread: [patch 0/1] ctnetlink: allocation improvements by Holger Eitzenberger on Wednesday, March 25, 2009 - 2:25 pm. (1 message)

Next thread: Re: 2.6.29 forcedeth hang W/O NAPI enabled by Adam Richter on Wednesday, March 25, 2009 - 4:24 pm. (8 messages)