The 'fe-lock' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git fe-lockcontains the following changes that I would like to get tested:
[netdrvr] forcedeth: make NAPI unconditional
[netdrvr] forcedeth: interrupt handling cleanup
[netdrvr] forcedeth: process TX completions using NAPI
[netdrvr] forcedeth: internal simplification and cleanups
[netdrvr] forcedeth: timer overhaulThese are intended for feedback and testing, NOT for merging.
The goals of these changes are:
* move the driver towards a more sane, simple, easy to verify locking
setup -- irq handler would often acquire/release the lock twice
for each interrupt -- and hopefully* to eliminate a rarely used, apparently fragile locking scheme that
includes heavy use of disable_irq(). this tool is most often employed
during NIC reset/reconfiguration, so satisfying this goal implies
changing the way NIC reset and config are accomplished.Miscellaneous notes:
* using the new napi_struct stuff in net-2.6.24, the TX completion
process has been moved to a -separate- NAPI polling channel. thus
there are now two napi_structs, one for RX and one for TX, independent
of each other.* I feel TX NAPI is a useful tool, because it provides an independent TX
process control point and system load feedback point.
Thus I felt this was slightly superior to tasklets.* But who knows if this is a good idea? :) I am interested in
feedback and criticism on this issue.-
OK, I've successfully tested patches 1-5 on an AMD64 system with
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
A trivial s/napi/tx_napi/ fix must be applied to patch #3 (sent in
separate email). Once that is done, the patchset works flawlessly here,
passing simple RX, TX, RX+TX TCP stress tests.I never ran into any TX problems, of the type hinted at by the "Known
bugs" section at the top of forcedeth.c. Either (a) my chip does not
have that bug, (b) my chip needs DEV_NEED_TIMERIRQ for other reasons, or
(c) the issue is not a hardware issue but a driver bug that is now fixed.I'm going to hope its (c), <grin> But only testing will tell.
Jeff
P.S. Depending on when this gets fixed, you might have to revert
net-2.6.24.git commit 5f5dace1ce001b24fb8286e09ffd3c4d2b547e09 ("[NET]:
Various dst_ifdown routines to catch refcounting bugs").
-
need to test that with mcp55 based, that will use MSI.
YH
-
I would love to see an MSI-X test too (might have to turn on throughput
mode).Jeff
-
commit abca163a14b28c234df9bf38034bc967ff81c3aa
Author: Jeff Garzik <jeff@garzik.org>
Date: Sun Oct 7 07:22:14 2007 -0400[netdrvr] forcedeth: wrap slow path hw manipulation inside hw_mutex
* This makes sure everybody who wants to start/stop the RX and TX engines
first acquires this mutex.* tx_timeout code was deleted, replaced by scheduling reset_task.
* linkchange moved to a workqueue (always inside hw_mutex)
* simplified irq handling a bit
* make sure to disable workqueues before NAPI
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/forcedeth.c | 272 ++++++++++++++++++++++++++++++------------------
1 file changed, 175 insertions(+), 97 deletions(-)abca163a14b28c234df9bf38034bc967ff81c3aa
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a037f49..d1c1efa 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -63,6 +63,7 @@
#include <linux/if_vlan.h>
#include <linux/dma-mapping.h>
#include <linux/workqueue.h>
+#include <linux/mutex.h>#include <asm/irq.h>
#include <asm/io.h>
@@ -647,6 +648,12 @@ struct nv_skb_map {
unsigned int dma_len;
};+struct nv_mc_info {
+ u32 addr[2];
+ u32 mask[2];
+ u32 pff;
+};
+
/*
* SMP locking:
* All hardware access under dev->priv->lock, except the performance
@@ -709,6 +716,8 @@ struct fe_priv {
unsigned int pkt_limit;
struct timer_list oom_kick;
struct work_struct reset_task;
+ struct work_struct linkchange_task;
+ struct work_struct mcast_task;
struct delayed_work stats_task;
u32 reset_task_irq;
int rx_ring_size;
@@ -731,14 +740,18 @@ struct fe_priv {
int tx_ring_size;/* vlan fields */
- struct vlan_group *vlangrp;
+ struct vlan_group *vlangrp;/* msi/msi-x fields */
- u32 msi_flags;
- struct msix_entry msi_x_entry[NV_MSI_X_MAX_VECTORS];
+ u32 msi_flags;
+ struct msix_entry msi_x_entry[NV_MSI_X_MAX_...
You will need the attached patch to even build (oops).
Also, testing shows there is a mutex deadlock somewhere.
/me agrees violently
btw., when i played with this tunable under -rt:
enum {
NV_OPTIMIZATION_MODE_THROUGHPUT,
NV_OPTIMIZATION_MODE_CPU
};
static int optimization_mode = NV_OPTIMIZATION_MODE_THROUGHPUT;the MODE_CPU one gave (much) _higher_ bandwidth. The queueing model in
forcedeth seemed to be not that robust and i think a single queueing
model should be adopted instead of this tunable. (which i think just hid
some bug/dependency) But i never got to the bottom of it so it's just
the impression i got.Ingo
-
That's interesting. It will be informative to narrow down the variables
affected by this. My changes stirred the pot quite a bit :)* 'throughput' mode enables MSI-X, and separate interrupt vectors for RX
and TX. so, NVIDIA's MSI-X implementation, our generic MSI-X support,
or "Known bugs" (see top of file) may be a factor here.* 'throughput' mode also changes the NIC's timer interrupt frequency
* do you recall if you were running in NAPI mode? It defaulted to off
in Kconfig, but I turned it on unconditionally.* I think TX NAPI has the potential to make the optimization_mode
irrelevant (along with the other changes, most notably the interrupt
handling change)* and overall, yes, if we can have a single queueing model /
optimization mode I am strongly in favor of that.Testing welcome ;-) Though these patches are raw and "hot off the
presses", so unrelated bugs are practically a certainty. And I am
worrying about the "Known bugs" note at the top. My gut feeling is that
this was, in part, misunderstanding on the part of reverse-engineers,
since corrected when NVIDIA started contributing to the driver.Jeff
-
s/and hopefully// (it became the next bullet point)
-
It should also be pointed out that these patches were generated on top
of davem's net-2.6.24.git tree.They -probably- apply to -mm, but you might have to remove the forcedeth
patches -mm already has, before applying.-
commit d7c766113ee2ec66ae8975e0acbad086d2c23594
Author: Jeff Garzik <jeff@garzik.org>
Date: Sat Oct 6 10:57:56 2007 -0400[netdrvr] forcedeth: timer overhaul
* convert stats_poll timer to a delayed-work workqueue stats_task
* protect hw stats update with a lock
* now that recovery is the only remaining use of nv_do_nic_poll(),
rename it to nv_reset_task(), move it from a timer to a workqueue,
and delete all non-recovery-related code from the function.* kill np->in_shutdown, it mirrors netif_running(). furthermore,
the overwhelming majority of sites that tested np->in_shutdown
were inside rtnl_lock() and guaranteed never to race against shutdown
anyway.Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/forcedeth.c | 194 ++++++++++++++++++------------------------------
1 file changed, 75 insertions(+), 119 deletions(-)d7c766113ee2ec66ae8975e0acbad086d2c23594
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index d6eacd7..a037f49 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -62,6 +62,7 @@
#include <linux/init.h>
#include <linux/if_vlan.h>
#include <linux/dma-mapping.h>
+#include <linux/workqueue.h>#include <asm/irq.h>
#include <asm/io.h>
@@ -450,7 +451,6 @@ union ring_type {
#define NV_PKTLIMIT_2 9100 /* Actual limit according to NVidia: 9202 */#define OOM_REFILL (1+HZ/20)
-#define POLL_WAIT (1+HZ/100)
#define LINK_TIMEOUT (3*HZ)
#define STATS_INTERVAL (10*HZ)@@ -670,7 +670,6 @@ struct fe_priv {
* Locking: spin_lock(&np->lock); */
struct net_device_stats stats;
struct nv_ethtool_stats estats;
- int in_shutdown;
u32 linkspeed;
int duplex;
int autoneg;
@@ -681,7 +680,6 @@ struct fe_priv {
unsigned int phy_model;
u16 gigabit;
int intr_test;
- int recover_error;/* General data: RO fields */
dma_addr_t ring_addr;
@@ -71...
commit 39572457a4dfe9a9dc1efd6641e7a6467e5658a1
Author: Jeff Garzik <jeff@garzik.org>
Date: Sat Oct 6 01:21:01 2007 -0400[netdrvr] forcedeth: internal simplification and cleanups
* remove changelog from source; its kept in git repository
* split guts of RX/TX DMA engine disable into disable portion,
and wait/etc. portions.* consolidate descriptor version tests using nv_optimized()
* consolidate NIC DMA start, stop and drain into
nv_start_txrx(), nv_stop_txrx(), nv_drain_txrx()* change nv_poll_controller() to call interrupt handling function
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/forcedeth.c | 228 +++++++++++++++++-------------------------------
1 file changed, 84 insertions(+), 144 deletions(-)39572457a4dfe9a9dc1efd6641e7a6467e5658a1
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1c236e6..d6eacd7 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -29,89 +29,7 @@
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
- * Changelog:
- * 0.01: 05 Oct 2003: First release that compiles without warnings.
- * 0.02: 05 Oct 2003: Fix bug for nv_drain_tx: do not try to free NULL skbs.
- * Check all PCI BARs for the register window.
- * udelay added to mii_rw.
- * 0.03: 06 Oct 2003: Initialize dev->irq.
- * 0.04: 07 Oct 2003: Initialize np->lock, reduce handled irqs, add printks.
- * 0.05: 09 Oct 2003: printk removed again, irq status print tx_timeout.
- * 0.06: 10 Oct 2003: MAC Address read updated, pff flag generation updated,
- * irq mask updated
- * 0.07: 14 Oct 2003: Further irq mask updates.
- * 0.08: 20 Oct 2003: rx_desc.Length initialization added, nv_alloc_rx refill
- * added into irq handler, NULL check for drain_ring.
- * 0.09: 20 Oct 2003: Basic link speed irq implementation. Only hand...
commit 57cbfacc00d69be2ba02b65d1021442273b76263
Author: Jeff Garzik <jeff@garzik.org>
Date: Fri Oct 5 23:25:56 2007 -0400[netdrvr] forcedeth: process TX completions using NAPI
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/forcedeth.c | 143 +++++++++++++++++++++++++++---------------------
1 file changed, 83 insertions(+), 60 deletions(-)57cbfacc00d69be2ba02b65d1021442273b76263
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1d1a5f5..1c236e6 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -744,6 +744,7 @@ struct fe_priv {struct net_device *dev;
struct napi_struct napi;
+ struct napi_struct tx_napi;/* General data:
* Locking: spin_lock(&np->lock); */
@@ -810,7 +811,6 @@ struct fe_priv {
union ring_type tx_ring;
u32 tx_flags;
int tx_ring_size;
- int tx_stop;/* vlan fields */
struct vlan_group *vlangrp;
@@ -1763,10 +1763,7 @@ static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev)empty_slots = nv_get_empty_tx_slots(np);
if (unlikely(empty_slots <= entries)) {
- spin_lock_irq(&np->lock);
netif_stop_queue(dev);
- np->tx_stop = 1;
- spin_unlock_irq(&np->lock);
return NETDEV_TX_BUSY;
}@@ -1879,10 +1876,7 @@ static int nv_start_xmit_optimized(struct sk_buff *skb, struct net_device *dev)
empty_slots = nv_get_empty_tx_slots(np);
if (unlikely(empty_slots <= entries)) {
- spin_lock_irq(&np->lock);
netif_stop_queue(dev);
- np->tx_stop = 1;
- spin_unlock_irq(&np->lock);
return NETDEV_TX_BUSY;
}@@ -1987,12 +1981,16 @@ static int nv_start_xmit_optimized(struct sk_buff *skb, struct net_device *dev)
*
* Caller must own np->lock.
*/
-static void nv_tx_done(struct net_device *dev)
+static int nv_tx_done(struct net_device *dev, int limit)
{
struct fe_priv *np = netdev_priv(dev);
u32 flags;
+ int orig_limit = limit;
struct ring_desc* orig...
The attached patch fixes an obvious bug. Once applied, TX NAPI actually
works :)
commit a606d2a111cdf948da5d69eb1de5526c5c2dafef
Author: Jeff Garzik <jeff@garzik.org>
Date: Fri Oct 5 22:56:05 2007 -0400[netdrvr] forcedeth: interrupt handling cleanup
* nv_nic_irq_optimized() and nv_nic_irq_other() were complete duplicates
of nv_nic_irq(), with the exception of one function call. Consolidate
all three into a single interrupt handler, deleting a lot of redundant
code.* greatly simplify irq handler locking.
Prior to this change, the irq handler(s) would acquire and release
np->lock for each action (RX, TX, other events).For the common case -- RX or TX work -- the lock is always acquired,
making all successive acquire/release combinations largely redundant.Acquire the lock at the beginning of the irq handler, and release it at
the end of the irq handler. This is simple, easy, and obvious.* remove irq handler work loop.
All interesting events emanating from the irq handler either have
their own work loops, or they poke a timer into action.Therefore, delete the pointless master interrupt handler work loop.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/forcedeth.c | 325 +++++++++++-------------------------------------
1 file changed, 77 insertions(+), 248 deletions(-)a606d2a111cdf948da5d69eb1de5526c5c2dafef
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 49906cc..1d1a5f5 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2917,208 +2917,110 @@ static void nv_link_irq(struct net_device *dev)
dprintk(KERN_DEBUG "%s: link change notification done.\n", dev->name);
}-static irqreturn_t nv_nic_irq(int foo, void *data)
+static irqreturn_t __nv_nic_irq(struct net_device *dev, bool optimized)
{
- struct net_device *dev = (struct net_device *) data;
struct fe_priv *np = netdev_priv(dev);
u8 __iomem *base = get_hwbase(dev);
u3...
i like that! One forcedeth annoyance that triggers frequently on one of
my testboxes is:[ 120.955202] eth0: too many iterations (6) in nv_nic_irq.
[ 121.233865] eth0: too many iterations (6) in nv_nic_irq.
[ 129.215450] eth0: too many iterations (6) in nv_nic_irq.
[ 139.734408] eth0: too many iterations (6) in nv_nic_irq.
[ 144.546811] eth0: too many iterations (6) in nv_nic_irq.
[ 153.811005] eth0: too many iterations (6) in nv_nic_irq.
[ 154.695879] eth0: too many iterations (6) in nv_nic_irq.
[ 155.455078] eth0: too many iterations (6) in nv_nic_irq.
[ 173.912162] eth0: too many iterations (6) in nv_nic_irq.Ingo
-
any chance to create three verion irq handlers for ioapic, msi, msi-x...?
MACRO or inline function?
YH
-
MSI-X already has its own separate interrupt handlers. MSI and INTx
call the same interrupt handling code, like the unmodified driver goes.
Creating an MSI-specific irq handler would not save very much AFAICS,
but I might be missing something.Do you have ideas/suggestions for a different method?
Jeff
-
in the e1000 driver, it has seperate handler for msi and ioapic.
but in forcedeth, the nv_nic_irq_optimized keep check msi_flags...,
with num of msi interrupt number, that could cause cpu loading get a
little bit high..., even the network performance is ok.YH
-
With all the activity in the interrupt handler, a few in-cache branches
are definitely going to be lost in the noise.Separating the interrupt handlers between MSI and non-MSI tends to be of
more benefit when the separation is accompanied by more efficient
locking in the MSI interrupt handler, or a different mode of interrupt
clear, or some other attribute.Though CPU usage would be a good thing to measure, with these patches.
Jeff
-
commit 7bfc023b952e8e12c7333efccd2e78023c546a7c
Author: Jeff Garzik <jeff@garzik.org>
Date: Fri Oct 5 20:50:24 2007 -0400[netdrvr] forcedeth: make NAPI unconditional
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
drivers/net/Kconfig | 17 -----
drivers/net/forcedeth.c | 149 +-----------------------------------------------
2 files changed, 4 insertions(+), 162 deletions(-)7bfc023b952e8e12c7333efccd2e78023c546a7c
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9c635a2..59eab61 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1430,23 +1430,6 @@ config FORCEDETH
<file:Documentation/networking/net-modules.txt>. The module will be
called forcedeth.-config FORCEDETH_NAPI
- bool "Use Rx and Tx Polling (NAPI) (EXPERIMENTAL)"
- depends on FORCEDETH && EXPERIMENTAL
- help
- NAPI is a new driver API designed to reduce CPU and interrupt load
- when the driver is receiving lots of packets from the card. It is
- still somewhat experimental and thus not yet enabled by default.
-
- If your estimated Rx load is 10kpps or more, or if the card will be
- deployed on potentially unfriendly networks (e.g. in a firewall),
- then say Y here.
-
- See <file:Documentation/networking/NAPI_HOWTO.txt> for more
- information.
-
- If in doubt, say N.
-
config CS89x0
tristate "CS89x0 support"
depends on NET_PCI && (ISA || MACH_IXDP2351 || ARCH_IXDP2X01 || ARCH_PNX010X)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index dae30b7..49906cc 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -123,12 +123,7 @@
* DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few
* superfluous timer interrupts from the nic.
*/
-#ifdef CONFIG_FORCEDETH_NAPI
-#define DRIVERNAPI "-NAPI"
-#else
-#define DRIVERNAPI
-#endif
-#define FORCEDETH_VERSION "0.60"
+#define FORCEDETH_VERSION "1.00"
#define DRV_NAME "forcedeth"...
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Linus Torvalds | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| David Newall | Re: Slow DOWN, please!!! |
| Ian Campbell | Re: [PATCH] x86: Construct 32 bit boot time page tables in native format. |
| Matthias Scheler | Re: HEADS UP: timecounters (branch simonb-timecounters) merged into -current |
| Greg Troxel | Re: Interface to change NFS exports |
| Thor Lancelot Simon | metadata cache and memory fragmentation |
| YAMAMOTO Takashi | amap memory allocation |
git: | |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 0/37] dccp: Feature negotiation - last call for comments |
| David Miller | [GIT]: Networking |
| Dushan Tcholich | Re: ksoftirqd high cpu load on kernels 2.6.24 to 2.6.27-rc1-mm1 |
