[GIT]: Networking

Previous thread: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem) by Andres Freund on Tuesday, June 30, 2009 - 7:20 pm. (13 messages)

Next thread: [PATCH net-2.6] be2net: fix spurious interrupt handling in intx mode by Sathya Perla on Wednesday, July 1, 2009 - 7:06 am. (3 messages)
To: David Miller <davem@...>
Cc: netdev <netdev@...>, Ethan Hsiao <ethanhsiao@...>
Date: Tuesday, June 30, 2009 - 5:14 pm

1. Use more efficient way to determine flag status.
2. Hardware dose not mark fragment bit against IPv6 packets,
print TCP/UDP checksum warning message for IPv4 packets only.

Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
---
drivers/net/jme.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/jme.c b/drivers/net/jme.c
index 29cd5f4..e7068c7 100644
--- a/drivers/net/jme.c
+++ b/drivers/net/jme.c
@@ -880,27 +880,27 @@ jme_rxsum_ok(struct jme_adapter *jme, u16 flags)
if (!(flags & (RXWBFLAG_TCPON | RXWBFLAG_UDPON | RXWBFLAG_IPV4)))
return false;

- if (unlikely(!(flags & RXWBFLAG_MF) &&
- (flags & RXWBFLAG_TCPON) && !(flags & RXWBFLAG_TCPCS))) {
- msg_rx_err(jme, "TCP Checksum error.\n");
- goto out_sumerr;
+ if (unlikely((flags & (RXWBFLAG_MF | RXWBFLAG_TCPON | RXWBFLAG_TCPCS))
+ == RXWBFLAG_TCPON)) {
+ if (flags & RXWBFLAG_IPV4)
+ msg_rx_err(jme, "TCP Checksum error\n");
+ return false;
}

- if (unlikely(!(flags & RXWBFLAG_MF) &&
- (flags & RXWBFLAG_UDPON) && !(flags & RXWBFLAG_UDPCS))) {
- msg_rx_err(jme, "UDP Checksum error.\n");
- goto out_sumerr;
+ if (unlikely((flags & (RXWBFLAG_MF | RXWBFLAG_UDPON | RXWBFLAG_UDPCS))
+ == RXWBFLAG_UDPON)) {
+ if (flags & RXWBFLAG_IPV4)
+ msg_rx_err(jme, "UDP Checksum error.\n");
+ return false;
}

- if (unlikely((flags & RXWBFLAG_IPV4) && !(flags & RXWBFLAG_IPCS))) {
+ if (unlikely((flags & (RXWBFLAG_IPV4 | RXWBFLAG_IPCS))
+ == RXWBFLAG_IPV4)) {
msg_rx_err(jme, "IPv4 Checksum error.\n");
- goto out_sumerr;
+ return false;
}

return true;
-
-out_sumerr:
- return false;
}

static void
--
1.6.0.6

--

To: <torvalds@...>
Cc: <akpm@...>, <netdev@...>, <linux-kernel@...>
Date: Wednesday, July 1, 2009 - 12:39 am

1) Revert ARP windows IP conflict verification support, it breaks
things, from Eric W. Biederman.

2) SCTP unintentionally broken by routing changes, fix from
Wei Yongjun.

3) XFRM layer code does xfrm_addr_cmp() by hand, leading to
code duplication and the usual assortment of cut&paste
bugs. Also from Wei Yongjun.

4) Netfilter fixes from Patrick McHardy, Jaswinder Singh Rajput,
Jan Engelhardt, and Jesper Dangaard Brouer.

5) bnx2x driver has same link reporting bug as bnx2 driver, from
Naohiro Ooiwa.

6) More fib_trie rebalancing fixes from Jarek Poplawski.

7) usbnet was converted to use netdev->stats to record stats
instead of using a private copy. Unfortunately, there are
a bunch of sub-drivers that code share with usbnet, and those
did not get converted so stats got lost. Fixed by Herbert Xu.

8) Fixes for the new ieee802154 stack from Dmitry Eremin-Solenikov.

9) Non-TSO packets can become TSO ones and vice versa, which is
illegal and results in packets which have an illegal state.
Fixes from Herbert Xu.

10) bfin_sir doesn't compile, missed net_device_ops conversion.
Fix from Graf Yang.

11) Intel ethernet driver fixes from J. Kirsher and the rest of
the Intel driver team. In particular, more bugs caught by
the DMA mapping API debugging layer.

Please pull, thanks a lot.

The following changes since commit 52989765629e7d182b4f146050ebba0abf2cb0b7:
Linus Torvalds (1):
Merge git://git.kernel.org/.../davem/net-2.6

are available in the git repository at:

master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Alexander Duyck (1):
igb: return PCI_ERS_RESULT_DISCONNECT on permanent error

Andre Detsch (1):
e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error

David S. Miller (2):
Merge branch 'master' of git://git.kernel.org/.../kaber/nf-2.6
Merge branch 'for-linus' of git://git.kernel.org/.../lowpan/lowpan

Dmitry Eremin-Solenikov (3):
...

To: David Miller <davem@...>
Cc: <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>
Date: Thursday, July 2, 2009 - 3:57 am

Hm, something in this lot quickly wrecked networking here - see the
tx timeout dump below. It starts with:

[ 351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
[ 351.011815] Hardware name: System Product Name
[ 351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out

Config attached. Unfortunately i've got no time to do bisection
today.

Ingo

[ 351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
[ 351.011815] Hardware name: System Product Name
[ 351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
[ 351.022996] Pid: 0, comm: swapper Tainted: G W 2.6.31-rc1-tip-01159-g24b7800-dirty #59554
[ 351.031850] Call Trace:
[ 351.034270] [<c105108d>] warn_slowpath_common+0x60/0x90
[ 351.039550] [<c10510fb>] warn_slowpath_fmt+0x29/0x2c
[ 351.044593] [<c18eadf6>] dev_watchdog+0x10b/0x19c
[ 351.049348] [<c1074779>] ? __lock_acquired+0xbc/0x2c3
[ 351.054474] [<c1059fb7>] ? run_timer_softirq+0x125/0x1f0
[ 351.059832] [<c105a00d>] run_timer_softirq+0x17b/0x1f0
[ 351.065046] [<c1059fb7>] ? run_timer_softirq+0x125/0x1f0
[ 351.070420] [<c18eaceb>] ? dev_watchdog+0x0/0x19c
[ 351.075173] [<c105625d>] __do_softirq+0xbc/0x16f
[ 351.079849] [<c10561a1>] ? __do_softirq+0x0/0x16f
[ 351.084632] <IRQ> [<c1055b7e>] ? irq_exit+0x3a/0x3c
[ 351.089646] [<c10340b4>] ? smp_apic_timer_interrupt+0x74/0x82
[ 351.095468] [<c102383b>] ? apic_timer_interrupt+0x2f/0x34
[ 351.100927] [<c10220a5>] ? cpu_idle+0x43/0x5e
[ 351.105334] [<c10297fc>] ? default_idle+0x6e/0xc4
[ 351.110114] [<c103adc8>] ? native_safe_halt+0xa/0xc
[ 351.115040] [<c1029801>] ? default_idle+0x73/0xc4
[ 351.119803] [<c10220ab>] ? cpu_idle+0x49/0x5e
[ 351.124241] [<c1a3170c>] ? rest_init+0x58/0x5a
[ 351.128735] [<c20797e6>] ? start_kernel+0x268/0x26d
[ 351.133688] [&lt...

To: Ingo Molnar <mingo@...>
Cc: David Miller <davem@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>
Date: Thursday, July 2, 2009 - 6:51 am

forcedeth might have a problem, in its netif_wake_queue() logic, but
I could not see why a recent patch could make this problem visible now.

CPU0/1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
is not a new cpu either :)

forcedeth uses an internal tx_stop without appropriate barrier.

Could you try following patch ?

(random guess as I dont have much time right now)

Thank you

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1094d29..dc6bbde 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2165,7 +2165,7 @@ static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
empty_slots = nv_get_empty_tx_slots(np);
if (unlikely(empty_slots <= entries)) {
netif_stop_queue(dev);
- np->tx_stop = 1;
+ set_mb(np->tx_stop, 1);
spin_unlock_irqrestore(&np->lock, flags);
return NETDEV_TX_BUSY;
}
@@ -2286,7 +2286,7 @@ static int nv_start_xmit_optimized(struct sk_buff *skb, struct net_device *dev)
empty_slots = nv_get_empty_tx_slots(np);
if (unlikely(empty_slots <= entries)) {
netif_stop_queue(dev);
- np->tx_stop = 1;
+ set_mb(np->tx_stop, 1);
spin_unlock_irqrestore(&np->lock, flags);
return NETDEV_TX_BUSY;
}
@@ -2564,7 +2564,7 @@ static void nv_tx_timeout(struct net_device *dev)
else
status = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;

- printk(KERN_INFO "%s: Got tx_timeout. irq: %08x\n", dev->name, status);
+ printk(KERN_INFO "%s: Got tx_timeout. irq: %08x tx_stop=%d\n", dev->name, status, np->tx_stop);

{
int i;
--

To: Ingo Molnar <mingo@...>
Cc: David Miller <davem@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>, Ayaz Abdulla <aabdulla@...>
Date: Thursday, July 2, 2009 - 10:04 am

Oh well this patch was soooo stupid, sorry Ingo.

We might have a race in napi_schedule(), leaving interrupts disabled forever.
I cannot test this patch, I dont have the hardware...

Thanks

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 1094d29..3b4e076 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3514,11 +3514,13 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
nv_msi_workaround(np);

#ifdef CONFIG_FORCEDETH_NAPI
- napi_schedule(&np->napi);
-
- /* Disable furthur irq's
- (msix not enabled with napi) */
- writel(0, base + NvRegIrqMask);
+ if (napi_schedule_prep(&np->napi)) {
+ /*
+ * Disable further irq's (msix not enabled with napi)
+ */
+ writel(0, base + NvRegIrqMask);
+ __napi_schedule(&np->napi);
+ }

#else
do
@@ -3615,12 +3617,13 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
nv_msi_workaround(np);

#ifdef CONFIG_FORCEDETH_NAPI
- napi_schedule(&np->napi);
-
- /* Disable furthur irq's
- (msix not enabled with napi) */
- writel(0, base + NvRegIrqMask);
-
+ if (napi_schedule_prep(&np->napi)) {
+ /*
+ * Disable further irq's (msix not enabled with napi)
+ */
+ writel(0, base + NvRegIrqMask);
+ __napi_schedule(&np->napi);
+ }
#else
do
{
--

To: Eric Dumazet <eric.dumazet@...>
Cc: Ingo Molnar <mingo@...>, David Miller <davem@...>, torvalds@linux-foundation.org <torvalds@...>, akpm@linux-foundation.org <akpm@...>, netdev@vger.kernel.org <netdev@...>, linux-kernel@vger.kernel.org <linux-kernel@...>
Date: Thursday, July 2, 2009 - 12:13 pm

Yes, good catch. There is a race condition here with napi poll.

I would prefer to do the following to keep the code simple and clean.

writel(0, base + NvRegIrqMask);
napi_schedule(&np->napi);

--

To: Ayaz Abdulla <aabdulla@...>
Cc: David Miller <davem@...>, netdev@vger.kernel.org <netdev@...>, linux-kernel@vger.kernel.org <linux-kernel@...>
Date: Thursday, July 2, 2009 - 6:32 pm

CC trimmed down to network devs only :)

It would be racy too ...

check drivers/net/amd8111e.c, drivers/net/natsemi.c ...

If this cpu inconditionaly calls writel(0, base + NvRegIrqMask);
while another cpu just called writel(np->irqmask, base + NvRegIrqMask),
we end with disabled interrupts ?

--

To: Eric Dumazet <eric.dumazet@...>
Cc: David Miller <davem@...>, netdev@vger.kernel.org <netdev@...>
Date: Thursday, July 2, 2009 - 1:29 pm

<BCC linux-kernel>

Yes, but the next instruction is to call napi_schedule() which will
--

To: <aabdulla@...>
Cc: <eric.dumazet@...>, <netdev@...>
Date: Thursday, July 2, 2009 - 9:32 pm

From: Ayaz Abdulla <aabdulla@nvidia.com>

Why? If you BCC linux-kernel, then followups don't get
logged there, that's completely and utterly pointless.

Either CC: it or don't include it at all.
--

To: Ayaz Abdulla <aabdulla@...>
Cc: David Miller <davem@...>, netdev@vger.kernel.org <netdev@...>
Date: Thursday, July 2, 2009 - 8:07 pm

Sorry, I dont get it. You are referring to softirqs, while I speak
of hardware (NIC) interrupts, that can be masked for ever.

With your patch :

CPU0 CPU1
napi_complete();
/* please note there is no smp_wmb()
in __napi_complete() after
clear_bit(NAPI_STATE_SCHED, &n->state);
(It's implied on x86 because of lock prefix)
*/
writel(np->irqmask, base + NvRegIrqMask)
writel(0, base + NvRegIrqMask);
napi_schedule(&np->napi); // might do nothing because this cpu see
// NAPI_STATE_SCHED set to one

NIC cannot deliver further interrupts.

With my patch :

CPU0 CPU1
napi_complete();
/* please note there is no smp_wmb()
in __napi_complete() after
clear_bit(NAPI_STATE_SCHED, &n->state);
*/
writel(np->irqmask, base + NvRegIrqMask)
if (napi_schedule_prep(&np->napi)) // if false, NIC can deliver further interrupts.
// if true, we mask interrupts but re-enable napi. packets will be processed,
// interrupt will be re-enabled later.

My patch has a guarantee that we mask NIC interrupts *only* if napi is re-scheduled.

It probably makes no difference on x86, but it might be better to have same logic in all drivers...
--

To: Eric Dumazet <eric.dumazet@...>
Cc: David Miller <davem@...>, netdev@vger.kernel.org <netdev@...>
Date: Thursday, July 2, 2009 - 9:11 pm

Our hw only runs on architectures that provide coherency. i.e x86

I'm fine with keeping it with both napi_schedule_prep() and __napi_schedule() if you want it to be similar to other drivers.

Thanks!
________________________________________
From: Eric Dumazet [eric.dumazet@gmail.com]
Sent: Thursday, July 02, 2009 5:07 PM
To: Ayaz Abdulla
Cc: David Miller; netdev@vger.kernel.org
Subject: Re: [GIT]: Networking

Sorry, I dont get it. You are referring to softirqs, while I speak
of hardware (NIC) interrupts, that can be masked for ever.

With your patch :

CPU0 CPU1
napi_complete();
/* please note there is no smp_wmb()
in __napi_complete() after
clear_bit(NAPI_STATE_SCHED, &n->state);
(It's implied on x86 because of lock prefix)
*/
writel(np->irqmask, base + NvRegIrqMask)
writel(0, base + NvRegIrqMask);
napi_schedule(&np->napi); // might do nothing because this cpu see
// NAPI_STATE_SCHED set to one

NIC cannot deliver further interrupts.

With my patch :

CPU0 CPU1
napi_complete();
/* please note there is no smp_wmb()
in __napi_complete() after
clear_bit(NAPI_STATE_SCHED, &n->state);
*/
writel(np->irqmask, base + NvRegIrqMask)
if (napi_schedule_prep(&np->napi)) // if false, NIC can deliver further interrupts.
// if true, we mask interrupts but re-enable napi. packets will be processed,
// interrupt will be re-enabled later.

My patch has a guarantee that we mask NIC interrupts *only* if napi is re-scheduled.

It probably makes no difference on x86, but it might be better to have same logic in all drivers...
--------------------------------------------------------...

To: <AAbdulla@...>
Cc: <eric.dumazet@...>, <netdev@...>
Date: Thursday, July 2, 2009 - 9:30 pm

From: Ayaz Abdulla <AAbdulla@nvidia.com>

It has nothing to do with "coherency".

Re-read Eric's emails, the race can happen on x86 quite trivially.
--

To: Eric Dumazet <eric.dumazet@...>
Cc: David Miller <davem@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>, Ayaz Abdulla <aabdulla@...>
Date: Thursday, July 2, 2009 - 4:05 pm

I have tested it and the config that failed before works now. (as do
other configs as well)

I'll have to re-test whether the failing config fails with your fix
reverted - that will have to wait until tomorrow or so.

Ingo
--

To: Eric Dumazet <eric.dumazet@...>, Joerg Roedel <joerg.roedel@...>
Cc: David Miller <davem@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>, Ayaz Abdulla <aabdulla@...>
Date: Friday, July 3, 2009 - 5:46 pm

update: that forcedeth system is still working fine.

I triggered other networking problems today though, a trylock
failure message in netconsole, plus a dma-debug assert from e1000e:

[ 1275.442025] Hangcheck: hangcheck value past margin!
[ 1451.860024] ------------[ cut here ]------------
[ 1451.860024] BUG: spinlock trylock failure on UP on CPU#0, distcc/32540
[ 1451.860024] lock: f5318760, .magic: dead4ead, .owner: distcc/32540, .owner_cpu: 0
[ 1451.860024] Pid: 32540, comm: distcc Tainted: G W 2.6.31-rc1-tip-01236-gb62d24b-dirty #6631
[ 1451.860024] Call Trace:
[ 1451.860024] [<c17764e8>] spin_bug+0x98/0x140
[ 1451.860024] [<c1776d01>] _raw_spin_trylock+0x71/0xc0
[ 1451.860024] [<c283ffbb>] _spin_trylock+0x1b/0x80
[ 1451.860024] [<c23c7868>] netpoll_send_skb+0x198/0x470
[ 1451.860024] [<c23c7eb0>] netpoll_send_udp+0x240/0x290
[ 1451.860024] [<c2840884>] ? _spin_lock_irqsave+0x64/0x90
[ 1451.860024] [<c1d69beb>] write_msg+0xeb/0x160
[ 1451.860024] [<c10933c8>] __call_console_drivers+0x78/0xa0
[ 1451.860024] [<c10934d9>] _call_console_drivers+0xe9/0x120
[ 1451.860024] [<c10938e7>] call_console_drivers+0x137/0x260
[ 1451.860024] [<c1093aaa>] release_console_sem+0x9a/0x1a0
[ 1451.860024] [<c10949b6>] vprintk+0x416/0x500
[ 1451.860024] [<c178b8f7>] ? check_for_illegal_area+0x197/0x340
[ 1451.860024] [<c1094abb>] printk+0x1b/0x20
[ 1451.860024] [<c1092bdb>] warn_slowpath_common+0x1b/0xe0
[ 1451.860024] [<c1092ce9>] warn_slowpath_fmt+0x29/0x30
[ 1451.860024] [<c178b8f7>] check_for_illegal_area+0x197/0x340
[ 1451.860024] [<c178be4c>] debug_dma_map_page+0x1cc/0x310
[ 1451.860024] [<c239fe19>] skb_dma_map+0x219/0x580
[ 1451.860024] [<c23919b6>] ? skb_copy_bits+0x126/0x590
[ 1451.860024] [<c1ab920f>] e1000_tx_map+0x4f/0x2a0
[ 1451.860024] [<c1abbb6b>] e1000_xmit_frame+0x28b/0x5a0
[ 1451.860024] [<c23a9fc0>] dev_har...

To: <mingo@...>
Cc: <eric.dumazet@...>, <joerg.roedel@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>, <aabdulla@...>
Date: Sunday, July 5, 2009 - 10:04 pm

From: Ingo Molnar <mingo@elte.hu>

I've applied Eric's patch, thanks!
--

To: Ingo Molnar <mingo@...>
Cc: <eric.dumazet@...>, <joerg.roedel@...>, <davem@...>, <torvalds@...>, <akpm@...>, <netdev@...>, <linux-kernel@...>, <aabdulla@...>
Date: Sunday, July 5, 2009 - 12:09 am

I don't know about the DMA error but this is perfectly normal.
You're invoking netpoll from within the NIC's xmit routine,
that's why the trylock fails. So I guess you should fix spinlock
debugging instead :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

Previous thread: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem) by Andres Freund on Tuesday, June 30, 2009 - 7:20 pm. (13 messages)

Next thread: [PATCH net-2.6] be2net: fix spurious interrupt handling in intx mode by Sathya Perla on Wednesday, July 1, 2009 - 7:06 am. (3 messages)