[patch] net: avoid race between netpoll and network fast path

Previous thread: Please pull 'fixes-davem' branch of wireless-2.6 by John W. Linville on Tuesday, October 16, 2007 - 10:31 pm. (5 messages)

Next thread: [PATCH 0/6] forcedeth interrupt and task overhaul, v2 by Jeff Garzik on Wednesday, October 17, 2007 - 1:52 am. (8 messages)
To: Matt Mackall <mpm@...>
Cc: <netdev@...>
Date: Tuesday, October 16, 2007 - 11:45 pm

The current netpoll design and implementation has serveral race issues with the
network fast path that panics/hangs the system or causes interface timeout/reset
but the fix is likely to have impact on the overall system performance and could
involve a large number of drivers. The proposal is to disable the problem code
for normal operations but only to enable it at the time of crash in case polling
is necessary. Tests that have been done included the bug fix verification
as well as regression check on the netlog results in various crash modes.

Signed-off-by: Tina Yang <tina.yang@oracle.com>

---

--- linux-2.6.23.1/include/linux/kernel.h.orig 2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/include/linux/kernel.h 2007-10-15 22:05:27.000000000 -0700
@@ -184,6 +184,8 @@ static inline void console_verbose(void)
console_loglevel = 15;
}

+extern int netpoll_enable;
+
extern void bust_spinlocks(int yes);
extern void wake_up_klogd(void);
extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */
--- linux-2.6.23.1/net/core/netpoll.c.orig 2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/net/core/netpoll.c 2007-10-15 22:20:15.000000000 -0700
@@ -150,15 +150,19 @@ static void service_arp_queue(struct net
}
}

+int netpoll_enable;
+EXPORT_SYMBOL(netpoll_enable);
void netpoll_poll(struct netpoll *np)
{
if (!np->dev || !netif_running(np->dev) || !np->dev->poll_controller)
return;

/* Process pending work on NIC */
- np->dev->poll_controller(np->dev);
- if (np->dev->poll)
- poll_napi(np);
+ if (unlikely(netpoll_enable)) {
+ np->dev->poll_controller(np->dev);
+ if (np->dev->poll)
+ poll_napi(np);
+ }

service_arp_queue(np->dev->npinfo);

--- linux-2.6.23.1/kernel/panic.c.orig 2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/kernel/panic.c 2007-10-15 22:07:25.000000000 -0700
@@ -66,6 +66,7 @@ NORET_TYPE void panic(const char * fmt,
unsig...

To: <tina.yang@...>
Cc: <mpm@...>, <netdev@...>
Date: Wednesday, October 17, 2007 - 12:06 am

From: Tina Yang <tina.yang@oracle.com>

This is at best a kludge, and it's the wrong way to approach this problem.

Fix the bug, and fix it right.

If you disable that stretch of code, what you've done is make the
netpoll code hang and/or drop console messages if the TX queue is full
in the driver and the only way to liberate TX space is to call into
->poll().

You haven't shown the precise race that leads to corruption so that someone
so motivated can guide you towards a more correct fix if you are not
capable of implementing it properly on your own.
-

To: David Miller <davem@...>
Cc: <mpm@...>, <netdev@...>
Date: Wednesday, October 17, 2007 - 1:46 am

Isn't net_rx_action() calling ->poll() to free the TX space ?
TX queue full can only be emptied when the device is done transmitting
not because of netpoll ->poll() it. The softirq (net_rx_action)
is the purpose for such an event. Netconsole messages will be
dropped if the device can't keep up with it regardless of netpoll
->poll() or not. If no dropping can be tolerated, then the
netpoll upper layer probably should be redesigned to buffer the data.

The poll_list currently is in a per_cpu structure, not being
protected globally that netpoll thread from any cpu can

The precise race is
1) net_rx_action get the dev from poll_list
2) at the same time, netpoll poll_napi() get a hold of the poll lock
and calls ->poll(), remove dev from the poll list
3) after it finishes, net_rx_action get the poll lock, and calls
->poll() the second time, and panic when trying to remove (again)
the dev from the poll list.
and I had logged all the crash info from the crash scenes into the
bug database.

As Matt Mackall had acknowledged, the network fast path went to great
length to reduce locking overhead, should that be undone because of
netpoll if that's what it takes to fix it more correctly ?

-

To: <tina.yang@...>
Cc: <mpm@...>, <netdev@...>
Date: Tuesday, October 30, 2007 - 12:26 am

From: Tina Yang <tina.yang@oracle.com>

This is trivial to fix.

I'll check the following into 2.6.14 and backport it to
the -stable trees.

[NET]: Fix race between poll_napi() and net_rx_action()

netpoll_poll_lock() synchronizes the ->poll() invocation
code paths, but once we have the lock we have to make
sure that NAPI_STATE_SCHED is still set. Otherwise we
get:

cpu 0 cpu 1

net_rx_action() poll_napi()
netpoll_poll_lock() ... spin on ->poll_lock
->poll()
netif_rx_complete
netpoll_poll_unlock() acquire ->poll_lock()
->poll()
netif_rx_complete()
CRASH

Based upon a bug report from Tina Yang.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 853c8b5..02e7d83 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2172,7 +2172,15 @@ static void net_rx_action(struct softirq_action *h)

weight = n->weight;

- work = n->poll(n, weight);
+ /* This NAPI_STATE_SCHED test is for avoiding a race
+ * with netpoll's poll_napi(). Only the entity which
+ * obtains the lock and sees NAPI_STATE_SCHED set will
+ * actually make the ->poll() call. Therefore we avoid
+ * accidently calling ->poll() when NAPI is not scheduled.
+ */
+ work = 0;
+ if (test_bit(NAPI_STATE_SCHED, &n->state))
+ work = n->poll(n, weight);

WARN_ON_ONCE(work > weight);

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index bf8d18f..c499b5c 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -116,6 +116,29 @@ static __sum16 checksum_udp(struct sk_buff *skb, struct udphdr *uh,
* network adapter, forcing superfluous retries and possibly timeouts.
* Thus, we set our budget to greater than 1.
*/
+static int poll_one_napi(struct netpoll_info *npinfo,
+ struct napi_struct *napi, int budget)
+{
+ int work;
+
+ /* net_rx_action's ->poll() invocations and our's are
+ * synchronized by this test which is only made while
...

To: David Miller <davem@...>
Cc: <tina.yang@...>, <netdev@...>
Date: Tuesday, October 30, 2007 - 1:08 am

Thanks, Dave.

--
Mathematics is the supreme nostalgia of our time.
-

Previous thread: Please pull 'fixes-davem' branch of wireless-2.6 by John W. Linville on Tuesday, October 16, 2007 - 10:31 pm. (5 messages)

Next thread: [PATCH 0/6] forcedeth interrupt and task overhaul, v2 by Jeff Garzik on Wednesday, October 17, 2007 - 1:52 am. (8 messages)