login
Header Space

 
 

Re: new NAPI quota synchronization issue

Score:
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: <netdev@...>
Cc: <shemminger@...>
Date: Wednesday, September 19, 2007 - 1:35 pm

From: David Miller <davem@davemloft.net>
Date: Wed, 19 Sep 2007 09:58:25 -0700 (PDT)


Ok, here is the patch and I've checked it into net-2.6.24 as well.

There really shouldn't be any fundamental synchronization issues
in the new NAPI stuff any longer.  I'm pretty sure any problems
remaining can only be caused by drivers bugs but we'll see :-)

I went over the list handling several times and it looks bulletproof.
Only the thread of control that sets the NAPI_STATE_SCHED bit
atomically will do list modifications, until the thread of control
that decides unconditionally to clear the bit, which will do a
list_del() immediately before clearing that bit.

commit d97459caa5dc97b5da0b9be1ec3f107f3c58d7f9
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Wed Sep 19 10:31:58 2007 -0700

    [NAPI]: Make quota management stateless.
    
    Because we update the napi->quota after returning from
    napi->poll() we have races which can, among other things,
    allow napi->quota to go negative.
    
    For example, if the driver uses the NAPI resched mechanism
    it typically does a completion like this:
    
    	netif_rx_complete(dev, napi);
    	if (unlikely(more_work_showed_up(priv))) {
    		if (netif_rx_reschedule(dev, napi))
    			goto poll_more;
    	}
    	return work_done;
    
    Between the netif_rx_complete() and the netif_rx_reschedule()
    an interrupt on another cpu can schedule the NAPI.  Which is
    fine and handled by the checking of the netif_rx_reschedule()
    return value, but when it happens:
    
    1) The other cpu can do a rull ->poll() run, and update the
       napi->quota
    2) The current thread of execution returns and updates
       napi->quota too
    
    So #1 uses a not-updated napi->quota value, and #2 over
    subtracts from napi->quota.
    
    In short napi->quota access is not synchronized enough.
    
    The good news it that we don't really need it in the first
    place.  The only time we can have partial napi->quota
    updates is when we are trying to adhere to netdev_budget
    in the polling loop of net_rx_action().
    
    We can allow a slight oversubscription of netdev_budget,
    but at most one napi->weight, and that is harmless.
    
    Given that, napi->quota takes on only two values when used,
    the full napi->weight and zero.  Therefore it is entirely
    superfluous and we can always pass napi->weight down to
    the ->poll() routine, and kill off napi->quota and all the
    synchronization problems.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bc88e4c..cf89ce6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -294,7 +294,6 @@ struct napi_struct {
 
 	unsigned long		state;
 	int			weight;
-	int			quota;
 	int			(*poll)(struct napi_struct *, int);
 #ifdef CONFIG_NETPOLL
 	spinlock_t		poll_lock;
diff --git a/net/core/dev.c b/net/core/dev.c
index 471e59d..0b9f82e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2107,13 +2107,6 @@ static int process_backlog(struct napi_struct *napi, int quota)
 	return work;
 }
 
-static void napi_check_quota_bug(struct napi_struct *n)
-{
-	/* It is illegal to consume more than the given quota.  */
-	if (WARN_ON_ONCE(n->quota < 0))
-		n->quota = 0;
-}
-
 /**
  * __napi_schedule - schedule for receive
  * @napi: entry to schedule
@@ -2124,9 +2117,6 @@ void fastcall __napi_schedule(struct napi_struct *n)
 {
 	unsigned long flags;
 
-	napi_check_quota_bug(n);
-	n->quota = n->weight;
-
 	local_irq_save(flags);
 	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
@@ -2146,6 +2136,7 @@ static void net_rx_action(struct softirq_action *h)
 
 	while (!list_empty(list)) {
 		struct napi_struct *n;
+		int work;
 
 		/* If softirq window is exhuasted then punt.
 		 *
@@ -2168,27 +2159,21 @@ static void net_rx_action(struct softirq_action *h)
 
 		have = netpoll_poll_lock(n);
 
-		/* if quota not exhausted process work */
-		if (likely(n->quota > 0)) {
-			int work = n->poll(n, min(budget, n->quota));
+		work = n->poll(n, n->weight);
 
-			budget -= work;
-			n->quota -= work;
-		}
+		WARN_ON_ONCE(work > n->weight);
 
-		local_irq_disable();
+		budget -= work;
 
-		napi_check_quota_bug(n);
+		local_irq_disable();
 
 		/* Drivers must not modify the NAPI state if they
-		 * consume the entire quota.  In such cases this code
+		 * consume the entire weight.  In such cases this code
 		 * still "owns" the NAPI instance and therefore can
 		 * move the instance around on the list at-will.
 		 */
-		if (unlikely(!n->quota)) {
-			n->quota = n->weight;
+		if (unlikely(work == n->weight))
 			list_move_tail(&n->poll_list, list);
-		}
 
 		netpoll_poll_unlock(have);
 	}
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
new NAPI quota synchronization issue, David Miller, (Wed Sep 19, 12:58 pm)
Re: new NAPI quota synchronization issue, David Miller, (Wed Sep 19, 1:35 pm)
Re: new NAPI quota synchronization issue, Stephen Hemminger, (Wed Sep 19, 6:01 pm)
Re: new NAPI quota synchronization issue, David Miller, (Wed Sep 19, 6:10 pm)
speck-geostationary