[v3 Patch 1/3] netpoll: add generic support for bridge and bonding devices

From: Amerigo Wang
Date: Wednesday, April 7, 2010 - 11:18 pm

V3:
Update to latest Linus' tree.
Fix deadlocks when releasing slaves of bonding devices.
Thanks to Andy.

V2:
Fix some bugs of previous version.
Remove ->netpoll_setup and ->netpoll_xmit, they are not necessary.
Don't poll all underlying devices, poll ->real_dev in struct netpoll.
Thanks to David for suggesting above.

--------->

This whole patchset is for adding netpoll support to bridge and bonding
devices. I already tested it for bridge, bonding, bridge over bonding,
and bonding over bridge. It looks fine now.

Please comment.


To make bridge and bonding support netpoll, we need to adjust
some netpoll generic code. This patch does the following things:

1) introduce two new priv_flags for struct net_device:
   IFF_IN_NETPOLL which identifies we are processing a netpoll;
   IFF_DISABLE_NETPOLL is used to disable netpoll support for a device
   at run-time;

2) introduce one new method for netdev_ops:
   ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
     removed.

3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
   export netpoll_send_skb() and netpoll_poll_dev() which will be used later;

4) hide a pointer to struct netpoll in struct netpoll_info, ditto.

5) introduce ->real_dev for struct netpoll.

6) introduce a new status NETDEV_BONDING_DESLAVE, which is used to disable
   netconsole before releasing a slave, to avoid deadlocks.
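
As a rough userspace illustration of item 1, the two new bits can be modelled as ordinary bitmask operations. The flag values come from the if.h hunk in this patch; `struct toy_dev` and the `toy_*` helpers are hypothetical names used only for this sketch and are not part of the patch.

```c
#include <stdbool.h>

/* Values taken from the include/linux/if.h hunk of this patch. */
#define IFF_IN_NETPOLL      0x1000  /* a netpoll transmit is in progress */
#define IFF_DISABLE_NETPOLL 0x2000  /* netpoll disabled at run-time */

/* Hypothetical stand-in for the single net_device field used here. */
struct toy_dev {
	unsigned int priv_flags;
};

/* Mark the device as being inside a netpoll transmit. */
void toy_netpoll_enter(struct toy_dev *dev)
{
	dev->priv_flags |= IFF_IN_NETPOLL;
}

/* Clear the marker once the transmit has finished. */
void toy_netpoll_exit(struct toy_dev *dev)
{
	dev->priv_flags &= ~IFF_IN_NETPOLL;
}

/* Disable netpoll on this device at run-time. */
void toy_netpoll_disable(struct toy_dev *dev)
{
	dev->priv_flags |= IFF_DISABLE_NETPOLL;
}

bool toy_in_netpoll(const struct toy_dev *dev)
{
	return (dev->priv_flags & IFF_IN_NETPOLL) != 0;
}

bool toy_netpoll_allowed(const struct toy_dev *dev)
{
	return (dev->priv_flags & IFF_DISABLE_NETPOLL) == 0;
}
```

The two bits are independent: clearing IFF_IN_NETPOLL leaves IFF_DISABLE_NETPOLL untouched. Note that `|=` and `&=` are plain read-modify-write operations, so this sketch says nothing about concurrent updates.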

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: WANG Cong <amwang@redhat.com>

---

Index: linux-2.6/include/linux/if.h
===================================================================
--- linux-2.6.orig/include/linux/if.h
+++ linux-2.6/include/linux/if.h
@@ -71,6 +71,8 @@
 					 * release skb->dst
 					 */
 #define IFF_DONT_BRIDGE 0x800		/* disallow bridging this ether dev */
+#define IFF_IN_NETPOLL	0x1000		/* whether we are processing netpoll */
+#define IFF_DISABLE_NETPOLL	0x2000	/* disable netpoll at run-time */
 
 ...
From: Amerigo Wang
Date: Wednesday, April 7, 2010 - 11:19 pm

Based on Andy's work, but I modified a lot.

Similar to the patch for bridge, this patch does:

1) implement the 2 methods to support netpoll for bonding;

2) modify netpoll during forwarding packets via bonding;

3) disable netpoll support of bonding when a device without netpoll
   support is added to bonding;

4) enable netpoll support when all underlying devices support netpoll.
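
Item 2 can be pictured with a small userspace model of the redirect done in the bond_dev_queue_xmit() hunk of this message: when the bond is inside netpoll, the netpoll structure is pointed at the slave for the duration of the send and then pointed back at the bond. All `toy_*` names are hypothetical, and `toy_netpoll_send()` only records what it saw instead of transmitting anything.

```c
#include <stddef.h>

#define IFF_IN_NETPOLL 0x1000  /* value from the if.h hunk of patch 1/3 */

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_dev {
	const char *name;
	unsigned int priv_flags;
};

struct toy_netpoll {
	struct toy_dev *dev;       /* device netpoll currently transmits on */
	struct toy_dev *real_dev;  /* underlying device picked for the packet */
};

/* Bookkeeping so a caller can observe what the "send" saw. */
struct toy_dev *toy_last_tx_dev;
int toy_last_tx_flagged;

/* Stand-in for netpoll_send_skb(): records the target device only. */
void toy_netpoll_send(struct toy_netpoll *np)
{
	toy_last_tx_dev = np->dev;
	toy_last_tx_flagged = (np->dev->priv_flags & IFF_IN_NETPOLL) != 0;
}

/* Returns 1 if the netpoll path was taken, 0 for the normal path. */
int toy_bond_xmit(struct toy_dev *bond, struct toy_dev *slave,
		  struct toy_netpoll *np)
{
	if (!(bond->priv_flags & IFF_IN_NETPOLL))
		return 0;  /* the patch calls dev_queue_xmit() here */

	np->real_dev = np->dev = slave;        /* redirect netpoll to the slave */
	slave->priv_flags |= IFF_IN_NETPOLL;
	toy_netpoll_send(np);
	slave->priv_flags &= ~IFF_IN_NETPOLL;
	np->dev = bond;                        /* restore; real_dev stays */
	return 1;
}
```

After the call, `np->dev` is the bond again while `np->real_dev` still names the slave, which is what a later poll of the underlying device would rely on.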

Cc: Andy Gospodarek <gospo@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: WANG Cong <amwang@redhat.com>

---

Index: linux-2.6/drivers/net/bonding/bond_main.c
===================================================================
--- linux-2.6.orig/drivers/net/bonding/bond_main.c
+++ linux-2.6/drivers/net/bonding/bond_main.c
@@ -59,6 +59,7 @@
 #include <linux/uaccess.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
+#include <linux/netpoll.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
 #include <linux/etherdevice.h>
@@ -430,7 +431,18 @@ int bond_dev_queue_xmit(struct bonding *
 	}
 
 	skb->priority = 1;
-	dev_queue_xmit(skb);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	if (bond->dev->priv_flags & IFF_IN_NETPOLL) {
+		struct netpoll *np = bond->dev->npinfo->netpoll;
+		slave_dev->npinfo = bond->dev->npinfo;
+		np->real_dev = np->dev = skb->dev;
+		slave_dev->priv_flags |= IFF_IN_NETPOLL;
+		netpoll_send_skb(np, skb);
+		slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
+		np->dev = bond->dev;
+	} else
+#endif
+		dev_queue_xmit(skb);
 
 	return 0;
 }
@@ -1329,6 +1341,61 @@ static void bond_detach_slave(struct bon
 	bond->slave_cnt--;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * You must hold read lock on bond->lock before calling this.
+ */
+static bool slaves_support_netpoll(struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+	int i = 0;
+	bool ret ...
From: Amerigo Wang
Date: Wednesday, April 7, 2010 - 11:18 pm

Based on the previous patch, make bridge support netpoll by:

1) implement the 2 methods to support netpoll for bridge;

2) modify netpoll during forwarding packets via bridge;

3) disable netpoll support of bridge when a device without netpoll
   support is added to bridge;

4) enable netpoll support when all underlying devices support netpoll.
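
The rule in items 3 and 4 can be sketched in userspace as follows. This is a simplified model of the check that br_devices_support_netpoll() performs in this patch: it uses a plain array instead of the bridge port list and takes no lock. `struct toy_port` and `toy_ports_support_netpoll()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define IFF_DISABLE_NETPOLL 0x2000  /* value from the if.h hunk of patch 1/3 */

/* Hypothetical stand-in for one underlying device. */
struct toy_port {
	unsigned int priv_flags;
	bool has_poll_controller;  /* models ndo_poll_controller != NULL */
};

/*
 * Netpoll is usable on the upper device only if there is at least one
 * port and every port both provides a poll controller and has not been
 * disabled at run-time.
 */
bool toy_ports_support_netpoll(const struct toy_port *ports, size_t n)
{
	bool ret = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((ports[i].priv_flags & IFF_DISABLE_NETPOLL) ||
		    !ports[i].has_poll_controller)
			ret = false;
	}
	return n != 0 && ret;
}
```

An upper device with no ports reports no netpoll support, matching the `count != 0 && ret` return in the patch.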

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>

---

Index: linux-2.6/net/bridge/br_device.c
===================================================================
--- linux-2.6.orig/net/bridge/br_device.c
+++ linux-2.6/net/bridge/br_device.c
@@ -13,8 +13,10 @@
 
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
+#include <linux/netpoll.h>
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
+#include <linux/list.h>
 
 #include <asm/uaccess.h>
 #include "br_private.h"
@@ -162,6 +164,59 @@ static int br_set_tx_csum(struct net_dev
 	return 0;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+bool br_devices_support_netpoll(struct net_bridge *br)
+{
+	struct net_bridge_port *p;
+	bool ret = true;
+	int count = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&br->lock, flags);
+	list_for_each_entry(p, &br->port_list, list) {
+		count++;
+		if (p->dev->priv_flags & IFF_DISABLE_NETPOLL
+				|| !p->dev->netdev_ops->ndo_poll_controller)
+			ret = false;
+	}
+	spin_unlock_irqrestore(&br->lock, flags);
+	return count != 0 && ret;
+}
+
+static void br_poll_controller(struct net_device *br_dev)
+{
+	struct netpoll *np = br_dev->npinfo->netpoll;
+
+	if (np->real_dev != br_dev)
+		netpoll_poll_dev(np->real_dev);
+}
+
+void br_netpoll_cleanup(struct net_device *br_dev)
+{
+	struct net_bridge *br = netdev_priv(br_dev);
+	struct net_bridge_port *p, *n;
+	const struct net_device_ops *ops;
+
+	br->dev->npinfo = NULL;
+	list_for_each_entry_safe(p, n, ...
From: Stephen Hemminger
Date: Thursday, April 8, 2010 - 8:37 am

On Thu, 8 Apr 2010 02:18:58 -0400


There is no protection on dev->priv_flags for SMP access.
It would be better to use a bit in dev->state if you are using it as a control flag.

Then you could use 
			if (unlikely(test_and_clear_bit(__IN_NETPOLL, &skb->dev->state)))
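
For readers outside the kernel, the semantics of an atomic test-and-clear can be approximated in userspace with C11 atomics. This is only a sketch: `toy_set_bit()`, `toy_test_and_clear_bit()` and `TOY_IN_NETPOLL_BIT` are hypothetical names, and the kernel helpers are implemented differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_IN_NETPOLL_BIT 0  /* hypothetical bit number for this sketch */

/* Atomically set bit nr in *word. */
void toy_set_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_or(word, 1UL << nr);
}

/*
 * Atomically clear bit nr in *word and report whether it was set
 * beforehand. The read, the modification and the write happen as one
 * indivisible step, so a concurrent update to another bit of the same
 * word cannot be lost.
 */
bool toy_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_and(word, ~mask);

	return (old & mask) != 0;
}
```

Only the first caller after a set observes `true`; a second call finds the bit already cleared.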


One message is sufficient.

-- 
--

From: Cong Wang
Date: Thursday, April 8, 2010 - 10:43 pm

Probably no, because only br_netpoll_cleanup() will be called


Yes? netpoll_send_skb() needs to see IFF_IN_NETPOLL is set, so
we can't clear this bit before calling it.




Yes? The first message explains the reason for the second message.


Thanks.
--

From: Cong Wang
Date: Monday, April 12, 2010 - 3:37 am

Hmm, I think we can't use ->state here, it is not for this kind of purpose,
according to its comments.

Also, I find other usages of IFF_XXX flags of ->priv_flags are also using
&, | to set or clear the flags. So there must be some other things preventing
the race...


Thanks.
--

From: Eric Dumazet
Date: Monday, April 12, 2010 - 3:38 am

Yes, it's RTNL that protects priv_flags changes, hopefully...


--

From: Stephen Hemminger
Date: Monday, April 12, 2010 - 8:38 am

On Mon, 12 Apr 2010 12:38:57 +0200

The patch was not protecting priv_flags with RTNL.
For example..


@@ -308,7 +312,9 @@ static void netpoll_send_skb(struct netp
 		     tries > 0; --tries) {
 			if (__netif_tx_trylock(txq)) {
 				if (!netif_tx_queue_stopped(txq)) {
+					dev->priv_flags |= IFF_IN_NETPOLL;
 					status = ops->ndo_start_xmit(skb, dev);
+					dev->priv_flags &= ~IFF_IN_NETPOLL;
 					if (status == NETDEV_TX_OK)
 						txq_trans_update(txq);
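
The concern with unprotected `|=` and `&=` on a shared word is the classic lost update. The sketch below replays one possible interleaving of two CPUs deterministically in a single thread; it is an illustration of the hazard, not a claim about which paths in the kernel actually race. `IFF_OTHER_FLAG` and `toy_lost_update()` are hypothetical.

```c
#define IFF_IN_NETPOLL 0x1000  /* value from the if.h hunk of patch 1/3 */
#define IFF_OTHER_FLAG 0x0001  /* hypothetical unrelated flag in the same word */

/*
 * Replays this interleaving of two non-atomic read-modify-write updates:
 *   CPU A: priv_flags |= IFF_IN_NETPOLL;
 *   CPU B: priv_flags |= IFF_OTHER_FLAG;
 * Both CPUs load the old value before either one stores.
 */
unsigned int toy_lost_update(unsigned int priv_flags)
{
	unsigned int a = priv_flags;  /* CPU A loads */
	unsigned int b = priv_flags;  /* CPU B loads the same old value */

	a |= IFF_IN_NETPOLL;
	priv_flags = a;               /* CPU A stores */

	b |= IFF_OTHER_FLAG;
	priv_flags = b;               /* CPU B stores a stale copy: A's bit is gone */

	return priv_flags;
}
```

Starting from 0, the final value contains only IFF_OTHER_FLAG; the IFF_IN_NETPOLL update has been overwritten.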
--

From: Cong Wang
Date: Tuesday, April 13, 2010 - 1:57 am

Hmm, but I checked the bonding case (IFF_BONDING), it doesn't
hold rtnl_lock. Strange.

--

From: Jay Vosburgh
Date: Tuesday, April 13, 2010 - 9:52 am

I looked, and there are a couple of cases in bonding that don't
have RTNL for adjusting priv_flags (in bond_ab_arp_probe when no slaves
are up, and a couple of cases in 802.3ad).  I think the solution there
is to move bonding away from priv_flags for some of this (e.g., convert
bonding to use a frame hook like bridge and macvlan, and greatly
simplify skb_bond_should_drop), but that's a separate topic.

	The majority of the cases, however, do hold RTNL.  Bonding
generally doesn't have to acquire RTNL itself, since whatever called
into bonding is holding it already.  For example, the slave add and
remove paths (bond_enslave, bond_release) are called either via sysfs or
ioctl, both of which acquire RTNL.  All of the set and clear operations
for IFF_BONDING fall into this category; look at bonding_store_slaves
for an example.

	Bonding does acquire RTNL itself when performing failovers,
e.g., bond_mii_monitor holds RTNL prior to calling bond_miimon_commit,
which will change priv_flags.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--

From: Stephen Hemminger
Date: Tuesday, April 13, 2010 - 10:33 am

On Tue, 13 Apr 2010 09:52:47 -0700

All this was related to netpoll. And netpoll processing often needs to occur
in hard IRQ context. Therefore netpoll stuff and RTNL (which is a mutex)
really don't mix well. Keep RTNL for what it was meant for: network
reconfiguration. Don't turn it into a network special BKL.
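
To illustrate why a sleeping lock does not fit a path that may run in hard IRQ context, here is a userspace sketch of the alternative such paths use: try the lock a bounded number of times and give up rather than block, in the spirit of the `tries` loop around `__netif_tx_trylock()` in netpoll_send_skb(). The lock is modelled with a C11 `atomic_flag`; all `toy_*` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-sleeping lock built on a C11 atomic flag. */
struct toy_lock {
	atomic_flag held;
};

void toy_lock_init(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true if the lock was acquired; never blocks. */
bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Attempt a "transmit" without ever sleeping: try the lock up to
 * `tries` times and report failure if it stays busy. A mutex cannot
 * be used this way because acquiring it may put the caller to sleep.
 */
bool toy_send_nosleep(struct toy_lock *txq, int tries)
{
	for (; tries > 0; --tries) {
		if (toy_trylock(txq)) {
			/* the packet would be handed to the driver here */
			toy_unlock(txq);
			return true;
		}
		/* real code would poll the device and delay briefly here */
	}
	return false;
}
```

If the lock is held for the whole attempt, the send fails after the retry budget is used up instead of waiting indefinitely.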



-- 
--

From: Cong Wang
Date: Wednesday, April 14, 2010 - 1:16 am

Hmm, I think for my patch, holding RTNL lock is not necessary,
because there are no other call paths to change the IFF_IN_NETPOLL bit,
which is unlike bonding or bridge cases where sysfs/ioctl is provided
to change it.

The only chance to change IFF_IN_NETPOLL is in netpoll_send_skb()
which can't be called simultaneously because there are other locks
protecting it.

Or am I still missing something?

Thanks.
--

From: Cong Wang
Date: Wednesday, April 14, 2010 - 1:11 am

Thanks a lot for your reply!

You are right, I missed something.

Hmm, for bonding, RTNL lock is necessary because there are sysfs
and ioctl interfaces to change its configuration.

--
