Re: net: replace hooks in __netif_receive_skb (v4)

Previous thread: [PATCH] net/fec: fix pm to survive to suspend/resume by =?utf-8?q?Eric=20B=C3=A9nard?= on Thursday, May 27, 2010 - 10:53 am. (4 messages)

Next thread: [PATCH]: niu: fix uninitialized variable warning by Prarit Bhargava on Thursday, May 27, 2010 - 11:35 am. (2 messages)
From: Jiri Pirko
Date: Thursday, May 27, 2010 - 11:08 am

changelog:
v1->v2
	- writers are locked by rtnl_lock (removed unnecessary spinlock)
	- using call_rcu in unregister
	- nicer macvlan_port_create cleanup
	- struct rx_hanler is made const in funtion parameters

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/macvlan.c      |   27 ++++++--
 include/linux/if_bridge.h  |    2 -
 include/linux/if_macvlan.h |    4 -
 include/linux/netdevice.h  |   18 +++++
 net/bridge/br.c            |    2 -
 net/bridge/br_if.c         |   14 ++++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 +-
 net/core/dev.c             |  160 ++++++++++++++++++++++++++------------------
 9 files changed, 160 insertions(+), 82 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -511,10 +512,16 @@ static void macvlan_setup(struct net_device *dev)
 	dev->tx_queue_len	= 0;
 }
 
+static const struct netdev_rx_handler macvlan_rx_handler ...
From: Stephen Hemminger
Date: Thursday, May 27, 2010 - 1:08 pm

On Thu, 27 May 2010 20:08:24 +0200

I wonder if we really need the chaining.  What about allowing only one
hook per device (and return -EBUSY)?  That also gets rid of the allocation,
and the extra overhead.

The handler hook, has to be export GPL, since we really don't want more
network stack abuse by 3rd parties.
--

From: Jiri Pirko
Date: Thursday, May 27, 2010 - 10:51 pm

Hmm, that's a good question. I thought about it already. But I'm also wondering
if there is a possible scenario, when the chaining can be necessary. Also I do
not see any -significant- downside of having multiple handlers in a list, so I

Noted, will be in the next patch version.

Jirka
--

From: Jiri Pirko
Date: Thursday, May 27, 2010 - 11:12 pm

changelog:
v2->v3
	- used GPL-ed exports
v1->v2
	- writers are locked by rtnl_lock (removed unnecessary spinlock)
	- using call_rcu in unregister
	- nicer macvlan_port_create cleanup
	- struct rx_hanler is made const in funtion parameters

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/macvlan.c      |   27 ++++++--
 include/linux/if_bridge.h  |    2 -
 include/linux/if_macvlan.h |    4 -
 include/linux/netdevice.h  |   18 +++++
 net/bridge/br.c            |    2 -
 net/bridge/br_if.c         |   14 ++++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 +-
 net/core/dev.c             |  160 ++++++++++++++++++++++++++------------------
 9 files changed, 160 insertions(+), 82 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -511,10 +512,16 @@ static void macvlan_setup(struct net_device *dev)
 	dev->tx_queue_len	= 0;
 }
 
+static const struct ...
From: Eric Dumazet
Date: Friday, May 28, 2010 - 12:02 am

Please chose another place to hold rx_handlers, to keep rx path memory
needs to minimum cache lines. Somewhere in the following section :

/*
 * Cache line mostly used on receive path (including eth_type_trans())
 */
        unsigned long           last_rx;        /* Time of last Rx      */
        /* Interface address info used in eth_type_trans() */
        unsigned char           *dev_addr;      /* hw address, (before bcast
                                                   because most packets are
                                                   unicast) */

        struct netdev_hw_addr_list      dev_addrs; /* list of device
                                                      hw addresses */

        unsigned char           broadcast[MAX_ADDR_LEN];        /* hw bcast add */

#ifdef CONFIG_RPS
        struct kset             *queues_kset;

        struct netdev_rx_queue  *_rx;

        /* Number of RX queues allocated at alloc_netdev_mq() time  */
        unsigned int            num_rx_queues;
#endif

        struct netdev_queue     rx_queue;


and before the :

	struct netdev_queue     *_tx ____cacheline_aligned_in_smp;



--

From: Jiri Pirko
Date: Friday, May 28, 2010 - 12:33 am

changelog:
v3->v4
	- moved rx_handlers in net_device structure to be near to other data
	  used in rx path
v2->v3
	- used GPL-ed exports
v1->v2
	- writers are locked by rtnl_lock (removed unnecessary spinlock)
	- using call_rcu in unregister
	- nicer macvlan_port_create cleanup
	- struct rx_hanler is made const in funtion parameters

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/macvlan.c      |   27 ++++++--
 include/linux/if_bridge.h  |    2 -
 include/linux/if_macvlan.h |    4 -
 include/linux/netdevice.h  |   18 +++++
 net/bridge/br.c            |    2 -
 net/bridge/br_if.c         |   14 ++++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 +-
 net/core/dev.c             |  160 ++++++++++++++++++++++++++------------------
 9 files changed, 160 insertions(+), 82 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -511,10 +512,16 @@ static ...
From: Stephen Hemminger
Date: Tuesday, June 1, 2010 - 8:28 am

From: Jiri Pirko <jpirko@redhat.com>

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
v3->v4 
       - only one hook per device.
v2->v3
	- used GPL-ed exports
v1->v2
	- writers are locked by rtnl_lock (removed unnecessary spinlock)
	- using call_rcu in unregister
	- nicer macvlan_port_create cleanup
	- struct rx_hanler is made const in funtion parameters

 drivers/net/macvlan.c      |   19 ++++---
 include/linux/if_bridge.h  |    2 
 include/linux/if_macvlan.h |    4 -
 include/linux/netdevice.h  |    8 +++
 net/bridge/br.c            |    2 
 net/bridge/br_if.c         |    8 +++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 -
 net/core/dev.c             |  119 ++++++++++++++++++++-------------------------
 9 files changed, 94 insertions(+), 83 deletions(-)

--- a/drivers/net/macvlan.c	2010-05-28 08:41:38.248169422 -0700
+++ b/drivers/net/macvlan.c	2010-06-01 08:16:52.307206412 -0700
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = ...
From: Jiri Pirko
Date: Tuesday, June 1, 2010 - 8:41 am

Actually, I'm not happy about this (reducing to only one hook) and for two
reasons:

1) I think it's a good behaviour to "mask" one handler by another in case device
is for example used for macvlan and then added to bridge. Because when it's
again removed from the bridge, the original functionality is restored. And also,
this would be consistent with the current behaviour.

2) I would imagine a situation, when multiple handers are needed in cascade.
Actually I'm working on a virtual device draft which uses two handlers, although
in corner situation.

Regards,
	Jirka

--

From: Stephen Hemminger
Date: Tuesday, June 1, 2010 - 9:10 am

On Tue, 1 Jun 2010 17:41:07 +0200

I don't like it because:

1) Adding macvlan to bridge makes no sense. The bridge is already in promicious mode.

2) I don't like to see functionality added when it is not needed.

3) The extra overhead of traversing list causes more cache activity in extreme
   hot path.

Wait and add the multiple handlers when your code is included.

-- 
http://www.extremeprogramming.org/rules/early.html
--

From: Jiri Pirko
Date: Wednesday, June 2, 2010 - 12:02 am

Right, I'm not saying it has sense to have it together in the same moment. But
you can to this:

# ip link add link eth0 type macvlan

You can use eth0 + macvlan0. Now you do:

# brctl addif br0 eth0

Direct use of eth0 and macvlan0 is not not possible now.

# brctl delif br0 eth0


But ok, I hear your arguments, I thought about them myself before I did the
patch and I thought that added overhead (which is not too big, I see one more
dereference, rx_handler pointer) would be acceptible.

--

From: Fischer, Anna
Date: Tuesday, June 1, 2010 - 9:13 am

I think the idea of this is really good, and it has been long required to get rid of the bridging hook and the "hack" for macvlan to get into the network stack. 


What happens with 'ret' here? It is completely ignored, isn't it? Because you assume that there is only a single receiver per device? I think it would be much better to have return codes that indicate whether a packet has been consumed by a receive handler, or whether it is supposed to be processed further. The same actually applies for the packet handlers that are processed a bit further down in netif_receive_skb() - the return code is ignored and so a handler has no control over further processing of the packet.
--

From: Jiri Pirko
Date: Tuesday, June 1, 2010 - 11:50 pm

It's not ignored. it's returned at the end of the function __netif_receive_skb.
--

From: Jiri Pirko
Date: Wednesday, June 2, 2010 - 12:24 am

Few nitpicks, I'm going to send patch reflecting this later.


Why not to use rcu_assign_pointer here? since this is not hot path, that would be
nicer. It should be done for rcu-protected pointers (see rcu_assign_poiter



--

From: Jiri Pirko
Date: Wednesday, June 2, 2010 - 12:52 am

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
v4->v5
	- do rcu_assign_pointer even for assigning null for rcu-protected
	  pointers 
	- cosmetics (renames, comments and __netif_receive_skb if part)

 drivers/net/macvlan.c      |   19 +++++--
 include/linux/if_bridge.h  |    2 -
 include/linux/if_macvlan.h |    4 --
 include/linux/netdevice.h  |    7 +++
 net/bridge/br.c            |    2 -
 net/bridge/br_if.c         |    8 +++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 +-
 net/core/dev.c             |  119 ++++++++++++++++++++-----------------------
 9 files changed, 93 insertions(+), 83 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..53422ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -515,6 +516,7 @@ static int macvlan_port_create(struct net_device *dev)
 {
 	struct ...
From: David Miller
Date: Wednesday, June 2, 2010 - 7:11 am

From: Jiri Pirko <jpirko@redhat.com>

Ok, this looks good to me.  Applied, thanks everyone.
--

From: Stephen Hemminger
Date: Wednesday, June 2, 2010 - 8:07 am

On Wed, 2 Jun 2010 09:52:08 +0200

Rcu assign is not necessary here for because the hook didn't
get registered so there is no way for other CPU to see it.

-- 
--

From: Eric Dumazet
Date: Wednesday, June 2, 2010 - 8:15 am

Thats a valid point, but we should use it, and not care of this litle
detail. Compiler generates same code anyway, since NULL value is tested
by rcu_assign_pointer().

If we dont use rcu_assign_pointer() ourself, Paul or Arnd will put it
one day or another :)

http://lkml.org/lkml/2010/6/1/290



--

From: Paul E. McKenney
Date: Wednesday, June 2, 2010 - 1:43 pm

Some year soon, I hope.  ;-)

							Thanx, Paul
--

From: Fischer, Anna
Date: Wednesday, June 2, 2010 - 9:20 am

I would like to see this being accepted. As mentioned before I would love to be able to support multiple receive frame hooks per device, but I think this is a good start.

I think especially with virtualization coming into Linux and new network virtualization approaches being developed it would be nice to have a more generic function for packet receive handlers to hook into the network stack more cleanly. Current approaches rely on 'misusing' the bridging hook into the kernel, because the packet 'sniffing' hooks (via dev_pack_add()) do not give enough control over packet processing.

--

Previous thread: [PATCH] net/fec: fix pm to survive to suspend/resume by =?utf-8?q?Eric=20B=C3=A9nard?= on Thursday, May 27, 2010 - 10:53 am. (4 messages)

Next thread: [PATCH]: niu: fix uninitialized variable warning by Prarit Bhargava on Thursday, May 27, 2010 - 11:35 am. (2 messages)