changelog:
v1->v2
- writers are locked by rtnl_lock (removed unnecessary spinlock)
- using call_rcu in unregister
- nicer macvlan_port_create cleanup
- struct rx_hanler is made const in funtion parameters
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/macvlan.c | 27 ++++++--
include/linux/if_bridge.h | 2 -
include/linux/if_macvlan.h | 4 -
include/linux/netdevice.h | 18 +++++
net/bridge/br.c | 2 -
net/bridge/br_if.c | 14 ++++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 +-
net/core/dev.c | 160 ++++++++++++++++++++++++++------------------
9 files changed, 160 insertions(+), 82 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = macvlan_hash_lookup(port, eth->h_source);
if (!src)
@@ -511,10 +512,16 @@ static void macvlan_setup(struct net_device *dev)
dev->tx_queue_len = 0;
}
+static const struct netdev_rx_handler macvlan_rx_handler ...On Thu, 27 May 2010 20:08:24 +0200 I wonder if we really need the chaining. What about allowing only one hook per device (and return -EBUSY)? That also gets rid of the allocation, and the extra overhead. The handler hook, has to be export GPL, since we really don't want more network stack abuse by 3rd parties. --
Hmm, that's a good question. I thought about it already. But I'm also wondering if there is a possible scenario, when the chaining can be necessary. Also I do not see any -significant- downside of having multiple handlers in a list, so I Noted, will be in the next patch version. Jirka --
changelog:
v2->v3
- used GPL-ed exports
v1->v2
- writers are locked by rtnl_lock (removed unnecessary spinlock)
- using call_rcu in unregister
- nicer macvlan_port_create cleanup
- struct rx_hanler is made const in funtion parameters
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/macvlan.c | 27 ++++++--
include/linux/if_bridge.h | 2 -
include/linux/if_macvlan.h | 4 -
include/linux/netdevice.h | 18 +++++
net/bridge/br.c | 2 -
net/bridge/br_if.c | 14 ++++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 +-
net/core/dev.c | 160 ++++++++++++++++++++++++++------------------
9 files changed, 160 insertions(+), 82 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = macvlan_hash_lookup(port, eth->h_source);
if (!src)
@@ -511,10 +512,16 @@ static void macvlan_setup(struct net_device *dev)
dev->tx_queue_len = 0;
}
+static const struct ...Please chose another place to hold rx_handlers, to keep rx path memory
needs to minimum cache lines. Somewhere in the following section :
/*
* Cache line mostly used on receive path (including eth_type_trans())
*/
unsigned long last_rx; /* Time of last Rx */
/* Interface address info used in eth_type_trans() */
unsigned char *dev_addr; /* hw address, (before bcast
because most packets are
unicast) */
struct netdev_hw_addr_list dev_addrs; /* list of device
hw addresses */
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
#ifdef CONFIG_RPS
struct kset *queues_kset;
struct netdev_rx_queue *_rx;
/* Number of RX queues allocated at alloc_netdev_mq() time */
unsigned int num_rx_queues;
#endif
struct netdev_queue rx_queue;
and before the :
struct netdev_queue *_tx ____cacheline_aligned_in_smp;
--
changelog:
v3->v4
- moved rx_handlers in net_device structure to be near to other data
used in rx path
v2->v3
- used GPL-ed exports
v1->v2
- writers are locked by rtnl_lock (removed unnecessary spinlock)
- using call_rcu in unregister
- nicer macvlan_port_create cleanup
- struct rx_hanler is made const in funtion parameters
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a general
list of rx_handlers which is iterated thru instead.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/macvlan.c | 27 ++++++--
include/linux/if_bridge.h | 2 -
include/linux/if_macvlan.h | 4 -
include/linux/netdevice.h | 18 +++++
net/bridge/br.c | 2 -
net/bridge/br_if.c | 14 ++++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 +-
net/core/dev.c | 160 ++++++++++++++++++++++++++------------------
9 files changed, 160 insertions(+), 82 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..f6b7924 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = macvlan_hash_lookup(port, eth->h_source);
if (!src)
@@ -511,10 +512,16 @@ static ...From: Jiri Pirko <jpirko@redhat.com>
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
v3->v4
- only one hook per device.
v2->v3
- used GPL-ed exports
v1->v2
- writers are locked by rtnl_lock (removed unnecessary spinlock)
- using call_rcu in unregister
- nicer macvlan_port_create cleanup
- struct rx_hanler is made const in funtion parameters
drivers/net/macvlan.c | 19 ++++---
include/linux/if_bridge.h | 2
include/linux/if_macvlan.h | 4 -
include/linux/netdevice.h | 8 +++
net/bridge/br.c | 2
net/bridge/br_if.c | 8 +++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 -
net/core/dev.c | 119 ++++++++++++++++++++-------------------------
9 files changed, 94 insertions(+), 83 deletions(-)
--- a/drivers/net/macvlan.c 2010-05-28 08:41:38.248169422 -0700
+++ b/drivers/net/macvlan.c 2010-06-01 08:16:52.307206412 -0700
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = ...Actually, I'm not happy about this (reducing to only one hook) and for two reasons: 1) I think it's a good behaviour to "mask" one handler by another in case device is for example used for macvlan and then added to bridge. Because when it's again removed from the bridge, the original functionality is restored. And also, this would be consistent with the current behaviour. 2) I would imagine a situation, when multiple handers are needed in cascade. Actually I'm working on a virtual device draft which uses two handlers, although in corner situation. Regards, Jirka --
On Tue, 1 Jun 2010 17:41:07 +0200 I don't like it because: 1) Adding macvlan to bridge makes no sense. The bridge is already in promicious mode. 2) I don't like to see functionality added when it is not needed. 3) The extra overhead of traversing list causes more cache activity in extreme hot path. Wait and add the multiple handlers when your code is included. -- http://www.extremeprogramming.org/rules/early.html --
Right, I'm not saying it has sense to have it together in the same moment. But you can to this: # ip link add link eth0 type macvlan You can use eth0 + macvlan0. Now you do: # brctl addif br0 eth0 Direct use of eth0 and macvlan0 is not not possible now. # brctl delif br0 eth0 But ok, I hear your arguments, I thought about them myself before I did the patch and I thought that added overhead (which is not too big, I see one more dereference, rx_handler pointer) would be acceptible. --
I think the idea of this is really good, and it has been long required to get rid of the bridging hook and the "hack" for macvlan to get into the network stack. What happens with 'ret' here? It is completely ignored, isn't it? Because you assume that there is only a single receiver per device? I think it would be much better to have return codes that indicate whether a packet has been consumed by a receive handler, or whether it is supposed to be processed further. The same actually applies for the packet handlers that are processed a bit further down in netif_receive_skb() - the return code is ignored and so a handler has no control over further processing of the packet. --
It's not ignored. it's returned at the end of the function __netif_receive_skb. --
Few nitpicks, I'm going to send patch reflecting this later. Why not to use rcu_assign_pointer here? since this is not hot path, that would be nicer. It should be done for rcu-protected pointers (see rcu_assign_poiter --
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.
Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
v4->v5
- do rcu_assign_pointer even for assigning null for rcu-protected
pointers
- cosmetics (renames, comments and __netif_receive_skb if part)
drivers/net/macvlan.c | 19 +++++--
include/linux/if_bridge.h | 2 -
include/linux/if_macvlan.h | 4 --
include/linux/netdevice.h | 7 +++
net/bridge/br.c | 2 -
net/bridge/br_if.c | 8 +++
net/bridge/br_input.c | 12 +++-
net/bridge/br_private.h | 3 +-
net/core/dev.c | 119 ++++++++++++++++++++-----------------------
9 files changed, 93 insertions(+), 83 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..53422ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
}
/* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
- struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
{
+ struct macvlan_port *port;
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
const struct macvlan_dev *src;
struct net_device *dev;
unsigned int len;
+ port = rcu_dereference(skb->dev->macvlan_port);
if (is_multicast_ether_addr(eth->h_dest)) {
src = macvlan_hash_lookup(port, eth->h_source);
if (!src)
@@ -515,6 +516,7 @@ static int macvlan_port_create(struct net_device *dev)
{
struct ...From: Jiri Pirko <jpirko@redhat.com> Ok, this looks good to me. Applied, thanks everyone. --
On Wed, 2 Jun 2010 09:52:08 +0200 Rcu assign is not necessary here for because the hook didn't get registered so there is no way for other CPU to see it. -- --
Thats a valid point, but we should use it, and not care of this litle detail. Compiler generates same code anyway, since NULL value is tested by rcu_assign_pointer(). If we dont use rcu_assign_pointer() ourself, Paul or Arnd will put it one day or another :) http://lkml.org/lkml/2010/6/1/290 --
Some year soon, I hope. ;-) Thanx, Paul --
I would like to see this being accepted. As mentioned before I would love to be able to support multiple receive frame hooks per device, but I think this is a good start. I think especially with virtualization coming into Linux and new network virtualization approaches being developed it would be nice to have a more generic function for packet receive handlers to hook into the network stack more cleanly. Current approaches rely on 'misusing' the bridging hook into the kernel, because the packet 'sniffing' hooks (via dev_pack_add()) do not give enough control over packet processing. --
