Hi,
Bonding in 802.3ad mode breaks when the bond interface is added
to a bridge (which makes 802.3ad unusable in XEN, for example).The problem is that br_pass_frame_up() will change the skb's dev
pointer to point to the bridge interface. As a result, the LACP
packets will not reach the bond_3ad_lacpdu_recv() protocol
handler registered on the bonding device. Even if they did, the
handler won't work with the changed skb->dev.The following patch fixes the problem.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1832,6 +1832,32 @@ static inline struct net_device *skb_bond(struct sk_buff *skb)
}+/*
+ * If a bonding master interface in 802.3ad mode is part of a bridge,
+ * the bridging hook will mangle the skb->dev to the bridge device.
+ * The bonding 802.3ad protocol handler needs to access the original device,
+ * so fix this up for the 802.3ad packets.
+ */
+static inline void skb_fixup_bridged_bond(struct sk_buff *skb, struct net_device **orig_dev)
+{
+ if (unlikely(skb->protocol == __constant_htons(ETH_P_SLOW)) &&
+ (skb->dev->priv_flags & IFF_EBRIDGE)) {
+ struct net_device *dev;
+
+ read_lock(&dev_base_lock);
+ dev = __dev_get_by_index(&init_net, skb->iif);
+ read_unlock(&dev_base_lock);
+
+ if (!dev)
+ return;
+
+ if (dev->master)
+ skb->dev = dev->master;
+
+ *orig_dev = dev;
+ }
+}
+
static void net_tx_action(struct softirq_action *h)
{
struct softnet_data *sd = &__get_cpu_var(softnet_data);
@@ -2080,6 +2106,8 @@ ncls:
if (!skb)
goto out;+ skb_fixup_bridged_bond(skb, &orig_dev);
+
type = skb->protocol;
list_for_each_entry_rcu(ptype,
&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
On Tue, 3 Jun 2008 15:21:06 +0200
Agree with Patrick, it might be a real problem but your solution
is doing it in the wrong place. The packets do arrive on the bridge
interface which is an aggregation of all the interfaces in the bridge.The LACPDU's are received via now on the bond device. If you moved
the packet type handler down to the physical interface, this problem
would go away because the packets would be handled to bond_3ad_lacpdu_recv
prior to being handled by the bridge. You would still have to handle
cases where bonding device was inactive, but that shouldn't be hard.
--
Hi,
... actually, now I realized the above statement is wrong. The
bridging code does not change the LACP frames, it just drops
them, because the LACP frames always have the link-localThis wouldn't really work, because with bonding the packet only
passes through netif_receive_skb() once, it just has the skb->dev
modified by skb_bond() at the very beginning.But I think I found a much nicer fix for the problem:
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -136,6 +136,10 @@ struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
if (skb->protocol == htons(ETH_P_PAUSE))
goto drop;+ /* Don't touch SLOW frames (LACP, etc.) */
+ if (skb->protocol == htons(ETH_P_SLOW))
+ return skb;
+
/* Process STP BPDU's through normal netif_receive_skb() path */
if (p->br->stp_enabled != BR_NO_STP) {
if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,The LACP frames always have the link-local destination MAC
address and so they are not handled by the bridge anyway. They
are only dropped, unless STP is turned on. So let's just not drop
the SLOW packets. Does this look better?Thanks,
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
On Tue, 3 Jun 2008 21:32:27 +0200
Better, but still have a couple of questions:
1) Do you want to processing frames when bridge port is in blocking
state (because STP detected a loop)?
2) Do you want to process after netfilter processing to allow
some firewalling possiblity?--
I believe so. If I'm reading correctly, the layout is something
like:bridge -> bond0 -> [ eth0, eth1, etc ]
so bonding needs to see the LACPDUs in order to decide which
subset of the slaves (eth0, eth1, etc) should be active and which should
not. That, in turn, may affect the topology of the network. In other
words, the presence or absence of a loop is determined by the set of
interfaces (or, really, the location of the peer of that set) made
active by link aggregation. For 802.3ad, the set of active slaves
(active aggregator) will always connect to the same peer, but link
failures could move the active aggregator from one peer to a different
peer.This seems to agree with my (brief) examination of standards and
documentation: 802.3ad doesn't really say much about STP, 802.1d 6.5.1
discusses link aggregation a bit, in particular:a) For a MAC entity that contains a Link Aggregation sublayer, the value
of MAC_Enabled is directly determined by the value of the aAggAdminState
attribute (30.7.1.13 in IEEE Std 802.3-2002), and the value of
MAC_Operational is directly determined by the value of the aAggOperState
attribute (30.7.1.13 in IEEE Std 802.3).suggests that the aggregation is treated as a unit (I'm not that
familiar with 802.1d, so I could be misreading it here).Lastly, Cisco's Etherchannel implementation treats a LACP
aggregation as a single bridge port.Thoughts?
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
On Tue, 03 Jun 2008 14:22:08 -0700
I prefer the following because it process all link-local frames through
the normal input path. This means the frames will:
* be filterable by netfilter
* processed by af_packet users
* not forwarded across bridge (this is important).--- a/net/bridge/br_input.c 2008-06-03 21:44:54.000000000 -0700
+++ b/net/bridge/br_input.c 2008-06-03 21:52:20.000000000 -0700
@@ -135,15 +135,12 @@ struct sk_buff *br_handle_frame(struct n
/* Pause frames shouldn't be passed up by driver anyway */
if (skb->protocol == htons(ETH_P_PAUSE))
goto drop;
-
- /* Process STP BPDU's through normal netif_receive_skb() path */
- if (p->br->stp_enabled != BR_NO_STP) {
- if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
- NULL, br_handle_local_finish))
- return NULL;
- else
- return skb;
- }
+
+ if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
+ NULL, br_handle_local_finish))
+ return NULL; /* frame consumed by filter */
+ else
+ return skb; /* continue processing */
}switch (p->state) {
--
Well, the LACP frames are not filtered by netfilter when there is
bonding on its own (not part of a bridge), so I don't see why
this should change when the bond is made part of a bridge.Maybe it is a good idea to run the LACP frames through netfilter,
but I think this should be done consistently in the bonding code,
whether or not bridging is set up, and probably on the individual
slave interfaces. It does not make sense to filter bonding's LACPwhere did the "if (p->br->stp_enabled != BR_NO_STP)" condition
go? Is it not needed? I thought it was there to prevent the STP
BPDUs from being handled when STP is turned off.--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
On Wed, 4 Jun 2008 10:24:25 +0200
That is already done in br_stp_rcv so the check here was not
needed.
--
Ah, I see. So can we get one of the patches in? I still think
that running the LACP frames through the bridging NF hooks does
not make sense, but it's your call.--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
From: Jiri Bohac <jbohac@suse.cz>
Stephen, if prefer your approach, please repost your patch with full
commit message and proper signoffs, thanks!
--
Stephen, I haven't seen a definitive answer from you - can you
please either ack my patch (underneath) or repost yours?Thanks!
Fix bridged 802.3ad bonding
When a bonding master is added to a bridge, the bridging hook
will take over the LACP frames. The bonding ETH_P_SLOW ptype
handler will not get these, because the skb->dev is changed by
the bridging code. This breaks bonding in the 802.3ad mode.ETH_P_SLOW frames should not be touched by the bridge.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -136,6 +136,10 @@ struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
if (skb->protocol == htons(ETH_P_PAUSE))
goto drop;+ /* Don't touch SLOW frames (LACP, etc.) */
+ if (skb->protocol == htons(ETH_P_SLOW))
+ return skb;
+
/* Process STP BPDU's through normal netif_receive_skb() path */
if (p->br->stp_enabled != BR_NO_STP) {
if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
On Tue, 03 Jun 2008 14:22:08 -0700
I think the LACP frames need to be filterable. Otherwise, you open
yourself up to problems with spoofed frames. See the security attack
on STP from a couple of years ago.--
Yes. When the bond is one of the bridged interfaces, the bridge
should not affect it at all, I think. I'm talking about this kind
of setup:eth0--\
eth1---> bond0 ---- bridge
eth2--/ | |
| |
wlan1--------------- |
wlan2------------------The bridge should treat bond0 just like any other bridged
interface (e.g. wlan1 or wlan2) and not influence its internal
functionality at all.Whatever state the bridge is in, it should not influence the
bond's private communication (LACP). Of course, the bridge will
block traffic that arrives on the bond, but it should not block
control traffic that arrives on the physical slave interfacesAgain, the bonding works well when there is no bridging. I think
it should continue to work the same when the bond is added to a
brdidge.--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ--
This shouldn't be done in the core code. How about handling this
in either the bridging code or registering the ptype handler for
the briding device?--
| Linus Torvalds | Linux 2.6.27-rc8 |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
| Linus Torvalds | Linux 2.6.20-rc6 |
| Mike Snitzer | Re: Distributed storage. |
git: | |
| Gerrit Renker | [PATCH 03/37] dccp: List management for new feature negotiation |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
| Herbert Xu | Re: Kernel oops with 2.6.26, padlock and ipsec: probably problem with fpu state ch... |
