Hi all. This is only a draft of patch to consult. I'm aware that it should be divided into multiple patches. I want to know opinion from you folks. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. Therefore I introduce another function pointer in struct net_device_ops - ndo_check_mac_address. This function when it's implemented should check passed mac address against the one set in device. I'm using this in bonding driver when the bond is in mode balance-alb to walk thru all slaves and checking if any of them equals passed address. Then in bridge function br_handle_frame_finish() I'm using ndo_check_mac_address to recognize the destination mac address as local. Please look at this and tell me what you think about it. Thanks Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bonding/bond_alb.c | 17 +++++++++++++++++ drivers/net/bonding/bond_alb.h | 1 + drivers/net/bonding/bond_main.c | 11 +++++++++++ include/linux/netdevice.h | 7 +++++++ net/bridge/br_input.c | 5 ++++- 5 files changed, 40 insertions(+), 1 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..b7bcee0 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,23 @@ int ...
On Fri, 13 Mar 2009 19:33:04 +0100 A better and more general way to do this have the dev_set_mac_address function check the return of the notifier and unwind. Then any protocol can easily prevent address from changing. --
Can you please describe this thougth a bit more? I can't understand it now... Thanks Jirka --
On Sat, 14 Mar 2009 10:49:11 +0100
Something like this:
--- a/net/core/dev.c 2009-03-15 15:55:02.098126056 -0700
+++ b/net/core/dev.c 2009-03-15 16:02:43.999251305 -0700
@@ -3830,6 +3830,7 @@ int dev_set_mac_address(struct net_devic
{
const struct net_device_ops *ops = dev->netdev_ops;
int err;
+ char save_addr[MAX_ADDR_LEN];
if (!ops->ndo_set_mac_address)
return -EOPNOTSUPP;
@@ -3837,9 +3838,17 @@ int dev_set_mac_address(struct net_devic
return -EINVAL;
if (!netif_device_present(dev))
return -ENODEV;
+
+ memcpy(save_addr, dev->dev_addr, dev->addr_len);
err = ops->ndo_set_mac_address(dev, sa);
- if (!err)
- call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+ if (err)
+ return err;
+
+ err = call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+ if (err) {
+ memcpy(sa->sa_data, save_addr, dev->addr_len);
+ ops->ndo_set_mac_address(dev, sa);
+ }
return err;
}
And something like this:
--- a/drivers/net/bonding/bond_main.c 2009-03-15 16:03:53.909000973 -0700
+++ b/drivers/net/bonding/bond_main.c 2009-03-15 16:11:43.227127031 -0700
@@ -3534,6 +3534,7 @@ static int bond_slave_netdev_event(unsig
{
struct net_device *bond_dev = slave_dev->master;
struct bonding *bond = netdev_priv(bond_dev);
+ int err;
switch (event) {
case NETDEV_UNREGISTER:
@@ -3570,6 +3571,15 @@ static int bond_slave_netdev_event(unsig
* servitude.
*/
break;
+ case NETDEV_CHANGEADDR:
+ if (bond->params.mode == BOND_MODE_ALB)
+ err = bond_alb_check_mac_address(bond);
+ else if (compare_ether_addr(bond_dev->dev_addr, addr) != 0)
+ err = -EINVAL;
+
+ if (err)
+ return notifier_from_errno(err);
+ break;
case NETDEV_CHANGENAME:
/*
* TODO: handle changing the primary's name
--
Yes, I think the changing mac address of slaves should be also handled by bonding driver. But my patch fixes a different issue. See, unlike in any other bonding modes, in balance-alb mode incoming packets have multiple MAC adresses (of any of enslaved devices). This causes problem because bridge only recognize one of them (the mac of master which is the mac on one of the slaves) as local - the other MAC's are not recognized as they are a part of port and therefore handled as general MAC adresses. This is the problem. I can see two solutions. Either like my patch or somehow allow bridge to know more MAC addressses per port (maybe netdev can be changed to know more then one MAC address). Any thoughts? Thanks --
From: Jiri Pirko <jpirko@redhat.com> The netdev struct already supports having a list of multiple unicast MAC addresses, it can probably be used and inspected for this. I'll hold off on your patch until we make some more progress on this discussion. --
Yes I was looking at this thing yesterday (uc_list). But this list serves to different purpose. Do you think that it will be correct to use it for this? I would maybe like to make a new list similar to this for our purpose (say addr_list). I think it would be more correct. Eventually in the furute we would use this list as a primary place to store device address instead of dev_addr value and make it more general (as device --
From: Jiri Pirko <jpirko@redhat.com> Whatever you do with that list privately inside of the bonding driver should be fine. It might upset something in the generic code if you don't clean it up before deregistration of the bonding device, so just be tidy. --
Well I do not need it only inside the bonding driver. I want bridge to use this list when adding a device in it and get mac addresses from there into its --
From reading the balance-alb description, I get the impression that this mode is simply not meant to be used with bridging: Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server. In any case I'd tend to say that if bond-alb mode mangles outgoing MAC addresses, it should restore the original one for received packets and keep the hacks local to bonding. --
To let bonding driver to resolve this I think there will be needed some kind of hook in netif_receive_skb() as for example bridge has. I would rather do this more general and transparent. --
(resend) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..2838be0 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,26 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!memcmp(dest, bond_dev->dev_addr, ETH_ALEN)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + if (!memcmp(slave->dev->dev_addr, dest, ETH_ALEN)) { + memcpy(dest, bond_dev->dev_addr, ETH_ALEN); + break; + } + } + read_unlock(&bond->lock); +} + void ...
(resend, using compare_ether_addr_64bits instead of memcmp) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices. Except for mode balance-alb. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..83998f4 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,26 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb) +{ + struct net_device *bond_dev = skb->dev; + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!compare_ether_addr_64bits(dest, bond_dev->dev_addr)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) { + if (!compare_ether_addr_64bits(slave->dev->dev_addr, dest)) { + memcpy(dest, bond_dev->dev_addr, ...
I think you mean "only balance-alb will simultaneously use multiple MAC addresses across different slaves." Yes? I ask because the active-backup mode with fail_over_mac=active will change the bond's MAC to always be the MAC of whatever the currently active slave is, but I don't think that will trigger the Since you put the hook outside of the skb_bond_should_drop function, does the VLAN accelerated receive path do the right thing if, e.g., there's a VLAN on top of bonding and that VLAN is part of the bridge? -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com --
Yes this fail_over_mac is en exception. In fact I was playing with fail_over_mac bonding interface in bridge and I have no luck to force a problem with two NICs. However with 3 NICs I've managed it to the state of 100% packet loss. I'm going --
From: Jiri Pirko <jpirko@redhat.com> I'll let you guys discuss this some more. It looks like we could have some more tweaks before this patch is finalized. --
Jiri: not trying to be pushy, but you didn't address the above question about the VLAN path, and I just want to make sure that you saw it (it was at the bottom of a long email, so I fear you may not have seen it). -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com --
Don't worry :) I did not forget about this - just needed a bit time to investigate... Yeah, this look's like a problem. In __vlan_hwaccel_rx there is following line: skb->dev = vlan_group_get_device(grp, vlan_tci & VLAN_VID_MASK); This rewrites the dev so latter on when netif_receive_skb is called the hook will be not called (because dev->master will not be set). Ok I will put the hook inside the skb_bond_should_drop() - it seems like a correct solution... --
(resend, updated changelog, hook moved into skb_bond_should_drop, skb_bond_should_drop ifdefed) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patch solves the situation in the bonding without touching bridge code, as Patrick suggested. For every incoming frame to bonding it searches the destination address in slaves list and if any of slave addresses matches, it rewrites the address in frame by the adress of bonding master. This ensures that all frames comming thru the bonding in alb mode have the same address. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 27fb7f5..b973ede 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1762,6 +1762,25 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) return 0; } +void bond_alb_change_dest(struct sk_buff *skb, struct net_device *bond_dev) +{ + struct bonding *bond = netdev_priv(bond_dev); + unsigned char *dest = eth_hdr(skb)->h_dest; + struct slave *slave; + int i; + + if (!compare_ether_addr_64bits(dest, bond_dev->dev_addr)) + return; + read_lock(&bond->lock); + bond_for_each_slave(bond, slave, i) ...
From: Jiri Pirko <jpirko@redhat.com> I don't like the hook, but if that's how it's best done.... Patrick, please review this. --
Yes I agree with you, but I thing that for now it's the best way to do this. I picked this solution out of 3 that I had in mind and this is the lesser evil :) --
Me neither, but I don't think this approach can be done without the hook. While I still find it questionable whether this mode really needs to be supported for a bridge at all, an alternative approach would be to have bonding add FDB entries for all secondary MACs to make bridging treat them as local. --
Well there is I think nothing unusual in this net scheme. And by for example Yes - that is the clear way. But there's not really straihtforward way to do this. The clear approach would be to extend struct net_device for list of these mac addresses and let the drivers (binding) fill it and bridge to process it. But I don't know. --
We have a list of secondary unicast addresses, but that might not be suitable in this case since the addresses are (mostly) intended not to be visible to the stack if I understood correctly. --
Well, there are many unusual things, that do not imply that they should not be I agree this list is not suitable for this - it's used for different purpose and I think it would be not wise to mix it with what we want... --
From: Patrick McHardy <kaber@trash.net> Do you guys foresee any possibility of an alternative implementation any time soon? Otherwise we're just stalling by not putting something into the tree, and as far as I can tell this patch here might as well be it. --
Adding bridge FDB entries seems like the best fix. It might need some minor ugliness to avoid new dependencies between bonding and bridging, but it definitely beats having new hooks in the core in my opinion. But I have no idea whether Jiri is actually implementing this. --
Currently I'm thinking the way. What I have on mind: I would like to add a list into struct net_device to contain all mac addresses of the device. I would also like to use similar interface to handle them as currently is for uc_list and mc_list. However I do not like that these lists are not using standard list_head but they are propriate lists only for this purpose. I'm thinking about converting them to use list_head first. Or maybe ignore them and do the new list for macs in parallel? Then we can fill this list with macs in bonding driver and let bridge check it and make fdb entries. --
Using list_heads in the address lists would require some pretty large amount of work since you'd need to convert all the drivers. I'm all in favour of doing this, but I wouldn't make the fix depend on that work. --
Whatever will make this easier :) You could of course already add the new structure and use it for your new list and do the conversion of the existing structures on top of that. --
[PATCH net-next] bonding: allow bond in mode balance-alb to work properly in bridge -try4.1 Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. *** When the multiple addresses for bridge port approach failed to solve this issue due to STP I started to think other way to solve this. I returned to previous solution but tweaked one. This patch solves the situation in the bonding without touching bridge code. For every incoming frame to bonding the destination address is compared to current address of the slave device from which tha packet came. If these two match destination address is replaced by mac address of the master. This address is known by bridge so it is delivered properly. I experimentally tried that this works as good as searching through the slave list (v4 of this patch). I was forced to create a new header because I need to use compare_ether_addr_64bits() (defined in linux/etherdevice.h) in linux/netdevice.h. I've hit some cross include issues. I think that it's good to have skb_bond_should_drop() in a separate file anyway. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> diff --git a/include/linux/bonding.h b/include/linux/bonding.h new file mode 100644 index 0000000..3081ddb --- ...
Did you test this with a bond with more than 2 ports? I ask because I might also expect a check against all the members of the bond (rather than simply the receiving device). That check would be quite expensive for every frame and I think the scenario is quite unlikely based on the frequency of 'learning frames' sent by the alb code (so the switch connected to the host should have it's forwarding database correct), but it might be something to think This certainly won't cure all of the problems that arise with bonding and bridging interactions, but it's a step in the right direction. Acked-by: Andy Gospodarek <andy@greyhouse.net> --
As you can see, my previous patch did the checking vs all slaves. I tried this experimentally and searched address from the list and dev->dev_addr differs only when I unplug cable and mac swap occurs. Then one packet is lost. But there are --
But couldnt we test skb->pkt_type == PACKET_HOST instead, Or eth_type_trans() not yet called at this point ? I would suggest : if (skb->pkt_type == PACKET_HOST) --
Yes Eric, you are right, good point. eth_type_trans() is called in any driver before and compare_ether_addr_64bits(dest, dev->dev_addr) is done there. So it's safe to use this here (and save some ticks). I'm going to make a new patch and test it. --
[PATCH net-next] bonding: allow bond in mode balance-alb to work properly in bridge -try4.2 (updated) changes v4.1 -> v4.2 - use skb->pkt_type == PACKET_HOST compare rather then comparing skb dest addr against skb->dev->dev_addr Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. *** When the multiple addresses for bridge port approach failed to solve this issue due to STP I started to think other way to solve this. I returned to previous solution but tweaked one. This patch solves the situation in the bonding without touching bridge code. For every incoming frame to bonding the destination address is compared to current address of the slave device from which tha packet came. If these two match destination address is replaced by mac address of the master. This address is known by bridge so it is delivered properly. Note that the comparsion is not made directly, it's used skb->pkt_type == PACKET_HOST instead. This is "set" previously in eth_type_trans(). I experimentally tried that this works as good as searching through the slave list (v4 of this patch). I was forced to create a new header because I need to use compare_ether_addr_64bits() (defined in linux/etherdevice.h) in linux/netdevice.h. I've hit some ...
Just overwriting the dest would be faster, and avoids
to include <linux/etherdevice.h>, maybe a new include file
could be avoided ?
If it is already the master->dev_addr, then memcpy() is a no-op
If it wasnt the master->dev_addr, then memcpy() does what you wanted.
You can also give a hint to gcc as h_dest is guaranteed to be 16 bit aligned
static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
struct net_device *master)
{
if (skb->pkt_type == PACKET_HOST) {
u16 *dest = (u16 *)eth_hdr(skb)->h_dest;
memcpy(dest, master->dev_addr, ETH_ALEN);
}
}
Compiler will emit better code for memcpy() on some arches.
(not on x86, as it already does one 32bit and one 16bit move)
--
Okay, I consulted the comparing/memcpy question with Oleg (cc'ed) and he also agree to do this your way. I'll make a patch, test it and post it soon. --
[PATCH net-next] bonding: allow bond in mode balance-alb to work properly in bridge -try4.3 (updated) changes v4.2 -> v4.3 - memcpy the address always, not just in case it differs from master->dev_addr - compare_ether_addr_64bits() is not used so there is no direct need to make new header file (I think it would be good to have bond stuff in separate file anyway). changes v4.1 -> v4.2 - use skb->pkt_type == PACKET_HOST compare rather then comparing skb dest addr against skb->dev->dev_addr Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. *** When the multiple addresses for bridge port approach failed to solve this issue due to STP I started to think other way to solve this. I returned to previous solution but tweaked one. This patch solves the situation in the bonding without touching bridge code. For every incoming frame to bonding the destination address is compared to current address of the slave device from which tha packet came. If these two match destination address is replaced by mac address of the master. This address is known by bridge so it is delivered properly. Note that the comparsion is not made directly, it's used skb->pkt_type == PACKET_HOST instead. This is "set" previously in ...
Yes, this could be done in a future cleanup patch. I find this (short) version easier to review for a new feature. --
From: Eric Dumazet <eric.dumazet@gmail.com> Applied, thanks everyone. --
This one is fine too. --
(resend, updated changelog, completely reworked) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patchset solves this issue in the best way it can be possibly solved. By adding all mac addresses of all slave devices to the bridge hash list. To carry these addresses the new list has to be introduced in struct net_device. Jirka --
Introducing function dev_mac_address_changed which can be called from driver
which changed his mac address to force notifiers to be called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
include/linux/netdevice.h | 1 +
net/core/dev.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..ff8db51 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1461,6 +1461,7 @@ extern int dev_change_net_namespace(struct net_device *,
extern int dev_set_mtu(struct net_device *, int);
extern int dev_set_mac_address(struct net_device *,
struct sockaddr *);
+extern void dev_mac_address_changed(struct net_device *);
extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev,
struct netdev_queue *txq);
diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..1adc89b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3833,6 +3833,18 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
return err;
}
+/**
+ * dev_mac_address_changed - Notify Media Access Control Address changed
+ * @dev: device
+ *
+ * Notifies the change of the hardware (MAC) address of the device
+ */
+void dev_mac_address_changed(struct net_device *dev)
+{
+ call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+}
+EXPORT_SYMBOL(dev_mac_address_changed);
+
/*
* Perform the SIOCxIFxxx calls, inside read_lock(dev_base_lock)
*/
--
1.6.0.6
--
On Mon, 13 Apr 2009 10:38:48 +0200 The original version of this that I send, allowed notifiers to return an error to block changing address (error would go back to application). This is how other notifier hooks work (mtu, etc). Why is dev_set_mac_address_changed called out separately, it should be inside dev_set_mac_address. --
This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
include/linux/netdevice.h | 51 +++++++++-
net/core/dev.c | 264 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 313 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ff8db51..8cf62f1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -210,6 +210,12 @@ struct dev_addr_list
#define dmi_users da_users
#define dmi_gusers da_gusers
+struct hw_addr {
+ struct list_head list;
+ unsigned char addr[MAX_ADDR_LEN];
+ int refcount;
+};
+
struct hh_cache
{
struct hh_cache *hh_next; /* Next entry */
@@ -776,8 +782,12 @@ struct net_device
*/
unsigned long last_rx; /* Time of last Rx */
/* Interface address info used in eth_type_trans() */
- unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast
- because most packets are unicast) */
+ unsigned char *dev_addr; /* hw address, (before bcast
+ because most packets are
+ unicast) */
+
+ struct list_head dev_addr_list; /* list of device hw addresses */
+ spinlock_t dev_addr_list_lock;
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
@@ -1779,6 +1789,32 @@ static inline void netif_addr_unlock_bh(struct net_device *dev)
spin_unlock_bh(&dev->addr_list_lock);
}
+/* Locking helpers for spinlock guarding dev_addr_list */
+
+static inline void netif_dev_addr_lock(struct net_device *dev)
+{
+ spin_lock(&dev->dev_addr_list_lock);
+}
+
+static inline void ...On Mon, 13 Apr 2009 10:42:02 +0200 This lock is unnecessary, use RCU list for read. Since all changes are under RTNL mutex, there is no chance for conflict on update. --
From: Stephen Hemminger <shemminger@linux-foundation.org> Agreed. --
From: Jiri Pirko <jpirko@redhat.com> Please don't pollute the global namespace with a structure name Please let the compiler inline things as it sees fit. These aren't routines in some header file or anything like that. --
This patch changes the handling of mac addresses of bridge port devices. Now
it uses previously introduced list of device addresses. It allows the bridge to
know more then one local mac address per port which is mandatory for the right
work in some cases.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
net/bridge/br_fdb.c | 120 +++++++++++++++++++++++++++++++++--------------
net/bridge/br_if.c | 2 +-
net/bridge/br_notify.c | 2 +-
net/bridge/br_private.h | 4 +-
4 files changed, 89 insertions(+), 39 deletions(-)
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a48f5ef..6efc556 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -77,10 +77,45 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f)
br_fdb_put(f);
}
-void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
+/*
+ * Finds out if passed address is one of the addresses assigned to the device.
+ * Returns 1 on positive result
+ */
+static inline int is_dev_addr(struct net_device *dev, unsigned char *addr)
+{
+ struct hw_addr *ha;
+ int ret = 1;
+
+ netif_dev_addr_lock_bh(dev);
+ for_each_dev_addr(dev, ha) {
+ ret = compare_ether_addr(addr, ha->addr);
+ if (!ret)
+ break;
+ }
+ netif_dev_addr_unlock_bh(dev);
+ return !ret ? 1 : 0;
+}
+
+static int another_port_has_addr(const struct net_bridge_port *p,
+ struct net_bridge_fdb_entry *f)
+{
+ struct net_bridge *br = p->br;
+ struct net_bridge_port *op;
+
+ list_for_each_entry(op, &br->port_list, list) {
+ if (op != p && is_dev_addr(op->dev, f->addr.addr)) {
+ f->dst = op;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev)
{
struct net_bridge *br = p->br;
int i;
+ struct hw_addr *ha;
spin_lock_bh(&br->hash_lock);
@@ -92,26 +127,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
f = hlist_entry(h, struct net_bridge_fdb_entry, ...On Mon, 13 Apr 2009 10:44:08 +0200 Why not a general version in net_device.h or etherdevice.h? Forwarding database is hot path, people sometimes run lots of devices You added another layer of locking on the already hot bridge fast path. --
This only puts the original loop code to the function, so if compiler decides to inline this there might be no difference. --
From: Jiri Pirko <jpirko@redhat.com> Please drop the inline, let the compiler work it out. --
When in mode alb, add all device addresses which belong to an enslaved slave
device to the bond device. This ensures that all mac addresses will be
treated as local and bonding in this mode will work fine in bridge.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/bonding/bond_main.c | 30 +++++++++++++++++++++++++++++-
1 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 99610f3..47795c7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1385,6 +1385,11 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
bond->setup_by_slave = 1;
}
+static inline int should_copy_dev_addrs(struct bonding *bond)
+{
+ return bond->params.mode == BOND_MODE_ALB ? 1 : 0;
+}
+
/* enslave device <slave> to bond device <master> */
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
{
@@ -1510,6 +1515,13 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
*/
new_slave->original_flags = slave_dev->flags;
+ if (should_copy_dev_addrs(bond)) {
+ res = dev_addr_add_multiple(bond_dev, slave_dev);
+ if (res)
+ goto err_free;
+ dev_mac_address_changed(bond_dev);
+ }
+
/*
* Save slave's original ("permanent") mac address for modes
* that need it, and for restoring it upon release, and then
@@ -1527,7 +1539,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
res = dev_set_mac_address(slave_dev, &addr);
if (res) {
pr_debug("Error %d calling set_mac_address\n", res);
- goto err_free;
+ goto err_remove_dev_addrs;
}
}
@@ -1769,6 +1781,12 @@ err_restore_mac:
dev_set_mac_address(slave_dev, &addr);
}
+err_remove_dev_addrs:
+ if (should_copy_dev_addrs(bond)) {
+ dev_addr_del_multiple(bond_dev, slave_dev);
+ dev_mac_address_changed(bond_dev);
+ }
+
err_free:
kfree(new_slave);
@@ -1954,6 +1972,11 @@ int ...On Mon, 13 Apr 2009 10:46:15 +0200
static inline bool should_copy_dev_addrs(const struct bonding *bond)
{
return (bond->params.mode == BOND_MODE_ALB);
}
Three things are wrong with your style here:
1. Needless use of tri-graph operator, just return the result
2. Use const for test_foo() type functions
The notifier (dev_mac_address_changed) should be part of dev_addr_add
--
(resend, rcu list locking, cometics) Hi all. The problem is described in following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=487763 Basically here's what's going on. In every mode, bonding interface uses the same mac address for all enslaved devices (except fail_over_mac). Only balance-alb will simultaneously use multiple MAC addresses across different slaves. When you put this kind of bond device into a bridge it will only add one of mac adresses into a hash list of mac addresses, say X. This mac address is marked as local. But this bonding interface also has mac address Y. Now then packet arrives with destination address Y, this address is not marked as local and the packed looks like it needs to be forwarded. This packet is then lost which is wrong. Notice that interfaces can be added and removed from bond while it is in bridge. This patchset solves this issue in the best way it can be possibly solved. By adding all mac addresses of all slave devices to the bridge hash list. To carry these addresses the new list has to be introduced in struct net_device. Jirka --
This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.
Note: patch adding list_first_entry_rcu (currently in Ingo's tip tree) needed.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
include/linux/etherdevice.h | 24 ++++
include/linux/netdevice.h | 31 +++++-
net/core/dev.c | 262 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 315 insertions(+), 2 deletions(-)
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index a1f17ab..348a75e 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -205,4 +205,28 @@ static inline int compare_ether_header(const void *a, const void *b)
(a32[1] ^ b32[1]) | (a32[2] ^ b32[2]);
}
+/**
+ * is_etherdev_addr - Tell if given Ethernet address belongs to the device.
+ * @dev: Pointer to a device structure
+ * @addr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Compare passed address with all addresses of the device. Return true if the
+ * address if one of the device addresses.
+ */
+static inline bool is_etherdev_addr(const struct net_device *dev,
+ const u8 *addr)
+{
+ struct netdev_hw_addr *ha;
+ int res = 1;
+
+ rcu_read_lock();
+ for_each_dev_addr(dev, ha) {
+ res = compare_ether_addr(addr, ha->addr);
+ if (!res)
+ break;
+ }
+ rcu_read_unlock();
+ return !res;
+}
+
#endif /* _LINUX_ETHERDEVICE_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..77abfdf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -210,6 +210,12 @@ struct dev_addr_list
#define dmi_users da_users
#define ...This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.
Note: patch adding list_first_entry_rcu (currently in Ingo's tip tree) needed.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
include/linux/etherdevice.h | 24 ++++
include/linux/netdevice.h | 31 +++++-
net/core/dev.c | 263 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 316 insertions(+), 2 deletions(-)
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index a1f17ab..348a75e 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -205,4 +205,28 @@ static inline int compare_ether_header(const void *a, const void *b)
(a32[1] ^ b32[1]) | (a32[2] ^ b32[2]);
}
+/**
+ * is_etherdev_addr - Tell if given Ethernet address belongs to the device.
+ * @dev: Pointer to a device structure
+ * @addr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Compare passed address with all addresses of the device. Return true if the
+ * address if one of the device addresses.
+ */
+static inline bool is_etherdev_addr(const struct net_device *dev,
+ const u8 *addr)
+{
+ struct netdev_hw_addr *ha;
+ int res = 1;
+
+ rcu_read_lock();
+ for_each_dev_addr(dev, ha) {
+ res = compare_ether_addr(addr, ha->addr);
+ if (!res)
+ break;
+ }
+ rcu_read_unlock();
+ return !res;
+}
+
#endif /* _LINUX_ETHERDEVICE_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..77abfdf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -210,6 +210,12 @@ struct dev_addr_list
#define dmi_users da_users
#define ...From: Jiri Pirko <jpirko@redhat.com> Jiri, please add some distinguishing text to your subject lines when you post fixed up version of patches. Like "v2" or something like that, and make a note under the commit message of the changes you've made from the previous version. Otherwise I think it's a dup (because I get a thousand copies anyways) and will just delete it both in my inbox and on patchwork. Thanks. --
You see no difference ? Please look more closely... I see one additional dereference in hot path, to small objects possibly with false sharing effects. So I would advise not changing dev_addr[] to a pointer. And instead copy first netdev_hw_addr into it. Also, doing a kzalloc(sizeof(struct netdev_hw_addr)) for allocating these structs might give a block of memory < L1_CACHE_SIZE so kernel is free to give other part of this cache line to some other layer that could be a hot spot, so false sharing could happen. Since you obviously need a write lock here to be sure following can be done by one cpu only. kzalloc(max(sizeof(*ha), L1_CACHE_SIZE), GFP_...) is thus higly recommended here. Oh well... I'm pretty sure this synchronize_rcu() call can be avoided, dont you think ? Check kfree_rcu() or equivalent, as it seems not yet --
Hmm :( That is what I was trying to avoid. If the first netdev_hw_addr in the list is a copy of dev_addr, then there must be synchronizing of those two. This would be a pain.. Plus I thought that eventually dev_addr would not be accessible directly but only by set of macros/inlines to accesse the list, and You mean PAGE_CACHE_SIZE? I think that would be little wasting... But I see your I used the original as the bridge code used it. Ok, noted. Yes, it is not needed here. I've copied it here from the original unicast and multicast add funtion to stay close but as I can see, there is no need for it there either. Agree that here locking is not necessary, but I wanted to stay consistent to the --
Also needed for IPv6 in softirq context. --
If kfree_rcu() not yet available, please use a regular call_rcu() construct
(thus adding a struct rcu_head rcu; in struct netdev_hw_addr)
If you delete say 10 addresses on a device, while RTNL (or other lock) locked,
Yes, it is very confusing for reviewers because we feel patch submiter
is not comfortable with locking rules.
Check for example dev_add_pack() in net/core/dev.c : It uses list_add_rcu()
but as it also uses a regular spinlock, there is no point using rcu_read_lock().
void dev_add_pack(struct packet_type *pt)
{
int hash;
spin_lock_bh(&ptype_lock);
if (pt->type == htons(ETH_P_ALL))
list_add_rcu(&pt->list, &ptype_all);
else {
hash = ntohs(pt->type) & PTYPE_HASH_MASK;
list_add_rcu(&pt->list, &ptype_base[hash]);
}
spin_unlock_bh(&ptype_lock);
}
Please note list_add_rcu() (and/or rcu_assign_pointer()) are still needed to protect
readers that dont use the spinlock at all.
If you use fact that RTNL is locked when calling your code, you could add
ASSERT_RTNL();
at strategic points so that this assertion can be checked at runtime.
(but Patrick & David wrote that you should not assume RTNL, so you probably need another lock...)
Thank you
--
changes against last patch version:
-added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
-removed unnecessary rcu_read locking in dev_addr_init
-use compare_ether_addr_64bits instead of compare_ether_addr
-use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
-use call_rcu instead of rcu_synchronize
-moved is_etherdev_addr into __KERNEL__ ifdef
This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
include/linux/etherdevice.h | 27 +++++
include/linux/netdevice.h | 32 +++++-
net/core/dev.c | 271 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 328 insertions(+), 2 deletions(-)
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index a1f17ab..3d7a668 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2],
return compare_ether_addr(addr1, addr2);
#endif
}
+
+/**
+ * is_etherdev_addr - Tell if given Ethernet address belongs to the device.
+ * @dev: Pointer to a device structure
+ * @addr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Compare passed address with all addresses of the device. Return true if the
+ * address if one of the device addresses.
+ *
+ * Note that this function calls compare_ether_addr_64bits() so take care of
+ * the right padding.
+ */
+static inline bool is_etherdev_addr(const struct net_device *dev,
+ const u8 addr[6 + 2])
+{
+ struct netdev_hw_addr *ha;
+ int res = 1;
+
+ rcu_read_lock();
+ for_each_dev_addr(dev, ha) ...Please put here the ASSERT_RTNL(), not in various callers, since this is the place where we really assume rtnl lock is locked by us. You still use rcu_read_lock()/unlock() and rcu variant here... But caller of this function has RTNL (or other lock) so dont use rcu here, as it seems same here, no need for rcu_read_lock(), since you are going to change list, you seems risky here to set this to NULL... You could use a static var to avoid further NULL dereference. static char nulladdr[MAX_ADDR_LEN]; --
Well I'd like to have ASSERT_RTNL in callers. The reason is that for this purpose (dev_addr) the guarding lock is rtnl. But for example for multicast addresses it won't be. It will be most probably a spin lock. But those callers (multicast) will use this __hw_addr_xxx functions too. Therefore I'd like to Yes this is unecessrary and confusing I agree. Will remove these read locks in Yes, I wanted to show that for "from_list" this is a reader... I assume that driver might not use dev_addr after it calls unregister_netdevice(). But ok - I would rather move calling dev_addr_flush() somewhere later where there is a guarantee that dev_addr should not be referenced. Perhaps in free_netdev() ? It would also correspond with calling --
v2 -> v3 (current): -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 32 +++++- net/core/dev.c | 261 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 318 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care of + * the right padding. + */ +static inline bool is_etherdev_addr(const struct net_device *dev, + const u8 addr[6 ...
On Fri, 17 Apr 2009 13:57:24 +0200 Minor nit, the ordering of elements cause holes that might not be needed. Space saving? is rcu_head needed or would using synchronize_net Since this is local you should be able to audit all Ditto, ASSERT_RTNL makes sense for exposed kernel API and --
Fri, Apr 17, 2009 at 05:33:15PM CEST, shemminger@vyatta.com wrote: Well I originaly had this done by synchronize_rcu(). Eric argued that it might cause especially __hw_addr_del_multiple_ii() to run long and suggested to use call_rcu() instead. I plan to switch this to kfree_rcu() (or whatever it's called) once it hits the tree. <snip> --
Yes, and dont forget we wont save space, as we allocate a full cache line to hold a 'struct netdev_hw_addr', since we dont want this critical and read_mostly object polluted by a hot spot elsewhere in kernel... Considering this, letting 'rcu_head' at the end of structure, even if we have an eventual hole on 64 bit arches is not really a problem, and IMHO the best thing to do, as rcu_head is only used at dismantle time. And yes, maybe kfree_rcu() will makes its way in kernel, eventually :) Thank you --
I will order the struct better, there are archs with small cache line size where --
How exactly ?
If you consider a 32bit arch with 16 or 32 bytes cache line,
sizeof(struct_list_dead) is 8
sizeof(addr) = 32 (but we really use 6 bytes for ethernet)
struct netdev_hw_addr {
unsigned char addr[MAX_ADDR_LEN];
struct list_head list;
int refcount;
struct rcu_head rcu_head;
};
would cost more at lookup time, since we would use two cache lines
struct netdev_hw_addr {
struct list_head list;
unsigned char addr[MAX_ADDR_LEN];
int refcount;
struct rcu_head rcu_head;
};
Is nicer, because at least 8 bytes of addr share the same cache line
than list. So direct dev->dev_addr would be fast (for devices with one
address), and is_etherdev_addr() would still use one cache line per
item.
--
v3 -> v4 (current): -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 32 +++++- net/core/dev.c | 261 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 318 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the device. Return true if the + * address if one of the device addresses. + * + * Note that this function calls compare_ether_addr_64bits() so take care ...
How about this (and another 2 patches in patchset)? What's your opinion guys? Thanks, Jirka --
On Sat, 18 Apr 2009 10:58:49 +0200 I am still unsure why this added complexity to the network device model is needed. How does this interact with neighbor table (ARP)? Isn't this what macvlan already does. --
I've found no other way how to support devices with multiple macs i.e. as a port It doesn't interact ARP. ARP uses only the first device (dev_addr ptr). I'm not Hmm I'm looking at this and I'm not sure it would help (I've found no documentation). Can you please show an example how this solves "bonding interface in atb mode as a port in the bridge" problem? --
v4 -> v5 (current): -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 37 ++++++- net/core/dev.c | 271 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 333 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Compare passed address with all addresses of the ...
From: Jiri Pirko <jpirko@redhat.com> Unused static function, this will create build warnings. Or, it should :-) If you plan to use such a function in subsequent patches, add it in those changes not here. Otherwise I have no fundamental objection to this patch, nice work! --
Yes, Ok, I was not quite sure. Thanks for explanation, I'll resubmit later --
v5 -> v6 (current): -removed so far unused static functions -corrected dev_addr_del_multiple to call del instead of add v4 -> v5: -added device address type (suggested by davem) -removed refcounting (better to have simplier code then safe potentially few bytes) v3 -> v4: -changed kzalloc to kmalloc in __hw_addr_add_ii() -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init() v2 -> v3: -removed unnecessary rcu read locking -moved dev_addr_flush() calling to ensure no null dereference of dev_addr v1 -> v2: -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush -removed unnecessary rcu_read locking in dev_addr_init -use compare_ether_addr_64bits instead of compare_ether_addr -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr -use call_rcu instead of rcu_synchronize -moved is_etherdev_addr into __KERNEL__ ifdef This patch introduces a new list in struct net_device and brings a set of functions to handle the work with device address list. The list is a replacement for the original dev_addr field and because in some situations there is need to carry several device addresses with the net device. To be backward compatible, dev_addr is made to point to the first member of the list so original drivers sees no difference. Signed-off-by: Jiri Pirko <jpirko@redhat.com> --- include/linux/etherdevice.h | 27 +++++ include/linux/netdevice.h | 37 ++++++- net/core/dev.c | 250 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 312 insertions(+), 2 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index a1f17ab..3d7a668 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -182,6 +182,33 @@ static inline unsigned compare_ether_addr_64bits(const u8 addr1[6+2], return compare_ether_addr(addr1, addr2); #endif } + +/** + * is_etherdev_addr - Tell if given Ethernet address belongs to the device. + * @dev: Pointer to a device structure + * @addr: ...
From: Jiri Pirko <jpirko@redhat.com> Applied to net-next-2.6, thanks! --
On Tue, 05 May 2009 12:27:18 -0700 (PDT) Not sure if this is such a good idea since the purpose of this was to fix a bonding/bridging interaction, but it breaks STP on bridging. --
From: Stephen Hemminger <shemminger@vyatta.com> Thanks for not paying attention... :-/ The Intel folks want to have an address list functionality so they can public MAC addresses meant for FCOE and other purposes. So even if the bonding bits bomb, we still need this. --
On Fri, 08 May 2009 16:00:08 -0700 (PDT) But the other infrastructure may have same issues (netfilter, etc). Just seems like it would be either to have multiple network devices so that upper layers could disambiguate easier. -- --
From: Stephen Hemminger <shemminger@vyatta.com> That's quite a heavyweight solution to what is purely an addressing issue, don't you think? We can just revert all of that netdev_ops stuff if you think per-netdev cost doesn't matter :-) --
On Fri, 08 May 2009 16:25:55 -0700 (PDT) I am just concerned that is a fundamental change (like MQ) was and it will take a couple of releases to shake out all the inter-relationships. When an architectural change is made, would like to see more analysis done. Netdev-ops was more of a refactoring than a change to what protocols and qdisc/filters/... see. -- --
This patch changes the handling of mac addresses of bridge port devices. Now
it uses previously introduced list of device addresses. It allows the bridge to
know more then one local mac address per port which is mandatory for the right
work in some cases.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
net/bridge/br_fdb.c | 101 ++++++++++++++++++++++++++++++----------------
net/bridge/br_if.c | 2 +-
net/bridge/br_notify.c | 2 +-
net/bridge/br_private.h | 4 +-
4 files changed, 70 insertions(+), 39 deletions(-)
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a48f5ef..1e63f76 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -77,10 +77,26 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f)
br_fdb_put(f);
}
-void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
+static bool another_port_has_addr(const struct net_bridge_port *p,
+ struct net_bridge_fdb_entry *f)
+{
+ struct net_bridge *br = p->br;
+ struct net_bridge_port *op;
+
+ list_for_each_entry(op, &br->port_list, list) {
+ if (op != p && is_etherdev_addr(op->dev, f->addr.addr)) {
+ f->dst = op;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev)
{
struct net_bridge *br = p->br;
int i;
+ struct netdev_hw_addr *ha;
spin_lock_bh(&br->hash_lock);
@@ -92,26 +108,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
f = hlist_entry(h, struct net_bridge_fdb_entry, hlist);
if (f->dst == p && f->is_local) {
- /* maybe another port has same hw addr? */
- struct net_bridge_port *op;
- list_for_each_entry(op, &br->port_list, list) {
- if (op != p &&
- !compare_ether_addr(op->dev->dev_addr,
- f->addr.addr)) {
- f->dst = op;
- goto insert;
- }
- }
-
- /* delete old one */
- fdb_delete(f);
- goto insert;
+ /*
+ * maybe another port ...(repost, no modifications)
This patch changes the handling of mac addresses of bridge port devices. Now
it uses previously introduced list of device addresses. It allows the bridge to
know more then one local mac address per port which is mandatory for the right
work in some cases.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
net/bridge/br_fdb.c | 101 ++++++++++++++++++++++++++++++----------------
net/bridge/br_if.c | 2 +-
net/bridge/br_notify.c | 2 +-
net/bridge/br_private.h | 4 +-
4 files changed, 70 insertions(+), 39 deletions(-)
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a48f5ef..1e63f76 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -77,10 +77,26 @@ static inline void fdb_delete(struct net_bridge_fdb_entry *f)
br_fdb_put(f);
}
-void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
+static bool another_port_has_addr(const struct net_bridge_port *p,
+ struct net_bridge_fdb_entry *f)
+{
+ struct net_bridge *br = p->br;
+ struct net_bridge_port *op;
+
+ list_for_each_entry(op, &br->port_list, list) {
+ if (op != p && is_etherdev_addr(op->dev, f->addr.addr)) {
+ f->dst = op;
+ return 1;
+ }
+ }
+ return 0;
+}
+
+void br_fdb_changeaddr(struct net_bridge_port *p, struct net_device *dev)
{
struct net_bridge *br = p->br;
int i;
+ struct netdev_hw_addr *ha;
spin_lock_bh(&br->hash_lock);
@@ -92,26 +108,23 @@ void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr)
f = hlist_entry(h, struct net_bridge_fdb_entry, hlist);
if (f->dst == p && f->is_local) {
- /* maybe another port has same hw addr? */
- struct net_bridge_port *op;
- list_for_each_entry(op, &br->port_list, list) {
- if (op != p &&
- !compare_ether_addr(op->dev->dev_addr,
- f->addr.addr)) {
- f->dst = op;
- goto insert;
- }
- }
-
- /* delete old one */
- fdb_delete(f);
- goto ...Same problem than a previous patch Jiri. You should not use rcu_read_lock()/rcu_read_unlock() at all in this context, --
On Wed, 6 May 2009 16:46:25 +0200 Won't this all break spanning tree which expects 1:1 relationship between address and port. --
From: Stephen Hemminger <shemminger@linux-foundation.org> Indeed, this could be a fundamental issue with this change. --
When in mode alb, add all device addresses which belong to an enslaved slave
device to the bond device. This ensures that all mac addresses will be
treated as local and bonding in this mode will work fine in bridge.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/bonding/bond_main.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 99610f3..4025dd0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1385,6 +1385,11 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
bond->setup_by_slave = 1;
}
+static bool should_copy_dev_addrs(const struct bonding *bond)
+{
+ return (bond->params.mode == BOND_MODE_ALB);
+}
+
/* enslave device <slave> to bond device <master> */
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
{
@@ -1510,6 +1515,12 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
*/
new_slave->original_flags = slave_dev->flags;
+ if (should_copy_dev_addrs(bond)) {
+ res = dev_addr_add_multiple(bond_dev, slave_dev);
+ if (res)
+ goto err_free;
+ }
+
/*
* Save slave's original ("permanent") mac address for modes
* that need it, and for restoring it upon release, and then
@@ -1527,7 +1538,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
res = dev_set_mac_address(slave_dev, &addr);
if (res) {
pr_debug("Error %d calling set_mac_address\n", res);
- goto err_free;
+ goto err_remove_dev_addrs;
}
}
@@ -1769,6 +1780,10 @@ err_restore_mac:
dev_set_mac_address(slave_dev, &addr);
}
+err_remove_dev_addrs:
+ if (should_copy_dev_addrs(bond))
+ dev_addr_del_multiple(bond_dev, slave_dev);
+
err_free:
kfree(new_slave);
@@ -1954,6 +1969,9 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
/* close slave before ...| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo |
