Updated to address Vlad's comment about storing the link-local IPv6
address in the bond/vlan structure: I no longer overwrite the existing
address with a newer one. Also added the missing locking for the
inet6_dev. I don't see any way to address David Stevens' comment, since
there is currently no NA tunable in the IPv6 code.

---

This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.

- Creates a new file, drivers/net/bonding/bond_ipv6.c, for the
  IPv6-specific routines. Both regular bonds and VLANs over bonds
  are supported.

- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
  IPv6 Neighbor Advertisements that are sent on a failover event.
  Default is 1.

- Creates two new IPv6 neighbor discovery functions:

      ndisc_build_skb()
      ndisc_send_skb()

  These were required to support VLANs, since we have to be able to
  add the VLAN id to the skb, and ndisc_send_na() and friends
  shouldn't be asked to do this. These two routines are basically
  __ndisc_send() split into two pieces, in a slightly different order.

- Updates Documentation/networking/bonding.txt and bumps the rev of
  bond support to 3.4.0.

On failover, this new code will generate one packet:

- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
  learn that the address has moved to the new slave.

Testing has shown that sending just the NA results in pretty good
behavior when in active-backup mode; I saw no lost ping packets, for
example.

-Brian

Signed-off-by: Brian Haley <brian.haley@hp.com>
---
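For readers unfamiliar with the packet involved, here is a minimal
userspace sketch of the ICMPv6 body of an unsolicited Neighbor
Advertisement, following the RFC 4861 message layout. It is not the
patch's code and does not use ndisc_build_skb(); the function names
build_unsol_na() and icmp6_csum() are made up for illustration, and
the choices of Override=1, Router=0 and the all-nodes destination
ff02::1 are assumptions based on the RFC's description of unsolicited
advertisements rather than on what the bonding driver sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND_NA_TYPE        136   /* ICMPv6 Neighbor Advertisement */
#define ND_OPT_TLLA       2     /* Target Link-Layer Address option */
#define NA_FLAG_ROUTER    0x80
#define NA_FLAG_SOLICITED 0x40
#define NA_FLAG_OVERRIDE  0x20
#define NA_LEN            32    /* 24-byte NA header + 8-byte TLLA option */

/* Add len bytes at p to a running sum of big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* ICMPv6 checksum over the IPv6 pseudo-header plus the message. */
static uint16_t icmp6_csum(const uint8_t src[16], const uint8_t dst[16],
                           const uint8_t *msg, size_t len)
{
    uint32_t sum = 0;

    sum = sum16(src, 16, sum);
    sum = sum16(dst, 16, sum);
    sum += (uint32_t)(len >> 16) + (uint32_t)(len & 0xffff); /* length */
    sum += 58;                                  /* next header: ICMPv6 */
    sum = sum16(msg, len, sum);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: fill pkt with an unsolicited NA for target,
 * and dst with the all-nodes multicast address the NA is sent to.
 */
static void build_unsol_na(uint8_t pkt[NA_LEN], uint8_t dst[16],
                           const uint8_t src[16], const uint8_t target[16],
                           const uint8_t mac[6])
{
    uint16_t csum;

    assert(pkt && dst && src && target && mac);

    memset(dst, 0, 16);
    dst[0] = 0xff;                  /* ff02::1, all-nodes (assumed) */
    dst[1] = 0x02;
    dst[15] = 0x01;

    memset(pkt, 0, NA_LEN);
    pkt[0] = ND_NA_TYPE;            /* type; code and checksum stay 0 */
    pkt[4] = NA_FLAG_OVERRIDE;      /* S=0 (unsolicited), O=1 (assumed) */
    memcpy(pkt + 8, target, 16);    /* target address */
    pkt[24] = ND_OPT_TLLA;          /* option type */
    pkt[25] = 1;                    /* option length, in units of 8 bytes */
    memcpy(pkt + 26, mac, 6);       /* MAC of the new active slave */

    csum = icmp6_csum(src, dst, pkt, NA_LEN);
    pkt[2] = (uint8_t)(csum >> 8);
    pkt[3] = (uint8_t)(csum & 0xff);
}
```

Recomputing icmp6_csum() over the finished packet yields 0, which is
the usual way a receiver validates the message. In the VLAN case the
tag would be added to the frame separately, which is the reason the
patch gives for splitting the build and send steps.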
Brian,
How do you feel about doubling up with dad_transmits for that
part?
At least the switch/cache updating portion is the same
in both cases.
+-DLS
--
I don't really want to since this is bonding-specific behavior, and
we're not performing DAD for the address. This is just another sysfs

I don't think they are. In the DAD case we're probing for another node
that might have the address configured, in the bond failover case we
want to update the switch quickly so we don't drop packets.

-Brian
--
I think they really are the same case, and doing DAD
would solve the problem just as well. But I can hold my nose a
little bit and live with it. :-) Getting the problem solved is
more important than the details.
+-DLS
--
If I'm reading things correctly, DAD sends neighbor solicitations, and
we're sending neighbor advertisements, and not running the DAD logic.

As a semi-related question, what does IPv6 do if it receives a
gratuitous NA, and finds a duplicate?

I agree that doing DAD would update the switches, peers, etc, but, if
I'm reading the IPv6 code correctly, the delay between probes is one
second (nd_tbl.retrans_time), and it looks like there's an initial
delay of up to 1 second as well (in addrconf_dad_kick, the
rtr_solicit_delay). For failover purposes, we want to issue the
gratuitous ARP or NA packets immediately with a minimal delay between
probes.

Is my understanding of the DAD behavior correct?

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
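The timing argument above can be put into numbers. The sketch below
simply encodes the two delays as Jay reads them (an initial delay of
up to 1 second, then 1 second between probes); the constants are taken
from that reading and have not been checked against the kernel source,
and both function names and the gap_ms parameter are hypothetical.

```c
#include <assert.h>

/* Milliseconds, per the reading of the DAD code given above. */
#define DAD_INITIAL_DELAY_MAX_MS 1000u  /* random delay of up to 1 second */
#define DAD_RETRANS_MS           1000u  /* nd_tbl.retrans_time */

/* Latest time at which the n-th DAD probe (n >= 1) would go out. */
static unsigned int dad_probe_latest_ms(unsigned int n)
{
    assert(n >= 1);
    return DAD_INITIAL_DELAY_MAX_MS + (n - 1) * DAD_RETRANS_MS;
}

/*
 * Time at which the n-th unsolicited NA (n >= 1) would go out if the
 * first is sent immediately and later ones follow at gap_ms intervals.
 */
static unsigned int na_send_ms(unsigned int n, unsigned int gap_ms)
{
    assert(n >= 1);
    return (n - 1) * gap_ms;
}
```

Under these assumptions the first DAD probe could be delayed by as
much as dad_probe_latest_ms(1) = 1000 ms, while the first NA goes out
at na_send_ms(1, gap) = 0 ms regardless of the gap chosen.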
If a node has an IPv6 neighbor entry and receives an unsolicited NA it
will change its state to stale, forcing a re-lookup on the next

Right, and since a bond failover event should be rare, I think sending
the NA immediately is OK. All those random delays are meant to cover
the case, for example, when a router sends an advertisement with a new
prefix - you don't want everyone that receives it to do DAD immediately
since it could overwhelm the network.

-Brian
--
^^^^^^^^^^^^ You probably meant to say "solicited". Unsolicited NAs can only change the state to STALE. Also, the re-lookup will happen on a delay after the transmit. --
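To make the state handling concrete, here is a small sketch of how a
receiver updates a neighbor cache entry when an NA arrives, following
the rules in RFC 4861 section 7.2.5. It models the RFC text only, not
the Linux implementation, which may differ in detail; the type and
function names (nc_state, na_info, on_na) are invented for this
example.

```c
#include <assert.h>
#include <stdbool.h>

enum nc_state { NC_INCOMPLETE, NC_REACHABLE, NC_STALE, NC_DELAY, NC_PROBE };

struct na_info {
    bool solicited;      /* S flag */
    bool override;       /* O flag */
    bool has_lladdr;     /* Target Link-Layer Address option present */
    bool lladdr_differs; /* option present and differs from cached address */
};

/* Return the entry's new state after receiving the described NA. */
static enum nc_state on_na(enum nc_state cur, struct na_info na)
{
    /* A differing address can only come from a present option. */
    assert(!na.lladdr_differs || na.has_lladdr);

    if (cur == NC_INCOMPLETE) {
        if (!na.has_lladdr)
            return cur;             /* advertisement is discarded */
        return na.solicited ? NC_REACHABLE : NC_STALE;
    }

    /* No override and a different address: the cache is not updated. */
    if (!na.override && na.lladdr_differs)
        return cur == NC_REACHABLE ? NC_STALE : cur;

    /* Address accepted (override set, same address, or no option). */
    if (na.solicited)
        return NC_REACHABLE;
    if (na.lladdr_differs)
        return NC_STALE;            /* unsolicited, address changed */
    return cur;
}
```

In the failover case the peer typically holds a REACHABLE entry with
the old slave's MAC and receives an NA with S=0, O=1 and a different
link-layer address, so the entry moves to STALE. With S=0 the function
never returns REACHABLE unless the entry was already in that state,
which matches the point made above.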
I think Jay was asking about the case where the target address in the NA matches the address of the receiving interface. The RFCs don't describe how to handle such a case and leave it to the implementation. Linux logs a warning message and ignores such an NA. Thanks --
Yes, but in this case, you have duplicate addresses configured. This can happen when subnets merge and isn't really related to the bonding driver. In such a case, if we do an NS, it will trigger an NA and we'll end up logging such a warning. If we do an NA, the other end, if it's Linux, will log this warning, or deal with it in its own manner. So, it's really a draw. --
Looks good to me. This provides a nice base that we can build on. -vlad --
Hi Jay, Did you ever get a chance to look at this patch? Now that net-next is open I'd like to resubmit it. Thanks, -Brian --
I'm good with it; I'm merging it with a couple of other feature things and I'll post it with that set in a day or two when things open (unless I missed a mail, I don't think net-next is open as I type this). -J --- --
