Hi,

I agree, however there are further things that real ethernet interfaces do that mac-vlans, as virtual ethernet interfaces, currently don't. Unicast ethernet traffic sent out one mac-vlan interface with a destination address of another mac-vlan interface on the same host isn't delivered. mac-vlan interfaces, even though they're conceptually located on the same ethernet segment, are currently isolated from each other for unicast traffic.

Thinking about the few scenarios where I've been using mac-vlans, it seems to me that the fundamental goal of mac-vlans is to be functionally equivalent to having multiple physical ethernet network cards attached to the same LAN segment, with those physical interfaces attached together via a switch or a hub. Virtualising that scenario using mac-vlans and a single physical interface means that the outbound mac-vlan code on a host needs to replicate the functionality of a switch or a hub.

Outbound unicast traffic from one mac-vlan interface needs to be either seen by the destination address matching code of all other mac-vlan interfaces (i.e. replicating hub functionality), or selectively forwarded to the intended mac-vlan interface (replicating switch functionality). If the unicast destination doesn't match one of the other mac-vlan interfaces, then it should be forwarded out the mac-vlan interfaces' parent physical interface, as the destination is likely to be on the physical ethernet.

Blindly forwarding the outbound mac-vlan origin traffic out the physical interface doesn't solve the problem - switches (and I'm pretty sure, at least functionally, hubs) don't forward traffic back out the interface they receive it on. So the hub/switch forwarding functionality between mac-vlan interfaces and their parent physical interface needs to happen within the kernel.

I've been thinking of having a go at adding this functionality.
But then I realised that that would just be more duplication of the functionality of the code that is already in the kernel - the ...
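The switch-like unicast behaviour asked for above can be sketched in user-space C. This is only an illustration under stated assumptions, not code from the macvlan driver: `struct vif`, `unicast_egress()` and `PORT_PHYSICAL` are hypothetical names, and a linear scan stands in for the driver's MAC hash lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define PORT_PHYSICAL (-1) /* hypothetical marker: send out the parent NIC */

/* Hypothetical model of one mac-vlan interface: just its MAC address. */
struct vif {
    uint8_t mac[ETH_ALEN];
};

/*
 * Decide where a unicast frame sent by vifs[src] should go.
 * Returns the index of the other local interface whose MAC matches dst,
 * or PORT_PHYSICAL when no other local interface matches.
 */
int unicast_egress(const struct vif *vifs, int nvifs, int src,
                   const uint8_t dst[ETH_ALEN])
{
    assert(src >= 0 && src < nvifs);

    for (int i = 0; i < nvifs; i++) {
        if (i == src)
            continue; /* a switch never sends a frame back to its sender */
        if (memcmp(vifs[i].mac, dst, ETH_ALEN) == 0)
            return i; /* local delivery to the matching interface */
    }
    return PORT_PHYSICAL; /* destination is presumably on the wire */
}
```

With two interfaces, a frame from interface 0 addressed to interface 1's MAC yields 1, and a frame to any unknown MAC yields `PORT_PHYSICAL`.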
At least for my use, having them all blindly TX is fine. For thousands of interfaces, if you did this right and also delivered all broadcast packets locally (i.e., ARP), you would cause a lot of overhead, and unless you are running a patched kernel (or namespaces perhaps), you can't really communicate with yourself over the network using IP anyway. For the behaviour you want, try creating pairs of veth interfaces and adding one end of each pair to a bridge. Add a physical port to the bridge for egress. Since this can be done, I don't really see any reason to change mac-vlan significantly... If the veth/bridge thing doesn't work, then let us know, as I think that would be a bug. I use a similar-to-veth virtual-device pair in this way and it works fine. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
There is one scenario in which macvlans totally beat bridging veth devices. macvlans support the full set of stateless hardware offloads that the hardware supports, whereas veth devices support none of them. I don't think changing macvlans makes a lot of sense. Beyond the pain of making it work, there are the semantic differences of local broadcast working. Doing something so that bridges have roughly the same performance as macvlans would be very nice. I think it requires advertising most if not all stateless hardware offloads, and then implementing them in software on the endpoints that don't support them. I did get as far as implementing a first draft of looping packets back locally, and the behaviour differences for broadcast and multicast made macvlans a bad fit. For clean code, something like the bridge code, where you don't use the original interface directly for sending and receiving traffic, seems required. Eric --
On Sat, 07 Mar 2009 10:13:16 -0800 So then, my question is, what are mac-vlans for, i.e. what is their common use case? The problem I was trying to solve was to run up an arbitrary number of PPPoE servers on a single LAN segment. I could do that with physical interfaces, however I only had a maximum of 4 ethernet interfaces in the host. Using mac-vlans seemed to be the obvious way to eliminate the physical constraints of the host. I did expect though that the mac-vlan virtual interfaces would work the same as real interfaces, so I was expecting that I could bridge them and that unicast traffic between them would work. If bridged veth pairs are a solution to my problem, what would I use mac-vlans for? Ben seems to have a use case, however I know that he does network testing, so some of his use cases could be quite uncommon. I do some odd things with networking occasionally, so the mac-vlan behaviour / limitations might be useful, but I'm curious if there would be some more common uses? Regards, mark. --
Doesn't PPPoE always talk to an upstream box (the pppoe-server)? If that is so, why would the local mac-vlans ever need to communicate directly with each other? We've used PPPoE on mac-vlans, and it *seemed* to work, but perhaps we were missing something... I think they might also be useful for adding a more realistic 'virtual ip' to an interface, perhaps for interesting routing setups. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
Hi Ben, On Sun, 08 Mar 2009 09:54:02 -0700 That's true, and in that sense I was 'lucky' that I didn't encounter this limitation in my test environment, but only because my test environment (i.e. the protocol) didn't require unicast communication between the mac-vlan interfaces on the same host. If I'd been testing some other protocol that did require unicast communication between mac-vlan interfaces, I would have been scratching my head, wondering why things weren't working correctly, and may have spent a lot of time thinking that my test setup was the thing that had failed, rather than the failure being caused by missing functionality that I assumed was there. I think the "truth in advertising" or "principle of least surprise" should hold - if mac-vlans are to be seen by their users as virtual ethernet network cards, then they should function no differently to real network cards, or alternatively, if they aren't going to function that way, there needs to be quite obvious documentation somewhere (other than this thread) about their limitations, and a corresponding recommendation that you use bridged veth interfaces if you can't afford --
I agree on most points. There is one fundamental operational difference however. With macvlan, all MAC addresses are known and therefore can be programmed as secondary unicast addresses, while a bridge always uses promiscuous mode and needs to flood frames for unknown addresses. This could be changed in the bridging code of course for bridges consisting purely of local devices. Most of the bridging stuff isn't needed for macvlans though, so it's probably easier to simply perform a lookup for local devices in macvlan on transmit, similar to what is done on reception. --
What I haven't figured out is how you handle the transmit path for
broadcast and multicast ethernet traffic. How do you test to see if
you have already performed local transmission?
For discussion but not for application because it is incomplete:
This is what I came up with when I played with getting the local
transmission case working the other day.
From 15e4a58ae0cea86338ef9d73ae14ba32e4819f5a Mon Sep 17 00:00:00 2001
From: Eric Biederman <ebiederm@xmission.com>
Date: Thu, 5 Mar 2009 07:46:10 -0800
Subject: [PATCH] macvlan: Reflect macvlan packets meant for other macvlan devices
Switch ports do not send packets back out the same port they came
in on. This causes problems when using a macvlan device inside
of a network namespace as it becomes impossible to talk to
other macvlan devices.
Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
---
drivers/net/macvlan.c | 92 ++++++++++++++++++++++++++++++++++++-------------
1 files changed, 68 insertions(+), 24 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b5241fc..eb2539f 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -29,6 +29,7 @@
#include <linux/if_link.h>
#include <linux/if_macvlan.h>
#include <net/rtnetlink.h>
+#include <net/xfrm.h>
#define MACVLAN_HASH_SIZE (1 << BITS_PER_BYTE)
@@ -61,7 +62,8 @@ static struct macvlan_dev *macvlan_hash_lookup(const struct macvlan_port *port,
}
static void macvlan_broadcast(struct sk_buff *skb,
- const struct macvlan_port *port)
+ const struct macvlan_port *port,
+ struct net_device *src)
{
const struct ethhdr *eth = eth_hdr(skb);
const struct macvlan_dev *vlan;
@@ -77,6 +79,9 @@ static void macvlan_broadcast(struct sk_buff *skb,
hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
dev = vlan->dev;
+ if (dev == src)
+ continue;
+
nskb = skb_clone(skb, GFP_ATOMIC);
if (nskb == NULL) {
dev->stats.rx_errors++;
@@ -99,20 ...

I'm not sure I understand the problem. What's wrong with doing the same as on transmit, i.e.:
- for multicast/broadcast, deliver everywhere (except self)
- for unicast, deliver to matching local macvlan device or

Pretty much like this :) --
Yes. There are two tricky parts. One problem is that macvlans and the primary hardware device share the same transmit queue. So when I have a broadcast packet on the primary device's queue I don't know if I have already sent it out to the macvlan devices or not. The second problem is that when I transmit a multicast packet and I have a local listener, I believe replicating the packet at both the IP layer and the ethernet layer will result in receiving the packet locally twice. I'm not certain we need to solve the second problem, as having two physical interfaces plugged into a switch will have the same problem. The first problem is all about how we deliver packets everywhere except self. Eric --
So it's about receiving packets on macvlan when transmitting on the real device? That sounds like a really hard problem that would probably indeed be better solved by a bridge. If it's just about whether the packet should be sent out by macvlan to the wire as well, I'd say yes, since that's what two real devices would do. I think "except self" should just mean "not to the originating virtual device". --
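To make "deliver everywhere except self" concrete, here is a small user-space sketch. It is an assumption-laden model, not driver code: `broadcast_targets()` and `WIRE_BIT` are hypothetical, local macvlan devices are identified by index, and the result is a bitmask of who should get a copy of a broadcast or multicast frame.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bit 31 means "also transmit on the physical NIC". */
#define WIRE_BIT (1u << 31)

/*
 * Return a bitmask of recipients for a broadcast sent by local device src.
 * Bit i set means local device i receives a clone of the frame.
 * The originating device is skipped; the wire is always included.
 */
uint32_t broadcast_targets(int nvifs, int src)
{
    assert(nvifs >= 0 && nvifs <= 31);
    assert(src >= 0 && src < nvifs);

    uint32_t mask = 0;
    for (int i = 0; i < nvifs; i++) {
        if (i != src) /* "except self": not to the originating device */
            mask |= 1u << i;
    }
    return mask | WIRE_BIT;
}
```

This mirrors the `if (dev == src) continue;` check in the draft patch, with the wire treated as one more port that always gets the frame.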
Yes. My concern is that if we hook the real device we will software broadcast packets twice. Now that I think about it, we could call ndo_start_xmit directly from the macvlan code and bypass whatever hook we use to intercept packets going out the normal device; it should not be too difficult. Operationally it would be very nice if ARP worked between a macvlan and the real device. Eric --
It would also require an additional hook in the networking core. We don't intercept packets on TX; they have to be explicitly delivered to macvlan. --
It might suck for performance, but mac-vlan could register an 'ALL' protocol on the physical dev, similar to tcpdump, to grab pkts on tx and pass the ones it cares about back up to the vlans? I'd want run-time control to disable any of these costly options for those that don't need it, however. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
I like that idea, at least for prototyping. If well implemented it should not be more expensive than the ingress path, where we already do that, unless your traffic is highly asymmetric. Eric --
Well, the ingress path isn't free, and especially for broadcast pkts it is quite expensive with large numbers of devices. In a single namespace implementation, there are very few uses for having two NICs on the same system able to send to each other since an un-patched kernel will not do IPv4 traffic between two external ports, and multicast loops back in software already. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
If you want a local listener to see the packet you have to set IP_MULTICAST_LOOP. A packet from a local source address coming off the "wire" will be dropped in fib_validate_source(), it would have to come over lo. I'm not sure how that relates to how macvlan works, just something I've run into before (2 nics in same subnet, listener on one, sender on other, IP_MULTICAST_LOOP=0, no packets). -Brian --
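For reference, the socket option mentioned above is set per sending socket with setsockopt(). The sketch below targets Linux and IPv4 UDP sockets; `set_mcast_loop()` and `get_mcast_loop()` are hypothetical helper names, while `IPPROTO_IP` and `IP_MULTICAST_LOOP` are the standard constants.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h> /* close(), for callers */

/*
 * Enable (on != 0) or disable (on == 0) local loopback of multicast
 * datagrams sent on fd. Returns 0 on success, -1 on error (errno set).
 */
int set_mcast_loop(int fd, int on)
{
    assert(fd >= 0);
    int val = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, sizeof(val));
}

/* Read the current setting back: 0 or 1, or -1 on error. */
int get_mcast_loop(int fd)
{
    assert(fd >= 0);
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, &len) < 0)
        return -1;
    return val ? 1 : 0;
}
```

A sender that wants listeners on the same host to see its packets would call `set_mcast_loop(fd, 1)` on the socket it sends from; the option only affects that socket's outgoing multicast.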
A flag could be added to the skb so that we know it originated from a mac-vlan. That shouldn't require any extra hooks but just an extra check in the mac-vlan rx code to drop any pkt received from the underlying NIC with this flag set. For broadcasting (or unicasting) to other mac-vlans or the underlying physical device, the mac-vlan tx logic could check for local delivery before telling the lower-level NIC to transmit the pkt. Since we already have a mac hash, we could probably key off of the dest MAC fairly easily. For broadcast, it would be a flood. We could also add a flag to mac-vlan tx logic to be clever and only send ARP to the mac-vlans likely to care. This might not be a good filter for all possible cases, but for general cases, and thousands of mac-vlans, it would save a lot of work cheaply. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
