Re: Network Device Naming mechanism and policy

From: Matt Domsch
Date: Tuesday, March 24, 2009 - 8:46 am

You may recall http://lkml.org/lkml/2006/9/29/268, wherein I described
network device enumeration and naming challenges, and several possible
fixes.  Of these, Fix #1 (fix the PCI device list to be sorted
breadth-first) has been implemented in the kernel, and Fix #3 (system
board routing rules) has been implemented on Dell PowerEdge 10G and
11G servers (11G begins selling RSN).

However, these have not been completely satisfactory.  In particular,
it keeps getting harder and harder to route PCI-Express lanes to
guarantee the same ordering between a depth-first and breadth-first
walk, and, it turns out, that isn't sufficient anyhow.


Problem:  Users expect on-motherboard NICs to be named eth0..ethN.  This can be difficult to achieve.

Ethernet device names are initially assigned by the kernel, and may be
changed by udev or nameif in userspace.  The initial name assigned by
the kernel is in monotonically increasing order, starting with eth0.
In this instance, the enumeration directly leads to an assigned name.
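
To make the dependence on enumeration order concrete, here is a minimal Python sketch (a toy model, not kernel code) that hands each device the lowest unused ethN name as it registers. The function name and the PCI addresses are made up for illustration.

```python
def assign_names(probe_order, prefix="eth"):
    """Give each device the lowest unused <prefix>N name, in registration order."""
    names = {}
    used = set()
    for device in probe_order:
        n = 0
        while f"{prefix}{n}" in used:
            n += 1
        name = f"{prefix}{n}"
        used.add(name)
        names[device] = name
    return names


# Hypothetical PCI addresses: one onboard NIC and one add-in card.
onboard, addin = "0000:02:00.0", "0000:05:00.0"

# The same hardware gets different names depending on which device
# happens to register first.
onboard_first = assign_names([onboard, addin])
addin_first = assign_names([addin, onboard])
print(onboard_first)
print(addin_first)
```

Nothing in the model looks at what the device is; only the order of arrival matters, which is why the complications below all come down to ordering.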

Complications:

1) Devices are discovered, and presented to the kernel for name
   assignment, based on several factors:

   a) the kernel hotplug mechanism emits events for udev to catch, to
      load the appropriate driver for a given device.  The kernel
      emits these events in some ordering, tied to the depth-first PCI
      bus walk.  Therefore the order in which userspace catches these
      events and starts to load a given device driver is tied to the
      depth-first bus walk.  There is no guarantee within PCI-Express
      hardware topology of any ordering to the discovery of devices.

      To ease this complication, SMBIOS 2.6 includes a mechanism for
      BIOS to specify its expected ordering of devices, for naming
      purposes.  Tools such as biosdevname use this information.


   b) udev may run modprobes in parallel.  It guarantees that the
      events and modprobes are begun in order, but makes no guarantee
      that one event's modprobe ...
From: Patrick McHardy
Date: Tuesday, March 24, 2009 - 9:21 am

I would classify this as a bug, especially the fact that udev doesn't
undo a failed rename, so you end up with ethX_rename. Virtual devices
using the same MAC address trigger this reliably unless you add
exceptions to the udev rules.

You state that it only operates on one device at a time. If that is
correct, I'm not sure why the _rename suffix is used at all instead
of simply trying to assign the final name, which would avoid this
problem.
--

From: Kay Sievers
Date: Tuesday, March 24, 2009 - 9:28 am

This is handled in most cases. Virtual interfaces claiming a
configured name and created before the "hardware" interface are not

How? The kernel assigns the names and the configured names may
conflict. So you possibly can not rename a device to the target name
when its name is already taken. I don't see how to avoid this.

Thanks,
Kay
--

From: Patrick McHardy
Date: Tuesday, March 24, 2009 - 9:38 am

I don't remember the exact circumstances, but I've seen it quite a few

Sure, you can't rename it when the name is taken. But what udev
apparently does when renaming a device is:

- rename eth0 to eth0_rename
- rename eth0_rename to eth2
- rename returns -EEXIST: udev keeps eth0_rename

What it could do is:

- rename eth0 to eth2
- rename returns -EEXIST: device at least still has a proper name

Alternatively it should unroll the rename and hope that the
old name is still free. But I don't see why the _rename step
would do any good, assuming only a single device is handled at
a time, it can't prevent clashes.
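
The three behaviours discussed here (temporary name without rollback, direct rename, and rename with unroll) can be compared in a small Python simulation. This is only a sketch under assumptions: the `Interfaces` class is a toy stand-in for the kernel's name table, and none of the function names correspond to real udev code.

```python
class Interfaces:
    """Toy model of the set of interface names currently in use."""

    def __init__(self, names):
        self.names = set(names)

    def rename(self, old, new):
        if new in self.names:
            # Stands in for the kernel refusing the rename with -EEXIST.
            raise FileExistsError(new)
        self.names.remove(old)
        self.names.add(new)


def rename_via_temp(ifaces, old, new):
    """old -> old_rename -> new, with no rollback on failure."""
    temp = old + "_rename"
    ifaces.rename(old, temp)
    try:
        ifaces.rename(temp, new)
        return new
    except FileExistsError:
        return temp


def rename_direct(ifaces, old, new):
    """Try the final name straight away; keep the old name on failure."""
    try:
        ifaces.rename(old, new)
        return new
    except FileExistsError:
        return old


def rename_with_rollback(ifaces, old, new):
    """Like rename_via_temp, but try to restore the old name on failure."""
    temp = old + "_rename"
    ifaces.rename(old, temp)
    try:
        ifaces.rename(temp, new)
        return new
    except FileExistsError:
        try:
            ifaces.rename(temp, old)
            return old
        except FileExistsError:
            return temp  # old name was taken in the meantime


# eth2 already exists, so the request eth0 -> eth2 must fail.
results = {
    f.__name__: f(Interfaces({"eth0", "eth2"}), "eth0", "eth2")
    for f in (rename_via_temp, rename_direct, rename_with_rollback)
}
print(results)
```

In this single-device collision only the first strategy leaves the interface with a `_rename` name; the other two end with the device still called eth0.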
--

From: Dan Williams
Date: Tuesday, March 24, 2009 - 9:40 am

Any particular reason the MAC addresses are the same?  This came up a
while ago with the 'dnet' device in the thread "Dave DNET ethernet
controller".

If the MAC address isn't a UUID for the device, then *what* is?

If there isn't one, then certainly udev can't be blamed for getting
ordering or names wrong, because there's nothing to use to actually
match up the device to a name, uniquely.  Note that combinations
including bus IDs or device positions in the bus don't work for any type
of hotplug case, because you can plug another adapter into the same
location but it's a different adapter.

Either people want (a) a name assigned to a specific device (which
implies a UUID like a MAC address stored on that device somewhere
accessible to the driver at plug/boot time), or they want (b) to assign
a name to a *position* on the PCI or USB or firewire or whatever bus, or
they (c) don't care about this at all.

The answer is really 'all of the above'.  Most of the people Matt cares
about are probably in the (b) camp.  But most desktop/laptop users are
in the (a) camp because they use hotplug so much.


--

From: Alan Cox
Date: Tuesday, March 24, 2009 - 10:00 am

> If the MAC address isn't a UUID for the device, then *what* is?

MAC is technically per system if desired (eg old Sun boxes) and that is
quite valid by IEEE 802.3. In that case you need MAC + topology.

If you are running DECnet your system runs on assigned MAC addresses so
you also have to be careful to use the EPROM MAC (if one exists which is

I'd argue the fundamental problem is that I can do this

	ln -s /dev/sda /dev/thebigdiskunderthefridge

but cannot ln -s /dev/eth0 /dev/ethernet/slot0

and the SIOCGIF/SIF BSD style ioctl interface doesn't do pathnames or
file handles of network devices.

Anyone feel up to putting all the network devices into dev space and
fixing the ioctls ;)
--

From: Patrick McHardy
Date: Tuesday, March 24, 2009 - 10:04 am

Sometimes (I was referring to virtual devices) there may not be

I agree that udev can't do anything useful in that case. I would
prefer it if it wouldn't even try though instead of messing with the
names and leaving a bunch of _rename devices around. Sure, I can
add a rule to disable it, but that shouldn't be necessary.

Generally, I'm wondering whether it should touch virtual network
devices at all since the MAC addresses are often not persistent,
sometimes not unique and the name might have already been chosen
explicitly by the administrator when creating the device.

Currently there are some rules to ignore a couple of known virtual
device types. Are there actually cases where renaming virtual
devices is desired? Otherwise a more future-proof way than
blacklisting each type individually would be to add some attribute
informing udev that the device has no unique key and should be
ignored.

--

From: david
Date: Tuesday, March 24, 2009 - 11:51 am

I have seen systems (I think they were Sun boxes) where the _machine_ had 
a MAC address, and it used that same MAC on all interfaces.

this is convenient for some things, but not for others.

what's unique and reproducible is the discovery order

David Lang
--

From: Alan Cox
Date: Tuesday, March 24, 2009 - 2:02 pm

Not in the case of things like USB...
--

From: Greg KH
Date: Tuesday, March 24, 2009 - 4:14 pm

Or even PCI.

/me pats his laptop that reassigns PCI device ids randomly every 3rd or so boot.
--

From: Scott James Remnant
Date: Tuesday, March 24, 2009 - 10:02 am

Also bear in mind that a module completing init() does not necessarily
mean that the interfaces have been created.  If the driver requires
firmware, it will call out to userspace, and may not register the
interface until well afterwards.

One could even construct a pathological case where only a virtual device
was registered, and userspace was required to add logical interfaces

Well, the obvious fix to this is to make sure the names are always

Actually udev handles this by using a temporary name.  When renaming
eth0->eth1 it actually uses an intermediate name first.  This allows it
to simultaneously swap eth0<->eth1 since one unblocks the other
(actually both unblock each other).
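
Why the intermediate name makes a swap possible can be sketched as follows. This is a deliberately simplified, sequential Python model: real udev handles each device in its own event, so the two-phase loop and the function name `rename_all` are assumptions made for illustration only.

```python
def rename_all(current, wanted):
    """Rename several interfaces via temporary names.

    current: names in use; wanted: dict mapping old name -> desired name.
    Returns a dict mapping each old name to the name it ended up with.
    """
    names = set(current)

    def rename(old, new):
        if new in names:
            raise FileExistsError(new)
        names.remove(old)
        names.add(new)

    # Phase 1: park every device under a temporary name, freeing its old one.
    parked = []
    for old, new in wanted.items():
        temp = old + "_rename"
        rename(old, temp)
        parked.append((old, temp, new))

    # Phase 2: claim the final names; a loser keeps its visible temp name.
    final = {}
    for old, temp, new in parked:
        try:
            rename(temp, new)
            final[old] = new
        except FileExistsError:
            final[old] = temp
    return final


# A swap works because both old names are free before either is claimed.
swap = rename_all({"eth0", "eth1"}, {"eth0": "eth1", "eth1": "eth0"})

# Two devices wanting the same name: one is left with a "_rename" name.
clash = rename_all({"eth0", "eth1"}, {"eth0": "eth2", "eth1": "eth2"})
print(swap)
print(clash)
```

A direct eth0->eth1 rename would fail in the swap case, because eth1 is still in use at the moment of the request.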

There is a failure case where two devices both end up trying to get the
same name, in which case one will lock with a "_rename" name.  There was
an early debate in Ubuntu when we first wrote this code about using
later names (eth2, eth3, etc.) but we realised that just hides the
problem (and it happens again if you plug in a pccard or something that
wants eth2).

Since this is always a bug, making the problem visible was a "good

While this works for PCI slots, it already doesn't scale to other buses.
For example what slot number is the pccard slot?  If you have two
different pccard devices, would they get assigned the same name (udev
currently assigns them different names).

Now consider USB.  Would the device name change depending on which USB
port you plugged it into?  Or is USB just a single slot, in which case
what happens when you have two USB ethernet devices?

The Apple USB Ethernet device in my iPhone is not the USB Wireless
adapter I own, both have very different networking configurations.

I quite liked the idea of /dev/eth0, then we could just use symlinks.

Scott
-- 
Scott James Remnant
scott@ubuntu.com
From: Matt Domsch
Date: Tuesday, March 24, 2009 - 10:52 am

Actually, biosdevname handles this already, using eth_pccard_X.Y where

We would obviously need a solution.  eth_usb_{something} perhaps.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: Bill Nottingham
Date: Tuesday, March 24, 2009 - 11:12 am

Right, but having biosdevname chase each new bus that comes along
sounds iffy. I'd prefer /dev/net/by-name symlinks, if at all
possible. But that's a lot of code that I'm not prepared to write.

Bill
--

From: Scott James Remnant
Date: Tuesday, March 24, 2009 - 11:20 am

Not to mention that All The World Is Not x86

Scott
-- 
Scott James Remnant
scott@ubuntu.com
From: Karl O. Pinc
Date: Tuesday, March 24, 2009 - 9:42 am

My thoughts on the subject; from someone who is not
particularly qualified to have opinions.

Reading over your post, I searched for a single sentence describing
the problem you're trying to solve.  What I came up with was
this:


Perhaps a little magic in the udev rule that creates the
z70_persistent-net-rules file would solve the basic problem.
It could sort the nics by mac address when creating the
names.  It need only run when the z70 file does not exist.
I presume this would produce consistent results in most cases
and it feels technically feasible; although I am not
fully qualified to make that judgment.
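
As a rough illustration of the MAC-sorting idea, the following Python sketch orders a set of addresses numerically and numbers them from zero. It is not how udev generates its persistent-net rules; the function name and the example addresses (taken from the IANA documentation range) are assumptions.

```python
def names_by_mac(macs, prefix="eth"):
    """Map each MAC address to <prefix>N, ordered by numeric MAC value."""

    def key(mac):
        # Compare the raw six bytes rather than the text form.
        return bytes.fromhex(mac.replace(":", "").replace("-", ""))

    ordered = sorted(macs, key=key)
    return {mac: f"{prefix}{i}" for i, mac in enumerate(ordered)}


# Example addresses only; the discovery order is deliberately scrambled.
discovered = ["00:00:5e:00:53:0a", "00:00:5e:00:53:02", "00:00:5e:00:53:03"]
mac_names = names_by_mac(discovered)
print(mac_names)
```

The result is stable for a fixed set of cards, but it relies on every MAC being unique and it changes whenever a card is added or replaced.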

Rather than put the onus on udev to make the above
change, Dell could just run a little program at first
boot that munges the z70 file as desired.  (It could then
force a reboot; I forget if this would be needed.)
I imagine Dell boots the boxes once at the factory,
but if not then the user has to suffer with a longer
boot process at first boot.  Because this is driven
by Dell, Dell would know exactly what nic has what
name.  And Dell knows what nics are on the mobo and
what are not, and so can control the mac address sort
order as desired.

The other solution that screams out at me is to ditch
those legacy BIOSes and go to something like LinuxBIOS.
Again, I'm not really qualified, but it sure feels like
there's an answer in this approach.

The other point that struck me was that sometimes, it seems,
users want persistence in the naming of their network devices
and sometimes they want device names based on bus position.

The sucky thing is that symlinks and nics don't mix well
and so it seems impossible to satisfy both the above
requirements at the same time.  This is an area that
IMHO could be better addressed by the Linux community.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein
--

From: Matt Domsch
Date: Tuesday, March 24, 2009 - 10:45 am

Nearly all Dell systems running Linux in the world were not
factory-installed with that OS.  This isn't something I can simply
patch in our factories.  It needs to be fixed as far upstream as

Well, there is no "MAC address sort" anywhere.  (Nor is that really a

It's not a BIOS problem.  BIOS can inform the OS of what it thinks
about hardware location, names, etc.  And our PowerEdge (9G and newer)
servers do - using SMBIOS 2.6 standard features we added (types 9, 10,
and 41) to the specification - exactly to allow such.  Now something
needs to use that information.  That something today is biosdevname,


Correct.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: david
Date: Tuesday, March 24, 2009 - 11:49 am

I dispute this statement.

I have several hundred servers that have the on-motherboard NICs as the 
last ones.

anyone who's been making the assumption you describe will have been 
running into problems for many years.


not everyone uses udev. I compile the necessary drivers into the kernel 


this approach causes serious problems in a few cases, including

1. a NIC goes bad and you replace it. now all the configs change

2. you reinstall a box and its interface names change.

David Lang
--

From: Matt Domsch
Date: Tuesday, March 24, 2009 - 12:22 pm

I agree it's not a valid assumption.

People seem to want two things with names:
1) that devices be named deterministically
2) that the determinism doesn't change on a per-platform or
   per-configuration-of-a-platform basis.

This tends to mean they want the onboard devices named first, then the
add-in devices named.  But not necessarily.  I would hope to have a
deterministic naming method that would work for most people by

Right.  These cases are only deterministic because they start from a
known state; change or remove that state, and you're back to
non-deterministic.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: David Miller
Date: Tuesday, March 24, 2009 - 3:57 pm

From: Matt Domsch <Matt_Domsch@dell.com>

I learned a long time ago that eth0 et al. have zero meaning.

If the system firmware folks gave us topology information with respect
to these things, we could export something that tools such as
NetworkManager, iproute2, etc. could use.

For example, if we were told that PCI device "domain:bus:dev:fn" has
string label "Onboard Ethernet 0" then we could present that to the
user.

Changing how the actual network device name is determined is going to
have zero traction.

So, please, put mapping tables into the ACPI or similar and then
programs can go:

	for_each_network_device(name) {
		fd = open(name);
		label = get_system_label(fd, name);
		present_to_user(label, name);
	}

This "get_system_label()" thing can be an ethtool ioctl, some
rtnetlink call, or similar.  In the kernel, a generic routine would
exist for major bus types to make the mapping translation, and drivers
would call these.
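
One way to picture the loop above from userspace is the Python sketch below. It assumes a firmware-provided table mapping PCI addresses to labels, which is exactly the piece that does not exist yet; the `labels` dictionary and both function names are hypothetical. The only real interface used is the `device` symlink under `/sys/class/net/<name>/`. The demo builds a fake sysfs tree in a temporary directory so it runs anywhere.

```python
import os
import tempfile


def bus_address(ifname, sysfs_root="/sys"):
    """Return the bus address behind ifname, or None for virtual devices."""
    link = os.path.join(sysfs_root, "class", "net", ifname, "device")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def labelled_interfaces(labels, sysfs_root="/sys"):
    """Map each interface name to its firmware label, if one is known."""
    net_dir = os.path.join(sysfs_root, "class", "net")
    result = {}
    for ifname in sorted(os.listdir(net_dir)):
        addr = bus_address(ifname, sysfs_root)
        # Fall back to the kernel name when there is no label.
        result[ifname] = labels.get(addr, ifname)
    return result


# Hypothetical firmware table: PCI address -> human-readable label.
labels = {"0000:02:00.0": "Onboard Ethernet 0"}

with tempfile.TemporaryDirectory() as root:
    # Fake sysfs: eth0 is backed by a PCI device, lo is virtual.
    eth0 = os.path.join(root, "class", "net", "eth0")
    os.makedirs(eth0)
    os.makedirs(os.path.join(root, "class", "net", "lo"))
    os.symlink("../../../0000:02:00.0", os.path.join(eth0, "device"))
    demo = labelled_interfaces(labels, sysfs_root=root)

print(demo)
```

The kernel names stay untouched; the label is only an extra attribute shown next to them, which is the point of the proposal.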

For PCI it might take the PCI device pointer and try to fish
out a string from the ACPI layer.

For OpenFirmware we might just simply give the full device path,
or a matching device alias name.

That's the only model which allows a smooth transition and
no major infrastructure changes.

I guess it's easier to spew about MAC addresses and other
irrelevant topics than try to solve this problem properly. :-)
--

From: Chris Friesen
Date: Wednesday, March 25, 2009 - 1:22 pm

What about things like USB network adapters where the topology is not 
fixed?  Presumably we would want to use some sort of unique identifier, 
and the MAC comes to mind.  Of course, then you run into the problem of 
how to deal with duplicate MACs.

Chris
--

From: Dan Williams
Date: Thursday, March 26, 2009 - 1:17 pm

USB devices do have a serial number field in the descriptors, but that
only sometimes gets populated with sensible values.  More often than not
it's just zeros.  But worth checking if the MAC isn't set yet.

Dan


--

From: Matt Domsch
Date: Thursday, March 26, 2009 - 9:39 am

Your wish is my command.  DMTF SMBIOS 2.6 specification
http://www.dmtf.org/standards/smbios/ contains changes which provide
this for PCI devices.

Specifically, Type 9 ("System Slots") was extended to include the PCI
domain/bus/device/function for each slot.  Type 10 ("On Board Devices
Information") could not be extended, thus it was deprecated, and new
Type 41 ("Onboard Devices Extended Information") was created to be
extensible and now includes PCI domain/bus/device/function
information.  Both Type 9 and Type 41 include a String field which
hopefully has a more descriptive value, such as "Onboard Ethernet
Broadcom 5808 NIC 1" in the case of some Dell servers.
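
For readers who want to see what a Type 41 record carries, here is a minimal Python decoder. It is a sketch that assumes the field layout of the SMBIOS 2.6 Type 41 structure (reference designation string index, device type, instance, segment group, bus, device/function); the sample record and its label "Embedded NIC 1" are fabricated for the demo, and the device-type table is deliberately partial.

```python
import struct

# Partial list of Type 41 device type codes (bits 6:0 of the type byte).
DEVICE_TYPES = {0x03: "Video", 0x05: "Ethernet"}


def parse_type41(record):
    """Decode one SMBIOS Type 41 structure (formatted area plus strings)."""
    rtype, length, _handle = struct.unpack_from("<BBH", record, 0)
    if rtype != 41 or length < 0x0B:
        raise ValueError("not a Type 41 structure")
    ref_idx, dev_type, instance, segment, bus, devfn = struct.unpack_from(
        "<BBBHBB", record, 4
    )
    # The string set follows the formatted area; indices are 1-based.
    strings = record[length:].split(b"\x00")
    label = strings[ref_idx - 1].decode("ascii") if ref_idx else ""
    device = devfn >> 3      # bits 7:3
    function = devfn & 0x7   # bits 2:0
    return {
        "label": label,
        "enabled": bool(dev_type & 0x80),
        "type": DEVICE_TYPES.get(dev_type & 0x7F, "other"),
        "instance": instance,
        "address": f"{segment:04x}:{bus:02x}:{device:02x}.{function}",
    }


# Fabricated record: enabled Ethernet device, instance 1, at 0000:02:00.0.
sample = struct.pack(
    "<BBHBBBHBB", 41, 0x0B, 0x2900, 1, 0x80 | 0x05, 1, 0x0000, 0x02, 0x00
) + b"Embedded NIC 1\x00\x00"

info = parse_type41(sample)
print(info)
```

The decoded address is what lets a tool tie the BIOS string and instance number to a specific PCI device.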

Shipping Dell 10G (and very soon 11G) server BIOS includes this
information.  biosdevname can use this to report device names.  Some
HP systems have a vendor-specific SMBIOS extension to provide a

While I'd be happy for NetworkManager to present these SMBIOS-provided
human-parsable names when available, the names aren't terribly
meaningful in a programmatic fashion.  The users I've encountered are
looking for a programmatic way to say:

  The first LOM is my management/admin NIC.  The second LOM is my bulk
  traffic NIC.  The first add-in card is my backup NIC.

meaning we still need a translation from "how I want to use a NIC" to
"which NIC should I plug the cable into".  The SMBIOS names don't
completely solve this.

Hence my desire of having a way to have multiple alternate names for
the same interface.  One such name would be the full SMBIOS string.
Another would be a bus topology name. A third could be a "how do I use
it" name.  Analogous to devices represented in /dev using symlinks for
these other names.  I don't care if it's symlinks in /dev or some
other mechanism.

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: Dan Williams
Date: Thursday, March 26, 2009 - 1:16 pm

nm-applet could support some sort of "named" adapters, though I'd rather
have this done with udev rules (or something like that) so that the
NIC's common name would be consistent in both the CLI and in the GUI.

The only reason nm-applet does what it does now (pulling VID/PID and
dropping stupid words like "Corporation") is so the user has *some* clue
what NIC they are about to touch; using "eth0" and "eth1" and "eth2"
isn't very helpful.  But the distinction between "Intel Gigabit
Ethernet" and "D-Link 10/100 USB Adapter" is quite a bit easier to grasp
at a glance.


--

From: Len Brown
Date: Friday, March 27, 2009 - 9:06 am

ACPI added _PLD (Physical Device Location) back in 3.0, ISTR.
However, searching my archives, I have yet to see a single instance
of its use in the field.

ACPI also supplies the slot number stuff, which is exported via 
the existing pci_slot driver.

cheers,
Len Brown, Intel Open Source Technology Center
--

From: Matt Domsch
Date: Thursday, April 9, 2009 - 7:58 am

David, would you be opposed to the additional device names being done
as device nodes in userspace, as several people suggested?

/sys/devices/*/net/ifindex already exports the netlink device index.
It would be trivial to add a /sys/devices/*/net/dev file, with
<major>:<minor> for a device, where <minor> = ifindex.

udev could then maintain /dev/net/by-{mac,path,...} as symlinks
to /dev/net/$kernelname.

Tools such as iproute's 'ip' could then be extended to look up their
'dev' argument by /dev path, resolve the symlink to name, get the device node, and
open the socket with the minor number / index (as normal).
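
The lookup step a tool would perform can be sketched in Python. Note that `/dev/net/$kernelname` nodes and the `by-mac` directory are the proposal in this message, not an existing interface, and `resolve_netdev` is a hypothetical helper. The demo uses a temporary directory and a plain file in place of a device node.

```python
import os
import tempfile


def resolve_netdev(arg, dev_net="/dev/net"):
    """Turn a 'dev' argument into a kernel interface name.

    A bare name such as "eth0" is returned unchanged; a path is followed
    through any symlinks to the node it points at under dev_net.
    """
    if os.sep not in arg:
        return arg
    target = os.path.realpath(arg)
    if os.path.dirname(target) != os.path.realpath(dev_net):
        raise ValueError(f"{arg} does not point into {dev_net}")
    return os.path.basename(target)


with tempfile.TemporaryDirectory() as root:
    dev_net = os.path.join(root, "dev", "net")
    os.makedirs(os.path.join(dev_net, "by-mac"))
    # A plain file stands in for the proposed per-interface device node.
    open(os.path.join(dev_net, "eth0"), "w").close()
    link = os.path.join(dev_net, "by-mac", "00:00:5e:00:53:01")
    os.symlink(os.path.join("..", "eth0"), link)

    resolved = resolve_netdev(link, dev_net)
    unchanged = resolve_netdev("eth1", dev_net)

print(resolved, unchanged)
```

Once the kernel name is known, the tool can continue exactly as it does today, so existing name-based interfaces would not need to change.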

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
--

From: Kurt Van Dijck
Date: Tuesday, March 31, 2009 - 7:07 am

My idea as a user, having configured some servers:


From the kernel's point of view, there should be no preference. If users

The problem here is the monotonic increasing order. I never rename ethX
back to the monotonic ethX numbering. IMHO, renaming eth0 to eth1 sounds
redundant.
I rename ethX to lan, wan, wlan, remote, lan0, lan1, ...

--
