Re: (Semi-random) thoughts on device tree structure and devfs

From: Masao Uebayashi
Date: Sunday, March 7, 2010 - 2:43 am

I've been spending LOTS of time investigating various device sources to
understand some questions I've had, like:

- Why does NetBSD/arm have no bus_space_mmap(9)?
- Why is tty locking messy?
- Why does sys/dev/wscons have so many #ifdef's?  (Module-unfriendly!)

After absorbing myself in this for 3 days now, I think I've figured out almost
all of the problems I've had and how I can fix them.  Before going directly to
the answer, let me summarize the problems I've found:

*

a) Device enumeration is unstable / unpredictable

dk(4) is a pseudo device, and its instances are numbered in the order they're
created.  This is fine when you manually / explicitly add wedges(4) by using
"dkctl addwedge".  It is not fine if I have a gpt(4) disk label which has
ordered partitions.  I expect disks to be created in the order I write them in
the gpt(4) disk label.  It's annoying that the numbering changes when I add a
new disk.  Same for raidframe(4).
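To make the complaint concrete: if scripts refer to a wedge by its GPT partition name instead of its dkN number, creation order stops mattering. The sketch below picks the dk unit out of "dkctl <disk> listwedges" output. The line format it parses ("dkN: NAME, SIZE blocks at OFFSET, type: TYPE") is an assumption about that output, and the function name wedge_by_name is made up for illustration.

```sh
# wedge_by_name NAME
# Reads "dkctl <disk> listwedges"-style text on stdin and prints the dkN
# unit whose wedge name equals NAME; exits 1 if there is none.
# ASSUMPTION: wedge lines look like "dkN: NAME, SIZE blocks at OFF, type: T".
wedge_by_name() {
	awk -v want="$1" '
		/^dk[0-9]+: / {
			unit = $1
			sub(/:$/, "", unit)                      # "dk1:" -> "dk1"
			name = $0
			sub(/^dk[0-9]+: /, "", name)             # drop "dkN: " prefix
			sub(/, [0-9]+ blocks at .*$/, "", name)  # drop size/offset/type
			if (name == want) { print unit; found = 1; exit }
		}
		END { if (!found) exit 1 }
	'
}
```

Usage would be something like "dkctl wd0 listwedges | wedge_by_name netbsd-root", where netbsd-root is a hypothetical GPT partition name.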


b) Consistent device topology management is missing

The reason why NetBSD/arm has no bus_space_mmap(9) turned out to be the
fact that we have no consistent (MI) way to manage the physical address space
of devices.  NetBSD/mips has a working bus_space_mmap(9) in
sys/arch/mips/mips/bus_space_alignstride_chipdep.c.  It defines address
windows and manages them by itself.

Who wants to reimplement it on all cpus/ports/platforms?  Physical address
space is a pretty simple concept: a single linear address space.  And we
already manage (kind of) a tree of devices in autoconf(9).  Do we want
to manage such a topology in many places?  No.
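As a toy model of what "managing address windows" means here (not the MIPS code itself): each bridge window maps a range of bus addresses onto CPU physical addresses, and an mmap-style lookup just finds the window covering an address and adds the offset. The window notation busbase:size:physbase and the name bus_to_phys are invented for this sketch.

```sh
# bus_to_phys BUSADDR WINDOW...
# Each WINDOW is "busbase:size:physbase" (hex or decimal); this notation
# is hypothetical. Prints the physical address BUSADDR maps to through
# the first window that covers it, or returns 1 if no window does.
bus_to_phys() {
	addr=$(( $1 ))
	shift
	for win in "$@"; do
		wbase=$(( ${win%%:*} ))     # field 1: bus base
		rest=${win#*:}
		wsize=$(( ${rest%%:*} ))    # field 2: window size
		wphys=$(( ${rest#*:} ))     # field 3: physical base
		if [ "$addr" -ge "$wbase" ] && [ "$addr" -lt $(( wbase + wsize )) ]; then
			printf '0x%x\n' $(( wphys + addr - wbase ))
			return 0
		fi
	done
	return 1
}
```

For example, "bus_to_phys 0x1000 0x0:0x10000:0x1f000000" prints 0x1f001000. The point of the sketch is that this table is a property of the device tree, so it could be kept once, MI, instead of per port.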


c) Control / data flow is unclear

I can never remember which wscons command/device to use to add a screen,
load a font, or change the encoding.  It's a total mess.  I don't know how
the ioctl I send via a wscons command is delivered to the device.  Same for
data, even after looking at sys/dev/wscons.  Why is it so complicated?

Our tty locking code has so many hacks.  See grep XXX sys/kern/tty*.  And we
have to fix all serial ...
From: Christoph Egger
Date: Sunday, March 7, 2010 - 2:58 am

The good news:
dyoung@ and I started prototyping pmem(9), which provides MI
physical address space management.

The bad news:
We haven't had time to continue on this for more than a year now.

Christoph
From: Masao Uebayashi
Date: Sunday, March 7, 2010 - 5:02 am

What have you achieved?

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
From: Christoph Egger
Date: Monday, March 8, 2010 - 1:59 am

http://www.netbsd.org/~cegger/pmem2.diff
May not cleanly apply against -current.

Christoph
From: Masao Uebayashi
Date: Monday, March 8, 2010 - 2:20 am

Ah.  You provide an API for MD code.  (I thought you wanted to provide a new
API for drivers, and was about to write a complaint. :P)

I wonder whether we want to make device memory allocation very smart, using
vmem(9).  It's done at a very low level, very probably in bootstrap code.  We
don't want to introduce unnecessary dependencies on kernel subsystems.

Masao

From: Joerg Sonnenberger
Date: Sunday, March 7, 2010 - 3:12 am

I don't think this is a problem by itself. With devfs I would normally
expose the symlinks or so based on the label or UUID in the GPT, but
that functionality is missing.

Joerg
From: Masao Uebayashi
Date: Sunday, March 7, 2010 - 5:09 am

Identifying devices by ID would be useful, but I don't like them being
exposed as paths, like /devices/iommu@f,e0000000/sbus@f,e0001000/..., which
is too complicated IMO.

What I'm thinking of is a file showing device-class-specific
information, like disk0.info, which would work like procfs.

	find /dev/mainbus0 -name 'disk*.info' -print | \
	while read -r f; do
		grep -q 'the-guid-i-am-looking-for' "$f" && echo "found: $f"
	done

Masao

From: Masao Uebayashi
Date: Sunday, March 7, 2010 - 5:31 am

ioctl would be called via a <device>.ctl file in devfs.  devfs can look up
the device_t instance from the opened vnode.  device_t points to its device
class data, which in turn points to the [bc]devsw entry.  This means that
device majors can go.

device_t will be more important because it represents the device nodes shown
in the devfs tree.  I think the power hooks added by pmf(9) (?) should move
out of there, because they make device_t's responsibility ambiguous.

What I'm thinking of is to make device_t "inherit" behaviors, like

	- bus
	  - where devices and bridges attach
	- bridge
	  - owns address windows; a bus attaches to it
	- addressable
	  - has a bus_addr_t
	  - parent is also addressable
	- device or pseudo
	- real device (like azalia) / device function (like audio)

Probably ifnet could be merged with device_t.

Masao

From: Jukka Ruohonen
Date: Sunday, March 7, 2010 - 9:16 am

I already pointed this out on port-i386@, but theoretically this sounds like
a good thing for the ACPI side of things where "pseudo" (ACPI) devices need
to be matched with "real" devices.

As an example: one thing that holds back the ACPI CPU code I am working on
is that I need to be sure that e.g. cpu3 that attaches to acpi0 is the same
cpu3 that has attached to mainbus0. So:

	/dev/acpi0/cpu3
	-> /dev/mainbus0/cpu3

No idea how this works in practice (frankly, I find autoconfiguration scary).

- Jukka.

PS. Another example is Grégoire Sutre's work with ACPI display devices:

  http://mail-index.netbsd.org/tech-kern/2010/01/20/msg006983.html
From: Quentin Garnier
Date: Sunday, March 7, 2010 - 1:18 pm

Well, the answer to that is simple:  there should only be one device.
Any design that doesn't produce that result can be thrown out
the window without further delay.

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"See the look on my face from staying too long in one place
[...] every time the morning breaks I know I'm closer to falling"
KT Tunstall, Saving My Face, Drastic Fantastic, 2007.
From: Jukka Ruohonen
Date: Sunday, March 7, 2010 - 3:06 pm

In the above example it would be "acpicpu3 at acpi0" and "cpu3 at mainbus0".

But as you know quite well what is involved, I am merely pointing out that
the current situation holds back many possibilities. And noting that I don't
have the competency to do anything about it.

- Jukka.
From: Quentin Garnier
Date: Sunday, March 7, 2010 - 1:50 pm

On Sun, Mar 07, 2010 at 06:43:49PM +0900, Masao Uebayashi wrote:

You're barking up the wrong tree.  What's annoying is not that the
numbering changes.  It is that the numbering is relevant to the use of
the device.  I expect dk(4) devices to be given names (be it real names
or GUIDs), and I expect to be able to use that whenever I currently have
to use a string of the form "dkN".

code.

What kind of user are you talking about here?  If it's the end user, then

Wrong.  Device numbers should be irrelevant to anything but operations



This has nothing to do with what devfs is about.  If your idea of devfs
is that the user should know the whole device path to access a hard
drive, you have strange ideas about simplicity.

Besides, imagine you move said hard drive from one port to another (or
on to another, say, faster controller);  the ultimate idea of devfs is
that the device node for the hard drive doesn't change.

Not that full, explicit device paths aren't something useful to expose
one way or another to the userland.  It's just not what devfs is about,

Again, users shouldn't have to care about device numbering.  With your
idea of numbering, the way to access a device should change depending on
which USB port I put my usb key drive in?  I fail to see how this is
better than what we have now.


Are you really just discovering that wscons needs a lot of love?  It's
old news.  The problem is that nobody wants to deal with that mess and
the ensuing binary compatibility nightmare.


Really?

It seems to me that you are really confused, about a number of things.
Out of those, the most important is what the user experience should be,
so let me be clear on this:  the end user should never, ever, ever deal
with monstrosities like a full device path.

And device paths are not devfs, okay?

From: Johnny Billquist
Date: Sunday, March 7, 2010 - 5:56 pm

Is it just me who finds this whole idea ironic?

Has everyone forgotten how to set up their own kernel? Is everyone now 
booting GENERIC? (Or just making a copy of GENERIC, with a few patches, 
without understanding what they are editing?)

The whole point being that if you boot a kernel, in which you have 
configured the whole system to connect anything anywhere, you should not 
be surprised if the device enumeration might seem random.
If you want predictable device enumeration, you can have that, and have 
been able to have that for over twenty years...

The line
wd*     at atabus? drive ? flags 0x0000

(to use one example) says to match any wd type disk on any atabus and give 
it any unit number, without doing any closer matching. I.e. kind of unpredictable.

The asterisks and question marks mean exactly that. If you want 
predictable matching that stays the same at every boot, no matter what 
hardware you put on the system, you write explicit lines in the config 
instead.

Jeezuz! How have we fallen to these lows? Trying to make a filesystem 
that shows the hardware configuration, with absurd, long and silly 
paths, which are pretty useless anyway, since if we just move the disk 
the slightest bit, we lose it anyway.
For basically no gain in functionality, a lot of new mess to deal with 
when managing the system, and a lot of work...

I can see a point in having a way to express a specific disk, based on a 
disk label instead of the hardware, since that would actually be useful.

The idea suggested by Masao looks to me like a lot of cruft that will 
break away even farther from the original simplicity of Unix, for no 
actual gain.

But I guess I'm a grumpy old fart, who thinks so already anyway.

NetBSD... A system that used to be better...

(Do I need to say that I agree with Quentin?)

	Johnny



-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt@softjar.se             ||  Reading murder ...
From: Quentin Garnier
Date: Sunday, March 7, 2010 - 7:01 pm

How exactly does hard-wiring a kernel help with some of the issues
described here?  Say you have two USB drives, and plug them in a
different order in different ports (which defeats all config(5)

Yes, it is random, and should be considered as such.  That doesn't
mean, however, that it is impossible to somehow locate devices in a
constant way regardless of how they attached.  I know that people have

There is a lot to be gained from providing a useful binary distribution
of NetBSD.  That includes a kernel that people don't have to play with
in order to make it useful.

Grumpy old farts will always compile their own kernel and do their own
thing, but fortunately I don't think it is a goal for anybody in the
NetBSD community to be useful only to grumpy old farts.


Are you still positive about that?  I am certainly not advocating the
status quo.

From: Eric Haszlakiewicz
Date: Sunday, March 7, 2010 - 9:41 pm

I often run GENERIC kernels.  In fact, the only reason I'm not running it on
the machine I'm on now is because for some reason one of the devices I wanted
was commented out (spdmem).  Given what I need the machine to do, GENERIC works
fine, and I'd rather spend my time on more useful things (like spouting random

I definitely agree that this is an important criterion to keep in mind.  It's
easy to get distracted by things that look cool, even if they don't work.
I know I've gone down that path many times.

eric
From: Masao Uebayashi
Date: Monday, March 8, 2010 - 12:47 am

Imagine if I want to use a USB disk as / on my DELL OptiPlex 745.  The device
tree of that machine looks like:

/mainbus0
  /pci0
    /puhb0
      /agp0
    /ppb0
      /pci0
        /vge0
          /ukphy0
        /vga0
          /wsdisplay0
          /drm0
        /uhci0
        /azalia0
        /ppb0
          /pci0
        /ppb1
          /pci0
        /uhci1
        /uhci2
        /uhci3
        /uhci4
        /ppb2
          /pci0
        /ichlpcib0
          /isa0
            /lpt0
            /com0
        /piixide0
          /atabus0
            /wd0
          /atabus1
            /atapibus0
              /cd0
        /ichsmb0
        /piixide1
          /atabus0
          /atabus1

How do you write a kernel config which can always identify my USB disk as
sd0a, even if I plug in random devices?
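The devfs path in a tree like this is just the chain of autoconf(9) parents from the root down to the device. A hedged sketch of deriving it from "child parent" pairs; the pair format is hypothetical (such data could be gathered, for instance, by walking the tree with drvctl -l), and the function name dev_path is made up. It assumes globally unique instance names as autoconf(9) assigns them today, so it cannot represent the per-parent numbering (several pci0 nodes) shown in the tree above, and it assumes the input has no cycles.

```sh
# dev_path DEVICE
# Reads "child parent" pairs on stdin (hypothetical format) and prints
# the path from the root of the device tree down to DEVICE, for example
# /mainbus0/pci0/uhci0. Exits 1 if DEVICE does not appear at all.
dev_path() {
	awk -v leaf="$1" '
		NF >= 2 { parent[$1] = $2; isparent[$2] = 1 }
		END {
			if (!(leaf in parent) && !(leaf in isparent)) exit 1
			path = ""
			# Climb from the leaf until we reach a node with no parent.
			for (d = leaf; (d in parent); d = parent[d])
				path = "/" d path
			print "/" d path
		}
	'
}
```

The sketch also shows Johnny's objection in miniature: the path changes as soon as any ancestor in the chain changes, for example when the disk moves to another USB port.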

Masao

From: Johnny Billquist
Date: Monday, March 8, 2010 - 7:02 am

It would help if you started by showing where your disk would be in the 
device tree. Then I can tell you what (more or less) you need in your 
config file.

USB, or whatever else, is no magic. You can specify explicitly where 
your disk is, and have it show up with a specific device number even 
with other devices attached anywhere.

I seem to remember that way back there was even a tool (in pkgsrc?) 
which extracted your current device setup, and created a config file 
from that, so that you would always get the same enumeration, no matter 
what else showed up on the machine.

The point is, the config file totally, and exactly describes your 
hardware setup. Your suggestion would simply mean that this information 
would be duplicated in the file system.
The config file has the additional "feature" of actually making the 
device appear with the same name, even if you move it around, by just 
changing the config file. Everything else in the system will not have to 
be told after that. And the names exposed, and referred to, are simple 
and short, even though you do have the full tree described in the config 
file.

Someone else mentioned that the problem has grown for the simple reason 
that hardware configurations change much more often now than in the 
past. I would agree with that. However, for vital pieces of the 
hardware, the setup normally doesn't change that much (such as the disks 
normally used by the system).

So, the device configuration and enumeration is only random so far as 
that if you tell the system that it is okay to give a device a random 
number, it will actually possibly do that.
Otherwise it is totally predictable.

If you, on the other hand, do move your disk around (be that by using 
USB and different ports and hubs, or different controllers), neither the 
old config, nor your new solution will help. The disk will change 
identity (or path) (well, with the old config, it might actually keep 
its identity, but that's a chancy proposition at ...
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 11:27 am

"More or less" doesn't meet my criteria.  Consider mission-critical
use cases.  My devfs is not meant only for hobbyists.

From: Johnny Billquist
Date: Tuesday, March 9, 2010 - 12:51 pm

"More or less", because I don't have all the details. If you were to 
post the dmesg from your booting, I could give you the exact thing.

Are you sure your USB disk shows up as sd? Looking at the config file, I 
would have thought it would match wd.

If it is wd, then the config should have something along these lines:

wd0 at umass0
umass0 at uhub0 port 0 configuration 0 interface 0
uhub0 at usb0
usb0 at uhci0
uhci0 at pci1 dev 1 function 0
pci1 at ppb0 bus 0
ppb0 at pci0 dev 0 function 0
pci0 at mainbus0 bus 0


Obviously I've thrown in a bunch of "0" here, where there probably 
should be something else, as well as a "1", since you have two pci buses 
involved already at this point.
That's the "more or less" part. Now, if you don't understand the concept 
based on this, then I don't think putting correct numbers in here is 
going to help much more either. The basic idea though, is that this will 
always cause the same disk to be wd0, and no other disk will ever become 
that. No matter what hardware you add, or where.

	Johnny

From: Quentin Garnier
Date: Tuesday, March 9, 2010 - 1:05 pm

What you really don't seem to understand is that this answers only
half of the contract.  Put the drive in another USB port and it
doesn't show up as wd0.

The idea was that only that disk would show up as wd0, and would
always show up as wd0.  (Incidentally, wd@umass is very rare.  I think
it was only some old Archos.)

From: Johnny Billquist
Date: Tuesday, March 9, 2010 - 1:14 pm

But you miss the other half of my point. In Masao's original idea, he 
complained about device enumeration being random, and wanted it moved 
out into the filename namespace.
But if you move the device to another port, it will move just as much 
within the file system, so he didn't solve anything.
If you want to be able to refer to a disk in the scenario where you 
actually move it around, you need some other solution.

My answer only intended to show that the device enumeration isn't 
random, depending on if you add/remove other devices, which is what 
Masao was claiming.

His original claim, and reason for his proposed solution, is basically 
wrong.

The problem you are highlighting is another one, and one which I agree 
it would be nice to have a solution to. But the only solution I can come 
up with is to be able to refer to disks by their name in the disk label, 
or something similar, which is unique per disk, and has no relationship 
at all with how they are attached to the system.

Something like:

wd0 at umass? label="foobar"

But, as I said, this is another problem, which Masao hasn't at all 
addressed. His solution to his random device enumeration problem is 
simply a solution to a non-problem.

I hope I made myself clear, since I sometimes seem unable to 
express myself clearly enough.

	Johnny
From: Quentin Garnier
Date: Tuesday, March 9, 2010 - 1:41 pm

Your answer only says that device enumeration is deterministic.  Nobody
said that it wasn't.  I know autoconf(9) hasn't aged very well, but it's
not that bad.  Yet.

From: Steven Bellovin
Date: Tuesday, March 9, 2010 - 2:46 pm

I've had problems with SATA drive enumeration due to weird BIOS issues.  I'd rather have had my configuration driven by what was on the disk.


		--Steve Bellovin, http://www.cs.columbia.edu/~smb





From: Johnny Billquist
Date: Tuesday, March 9, 2010 - 5:39 pm

Masao said exactly that. Or rather, that it wasn't possible to get the 
same device number for a specific device facing other changes in the 
hardware configuration.

	Johnny

From: Iain Hibbert
Date: Wednesday, March 10, 2010 - 1:56 am

So, you want to be able to mount a disk by the label:

  $ mount -t msdosfs -o label "foobar" /external_disk_foobar

or, if you know the UUID

  $ mount -t msdosfs -o uuid 3478374923723423 ~/thumb_drive

What I'm asking is, why does the "device node" need to be deterministic
and why is this a 'devfs' problem?

The "special" argument to mount(8) does not really need to be a device
node, it could find the right one on its own by checking hw.disknames and
scanning the disklabels..
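That suggestion fits in a few lines of sh: walk the names that "sysctl -n hw.disknames" prints and ask each disk for its label. How a label is actually read is left to the hypothetical helper read_label; the stand-in below just looks labels up in a LABELS variable so the sketch runs anywhere. Both function names are invented, and the sketch refuses to guess when two disks carry the same label.

```sh
# Hypothetical stand-in for reading a disk's label: looks it up in
# $LABELS ("disk=label ..."). A real tool would inspect /dev/$1 instead.
read_label() {
	for pair in $LABELS; do
		case $pair in
		"$1"=*) echo "${pair#*=}"; return 0 ;;
		esac
	done
	return 1
}

# disk_by_label LABEL DISKNAMES
# DISKNAMES is a space-separated list like "sysctl -n hw.disknames" prints.
# Prints the single disk whose label is LABEL; returns 1 if none matches
# and 2 (with a message on stderr) if more than one does.
disk_by_label() {
	want=$1
	match=
	for disk in $2; do
		if [ "$(read_label "$disk")" = "$want" ]; then
			if [ -n "$match" ]; then
				echo "ambiguous label: $want ($match, $disk)" >&2
				return 2
			fi
			match=$disk
		fi
	done
	[ -n "$match" ] && echo "$match"
}
```

On a NetBSD box the call would look like: disk_by_label foobar "$(sysctl -n hw.disknames)", once read_label does something real.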

iain


From: Johnny Billquist
Date: Wednesday, March 10, 2010 - 6:36 am

That would be equally acceptable. As long as it's not just mount, but 
also fsck, and whatever else that deals with disks on a low level 

The device node only needs to be deterministic to the point that you can 
predictably get the same configuration on every boot. /etc/fstab is a 
typical example of a critical piece, that depends on this.

Think boot time - your system starts by doing some fsck on the disks in 
fstab, and then mounts them. If suddenly another disk gets the id of 
your root disk, the bootup will fail miserably until you fix fstab.

And having this fixed in the config/kernel seems like a much easier 
proposition than to change every tool that needs to be 
aware of this. Even tools that you might not know about, or even 
originate totally outside of NetBSD.

Since there is no really standardized library to access the raw devices, 
most programs simply just open a device node. How do you, at that point, 
make it find the right device node, with a disk that matches a 

Yes. But it is not only mount that would need to be fixed.

	Johnny
From: Greg A. Woods
Date: Wednesday, March 10, 2010 - 12:33 pm

At Wed, 10 Mar 2010 08:56:36 +0000 (GMT), Iain Hibbert <plunky@rya-online.net> wrote:

Yes, something like that, using fs_volname of course.  I've wanted this
kind of feature for decades.

And of course all the other filesystem tools should have this interface
as well.  It's no good if it's not uniformly usable.  newfs and tunefs
need to be able to set and change fs_volname to start with.  Disk tools
could be made to work with disk label names too for added fun, but let
us not confuse fs_volname with pack names, disklabel names, etc.

Naturally this should not replace the use of the device file, but rather
be added in addition to it, as an optional way to specify the ultimate
device used to access the filesystem.

In fact I'd much rather see lots of work go into this feature than into
anything even remotely related to devfs.

BTW, we don't want to end up with the horrid mess some GNU/Linux systems
now use when their kernel config's specify root=LABEL=xxx -- I think we

I think UUIDs, as I understand them so far (fs_id, right?), are really
too fragile, too meaningless and difficult to read, and too dangerous,
to use for this purpose.

They are not actually unique, to start with, so labelling them so is
just plain wrong.

Search google for Russell Coker's discussion on Label vs. UUID.

Filesystem volume names can be said to have many of the same problems,
except to start with we know and understand that they're not unique
right off the bat, and we can assign human meaning to them and make them
memorable.

Let's at least get filesystem access by volume names working right, then
we can go on to think about other things, if they still seem worthwhile.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 6:22 pm

While I understand the usefulness of human-readable labels, I don't think
they should be handled in the kernel, because labels are arbitrary.  They
are not guaranteed to be unique.

I think labels should be resolved by some name service.  It's no
different from /etc/hosts -> IP address.

Masao
From: Greg A. Woods
Date: Thursday, March 11, 2010 - 3:34 pm

At Thu, 11 Mar 2010 10:22:29 +0900, Masao Uebayashi <uebayasi@gmail.com> wrote:

The fs_id value is _NOT_ going to be any more unique than the fs_volname
value.

The fs_id value is also not guaranteed to be unique to start with,
especially not across the operational lifetime of a filesystem.

There are a plethora of ways the fs_id can be duplicated, and just about
as many ways for it to get lost (or changed without change control) too.

Sure, labels are arbitrary -- at least to the machine.  They are not,
necessarily, arbitrary to the human who creates them though.

In any case the label doesn't have to be _guaranteed_ to be unique to be
useful to both the human and the machine.

Also, the filesystem identifier doesn't have to be a meaningless lengthy
string of impossible to memorize sequences of digits to be useful to the
system either -- a human created, human meaningful, label can be just as

Sorry, but I'm flabbergasted!   What the heck does that mean in this
context of filesystem identification?

Do you really want to add more complexity, goo, and mess, and places for
errors to happen by adding a translation layer?

First off, there's really nowhere to store your magical mappings.

K.I.S.S.  Please!

We do have a place to store a human readable/meaningful filesystem
identifier.

Let the human provide this label.

If the system finds duplicate labels then tell the human which devices
have conflicting labels and where those filesystems were last mounted and
let the human decide which device should be used.  (i.e. the labels do
need to be unique for a successful automatic initialisation of the
system, but there needs to be a manual way to work around them not being
unique regardless of what data they consist of)
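The duplicate-label policy described above (tell the human which devices clash and where each was last mounted, then let them pick) could look like the sketch below. The input format, one "device label last-mount-point" line per filesystem, and the function name label_conflicts are assumptions made for illustration.

```sh
# label_conflicts
# Reads "device label last-mount-point" lines on stdin (hypothetical
# format). For every label used by more than one device, prints the
# label and one line per device so a human can choose. Exits 1 if any
# conflict was found, 0 otherwise.
label_conflicts() {
	awk '
		NF >= 3 {
			n[$2]++
			info[$2] = info[$2] "\t" $1 " (last mounted on " $3 ")\n"
		}
		END {
			bad = 0
			for (l in n)
				if (n[l] > 1) {
					printf "label %s is ambiguous:\n%s", l, info[l]
					bad = 1
				}
			exit bad
		}
	'
}
```

A boot script could run this before trusting any label in fstab and drop to a prompt when it exits non-zero.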

In my opinion the fs_id value is truly useless anywhere outside of the
on-disk storage of a single filesystem copy where its sole valid use is
(IIUC) to help to match valid backup superblock copies.  The fact I'm
not even sure it's safe or sane to ...
From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:08 pm

I want to simplify the path namespace.  I want labels and other
"referential" information to be accessed via files, like procfs
does.

# cat wd0/.info
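If such a wd0/.info file existed, consumers would only need line-oriented parsing. A minimal sketch, assuming a hypothetical "key: value" format (the thread does not specify one); info_get is an invented name.

```sh
# info_get KEY
# Reads a procfs-style info file on stdin whose lines look like
# "key: value" (hypothetical format) and prints the value for KEY.
# Exits 1 if KEY is not present.
info_get() {
	awk -v k="$1" '
		index($0, k ": ") == 1 {
			print substr($0, length(k) + 3)   # skip "key: "
			found = 1
			exit
		}
		END { if (!found) exit 1 }
	'
}
```

For instance, "info_get label < wd0/.info" would print the disk's label under this assumed format.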

Masao
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 7:31 pm

One of the problems is that even a long-term user like you has to
know the full, detailed dmesg and analyze it.  That doesn't meet my
goals.  Imagine admins hot-swapping multiple disks/NICs on mission-critical
servers.

Masao
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:14 pm

And you have to disable configuration of other PCI buses to prevent
unwanted USB devices from appearing.  You also have to rebuild the kernel.
Even with all of this done, your system only "more or less" works.

Masao
From: Johnny Billquist
Date: Wednesday, March 10, 2010 - 6:06 am

Not sure what you mean here.
If you don't want "unknown" devices to appear, then just don't have 
wildcarded devices in the config.
If you want "unknown" devices to actually do appear, then you have to 
have the wildcard entries in there. But they will not get assigned 
numbers for which you have explicit entries in the config. So they will 
be assigned "unused" numbers.
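A simplified model of the allocation rule described here: hard-wired units are reserved, and a wildcard match takes the lowest number that is neither reserved nor already attached. This is an illustration only, not autoconf(9)'s actual code; in particular, the real allocator may start wildcard units from a different base. next_unit is an invented name.

```sh
# next_unit RESERVED_UNITS USED_UNITS
# Both arguments are space-separated lists of unit numbers. Prints the
# lowest unit that is neither hard-wired in the config (e.g. "wd0 at ...")
# nor already attached: a simplified model of a wildcard "wd*" match.
next_unit() {
	taken=" $1 $2 "
	n=0
	while :; do
		case $taken in
		*" $n "*) n=$((n + 1)) ;;   # reserved or in use: try the next one
		*) echo "$n"; return 0 ;;
		esac
	done
}
```

With "wd0 at ..." hard-wired and nothing else attached, "next_unit 0 ''" prints 1, so a stray disk can never take wd0.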

No kernel rebuilding is necessary.

Maybe you should state more explicitly what your scenario is, and what 
you expect to happen?

When the system boots up, I assume you want some set of devices to 
always get the same enumerations, no matter what other hardware 
might/might not exist.

This is done by explicitly naming those devices in the config file.

Devices which are more "unknown" can either be accepted, and accessed by 
the system, if you keep wildcarded devices around in the config.
Exactly what number gets assigned to each device as it shows up, will be 
"kindof random". But since these are devices not normally expected, they 
can't really be predictable anyway.
Or if you never want the system to accept totally unknown devices, just 
remove all the wildcarded device entries in the config. That way, if 
someone plugs in a new disk, or whatever, it will not be accessible by 
the system.

Any other scenario you had in mind?

Oh, and notice how the kernel is never rebuilt. You build the kernel 
once, with the configuration you expect, and then you just run it the 
whole time.

	Johnny
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 6:19 am

You built non-GENERIC in the first place.

Masao
From: Johnny Billquist
Date: Wednesday, March 10, 2010 - 5:57 am

Two things come to mind here:
1) Hot-swapping disks and so on have nothing to do with interpreting 
dmesg, or setting up a configuration. The configuration should already 
have been done, and working.

2) As I mentioned before - I know I have seen a program which will spit 
out the config necessary to actually get a static setup in place, based 
on the current configuration. So, if you have managed to get a setup 
that is correct just now, you can basically "snapshot" it, and you'll 
get the same setup every time after that. And it does not take an 
"expert" to just use that program.

So I can't say that this should be a problem. Basically, we already 
today have a way of getting a predictable device enumeration, which is 
repeatable, even in the face of other random changes to the hardware. So 
I don't see the point in why you want to change this. And moving it into 
a filesystem makes it awkward, more difficult in some ways, and in short 
is just a bunch of work that gives nothing. Wouldn't it be better to 
spend that energy on something that actually will buy us something?

And of course I have to know the full details. Just as I would have to 
know the full details to know the path in your filesystem, if that were 
to reflect the hardware configuration. How would you know where to find 
your disk in the file system if you didn't know exactly all the buses 
and instances that lay between the root and your disk?

Hmm, I just realized that I didn't completely follow how your file 
system design will even solve which controller gets which path. Maybe it 
was in your original mail, but I have forgotten that detail in that 
case. If you have two disk controllers on one bus, how do you decide 
which is "0", and which is "1"?

	Johnny
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 6:38 am

Physical location has to be known by drivers in some way.  Bus drivers
are responsible for probing devices and enumerating them precisely.
Otherwise those buses and their children are not predictable.  What
I have in mind is something like:

        /dev/.../pci0/pcislot0/isp0/...
        /dev/.../pci0/pcislot1/isp0/...
        /dev/.../pci0/pcislot2/isp0/...
        /dev/.../pci0/pcislot3/isp0/...
or
        /dev/.../pci0/isp0/...
        /dev/.../pci0/isp1/...
        /dev/.../pci0/isp2/...
        /dev/.../pci0/isp3/...

Masao
From: Joerg Sonnenberger
Date: Wednesday, March 10, 2010 - 6:46 am

That program was recently retired from pkgsrc because it hasn't really
worked for ages, was marked as for NetBSD 1.5 only and no one cared since
then.

Joerg
From: Johnny Billquist
Date: Wednesday, March 10, 2010 - 6:50 am

Yikes. I AM an old fart. :-)
Since I don't use it I was just digging through old memories. 1.5 sounds 
about right.

	Johnny
From: Ted Lemon
Date: Monday, March 8, 2010 - 12:34 pm

You'd need to put the UUID in the kernel config.

From: David Young
Date: Monday, March 8, 2010 - 8:09 pm

I'd go further and say that we should be able to supply a set of device
properties (such as drvctl -p prints) to the kernel.  Let us match a
device by its intrinsic properties (MAC address, serial number, and/or
GUID), and set the unit number according to the device property.
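The idea, reduced to its core: compare a device's intrinsic properties against a table of pins and take the unit from the first rule that matches, otherwise fall back to normal allocation. The rule syntax unit:key=value, the space-separated property list, and the name pinned_unit are hypothetical stand-ins for whatever the kernel would actually be given; values are assumed to contain no spaces.

```sh
# pinned_unit PROPS RULE...
# PROPS is a space-separated list of the device's "key=value" properties
# (roughly what drvctl -p might report). Each RULE is "unit:key=value"
# (hypothetical syntax). Prints the unit of the first matching rule, or
# returns 1 so the caller can fall back to ordinary unit allocation.
pinned_unit() {
	props=" $1 "
	shift
	for rule in "$@"; do
		unit=${rule%%:*}   # text before the first ':'
		want=${rule#*:}    # the rest, which may itself contain ':'
		case $props in
		*" $want "*) echo "$unit"; return 0 ;;
		esac
	done
	return 1
}
```

For example, pinned_unit "mac=00:11:22:33:44:55 serial=XYZ" sd0:serial=ABC wm1:mac=00:11:22:33:44:55 prints wm1. As the next reply notes, some properties are only known after attach, which such a table cannot help with.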

Quentin is right that this *only* helps us to fix the unit number, but I
think that in itself is an important, *feasible* step forward.

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Iain Hibbert
Date: Tuesday, March 9, 2010 - 1:09 am

One thing that I think is problematic about trying to do that, is that you
might sometimes need to attach a device (allocate the unit number) in
order to discover its intrinsic properties. It can't always be done in the
attach routine because you might have to wait for a query (or several) to
return. For that reason, we should consider that the dv_xname is not
necessarily a useful tag.

(I say "device" rather than disk because I know that Bluetooth controllers
work this way - you can't get the BDADDR until it is up and running)

I have never used wedges but, for the disk case, would it not be better to
make a method of configuring a dk in advance, so that whenever a disk
appears with the correct parameters it will already be mapped to the dk
you expect? (perhaps a daemon could handle it) Then you know that /dev/dk3
is your USB stick and will never be anything else.

iain


From: Joerg Sonnenberger
Date: Tuesday, March 9, 2010 - 8:25 am

I don't think it has to be or should be in the kernel. Basically,
/dev/dk3 gets created or is used by the kernel. A daemon is notified
(*cough* udevd) and that scans the device properties, finds the UUID and
creates /dev/uuid/2345324523453245. It also finds the label and creates
/dev/label/my-usb-stick. The latter is what you put in /etc/fstab.
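The symlink part of such a daemon is small. This sketch assumes the daemon has already read the UUID and label from the device properties. How it gets notified is left open. `link_aliases()` is a hypothetical helper, and the `uuid/` and `label/` directories follow the example above. It uses only mkdir(2), unlink(2) and symlink(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a directory if it is missing; an existing one is fine. */
static int
ensure_dir(const char *path)
{
	if (mkdir(path, 0755) == -1 && errno != EEXIST)
		return -1;
	return 0;
}

/*
 * Under devroot (normally "/dev"), create
 *   devroot/uuid/<uuid>   -> ../<devname>
 *   devroot/label/<label> -> ../<devname>
 * A NULL uuid or label is skipped; existing aliases are replaced.
 */
int
link_aliases(const char *devroot, const char *devname,
    const char *uuid, const char *label)
{
	const char *kinds[2] = { "uuid", "label" };
	const char *names[2] = { uuid, label };
	char dir[256], alias[512], target[256];

	snprintf(target, sizeof(target), "../%s", devname);
	for (int i = 0; i < 2; i++) {
		if (names[i] == NULL)
			continue;
		snprintf(dir, sizeof(dir), "%s/%s", devroot, kinds[i]);
		if (ensure_dir(dir) == -1)
			return -1;
		snprintf(alias, sizeof(alias), "%s/%s", dir, names[i]);
		(void)unlink(alias);	/* drop a stale alias, if any */
		if (symlink(target, alias) == -1)
			return -1;
	}
	return 0;
}
```

The links are relative (`../dk3`), so the same tree still resolves if /dev is mounted somewhere else, for example inside a chroot.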

Joerg
From: David Young
Date: Tuesday, March 9, 2010 - 8:43 am

What if udevd is on /dev/uuid/2345324523453245 ?

Dave

From: Joerg Sonnenberger
Date: Tuesday, March 9, 2010 - 8:48 am

The boot loader has a separate mechanism to pass down what is booted
from. That should be good enough for getting root mounted.

Joerg
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:27 am

What do you mean?  How can you mount / on /dev/uuid/2345324523453245?

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
From: Steven Bellovin
Date: Tuesday, March 9, 2010 - 8:45 am

I agree completely.  In fact, we could do that now, with a simple rc.d script.

		--Steve Bellovin, http://www.cs.columbia.edu/~smb





From: Iain Hibbert
Date: Tuesday, March 9, 2010 - 9:06 am

Sorry I got confused - in your method, what is dk3 needed for?

What I suggested then, was a daemon that waits for a disk device to
appear, then it can probe the disk and configure the appropriate dk(4)
device. If it determines that the device is your USB stick then it
configures as dk3. Otherwise, just put it as e.g. dk7. The admin knows that
/dev/dk3 is normal and can arrange permissions accordingly so that you can
access it, but dk7 can be restricted (that's for the paranoid admin)

Then, I'm not sure why /dev/uuid/* and /dev/label/* would be necessary?
(sure, they would be 'nice' to have)

iain


From: Joerg Sonnenberger
Date: Tuesday, March 9, 2010 - 9:35 am

It is still the device, the rest are just symlinks.

Joerg
From: Iain Hibbert
Date: Tuesday, March 9, 2010 - 11:34 am

do you propose to do away with sd0a then?

iain


From: Joerg Sonnenberger
Date: Tuesday, March 9, 2010 - 11:41 am

That's a very good question. Right now you can decide whether you want
to use the disklabel approach or wedges. I don't think we need both.
I think having the full disk (raw) device ((r)sd0d) and compat symlinks
to the wedges is good enough and would help eliminate quite a bit of glue
in the existing drivers.

Joerg
From: Thor Lancelot Simon
Date: Tuesday, March 9, 2010 - 9:43 am

And now anyone who can jack around with the userspace daemon process
can cause you to mount a filesystem you didn't intend to mount.

I think discovery of the identifiers used to mount devices needs to
be in the kernel.  We can do that already for RAIDframe and GPT; why
back away from it now?

Thor
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:48 am

Ah.  I've been unaware of that.  Thanks for pointing it out.

Although I once said mknod /dev/id/... should be run in userland, now I
believe it should be in-kernel.  It's so simple.

What I don't want is to deal with not-truly-unique strings like labels.  That
makes devfs responsible for resolving conflicts, which in turn leads to some
configuration thing, which I definitely want to avoid.

Masao

From: der Mouse
Date: Tuesday, March 9, 2010 - 9:53 am

You need some kind of persistent state *somewhere*, to support chmod,
chown, mv, rm, etc.  Or are you proposing to break those?  That idea
strikes me as a pretty crippling regression.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 10:01 am

My devfs doesn't do that complicated thing.  Mine is more or less procfs +
kauth.

Masao

From: Eric Haszlakiewicz
Date: Tuesday, March 9, 2010 - 12:01 pm

Wow, that sucks.  Not being able to change permissions (and less importantly,
mv or rm the device files) would definitely be a problem.

eric
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 12:04 pm

Could you show me use cases how it sucks?  I need more use cases.

Masao
From: der Mouse
Date: Tuesday, March 9, 2010 - 12:28 pm

That was my own reaction too, but, y'know what?  What Uebayashi-san
suggests is just fine as a research experiment, and, if it succeeds
there, on the road to production use it can grow such things.  NetBSD
is still a decent framework for minor OS research experiments like
that, and I think that's as it should be.

Of course, anyone who proposes to put it into NetBSD's main released
tree without support for such things should be shouted down.
Vociferously.  And thoroughly.  Lack of chmod/chown/etc in /dev would
be a total showstopper (for me, at the very least) for production use.

From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 12:38 pm

Could you tell me a scenario where it sucks?

Masao
From: Jochen Kunz
Date: Tuesday, March 9, 2010 - 1:32 pm

On Wed, 10 Mar 2010 04:38:03 +0900
I program microcontrollers with a serial programmer. I use a serial
connection to the target microcontroller for debugging. So I want to
be able to read/write the serial port device node (e.g. /dev/tty03
or /dev/ttyU0) directly. But I don't want to grant other users access to
my serial devices. So I chown the device node to user jkunz and make it
read/writable by that user only.

The Linux devfs solved this problem with an init-script, that changed
ownership and modes after each reboot. Looked a bit awkward to me when
I had to deal with it.

Non-persistent ownership and modes of device nodes is a show stopper.
-- 


tschüß,
       Jochen

Homepage: http://www.unixag-kl.fh-kl.de/~jkunz/

From: der Mouse
Date: Tuesday, March 9, 2010 - 1:52 pm

That's one of the scenarios I've run into, though for me it's at least
as often been something other than a serial line - on my scanner
machine, for example, /dev/scanner is a symlink to /dev/uk0, and
/dev/uk0 is mode 600 owner mouse.

From: Manuel Bouyer
Date: Tuesday, March 9, 2010 - 2:16 pm

I also have the same requirements, for various kinds of devices
(serial, USB, even some disk devices)

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 7:53 pm

Is it acceptable for you to do such things by some layering?  I don't
know how to do that exactly yet, but the point is you need a little

A bit?  I thought udevfs's config file is totally unacceptable for
mission-critical embedded purposes.  Why do we have to learn a new

Sure.

Masao
From: Jochen Kunz
Date: Wednesday, March 10, 2010 - 12:54 am

On Wed, 10 Mar 2010 11:53:43 +0900
Masao Uebayashi <uebayasi@gmail.com> wrote:

Anything else but "chown jkunz.users /dev/ttyU0" is awkward to me.
So what ever solution for this problem is introduced (udevd,
rc.script, ...), I have to learn a special way to configure device
node ownership and modes on devfs.

There could be some devd(8) that listens for ownership and mode changes
on devfs and stores them on disk. At next boot it could reconstruct

I talked about the long-gone Linux devfs, not udev. But it doesn't
matter. Anything other than chmod(1) and chown(8) needs to be learned.
So it doesn't matter what new stuff I have to learn to get it done.

BTW: My day job is to program an embedded system running Linux.
A (small) part of my work is to fight with udevd. And those systems
are used in mission-critical applications.

tschüß,
       Jochen

From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 2:01 am

Fair enough.

After some thinking, providing "traditional" view and persistent bits
turns out to be not that difficult.

/dev has a few reserved directories (like /dev/id).  You have no freedom
there.  Any access other than that goes to devfsd.  It has knowledge
equivalent to sys/arch/*/conf/majors.* as reference.  And it tracks
mknod(2), rename(2), etc., per mount point.

When you do mknod(/dev/wd0a); rename(/dev/wd0a, /dev/woah0a);
open(/dev/woah0a), devfsd resolves it by using DBs, converts it to
something like /dev/default/wd0a, and passes it back to the kernel.

You have to shut down cleanly, otherwise you lose the DB.
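Here is a toy model of that lookup. It assumes devfsd keeps a per-mount table from user-visible names to canonical nodes under `/dev/default/`. To keep the sketch short, it treats the name passed to mknod as already identifying the device. In the proposal, devfsd would instead derive that from the major/minor numbers using its majors table. `struct nametab`, `nt_mknod()`, `nt_rename()` and `nt_resolve()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define NT_MAX 32
#define NT_LEN 64

/* One user-visible name and the canonical node it refers to. */
struct ntent {
	char visible[NT_LEN];     /* e.g. "woah0a" */
	char canonical[NT_LEN];   /* e.g. "/dev/default/wd0a" */
};

struct nametab {
	struct ntent ent[NT_MAX];
	int n;
};

static struct ntent *
nt_find(struct nametab *nt, const char *visible)
{
	for (int i = 0; i < nt->n; i++)
		if (strcmp(nt->ent[i].visible, visible) == 0)
			return &nt->ent[i];
	return NULL;
}

/* mknod(2) of a known device: record it against its canonical node. */
int
nt_mknod(struct nametab *nt, const char *name)
{
	if (nt->n == NT_MAX || nt_find(nt, name) != NULL)
		return -1;
	snprintf(nt->ent[nt->n].visible, NT_LEN, "%s", name);
	snprintf(nt->ent[nt->n].canonical, NT_LEN, "/dev/default/%s", name);
	nt->n++;
	return 0;
}

/* rename(2): only the visible name changes, never the canonical node. */
int
nt_rename(struct nametab *nt, const char *from, const char *to)
{
	struct ntent *e = nt_find(nt, from);

	if (e == NULL || nt_find(nt, to) != NULL)
		return -1;
	snprintf(e->visible, NT_LEN, "%s", to);
	return 0;
}

/* open(2): translate a visible name to its canonical path, or NULL. */
const char *
nt_resolve(struct nametab *nt, const char *name)
{
	struct ntent *e = nt_find(nt, name);

	return e != NULL ? e->canonical : NULL;
}
```

The table here lives only in memory. That is also why an unclean shutdown would lose the renames.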


Are you happy about that? :)

Masao
From: haad
Date: Wednesday, March 10, 2010 - 4:44 am

This seems like the wrong approach to me. I was in contact with mjf@ when he
designed his devfs and I think that his approach was not the best but
reasonable. We do not want to have any sort of static major number
definition; everything should be dynamic. There should be some sort of
config file which describes what devfsd should do when it receives an event
from the kernel, e.g. if a USB key with UUID abc is inserted, create a
/dev/usb_work_key device.

AFAIK, the last version of devfsd was able to handle dynamic major numbers
and configure devices according to a config file (proplist).


-- 


Regards.

Adam
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 5:21 am

Dynamic major might make sense in transition.  But it's not the goal.
My devfs looks up the device instance via struct device.  dev_t will
be no longer used.

devfsd maps major/minor recorded in filesystem to struct device
instances.  I don't see how dynamic major helps here...

Masao
From: Robert Elz
Date: Wednesday, March 10, 2010 - 7:03 am

Date:        Wed, 10 Mar 2010 21:21:21 +0900
    From:        Masao Uebayashi <uebayasi@gmail.com>
    Message-ID:  <70f62c5e1003100421s5c54035bkdee5917165b0104d@mail.gmail.com>

  | dev_t will be no longer used.

I'm not sure if something that blatant (unqualified) is actually what
you meant to say, but if it was, you cannot do that.

dev_t (however poorly designed you might think it to be) is a part of
the kernel/application API that has been there forever, and is not
going away any time soon.

Stuff like "find -x" needs to keep on working (that uses dev_t), and for
that matter, doing a backup of a device tree using cpio, then restoring
it later (however insane it really is to use cpio for this kind of
purpose) needs to keep on working.

How the kernel actually associates between drivers and code that needs
to access the drivers is a whole different issue, and if your plan is
just to replace usages like
	(*bdevsw[major(dev)].d_strategy)(...);
with something different, then that might be OK, but both the dev_t
and the minor() and major() macros to interpret it simply have to
remain.
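For anyone who has not looked at that code path, here is a self-contained toy version of the dispatch being discussed. It uses a switch table indexed by the major number, and passes the whole dev_t through so the driver can look at the minor. The `TOY_*` macros use a made-up 8-bit minor field purely for illustration. NetBSD's real `major()`, `minor()` and `makedev()` use a different layout, and the real `struct bdevsw` has many more entry points.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t toy_dev_t;

/* Hypothetical encoding: bits above 8 hold the major, the low 8 the minor. */
#define TOY_MAKEDEV(ma, mi)  ((toy_dev_t)(((ma) << 8) | ((mi) & 0xff)))
#define TOY_MAJOR(d)         ((unsigned)((d) >> 8))
#define TOY_MINOR(d)         ((unsigned)((d) & 0xff))

/* One entry point per driver, standing in for d_strategy and friends. */
struct toy_devsw {
	const char *name;
	int (*d_open)(toy_dev_t dev, char *msg, size_t len);
};

static int
wd_open(toy_dev_t dev, char *msg, size_t len)
{
	return snprintf(msg, len, "wd minor %u", TOY_MINOR(dev));
}

static int
sd_open(toy_dev_t dev, char *msg, size_t len)
{
	return snprintf(msg, len, "sd minor %u", TOY_MINOR(dev));
}

/* The switch table: the major number is simply an index into it. */
static const struct toy_devsw toy_bdevsw[] = {
	{ "wd", wd_open },	/* major 0 */
	{ "sd", sd_open },	/* major 1 */
};

/* Dispatch through the table, as (*bdevsw[major(dev)].d_open)(...) would. */
int
toy_open(toy_dev_t dev, char *msg, size_t len)
{
	unsigned maj = TOY_MAJOR(dev);

	if (maj >= sizeof(toy_bdevsw) / sizeof(toy_bdevsw[0]))
		return -1;	/* no such driver; a kernel would fail with an errno */
	return (*toy_bdevsw[maj].d_open)(dev, msg, len);
}
```

Replacing the table lookup with, say, a lookup through `struct device` only changes `toy_open()`. The dev_t value and the macros that take it apart can stay, which is the distinction being drawn here.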

Part of what I am seeing when reading this discussion is that it doesn't
appear as if any two participants have any real idea what the others are
talking about - everyone is focusing only upon their pet need or desire,
and no-one is really looking at the big picture.

I have no real opinion on how all this should be done, just two hints
for how a solution to whatever problem actually exists should be
investigated.   First don't start with, and certainly don't concentrate
on, disks - they're way too easy (complicated sure, but it isn't hard
to come up with solutions that seem to work for disks).  Probably even
network interfaces (not that we really treat them as devices anyway),
what you need to make work properly are devices like tape units, cd
readers & writers (with nothing loaded in them), serial line interfaces
(com ports, or tty devices), line printer interfaces (parallel ...
From: Joerg Sonnenberger
Date: Wednesday, March 10, 2010 - 7:41 am

The only property of dev_t that userland really cares is that it is a
number and that it is unique per device. That is fulfilled as long as

I don't disagree on this.

Joerg
From: Robert Elz
Date: Wednesday, March 10, 2010 - 8:32 am

Date:        Wed, 10 Mar 2010 15:41:44 +0100
    From:        Joerg Sonnenberger <joerg@britannica.bec.de>
    Message-ID:  <20100310144144.GB23857@britannica.bec.de>

  | The only property of dev_t that userland really cares is that it is a
  | number and that it is unique per device. 

For the vast majority of userland that's right (and what's more,
temporally unique - the same number can mean something entirely
different tomorrow, and generally nothing will care).

That is, except for cpio (and similar) - that actually has a portable
(ie: defined) format that includes the ability to store devices.

Now no-one sane would expect to be able to take a device from one
system to another (that is, a name of a device and its dev_t) and have
it work (usefully) on another system, so the sole practical use of
this ability is backup/restore (a function for which cpio is particularly
useless, but for which it is used nevertheless).  A shorter term
variant of the same thing is "cpio -p" to make a backup copy of a
filesystem (and all its device nodes, etc) - a function for which cpio
is not quite so useless.

For dump/restore we could alter the format in which devices are
represented, so that they could be correctly restored, no matter
what we do with them, but for cpio we don't really have that option,
and people do want to be able to get back their owner/modes for
device files, and have the right names apply to the right devices with
the right access permissions for the appropriate users - and (aside
from the name, user id, and modes) all that's available to indicate
what device is the dev_t.

kre

From: der Mouse
Date: Wednesday, March 10, 2010 - 8:22 am

dev_t was, and is, a kludge, to deal with devices in the relatively
primitive filesystem Unix used back in its early days (well, I think
they might have been ints then, rather than dev_t, but the difference
between the two is trivial).  It's good enough for most purposes and
its problems have been relatively minor so far, so it's survived, but
there's nothing sacred about it.

Everything starts somewhere.  I would never go near Uebayashi-san's
devfs in any of the incarnations described so far on a production
system.  But I think it's high time someone started thinking about, and
experimenting with, alternatives to traditional device nodes and device
numbers, and I'm glad to see this happening.  If and when this gets to
the point of being contemplated for production use, that is the time to
worry about compatibility with historical practices and decide whether
historical compatibility must be maintained or an incompatibility is
acceptable.  There have been flag days before and there will be again;

find -x might need to keep working (or might not; if filesystem
mounting changes sufficiently it may no longer make sense, not that
changes to filesystem mounting seem to me to be part of this).  There is
nothing that says it has to continue working with exactly the same
dev_t-based implementation it traditionally has - and even if it does,
that needs dev_ts only for st_dev fields, not for special device nodes
in the filesystem; the use of the same thing for both is a historical
accident that I see no particular need to preserve.  find -x cares
about dev_t only in the sense that it equates "a.st_dev==b.st_dev" with
"a and b are on the same filesystem"; something like indices into an
array of mount points, or event-of-mounting serial numbers, would work
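The property find -x depends on fits in a few lines. `same_filesystem()` is a name made up for this sketch, not code taken from find(1):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Report whether a and b live on the same mounted filesystem, in the
 * sense described above: equal st_dev.  Returns false if either lstat fails.
 */
bool
same_filesystem(const char *a, const char *b)
{
	struct stat sa, sb;

	if (lstat(a, &sa) == -1 || lstat(b, &sb) == -1)
		return false;
	return sa.st_dev == sb.st_dev;
}
```

Nothing here requires st_dev to be a major/minor pair. Any value that is unique per mounted filesystem would satisfy the comparison.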

Not given devfs.  With a devfs, backing up device nodes makes about as

A valid point.  But it could also be that nobody has thought radically
enough to come up with the interface that _is_ better - somewhat a la
"your idea is crazy, ...
From: Jochen Kunz
Date: Wednesday, March 10, 2010 - 10:49 am

On Wed, 10 Mar 2010 10:22:40 -0500 (EST)
Seconded. If you rework it, do it thoroughly. Wipe everything and start
from scratch.

Something else comes to my mind: Kernel configuration and devfs
configuration interact closely. E.g. you can give the device
enumeration order in the kernel configuration by "nailing down"
devices. Now those symbolic kernel devices like com(4) need to be
assigned to a device node name in /dev.

Why separate those two? There should be a single configuration file
that configures kernel options like what device to search where and
what device node to assign to it (+ permissions, ownership, etc.).
This file is used to get the kernel default configuration at compile
time. It should also be possible to pass this file to the kernel at
boot time, making the kernel reconfigurable at reboot. In addition,
the in-core version of that file must be alterable at runtime.
This way you can enable/disable device drivers at runtime, probably
resulting in the (un)loading of a kernel module and the creation or
deletion of device nodes in /dev. The current kernel configuration can
be dumped to a file and passed to the kernel at next boot...

If you chmod(1) or chown(8) a device node in /dev, devfs updates the
in-core kernel configuration as chmod(2) and chown(2) get down to devfs.
At (clean) reboot kernel configuration gets dumped and reloaded.
Et voila, devfs with persistent permissions without a devfsd(8).
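Here is a rough sketch of the in-core side of this proposal. It assumes one hypothetical record per device that combines where the driver attaches, the node it gets, and its ownership and mode. `devconf_chmod()` stands in for the devfs chmod(2) hook. `devconf_dump()` stands in for writing the file handed to the kernel at the next boot. The record layout and text format are invented for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical combined kernel/devfs configuration record. */
struct devconf {
	const char *device;   /* kernel device, e.g. "com0" */
	const char *attach;   /* where to look for it, e.g. "isa? port 0x3f8" */
	const char *node;     /* node name in /dev, e.g. "tty00" */
	uid_t uid;
	gid_t gid;
	mode_t mode;
};

/* Called from the devfs chmod(2) path: update the in-core record. */
int
devconf_chmod(struct devconf *tab, size_t n, const char *node, mode_t mode)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].node, node) == 0) {
			tab[i].mode = mode & 07777;
			return 0;
		}
	}
	return -1;	/* no such node */
}

/* Dump the in-core configuration as text, one device per line. */
int
devconf_dump(const struct devconf *tab, size_t n, FILE *out)
{
	for (size_t i = 0; i < n; i++) {
		if (fprintf(out, "%s at %s node %s owner %lu:%lu mode %04o\n",
		    tab[i].device, tab[i].attach, tab[i].node,
		    (unsigned long)tab[i].uid, (unsigned long)tab[i].gid,
		    (unsigned)tab[i].mode) < 0)
			return -1;
	}
	return 0;
}
```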

tschüß,
       Jochen

From: Masao Uebayashi
Date: Thursday, March 11, 2010 - 6:37 am

chmod(2) / chown(2) are OK.  devfsd(8) also needs to track rename(2)
done in /dev.  (I never do that myself, but people want to...)

Masao
From: Johnny Billquist
Date: Thursday, March 11, 2010 - 7:32 am

I missed when Jochen wrote this, so I'll comment now.
This might sound tempting, but I don't think it is a good idea.
Keeping track of changes and trying to retain them over reboots is
risky. And the mappings need to be able to handle complex things, such
as several names pointing to the same device, and people using totally
different names. So renames, chmod, chown, unlink and mknod calls all
need to be tracked.

So, what we have basically done, at that point, is to reimplement what
we already have, but in a more complex way.

All for the sake of getting a default entry in there for a virgin
system? (Or when would this actually be helpful?)

In fact, even more complex - what do we do if someone removed a device
entry, for which a device exists? Do we keep track of it in that
database, marked as deleted then perhaps? Otherwise it would be
recreated at next boot? What about a new kernel? Should we wipe the
database? That might not be the right thing to do. Should we keep it?
That might also be right - after all, this is a new kernel... We might
have added some devices. Should they turn up or not?

Nah, I don't see any gains. Only losses. The current entries in /dev are
working better than this, in combination with MAKEDEV, which you can run
if there is something you do want to add which is missing, with default
values. After that, you can fool around with, and modify to your heart's
content, without anything unexpected happening under your nose when you
didn't expect it.

	Johnny

From: Masao Uebayashi
Date: Thursday, March 11, 2010 - 7:50 am

You're only pointing out that "managing static thing statically" is
easy.  Everyone already knows that.

What we're talking about is what we don't have now.

Masao
From: Masao Uebayashi
Date: Thursday, March 11, 2010 - 8:35 am

Speaking of tracking state...  I've found that keeping track of state
in devfsd is very wrong.  It duplicates what filesystems already do.
So what we need for emulating the "traditional" view is a way to proxy
those state bits nicely (probably to tmpfs).

Speaking of persistence, I've come to think it's totally *not* worth it in devfs.

So users have two options:

- Traditional /dev
  - Fine grained access control
  - Persistent
    - Relying on UFS (or whatever)
  - Static configuration

- New /dev
  - Simplified access control
  - Volatile
  - Dynamic configuration

Masao
From: Adam Hoka
Date: Thursday, March 11, 2010 - 10:12 am

On Fri, 12 Mar 2010 00:35:24 +0900

I'm wondering if it would be too hack-ish to make devfs file
backed (at least optionally, in case of early boot or read
only rootfs).

For example: mount -t devfs /etc/devfs.db /dev

So it could be persistent without a userland process.
Or is this something that would be complicated?

-- 
NetBSD - Simplicity is prerequisite for reliability
From: Joerg Sonnenberger
Date: Thursday, March 11, 2010 - 10:18 am

How is that different from mounting devfs and calling mtree next?
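What "mounting devfs and calling mtree next" amounts to can be sketched as a loop that reads a permissions spec and applies it with chown(2) and chmod(2). The one-line-per-node format used here (`path uid gid mode`) is a simplified stand-in invented for the sketch. It is not mtree(8)'s actual specification syntax.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Apply a spec of lines "path uid gid mode" (mode in octal).  A uid or
 * gid of -1 leaves that owner field unchanged, as chown(2) allows.
 * Returns the number of lines that failed to parse or apply.
 */
int
apply_spec(FILE *spec)
{
	char path[256];
	long uid, gid;
	unsigned mode;
	int failures = 0, r;

	while ((r = fscanf(spec, "%255s %ld %ld %o",
	    path, &uid, &gid, &mode)) != EOF) {
		if (r != 4) {
			int c;

			failures++;
			/* skip the rest of the malformed line */
			while ((c = fgetc(spec)) != '\n' && c != EOF)
				;
			continue;
		}
		if (chown(path, (uid_t)uid, (gid_t)gid) == -1 ||
		    chmod(path, (mode_t)(mode & 07777)) == -1)
			failures++;
	}
	return failures;
}
```

Renames and whiteouts are outside what such a spec can express, and the lines are applied one at a time rather than atomically.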

Joerg
From: Eric Haszlakiewicz
Date: Thursday, March 11, 2010 - 11:37 am

fwiw, that's what I've always been assuming devfs would be doing.  It
needs *some* place to store the persistent state, regardless of how

mtree won't handle renames and whiteouts; otherwise pretty close.

eric
From: Adam Hoka
Date: Thursday, March 11, 2010 - 11:55 am

On Thu, 11 Mar 2010 18:18:54 +0100

Atomicity (or at least on-line updates to the database).

This could also enable you to do
  mount -t devfs /etc/devfs-bindchroot.db /var/chroot/bind/dev
for example. So you can have different permissions at a different location.

From: Eduardo Horvath
Date: Thursday, March 11, 2010 - 12:06 pm

I don't think a static database will cut it.  What happens when someone 
attaches a new USB stick and devfs generates a bunch of new nodes?  What 
ownership and permissions should they get?

Eduardo
From: der Mouse
Date: Thursday, March 11, 2010 - 1:09 pm

Presumably the database holds, among other things, information
specifying whether new nodes will appear in such a case, and, if so,
what ownership and modes they'll have.

From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:00 pm

My (current) idea is to expose all devices as perm 0000, then let
devfs promote those nodes.  As Joerg said, this is a kind of mtree(8),
in that:

- it should use well-established syntax
- it calls mknod(2) internally

It's not like mtree(8) in that:

- it can't hard-code paths.

We'll probably end up with some patterns, but let's not re-invent a new syntax.
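Here is a minimal sketch of that promotion step. It makes three assumptions. The rules are shell-style glob patterns matched with POSIX fnmatch(3). They are checked in order, and the first match wins. Unmatched nodes stay at uid 0, gid 0, mode 0000. `struct promote_rule`, `promote()` and the example patterns are hypothetical, since the message deliberately leaves the syntax open.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical promotion rule: nodes matching pattern get these bits. */
struct promote_rule {
	const char *pattern;   /* shell glob, e.g. "ttyU*" */
	uid_t uid;
	gid_t gid;
	mode_t mode;
};

/*
 * Decide ownership and mode for a freshly exposed node.  Every node
 * starts as uid 0, gid 0, mode 0000; the first matching rule, if any,
 * promotes it.  Returns the index of the rule used, or -1.
 */
int
promote(const struct promote_rule *rules, size_t n, const char *node,
    uid_t *uid, gid_t *gid, mode_t *mode)
{
	*uid = 0;
	*gid = 0;
	*mode = 0000;
	for (size_t i = 0; i < n; i++) {
		if (fnmatch(rules[i].pattern, node, 0) == 0) {
			*uid = rules[i].uid;
			*gid = rules[i].gid;
			*mode = rules[i].mode;
			return (int)i;
		}
	}
	return -1;
}
```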

Masao
From: Greg A. Woods
Date: Thursday, March 11, 2010 - 3:54 pm

At Fri, 12 Mar 2010 00:35:24 +0900, Masao Uebayashi <uebayasi@gmail.com> wrote:

Indeed -- I do agree with that much at least!

I've had diskless systems running for a long while now (since 2003)
where /dev is created by init(8) on every boot (by running
/sbin/MAKEDEV, as I've renamed it).

In the extremely rare cases where I've wanted to change permissions or
similar on a device node I can just use the normal commands:

	chmod 666 /dev/tty001

and if I want to make such a change persistent across boots I just add
that exact same command to /etc/rc.local.

There's no magic needed.

I think the only key feature necessary is that devfs handle the normal
permissions and ownership changes, but to do so of course with no more
persistence than tmpfs, md, or mfs.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/
From: Manuel Bouyer
Date: Friday, March 12, 2010 - 12:22 pm

This wouldn't work very well for hot-plug devices.
As I understand it, nodes would be created at plug time, and removed at unplug
time (correct me if I'm wrong). So you would need to run your chmod
when your (e.g. USB) device is plugged in (which is also the time at which you
know where it will show up in the device space).

Linux udev can handle this, and it's useful (I've had to do such
special setups at work a few times already).

From: Greg A. Woods
Date: Saturday, March 13, 2010 - 4:02 pm

At Fri, 12 Mar 2010 20:22:25 +0100, Manuel Bouyer <bouyer@antioche.eu.org> wrote:

Hmmm.... well, we have had "hot plug" devices of a sort ever since 1.6
or earlier (when I began using MFS /dev).....

The only magic trick there is to be able to predict all the possible
major and minor numbers at the time you write your MAKEDEV script, or at
least be able to update that script as necessary.  In the past this has
been sufficient, eg. with SCSI probe and scan detecting new devices.

However even that kind of magic really isn't truly necessary.

Indeed, without devfs it could be as easy as the kernel simply
spitting out a message saying that "a device at major N, minor Y" was
available to be used (when it was detected), and then leave it entirely
up to the user, or some agent of the user (eg. a script monitoring for
such messages), to run "mknod" as appropriate, and perhaps adjusting
permissions and ownerships at the same time, possibly even updating
/etc/MAKEDEV.local.  In fact I've wanted the kernel to tell me what
major/minor number(s) to use for new SCSI devices, though to some extent
the way MAKEDEV is written to use unit numbers, it works well enough.

Obviously there are other ways for the kernel to notify userland of such
events as device attach/detach besides having a script monitor
/dev/console output or kernel syslog messages.  Perhaps kqueue()
monitoring /dev itself is sufficient, though perhaps then only for a
"flat" file tree in /dev.

So, with a devfs implementation that creates the new /dev file node
automatically, the agent script could still be responsible for changing
permissions and ownerships as desired.

I.e. no magic for persistence of filesystem metadata is necessary in
devfs so long as there are ways to monitor for and handle events that
indicate changes have happened in the live state of devfs filesystem.

From: der Mouse
Date: Thursday, March 11, 2010 - 8:04 am

First, a note - I asked a Linux person I work with why the penguins
switched from devfs to udevd.  He said that it was a question of
pulling relatively complex policy issues out of the kernel into
userland, the stance being that things like "users in group pix should
be able to access any USB scanner or camera devices that may appear" do
not belong in the kernel.  I'm sure this forms an argument of some sort
for NetBSD's purposes, but I'm not sure which way.


For your use cases, yes, perhaps.  My use cases too, most of them at
least.  But there are other use cases (some of them reasonable, even :)
which the traditional /dev does not support well, such as the "I want
the disk with UUID xyz to appear at some fixed place regardless of
whether it's on SCSI, USB, firewire, bluetooth, or what" one that's
been mentioned upthread.  Those use cases, the ones /dev does not
handle well, are what are driving devfs.

It may be that a devfs is not a good way to handle them.  But /dev
definitely is not; I don't see much alternative but to keep trying
various things until someone finds something better.

From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:16 pm

Now it's obvious that we need an explicit "switch"; which we use,
either "static" (legacy, current behavior) or "dynamic" (devfs,
hot-plug friendly, but backward incompatible / standards non-conformant).

Masao
From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 10:51 pm

OK, this is something like, config exploring buses with the whole tree
image as a recipe.  IIUC this is exactly what ACPI needs.  You build a
whole tree from ACPI table, then enter configure(), build cfdata
on-the-fly and give it to *_attach().  Bus drivers may have to be
changed to pass their subtree to config_found()...

For permissions, they're probably going to be per-mount (== per-view).
We should concentrate on physical topology / connection during
configure().

Masao
From: David Holland
Date: Thursday, March 11, 2010 - 5:11 pm

On Wed, Mar 10, 2010 at 10:22:40AM -0500, der Mouse wrote:
 > Everything starts somewhere.  I would never go near Uebayashi-san's
 > devfs in any of the incarnations described so far on a production
 > system.  But I think it's high time someone started thinking about, and
 > experimenting with, alternatives to traditional device nodes and device
 > numbers, and I'm glad to see this happening.

...twelve years ago. http://www.eecs.harvard.edu/syrah/vino/

Apart from the general problems with devfs as a concept (which I've
blathered plenty about in this and other discussions) based on that
experience there are some pertinent things I can say here:

(1) dev_t cannot go away, because a fairly fundamental guarantee in
Unix is that two files are the same if stat returns the same (st_dev,
st_ino) pair for each. Violate this semantic at your own risk.
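The guarantee in (1) is what lets a program tell that two names refer to one file, for instance to avoid copying a file onto itself. Here is a minimal sketch, with `same_file()` as a made-up helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Two paths name the same file exactly when stat(2) reports the same
 * (st_dev, st_ino) pair for both.  Returns false if either stat fails.
 */
bool
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

If a devfs handed out arbitrary or colliding st_dev values, a check like this could report two distinct nodes as the same file, or the reverse.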

(2) As Joerg (I think) already noted, it is perfectly sufficient to
just number devices as they're attached. There is no particular need
to give these numberings semantic significance, or make them
persistent across reboot. (Although for nfsd you need to check where
your NFS file handles are coming from.)

(3) It is also necessary that device nodes continue to appear as
device nodes to stat (S_IFBLK, S_IFCHR, etc.) because assorted
regrettable things happen if e.g. disk partitions appear to be regular
files. Given this, by far the path of least resistance is to fill
st_rdev with the same dev_t value already generated.
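The file-type test such programs rely on is a plain stat(2) mode check. This sketch classifies a path the way a backup or copy tool might before deciding whether to read its contents. The `node_kind()` name is invented here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Coarse classification of what a path looks like to stat(2). */
const char *
node_kind(const char *path)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return "missing";
	if (S_ISCHR(st.st_mode))
		return "character device";
	if (S_ISBLK(st.st_mode))
		return "block device";
	if (S_ISREG(st.st_mode))
		return "regular file";
	if (S_ISDIR(st.st_mode))
		return "directory";
	return "other";
}
```

A tool that only skips S_ISCHR/S_ISBLK nodes would read straight through a disk partition that a devfs presented as a regular file.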

 > With that in mind, I'd say that the more radical Uebayashi-san's devfs
 > is, the less like past (failed) attempts at devfses it is, the more
 > likely it is to turn out to be a better way.  Eliminating (this use of)
 > dev_t is an example.

As the foregoing implies, VINO's devfs had no dev_t, or at least, no
semantic dev_t. I would still call it a failure; however, building it
did point out at least two important points in addition to the ones
above.

...

oh, why not.

(1) Attaching a device into devfs and ...
From: der Mouse
Date: Thursday, March 11, 2010 - 8:52 pm

This dev_t does not have to correspond, though, to anything else in the


Oh, they probably shouldn't appear to be ordinary files.  (I'm not
convinced they can't be; those "regrettable things" could be looked
upon as things needing fixing upon switching paradigms.)  They need
not, however, be traditional character or block device "files".
Indeed, I can't offhand see any reason why userland has to even be able
to tell whether two of them are the same or not (though it can help at
the human level in some cases); as long as opening one connects you to
the correct driver, they could be pretty much anything.  stat()
returning an st_rdev is another of those implementation details that is
not necessary but which people have trouble letting go of because
they're not willing to bite big enough bullets.

procfs and kernfs are examples of filesystems which illustrate that
it's possible to have non-"device" entities in the filesystem which,
when opened, connect to specialized code.  Doing this with a devfs
might even involve creating a new type of filesystem entity (S_IFDEV,

Only at a very general level, the level of "new stuff appearing in the
filesystem", but at that level open(,O_CREAT,) also qualifies.  So do
other calls; perhaps most relevantly here, consider mknod() - some of
the ideas mentioned upthread have involved a userland daemon that

That actually does not follow.  Attempting to look up the name (as
opposed to doing something with an existing name) could be what
triggers the load.

Of course, that means that the name exists in some sense, but that
sense does not have to be one that's visible to userland (while you may
want an administrative interface that lets you see them, it is in no
way essential).

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: David Holland
Date: Friday, March 12, 2010 - 11:06 pm

On Thu, Mar 11, 2010 at 10:52:53PM -0500, der Mouse wrote:
 > > (1) dev_t cannot go away, because a fairly fundamental guarantee in
 > > Unix is that two files are the same if stat returns the same (st_dev,
 > > st_ino) pair for each.
 > 
 > This dev_t does not have to correspond, though, to anything else in the
 > system.

Not really, no, but it may as well be the same as what's in st_rdev.
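The identity rule quoted above (two names refer to the same file exactly when stat(2) returns the same (st_dev, st_ino) pair) can be checked from userland with a few lines of C. A minimal sketch, assuming a POSIX system; the helper name `same_file` is hypothetical:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if both paths name the same file by the (st_dev, st_ino) rule;
 * false if they differ or if either stat(2) call fails. */
bool
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Nothing in this check cares what the st_dev value means, only that it is stable and distinct per filesystem, which is the point being made about a devfs-assigned dev_t.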

 > > (3) It is also necessary that device nodes continue to appear as
 > > device nodes to stat (S_IFBLK, S_IFCHR, etc.)
 > 
 > No, actually.  See below.
 >
 > > because assorted regrettable things happen if e.g. disk partitions
 > > appear to be regular files.
 > 
 > Oh, they probably shouldn't appear to be ordinary files.  (I'm not
 > convinced they can't be; those "regrettable things" could be looked
 > upon as things needing fixing upon switching paradigms.)  

In the best case it's like when naive Linux users first encounter
/proc/kcore. The biggest obvious real problem is that you'll probably
end up with an extra copy of each disk on your backup tapes. You also
get programs that know to avoid device nodes tripping on various
special semantic properties some devices have, like blocking for
carrier when opening ttys or rewinding tapes.

These issues could probably be fixed with attributes of some kind, but
"I'm a device" is after all exactly the right attribute...
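To illustrate why "I'm a device" works as an attribute: a backup or tree-walking tool typically decides whether to skip a name from the file type that lstat(2) reports. A sketch with a hypothetical helper name; if a devfs reported disk partitions as regular files (S_IFREG), this check would return false and the tool would read the whole disk:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if lstat(2) reports `path` as a block or character device node,
 * i.e. something a backup program would normally skip. False for any
 * other file type or if lstat(2) fails. */
bool
is_device_node(const char *path)
{
	struct stat st;

	if (lstat(path, &st) == -1)
		return false;
	return S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode);
}
```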

Anyhow, I tried it and the other guys on the project made me revert :-)

 > procfs and kernfs are examples of filesystems which illustrate that
 > it's possible to have non-"device" entities in the filesystem which,
 > when opened, connect to specialized code.

Oh sure, and sometime I should write up VINO's kernfs too (it was not
a failure) but these work out somewhat differently in practice. The
files in procfs and kernfs are for the most part semantically
equivalent to real files even when they're virtual or dynamically
generated. Devices frequently have other properties.

 > Doing this with a devfs might even ...
From: der Mouse
Date: Saturday, March 13, 2010 - 6:02 am

If there still is an st_rdev.  I see no particular reason that needs to

Disagree.  Writing to real files does not, for example, change the
system hostname or alter a process's registers.

In fact, that sounds a lot like the kind of dangers that inhere in

In terms of the end state achieved, neither do I.  But there can be
value in that programs that haven't been ported are more likely to
misbehave if they see a "name" (by which I mean S_IFCHR and S_IFBLK)
they think they know the semantics of but with different semantics than

In some respects.  But lurking under all this has been doing away with
st_rdev, which for some programs is a radical enough departure that a
new name is deserved.  (Others won't care, but I suspect most of them

I'm not sure I'd call a filesystem a "foreign object".  If that's fair,
then the filesystem namespace is _all_ "foreign object"s, and the

I'm not sure how fair it is to call it a "proxy object", any more than
an S_IFREG inode is a proxy for the big array of bytes (stored
elsewhere on the disk) that make up the file's contents.
Alternatively, they're all proxies, and the adjective becomes pretty

Well, I'm not sure I'd call it "non-devfs", in that you're basically
creating either one devfs per device or a devfs which exports only one
device per mount, depending on how much of the device-specific part you

Well, they don't, really, but not automounting them doesn't solve any

Not all that unsolved.  I've used at least two automounters, each of
which solved it well enough for their purposes.  Device automounter
config *is* unsolved - in exactly the same way that devfs config is

I don't see any real difference between an automounter mounting devices
into /dev individually and a devfs making devices appear under a devfs
mount.  It would even be just a relatively trivial bit of coding to

The same thing you would have loaded for the same name in a "touch the

Well, there may be value in its not appearing in readdir() output.

/~\ The ...
From: David Holland
Date: Sunday, March 14, 2010 - 1:23 pm

On Sat, Mar 13, 2010 at 08:02:51AM -0500, der Mouse wrote:
 >>> [st_dev] does not have to correspond, though, to anything else in
 >>> the system.
 >> Not really, no, but it may as well be the same as what's in st_rdev.
 > 
 > If there still is an st_rdev.  I see no particular reason that needs to
 > be preserved.

No, except that it is somewhat useful to be able to identify a device
node (or at least distinguish it from others) and plenty of existing
code expects the st_rdev field to exist. Patching all that is only
worthwhile if it accomplishes some purpose, which it wouldn't really.

 > > The files in procfs and kernfs are for the most part semantically
 > > equivalent to real files even when they're virtual or dynamically
 > > generated.  Devices frequently have other properties.
 > 
 > Disagree.  Writing to real files does not, for example, change the
 > system hostname or alter a process's registers.
 > 
 > In fact, that sounds a lot like the kind of dangers that inhere in
 > writing to devices indiscriminately, doesn't it?

Yes... and no. There's another sense in which /kern/hostname is the
same as /etc/passwd: both are text files that affect the system
configuration. Changes to both also have immediate operational effects
on the running system. The fact that one is not preserved across
reboots is a negligible difference from the perspective of some
program that might randomly open either.

Unexpectedly opening a tty without being prepared to hang indefinitely
waiting for carrier-detect is a different class of problem. Many
devices also are not like regular files in that you cannot read back
what you write to them; /kern/hostname is again a regular file by that
standard.

I'm not saying that it might not be useful to tag /kern/hostname
somehow (and /etc/passwd too) so that certain classes of programs,
like say mail delivery tools, can categorically refuse to write to
them. But that's kind of a different issue from marking devices...

 > >> [...] devfs might even ...
From: der Mouse
Date: Sunday, March 14, 2010 - 1:43 pm

If we do away with device numbers - I think that was mentioned - what
point does st_rdev have?  What meaningful values could you put there?
The only fundamental purpose they (as used here) serve is to handle the
mapping between filesystem entities and driver instances, and that's

And the device driver is part of the conceptual entity that a device

Where it is stored is an implementation detail.  (And, if it's
autoloaded, the driver itself may very well be in the filesystem.
Depending on what the driver does, the object, if any, backing it also

Depends on exactly what you include under the "devfs" name.  There is
no devfs in the sense of something being passed to vfs_attach(), but it
seems to me that that, like the existence of vfs_attach() at all, is an
implementation detail; there is code (mostly in the automounter) that
performs the functions we have been attributing to a devfs, and in that

I'm not convinced "too many levels of indirection" is fair - and, even
if it is, I'm not convinced individual device mounts aren't
approximately as bad in that regard.

The main purpose it serves, it seems to me, is to collect the relevant
code together.  Deciding whether to implement the loose conceptual
devfs as individual automounted device nodes or a single devfs mount
strikes me as a bit of a toss-up; either can perform the fundamental
function of a dynamic mapping between filesystem-namespace strings and
device drivers.  I'm perfectly willing to accept your experience that
the single-devfs-mount has operational problems, but, pending someone
trying it, I don't believe that the other way doesn't.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: David Holland
Date: Sunday, March 28, 2010 - 2:55 pm

On Sun, Mar 14, 2010 at 04:43:18PM -0400, der Mouse wrote:
 > >> In some respects.  But lurking under all this has been doing away
 > >> with st_rdev, which [...]
 > > Well, no, we're doing away with a specific interpretation of the
 > > contents of st_rdev. Getting rid of st_rdev itself doesn't serve much
 > > further purpose.
 > 
 > If we do away with device numbers - I think that was mentioned - what
 > point does st_rdev have?  What meaningful values could you put there?
 > The only fundamental purpose they (as used here) serve is to handle the
 > mapping between filesystem entities and driver instances, and that's
 > something that can, and maybe even should, be done differently.

There's one other purpose, which is determining if two device inodes
encountered in the FS namespace refer to the same device or not. This
is not a completely useless property, and if you're going to have some
kind of device identity anyway (as is needed for filling in st_dev)
st_rdev is a natural place to put it for the device node itself.
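A sketch of that use of st_rdev, assuming POSIX stat(2): two device nodes, possibly at different paths or on different filesystems, are treated as the same device when both are of the same type and carry the same st_rdev. The helper name is hypothetical:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if `a` and `b` are device nodes of the same type (both character
 * or both block) whose st_rdev values match. Non-device files, or paths
 * that cannot be stat(2)ed, never compare equal. */
bool
same_device(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return false;
	if (S_ISCHR(sa.st_mode) && S_ISCHR(sb.st_mode))
		return sa.st_rdev == sb.st_rdev;
	if (S_ISBLK(sa.st_mode) && S_ISBLK(sb.st_mode))
		return sa.st_rdev == sb.st_rdev;
	return false;
}
```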

 > >> I'm not sure how fair it is to call it a "proxy object", any more
 > >> than an S_IFREG inode is a proxy for the big array of bytes (stored
 > >> elsewhere on the disk) that make up the file's contents.
 > > But that big array is part of the conceptual entity that the inode
 > > represents.
 > 
 > And the device driver is part of the conceptual entity that a device
 > inode represents.

No, it's not, because the device inode belongs to a file system and
the driver does not.

 > > The driver pointed to by a device special file is not part of
 > > anything in the filesystem.
 > 
 > Where it is stored is an implementation detail.

Yes, which is why it's not *part* of the filesystem.

 > >> Well, I'm not sure I'd call it "non-devfs", in that you're basically
 > >> creating either one devfs per device or a devfs which exports only
 > >> one device per mount, depending on how much of the device-specific
 > >> part you consider to be part of the ...
From: Johnny Billquist
Date: Wednesday, March 10, 2010 - 6:39 am

Sweet jesus. Talk about brittle solutions...
Clean shutdown to survive... Yeah, that we can guarantee... Or maybe 
not... :-(

And the extra overhead seems just excessive!

	Johnny

From: der Mouse
Date: Wednesday, March 10, 2010 - 8:15 am

I'm a little confused, here.  How can chmod and chown on /dev/wd0a do
anything useful if /dev/wd0a just gets redirected to (say)
/dev/default/wd0a?  Removing access helps in only a few cases, because
someone wishing to bypass the removal can go directly to
/dev/default/wd0a.  And granting access doesn't help either, because
the access will fail on /dev/default/wd0a even if it doesn't on
/dev/wd0a.


Well, yes.  But research efforts are like that.  Robustness is pretty
much necessary for production use but not for the stage this appears to
be at.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 9:36 am

I'm not a researcher.  I'm an engineer.  I like steady progress & feasible projects.

I think everyone agrees that having /dev/id is useful even on its own.
So that would be the first step.

If people complain about the inability to rename(2) in /dev/id...  I should
quit using NetBSD.  (Fortunately no one has.)

Masao
From: David Holland
Date: Thursday, March 11, 2010 - 4:39 pm

On Thu, Mar 11, 2010 at 01:36:41AM +0900, Masao Uebayashi wrote:
 > > Well, yes.  But research efforts are like that.  Robustness is pretty
 > > much necessary for production use but not for the stage this appears to
 > > be at.
 > 
 > I'm not a researcher.  I'm an engineer.  I like steady progress &
 > feasible projects.

I am a researcher, and my core area of interest is exactly this kind
of problem. If you are looking for a feasible project that can be
relied on to move forward, my honest best recommendation is to pick
something else. :-|

-- 
David A. Holland
dholland@netbsd.org
From: Eric Haszlakiewicz
Date: Thursday, March 11, 2010 - 11:42 am

Perhaps it should be a hard link instead then?  Unfortunately, then you

The "DB" (whether it's an actual database, or a file on a filesystem, or
whatever) shouldn't care about clean shutdown most of the time, only if
you happen to crash in the middle of changing things.  The contents
and permissions of /dev aren't going to be changing much in most normal
operation so it seems like optimizing the "DB" to be in a safe state
most of the time, even if it makes changes slower, would solve this problem.
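One conventional way to keep such an on-disk "DB" consistent across crashes (offered here as an assumption, not something proposed in the thread) is write-to-temporary, fsync(2), then rename(2) over the old file, so a reader sees either the old or the new contents. A minimal sketch; the helper `atomic_replace` is hypothetical, and the temporary file must live on the same filesystem as the target for the rename to be atomic:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the contents of `path` so that a crash leaves either the old
 * or the new version, never a mix: write a temporary file, fsync(2) it,
 * then rename(2) it over the original. `tmpl` is a writable mkstemp(3)
 * template ending in "XXXXXX", in the same directory as `path`.
 * A short write is treated as failure. Returns 0 on success, -1 on error.
 */
int
atomic_replace(const char *path, char *tmpl, const char *data)
{
	size_t len = strlen(data);
	int fd = mkstemp(tmpl);

	if (fd == -1)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
		close(fd);
		unlink(tmpl);
		return -1;
	}
	if (close(fd) == -1 || rename(tmpl, path) == -1) {
		unlink(tmpl);
		return -1;
	}
	return 0;
}
```

For full durability the containing directory should also be fsync'd after the rename; that step is left out to keep the sketch short.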

eric

From: der Mouse
Date: Tuesday, March 9, 2010 - 9:50 am

Anyone who can meddle with a root-run process can do a lot worse than
that (to start with, mounting that filesystem directly).

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: Thor Lancelot Simon
Date: Tuesday, March 9, 2010 - 10:46 am

Not if the system is running in secure mode.

Thor
From: Eric Haszlakiewicz
Date: Tuesday, March 9, 2010 - 11:57 am

This is already a problem with dkctl.  And anyway, jacking around with the
userspace daemon is unnecessarily complicated: if you have sufficient access
to do that, you probably have sufficient access to just change the symlink.

eric
From: Thor Lancelot Simon
Date: Tuesday, March 9, 2010 - 12:23 pm

I want to be able to tell the kernel to mount a device reliably identified
by some kind of unique, symbolic name.  I want to be able to load a list
of permissible such names into the kernel while it's running insecure, and
restrict mounting to those and only those when it's running secure.
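The policy described above can be modeled as a small table that may only grow while the system is insecure and is consulted on every mount once it is secure. This is a userland sketch with hypothetical names (`struct mount_policy`, `policy_add`, `policy_may_mount`); it is not NetBSD's kauth(9) interface, and it assumes NetBSD's convention that securelevel values of 0 or below are insecure:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALLOWED 16

/* Hypothetical policy state. Names are stored by pointer, so the
 * caller must keep the strings alive. */
struct mount_policy {
	const char *allowed[MAX_ALLOWED];
	size_t nallowed;
	int securelevel;	/* <= 0 means insecure, as on NetBSD */
};

/* Add a permissible volume name; refused once the system is secure. */
bool
policy_add(struct mount_policy *p, const char *name)
{
	if (p->securelevel > 0 || p->nallowed == MAX_ALLOWED)
		return false;
	p->allowed[p->nallowed++] = name;
	return true;
}

/* Insecure: anything may be mounted. Secure: only listed names. */
bool
policy_may_mount(const struct mount_policy *p, const char *name)
{
	if (p->securelevel <= 0)
		return true;
	for (size_t i = 0; i < p->nallowed; i++)
		if (strcmp(p->allowed[i], name) == 0)
			return true;
	return false;
}
```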

Relying on a userspace daemon for naming makes that impossible.

Thor
From: Joerg Sonnenberger
Date: Tuesday, March 9, 2010 - 12:45 pm

I don't get it. What kind of devices are you talking about? If the
environment is static, you can still use the same identifier as before.
If it is not, why do you believe that the device you are dealing with is
the one you hoped it is?

Joerg
From: Thor Lancelot Simon
Date: Tuesday, March 9, 2010 - 12:55 pm

That's a matter for the kernel to decide -- not one for some userspace
program which could be tampered with by any process running with euid 0.

At least, that is how I would strongly prefer it to be.

Thor
From: Steven Bellovin
Date: Tuesday, March 9, 2010 - 12:58 pm

But what's to stop someone from mounting a new file system over /bin?  Or are you talking about securelevel 2?

		--Steve Bellovin, http://www.cs.columbia.edu/~smb
From: Thor Lancelot Simon
Date: Tuesday, March 9, 2010 - 2:45 pm

I'm talking about trying to build policies which provide some of the
guarantees we only provide at securelevel 2 now, but allow more flexibility
to do things the administrator's decided ahead of time the system should
be allowed to do.

Doing this right is not trivial (it may require a signature binding the
contents of a medium to its UUID, etc.) but it's certainly not impossible
either.

Causing all binding of names to devices to run forcibly through a userspace
daemon *will* make such enhancements impossible.  That would suck.

Thor
From: Steven Bellovin
Date: Tuesday, March 9, 2010 - 7:59 pm

I think that Joerg's proposal doesn't prevent you from doing what you want, though I don't think it helps, either.  He suggested that /dev/uuid and /dev/label just have symlinks to the usual device file, so no user-level daemons would be involved.  Those who have your security needs will mount on /dev/usualstuff; those who have topologically confused configurations would use /dev/label/whatever.  Many folks will mix and match -- a typical laptop, with only one hard drive, could have / on /dev/usual, while USB sticks and external hard drives would be referenced via the /dev/label symlink.

		--Steve Bellovin, http://www.cs.columbia.edu/~smb
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 8:43 pm

> I think that Joerg's proposal doesn't prevent you from doing what you want, though I don't think it helps, either.  He suggested that /dev/uuid and /dev/label just have symlinks to the usual device file, so no user-level daemons would be involved.

He said it has to be done in a userland daemon. :)

Masao
From: Steven Bellovin
Date: Tuesday, March 9, 2010 - 8:52 pm

The userland daemon creates the symlinks but not the device files, I thought.


		--Steve Bellovin, http://www.cs.columbia.edu/~smb
From: Matthew Mondor
Date: Tuesday, March 9, 2010 - 9:14 pm

On Tue, 9 Mar 2010 22:52:17 -0500

That's also my understanding, but since this system is also very simple,
even a kernel implementation of it would probably be nice, in
which case users of such a system could add an entry to
their /etc/fstab to mount the uuid fs under /dev/uuid/?
-- 
Matt
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:17 pm

Yes.  That's exactly my view too.

Masao
From: Eric Haszlakiewicz
Date: Wednesday, March 10, 2010 - 1:53 pm

So if you want to lock things down, why not just change the /dev mount to be
read-only?  Then bump the securelevel, and whoever the daemon is running as
won't be able to change anything.

eric
From: David Young
Date: Tuesday, March 9, 2010 - 8:36 am

I don't understand why the intrinsic properties cannot be found out in

We don't need a second mechanism to handle dk(4), do we?  If dk3 should
attach to the volume with GUID 60708090-a0b0-c0d0-e0f0-01020304050, let
the device properties say so:

<plist version="1.0">
<dict>
        <key>device-driver</key>
        <string>dk</string>
        <key>device-unit</key>
        <integer>0x3</integer>
	<key>guid</key>
	<string>60708090-a0b0-c0d0-e0f0-01020304050</string>
</dict>
</plist>

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Iain Hibbert
Date: Tuesday, March 9, 2010 - 9:10 am

Well, in the case of bt3c(4) it needs to load firmware before you can talk
to it and find out the BDADDR.  So, you also need to access the disk
before it configures..  I don't think the boot up sequence can handle this
scenario as yet?  In that case, the firmware is loaded when the device is
enabled (/etc/rc), not during autoconfig.

But if you want to rewrite the autoconfig mechanism so that each
xxx_attach() function is called in its own kernel thread so that devices
can wait until it's safe to load the firmware then I'm all for it.. when do
you plan to allocate the unit number though?

iain


From: David Young
Date: Tuesday, March 9, 2010 - 12:01 pm

I guess that you could split bt3c(4) into upper & lower drivers.  The
upper driver's responsibility is to load the firmware and to match and
attach an instance of the lower driver.

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:24 am

Usually the GUID is recorded in the partition table.  You're viewing things in reverse
order...

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
From: David Young
Date: Tuesday, March 9, 2010 - 9:48 am

I don't see a problem.  Let the kernel read the partition table,
iterate over the partitions, extract properties from each partition,
try to match a dk to each partition by properties (e.g., guid(dk3)
== guid(partition 7 at sd0)).  If there is a match, take the dk unit
number from the matching property list (e.g., dk3).  If there is no
match, choose a unit number that is used by neither a device_t nor a
configuration properties list.
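The matching procedure just described can be sketched in plain C. The types and names (`struct dk_conf`, `assign_dk_units`, the fixed 64-unit limit) are hypothetical stand-ins for property lists and device_t bookkeeping, and GUIDs are compared as plain strings:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNITS 64

/* One configuration properties list: "dk unit N has this GUID". */
struct dk_conf {
	int unit;
	const char *guid;
};

/* Assign a dk unit to each partition GUID in parts[0..nparts-1],
 * writing the result to units[]. A GUID matching a configuration entry
 * takes that entry's unit; any other partition takes the lowest unit
 * used neither by a configuration entry nor by an earlier assignment.
 * Returns false if the unit space runs out. */
bool
assign_dk_units(const struct dk_conf *conf, size_t nconf,
    const char *const *parts, size_t nparts, int *units)
{
	bool used[MAX_UNITS] = { false };

	for (size_t i = 0; i < nconf; i++)
		if (conf[i].unit >= 0 && conf[i].unit < MAX_UNITS)
			used[conf[i].unit] = true;

	for (size_t p = 0; p < nparts; p++) {
		units[p] = -1;
		for (size_t i = 0; i < nconf; i++)
			if (strcmp(conf[i].guid, parts[p]) == 0)
				units[p] = conf[i].unit;
		if (units[p] >= 0)
			continue;	/* matched a configured unit */
		for (int u = 0; u < MAX_UNITS; u++)
			if (!used[u]) {
				used[u] = true;
				units[p] = u;
				break;
			}
		if (units[p] < 0)
			return false;
	}
	return true;
}
```

With a single configuration entry binding unit 3 to "guid-b", partitions "guid-a", "guid-b", "guid-c" come out as dk0, dk3 and dk1, regardless of the order the partitions are discovered in for the configured one.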

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 10:03 am

That way you put lots of knowledge into dk(4).  That's what I don't like
to do.

If you now pass the GUID from the kernel config, what is the point of having the predefined
unit number 3?

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
From: David Young
Date: Tuesday, March 9, 2010 - 11:47 am

The code providing DKWEDGE_METHOD_GPT already has the knowledge.  I
don't think that the knowledge has to move from there.  All that dk(4)
has to do is to match device-properties lists, and for that it can use

The point is to make the device node, /dev/dk3, a reliable handle for
the volume.

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:49 pm

What if you want to mount a NIC as /?  You'll fix all drivers?

All of you say that lookup-by-ID works in your way.  It's possible,
because the ID is unique.  What I'm talking about is the best design for doing
it.  Now that raidframe(4) already does it itself, why do you have the same
logic in raidframe(4) and dk(4)?

I think dk(4) does too many things.  That means you have to
re-implement same logic in many places.  That also means users have to
learn all devices' behavior.


The point is you can't rely on device unit numbers of pseudo devices.

Masao
From: David Young
Date: Wednesday, March 10, 2010 - 8:13 am

Of course you have to fix drivers.  Drivers don't extract the device


Drivers have to know how to extract properties such as MAC address from
their devices.  I don't think that we can avoid that.  If drivers record
the properties that they extract under standard keys, then we can match


We can discard the pseudo-devices concept, if need be.

We cannot rely on any device's unit numbers, now, if it can change
slot/port/chassis.  If we extend the set of "locators" to include
intrinsic device properties such as MAC address, volume GUID, and serial
number, then we can establish a permanent correspondence between a
device unit and a physical device.

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 6:16 pm

OK, you want to match device by ID.  Like:

        fxp* macaddr xx:xx:xx:xx:xx:xx

That might make sense.

What doesn't make sense there is to *fix* the device unit number.  Device
unit numbers will no longer be used after devfs, because we look up

"A library function" is unacceptable to me.  This is a substantial
part of the design of the device(9) API.  This should be a *primitive*.

The driver probes, configures, and extracts properties from the real
device.  Just before leaving attach(), it *puts* its ID in a
well-known place so that device(9) can look up these IDs later.
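A userland sketch of that primitive, assuming IDs are opaque strings such as a MAC address or a volume GUID. The registry and its functions (`idreg_publish`, `idreg_lookup`) are hypothetical and are not part of device(9):

```c
#include <stddef.h>
#include <string.h>

#define REG_MAX 32

/* Hypothetical ID registry: a driver publishes (id, device name) just
 * before its attach routine returns; lookups need no driver knowledge. */
struct id_entry {
	char id[64];
	char devname[16];
};

static struct id_entry registry[REG_MAX];
static size_t nregistered;

/* Record that device `devname` carries identifier `id`.
 * Returns 0 on success, -1 if the table is full or a string is too long. */
int
idreg_publish(const char *devname, const char *id)
{
	struct id_entry *e;

	if (nregistered == REG_MAX ||
	    strlen(id) >= sizeof(e->id) ||
	    strlen(devname) >= sizeof(e->devname))
		return -1;
	e = &registry[nregistered++];
	strcpy(e->id, id);
	strcpy(e->devname, devname);
	return 0;
}

/* Return the device name published for `id` (first match), or NULL. */
const char *
idreg_lookup(const char *id)
{
	for (size_t i = 0; i < nregistered; i++)
		if (strcmp(registry[i].id, id) == 0)
			return registry[i].devname;
	return NULL;
}
```

The lookup side never needs driver-specific knowledge: each driver does the extraction once, at attach time, and everything else only compares strings.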

Anything more than this is unacceptable to me.  autoconf(9) is already
too complex.  I was *hugely* frustrated trying to understand it; that's why

In what sense?

As I explained in the first post, a pseudo device has a strict definition:
it has no parent in terms of physical topology.  It may have parents
in terms of components.  I've very carefully investigated those, and I
strictly differentiate them.  Please re-read the first post in this


I wonder if we can assume serial numbers are unique.

And again, device unit is no more.

Masao
From: Masao Uebayashi
Date: Wednesday, March 10, 2010 - 6:25 pm

Remember the cost of fixing drivers to extract IDs in match().  Now I see
no value in doing this.

Masao
From: Iain Hibbert
Date: Thursday, March 11, 2010 - 2:56 am

If a device has no parent, just attach it at root (similar to mainbus*),
with parent == NULL, or even pseudo* at root, and pseudo-dev* at pseudo?

It is a frustration when building a 'software' device that there are some
differences between the methodology of configuration, and it is not
possible to pass configuration arguments from userland into the device
attach routine..

I think the "pseudo-device" abstraction is unnecessary.

iain


From: Masao Uebayashi
Date: Thursday, March 11, 2010 - 6:40 am

Could you show one (or more) real example(s) / scenario(s)?  That would
help to understand problems & clarify requirements...

Masao
From: Iain Hibbert
Date: Thursday, March 11, 2010 - 8:33 am

Well, a line discipline which takes serial IO and converts it into a soft
device which interacts with the rest of the system. As a concrete example,
dev/bluetooth/btuart.c does that for a bluetooth device. The open routine
is called from the TIOCSLINED ioctl code and does:

	cfdata = malloc(sizeof(struct cfdata), M_DEVBUF, M_WAITOK);
	for (unit = 0; unit < btuart_cd.cd_ndevs; unit++)
		if (device_lookup(&btuart_cd, unit) == NULL)
			break;

	cfdata->cf_name = btuart_cd.cd_name;
	cfdata->cf_atname = btuart_cd.cd_name;
	cfdata->cf_unit = unit;
	cfdata->cf_fstate = FSTATE_STAR;

	dev = config_attach_pseudo(cfdata);
	if (dev == NULL) {
		free(cfdata, M_DEVBUF);
		splx(s);
		return EIO;
	}

	sc = device_private(dev);
	sc->sc_tp = tp;

Here, we must find the device softc and insert some information after
attach has finished (tp == tty pointer) because there is no way to pass
that to the btuart_attach() routine.

There was a thread recently regarding extending this driver,

  http://archive.netbsd.se/?ml=netbsd-tech-kern&a=2010-01&t=12251898

with Kiyohara wishing to pass some additional configuration that would be
possible with eg

	dev = config_found(NULL, &arg, ...);

and would also solve the (very slight) race condition, and moreover it
would not require malloc of cfdata.
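A userland model of the difference, using hypothetical types that only mimic the shape of autoconf(9): when the tty pointer travels in the opaque aux argument, the attach routine fills in the softc itself, so there is no window in which the device exists without it.

```c
#include <stddef.h>

/* Userland model only: these types are hypothetical, not the real
 * autoconf(9) or btuart(4) structures. */
struct tty;			/* opaque, as in the kernel */

struct model_softc {
	struct tty *sc_tp;
	int sc_attached;
};

/* What the caller would pass through the opaque `aux` pointer. */
struct model_attach_args {
	struct tty *maa_tp;
};

/* The attach routine receives the tty pointer via aux, so the softc is
 * complete before attach returns. */
static void
model_attach(struct model_softc *sc, void *aux)
{
	const struct model_attach_args *maa = aux;

	sc->sc_tp = maa->maa_tp;
	sc->sc_attached = 1;
}

/* Stand-in for a config_found()-style call from the line discipline's
 * open routine: no cfdata allocation, no post-attach softc poking. */
void
model_config_found(struct model_softc *sc, struct tty *tp)
{
	struct model_attach_args maa = { .maa_tp = tp };

	model_attach(sc, &maa);
}
```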

The "parent" argument is mostly unused by autoconfig anyway..

iain
From: Quentin Garnier
Date: Thursday, March 11, 2010 - 11:47 am

On Thu, Mar 11, 2010 at 03:33:27PM +0000, Iain Hibbert wrote:

You can use a static cfdata_t, you know.  The malloc'ing was a mistake
made initially and carried over since.

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"See the look on my face from staying too long in one place
[...] every time the morning breaks I know I'm closer to falling"
KT Tunstall, Saving My Face, Drastic Fantastic, 2007.
From: David Holland
Date: Thursday, March 11, 2010 - 5:16 pm

On Thu, Mar 11, 2010 at 03:33:27PM +0000, Iain Hibbert wrote:
 > > Could you show one (or more) real example(s) / scenario(s)?  That would
 > > help to understand problems & clarify requirements...
 > 
 > Well, a line discipline which takes serial IO and converts it into a soft
 > device which interacts with the rest of the system.

Line disciplines are a bad example, because they're a prehistoric kind
of hacked-up bus attachment and as such ought to be rototilled out of
existence.

-- 
David A. Holland
dholland@netbsd.org
From: Iain Hibbert
Date: Friday, March 12, 2010 - 2:00 am

Well, line discipline is a solution to a problem, which is that we want a
'device' in the kernel but the device is not directly accessible and
communicates to us through a serial protocol.

You can say it's a bad idea all you like, but unless you suggest an
alternative solution that doesn't help to remove it.

One alternative is to move the translator out of the kernel, e.g. instead of
using pppd(8), which needs complicated hooks, import userland ppp(8) as
per FreeBSD, which IIRC provides a tap(4) interface. The argument against
that is probably not as strong as it once was as even embedded devices
these days can be several orders of magnitude faster than the computers
that were prevalent when pppd(8) was written. But then, data rates have
improved also - pppd(8) runs on my uhso(4) dongle at up to 180KiB/s and I
expect there would still be objections to removing it.

Any other solutions you would like to propose?

iain


From: David Holland
Date: Friday, March 12, 2010 - 11:23 pm

On Fri, Mar 12, 2010 at 09:00:11AM +0000, Iain Hibbert wrote:
 >>>> Could you show one (or more) real example(s) / scenario(s)?  That would
 >>>> help to understand problems & clarify requirements...
 >>>
 >>> Well, a line discipline which takes serial IO and converts it into a soft
 >>> device which interacts with the rest of the system.
 >>
 >> Line disciplines are a bad example, because they're a prehistoric kind
 >> of hacked-up bus attachment and as such ought to be rototilled out of
 >> existence.
 > 
 > Well, line discipline is a solution to a problem, which is that we want a
 > 'device' in the kernel but the device is not directly accessible and
 > communicates to us through a serial protocol.
 > 
 > You can say it's a bad idea all you like, but unless you suggest an
 > alternative solution that doesn't help to remove it.

I did; bus attachments.

That is, instead of just having "com* at pci*" or whatever and all the
tty stuff being a legacy blob layer you'd do something like this:	

   attach com at pci with ...
   attach sl at com with ...
   attach ppp at com with ...
   attach tty at com with ...

and then connect things up on the fly at runtime using whatever
suitable device control tools.

This is not necessarily that different from line disciplines in
practice (maybe, maybe not), but it's a lot cleaner structurally and
it allows this stuff to share common infrastructure with the rest of
the device tree. Whatever that infrastructure might be in the long
run.
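A toy model of that structure, assuming attachment rules are just (child, parent) name pairs taken from the lines above and that a com port carries at most one upper attachment at a time; all names are hypothetical. Swapping the upper attachment at runtime plays the role that setting a line discipline plays today:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attachment table mirroring "attach X at Y" lines. */
struct attach_rule {
	const char *child;
	const char *parent;
};

static const struct attach_rule rules[] = {
	{ "com", "pci" },
	{ "sl",  "com" },
	{ "ppp", "com" },
	{ "tty", "com" },
};

/* True if some rule lets `child` attach at `parent`. */
bool
can_attach(const char *child, const char *parent)
{
	for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
		if (strcmp(rules[i].child, child) == 0 &&
		    strcmp(rules[i].parent, parent) == 0)
			return true;
	return false;
}

/* A com instance with a single, runtime-switchable upper attachment. */
struct com_port {
	const char *upper;
};

/* Reconnect the port to a different upper driver, if a rule allows it;
 * on failure the current attachment is left unchanged. */
bool
com_connect(struct com_port *cp, const char *child)
{
	if (!can_attach(child, "com"))
		return false;
	cp->upper = child;
	return true;
}
```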

-- 
David A. Holland
dholland@netbsd.org
From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:33 pm

If you pay a little more respect to engineers, you'll find this is
almost the same as what Iain is saying and what I wrote in the first mail.

Masao
From: David Holland
Date: Sunday, March 14, 2010 - 2:41 am

On Sun, Mar 14, 2010 at 03:33:19PM +0900, Masao Uebayashi wrote:
 > > I did; bus attachments.
 > 
 > If you pay a little more respect to engineers, you'll find this is
 > almost same as Iain's saying and what I wrote in the first mail.

huh? he asked me what I meant, I said what I meant...

-- 
David A. Holland
dholland@netbsd.org
From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:11 pm

Although I have no knowledge of tty/line disciplines and no time to learn them at
the moment, I fully support fixing those *now*.

You need struct device.  You understand how data/control flow.  I
think it's perfectly reasonable to make it a device as a "function".

Masao
From: Matthias Drochner
Date: Thursday, March 11, 2010 - 1:45 pm

you would set this in stone.
It would be easy to extend the current attach_pseudo() function by
an "attach args" argument. (An interface attribute should be passed
too for consistency because this is used as a qualifier for the
opaque "attach args" elsewhere.)
I think it hasn't been done just because no one needed it.

For the future, I'd think that the currently unstructured "attach args"
needs to be split into 3 parts:
1. Information about the device type, what is needed by child driver's
   "match" function to select the right driver. This is qualified
   by the interface attribute.
2. If the hardware supports it, information about the individual instance,
   such as the Ethernet HW address, or the UUID for disk partitions, to allow drivers
   to recognize a device after temporary disconnection. This is qualified
   by the child device type (which can have multiple attachments).
3. everything else: handles, cookies, whatever needed for parent-child
   communication
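One way the three-part split might look as C types. Every struct and field name here is hypothetical, part (1) is shown as plain fields where the proposal suggests a proplist, and the vendor/product IDs are made up:

```c
#include <stddef.h>
#include <stdint.h>

/* (1) What the child's match function needs; qualified by the
 * interface attribute. The proposal would make this a proplist. */
struct aa_match {
	const char *ifattr;	/* e.g. "pci" */
	uint32_t vendor;
	uint32_t product;
};

/* (2) Optional per-instance identity, so a driver can recognize the
 * same device after a temporary disconnection. */
struct aa_ident {
	int has_macaddr;
	uint8_t macaddr[6];
	const char *uuid;	/* e.g. a partition UUID, or NULL */
};

/* (3) Everything else: handles and cookies for parent-child traffic. */
struct aa_split {
	struct aa_match match;
	struct aa_ident ident;
	void *parent_cookie;
};

#define EXAMPLE_VENDOR	0x1234	/* made-up IDs for illustration */
#define EXAMPLE_PRODUCT	0x5678

/* A match routine that looks only at part (1). */
int
example_match(const struct aa_split *aa)
{
	return aa->match.vendor == EXAMPLE_VENDOR &&
	    aa->match.product == EXAMPLE_PRODUCT;
}
```

Keeping part (1) separate is what would let it be handed to drvctl for on-demand module loading without exposing the parent-specific cookies in part (3).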
For (1), it would make sense to make it a proplist, and pass it to
drvctl, along with locator information, to support on-demand loading

I think it makes sense, because it allows limiting the use of the
interface-attribute-less "root" to a minimum. There is a reason
that many/most ports use just one "mainbus" at root.

best regards
Matthias




------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzende des Aufsichtsrats: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. ...
From: der Mouse
Date: Thursday, March 11, 2010 - 2:00 pm

That doesn't make the pseudo-device abstraction necessary.  I've
sometimes wondered why pseudo-devices weren't handled by creating a
pseudo-bus for them to attach at; I did that once, and it was no more
than an afternoon's work to have pseudo0 attach at mainbus0 and then my
device attach at pseudo0.  (I did this because I needed a struct
device, but there's no reason it couldn't have been done instead of
creating pseudo-devices as we know them today.)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
From: Matthias Drochner
Date: Thursday, March 11, 2010 - 2:24 pm

If we are taking the "interface attribute" abstraction seriously, this
doesn't work easily: A device doesn't attach at a device but at
an interface attribute (which is usually provided by a device but
can come out of the blue in the pseudo-device case).
The "mainbus" and the interfaces provided by it are platform

That's what attach_pseudo does... I think it makes sense to
have some way to create a device hierarchy without pretending
that it is connected to the platform's physical main bus.
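To illustrate the distinction, here is a toy model in which a child records the interface attribute it attached at rather than a parent device, so an attribute may have no providing device at all (the pseudo-device case). The types and the "pseudo" attribute name are made up for this sketch; this is not how autoconf(9) stores the relationship internally.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* An interface attribute names the API a child attaches to. */
struct toy_iattr {
	const char		*name;
	struct toy_device	*provider;	/* NULL: provided by no device */
};

struct toy_device {
	const char		*xname;
	struct toy_iattr	*attached_at;	/* an attribute, not a device */
};

/*
 * A child's parent is derived from the attribute it attached at; for an
 * attribute that "comes out of the blue" there is no parent device.
 */
static struct toy_device *
toy_parent(const struct toy_device *dev)
{
	assert(dev->attached_at != NULL);
	return dev->attached_at->provider;
}
```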

best regards
Matthias



From: der Mouse
Date: Thursday, March 11, 2010 - 8:24 pm

I probably could have done it that way, but it looked easier to do it
as a pseudo-bus, and for the purposes at the time, the method didn't

mainbus0 does not necessarily correspond to any physical bus.  There is
other precedent for things in the autoconf tree that do not correspond
to anything physical, such as wsdisplay.  (Or, more precisely,
corresponds to the same physical thing as something else.)

From: Matthias Drochner
Date: Friday, March 12, 2010 - 1:20 pm

Well, not physical in a sense that you can look at the wires, but
commonly it has a number of interface attributes which match the
capabilities of the host bridge and some more platform

Yes, devices in the autoconf tree are connected by interfaces (named
by interface attributes) which are primarily APIs. Sometimes they
correspond to some physical bus semantics, sometimes it is just
a software abstraction.

Anyway - the interface attributes of "mainbus" are fixed by the
platform which makes it a bad choice to attach random MI things at.

best regards
Matthias



From: Masao Uebayashi
Date: Tuesday, March 9, 2010 - 9:21 am

The question is: why do you need a device unit number when we can look up
struct device directly?  cfdriver_t -> device_t is a back reference.  The only
case I can think of where it makes sense is device detachment, but that's
handled by reference counting.
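A rough, single-threaded sketch of the reference-counting argument: a holder keeps a pointer rather than a unit number, detach only marks the device gone, and the memory is released when the last reference is dropped. struct rdevice and its functions are hypothetical; a kernel version would need atomic operations or locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device record: users hold pointers, not unit numbers. */
struct rdevice {
	int refcnt;	/* references held by users */
	int detached;	/* set once the hardware has gone away */
};

/* Take a reference; fails once the device has been detached. */
static struct rdevice *
rdevice_acquire(struct rdevice *dv)
{
	if (dv->detached)
		return NULL;
	dv->refcnt++;
	return dv;
}

/* Drop a reference; returns 1 if this freed the device. */
static int
rdevice_release(struct rdevice *dv)
{
	assert(dv->refcnt > 0);
	if (--dv->refcnt == 0 && dv->detached) {
		free(dv);
		return 1;
	}
	return 0;
}

/* Detach: refuse new references; memory lives until the last release. */
static int
rdevice_detach(struct rdevice *dv)
{
	dv->detached = 1;
	if (dv->refcnt == 0) {
		free(dv);
		return 1;
	}
	return 0;
}
```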

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
From: Jochen Kunz
Date: Monday, March 8, 2010 - 12:50 am

On Mon, 08 Mar 2010 01:56:28 +0100
The problem is: over twenty years ago, a hardware reconfiguration was an
infrequent and intrusive task. It required a power-down and probably
rewiring the NPR grant chain on your UniBus backplane with a wire-wrap
tool... System configuration was static in those good old days, when
Unix machines were administered by a professional sysadmin and cost a
fortune.

Today we have all sorts of hot-plug devices: SCSI, SAS, Fibre Channel,
(e)SATA, USB, FireWire, PC Card, ExpressCard, hot-pluggable PCI (Express),
Bluetooth, ... System configuration is very dynamic today, and every
user is their own root. We need a better way to deal with this.

Linux had a devfs and dropped it. Now it has udevd(8). Most likely the
penguins had a reason for this. udevd(8) gives userland control
over device enumeration. Maybe no bad idea. (Disclaimer: I don't like
Linux.)

BTW: OSF/1 aka DEC-Unix aka Tru64-Unix did something like Linux +
udevd(8) over 10 years ago.
-- 


tschüß,
       Jochen

Homepage: http://www.unixag-kl.fh-kl.de/~jkunz/

From: Masao Uebayashi
Date: Monday, March 8, 2010 - 1:18 am

I took a quick glance at the OpenSolaris/FreeBSD devfs man pages, and quickly
stopped.  They're all overly complicated.

Those who complain about exposing redundant device paths should look at other
implementations.  I don't really want to have an /etc/devfsd.conf and bikeshed
its format.

Exposing device tree works, because NetBSD's hardware device abstraction is

Good point.  We're hopelessly behind.

Masao

From: der Mouse
Date: Monday, March 8, 2010 - 8:54 am

Surely there are mailing list messages or something that outline that
reason?  (Not that I have any idea where they'd be, but don't we have

I think it probably is no bad idea.  I don't like Linux either, but I
don't think it's so irremediably disastrous that there's nothing at all

Another reason to think that it's likely worth trying.

From: Jukka Ruohonen
Date: Monday, March 8, 2010 - 9:21 am

It is more like:

Linux had a devfs and [dropped] it. Now it has udevd(8). Most likely the
penguins had a reason for this. Linux had udevd(8) and reintroduced devfs.
Now it has udevd(8) and some kind of devfs. Most likely the penguins had a
reason for this.

- Jukka.

http://lwn.net/Articles/331818/
From: Thor Lancelot Simon
Date: Monday, March 8, 2010 - 9:59 am

So did SGI, and it was a disaster.  If you're going to break the common
Unix idiom (single directory full of nodes for devices) you'd better be
prepared to replace it with something that's very, very easy and
intuitive for experienced Unix administrators to learn.

From: Dan Engholm
Date: Monday, March 8, 2010 - 9:52 am

This page has links to various bits about the move from devfs to udev.

http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html

--Dan
From: Greg A. Woods
Date: Sunday, March 7, 2010 - 8:06 pm

At Sun, 7 Mar 2010 20:50:03 +0000, Quentin Garnier <cube@cubidou.net> wrote:

Indeed.  This needs carving in stone somewhere, since folks seem to

Indeed.

-- 
						Greg A. Woods
						Planix, Inc.

<woods@planix.com>       +1 416 218 0099        http://www.planix.com/
From: Eric Haszlakiewicz
Date: Sunday, March 7, 2010 - 9:31 pm

Careful here:  does "end user" mean grandma clicking through KDE, or an admin
figuring out why one of his RAID component disks disappeared?  More precise

I thought this was *one* of the things that devfs was supposed to do.  The
two features being:
   1) provide a way to see detailed information about how devices are laid out
and, the one relevant for this discussion
   2) provide stable names for devices that don't change if they happen to be
laid out differently today vs. yesterday.

with #2 possibly provided by a userland script that parses the structure
provided by #1, plus whatever additional information it needs, and creates
symlinks (or otherwise causes device nodes to appear in the right paths).
#2 == simplicity, #1 == transparency and (low level) control

Perhaps I'm muddling the base feature requirements with various ideas for 
implementations?

eric
From: Quentin Garnier
Date: Sunday, March 7, 2010 - 10:54 pm

Well, even if the admin gives proper names to all the disks and such,
yes, there is one moment when they'll be faced with details of the
device tree.  But to me it's a brief part of the setup, not really of
the use.

The click-through user has their administrative tasks proxied by things
like hald(8), but I expect said proxy to see as few gory details as a
human admin would.


There's nothing complicated about what devfs is for: having all the
relevant device nodes in /dev, and only those.  Anything else is an
implementation detail.  Neither #1 nor #2 is part of the immediate goal
of devfs.  I don't see why #1 should be a part of a devfs
implementation, and #2 is certainly something nice to have, but it goes
beyond the initial intent of a devfs.

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"See the look on my face from staying too long in one place
[...] every time the morning breaks I know I'm closer to falling"
KT Tunstall, Saving My Face, Drastic Fantastic, 2007.
From: Masao Uebayashi
Date: Monday, March 8, 2010 - 12:39 am

> What kind of user do you talk about here?

This is a good question.

When I speak on netbsd lists, I have the following in mind:

a) Desktop users

They use NetBSD for browsing www, reading/writing mails, playing videos,
doing math, studying, text processing, etc.  They use a variety of machines,
typically notebooks/netbooks and commonly seen x86 platforms.

b) Server users

They use NetBSD for mission critical, high-performance servers like high-load
network servers, I/O servers, etc.  They use multi-CPU high-performance machines
with multiple network / disk interfaces.

c) Embedded users

They use NetBSD in products like routers, printers, cellphones, various
home electronics, robots, factory machines, cars, trains, ships, airplanes,
submarines, spaceships.  They want small and reliable systems.

d) Hobbyists

They run NetBSD on old machines like VAX, Alpha, m68k, SH, PowerPC and MIPS,
and on some off-the-shelf NAS boxes or routers.  Most of these machines are
slower than the others.  They like to hack on NetBSD source code.

*

I'm a little biased toward c), because I'm one myself, but I think all of these


There are some devfs implementations around, and AFAIK there's no standard, right?
I came up with my design by myself, so my devfs follows only its own design.

The overall intent is to concentrate the information into the device tree,
where we can identify *all* the instances of devices, including ones that
don't have any IDs like GUID or MAC address.  Each node exactly matches
device::dv_xname.

I don't want to make this more complex, like giving drivers the freedom to
decide what their nodes look like.  That leads to lots of code added around
drivers (xxx_register / xxx_deregister), as mjf's proposal did.  My devfs
doesn't do that, for consistency and simplicity of the implementation.

As pointed out by all of you, the device tree of my devfs can't look up devices by ID.
That could be easily realized ...
From: Masao Uebayashi
Date: Monday, March 8, 2010 - 1:55 am

Lookup-by-ID should be done with symlinks, like:

	/dev/id/guid/25892e17-80f6-415f-9c65-7395632f0223
	-> /dev/mainbus0/.../wd0/disk0/gpt0

	/dev/id/ieee802mac/00-b0-d0-86-bb-f7
	-> /dev/mainbus0/.../bge0

	/dev/id/ipv4/172.16.0.1
	-> /dev/mainbus0/.../bge0/ether0/net0
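A hypothetical sketch of how the /dev/id/<kind>/<value> layout above could be generated and resolved, modeled as a plain table of link/target pairs instead of real symlinks. The helper names are invented for illustration, and the "..." in the targets is kept exactly as elided in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One /dev/id "symlink": link name and the tree path it points at. */
struct id_link {
	const char *link;
	const char *target;
};

/* Build "/dev/id/<kind>/<value>"; returns 1 on success, 0 if truncated. */
static int
id_link_path(char *buf, size_t len, const char *kind, const char *value)
{
	int n = snprintf(buf, len, "/dev/id/%s/%s", kind, value);

	return n >= 0 && (size_t)n < len;
}

/* Follow an ID link if there is one; otherwise return the path unchanged. */
static const char *
id_resolve(const struct id_link *tab, size_t n, const char *path)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].link, path) == 0)
			return tab[i].target;
	return path;
}
```

Because the links are derived from IDs (GUID, MAC address), they stay stable while the tree paths underneath them may change.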

Masao

From: David Holland
Date: Monday, March 8, 2010 - 1:16 am

At the risk of being a wet blanket:

On Sun, Mar 07, 2010 at 06:43:49PM +0900, Masao Uebayashi wrote:
 > I've been spending LOTS of time to investigate various devicess sources, to
 > understand some questions I've had, like:
 > 
 > - Why NetBSD/arm has no bus_space_mmap(4)?

hellifIknow;

 > - Why tty locking is messy?

because ttys are messy, which is because they haven't had a big
rototill in some twenty years or more;

 > - Why sys/dev/wscons has so many #ifdef's?  (Modular unfriendly!)

dunno;

 > - How dk(4) is enumerated?

in the order found;

 > a) Device enumeration is unstable / unpredictable
 > 
 > dk(4) is a pseudo device, and its instances are numbered in the order it's
 > created.  This is fine when you manually / explicitly add wedges(4) by using
 > "dkctl addwedge".  This is not fine, if I have a gpt(4) disk label which has
 > ordered partitions.  I expect disks to be created in the order I write in
 > the gpt(4) disk label.  It's annoying the numbering changes when I add a new
 > disk.  Same for raidframe(4).

Why doesn't gpt(4) create the wedges in that order? If it did that
they'd come out numbered the way you'd expect.
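One way to read the suggestion above, as a sketch: sort the partitions found in the GPT by their table index before assigning dk(4) unit numbers. struct gpt_part and create_wedges_in_order() are hypothetical, not the actual dk(4)/gpt code. The first_unit parameter also shows why units can still shift when another disk's wedges happen to be created first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one partition found in a GPT label. */
struct gpt_part {
	int		 index;		/* position in the GPT partition table */
	const char	*label;
	int		 dk_unit;	/* assigned dk(4) unit, -1 if none yet */
};

static int
by_index(const void *a, const void *b)
{
	const struct gpt_part *pa = a, *pb = b;

	return (pa->index > pb->index) - (pa->index < pb->index);
}

/* Assign units in label order; returns the next free unit number. */
static int
create_wedges_in_order(struct gpt_part *parts, size_t n, int first_unit)
{
	assert(parts != NULL || n == 0);
	qsort(parts, n, sizeof(*parts), by_index);
	for (size_t i = 0; i < n; i++)
		parts[i].dk_unit = first_unit + (int)i;
	return first_unit + (int)n;
}
```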

Having the numbering change when you add a new disk is unavoidable.
See further notes below.

 > b) Consistent device topology management is missing
 > 
 > The reason why NetBSD/arm has no bus_space_mmap(9) has turned out to be the
 > fact that we have no consistent (MI) way to manage physical address space of
 > devices.  NetBSD/mips has a working bus_space_mmap(9) in
 > sys/arch/mips/mips/bus_space_alignstride_chipdep.c.  It defines address
 > windows and manage it by itself.
 > 
 > Who wants to reimplement it on all cpus/ports/platforms?
 > Considering physical address space is a pretty much simple concept
 > - a single linear address space.

Except when it isn't really; consider for example NUMA systems. I
think there have also been systems where different CPUs see a
different physical address space view. Whether any ...
From: Masao Uebayashi
Date: Monday, March 8, 2010 - 6:06 am

In my devfs, devices are enumerated relative to their local connection (their
parent), not globally:

	/dev/mainbus0/.../piixide0/ata0/wd0
	/dev/mainbus0/.../piixide0/ata1/wd0
	/dev/mainbus0/.../piixide1/ata0/wd0
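A small model of per-parent enumeration: a unit number is counted only among siblings with the same base name, and a node's path is built by walking up to the root. struct node, local_unit() and node_path() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct node {
	const char	*base;		/* driver base name, e.g. "wd" */
	int		 unit;		/* unit among same-named siblings */
	struct node	*parent;	/* NULL for the root */
};

/* Number a new child by counting only its existing same-named siblings. */
static int
local_unit(struct node *const *siblings, size_t n, const char *base)
{
	int unit = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(siblings[i]->base, base) == 0)
			unit++;
	return unit;
}

/* Build "/dev/<root>/.../<node>" by emitting the ancestors first. */
static void
node_path(const struct node *nd, char *buf, size_t len)
{
	char comp[32];

	assert(len > 0);
	if (nd->parent == NULL)
		snprintf(buf, len, "/dev");
	else
		node_path(nd->parent, buf, len);
	snprintf(comp, sizeof(comp), "/%s%d", nd->base, nd->unit);
	strncat(buf, comp, len - strlen(buf) - 1);
}
```

For example, with ata0 and ata1 both under piixide0 (intermediate buses omitted for brevity), the disks come out as /dev/mainbus0/piixide0/ata0/wd0 and /dev/mainbus0/piixide0/ata1/wd0, each wd numbered 0 under its own parent.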


See the original post.  I showed the reversed pseudobus, which I've found
very powerful.  More examples:

	/dev/mainbus0/.../screen0/vt100emul0
	/dev/mainbus0/.../screen0/wsmuxout0
	/dev/mainbus0/.../kbd0/wsmuxin0
	/dev/mainbus0/.../mouse0/wsmuxin0
	/dev/pseudobus0/wsmux0/wsmuxout0 -> /dev/mainbus0/.../screen0/wsmuxout0
	/dev/pseudobus0/wsmux0/wsmuxin0 -> /dev/mainbus0/.../kbd0/wsmuxin0
	/dev/pseudobus0/wsmux0/wsmuxin1 -> /dev/mainbus0/.../mouse0/wsmuxin0

Where screen0 has two children, vt100emul0 and wsmuxout0.  wsmuxout0 *joins*
wsmux0.  kbd0's child wsmuxin0 joins wsmux0 too.  When kbd0 receives a
character, it delivers it to wsmuxin0, which in turn delivers it to wsmux0,
which in turn delivers it to wsmuxout0, then finally screen0.  screen0 sends
the received character to its child vt100emul0.
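A toy model of the data flow just described: one mux, input endpoints that join it, and outputs that receive every character delivered to it. struct mux, struct mux_in, struct mux_out and the functions are hypothetical stand-ins for wsmux/wsmuxin/wsmuxout; the real wscons code is structured differently.

```c
#include <assert.h>
#include <stddef.h>

#define MUX_MAX	4

/* Output side: stands in for screen0 handing characters to vt100emul0. */
struct mux_out {
	char	buf[16];
	size_t	len;
};

struct mux {
	struct mux_out	*out[MUX_MAX];	/* joined wsmuxout-like endpoints */
	size_t		 nout;
};

/* Input side: a wsmuxin-like endpoint that has joined one mux. */
struct mux_in {
	struct mux *joined;
};

static void
mux_join_out(struct mux *m, struct mux_out *o)
{
	assert(m->nout < MUX_MAX);
	m->out[m->nout++] = o;
}

static void
out_put(struct mux_out *o, char c)
{
	if (o->len + 1 < sizeof(o->buf)) {
		o->buf[o->len++] = c;
		o->buf[o->len] = '\0';
	}
}

/* kbd0 -> wsmuxin0 -> wsmux0 -> every joined output -> screen. */
static void
mux_in_deliver(struct mux_in *in, char c)
{
	if (in->joined == NULL)
		return;
	for (size_t i = 0; i < in->joined->nout; i++)
		out_put(in->joined->out[i], c);
}
```

Under this model, multi-head is just a second mux: a keyboard endpoint joins whichever mux drives the screen it should type into.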

Now what multi-head support looks like is pretty straightforward.

bridge(4) + tap(4) + some ether(4) would look exactly the same.

Masao

From: David Young
Date: Friday, March 12, 2010 - 2:04 pm

No device unit number?  You've lost me.  Please fill in the blank,


Actually, that sounds great to me.  Then we can, as you suggested at
the top of this thread, create the ether(4) pseudo-device that is
analogous to audio(4).  Let us attach a particular ether(4) instance to
an ethernet h/w instance according to the h/w's properties.

Take fxp(4) for an example.  Rename it fxphw(4).  Let it attach an
ether(4) instance at its ether attribute, using an optional 'basename'
attach argument of 'fxp', so that the ether(4) instance knows that it
should take its customary name, fxp0 (or whatever).
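A sketch of the 'basename' idea: the ether(4)-like instance asks for a customary name and gets the next unit for that base name. struct name_registry and ether_take_name() are hypothetical; only the fxp0-style naming comes from the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMES_MAX	8

/* Hypothetical registry of names already handed out. */
struct name_registry {
	char	names[NAMES_MAX][16];
	size_t	n;
};

/*
 * Give a new ether(4)-like instance its customary name "<basename><unit>",
 * where unit counts the earlier instances with the same basename.
 */
static const char *
ether_take_name(struct name_registry *r, const char *basename)
{
	size_t blen = strlen(basename);
	int unit = 0;

	assert(r->n < NAMES_MAX);
	for (size_t i = 0; i < r->n; i++) {
		const char *nm = r->names[i];

		if (strncmp(nm, basename, blen) == 0 &&
		    nm[blen] >= '0' && nm[blen] <= '9')
			unit++;
	}
	snprintf(r->names[r->n], sizeof(r->names[r->n]), "%s%d", basename, unit);
	return r->names[r->n++];
}
```

With this scheme, attaching fxp, then wm, then fxp again yields fxp0, wm0 and fxp1; units freed by a detach are not reused in this simplification.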

I think that an added benefit of breaking things down this way is that
we can attach >1 ether(4) to a single h/w instance, which makes sense
with those NICs that support >1 unicast address.  Maybe we can attach
vlan(4) to the h/w backend's ether attribute, too.

Another added benefit of breaking things down this way is that we may be
able to get rid of the problematic "network" class in PMF.

You seemed to have in mind attaching at fxp0 an ether0, and attaching at
ether0 a net0.  What is net0's role?

BTW, I have previously considered that splitting the WLAN drivers into
hardware backends and pseudo-device frontends makes a lot of sense when
you consider the possibility of operating more than one 802.11 station on
a single hardware adapter.  It would probably look something like this:

rtwhw0
|
+---net80211arb0
    |
    +---rtw0
    |
    +---rtw1

rtw0 and rtw1 are instances of net80211 state machines.  They command
net80211arb0 to send packets and to pass back received packets meeting
certain criteria ("received on channel y, BSSID x, destinations {p, q,
r}"). net80211arb0 will arbitrate access to the hardware. rtwhw0 will

We cannot.
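A rough model of the receive-side arbitration described above for net80211arb0: each station (rtw0, rtw1) registers a channel, a BSSID and a set of destination addresses, and the arbiter hands a received frame to every station whose criteria match. All types and functions are hypothetical; broadcast/multicast handling and the transmit side are left out.

```c
#include <assert.h>
#include <string.h>

#define MAC_LEN	6
#define MAX_DST	4

struct frame {
	int		channel;
	unsigned char	bssid[MAC_LEN];
	unsigned char	dst[MAC_LEN];
};

/* One net80211 state machine instance and its receive criteria. */
struct station {
	const char	*name;
	int		 channel;
	unsigned char	 bssid[MAC_LEN];
	unsigned char	 dsts[MAX_DST][MAC_LEN];
	int		 ndst;
	int		 rx_count;	/* frames handed to this station */
};

static int
station_wants(const struct station *st, const struct frame *f)
{
	if (st->channel != f->channel ||
	    memcmp(st->bssid, f->bssid, MAC_LEN) != 0)
		return 0;
	for (int i = 0; i < st->ndst; i++)
		if (memcmp(st->dsts[i], f->dst, MAC_LEN) == 0)
			return 1;
	return 0;
}

/* Deliver a received frame to every matching station; return the count. */
static int
arb_input(struct station *sts, int n, const struct frame *f)
{
	int hits = 0;

	for (int i = 0; i < n; i++)
		if (station_wants(&sts[i], f)) {
			sts[i].rx_count++;
			hits++;
		}
	return hits;
}
```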

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933
From: Matthias Drochner
Date: Friday, March 12, 2010 - 4:37 pm

This looks somewhat shortsighted. "ethernet" is no interface in a
technical sense anymore. It is just perhaps a tag put at protocols
which use 48-bit MAC addresses and can be bridged to other protocols
of that kind. But then, where do you draw the line? FDDI can be bridged
to ethernet as well, so would you call it "ether"?

Besides that, I don't see how this could solve any real-world problem
which a trivial shell script can't deal with.

best regards
Matthias



From: David Young
Date: Friday, March 12, 2010 - 5:35 pm

It looks to me like the FDDI frame format differs from ethernet's;
however, both frames carry 48-bit source and destination addresses.
The code in fddi_input() and in fddi_output() resembles the code in
ether_input() and in ether_output(); perhaps we can extract some of the
common code into ieee802like_input() and _output() for re-use?  Maybe
fxp should have an ieee802like interface for the bridge to use?  Is that
what you have in mind?  If not, can you please be more specific about
your concerns?
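A sketch of the split suggested here, assuming only the address handling is shared: a link-specific parser extracts the 48-bit destination and source, and a common routine does the address filtering. ieee802like_input() is the name floated in the message, not an existing function; the header layouts are simplified (Ethernet: dst, src, type; FDDI: frame control byte, dst, src) and LLC/SNAP decapsulation is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN	6

struct ieee802like_hdr {
	unsigned char dst[ADDR_LEN];
	unsigned char src[ADDR_LEN];
};

/* Link-specific part: extract addresses; returns header length or -1. */
typedef int (*hdr_parse_fn)(const unsigned char *, size_t,
    struct ieee802like_hdr *);

static int
ether_parse(const unsigned char *p, size_t len, struct ieee802like_hdr *h)
{
	if (len < 14)			/* dst(6) src(6) type(2) */
		return -1;
	memcpy(h->dst, p, ADDR_LEN);
	memcpy(h->src, p + ADDR_LEN, ADDR_LEN);
	return 14;
}

static int
fddi_parse(const unsigned char *p, size_t len, struct ieee802like_hdr *h)
{
	if (len < 13)			/* fc(1) dst(6) src(6) */
		return -1;
	memcpy(h->dst, p + 1, ADDR_LEN);
	memcpy(h->src, p + 1 + ADDR_LEN, ADDR_LEN);
	return 13;
}

/* Common part: accept frames addressed to us or to broadcast. */
static int
ieee802like_input(hdr_parse_fn parse, const unsigned char *myaddr,
    const unsigned char *pkt, size_t len)
{
	static const unsigned char bcast[ADDR_LEN] =
	    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
	struct ieee802like_hdr h;

	assert(parse != NULL && myaddr != NULL);
	if (parse(pkt, len, &h) < 0)
		return 0;
	return memcmp(h.dst, myaddr, ADDR_LEN) == 0 ||
	    memcmp(h.dst, bcast, ADDR_LEN) == 0;
}
```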

Dave

From: Masao Uebayashi
Date: Saturday, March 13, 2010 - 11:25 pm

In devfs world, traditional device names (/dev/xxxN or ifconfig xxxN)
are provided as a short alias.  Basically devfs internally walks the
whole tree and counts the base device name you requested ("fxp", "sd",
Actually, I don't know.  I'm not familiar with networking.  The basic idea is
that if those devices share some ioctl()s, they should have a superclass.

Masao