Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor

To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Monday, February 25, 2008 - 11:21 pm

Hi Andrew,

There is a problem with active restarts in autofs (that is to
say restarting autofs when there are busy mounts).

Currently autofs uses "umount -l" to clear active mounts at
restart. While using lazy umount works for most cases, anything
that needs to walk back up the mount tree to construct a path,
such as getcwd(2) and the proc file system /proc/<pid>/cwd, no
longer works because the point from which the path is constructed
has been detached from the mount tree.

The actual problem with autofs is that it can't reconnect to
existing mounts. One might immediately think that just adding the
ability to remount autofs file systems would solve it, but
alas, that can't work. This is because autofs direct mounts
and the implementation of "on demand mount and expire" of
nested mount trees have the file system mounted on top of
the mount trigger dentry.

To resolve this a miscellaneous device node for routing ioctl
commands to these mount points has been implemented for the
autofs4 kernel module.

For those wishing to test this out an updated user space daemon
is needed. Checking out and building from the git repo or
applying all the current patches to the 5.0.3 tar distribution
will do the trick. This is all available at the usual location
on kernel.org.

Ian
--
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Thursday, February 28, 2008 - 12:40 am

Could we please be a bit more specific than "the usual location"?

Should autofs userspace have an entry in Documentation/Changes?
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Thursday, February 28, 2008 - 2:07 am

Yes, I should have been more specific.

Sounds like a sensible thing to do.
I'll include a patch for that when I re-post the patch set.

Ian


--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Tuesday, February 26, 2008 - 12:29 am

Hi Andrew,

Patch to add a display mount option to show the device number
of the autofs mount super block.

Signed-off-by: Ian Kent <raven@themaw.net>

Ian

---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option	2008-02-20 13:01:06.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c	2008-02-20 13:03:45.000000000 +0900
@@ -190,6 +190,7 @@ static int autofs4_show_options(struct s
 	seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ);
 	seq_printf(m, ",minproto=%d", sbi->min_proto);
 	seq_printf(m, ",maxproto=%d", sbi->max_proto);
+	seq_printf(m, ",dev=%d", autofs4_get_dev(sbi));
 
 	if (sbi->type & AUTOFS_TYPE_OFFSET)
 		seq_printf(m, ",offset");
@@ -332,7 +333,7 @@ int autofs4_fill_super(struct super_bloc
 	sbi->sb = s;
 	sbi->version = 0;
 	sbi->sub_version = 0;
-	sbi->type = 0;
+	sbi->type = AUTOFS_TYPE_INDIRECT;
 	sbi->min_proto = 0;
 	sbi->max_proto = 0;
 	mutex_init(&sbi->wq_mutex);
 
--
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Thursday, February 28, 2008 - 1:17 am

%u would be more appropriate here.
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Monday, February 25, 2008 - 11:23 pm

Hi Andrew,

Patch to add miscellaneous device to autofs4 module for
ioctls.

Signed-off-by: Ian Kent <raven@themaw.net>

Ian

---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/expire.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl	2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/expire.c	2008-02-22 11:51:41.000000000 +0900
@@ -244,10 +244,10 @@ cont:
 }
 
 /* Check if we can expire a direct mount (possibly a tree) */
-static struct dentry *autofs4_expire_direct(struct super_block *sb,
-					    struct vfsmount *mnt,
-					    struct autofs_sb_info *sbi,
-					    int how)
+struct dentry *autofs4_expire_direct(struct super_block *sb,
+				     struct vfsmount *mnt,
+				     struct autofs_sb_info *sbi,
+				     int how)
 {
 	unsigned long timeout;
 	struct dentry *root = dget(sb->s_root);
@@ -281,10 +281,10 @@ static struct dentry *autofs4_expire_dir
  *  - it is unused by any user process
  *  - it has been unused for exp_timeout time
  */
-static struct dentry *autofs4_expire_indirect(struct super_block *sb,
-					      struct vfsmount *mnt,
-					      struct autofs_sb_info *sbi,
-					      int how)
+struct dentry *autofs4_expire_indirect(struct super_block *sb,
+				       struct vfsmount *mnt,
+				       struct autofs_sb_info *sbi,
+				       int how)
 {
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/init.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl	2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c	2008-02-22 11:51:41.000000000 +0900
@@ -29,11 +29,20 @@ static struct file_system_type autofs_fs
 
 static int __init init_autofs4_fs(void)
 {
-	return register_filesystem(&autofs_fs_type);
+	int err;
+
+	err = register_filesystem(&autofs_fs_type);
+	if (err)
+		return err;
+
+	err = au...
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Thursday, February 28, 2008 - 1:17 am

Could you please document the new kernel interface which you're proposing? 
In Documentation/ or in the changelog?

We seem to be passing some string into a miscdevice ioctl and getting some
results back.  Be aware that this won't be a terribly popular proposal, so
I'd suggest that you fully describe the problem which it's trying to solve,
and how it solves it, and why the various alternatives (sysfs, netlink,



We prefer not to bother with the filename-in-the-file thing.  You know what
file you're reading, and these things tend to not get updated across


This needs parentheses.





That's fd_install() plus an add-on.  It's not autofs-specific.  Should be


We have a new filesystem type, a misc device with a mysterious ioctl,
hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes
too.

This is a complex interface.  We really need to see the overall problem





Have you really carefully reviewed and tested what happens when non-autofs
fds are fed into all the ioctl modes?

I hope all these ioctl entrypoints are root-only.  What determines that? 






--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Friday, February 29, 2008 - 12:24 pm

It appears I could do this with the generic netlink subsystem.

And, will still be in the netlink implementation and will still return

Again, will still be in the netlink implementation.


Also, still in the netlink implementation, with a comment a bit more

That's not going to change.
There's nothing new here at all.
This is merely a re-implementation of the existing autofs ioctl

I'll add a document describing this, as previously agreed.


I haven't had any problems with this over time.
I've always thought that this was because the flag is only set during an
expire, which is synchronous and of which there is only ever one in
progress for a given mount, and mount requests only read the flag to
check status.

But I could be wrong since it may have been OK because the existing
autofs ioctl holds the BKL for its operations.

I'll think about it.

snip ....



--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, Thomas Graf <tgraf@...>
Date: Friday, April 11, 2008 - 3:02 am

I've spent several weeks on this now and I'm having considerable 
difficulty with the expire function.

First, I think using a raw netlink implementation defeats the point of
using this approach at all due to increased complexity. So I've used the
generic netlink facility and the libnl library for user space. While the
complexity on the kernel side is acceptable, that isn't the case in user
space: the code for the library to issue mount point control commands has
more than doubled in size and is still not working for mount point
expiration. This has been made more difficult because libnl isn't
thread safe. I have overcome this limitation for everything but
the expire function, and I now can't determine whether the problem I have
with receiving multicast messages (possibly out of order) on individual
netlink sockets opened specifically for this purpose is due to this or to
something I'm doing wrong.

The generic netlink implementation allows only one message to be in flight 
at a time. But my expire selects an expire candidate (if possible), sends 
a request to the daemon to do the umount, obtains the result status and 
returns this as the result to the original expire request. Consequently, I 
need to spawn a kernel thread to do this and return, then listen for the
matching multicast message containing the result. I don't particularly 
like spawning a thread to do this because it opens the possibility of 
orphaned threads which introduces other difficulties cleaning them up if 
the user space application goes away or misbehaves. But I'm also having 
problems catching the multicast messages. This works fine in normal 
operation but fails badly when I have multiple concurrent expires 
happening, such as when shutting down the daemon with several hundred 
active mounts. I can't avoid the fact that netlink doesn't provide the 
same functionality as the ioctl interface and clearly isn't meant to.

So, the question is, what are the criteria to use for deciding that a 
netli...
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, Thomas Graf <tgraf@...>, <netdev@...>
Date: Saturday, April 12, 2008 - 12:03 am

Gee, it sounds like you went above and beyond the call there.

The one-message-in-flight limitation of genetlink is surprising - one would
expect a kernel subsystem (especially a networking one) to support

Do I recall correctly in remembering that your original design didn't
really add any _new_ concepts to autofs interfacing?  That inasmuch as
the patch sinned, it was repeating already-committed sins?

And: you know more about this than anyone else, and you are (now) unbiased
by the presence of existing code.  What's your opinion?
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>, Thomas Graf <tgraf@...>, <netdev@...>
Date: Monday, April 14, 2008 - 12:45 am

Hahaha, maybe, but I have to be sure it's not just my own lack of

I'm not sure but I think there are some special requirements for such a
message bus architecture. I've only skimmed the code but it looked like
a mutex for each genetlink family or, ideally, for each socket should
be possible.

We also need to face the fact that this isn't designed to be a drop in
replacement for ioctls as it can't provide (and probably can never
provide) the not often used independently re-entrant function call like

Almost, it is a re-implementation of the existing ioctl interface.

It has an extra entry point so we can obtain a file handle to an autofs
mount that has been over mounted and another to get owner info for mount
re-construction on daemon restart. Which is what we need to be able to
solve the active restart problem.

I also made a couple of improvements, namely, allow actual failure
status to be returned from the daemon to the kernel rather than always
using ENOENT (long overdue, still need to update the daemon though) and
added an additional entry point to check if a path is a mount point so
we can eliminate some of the high overhead mount table scanning in the

There's no question that genetlink is an elegant solution for common
case ioctl functions but, as I say, it's not a complete replacement
probably because its primary purpose in life is to be a message bus
implementation rather than specifically an ioctl replacement.

As is often the case after posting a "please help" message it occurred
to me that there is another way I might be able to do this. Instead of
sending an independent umount request I could check, and if a candidate
is found, set the expiring flag and return a "yes" status to the daemon
and have the same function do the umount, then clear the when returning
the status. That would eliminate the ugliness in the daemon and the need
to use kernel threads but would open the possibility of the "expiring
flag" remaining set if the daemon went away. That would prev...
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Thursday, February 28, 2008 - 2:18 am

Also a good suggestion.

Yes, as I said above.

I don't expect that people that aren't close to the development of
autofs will "get" the problem description in the leading post but I will
try and expand on it as best I can.

As for the possible alternatives, it sounds like I have some more work
to do on that. Mount options can't be used as I described in the lead-in
post and, as far as my understanding of sysfs goes, I don't think it's
appropriate. But, I'm not aware of what the netlink interface may be

--
To: Kernel Mailing List <linux-kernel@...>
Cc: Andrew Morton <akpm@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Thursday, March 13, 2008 - 3:00 am

I've attempted an implementation of this using the Generic Netlink 
interface and I've struck a difficulty. I would like to use current 
recommended development policy but I'm not sure how far I should go with 
this in order to avoid using an ioctl implementation so I'm after some 
advice and suggestions.

So, onto the description.
Sorry about the length of the post.

Autofs user space uses a number of ioctls for mount control.

I have a problem that can only be resolved by adding an additional control 
function that doesn't need to open the autofs mount point path directly 
to issue ioctl commands. So I decided to re-implement the control 
interface, hopefully, in a cleaner way and solve a couple of other 
problems at the same time (by adding two additional control functions 
giving a total of three new functions) and improving one of the existing 
functions.

My initial proposal used a miscellaneous device to route ioctl commands to 
autofs mounts and the question of why current recommended alternatives 
were not suitable was asked. The only alternative that may be suitable is 
the Netlink interface.

I won't go into the details of the new functions now but focus on the 
difficulty I have found implementing one of the existing functions using 
the Netlink interface.

There are some restrictions on the scope of the change.
The scope is only the ioctl interface. I don't want to change too many things 
at once, in particular things that are currently working OK. And I'd like 
to retain the existing semantic behavior of the interface.

The function that is a problem is the sending of expire requests. In the 
current implementation this function is synchronous. An ioctl is used to 
ask the kernel module (autofs4) to check for mounts that can be expired 
and, if a candidate is found the module sends a request to the user space 
daemon asking it to try and umount the selected mount after which the daemon 
sends a success or fail status back to the module which marks the 
complet...
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Friday, March 14, 2008 - 8:45 am

Netlink is a messaging protocol, synchronization is up to the user.

I suggest that you send a netlink notification to a multicast group for
every expire candidate when an expire request is received. Unmount
daemons simply subscribe to the group and wait for work to do. Put the
request onto a list including the netlink pid and sequence number so you
can address the original source of the request later on. Exit the netlink
receive function and wait for the userspace daemon to get back to you.

The userspace daemon notifies you of successful or unsuccessful unmount
attempts by sending notifications. Update your list entry accordingly
and once the request is fulfilled send a notification to the original
source of the request by using the stored pid and sequence number.

The userspace application requesting the expire can simply block on the
receipt of this notification in order to make the whole operation
synchronous.

Sounds acceptable?
--
To: Thomas Graf <tgraf@...>
Cc: Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Christoph Hellwig <hch@...>, Al Viro <viro@...>
Date: Friday, March 14, 2008 - 10:10 am

Yes, I realize that, but what I'm curious about are the options that I
have within the messaging system to control delivery of message replies,
other than using separate sockets. Can this be achieved by using the pid

I'll have to think about what you've said here to relate it to the
situation I have. I don't actually have umount daemons, at the moment I
request an expire and the daemon creates a thread to do the umount and
sends a status message to the kernel module. But that may not matter,

Actually, I've progressed on this since posting.

I've implemented the first steps toward using a workqueue to perform the
expire and, to my surprise, my code worked for a simple test case.
Basically, a thread in the daemon issues the expire, the kernel module
queues the work and replies. The expire workqueue task does the expire
check and if no candidates are found it sends an expire complete
notification, or it sends a umount request to the daemon and waits for
the status result, then returns that result as the expire complete
notification. Seems to work quite well.

I expect this is possibly the method you're suggesting above anyway.

Unfortunately, having now stepped up the intensity of the testing, I'm
getting a hard hang on my system. I've set things up to reduce the message
functions used to only two simple notification messages to the kernel
module to ensure it isn't the expire implementation causing the problem.
It's hard to see where I could have messed up these two functions as they
are essentially re-entrant, but there is fairly heavy mount activity of
about 10-15 mounts a second. Such is life!

Any ideas on what might be causing this?

Ian



--
To: Thomas Graf <tgraf@...>
Cc: Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Thursday, March 13, 2008 - 10:45 pm

On Thu, 13 Mar 2008, Ian Kent wrote:

Thomas, could you comment on the Netlink related questions I have posed 
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Monday, February 25, 2008 - 11:23 pm

Hi Andrew,

Patch to track the uid and gid of the last process to request
a mount on an autofs dentry.

Signed-off-by: Ian Kent <raven@themaw.net>

Ian

---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids	2008-02-20 13:11:28.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c	2008-02-20 13:12:23.000000000 +0900
@@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str
 
 	ino->flags = 0;
 	ino->mode = mode;
+	ino->uid = 0;
+	ino->gid = 0;
 	ino->inode = NULL;
 	ino->dentry = NULL;
 	ino->size = 0;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
--- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids	2008-02-20 13:14:03.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h	2008-02-20 13:14:34.000000000 +0900
@@ -58,6 +58,9 @@ struct autofs_info {
 	unsigned long last_used;
 	atomic_t count;
 
+	uid_t uid;
+	gid_t gid;
+
 	mode_t	mode;
 	size_t	size;
 
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids	2008-02-20 13:06:20.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c	2008-02-20 13:10:23.000000000 +0900
@@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
 
 	status = wq->status;
 
+	/*
+	 * For direct and offset mounts we need to track the requestor
+	 * uid and gid in the dentry info struct. This is so it can be
+	 * supplied, on request, by the misc device ioctl interface.
+	 * This is needed during daemon restart when reconnecting
+	 * to existing, active, autofs mounts. The uid and gid (and
+	 * related string values) may be used for macro substitution
+	 * in autofs mount maps.
+	 */
+	if (!status) {
+		struct dentry *de = NULL;
+
+		/* direct mount or brow...
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 12:45 am

But uids and gids are no longer system-wide-unique.  Two different users
can have the same identifiers in different namespaces.  What happens then?
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 2:22 am

That's a tricky question.

Presumably, the process requesting the mount has the user space daemon
running in the namespace within which the uid and gid are to be looked
up, by the daemon.

Am I missing something?

Ian


--
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 2:37 am

err, you assume more knowledge at this end about what you're trying to do
than actually exists :)

You seem to imply that if a machine is running 100 user namespaces then it
needs to run 100 mount daemons.  Doesn't seem good.

What problem are you actually trying to solve here?
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 3:08 am

More likely my lack of understanding of how namespaces are meant to

The basic problem arises only when we want to restart the user space
daemon and there are active autofs managed mounts in place at exit (i.e.
autofs mounts that have busy user mounts). They are left mounted and
processes using them continue to function. But then, when we start up
autofs, we need to reconnect to these autofs mounts, some of which can
be covered by the mounted file systems, hence the need for another way
to open an ioctl descriptor to them.

It may have been overkill to re-implement all the current ioctls (and
add a couple of other much needed ones) but I thought it sensible for
completeness, and we get to identify any possible problems the current
ioctls might have had due to the use of the BKL (by the VFS when calling
the ioctls).

So, why do we need the uid and gid? When someone walks over an autofs
dentry that is meant to cause a mount, we send a request packet to the
daemon via a pipe which includes the process uid and gid. As part of
the lookup we set macros for several mount map substitution variables,
derived from the uid and gid of the process requesting the mount, and
these can be used within autofs maps.

This is all fine as long as we don't need to reconnect to these mounts
when starting up. When we do, we don't get kernel requests for the
mounts, so we need to obtain that information from the active mount itself.

Ian



--
To: Ian Kent <raven@...>
Cc: Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 3:51 am

Why do we need the uid then? Is just pid not enough to uniquely 
identify a task?

Assuming we can get by with a pid only, this problem can be solved
by sending a pid_nr() of a task, i.e. the pid by which this task is
seen from an initial namespace. This pid is unique across the system
even when pid namespaces are created.

But this ... trick is only valid if the daemon, that receives the 
pid doesn't try to communicate with this task (e.g. send him a signal),
but just uses this as a key to look up in some hash. This is not about
security - even having someone's global pid, a task can do nothing useful 
with it - this is about the consistency - when sending a signal to a
task, giving its _global_ pid to sys_kill() the signal may arrive to a 

--
To: Pavel Emelyanov <xemul@...>
Cc: Ian Kent <raven@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Thursday, February 28, 2008 - 4:33 pm

Pavel it is never correct to use a global pid when talking to user space.
In fact the concept is just a bit dubious.  We must always translate
the pid into the pid namespace of the task we are talking to, or at
least into the pid namespace of the process that opened the file
handle, (essentially the same, but does not have races in the corner
cases).

Even in the kernel using global ids is dubious.  When dealing with
user space it is just wrong.  

Speaking of.  I think we still need work on autofs in this regard.
I know last I looked we had some outstanding issues there.

Eric
--
To: Pavel Emelyanov <xemul@...>
Cc: Ian Kent <raven@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 3:59 am

The problem is that the userspace daemon is restarted.  ie: it exits
and is re-run.  It then needs to pick up various state from its previous
run.
--
To: Andrew Morton <akpm@...>
Cc: Pavel Emelyanov <xemul@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 4:06 am

Dumb old me, I really only need the uid.
The gid can come from the get user info functions of glibc.
DOH!

--
To: Ian Kent <raven@...>
Cc: Andrew Morton <akpm@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>, Kernel Mailing List <linux-kernel@...>, Pavel Emelyanov <xemul@...>
Date: Thursday, February 28, 2008 - 8:31 am

In case the process was executed from a setgid executable, you might
have a different gid from what the user has. In fact, I don't know why
you may need more than the pid, since with the pid you can get to the
task's effective uid/gid and maybe other such information you need.

Cheers,
Fábio Olivé
-- 
ex sed lex awk yacc, e pluribus unix, amem
--
To: Ian Kent <raven@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 3:23 am

So we want to store persistent state in the kernel across userspace process
invocations.  That's normally done with a thing called a "file" ;) Could we
stick all the necessary state into files in a pseudo-fs and have the daemon

I don't understand that bit.  But then, I don't have a clue how autofs

It isn't a good idea to wait for races to reveal themselves.  It will take
years, especially with a system which has as low a call frequency as autofs
mounting.  And once a bug _does_ reveal itself, by then we'll have tens of


yeah, could be a problem.  Hopefully the namespace people can advise. 
Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
namespace-id+gid, namespace-id+pid, etc for this sort of thing.  It has

--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 4:00 am

Nearly sent off a reply without thinking about this, and it sounds like a
good idea initially. But, if we have a large number of autofs file
systems mounted (thousands?), duplicating the information already
present in the autofs file system seems untidy and unnecessary. I
thought of exposing this information in sysfs, but that would make
the autofs module part of sysfs have many files, and there is the
problem that the same path could have more than one autofs file system
stacked on top of it, so it isn't unique.

The other obvious place is in the mount options but that is one of the
reasons that only the device number is exposed there. If we have 5000 -
10000 mounts then scanning /proc/mounts becomes a big problem from a CPU
usage perspective (and is already a big problem for me, which is partly
addressed by the new bits in the implementation here). If I could think
of another way to expose the device number as well I would use it, even

Not sure I agree about the low call frequency.

It's quite normal for smaller sites to have hundreds of entries in maps
and equally common for them to do stupid things like run scripts that
scan the file systems, mounting every mount.

It's not quite the same order of magnitude, but I regularly use the
autofs connectathon suite for testing. It only ends up mounting several
hundred mounts but I can get mounting and expiring happening at the same

Right, I'll see what I can find on those topics.
My concern is that this will require considerable work in the daemon
which would be fine for version 5.1 but I need to resolve this problem


--
To: Ian Kent <raven@...>
Cc: Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 1:13 pm

I think there is some confusion surrounding what the UID and GID are
used for in this context.  I'll try to explain it as best I can.

When the automount daemon parses a map entry, it will do some amount of
variable substitution.  So, let's say you're running on an i386 box, and
you want to mount a library directory from a server.  You might have a
map entry that looks like this:

lib	server:/export/$ARCH/lib

In this case, the automount daemon will replace $ARCH with i386, and
will try the following mount command:

mount -t nfs server:/export/i386/lib /automountdir/lib

There are cases where it would be helpful to use the requesting
process's UID in such a variable substitution.  Consider the case of a
CIFS share, where the automount daemon runs as user root, but we want to
mount the share using the credentials of the requesting user.  In this
case, the UID and GID can be helpful in formatting the mount options for
mounting the share.

So, the UID and GID are used only for map substitutions.  Now, having
said all of that, I'll have to look more closely at why we even need to
keep track of it, given that it only needs to be used when performing
the lookup, and at that time we have information on the requesting UID
and GID.

Cheers,

Jeff
--
To: Jeff Moyer <jmoyer@...>
Cc: Ian Kent <raven@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 3:51 pm

Thanks Jeff.  If that's the case then user namespaces don't affect this
at all.

(Still trying to follow the rest of the thread because I definitely feel like
I'm missing something.  I swear I understood autofs 10+ years ago :)

thanks,
-serge
--
To: Serge E. Hallyn <serue@...>
Cc: Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Thursday, February 28, 2008 - 11:32 pm

Yep, that's precisely the way this is used, by autofs anyway.

I guess the problem we face is that since this is a public interface
other applications could use this in a different way and we can't
control that. I think I need more information so I can document the
defined usage in my revised patch set.

In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to
the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and
$OSVERS, and then expand the map entry.
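A minimal sketch of the kind of substitution described above, assuming a plain `$NAME` syntax and a caller-supplied table of values. `struct macro`, `macro_lookup()` and `expand_macros()` are hypothetical names, not code from the automount daemon; in the daemon the values for $UID, $GID and friends would come from the requesting process, and $ARCH and the other standard macros from the host.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (name, value) pair, e.g. { "ARCH", "i386" }. */
struct macro {
	const char *name;
	const char *value;
};

/* Exact-match lookup of a macro name of length len; NULL if unknown. */
static const char *macro_lookup(const struct macro *tbl, size_t ntbl,
				const char *name, size_t len)
{
	for (size_t i = 0; i < ntbl; i++)
		if (strlen(tbl[i].name) == len &&
		    strncmp(tbl[i].name, name, len) == 0)
			return tbl[i].value;
	return NULL;
}

/*
 * Copy src to dst, replacing each known $NAME with its value.
 * Unknown macros are copied through unchanged.
 * Returns 0 on success, -1 if dst (dstlen bytes) is too small.
 */
int expand_macros(const char *src, char *dst, size_t dstlen,
		  const struct macro *tbl, size_t ntbl)
{
	size_t out = 0;

	assert(src != NULL && dst != NULL && dstlen > 0);

	while (*src) {
		const char *piece = src;	/* text to append */
		size_t plen = 1;		/* its length */

		if (*src == '$') {
			size_t n = 0;
			const char *val;

			/* Macro name: letters, digits and '_' after '$'. */
			while (isalnum((unsigned char)src[1 + n]) ||
			       src[1 + n] == '_')
				n++;
			val = macro_lookup(tbl, ntbl, src + 1, n);
			if (val) {
				piece = val;
				plen = strlen(val);
				src += n + 1;
			} else {
				src++;	/* unknown: keep the '$' literally */
			}
		} else {
			src++;
		}

		if (out + plen >= dstlen)
			return -1;	/* no room for piece plus NUL */
		memcpy(dst + out, piece, plen);
		out += plen;
	}
	dst[out] = '\0';
	return 0;
}
```

With a table holding `{ "ARCH", "i386" }`, expanding `server:/export/$ARCH/lib` yields `server:/export/i386/lib`, matching Jeff's example earlier in the thread.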

The question that Jeff is asking himself is, why do we need this
information when we re-connect at startup, since the mounts are already
present.

The answer isn't easy to explain and will be lengthy, sorry, but let me
try anyway.

There are two types of maps, direct (in the module source you will see a
third type called an offset, which is just a direct mount in disguise)
and indirect.

For example, here is a master map with direct and indirect map entries:

/-	/etc/auto.direct
/test	/etc/auto.indirect

/etc/auto.direct:

/automount/dparse/g6  budgie:/autofs/export1
/automount/dparse/g1  shark:/autofs/export1
and so on.

/etc/auto.indirect:

g1    shark:/autofs/export1
g6    budgie:/autofs/export1
and so on.

For the above indirect map an autofs file system is mounted on /test and
mounts are triggered by the inode lookup operation. So we see a mount of
shark:/autofs/export1 on /test/g1, for example.

The way that direct mounts are handled is by making an autofs mount on
each full path, such as /automount/dparse/g1, and using it as a mount
trigger. So when we walk the path we mount shark:/autofs/export1 on
top of this mount point, for example. Since these are always
directories we can use the follow_link inode operation to trigger the
mount.
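To make the difference between the two map types concrete, here is a small model that, given a master map mount point and a map key, reports where the autofs trigger sits and where the real file system ends up. It only restates the description above; `master_entry_type()` and `mount_target()` are hypothetical names, and the real daemon also has to deal with offsets and multi-mount entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum map_type { MAP_DIRECT, MAP_INDIRECT };

/* A master map entry whose mount point is "/-" names a direct map. */
enum map_type master_entry_type(const char *mountpoint)
{
	return strcmp(mountpoint, "/-") == 0 ? MAP_DIRECT : MAP_INDIRECT;
}

/*
 * indirect: one autofs mount on the master map mount point; the real
 *           mount goes on <mountpoint>/<key> (triggered by lookup).
 * direct:   one autofs mount per key (the key is a full path); the real
 *           mount is stacked on top of that same path.
 * Returns 0 on success, -1 if a buffer of len bytes is too small.
 */
int mount_target(const char *mountpoint, const char *key,
		 char *trigger, char *target, size_t len)
{
	int n;

	assert(trigger != NULL && target != NULL && len > 0);
	if (master_entry_type(mountpoint) == MAP_DIRECT) {
		n = snprintf(trigger, len, "%s", key);
		if (n < 0 || (size_t)n >= len)
			return -1;
		n = snprintf(target, len, "%s", key);
	} else {
		n = snprintf(trigger, len, "%s", mountpoint);
		if (n < 0 || (size_t)n >= len)
			return -1;
		n = snprintf(target, len, "%s/%s", mountpoint, key);
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the maps above, key g1 under /test gives trigger /test and target /test/g1, while the direct key /automount/dparse/g1 is both trigger and target, which is why the real mount ends up stacked on the trigger.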

But, each entry in direct and indirect maps can have offsets (often
called multi-mount map entries).

For example, a direct mount map entry could also be:

/automount/dparse/g1 \
    /       shark:/autofs/export5/testing/test \
   ...
To: Ian Kent <raven@...>
Cc: Serge E. Hallyn <serue@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Friday, February 29, 2008 - 12:09 pm

The way the user namespaces work right now is similar to say the IPC
namespace - a task belongs to one user, that user belongs to precisely
one user namespace.

Even in my additional userns patches, I was changing uid to store the
(uid, userns) so a struct user still belonged to just one user
namespace.

In contrast, with pid namespaces a task is associated with a 'struct
pid' which links it to multiple process ids, one in each pid namespace
to which it belongs.

Perhaps we should be treating user namespaces like pid namespaces?

For autofs this would mean that when autofs wants a uid for some task,
it would be given the uid in the user namespace which autofs 'knows'.

It would also help me fix the siginfo problems I haven't solved yet -
rather than having to worry about user namespace lifetimes with siginfos
(which last a little while but have no clearly defined lifespan) we
could send the uid in an init user namespace or the uid in the target
uid namespace, or just a lightweight user struct proxy akin to 'struct
pid'.

And it also obviates the need for any sort of delegation.

So if I'm user 500 in what I think is the initial user namespace, I can
create a container with a new user namespace, the init task of which is
both uid 0 in the child userns, and uid 500 in the higher level,
automatically giving the container access to any files I own.
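The pid-namespace-style idea can be sketched along the lines of the kernel's struct pid, where a task carries one number per namespace level and a lookup checks both the level and the namespace pointer (as pid_nr_ns() does for pids). Everything below (`struct user_proxy`, `proxy_uid_in_ns()`, the fixed nesting limit) is hypothetical; it only models the example of uid 500 in the parent namespace being uid 0 in a child.

```c
#include <assert.h>
#include <stddef.h>

#define PROXY_MAX_LEVEL 4	/* hypothetical nesting limit */

struct user_ns {
	struct user_ns *parent;
	unsigned int level;	/* 0 for the initial user namespace */
};

/* The uid a task has as seen from one namespace. */
struct user_nr {
	unsigned int uid;
	struct user_ns *ns;
};

/* struct pid-like proxy: one entry per level, from 0 up to level. */
struct user_proxy {
	unsigned int level;	/* level of the namespace the task lives in */
	struct user_nr numbers[PROXY_MAX_LEVEL];
};

/* Uid of the task as seen from ns, or -1 if ns is not on its chain. */
long proxy_uid_in_ns(const struct user_proxy *p, const struct user_ns *ns)
{
	assert(p != NULL && ns != NULL);
	if (ns->level > p->level || ns->level >= PROXY_MAX_LEVEL)
		return -1;
	if (p->numbers[ns->level].ns != ns)
		return -1;
	return (long)p->numbers[ns->level].uid;
}
```

Under this model autofs, running in the parent namespace, would simply ask for the uid in its own namespace and get 500, while the container sees 0.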

Eric, when you get a chance (I know you're overloaded atm) I'd love to
--
To: Serge E. Hallyn <serue@...>
Cc: Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>, Eric W. Biederman <ebiederm@...>
Date: Saturday, March 1, 2008 - 9:13 pm

Succinctly.

I think the concept of mapping uids between user namespaces is
fundamental to properly describing and thinking about the semantics of
user namespaces correctly.

We don't have to start out handling anything except the case when
no mapping exists, but asking the question of how a uid maps
from one namespace to another is fundamental.

Eric

--
To: Eric W. Biederman <ebiederm@...>
Cc: Serge E. Hallyn <serue@...>, Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>
Date: Monday, March 3, 2008 - 11:28 am

Earlier I had thought this could just be done using a special keyring,
but atm I'm thinking that would be far uglier than just having a
struct pid-like credential proxy in the kernel to pass around in place

True.

But in any case I'm happy letting other things like netns and related
sys be completed before prototyping this.

thanks,
-serge
--
To: Serge E. Hallyn <serue@...>
Cc: Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>
Date: Tuesday, March 4, 2008 - 6:16 pm

I have not looked at many of the implementation possibilities so unfortunately
I don't know what makes for a good implementation.

What I do know is that uids are serialized in filesystems, and their
mapping between namespaces is defined by system administrators.

Both of those properties are different from struct pid, which means
a generalized struct user in the kernel can at best hold a cache of the
mappings.
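Because the mapping is defined by administrators rather than derived by the kernel, it is naturally a lookup table. One way to express such a table, assuming the range form (first uid inside, first uid outside, count) that Linux later adopted for /proc/&lt;pid&gt;/uid_map, is sketched below. `struct uid_extent` and `map_uid_up()` are hypothetical, and this is not the design being discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One administrator-defined range of uids. */
struct uid_extent {
	uint32_t first;		/* first uid inside the namespace */
	uint32_t lower_first;	/* corresponding uid in the parent */
	uint32_t count;		/* number of consecutive uids mapped */
};

/*
 * Map a uid inside the namespace to the parent namespace.
 * Returns 0 and stores the result in *out, or -1 if no extent covers uid.
 */
int map_uid_up(const struct uid_extent *map, size_t n, uint32_t uid,
	       uint32_t *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Subtract first to avoid overflow in first + count. */
		if (uid >= map[i].first && uid - map[i].first < map[i].count) {
			*out = map[i].lower_first + (uid - map[i].first);
			return 0;
		}
	}
	return -1;
}
```

A struct user built on top of this would, as Eric says, only cache the result of such a lookup; the table itself stays the authority.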

My preliminary investigations suggested that for the kernel filesystem
boundary generating a struct user or a struct group just to use for a
permission check and then to throw it away was wasteful.

However, for in-kernel entities a struct user sounds practical.

All of which is to say that we can learn lessons from the
implementation of struct pid but that we also have different
requirements so we can only use those lessons in a limited fashion.

Eric
--
To: Serge E. Hallyn <serue@...>
Cc: Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Pavel Emelyanov <xemul@...>
Date: Saturday, March 1, 2008 - 8:49 pm

I definitely think we should have some support like that.

We already have code in nfsv4 and p9fs, if I remember correctly, to map
between the user namespace of the server (which is string based) and
the uids of the local machine.

We also have a similar issue with security keys.

I don't know if the strict hierarchical nature we have makes a lot of
sense.  One of the things that we should account for is that
frequently user namespaces are kept in sync between multiple machines
by system administrators.  So in the real world user namespaces are a

Yes.  The user namespace of the process that opened the pipe I believe

Yes.  For the pids we have been looking at sending the pid in the
target namespace and sending the uid in the target namespace should be

Right.

Long term we want to look at making this an unprivileged operation,
allowing a user to run less privileged processes.

My impression has always been that going from comparing (userns, uid)
to a more sophisticated mapping approach was a compatible extension.
However if it looks like we need user namespace mapping support up

I think you are on the right track.  In a lot of ways the user
namespace is the trickiest, as this is where we change the security
rules.  If it is only at the level of who is who.

Since we already have user namespace mapping infrastructure in the
kernel and ways to call back to user space to ask what the mapping
should be, I feel performing mapping in the user namespace
and generalizing the existing capability is a good idea.

We still want to tell users: if you can get away with it, synchronize
your user namespaces across file systems, kernels, and machines.

If they can't, having good general tools in the kernel that you only
need to learn once, and not once per instance, sounds good.

Eric
--
To: Serge E. Hallyn <serue@...>
Cc: Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>
Date: Friday, February 29, 2008 - 12:20 pm

I'm afraid, that I'm just starting a new thread of discussion in a

So do you mean that I can become root by calling clone()?

Thanks,
Pavel
--
To: Pavel Emelyanov <xemul@...>
Cc: Serge E. Hallyn <serue@...>, Ian Kent <raven@...>, Jeff Moyer <jmoyer@...>, Andrew Morton <akpm@...>, Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>, Eric W. Biederman <ebiederm@...>
Date: Friday, February 29, 2008 - 1:42 pm

You can become root in the new container.  Your capabilities are
meaningful only to targets (users, files) which exist in the user
namespace in which you are root.  It becomes more precise than the
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Tuesday, February 26, 2008 - 1:14 am

As has been spotted, this is obviously wrong.
And here is the correction.

Signed-off-by: Ian Kent <raven@themaw.net>

Ian

---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix	2008-02-26 14:02:05.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c	2008-02-26 14:02:20.000000000 +0900
@@ -385,10 +385,8 @@ int autofs4_wait(struct autofs_sb_info *
 
 		/* Set mount requestor */
 		if (ino) {
-			if (ino) {
-				ino->uid = wq->uid;
-				ino->gid = wq->gid;
-			}
+			ino->uid = wq->uid;
+			ino->gid = wq->gid;
 		}
 
 		if (de)
--
To: Andrew Morton <akpm@...>
Cc: Kernel Mailing List <linux-kernel@...>, autofs mailing list <autofs@...>, linux-fsdevel <linux-fsdevel@...>
Date: Monday, February 25, 2008 - 11:22 pm

Hi Andrew,

Patch to catch an invalid dentry when calculating its path.

Signed-off-by: Ian Kent <raven@themaw.net>

Ian

---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry	2008-02-20 12:55:39.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c	2008-02-20 12:54:46.000000000 +0900
@@ -171,7 +171,7 @@ static int autofs4_getpath(struct autofs
 	for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
 		len += tmp-&gt;d_name.len + 1;
 
-	if (--len > NAME_MAX) {
+	if (!len || --len > NAME_MAX) {
 		spin_unlock(&dcache_lock);
 		return 0;
 	}
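To see why the `!len` test matters, here is a user-space model of the length calculation in the hunk: if the dentry is the root itself the loop never runs, len stays 0, and `--len` would give -1, which passes the `> NAME_MAX` check. `struct node`, `path_len()`, `build_path()` and `SKETCH_NAME_MAX` are hypothetical stand-ins; the back-to-front copy in `build_path()` is an assumption about how the rest of autofs4_getpath() uses the length, since the hunk does not show it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 255	/* stand-in for the kernel's NAME_MAX */

/* Hypothetical stand-in for a dentry: a name plus a parent pointer. */
struct node {
	const char *name;
	const struct node *parent;
};

/*
 * Length of the "a/b/c" path from root down to dentry (which must be
 * below root), or 0 if there is nothing to build. Mirrors the loop and
 * the patched check in the hunk above.
 */
int path_len(const struct node *dentry, const struct node *root)
{
	const struct node *tmp;
	int len = 0;

	assert(dentry != NULL && root != NULL);
	for (tmp = dentry; tmp != root; tmp = tmp->parent)
		len += (int)strlen(tmp->name) + 1;	/* name plus '/' */

	/* dentry == root leaves len at 0; without !len, --len is -1. */
	if (!len || --len > SKETCH_NAME_MAX)
		return 0;
	return len;
}

/* Fill buf back to front with the path; returns its length or 0. */
int build_path(const struct node *dentry, const struct node *root,
	       char *buf, size_t buflen)
{
	int len = path_len(dentry, root);
	const struct node *tmp;
	char *p;

	if (!len || (size_t)len >= buflen)
		return 0;
	buf[len] = '\0';
	p = buf + len;
	for (tmp = dentry; tmp != root; tmp = tmp->parent) {
		size_t n = strlen(tmp->name);

		p -= n;
		memcpy(p, tmp->name, n);
		if (tmp->parent != root)
			*--p = '/';
	}
	return len;
}
```

Without the guard, a -1 length would be used as an offset into the buffer; with it, asking for the path of the root simply returns 0.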
--