Hi Andrew, There is a problem with active restarts in autofs (that is to say restarting autofs when there are busy mounts). Currently autofs uses "umount -l" to clear active mounts at restart. While using lazy umount works for most cases, anything that needs to walk back up the mount tree to construct a path, such as getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works because the point from which the path is constructed has been detached from the mount tree. The actual problem with autofs is that it can't reconnect to existing mounts. Immediately one thinks that just adding the ability to remount autofs file systems would solve it, but alas, that can't work. This is because autofs direct mounts and the implementation of "on demand mount and expire" of nested mount trees have the file system mounted on top of the mount trigger dentry. To resolve this a miscellaneous device node for routing ioctl commands to these mount points has been implemented for the autofs4 kernel module. For those wishing to test this out an updated user space daemon is needed. Checking out and building from the git repo or applying all the current patches to the 5.0.3 tar distribution will do the trick. This is all available at the usual location on kernel.org. Ian --
Could we please be a bit more specific than "the usual location"? Should autofs userspace have an entry in Documentation/Changes? --
Yes, I should have been more specific. Sounds like a sensible thing to do. I'll include a patch for that when I re-post the patch set. Ian --
Hi Andrew,
Patch to add a display mount option to show the device number
of the autofs mount super block.
Signed-off-by: Ian Kent < raven@themaw.net>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option 2008-02-20 13:01:06.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:03:45.000000000 +0900
@@ -190,6 +190,7 @@ static int autofs4_show_options(struct s
seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ);
seq_printf(m, ",minproto=%d", sbi->min_proto);
seq_printf(m, ",maxproto=%d", sbi->max_proto);
+ seq_printf(m, ",dev=%d", autofs4_get_dev(sbi));
if (sbi->type & AUTOFS_TYPE_OFFSET)
seq_printf(m, ",offset");
@@ -332,7 +333,7 @@ int autofs4_fill_super(struct super_bloc
sbi->sb = s;
sbi->version = 0;
sbi->sub_version = 0;
- sbi->type = 0;
+ sbi->type = AUTOFS_TYPE_INDIRECT;
sbi->min_proto = 0;
sbi->max_proto = 0;
mutex_init(&sbi->wq_mutex);
--
%u would be more appropriate here. --
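To illustrate the review comment, here is a minimal user-space sketch of formatting a device number with %u. It assumes the device number is an unsigned value, which is what the comment implies; format_dev_option() is a hypothetical stand-in for the seq_printf() call and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the seq_printf() call in the patch.
 * Formats an unsigned device number as a ",dev=" mount option.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
static int format_dev_option(char *buf, size_t size, unsigned int dev)
{
	int n;

	assert(buf != NULL);
	/* %u prints the full unsigned range; %d would show large values as negative. */
	n = snprintf(buf, size, ",dev=%u", dev);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

With %d, any device number above INT_MAX would show up in the mount options as a negative value, which user space would then have to undo when parsing.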
Hi Andrew,
Patch to add miscellaneous device to autofs4 module for
ioctls.
Signed-off-by: Ian Kent < raven@themaw.net>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/expire.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/expire.c 2008-02-22 11:51:41.000000000 +0900
@@ -244,10 +244,10 @@ cont:
}
/* Check if we can expire a direct mount (possibly a tree) */
-static struct dentry *autofs4_expire_direct(struct super_block *sb,
- struct vfsmount *mnt,
- struct autofs_sb_info *sbi,
- int how)
+struct dentry *autofs4_expire_direct(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi,
+ int how)
{
unsigned long timeout;
struct dentry *root = dget(sb->s_root);
@@ -281,10 +281,10 @@ static struct dentry *autofs4_expire_dir
* - it is unused by any user process
* - it has been unused for exp_timeout time
*/
-static struct dentry *autofs4_expire_indirect(struct super_block *sb,
- struct vfsmount *mnt,
- struct autofs_sb_info *sbi,
- int how)
+struct dentry *autofs4_expire_indirect(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi,
+ int how)
{
unsigned long timeout;
struct dentry *root = sb->s_root;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/init.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900
@@ -29,11 +29,20 @@ static struct file_system_type autofs_fs
static int __init init_autofs4_fs(void)
{
- return register_filesystem(&autofs_fs_type);
+ int err;
+
+ err = register_filesystem(&autofs_fs_type);
+ if (err)
+ return err;
+
+ err = au...
Could you please document the new kernel interface which you're proposing? In Documentation/ or in the changelog? We seem to be passing some string into a miscdevice ioctl and getting some results back. Be aware that this won't be a terribly popular proposal, so I'd suggest that you fully describe the problem which it's trying to solve, and how it solves it, and why the various alternatives (sysfs, netlink, We prefer not to bother with the filename-in-the-file thing. You know what file you're reading, and these things tend to not get updated across This needs parentheses. That's fd_install() plus an add-on. It's not autofs-specific. Should be We have a new filesystem type, a misc device with a mysterious ioctl, hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes too. This is a complex interface. We really need to see the overall problem Have you really carefully reviewed and tested what happens when non-autofs fds are fed into all the ioctl modes? I hope all these ioctl entrypoints are root-only. What determines that? --
It appears I could do this with the generic netlink subsystem. And, will still be in the netlink implementation and will still return Again, will still be in the netlink implementation. Also, still in the netlink implementation, with a comment a bit more That's not going to change. There's nothing new here at all. This is merely a re-implementation of the existing autofs ioctl I'll add a document describing this, as previously agreed. I haven't had any problems with this over time. I've always thought that this was because the flag is only set during an expire, of which there is only ever one going for a given mount, is synchronous, and mount requests only read the flag to check status. But I could be wrong since it may have been OK because the existing autofs ioctl holds the BKL for its operations. I'll think about it. snip .... --
I've spent several weeks on this now and I'm having considerable difficulty with the expire function. First, I think using a raw netlink implementation defeats the point of using this approach at all due to increased complexity. So I've used the generic netlink facility and the libnl library for user space. While the complexity on the kernel side is acceptable that isn't the case in user space, the code for the library to issue mount point control commands has more than doubled in size and is still not working for mount point expiration. This has been made more difficult because libnl isn't thread safe, but I have overcome this limitation for everything but the expire function, I now can't determine whether the problem I have with receiving multicast messages, possibly out of order, on individual netlink sockets opened specifically for this purpose, is due to this or is something I'm doing wrong. The generic netlink implementation allows only one message to be in flight at a time. But my expire selects an expire candidate (if possible), sends a request to the daemon to do the umount, obtains the result status and returns this as the result to the original expire request. Consequently, I need to spawn a kernel thread to do this and return, then listen for the matching multicast message containing the result. I don't particularly like spawning a thread to do this because it opens the possibility of orphaned threads which introduces other difficulties cleaning them up if the user space application goes away or misbehaves. But I'm also having problems catching the multicast messages. This works fine in normal operation but fails badly when I have multiple concurrent expires happening, such as when shutting down the daemon with several hundred active mounts. I can't avoid the fact that netlink doesn't provide the same functionality as the ioctl interface and clearly isn't meant to. So, the question is, what are the criteria to use for deciding that a netli...
Gee, it sounds like you went above and beyond the call there. The one-message-in-flight limitation of genetlink is surprising - one would expect a kernel subsystem (especially a networking one) to support Do I recall correctly in remembering that your original design didn't really add any _new_ concepts to autofs interfacing? That inasmuch as the patch sinned, it was repeating already-committed sins? And: you know more about this than anyone else, and you are (now) unbiased by the presence of existing code. What's your opinion? --
Hahaha, maybe, but I have to be sure it's not just my own lack of I'm not sure but I think there are some special requirements for such a message bus architecture. I've only skimmed the code but it looked like a mutex for each genetlink family or, ideally, for each socket should be possible. We also need to face the fact that this isn't designed to be a drop in replacement for ioctls as it can't provide (and probably can never provide) the not often used independently re-entrant function call like Almost, it is a re-implementation of the existing ioctl interface. It has an extra entry point so we can obtain a file handle to an autofs mount that has been over mounted and another to get owner info for mount re-construction on daemon restart. Which is what we need to be able to solve the active restart problem. I also made a couple of improvements, namely, allow actual failure status to be returned from the daemon to the kernel rather than always using ENOENT (long overdue, still need to update the daemon though) and added an additional entry point to check if a path is a mount point so we can eliminate some of the high overhead mount table scanning in the There's no question that genetlink is an elegant solution for common case ioctl functions but, as I say, it's not a complete replacement probably because its primary purpose in life is to be a message bus implementation rather than specifically an ioctl replacement. As is often the case after posting a "please help" message it occurred to me that there is another way I might be able to do this. Instead of sending an independent umount request I could check, and if a candidate is found, set the expiring flag and return a "yes" status to the daemon and have the same function do the umount, then clear the flag when returning the status. That would eliminate the ugliness in the daemon and the need to use kernel threads but would open the possibility of the "expiring flag" remaining set if the daemon went away.
That would prev...
Also a good suggestion. Yes, as I said above. I don't expect that people that aren't close to the development of autofs will "get" the problem description in the leading post but I will try and expand on it as best I can. As for the possible alternatives, it sounds like I have some more work to do on that. Mount options can't be used as I described in the lead in post and, as far as my understanding of sysfs goes, I don't think it's appropriate. But, I'm not aware of what the netlink interface may be --
I've attempted an implementation of this using the Generic Netlink interface and I've struck a difficulty. I would like to use current recommended development policy but I'm not sure how far I should go with this in order to avoid using an ioctl implementation so I'm after some advice and suggestions. So, onto the description. Sorry about the length of the post. Autofs user space uses a number of ioctls for mount control. I have a problem that can only be resolved by adding an additional control function that doesn't need to open the autofs mount point path directly to issue ioctl commands. So I decided to re-implement the control interface, hopefully, in a cleaner way and solve a couple of other problems at the same time (by adding two additional control functions giving a total of three new functions) and improving one of the existing functions. My initial proposal used a miscellaneous device to route ioctl commands to autofs mounts and the question of why current recommended alternatives were not suitable was asked. The only alternative that may be suitable is the Netlink interface. I won't go into the details of the new functions now but focus on the difficulty I have found implementing one of the existing functions using the Netlink interface. There are some restrictions on the scope of the change. The scope is only the ioctl interface. I don't want to change too many things at once, in particular things that are currently working OK. And I'd like to retain the existing semantic behavior of the interface. The function that is a problem is the sending of expire requests. In the current implementation this function is synchronous. An ioctl is used to ask the kernel module (autofs4) to check for mounts that can be expired and, if a candidate is found the module sends a request to the user space daemon asking it to try and umount the selected mount after which the daemon sends a success or fail status back to the module which marks the complet...
Netlink is a messaging protocol, synchronization is up to the user. I suggest that you send a netlink notification to a multicast group for every expire candidate when an expire request is received. Unmount daemons simply subscribe to the group and wait for work to do. Put the request onto a list including the netlink pid and sequence number so you can address the original source of the request later on. Exit the netlink receive function and wait for the userspace daemon to get back to you. The userspace daemon notifies you of successful or unsuccessful unmount attempts by sending notifications. Update your list entry accordingly and once the request is fulfilled send a notification to the original source of the request by using the stored pid and sequence number. The userspace application requesting the expire can simply block on the receipt of this notification in order to make the whole operation synchronous. Sounds acceptable? --
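The bookkeeping suggested above could be sketched roughly as follows. This is a user-space model only, with no netlink calls and no locking; struct pending_expire, pending_add(), pending_complete() and the token field are all hypothetical names introduced for illustration, not anything from the autofs4 patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One outstanding expire request; all names here are hypothetical. */
struct pending_expire {
	uint32_t nl_pid;	/* netlink pid of the original requester */
	uint32_t nl_seq;	/* sequence number of the original request */
	uint32_t token;		/* identifies the umount handed to the daemon */
	struct pending_expire *next;
};

struct pending_list {
	struct pending_expire *head;
};

/* Record a request so the final reply can be addressed later. */
static int pending_add(struct pending_list *l, uint32_t pid,
		       uint32_t seq, uint32_t token)
{
	struct pending_expire *p;

	assert(l != NULL);
	p = malloc(sizeof(*p));
	if (!p)
		return -1;
	p->nl_pid = pid;
	p->nl_seq = seq;
	p->token = token;
	p->next = l->head;
	l->head = p;
	return 0;
}

/*
 * Called when the daemon reports a umount status for a token.
 * Unlinks the matching entry and returns the stored reply address.
 * Returns 0 if the token was found, -1 otherwise.
 */
static int pending_complete(struct pending_list *l, uint32_t token,
			    uint32_t *pid, uint32_t *seq)
{
	struct pending_expire **pp;

	assert(l != NULL && pid != NULL && seq != NULL);
	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->token == token) {
			struct pending_expire *p = *pp;

			*pid = p->nl_pid;
			*seq = p->nl_seq;
			*pp = p->next;
			free(p);
			return 0;
		}
	}
	return -1;
}
```

A real implementation would need to protect the list against concurrent access and to reap entries whose requester has gone away, which is the orphaning concern raised elsewhere in this thread.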
Yes, I realize that, but what I'm curious about are the options that I have within the messaging system to control delivery of message replies, other than using separate sockets. Can this be achieved by using the pid I'll have to think about what you've said here to relate it to the situation I have. I don't actually have umount daemons, at the moment I request an expire and the daemon creates a thread to do the umount and sends a status message to the kernel module. But that may not matter, Actually, I've progressed on this since posting. I've implemented the first steps toward using a workqueue to perform the expire and, to my surprise, my code worked for a simple test case. Basically, a thread in the daemon issues the expire, the kernel module queues the work and replies. The expire workqueue task does the expire check and if no candidates are found it sends an expire complete notification, or it sends a umount request to the daemon and waits for the status result, then returns that result as the expire complete notification. Seems to work quite well. I expect this is possibly the method you're suggesting above anyway. Unfortunately, having now stepped up intensity of the testing, I'm getting a hard hang on my system. I've setup to reduce the message functions used to only two simple notification messages to the kernel module to ensure it isn't the expire implementation causing the problem. It's hard to see where I could have messed these two functions as they are essentially re-entrant but there is fairly heavy mount activity of about 10-15 mounts a second. Such is life! Any ideas on what might be causing this? Ian --
On Thu, 13 Mar 2008, Ian Kent wrote: Thomas, could you comment on the Netlink related questions I have posed --
Hi Andrew,
Patch to track the uid and gid of the last process to request
a mount on an autofs dentry.
Signed-off-by: Ian Kent < raven@themaw.net>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900
@@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str
ino->flags = 0;
ino->mode = mode;
+ ino->uid = 0;
+ ino->gid = 0;
ino->inode = NULL;
ino->dentry = NULL;
ino->size = 0;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
--- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900
@@ -58,6 +58,9 @@ struct autofs_info {
unsigned long last_used;
atomic_t count;
+ uid_t uid;
+ gid_t gid;
+
mode_t mode;
size_t size;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
@@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
status = wq->status;
+ /*
+ * For direct and offset mounts we need to track the requestor
+ * uid and gid in the dentry info struct. This is so it can be
+ * supplied, on request, by the misc device ioctl interface.
+ * This is needed during daemon restart when reconnecting
+ * to existing, active, autofs mounts. The uid and gid (and
+ * related string values) may be used for macro substitution
+ * in autofs mount maps.
+ */
+ if (!status) {
+ struct dentry *de = NULL;
+
+ /* direct mount or brow...
But uids and gids are no longer system-wide-unique. Two different users can have the same identifiers in different namespaces. What happens then? --
That's a tricky question. Presumably, the process requesting the mount has the user space daemon running in the namespace within which the uid and gid are to be looked up, by the daemon. Am I missing something? Ian --
err, you assume more knowledge at this end about what you're trying to do than actually exists :) You seem to imply that if a machine is running 100 user namespaces then it needs to run 100 mount daemons. Doesn't seem good. What problem are you actually trying to solve here? --
More likely my lack of understanding of how namespaces are meant to The basic problem arises only when we want to restart the user space daemon and there are active autofs managed mounts in place at exit (ie. autofs mounts that have busy user mounts). They are left mounted and processes using them continue to function. But then, when we startup autofs we need to reconnect to these autofs mounts, some of which can be covered by mounted file systems, and hence the need for another way to open an ioctl descriptor to them. It may have been overkill to re-implement all the current ioctls (and add a couple of other much needed ones) but I thought it sensible for completeness, and we get to identify any possible problems the current ioctls might have had due to the use of the BKL (by the VFS when calling the ioctls). So, why do we need the uid and gid? When someone walks over an autofs dentry that is meant to cause a mount we send a request packet to the daemon via a pipe which includes the process uid and gid, and as part of the lookup we set macros for several mount map substitution variables, derived from the uid and gid of the process requesting the mount and they can be used within autofs maps. This is all fine as long as we don't need to re-connect to these mounts when starting up, since we don't get kernel requests for the mounts, we need to obtain that information from the active mount itself. Ian --
Why do we need the uid then? Is just pid not enough to uniquely identify a task? Assuming we can get by with a pid only, this problem can be solved by sending a pid_nr() of a task, i.e. the pid by which this task is seen from an initial namespace. This pid is unique across the system even when pid namespaces are created. But this ... trick is only valid if the daemon, that receives the pid doesn't try to communicate with this task (e.g. send him a signal), but just uses this as a key to lookup in some hash. This is not about security - even having someone's global pid task can do nothing useful with it - this is about the consistency - when sending a signal to a task, giving its _global_ pid to sys_kill() the signal may arrive to a --
Pavel it is never correct to use a global pid when talking to user space. In fact the concept is just a bit dubious. We must always translate the pid into the pid namespace of the task we are talking to, or at least into the pid namespace of the process that opened the file handle, (essentially the same, but does not have races in the corner cases). Even in the kernel using global ids is dubious. When dealing with user space it is just wrong. Speaking of. I think we still need work on autofs in this regard. I know last I looked we had some outstanding issues there. Eric --
The problem is that the userspace daemon is restarted. ie: it exits and is re-run. It then needs to pick up various state from its previous run. --
Dumb old me, I really only need the uid. The gid can come from the get user info functions of glibc. DOH! --
In case the process was executed from a setgid executable, you might have a different gid from what the user has. In fact, I don't know why you may need more than the pid, since with the pid you can get to the task's effective uid/gid and maybe other such information you need. Cheers, Fábio Olivé -- ex sed lex awk yacc, e pluribus unix, amem --
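The distinction drawn here can be made concrete with a small sketch. It assumes the "get user info functions of glibc" means the passwd database lookup (getpwuid_r() is the standard POSIX call); primary_gid_of() is a hypothetical helper name. What it returns is the user's primary gid as recorded in the passwd database, which need not match the effective gid of a process started from a setgid executable.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Look up the primary gid recorded in the passwd database for uid.
 * Returns 0 on success, -1 if there is no entry or the lookup fails.
 * Note: this is the user's login group, not necessarily the effective
 * gid of any particular process owned by that user.
 */
static int primary_gid_of(uid_t uid, gid_t *gid)
{
	struct passwd pw;
	struct passwd *res = NULL;
	char buf[4096];

	assert(gid != NULL);
	if (getpwuid_r(uid, &pw, buf, sizeof(buf), &res) != 0 || res == NULL)
		return -1;
	*gid = res->pw_gid;
	return 0;
}
```

Whether the passwd gid or the requesting process's gid is the right value for map substitution is exactly the open question in this part of the thread; the sketch only shows what the lookup gives you.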
So we want to store persistent state in the kernel across userspace process invocations. That's normally done with a thing called a "file" ;) Could we stick all the necessary state into files in a pseudo-fs and have the daemon I don't understand that bit. But then, I don't have a clue how autofs It isn't a good idea to wait for races to reveal themselves. It will take years, especially with a system which has as low a call frequency as autofs mounting. And once a bug _does_ reveal itself, by then we'll have tens of yeah, could be a problem. Hopefully the namespace people can advise. Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has --
Nearly sent off a reply without thinking about this and it sounds like a good idea initially. But, if we have a large number of autofs file systems mounted (thousands?), duplicating the information already present in the autofs file system seems untidy and unnecessary. I thought of exposing this information in sysfs, but that would make the autofs module part of sysfs have many files, and there is the problem that the same path could have more than one autofs file system stacked on top of it, so isn't unique. The other obvious place is in the mount options but that is one of the reasons that only the device number is exposed there. If we have 5000 - 10000 mounts then scanning /proc/mounts becomes a big problem from a CPU usage perspective (and is already a big problem for me, which is partly addressed by the new bits in the implementation here). If I could think of another way to expose the device number as well I would use it, even Not sure I agree about the low call frequency. It's quite normal for smaller sites to have hundreds of entries in maps and equally common for them to do stupid things like run scripts that scan the file systems, mounting every mount. It's not quite the same order of magnitude, but I regularly use the autofs connectathon suite for testing. It only ends up mounting several hundred mounts but I can get mounting and expiring happening at the same Right, I'll see what I can find on those topics. My concern is that this will require considerable work in the daemon which would be fine for version 5.1 but I need to resolve this problem --
I think there is some confusion surrounding what the UID and GID are used for in this context. I'll try to explain it as best I can. When the automount daemon parses a map entry, it will do some amount of variable substitution. So, let's say you're running on an i386 box, and you want to mount a library directory from a server. You might have a map entry that looks like this: lib server:/export/$ARCH/lib In this case, the automount daemon will replace $ARCH with i386, and will try the following mount command: mount -t nfs server:/export/i386/lib /automountdir/lib There are cases where it would be helpful to use the requesting process's UID in such a variable substitution. Consider the case of a CIFS share, where the automount daemon runs as user root, but we want to mount the share using the credentials of the requesting user. In this case, the UID and GID can be helpful in formatting the mount options for mounting the share. So, the UID and GID are used only for map substitutions. Now, having said all of that, I'll have to look more closely at why we even need to keep track of it, given that it only needs to be used when performing the lookup, and at that time we have information on the requesting UID and GID. Cheers, Jeff --
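The substitution Jeff describes can be sketched in a few lines of C. This is not the automount daemon's parser: expand_macros() and struct macro are hypothetical names, the macro table is supplied by the caller, and only the plain $NAME form is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One substitution variable; names here are illustrative only. */
struct macro {
	const char *name;
	const char *value;
};

/*
 * Expand $NAME references in src using the given table. Unknown names
 * are copied through unchanged. Returns 0 on success, -1 if dst is too
 * small to hold the expanded string.
 */
static int expand_macros(const char *src, const struct macro *tab,
			 size_t ntab, char *dst, size_t size)
{
	size_t out = 0;

	assert(src != NULL && dst != NULL && size > 0);
	while (*src) {
		const char *val = NULL;
		size_t used = 1;	/* characters consumed from src */

		if (*src == '$') {
			size_t n = 0, i;

			/* Measure the candidate macro name after the '$'. */
			while (isalnum((unsigned char)src[1 + n]) ||
			       src[1 + n] == '_')
				n++;
			for (i = 0; i < ntab && n > 0; i++) {
				if (strlen(tab[i].name) == n &&
				    strncmp(src + 1, tab[i].name, n) == 0) {
					val = tab[i].value;
					used = 1 + n;
					break;
				}
			}
		}
		if (val) {
			size_t vlen = strlen(val);

			if (out + vlen >= size)
				return -1;
			memcpy(dst + out, val, vlen);
			out += vlen;
		} else {
			if (out + 1 >= size)
				return -1;
			dst[out++] = *src;
		}
		src += used;
	}
	dst[out] = '\0';
	return 0;
}
```

Called with a table containing ARCH = i386, the entry server:/export/$ARCH/lib from the example above expands to server:/export/i386/lib. Adding a UID entry to the table is all it takes to support the CIFS case, which is why the requesting uid has to be available when the map entry is expanded.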
Thanks Jeff. If that's the case then user namespaces don't affect this at all. (Still trying to follow the rest of the thread because I definitely feel like I'm missing something. I swear I understood autofs 10+ years ago :) thanks, -serge --
Yep, that's precisely the way this is used, by autofs anyway.
I guess the problem we face is that since this is a public interface
other applications could use this in a different way and we can't
control that. I think I need more information so I can document the
defined usage in my revised patch set.
In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to
the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and
$OSVERS, and then expand the map entry.
The question that Jeff is asking himself is, why do we need this
information when we re-connect at startup, since the mounts are already
present.
The answer isn't easy to explain and will be lengthy, sorry, but let me
try anyway.
There are two types of maps, direct (in the module source you will see a
third type called an offset, which is just a direct mount in disguise)
and indirect.
For example, here is master map with direct and indirect map entries:
/- /etc/auto.direct
/test /etc/auto.indirect
/etc/auto.direct:
/automount/dparse/g6 budgie:/autofs/export1
/automount/dparse/g1 shark:/autofs/export1
and so on.
/etc/auto.indirect:
g1 shark:/autofs/export1
g6 budgie:/autofs/export1
and so on.
For the above indirect map an autofs file system is mounted on /test and
mounts are triggered by the inode lookup operation. So we see a mount of
shark:/autofs/export1 on /test/g1, for example.
The way that direct mounts are handled is by making an autofs mount on
each full path, such as /automount/dparse/g1, and using it as a mount
trigger. So when we walk on the path we mount shark:/autofs/export1 on
top of this mount point, for example. Since these are always
directories we can use the follow_link inode operation to trigger the
mount.
But, each entry in direct and indirect maps can have offsets (often
called multi-mount map entries).
For example,
a direct mount map entry could also be:
/automount/dparse/g1 \
/ shark:/autofs/export5/testing/test \
...
The way the user namespaces work right now is similar to say the IPC namespace - a task belongs to one user, that user belongs to precisely one user namespace. Even in my additional userns patches, I was changing uid to store the (uid, userns) so a struct user still belonged to just one user namespace. In contrast, with pid namespaces a task is associated with a 'struct pid' which links it to multiple process ids, one in each pid namespace to which it belongs. Perhaps we should be treating user namespaces like pid namespaces? For autofs this would mean that when autofs wants a uid for some task, it would be given the uid in the user namespace which autofs 'knows'. It would also help me fix the siginfo problems I haven't solved yet - rather than having to worry about user namespace lifetimes with siginfos (which last a little while but have no clearly defined lifespan) we could send the uid in an init user namespace or the uid in the target uid namespace, or just a lightweight user struct proxy akin to 'struct pid'. And it also obviates the need for any sort of delegation. So if I'm user 500 in what I think is the initial user namespace, I can create a container with a new user namespace, the init task of which is both uid 0 in the child userns, and uid 500 in the higher level, automatically giving the container access to any files I own. Eric, when you get a chance (I know you're overloaded atm) I'd love to --
Succinctly. I think the concept of mapping uids between user namespaces is fundamental to describing and thinking about the semantics of user namespaces correctly. We don't have to start out with anything except handling the case when no mapping exists, but asking the question of how this uid maps from one namespace to another is fundamental. Eric --
Earlier I had thought this could just be done using a special keyring, but atm I'm thinking that would be far uglier than just having a struct pid-like credential proxy in the kernel to pass around in place True. But in any case I'm happy letting other things like netns and related sys be completed before prototyping this. thanks, -serge --
I have not looked at many of the implementation possibilities so unfortunately I don't know what makes for a good implementation. What I do know is that uids are serialized in filesystems, and their mapping between namespaces is defined by system administrators. Both of those properties are different from struct pid. Which means a generalized struct user in the kernel can at best hold a cache of the mappings. My preliminary investigations suggested that for the kernel filesystem boundary generating a struct user or a struct group just to use for a permission check and then to throw it away was wasteful. However for inkernel entities a struct user sounds practical. All of which is to say that we can learn lessons from the implementation of struct pid but that we also have different requirements so we can only use those lessons in a limited fashion. Eric --
I definitely think we should have some support like that. We already have code in nfsv4 and p9fs if I remember correctly to map between the user namespace of the server (which is string based) and the uids of the local machine. We also have a similar issue with security keys. I don't know if the strict hierarchical nature we have makes a lot of sense. One of the things that we should account for is that frequently user namespaces are kept in sync between multiple machines by system administrators. So in the real world user namespaces are a Yes. The user namespace of the process that opened the pipe I believe Yes. For the pids we have been looking at sending the pid in the target namespace and sending the uid in the target namespace should be Right. Long term we want to look at making this an unprivileged operation. Allowing a user to run less privileged processes. My impression has always been that going from comparing (userns, uid) to a more sophisticated mapping approach was a compatible extension. However if it looks like we need user namespace mapping support up I think you are on the right track. In a lot of ways the user namespace is the trickiest, as this is where we change the security rules. If it is only at the level of who is who. Since we already have user namespace mapping infrastructure in the kernel and ways to call back to user space to ask what the mapping should be, I feel performing mapping in the user namespace and generalizing the existing capability is a good idea. We still want to tell users, if you can get away with it, synchronize your user namespaces across file systems, and kernels, and machines. If they can't, having good general tools in the kernel that you only need to learn once and not once per instance sounds good. Eric --
I'm afraid, that I'm just starting a new thread of discussion in a So do you mean that I can become a root, by calling clone()? Thanks, Pavel --
You can become root in the new container. Your capabilities are meaningful only to targets (users, files) which exist in the user namespace in which you are root. It becomes more precise than the --
As has been spotted, this is obviously wrong.
And here is the correction.
Signed-off-by: Ian Kent <raven@themaw.net>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix 2008-02-26 14:02:05.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-26 14:02:20.000000000 +0900
@@ -385,10 +385,8 @@ int autofs4_wait(struct autofs_sb_info *
/* Set mount requestor */
if (ino) {
- if (ino) {
- ino->uid = wq->uid;
- ino->gid = wq->gid;
- }
+ ino->uid = wq->uid;
+ ino->gid = wq->gid;
}
if (de)
--
Hi Andrew,
Patch to catch invalid dentry when calculating its path.
Signed-off-by: Ian Kent <raven@themaw.net>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry 2008-02-20 12:55:39.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 12:54:46.000000000 +0900
@@ -171,7 +171,7 @@ static int autofs4_getpath(struct autofs
for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
len += tmp->d_name.len + 1;
- if (--len > NAME_MAX) {
+ if (!len || --len > NAME_MAX) {
spin_unlock(&dcache_lock);
return 0;
}
--
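To show what the extra !len test guards against, here is a user-space model of the length calculation. It is a sketch under assumptions: struct node and path_len() are hypothetical stand-ins for the dentry walk, the length counter is assumed to be a signed int, and there is no locking.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* fallback if <limits.h> does not provide it */
#endif

/* Hypothetical stand-in for a dentry: a name length and a parent link. */
struct node {
	const char *name;
	size_t name_len;
	struct node *parent;
};

/*
 * Compute the length of the path from root down to n, with components
 * joined by '/'. Returns 0 when n is the root itself or when the path
 * is too long, mirroring the patched check.
 */
static int path_len(const struct node *n, const struct node *root)
{
	const struct node *tmp;
	int len = 0;

	assert(n != NULL && root != NULL);
	/* Each component contributes its name plus one separator. */
	for (tmp = n; tmp != root; tmp = tmp->parent)
		len += (int)tmp->name_len + 1;

	/* Without the !len test, --len would yield -1 when n == root. */
	if (!len || --len > NAME_MAX)
		return 0;
	return len;
}
```

When the starting node is the root, the loop body never runs and len stays 0. In this model the old check would then decrement it to -1, which is not greater than NAME_MAX, so the function would carry on with a negative length.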
