Re: [PATCH 1/7] autofs4: Save autofs trigger's vfsmount in super block info

Previous thread: ditt gyllene mynt har blivit krediterat by Hot Ruby Royale on Tuesday, December 22, 2009 - 10:19 am. (1 message)

Next thread: [PATCH 1/2] nfsd: make sure data is on disk before calling ->fsync by Christoph Hellwig on Friday, December 25, 2009 - 9:44 am. (6 messages)
From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:36 pm

Writable overlays/union mounts requires some generic VFS improvements
and changes.  The patches in this set are independent of union
mounts and should merged immediately.

Jan Blunck (6):
  autofs4: Save autofs trigger's vfsmount in super block info
  VFS: Make lookup_hash() return a struct path
  VFS: Make real_lookup() return a struct path
  VFS: Propagate mnt_flags into do_loopback
  VFS: BUG_ON() rehash of an already hashed dentry
  VFS: Remove unnecessary micro-optimization in cached_lookup()

Valerie Aurora (1):
  VFS: Add read-only users count to superblock

 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 +++-
 fs/autofs4/root.c     |    6 ++
 fs/dcache.c           |    1 +
 fs/namei.c            |  217 ++++++++++++++++++++++++++-----------------------
 fs/namespace.c        |    7 +-
 fs/super.c            |   18 ++++-
 include/linux/fs.h    |    5 +
 8 files changed, 158 insertions(+), 108 deletions(-)

--

From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:36 pm

From: Jan Blunck <jblunck@suse.de>

This is a bugfix/replacement for commit
051d381259eb57d6074d02a6ba6e90e744f1a29f:

    During a path walk if an autofs trigger is mounted on a dentry,
    when the follow_link method is called, the nameidata struct
    contains the vfsmount and mountpoint dentry of the parent mount
    while the dentry that is passed in is the root of the autofs
    trigger mount.  I believe it is impossible to get the vfsmount of
    the trigger mount, within the follow_link method, when only the
    parent vfsmount and the root dentry of the trigger mount are
    known.

The solution in this commit was to replace the path embedded in the
parent's nameidata with the path of the link itself in
__do_follow_link().  This is a relatively harmless misuse of the
field, but union mounts ran into a bug during follow_link() caused by
the nameidata containing the wrong path (we count on it being what it
is all other places - the path of the parent).

A cleaner and easier to understand solution is to save the necessary
vfsmount in the autofs superblock info when it is mounted.  Then we
can easily update the vfsmount in autofs4_follow_link().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/autofs4/autofs_i.h |    1 +
 fs/autofs4/init.c     |   11 ++++++++++-
 fs/autofs4/root.c     |    6 ++++++
 fs/namei.c            |    7 ++-----
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 8f7cdde..db2bfce 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -130,6 +130,7 @@ struct autofs_sb_info {
 	int reghost_enabled;
 	int needs_reghost;
 	struct super_block *sb;
+	struct vfsmount *mnt;
 	struct mutex wq_mutex;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
diff --git ...
From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:36 pm

From: Jan Blunck <jblunck@suse.de>

This patch changes lookup_hash() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |  115 ++++++++++++++++++++++++++++++-----------------------------
 1 files changed, 58 insertions(+), 57 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c768444..0ddfefb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1169,7 +1169,7 @@ static int path_lookup_open(int dfd, const char *name,
 }
 
 static struct dentry *__lookup_hash(struct qstr *name,
-		struct dentry *base, struct nameidata *nd)
+				    struct dentry *base, struct nameidata *nd)
 {
 	struct dentry *dentry;
 	struct inode *inode;
@@ -1216,14 +1216,22 @@ out:
  * needs parent already locked. Doesn't follow mounts.
  * SMP-safe.
  */
-static struct dentry *lookup_hash(struct nameidata *nd)
+static int lookup_hash(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
 	int err;
 
 	err = inode_permission(nd->path.dentry->d_inode, MAY_EXEC);
 	if (err)
-		return ERR_PTR(err);
-	return __lookup_hash(&nd->last, nd->path.dentry, nd);
+		return err;
+	path->mnt = nd->path.mnt;
+	path->dentry =  __lookup_hash(name, nd->path.dentry, nd);
+	if (IS_ERR(path->dentry)) {
+		err = PTR_ERR(path->dentry);
+		path->dentry = NULL;
+		path->mnt = NULL;
+	}
+	return err;
 }
 
 static int __lookup_one_len(const char *name, struct qstr *this,
@@ -1735,12 +1743,10 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (flag & O_EXCL)
 		nd.flags |= LOOKUP_EXCL;
 	mutex_lock(&dir->d_inode->i_mutex);
-	path.dentry = lookup_hash(&nd);
-	path.mnt = nd.path.mnt;
+	error = lookup_hash(&nd, &nd.last, &path);
 
 do_last:
-	error = PTR_ERR(path.dentry);
-	if (IS_ERR(path.dentry)) {
+	if (error) {
 		mutex_unlock(&dir->d_inode->i_mutex);
 		goto exit;
 	}
@@ -1901,8 +1907,7 @@ do_link:
 	}
 	dir = nd.path.dentry;
 ...
From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:36 pm

From: Jan Blunck <jblunck@suse.de>

This patch changes real_lookup() into returning a struct path.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |   82 +++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0ddfefb..3f645ce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -473,10 +473,11 @@ ok:
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static int real_lookup(struct nameidata *nd, struct qstr *name,
+		       struct path *path)
 {
-	struct dentry * result;
-	struct inode *dir = parent->d_inode;
+	struct inode *dir = nd->path.dentry->d_inode;
+	int res = 0;
 
 	mutex_lock(&dir->i_mutex);
 	/*
@@ -493,27 +494,36 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 *
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
-	result = d_lookup(parent, name);
-	if (!result) {
+	path->dentry = d_lookup(nd->path.dentry, name);
+	path->mnt = nd->path.mnt;
+	if (!path->dentry) {
 		struct dentry *dentry;
 
 		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
+		if (IS_DEADDIR(dir)) {
+			res = -ENOENT;
 			goto out_unlock;
+		}
 
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
+		dentry = d_alloc(nd->path.dentry, name);
 		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
-			if (result)
+			path->dentry = dir->i_op->lookup(dir, dentry, nd);
+			if (path->dentry) {
 				dput(dentry);
-			else
-				result = dentry;
+				if (IS_ERR(path->dentry)) {
+					res = PTR_ERR(path->dentry);
+					path->dentry = NULL;
+					path->mnt = NULL;
+				}
+			} ...
From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:37 pm

From: Jan Blunck <jblunck@suse.de>

The mnt_flags are propagated into do_loopback(), so that they can be checked
when mounting something loopback into a union.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bdc3cb4..2244801 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1446,8 +1446,8 @@ static int do_change_type(struct path *path, int flag)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, char *old_name,
-				int recurse)
+static int do_loopback(struct path *path, char *old_name, int recurse,
+		       int mnt_flags)
 {
 	struct path old_path;
 	struct vfsmount *mnt = NULL;
@@ -1959,7 +1959,8 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&path, dev_name, flags & MS_REC);
+		retval = do_loopback(&path, dev_name, flags & MS_REC,
+				     mnt_flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
 	else if (flags & MS_MOVE)
-- 
1.5.6.5

--

From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:37 pm

While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only.  The file system can be
remounted read-write at any time; it's also conceivable that a file
system can be mounted a second time and converted to read-write if the
underlying fs allows it.  This is a problem for union mounts, which
require the underlying file system be read-only.  Add a read-only
users count and don't allow remounts to change the file system to
read-write or read-write mounts if there are any read-only users.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/super.c         |   18 ++++++++++++++++--
 include/linux/fs.h |    5 +++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 19eb70b..8a12bab 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -583,8 +583,10 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	shrink_dcache_sb(sb);
 	sync_filesystem(sb);
 
-	/* If we are remounting RDONLY and current sb is read/write,
-	   make sure there are no rw files opened */
+	/*
+	 * If we are remounting RDONLY and current sb is read/write,
+	 * make sure there are no rw files opened.
+	 */
 	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
 		if (force)
 			mark_files_ro(sb);
@@ -596,6 +598,14 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 	}
 	remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY);
 
+	/*
+	 * If we are remounting read/write, deny write access if the
+	 * file system is being used by anything that requires it to
+	 * stay read-only (e.g., union mounts).
+	 */
+	if (remount_rw && sb->s_readonly_users)
+		return -EROFS;
+
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
@@ -953,6 +963,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	WARN((mnt->mnt_sb->s_maxbytes < 0), "%s set ...
From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:37 pm

From: Jan Blunck <jblunck@suse.de>

BUG_ON() rehash of an already hashed dentry.  For debugging of
dcache-related development.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a100fa3..74d4ca9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1551,6 +1551,7 @@ void d_rehash(struct dentry * entry)
 {
 	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
+	BUG_ON(!d_unhashed(entry));
 	_d_rehash(entry);
 	spin_unlock(&entry->d_lock);
 	spin_unlock(&dcache_lock);
-- 
1.5.6.5

--

From: Valerie Aurora
Date: Wednesday, December 23, 2009 - 4:37 pm

From: Jan Blunck <jblunck@suse.de>

d_lookup() takes rename_lock which is a seq_lock.  This is so cheap
it's not worth calling lockless __d_lookup() first from
cache_lookup().  Rename cached_lookup() to cache_lookup() while we're
there.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f645ce..ae200ba 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -417,15 +417,10 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
-static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *cache_lookup(struct dentry *parent, struct qstr *name,
+				   struct nameidata *nd)
 {
-	struct dentry * dentry = __d_lookup(parent, name);
-
-	/* lockess __d_lookup may fail due to concurrent d_move() 
-	 * in some unrelated directory, so try with d_lookup
-	 */
-	if (!dentry)
-		dentry = d_lookup(parent, name);
+	struct dentry *dentry = d_lookup(parent, name);
 
 	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
 		dentry = do_revalidate(dentry, nd);
@@ -1208,7 +1203,7 @@ static struct dentry *__lookup_hash(struct qstr *name,
 			goto out;
 	}
 
-	dentry = cached_lookup(base, name, nd);
+	dentry = cache_lookup(base, name, nd);
 	if (!dentry) {
 		struct dentry *new;
 
-- 
1.5.6.5

--

From: Ian Kent
Date: Friday, January 1, 2010 - 5:44 pm

Acked-by: <raven@themaw.net>

Don't know how I missed such an obvious solution when I did this.
--

From: Al Viro
Date: Wednesday, January 13, 2010 - 10:43 pm

TBH, I don't like either variant (both the in-tree one and that).
The reason why vfsmount does *NOT* belong in superblock, TYVM: you've
messed the lifetime rules.  You can't pin it down, or the damn thing will
be impossible to kill.  OTOH, you have no promise whatsoever that superblock
won't outlive the initial vfsmount.  You might get another vfsmount over
the same thing and once the original one is gone...

So this is simply broken.
--

From: Valerie Aurora
Date: Thursday, January 14, 2010 - 12:18 pm

Ian, you're the expert - any ideas?  What are the constraints here?

-VAL
--

From: Ian Kent
Date: Thursday, January 14, 2010 - 11:05 pm

It looks to me like the struct path coming to __do_follow_link() has
what I need. A potentially big change, but using the struct path instead
of the dentry in the inode op follow_link() call would get me what I
need in my follow_link() call without having to store anything.

Ian
--

From: Al Viro
Date: Friday, January 15, 2010 - 1:03 am

[AFS, CIFS and NFS maintainers added to Cc]


Yuck.  If we go for change of ->follow_link() prototype, we might as
well DTRT and split the damn thing instead of trying to overload an
unsuitable method.

Let's see what's really going on there.  AFAICT, we have several, er,
oddities in the area:
	* abuse of mnt_devname by nfs; do we ever want it to differ
(for nfs automounting purposes) for different vfsmounts over the same
superblock?
	* inconsistent behaviour when something had managed to mount
first.  Some expect the possibility of -EBUSY, some do not.
	* WTF is CIFS doing setting MNT_SHRINKABLE on *parent* vfsmount?
Blindly, not even bothering to cooperate with mnt_make_readonly() and
its ilk.  Most of all, _why_?  It's child that should be marked so, not
the parent...
	* speaking of which, what should happen to other vfsmount flags?
NFS seems to want them all inherited and MNT_SHRINKABLE set, which is
reasonable enough; CIFS sets MNT_SHRINKABLE on parent and inherits everything;
AFS can't be arsed and just sets MNT_SHRINKABLE, all flags on parent be
damned.
	* what's going on with use of intent flags in case of autofs4?

Other than that, it'd seem that we would be better off if we treated these
guys as "if we try to follow mountpoint and that sucker turns out to be
the end of the road, do something fs-specific and either try to go further
or see if we could attach the mountpoint given to us here".  Which might
be better done in VFS, with fs method returning vfsmount, NULL or ERR_PTR.

And yes, it'd mean rework of vfsmount expiry interface; we'd need to put the
vfsmount on expiry list before attaching it anywhere.  Not a big deal,
that...

Comments?  It's obviously not a solution yet and this direction might
very well turn out to be unfeasible, but let's at least see where that
might lead.  There are, after all, 4 users of that stuff.  Should be
enough to figure out what we really need and come up with a sane design...
--

From: David Howells
Date: Friday, January 15, 2010 - 7:55 am

Why's that wrong?  This is AFS's automounter, and it seems quite reasonable to
have the outer edges pruned to make space.

David
--

From: Al Viro
Date: Friday, January 15, 2010 - 9:58 am

Do you want it to inerit e.g. nosuid?
--

From: David Howells
Date: Friday, January 15, 2010 - 10:08 am

Are you just talking about MNT_SHRINKABLE?  Or all the other flags?

Should I be passing:

	nd->path.mnt->mnt_flags | MNT_SHRINKABLE

instead?

David
--

From: Al Viro
Date: Friday, January 15, 2010 - 10:26 am

Maybe, maybe not...  BTW, even that leaves an unpleasant race with
mnt_make_readonly() (CIFS and NFS seem to be suffering from one).
Which flags do we want to be inherited?  Grabbing MNT_WRITE_HOLD,
for example, would obviously be a bad idea...
--

From: Al Viro
Date: Saturday, January 16, 2010 - 3:17 am

Oh, man...  After doing code review re locking rules for mnt_flags
and related stuff, a bunch of races had shown up.  I'll put fixes
into #for-linus tomorrow.

Shit galore:
	* may_umount() ought to take namespace_sem (shared).  Otherwise
we race with clone_mnt() doing add_list() to ->mnt_share/->mnt_slave.
	* attach_recursive_mnt() ought to take vfsmount_lock around
its loop that does set_mnt_shared(); otherwise mount --move can race
with e.g. mount -o remount.
	* do_remount() ought to take vfsmount_lock around the assigment
to mnt_flags *and* take care to leave MNT_SHARED and MNT_UNBINDABLE alone,
especially the former.
	* CIFS shouldn't step on mnt_flags
	* NFS, AFS and CIFS should *not* leave MNT_SHARED and MNT_WRITE_HOLD
in flags passed to do_add_mount(); alternatively, do_add_mount() might
trim those.
	* [unsolved, to be dealt along with per-superblock write counts]
do_remount() plays fast and loose with MNT_READONLY for !MS_BIND case.
	* [*really* unsolved] it remains to be seen whether we want to
propagate modifications of mount flags via shared subtree stuff.  For most
of those it's trivial (and arguably the right thing to do), but ro/rw is
really nasty.  Nick's mnt_want_write() implementation will need very careful
analysis.
	* [#for-next fodder] pnode.c:get_source() cleanup; right now it is
correct, but PITA to prove the correctness.  Incidentally, CL_PROPAGATION
is gone after that one.  Not #for-linus stuff, but I'll be really happier
with it for post-.33
	* [#for-next, but might go into #for-linus as well] explicit
documentation of invariants added to Documentation/filesystems/sharedsubtree.txt
Part of that can be deduced from what's already there, but I'd rather have
it explicit and one crucial bit is simply missing ("if mnt->mnt_master != NULL,
the entire mnt->mnt_share list consists of adjacent elements of mnt->mnt_slaves,
in the same order") and proving correctness without it is Not Fun(tm).  Hell,
I've spent an hour figuring out whether it's broken or not ...
From: Al Viro
Date: Sunday, January 17, 2010 - 10:57 am

Speaking of autofs4, what the hell is going on in
autofs_dev_ioctl_ismountpoint()?  Checks in there make no sense,
both the "could that dentry be negative?" and whatever the hell
it is trying to do with mnt_mountpoint.  Ian?
--

From: Ian Kent
Date: Sunday, January 17, 2010 - 9:21 pm

In that case we may find an autofs mount that has something mounted on
top of it and user space wants to know the super of the covering mount.

If there is something mounted on top of it user space needs to know if
it is another autofs file system or some other type of file system. So
if the nameidata path, located by autofs_dev_ioctl_find_super(), is not
the top (or bottom, depending on the terminology you prefer) then we
need to follow the mount and return the magic of the thing mounted on
top of it.

Ian
--

From: Al Viro
Date: Sunday, January 17, 2010 - 10:59 pm

IDGI.  What you are doing there is
                if (path.mnt->mnt_mountpoint != path.mnt->mnt_root) {
                        if (follow_down(&path))
                                magic = path.mnt->mnt_sb->s_magic;
                }
and I don't think it means what you think it means.  Just what is that
mnt_mountpoint check about?  Before that point we'd found the autofs
vfsmount M that
	1) M is mounted on <name>
	2) M->mnt_sb has the right s_dev
	3) M is the closest one to root in mount tree out of vfsmounts
satisfying (1) and (2)
Now we check that
	4) the mountpoint M is attached to has dentry different from
M->mnt_root.  That's an interesting thing to check, seeing that the
only way to get it false is to have mount --bind name name, with name
not being the mountpoint before that.  And M being the result of
that mount --bind.
	5) something is mounted on top of root of M.

Then we proceed to return the s_magic of that something.  For one thing,
if there *are* several vfsmounts satisfying (1,2), which one do we really
want?  For another, what's the intent of (4)?  It looks very odd; what's
really being checked there?

In another branch we have
                if (path.dentry->d_inode &&
                    path.mnt->mnt_root == path.dentry) {
                        err = 1;
                        magic = path.dentry->d_inode->i_sb->s_magic;
                }
and AFAICT, path.dentry->d_inode == NULL is impossible there.  Besides,
path.mnt->mnt_sb->s_magic would be simpler anyway (and evaluate to
the same thing).
--

From: Ian Kent
Date: Monday, January 18, 2010 - 2:14 am

The code here has changed a little from what I originally posted but,
assuming for now the functionality is equivalent, I can't see what the
point of that check is and now I can't remember if there was some odd
special case. follow_mount() should be sufficient. I'll fix this.

The possibility of more than one vfsmount being present is, as you say
possible, but it is not legal for autofs (and last time I checked I
concluded it wasn't possible for me to veto bind mount requests). Other
than bind mounts I'm struggling to think of a case where I would have

Yes, the dentry should always be positive here but let me think about it
a little more in case I'm missing something. And yes, using the vfsmount
super block pointer would be better. I'll fix these too.

Ian
--

From: Al Viro
Date: Monday, January 18, 2010 - 3:27 am

That's interesting...  Other place where we go through the stack of mounts
is where we look for one with given bits in type; do we have a possibility
of multiple candidates there and which one do we really want?  Same function,
case when we have ioctlfd == -1, !autofs_type_any(type).  Current code
(as well as original) goes for one closest to root; it would certainly be

FWIW, I'd rather have that stuff in vfs tree; that's not the only patch
eliminating mnt_mountpoint use.  AFAICT, none of the uses outside of
core VFS code are legitimate (and BTW, NFSv4 one is contrary to RFC - it
should do follow_down() in loop before reporting mount_fileid instead of
just picking the immediate ancestor for one thing and I'm not at all sure
that check around that thing is correct; it ignores export path.dentry
and looks at path.mnt.root instead, which seems to be wrong).  Other places
would be just as happy with mnt/mnt->mnt_root instead of mnt->mnt_parent/
mnt->mnt_mountpoint or, worse, mnt/mnt->mnt_mountpoint.  Pohmelfs is even
nastier, with its d_path() abuses, but that's a separate story.

IMO we ought to get rid of mnt_mountpoint/mnt_parent uses outside of core
VFS and I'd rather do that in one patch queue.

Back to another question: which syscalls should and which syscalls should not 
trigger automount on the last component?  Note that it's something we'd better
be consistent about between autofs4 and cifs/afs/nfs...
--

From: Trond Myklebust
Date: Monday, January 18, 2010 - 12:35 pm

In addition to the ones that trigger automounts now:

One syscall that we've had a lot of complaints about is 'stat()'. Since
it doesn't follow symlinks, it will fail to trigger the automount, and
will return bogus values for st_dev. This again confuses some versions
of the 'du' utility that check the value of st_dev before and after
entering the subdir.
(see https://bugzilla.redhat.com/show_bug.cgi?id=431166 )

In the cifs/afs/nfs case, statfs() should really trigger automounts too.
Not sure if we want that for the autofs case.

Finally, it makes little sense for cifs/afs/nfs to allow open() on the
underlying directory, even if O_NOFOLLOW is set. Again, autofs might be
an exception due to its reliance on ioctls as a method of communication
with the kernel.

Cheers
  Trond
--

From: Ian Kent
Date: Tuesday, January 19, 2010 - 12:05 am

We shouldn't have multiple candidates for the same reason autofs can't
support bind mounts. The mounts in an autofs tree of mounts are tied to
a map of mounts in user space which means they are tied to a specific
path. And, we won't (or at least shouldn't unless user space uses the
facility incorrectly) have multiple mounts of the same autofs type on
the stack due to the way autofs must use these mounts.

As far as going from the top of the stack goes I don't think it makes
any real difference since the stack is small (2 or possibly 3). The
find_autofs_mount() function does go from the (terminology?) bottom of
the mount stack using follow_up() whereas the original code went from
the top using the path_lookup() to get the parent and then
follow_down(). So there is additional following with the current code
since kern_path() will go down to the bottom of the stack and then we

That's actually a difficult question.

stat(2) is the most problematic call for much the same reasons as
described by Trond.

A (hopefully fairly simple) description of a couple of specific cases
will be useful to clarify the situation.

First, terminology, an "indirect autofs mount" is an autofs fs mount
that implements single path components as mount points within the autofs
fs directory. There are two cases, indirect mounts that are "nobrose"
and ones that "browse"able. The later is just an autofs indirect mount
with pre-created mount point directories so they can be seen by listing
the directory whereas a nobrowse mount corresponds to an autofs mount we
have seen historically where a directory listing of an autofs mount with
no current active automounted mounts returns nothing.

There is no problem with stat(2) for nobrowse autofs indirect mounts
because directories that can potentially trigger mounts simply don't
exist and the user space tools tend to behave as people expect.

But, for browse autofs indirect mounts we get into trouble. For example
a long or colour ls will stat(2) each directory ...
From: Ian Kent
Date: Sunday, January 17, 2010 - 10:08 pm

It's the same old story.
autofs needs to take care not to trigger mounts for certain system calls
when we have reached the last path component. AFAIK the only way to do

Forgive my thickness but am I understanding this correctly?

Are you just saying it would be better to add a file system method to
cater for the case where we might need to do further resolution on the
mount root we have just arrived at before continuing into it (mmm ...
now it feels like I've stated the obvious, but anyway ...)?

Would that be before following the link or after (not that it would
matter for my needs)?

Ian
--

Previous thread: ditt gyllene mynt har blivit krediterat by Hot Ruby Royale on Tuesday, December 22, 2009 - 10:19 am. (1 message)

Next thread: [PATCH 1/2] nfsd: make sure data is on disk before calling ->fsync by Christoph Hellwig on Friday, December 25, 2009 - 9:44 am. (6 messages)