Re: [V9fs-developer] [GIT PULL] 9p file system bug fixes for 2.6.35-rc2

Previous thread: 2.6.35-rc2-git1 - BUG: using smp_processor_id() in preemptible [00000000] code: ondemand/4323 by Miles Lane on Monday, June 7, 2010 - 12:02 pm. (2 messages)

Next thread: [git patches] libata fixes by Jeff Garzik on Monday, June 7, 2010 - 1:07 pm. (1 message)
From: Eric Van Hensbergen
Date: Monday, June 7, 2010 - 1:02 pm

Linus,

Please merge the following bug fixes for the 9P file system.

The following changes since commit 386f40c86d6c8d5b717ef20620af1a750d0dacb4:
  Linus Torvalds (1):
        Revert "tty: fix a little bug in scrup, vt.c"

are available in the git repository at:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-linus

Fang Wenqi (1):
      virtio_9p.h needs <linux/types.h>

jvrao (2):
      Add a helper function to get fsgid for a file create.
      9p: Add a wstat after TCREATE to fix the gid.

 fs/9p/vfs_inode.c         |   34 ++++++++++++++++++++++++++++++++++
 include/linux/virtio_9p.h |    1 +
 2 files changed, 35 insertions(+), 0 deletions(-)
--

From: Linus Torvalds
Date: Monday, June 7, 2010 - 5:08 pm

Quite frankly, this looks rather broken.

It uses "dentry->d_parent" without locking (it so happens to likely be ok, 
since we are in "create()" and thus should be holding the parent 
semaphore). On its own, that might be excusable (if people were even 
_aware_ of the this locking rule!), but it does so just to get the inode 
pointer to that parent.

And the only thing that makes it ok to access dentry->d_parent - the fact 
that we are in v9fs_create() - is also the thing that should have made 
people look at the arguments to the function and say "hmm".

We pass in the directory inode pointer as an argument to the create 
function! The code could have used that thing directly, instead of 
mucking around with dentry pointers that it had no business looking at.

I see why it seems to have happened: v9fs does the exact same thing for 
the pre-existing "v9fs_fid_lookup()". So there is history to this 
behavior.

Maybe people weren't aware of the fact that just dereferencing 
dentry->d_parent willy-nilly isn't actually allowed. That field changes. 
Sure, there are cases where it's ok, but this is a dangerous thing to do 
in general. 

In fact, the other thing that I find doing that whole "dentry->d_parent" 
thing seems to literally be broken. If you look at v9fs_fid_lookup(), 
you'll notice how it walks up the d_parent chain, and at that point you do 
NOT own the directory i_mutex, so at that point d_parent really _can_ be 
changing wildly due to concurrent renames or whatever.

So 9pfs seems to have some preexisting bugs in this area. I'm not going to 
pull new bug-prone code. See the other discussions about being tight this 
release about really _only_ taking regressions after the merge window 
closed.

		Linus
--

From: Al Viro
Date: Monday, June 7, 2010 - 5:41 pm

Eh...  It's bogus, all right, but i_mutex is not the correct solution.
You'd have to take it on a lot of inodes along the way to root *and*
you'd violate the ordering in process (ancestors first).

I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem
also won't do, since it'll give you ordering problems of its own (it's
taken before i_mutex in VFS, so trying to take it under i_mutex would not
do).

The _really_ interesting question is how do servers deal with topology-changing
renames.  Note that the problem exists only with extended 9P - with the
original one all of that had been a non-issue, since it didn't allow
cross-directory renames at all and the tree topology remained stable all along.
--

From: Linus Torvalds
Date: Monday, June 7, 2010 - 5:48 pm

Oh, no, I didn't imply it was. But the other sites that I saw doing 
the dentry->d_parent access already _had_ the i_mutex thing, so I was 
pointing out how this one does not (and indeed _cannot_ do that).

So I'm just saying that pretty much _all_ the dentry->d_parent use in 9p 
seems very suspect. The cases where we hold i_mutex (because the caller 
already took it) shouldn't do that whole d_parent dance, because they get 
the directory inode passed into them directly.

And the other places are just buggy.

So from a quick look, the use of d_parent in 9p is simply not a good idea. 

			Linus
--

From: Aneesh Kumar K. V
Date: Wednesday, June 16, 2010 - 9:42 am

Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding
parent directory inode->i_mutex in other places where we refer
dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we
can hold dcache_lock, walk the parent, build the full path name and use
that for TWALK ? 

Another option is we deny a cross directory rename when
doing fid_lookup. That is we can introduce a per superblock v9fs
specific rwlock that get taken (in write mode) in a cross directory
rename and in fid_lookup we take them in read mode ? We will have to set

True the problem exist only with .L extension since .u protocol don't
allow a cross directory renames. In the server we update the path names
attached to a fid during rename. This happen to all fids that have path
component matching the changed name.

-aneesh

--

From: Aneesh Kumar K. V
Date: Thursday, June 24, 2010 - 9:30 am

something like this ?

commit 86c06ad7506e7e05dd4e1e1b8cee28e19703c4f6
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Mon Jun 21 21:50:07 2010 +0530

    fs/9p: Prevent parallel rename when doing fid_lookup
    
    During fid lookup we need to make sure that the dentry->d_parent doesn't
    change so that we can safely walk the parent dentries. To ensure that
    we need to prevent cross directory rename during fid_lookup. Add a
    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 7317b39..14b542a 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -139,14 +139,19 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	fid = v9fs_fid_find(dentry, uid, any);
 	if (fid)
 		return fid;
-
+	/*
+	 * we don't have a matching fid. To do a TWALK we need
+	 * parent fid. We need to prevent rename when we want to
+	 * look at the parent.
+	 */
+	read_lock(&v9ses->rename_lock);
 	ds = dentry->d_parent;
 	fid = v9fs_fid_find(ds, uid, any);
 	if (!fid) { /* walk from the root */
 		n = 0;
 		for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
 			n++;
-
+		read_unlock(&v9ses->rename_lock);
 		fid = v9fs_fid_find(ds, uid, any);
 		if (!fid) { /* the user is not attached to the fs yet */
 			if (access == V9FS_ACCESS_SINGLE)
@@ -165,9 +170,11 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 
 			v9fs_fid_add(ds, fid);
 		}
-	} else /* walk from the parent */
+	} else  {
+		/* walk from the parent */
 		n = 1;
-
+		read_unlock(&v9ses->rename_lock);
+	}
 	if (ds == dentry)
 		return fid;
 
@@ -175,8 +182,10 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
 	if (!wnames)
 		return ERR_PTR(-ENOMEM);
 
+	read_lock(&v9ses->rename_lock);
 	for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
 		wnames[i] = (char *) d->d_name.name;
+	read_unlock(&v9ses->rename_lock);
 
 	clone = ...
From: Eric Van Hensbergen
Date: Tuesday, June 29, 2010 - 1:18 pm

Linus, Al,

Does this approach satisfy your concerns?  We've been going over
several different options on how to proceed, but this seems to be the
best option.  The v9fs_fid_lookup traversal of the d_parent path is an
exception case that hardly ever gets hit in practice so the extra
locking shouldn't be a performance problem, but we haven't been able
to figure out a way to resolve it in another fashion for existing
use-case scenarios.

     Thanks for any guidance,

         -eric


On Thu, Jun 24, 2010 at 11:30 AM, Aneesh Kumar K. V
--

From: Linus Torvalds
Date: Tuesday, June 29, 2010 - 1:38 pm

Using a p9fs rename lock does seem to be a reasonable option.

That said, the patch itself seems to not be valid. You drop the lock
too early in v9fs_fid_lookup() as far as I can tell. You then re-take
it before doing that whole

   for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)

loop with it held again, but that's totally bogus - because you
dropped the lock, your 'n-1' count has absolutely no meaning any more,
since a cross-directory rename might have changed the depth of the
thing in the meantime.

And if the depth changes, you aren't at all guaranteed to stay on the
same p9fs filesystem, so now you're doing that d_parent access without
the proper locking (sure: you hold the rename lock, but it's not at
all guaranteed that the rename lock is the _right_ lock any more as
you traverse the list down!)

But I didn't look deeply at the patch. There might be some reason why
it's safe (I doubt it, though), and there might be other places where
you do the same. But in general, dropping and re-taking a lock is a
bad idea. If you dropped the lock, you can't depend on anything you
found out while having held it.

              Linus
--

From: Aneesh Kumar K. V
Date: Wednesday, June 30, 2010 - 4:52 am

You are correct. we cannot drop the rename lock in between. I also found
another issue in that we are using dentry->d_name.name directly. That
would imply we need to hold the rename_lock even during the
client_walk.  How about the patch below ?. I updated the patch to hold
rename_lock during multiple path walk. Also the rename path is updated
to hold the lock during p9_client_rename operations.

commit 79f6f20dbb70ad35db37b674957c95de20662a75
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Jun 30 15:37:58 2010 +0530

    fs/9p: Prevent parallel rename when doing fid_lookup
    
    During fid lookup we need to make sure that the dentry->d_parent doesn't
    change so that we can safely walk the parent dentries. To ensure that
    we need to prevent cross directory rename during fid_lookup. Add a
    per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7b387fe 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,34 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
 	return ret;
 }
 
+/*
+ * We need to hold v9ses->rename_lock as long as we hold references
+ * to returned path array. Array element contain pointers to
+ * dentry names.
+ */
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+				  struct dentry *dentry, char ***names)
+{
+	int n = 0, i;
+	char **wnames;
+	struct dentry *ds;
+
+	for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+		n++;
+
+	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+	if (!wnames)
+		goto err_out;
+
+	for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent)
+		wnames[i] = (char  *)ds->d_name.name;
+
+	*names = wnames;
+	return n;
+err_out:
+	return -ENOMEM;
+}
+
 /**
  * v9fs_fid_lookup - lookup for a fid, try to walk if not found
  * @dentry: dentry to look for fid in
@@ -112,7 +140,7 ...
From: Linus Torvalds
Date: Wednesday, June 30, 2010 - 9:48 am

On Wed, Jun 30, 2010 at 4:52 AM, Aneesh Kumar K. V


I'm not finding any obvious problems with this, and you're right that
you also need to hold the rename write-lock even for regular renames
(not just cross-directory ones) in order to protect the name itself.
So those two patches together seem to be ok at a quick glance.

That said, I do wonder if you wouldn't be better off copying the name
components in order to then drop the lock earlier. That way you'd only
need to hold the lock while walking the dentries, and could possibly
release it during the actual walk (unless you need the names to be
stable, but I don't think you can rely on that anyway, since other
clients might be doing renames concurrently.. I don't know)

                       Linus
--

From: Aneesh Kumar K. V
Date: Wednesday, June 30, 2010 - 11:58 am

Updated patch

From 0bae321401e88d495ea7e9e96272c11c1b9b9ec3 Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed, 30 Jun 2010 19:18:50 +0530
Subject: [PATCH] fs/9p: Prevent parallel rename when doing fid_lookup

During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_lock rwlock to prevent parallel fid lookup and rename.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/9p/fid.c       |  134 ++++++++++++++++++++++++++++++++++++++---------------
 fs/9p/v9fs.c      |    1 +
 fs/9p/v9fs.h      |    1 +
 fs/9p/vfs_inode.c |   14 ++++--
 fs/9p/vfs_super.c |    1 +
 5 files changed, 110 insertions(+), 41 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7dcefaa 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,48 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
 	return ret;
 }
 
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+				  struct dentry *dentry, char ***names)
+{
+	int n = 0, i;
+	char **wnames;
+	struct dentry *ds;
+
+	read_lock(&v9ses->rename_lock);
+	for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+		n++;
+
+	wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+	if (!wnames) {
+		n = 0;
+		goto err_out;
+	}
+	for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent) {
+		wnames[i] = kmalloc(strlen(ds->d_name.name) + 1, GFP_KERNEL);
+		if (!wnames[i])
+			goto err_out;
+
+		strcpy(wnames[i], ds->d_name.name);
+	}
+	*names = wnames;
+	read_unlock(&v9ses->rename_lock);
+	return n;
+err_out:
+	read_unlock(&v9ses->rename_lock);
+	for (; i < n; i++)
+		kfree(wnames[i]);
+	kfree(wnames);
+	return -ENOMEM;
+}
+
+static void kfree_wnames(char **wnames, int n)
+{
+	int i;
+	for (i = 0; i < n; ...
From: Latchesar Ionkov
Date: Wednesday, June 30, 2010 - 11:16 am

I think that you need to use the s_vfs_rename_mutex in the super_block
struct instead of introducing a new rename_lock in the v9fs session.

Thanks,
    Lucho

On Wed, Jun 30, 2010 at 5:52 AM, Aneesh Kumar K. V
--

From: Linus Torvalds
Date: Wednesday, June 30, 2010 - 11:31 am

I actually think it's better to avoid having filesystems muck around
with VFS locking details. Also, I think we get the VFS rename mutex
only for cross-directory renames, and as mentioned, 9p needs locking
even for regular directory renames.

(Also, this way you can have parallel readers - although we could
obviously change the vfs rename mutex into a rw-sem too).

So I do think that keeping the logic private to a 9p-specific lock is
the right solution here.

                 Linus
--

From: Aneesh Kumar K. V
Date: Wednesday, June 30, 2010 - 11:33 am

I guess that will have lock dependency issue with inode->i_mutex. Also as Linus
suggested with dentry name copied we can avoid holding the lock when
doing 9P operation.
 
-aneesh
--

From: Aneesh Kumar K. V
Date: Wednesday, June 30, 2010 - 5:55 am

To make sure d_name.name remain correct we need something like below ?

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index eae89ad..9f9f804 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1050,7 +1050,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct p9_fid *olddirfid;
 	struct p9_fid *newdirfid;
 	struct p9_wstat wstat;
-	int retval, cross_dir_rename = 0;
+	int retval;
 
 	P9_DPRINTK(P9_DEBUG_VFS, "\n");
 	retval = 0;
@@ -1071,17 +1071,15 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		retval = PTR_ERR(newdirfid);
 		goto clunk_olddir;
 	}
-	cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
-	if (cross_dir_rename)
-		write_lock(&v9ses->rename_lock);
 
+	write_lock(&v9ses->rename_lock);
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_rename(oldfid, newdirfid,
 					(char *) new_dentry->d_name.name);
 		if (retval != -ENOSYS)
 			goto clunk_newdir;
 	}
-	if (cross_dir_rename) {
+	if (old_dentry->d_parent != new_dentry->d_parent) {
 		/*
 		 * 9P .u can only handle file rename in the same directory
 		 */
@@ -1100,8 +1098,7 @@ clunk_newdir:
 	if (!retval)
 		/* successful rename */
 		d_move(old_dentry, new_dentry);
-	if (cross_dir_rename)
-		write_unlock(&v9ses->rename_lock);
+	write_unlock(&v9ses->rename_lock);
 	p9_client_clunk(newdirfid);
 
 clunk_olddir:
--

From: Venkateswararao Jujjuri (JV)
Date: Tuesday, June 8, 2010 - 7:29 am

Silly me. I sent out another patch using the dir inode passed through arguments.
But we still need to analyze the use of dentry->d_parent in other parts of code.. 



--

Previous thread: 2.6.35-rc2-git1 - BUG: using smp_processor_id() in preemptible [00000000] code: ondemand/4323 by Miles Lane on Monday, June 7, 2010 - 12:02 pm. (2 messages)

Next thread: [git patches] libata fixes by Jeff Garzik on Monday, June 7, 2010 - 1:07 pm. (1 message)