Linus,
Please merge the following bug fixes for the 9P file system.
The following changes since commit 386f40c86d6c8d5b717ef20620af1a750d0dacb4:
Linus Torvalds (1):
Revert "tty: fix a little bug in scrup, vt.c"
are available in the git repository at:
ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-linus
Fang Wenqi (1):
virtio_9p.h needs <linux/types.h>
jvrao (2):
Add a helper function to get fsgid for a file create.
9p: Add a wstat after TCREATE to fix the gid.
fs/9p/vfs_inode.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/virtio_9p.h | 1 +
2 files changed, 35 insertions(+), 0 deletions(-)
--
Quite frankly, this looks rather broken. It uses "dentry->d_parent" without locking (it so happens to likely be ok, since we are in "create()" and thus should be holding the parent semaphore). On its own, that might be excusable (if people were even _aware_ of the this locking rule!), but it does so just to get the inode pointer to that parent. And the only thing that makes it ok to access dentry->d_parent - the fact that we are in v9fs_create() - is also the thing that should have made people look at the arguments to the function and say "hmm". We pass in the directory inode pointer as an argument to the create function! The code could have used that thing directly, instead of mucking around with dentry pointers that it had no business looking at. I see why it seems to have happened: v9fs does the exact same thing for the pre-existing "v9fs_fid_lookup()". So there is history to this behavior. Maybe people weren't aware of the fact that just dereferencing dentry->d_parent willy-nilly isn't actually allowed. That field changes. Sure, there are cases where it's ok, but this is a dangerous thing to do in general. In fact, the other thing that I find doing that whole "dentry->d_parent" thing seems to literally be broken. If you look at v9fs_fid_lookup(), you'll notice how it walks up the d_parent chain, and at that point you do NOT own the directory i_mutex, so at that point d_parent really _can_ be changing wildly due to concurrent renames or whatever. So 9pfs seems to have some preexisting bugs in this area. I'm not going to pull new bug-prone code. See the other discussions about being tight this release about really _only_ taking regressions after the merge window closed. Linus --
Eh... It's bogus, all right, but i_mutex is not the correct solution. You'd have to take it on a lot of inodes along the way to root *and* you'd violate the ordering in process (ancestors first). I'm not sure what's the right thing to do there, actually - s_vfs_rename_sem also won't do, since it'll give you ordering problems of its own (it's taken before i_mutex in VFS, so trying to take it under i_mutex would not do). The _really_ interesting question is how do servers deal with topology-changing renames. Note that the problem exists only with extended 9P - with the original one all of that had been a non-issue, since it didn't allow cross-directory renames at all and the tree topology remained stable all along. --
Oh, no, I didn't imply it was. But the other sites that I saw doing the dentry->d_parent access already _had_ the i_mutex thing, so I was pointing out how this one does not (and indeed _cannot_ do that). So I'm just saying that pretty much _all_ the dentry->d_parent use in 9p seems very suspect. The cases where we hold i_mutex (because the caller already took it) shouldn't do that whole d_parent dance, because they get the directory inode passed into them directly. And the other places are just buggy. So from a quick look, the use of d_parent in 9p is simply not a good idea. Linus --
Can we use dcache_lock in v9fs_fid_lookup ? Since we are holding parent directory inode->i_mutex in other places where we refer dentry->d_parent I guess those access are ok ?. And for v9fs_fid_lookup we can hold dcache_lock, walk the parent, build the full path name and use that for TWALK ? Another option is we deny a cross directory rename when doing fid_lookup. That is we can introduce a per superblock v9fs specific rwlock that get taken (in write mode) in a cross directory rename and in fid_lookup we take them in read mode ? We will have to set True the problem exist only with .L extension since .u protocol don't allow a cross directory renames. In the server we update the path names attached to a fid during rename. This happen to all fids that have path component matching the changed name. -aneesh --
something like this ?
commit 86c06ad7506e7e05dd4e1e1b8cee28e19703c4f6
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Mon Jun 21 21:50:07 2010 +0530
fs/9p: Prevent parallel rename when doing fid_lookup
During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 7317b39..14b542a 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -139,14 +139,19 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
fid = v9fs_fid_find(dentry, uid, any);
if (fid)
return fid;
-
+ /*
+ * we don't have a matching fid. To do a TWALK we need
+ * parent fid. We need to prevent rename when we want to
+ * look at the parent.
+ */
+ read_lock(&v9ses->rename_lock);
ds = dentry->d_parent;
fid = v9fs_fid_find(ds, uid, any);
if (!fid) { /* walk from the root */
n = 0;
for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
n++;
-
+ read_unlock(&v9ses->rename_lock);
fid = v9fs_fid_find(ds, uid, any);
if (!fid) { /* the user is not attached to the fs yet */
if (access == V9FS_ACCESS_SINGLE)
@@ -165,9 +170,11 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
v9fs_fid_add(ds, fid);
}
- } else /* walk from the parent */
+ } else {
+ /* walk from the parent */
n = 1;
-
+ read_unlock(&v9ses->rename_lock);
+ }
if (ds == dentry)
return fid;
@@ -175,8 +182,10 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
if (!wnames)
return ERR_PTR(-ENOMEM);
+ read_lock(&v9ses->rename_lock);
for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
wnames[i] = (char *) d->d_name.name;
+ read_unlock(&v9ses->rename_lock);
clone = ...Linus, Al,
Does this approach satisfy your concerns? We've been going over
several different options on how to proceed, but this seems to be the
best option. The v9fs_fid_lookup traversal of the d_parent path is an
exception case that hardly ever gets hit in practice so the extra
locking shouldn't be a performance problem, but we haven't been able
to figure out a way to resolve it in another fashion for existing
use-case scenarios.
Thanks for any guidance,
-eric
On Thu, Jun 24, 2010 at 11:30 AM, Aneesh Kumar K. V
--
Using a p9fs rename lock does seem to be a reasonable option.
That said, the patch itself seems to not be valid. You drop the lock
too early in v9fs_fid_lookup() as far as I can tell. You then re-take
it before doing that whole
for (d = dentry, i = (n-1); i >= 0; i--, d = d->d_parent)
loop with it held again, but that's totally bogus - because you
dropped the lock, your 'n-1' count has absolutely no meaning any more,
since a cross-directory rename might have changed the depth of the
thing in the meantime.
And if the depth changes, you aren't at all guaranteed to stay on the
same p9fs filesystem, so now you're doing that d_parent access without
the proper locking (sure: you hold the rename lock, but it's not at
all guaranteed that the rename lock is the _right_ lock any more as
you traverse the list down!)
But I didn't look deeply at the patch. There might be some reason why
it's safe (I doubt it, though), and there might be other places where
you do the same. But in general, dropping and re-taking a lock is a
bad idea. If you dropped the lock, you can't depend on anything you
found out while having held it.
Linus
--
You are correct. we cannot drop the rename lock in between. I also found
another issue in that we are using dentry->d_name.name directly. That
would imply we need to hold the rename_lock even during the
client_walk. How about the patch below ?. I updated the patch to hold
rename_lock during multiple path walk. Also the rename path is updated
to hold the lock during p9_client_rename operations.
commit 79f6f20dbb70ad35db37b674957c95de20662a75
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed Jun 30 15:37:58 2010 +0530
fs/9p: Prevent parallel rename when doing fid_lookup
During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7b387fe 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,34 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
return ret;
}
+/*
+ * We need to hold v9ses->rename_lock as long as we hold references
+ * to returned path array. Array element contain pointers to
+ * dentry names.
+ */
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+ struct dentry *dentry, char ***names)
+{
+ int n = 0, i;
+ char **wnames;
+ struct dentry *ds;
+
+ for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+ n++;
+
+ wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+ if (!wnames)
+ goto err_out;
+
+ for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent)
+ wnames[i] = (char *)ds->d_name.name;
+
+ *names = wnames;
+ return n;
+err_out:
+ return -ENOMEM;
+}
+
/**
* v9fs_fid_lookup - lookup for a fid, try to walk if not found
* @dentry: dentry to look for fid in
@@ -112,7 +140,7 ...On Wed, Jun 30, 2010 at 4:52 AM, Aneesh Kumar K. V
I'm not finding any obvious problems with this, and you're right that
you also need to hold the rename write-lock even for regular renames
(not just cross-directory ones) in order to protect the name itself.
So those two patches together seem to be ok at a quick glance.
That said, I do wonder if you wouldn't be better off copying the name
components in order to then drop the lock earlier. That way you'd only
need to hold the lock while walking the dentries, and could possibly
release it during the actual walk (unless you need the names to be
stable, but I don't think you can rely on that anyway, since other
clients might be doing renames concurrently.. I don't know)
Linus
--
Updated patch
From 0bae321401e88d495ea7e9e96272c11c1b9b9ec3 Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed, 30 Jun 2010 19:18:50 +0530
Subject: [PATCH] fs/9p: Prevent parallel rename when doing fid_lookup
During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_lock rwlock to prevent parallel fid lookup and rename.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
fs/9p/fid.c | 134 ++++++++++++++++++++++++++++++++++++++---------------
fs/9p/v9fs.c | 1 +
fs/9p/v9fs.h | 1 +
fs/9p/vfs_inode.c | 14 ++++--
fs/9p/vfs_super.c | 1 +
5 files changed, 110 insertions(+), 41 deletions(-)
diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 5d6cfcb..7dcefaa 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -97,6 +97,48 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, u32 uid, int any)
return ret;
}
+static int build_path_from_dentry(struct v9fs_session_info *v9ses,
+ struct dentry *dentry, char ***names)
+{
+ int n = 0, i;
+ char **wnames;
+ struct dentry *ds;
+
+ read_lock(&v9ses->rename_lock);
+ for (ds = dentry; !IS_ROOT(ds); ds = ds->d_parent)
+ n++;
+
+ wnames = kmalloc(sizeof(char *) * n, GFP_KERNEL);
+ if (!wnames) {
+ n = 0;
+ goto err_out;
+ }
+ for (ds = dentry, i = (n-1); i >= 0; i--, ds = ds->d_parent) {
+ wnames[i] = kmalloc(strlen(ds->d_name.name) + 1, GFP_KERNEL);
+ if (!wnames[i])
+ goto err_out;
+
+ strcpy(wnames[i], ds->d_name.name);
+ }
+ *names = wnames;
+ read_unlock(&v9ses->rename_lock);
+ return n;
+err_out:
+ read_unlock(&v9ses->rename_lock);
+ for (; i < n; i++)
+ kfree(wnames[i]);
+ kfree(wnames);
+ return -ENOMEM;
+}
+
+static void kfree_wnames(char **wnames, int n)
+{
+ int i;
+ for (i = 0; i < n; ...I think that you need to use the s_vfs_rename_mutex in the super_block
struct instead of introducing a new rename_lock in the v9fs session.
Thanks,
Lucho
On Wed, Jun 30, 2010 at 5:52 AM, Aneesh Kumar K. V
--
I actually think it's better to avoid having filesystems muck around
with VFS locking details. Also, I think we get the VFS rename mutex
only for cross-directory renames, and as mentioned, 9p needs locking
even for regular directory renames.
(Also, this way you can have parallel readers - although we could
obviously change the vfs rename mutex into a rw-sem too).
So I do think that keeping the logic private to a 9p-specific lock is
the right solution here.
Linus
--
I guess that will have lock dependency issue with inode->i_mutex. Also as Linus suggested with dentry name copied we can avoid holding the lock when doing 9P operation. -aneesh --
To make sure d_name.name remain correct we need something like below ?
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index eae89ad..9f9f804 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1050,7 +1050,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct p9_fid *olddirfid;
struct p9_fid *newdirfid;
struct p9_wstat wstat;
- int retval, cross_dir_rename = 0;
+ int retval;
P9_DPRINTK(P9_DEBUG_VFS, "\n");
retval = 0;
@@ -1071,17 +1071,15 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = PTR_ERR(newdirfid);
goto clunk_olddir;
}
- cross_dir_rename = (old_dentry->d_parent != new_dentry->d_parent);
- if (cross_dir_rename)
- write_lock(&v9ses->rename_lock);
+ write_lock(&v9ses->rename_lock);
if (v9fs_proto_dotl(v9ses)) {
retval = p9_client_rename(oldfid, newdirfid,
(char *) new_dentry->d_name.name);
if (retval != -ENOSYS)
goto clunk_newdir;
}
- if (cross_dir_rename) {
+ if (old_dentry->d_parent != new_dentry->d_parent) {
/*
* 9P .u can only handle file rename in the same directory
*/
@@ -1100,8 +1098,7 @@ clunk_newdir:
if (!retval)
/* successful rename */
d_move(old_dentry, new_dentry);
- if (cross_dir_rename)
- write_unlock(&v9ses->rename_lock);
+ write_unlock(&v9ses->rename_lock);
p9_client_clunk(newdirfid);
clunk_olddir:
--
Silly me. I sent out another patch using the dir inode passed through arguments. But we still need to analyze the use of dentry->d_parent in other parts of code.. --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead &qu |
