[PATCH] vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems

To: <linux-fsdevel@...>, <viro@...>
Cc: <akpm@...>, <hch@...>
Date: Friday, July 11, 2008 - 3:47 pm

Al, Christoph,

Zach just ran into this bug as well. Does this fix look reasonable?

thanks-
sage

----
From: Sage Weil <sage@newdream.net>

d_move() is strangely implemented in that it swaps the position of
new_dentry and old_dentry in the namespace. This is admittedly weird (see
comments for d_move_locked()), but normally harmless: even though
new_dentry swaps places with old_dentry, it is unhashed, and won't be seen
by a subsequent lookup.

However, vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE. If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry. For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it.
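The ordering problem described above can be sketched with a small user-space model. This is not kernel code: struct toy_dentry, toy_d_move(), toy_visible() and toy_rename_dir_prepatch() are hypothetical stand-ins, a dentry is reduced to a name plus a "hashed" flag, and only a successful rename onto an existing target is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy stand-in for struct dentry: a name plus a "hashed" flag. */
struct toy_dentry {
	const char *name;
	bool hashed;	/* true: a lookup can find this entry */
};

/* Toy d_move(): the two entries trade places in the namespace. The moved
 * entry ends up hashed under the new name; the target ends up unhashed
 * and now carries the old name. */
static void toy_d_move(struct toy_dentry *old_dentry,
		       struct toy_dentry *new_dentry)
{
	const char *tmp = old_dentry->name;

	old_dentry->name = new_dentry->name;
	new_dentry->name = tmp;
	old_dentry->hashed = true;
	new_dentry->hashed = false;
}

/* Toy lookup over just these two entries. */
static bool toy_visible(const struct toy_dentry *a,
			const struct toy_dentry *b, const char *name)
{
	return (a->hashed && strcmp(a->name, name) == 0) ||
	       (b->hashed && strcmp(b->name, name) == 0);
}

/* Assumed pre-patch ordering for a successful rename onto an existing target. */
static void toy_rename_dir_prepatch(struct toy_dentry *old_dentry,
				    struct toy_dentry *new_dentry,
				    bool fs_does_d_move)
{
	assert(old_dentry != new_dentry);
	new_dentry->hashed = false;		/* unhash target before ->rename() */
	if (fs_does_d_move)
		toy_d_move(old_dentry, new_dentry);	/* done inside ->rename() */
	if (!new_dentry->hashed)
		new_dentry->hashed = true;	/* rehash whatever new_dentry now is */
	if (!fs_does_d_move)
		toy_d_move(old_dentry, new_dentry);	/* done by the VFS afterwards */
}
```

In this model, renaming "a" onto "b" with fs_does_d_move set to false leaves only "b" visible, while setting it to true leaves the stale name "a" visible as well, because the rehash lands on the swapped entry.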

To correct this, move vfs_rename_dir()'s call to d_move() _before_ the
target inode mutex is dealt with. Since d_move() will have been called
for all filesystems at this point, there is no need to rehash new_dentry
unless the rename failed. (If the rename succeeded, old_dentry should
already be rehashed in the new location.)
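A rough user-space sketch of the reordered flow follows, under the same caveat that it is a toy model rather than the patch itself: struct toy_dentry, toy_d_move(), toy_visible() and toy_rename_dir_patched() are hypothetical names, and rename_error simply stands in for the return value of the rename() iop.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy stand-in for struct dentry: a name plus a "hashed" flag. */
struct toy_dentry {
	const char *name;
	bool hashed;	/* true: a lookup can find this entry */
};

/* Toy d_move(): swap names; moved entry hashed, target unhashed. */
static void toy_d_move(struct toy_dentry *old_dentry,
		       struct toy_dentry *new_dentry)
{
	const char *tmp = old_dentry->name;

	old_dentry->name = new_dentry->name;
	new_dentry->name = tmp;
	old_dentry->hashed = true;
	new_dentry->hashed = false;
}

/* Toy lookup over just these two entries. */
static bool toy_visible(const struct toy_dentry *a,
			const struct toy_dentry *b, const char *name)
{
	return (a->hashed && strcmp(a->name, name) == 0) ||
	       (b->hashed && strcmp(b->name, name) == 0);
}

/* Assumed post-patch ordering: d_move() first, rehash only on failure. */
static int toy_rename_dir_patched(struct toy_dentry *old_dentry,
				  struct toy_dentry *new_dentry,
				  int rename_error)
{
	assert(old_dentry != new_dentry);
	new_dentry->hashed = false;		/* unhash target before ->rename() */
	if (!rename_error) {
		/* By this point d_move() has run for every filesystem,
		 * either inside ->rename() or in the VFS right after it. */
		toy_d_move(old_dentry, new_dentry);
	} else {
		new_dentry->hashed = true;	/* failure: restore the target */
	}
	return rename_error;
}
```

With rename_error equal to 0 the old name disappears and only the new name is visible; with a nonzero rename_error both original names stay visible, which is the state the entries were found in.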

The only in-tree filesystems with FS_RENAME_DOES_D_MOVE are ocfs2 and nfs.
My suspicion is that they are not bitten by this particular bug because
the incorrectly rehashed new_dentry gets rejected by d_revalidate().

This was caught by the recently posted POSIX fstest suite, rename/10.t
test 62 (and others) on ceph. With this patch, all tests succeed.

Signed-off-by: Sage Weil <sage@newdream.net>
---
fs/namei.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

--- linux-2.6.25-orig/fs/namei.c 2008-04-16 19:49:44.000000000 -0700
+++ linux/fs/namei.c 2008-04-18 13:59:30.000000000 -0700
@@ -2488,17 +2488,18 @@
error = -EBUSY;
else
error = old_dir->i_op->rename(old_dir...

To: <sage@...>
Cc: <linux-fsdevel@...>, <viro@...>, <akpm@...>, <hch@...>
Date: Friday, July 11, 2008 - 4:53 pm

I think rehashing the new dentry is bogus, even on error. And it
looks racy with lookup as well.

I wonder what the original reason for that was? Git history doesn't
tell...

So a better fix would be just to remove the rehashing completely.
Does the below patch work for you?

Thanks,
Miklos

---
fs/namei.c | 2 --
1 file changed, 2 deletions(-)

Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c 2008-07-11 22:09:32.000000000 +0200
+++ linux-2.6/fs/namei.c 2008-07-11 22:40:16.000000000 +0200
@@ -2643,8 +2643,6 @@ static int vfs_rename_dir(struct inode *
if (!error)
target->i_flags |= S_DEAD;
mutex_unlock(&target->i_mutex);
- if (d_unhashed(new_dentry))
- d_rehash(new_dentry);
dput(new_dentry);
}
if (!error)
--

To: Miklos Szeredi <miklos@...>
Cc: <linux-fsdevel@...>, <viro@...>, <akpm@...>, <hch@...>
Date: Friday, July 11, 2008 - 6:15 pm

I assume just to leave the dentry in the same state we originally found it in.

This would work as well, yeah. I've no real preference, here...

thanks-
--

To: Miklos Szeredi <miklos@...>
Cc: <sage@...>, <linux-fsdevel@...>, <viro@...>, <akpm@...>, <hch@...>
Date: Friday, July 11, 2008 - 6:12 pm

So we'd just come back through lookup to repopulate the existing
destination name that vfs_rename_dir() unhashed before calling
->rename() in the case that the rename fails? That seems gross, but

It'd work for my case, yeah.
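The failed-rename behaviour with the rehash removed can be sketched in a toy user-space model. All names here (struct toy_dentry, toy_d_move(), toy_rename_dir_no_rehash(), toy_lookup()) are hypothetical, and the "repopulate" step merely flips a flag where a real lookup would go back to the filesystem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for struct dentry: a name plus a "hashed" flag. */
struct toy_dentry {
	const char *name;
	bool hashed;	/* true: a lookup hits the cache for this entry */
};

/* Toy d_move(): swap names; moved entry hashed, target unhashed. */
static void toy_d_move(struct toy_dentry *old_dentry,
		       struct toy_dentry *new_dentry)
{
	const char *tmp = old_dentry->name;

	old_dentry->name = new_dentry->name;
	new_dentry->name = tmp;
	old_dentry->hashed = true;
	new_dentry->hashed = false;
}

/* Rename with the rehash dropped: a failure leaves the target unhashed. */
static int toy_rename_dir_no_rehash(struct toy_dentry *old_dentry,
				    struct toy_dentry *new_dentry,
				    int rename_error)
{
	assert(old_dentry != new_dentry);
	new_dentry->hashed = false;	/* unhash target before ->rename() */
	if (rename_error)
		return rename_error;	/* nothing restores the target here */
	toy_d_move(old_dentry, new_dentry);
	return 0;
}

/* Toy lookup: returns true on a cache hit. On a miss it stands in for the
 * slow path by making the name visible again (a simplification). */
static bool toy_lookup(struct toy_dentry *d)
{
	if (d->hashed)
		return true;
	d->hashed = true;	/* repopulated by the simulated slow path */
	return false;
}
```

In this model the first lookup of the destination name after a failed rename misses the cache and the second one hits, which is the extra round trip being discussed.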

- z
--

To: <zach.brown@...>
Cc: <miklos@...>, <sage@...>, <linux-fsdevel@...>, <viro@...>, <akpm@...>, <hch@...>
Date: Friday, July 18, 2008 - 6:59 am

We are talking about an _extremely_ rare event. Even the
"vfs_rename_dir() with positive target" is very rare, let alone a
failing one.

If we are going to worry about directory removal failure cases, we
should start with rmdir(), which is a wee bit more common than the
case above.

Miklos
--

To: Miklos Szeredi <miklos@...>
Cc: <sage@...>, <linux-fsdevel@...>, <viro@...>, <akpm@...>, <hch@...>
Date: Friday, July 18, 2008 - 3:44 pm

Agreed, which is why I called it relatively harmless.

- z
--
