After questioning the value of d_ino, I am now convinced of its utility. This version of union mounts fills in d_ino of fallthru directory entries with the inode number of the target. You still need to stat() the entry to get st_dev if you want to do a file uniqueness comparison using the inode. See the patch introducing generic_readdir_fallthru() for the implementation.

-VAL

Felix Fietkau (2):
      whiteout: jffs2 whiteout support
      fallthru: jffs2 fallthru support

Jan Blunck (10):
      VFS: Make lookup_hash() return a struct path
      autofs4: Save autofs trigger's vfsmount in super block info
      whiteout/NFSD: Don't return information about whiteouts to userspace
      whiteout: Add vfs_whiteout() and whiteout inode operation
      whiteout: Set opaque flag if new directory was previously a whiteout
      whiteout: Allow removal of a directory with whiteouts
      whiteout: Split of ext2_append_link() from ext2_add_link()
      whiteout: ext2 whiteout support
      union-mount: Introduce MNT_UNION and MS_UNION flags
      union-mount: Call do_whiteout() on unlink and rmdir in unions

Valerie Aurora (27):
      VFS: Comment follow_mount() and friends
      VFS: Add read-only users count to superblock
      whiteout: tmpfs whiteout support
      fallthru: Basic fallthru definitions
      union-mount: Union mounts documentation
      union-mount: Introduce union_dir structure and basic operations
      union-mount: Free union dirs on removal from dcache
      union-mount: Support for union mounting file systems
      union-mount: Implement union lookup
      union-mount: Copy up directory entries on first readdir()
      union-mount: Add generic_readdir_fallthru() helper
      fallthru: ext2 fallthru support
      fallthru: tmpfs fallthru support
      VFS: Split inode_permission() and create path_permission()
      VFS: Create user_path_nd() to lookup both parent and target
      union-mount: In-kernel file copyup routines
      union-mount: Implement union-aware access()/faccessat()
      union-mount: Implement union-aware link()
      union-mount: Implement ...
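The d_ino point above comes down to this: readdir() gives an inode number but no device, so file identity needs the (st_dev, st_ino) pair from stat(). A minimal userspace sketch using only POSIX calls; same_file_stat(), same_file() and dirent_ino() are illustrative names, not part of this patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Two stat results name the same file only if BOTH st_dev and st_ino match. */
static int same_file_stat(const struct stat *a, const struct stat *b)
{
    assert(a != NULL && b != NULL);
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Returns 1 if the paths name the same file, 0 if not, -1 if stat() fails. */
static int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return same_file_stat(&sa, &sb);
}

/*
 * Returns the d_ino that readdir() reports for @name in @dir, or 0 if the
 * entry is not found. struct dirent carries no device number, so this
 * value alone cannot prove that two entries are the same file.
 */
static ino_t dirent_ino(const char *dir, const char *name)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    ino_t ino = 0;

    if (d == NULL)
        return 0;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            ino = de->d_ino;
            break;
        }
    }
    closedir(d);
    return ino;
}
```

On a union mount with this series, dirent_ino() on a fallthru entry would report the target's inode number, but an equal number on a different device is still a different file, which is why the comparison goes through stat().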
From: Jan Blunck <jblunck@suse.de>
If we mkdir() a directory on the top layer of a union, we don't want
entries from a matching directory on the lower layer to "show through"
suddenly. To prevent this, we set the opaque flag on a directory if
there was previously a whiteout with the same name. (If there is no
whiteout and the directory exists in a lower layer, then mkdir() will
fail with EEXIST.)
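The mkdir() rule above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not the kernel code: enum top_entry, model_union_mkdir() and model_lower_visible() are invented names, and the model looks at a single name in a single directory.

```c
#include <assert.h>
#include <errno.h>

/* What the top layer currently holds under the name being created. */
enum top_entry { TOP_NONE, TOP_WHITEOUT, TOP_EXISTS };

struct mkdir_result {
    int error;    /* 0 or a negative errno value */
    int opaque;   /* 1 if the new directory must hide lower-layer entries */
};

static struct mkdir_result model_union_mkdir(enum top_entry top,
                                             int lower_exists)
{
    struct mkdir_result r = { 0, 0 };

    if (top == TOP_EXISTS)
        r.error = -EEXIST;      /* name already in use on the top layer */
    else if (top == TOP_WHITEOUT)
        r.opaque = 1;           /* replaces a whiteout: set the opaque flag */
    else if (lower_exists)
        r.error = -EEXIST;      /* no whiteout, so the lower entry is visible */
    return r;
}

/* Entries of the matching lower directory show through only if not opaque. */
static int model_lower_visible(const struct mkdir_result *r)
{
    assert(r->error == 0);
    return !r->opaque;
}
```

As in the vfs_mkdir() hunk, the opaque decision depends only on whether the top-layer dentry was a whiteout, not on whether a lower directory actually exists.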
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 11 ++++++++++-
include/linux/fs.h | 5 +++++
2 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 665d394..cd8b0d0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2108,6 +2108,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, int, mode, unsigned, dev)
int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
int error = may_create(dir, dentry);
+ int opaque = 0;
if (error)
return error;
@@ -2120,9 +2121,17 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
if (error)
return error;
+ if (d_is_whiteout(dentry))
+ opaque = 1;
+
error = dir->i_op->mkdir(dir, dentry, mode);
- if (!error)
+ if (!error) {
fsnotify_mkdir(dir, dentry);
+ if (opaque) {
+ dentry->d_inode->i_flags |= S_OPAQUE;
+ mark_inode_dirty(dentry->d_inode);
+ }
+ }
return error;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1f80897..1dbe156 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -236,6 +236,7 @@ struct inodes_stat_t {
#define S_NOCMTIME 128 /* Do not update file c/mtime */
#define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
#define S_PRIVATE 512 /* Inode is fs-internal */
+#define S_OPAQUE 1024 /* Directory is opaque */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -270,6 +271,7 @@ struct inodes_stat_t {
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
...
Add comments to the VFS follow_mount() family of functions describing
what the directions "up" and "down" mean and how reference counts are
handled.
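The "up"/"down" terminology can be illustrated with a small userspace model of a mount tree. It is a hypothetical sketch: struct mnt_model, struct path_model and the model_* functions are invented, reference counts are not modeled, and it assumes that a second mount on the same mountpoint attaches to the root of the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a mounted file system; not struct vfsmount. */
struct mnt_model {
    const char *name;
    struct mnt_model *parent;   /* NULL for the root of the mount tree */
    const char *mountpoint;     /* dentry in @parent that this mount covers */
};

/* Hypothetical stand-in for struct path: a (mount, dentry name) pair. */
struct path_model {
    struct mnt_model *mnt;
    const char *dentry;
};

/* Like lookup_mnt(): find the mount attached at exactly (p->mnt, p->dentry). */
static struct mnt_model *model_lookup_mnt(struct mnt_model **tab, size_t n,
                                          const struct path_model *p)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i]->parent == p->mnt &&
            strcmp(tab[i]->mountpoint, p->dentry) == 0)
            return tab[i];
    return NULL;
}

/*
 * "Down" (away from /): keep following mounts stacked on this mountpoint
 * until we reach the most recently mounted one, i.e. the visible one.
 * Returns 1 if the path changed, 0 otherwise.
 */
static int model_follow_mount(struct mnt_model **tab, size_t n,
                              struct path_model *p)
{
    struct mnt_model *child;
    int res = 0;

    while ((child = model_lookup_mnt(tab, n, p)) != NULL) {
        p->mnt = child;
        p->dentry = "/";        /* the root dentry of the child mount */
        res = 1;
    }
    return res;
}

/*
 * "Up" (toward /): replace @p with the mountpoint of p->mnt in its parent.
 * Returns 1 if we went up a level, 0 if we were already at the root.
 */
static int model_follow_up(struct path_model *p)
{
    assert(p->mnt != NULL);
    if (p->mnt->parent == NULL)
        return 0;
    p->dentry = p->mnt->mountpoint;
    p->mnt = p->mnt->parent;
    return 1;
}
```

Starting from the mountpoint, one "down" walk lands on the newest mount, while "up" retraces the stack one level per call until it reports the root.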
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
fs/namei.c | 43 +++++++++++++++++++++++++++++++++++++++----
fs/namespace.c | 16 ++++++++++++++--
2 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 868d0cb..fd6df0d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -597,6 +597,17 @@ loop:
return err;
}
+/*
+ * follow_up - Find the mountpoint of path's vfsmount
+ *
+ * Given a path, find the mountpoint of its source file system.
+ * Replace @path with the path of the mountpoint in the parent mount.
+ * Up is towards /.
+ *
+ * Return 1 if we went up a level and 0 if we were already at the
+ * root.
+ */
+
int follow_up(struct path *path)
{
struct vfsmount *parent;
@@ -617,8 +628,22 @@ int follow_up(struct path *path)
return 1;
}
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * __follow_mount - Return the most recent mount at this mountpoint
+ *
+ * Given a mountpoint, find the most recently mounted file system at
+ * this mountpoint and return the path to its root dentry. This is
+ * the file system that is visible, and it is in the direction of VFS
+ * "down" - away from the root of the mount tree. See comments to
+ * lookup_mnt() for an example of "down."
+ *
+ * Does not decrement the refcount on the given mount even if it
+ * follows it to another mount and returns that path instead.
+ *
+ * Returns 0 if path was unchanged, 1 if we followed it to another mount.
+ *
+ * No need for dcache_lock, as serialization is taken care of in
+ * namespace.c.
*/
static int __follow_mount(struct path *path)
{
@@ -637,6 +662,12 @@ static int __follow_mount(struct path *path)
return res;
}
+/*
+ * Like __follow_mount, but no return value and drops references to
+ * both ...
From: Jan Blunck <jblunck@suse.de>
XXX - This is broken and included just to make union mounts work. Ian
Kent and David Howells are working on a long-term solution that will
replace the abuse of ->follow_link() to trigger an automount with a new
op.
Original commit message:
This is a bugfix/replacement for commit
051d381259eb57d6074d02a6ba6e90e744f1a29f:
During a path walk if an autofs trigger is mounted on a dentry,
when the follow_link method is called, the nameidata struct
contains the vfsmount and mountpoint dentry of the parent mount
while the dentry that is passed in is the root of the autofs
trigger mount. I believe it is impossible to get the vfsmount of
the trigger mount, within the follow_link method, when only the
parent vfsmount and the root dentry of the trigger mount are
known.
The solution in that commit was to replace the path embedded in the
parent's nameidata with the path of the link itself in
__do_follow_link(). This is a relatively harmless misuse of the
field, but union mounts ran into a bug during follow_link() caused by
the nameidata containing the wrong path (everywhere else we count on
it being the path of the parent).
A cleaner and easier-to-understand solution is to save the necessary
vfsmount in the autofs superblock info when it is mounted. Then we
can easily update the vfsmount in autofs4_follow_link().
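The idea of saving the vfsmount at mount time can be sketched as a userspace model. All types and functions here are hypothetical stand-ins, not autofs code, and the model deliberately does not show how autofs4_follow_link() then uses the saved vfsmount, since that hunk is truncated in this posting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
struct vfsmount_model { const char *name; };
struct sbi_model { struct vfsmount_model *mnt; };  /* autofs sb info */
struct path_model { struct vfsmount_model *mnt; const char *dentry; };

/* At mount time the trigger's vfsmount is known, so remember it. */
static void model_autofs_get_sb(struct sbi_model *sbi,
                                struct vfsmount_model *mnt)
{
    sbi->mnt = mnt;
}

/*
 * Later, given only the trigger's root dentry, the trigger's full path
 * can be rebuilt from the saved vfsmount, without relying on the
 * caller's nameidata to carry it.
 */
static struct path_model model_trigger_path(const struct sbi_model *sbi,
                                            const char *trigger_root)
{
    struct path_model p;

    assert(sbi->mnt != NULL);   /* must have been saved at mount time */
    p.mnt = sbi->mnt;
    p.dentry = trigger_root;
    return p;
}
```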
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
fs/autofs4/autofs_i.h | 1 +
fs/autofs4/init.c | 11 ++++++++++-
fs/autofs4/root.c | 6 ++++++
fs/namei.c | 7 ++-----
4 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 3d283ab..de3af64 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -133,6 +133,7 @@ struct ...
From: Felix Fietkau <nbd@openwrt.org>
Add support for whiteout dentries to jffs2.
XXX - David Woodhouse suggests several changes and provides an
untested patch. See: http://patchwork.ozlabs.org/patch/50466/
XXX - Backward compatibility? Creating a whiteout on a JFFS2 file
system can only happen if it is deliberately mounted "-o union", so
there is some way to prevent creation of whiteouts on a file system
you later want to mount with an earlier file system implementation
that lacks whiteout support. However, ext2/3 has a much more robust
method (an explicit fs feature flag) to prevent such an occurrence.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
---
fs/jffs2/dir.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++-
fs/jffs2/fs.c | 4 +++
fs/jffs2/super.c | 2 +-
include/linux/jffs2.h | 2 +
4 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 166062a..4798586 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -34,6 +34,8 @@ static int jffs2_mknod (struct inode *,struct dentry *,int,dev_t);
static int jffs2_rename (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+
const struct file_operations jffs2_dir_operations =
{
.read = generic_read_dir,
@@ -56,6 +58,7 @@ const struct inode_operations jffs2_dir_inode_operations =
.mknod = jffs2_mknod,
.rename = jffs2_rename,
.check_acl = jffs2_check_acl,
+ .whiteout = jffs2_whiteout,
.setattr = jffs2_setattr,
.setxattr = jffs2_setxattr,
.getxattr = jffs2_getxattr,
@@ -98,8 +101,14 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
fd = fd_list;
}
}
- if (fd)
- ino = fd->ino;
+ if (fd) {
+ spin_lock(&target->d_lock);
+ if (fd->type == ...
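The jffs2_lookup() hunk above picks the newest dirent for a name and, with this patch, treats one of type DT_WHT as a whiteout instead of taking its inode number. A userspace model of that decision; struct full_dirent_model and model_jffs2_lookup() are hypothetical, and MODEL_DT_WHT is assumed to equal glibc's DT_WHT value (14).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_DT_WHT 14   /* assumption: same value as DT_WHT in <dirent.h> */

/* Hypothetical stand-in for jffs2's in-core directory entry list. */
struct full_dirent_model {
    const struct full_dirent_model *next;
    unsigned int version;   /* newer dirents for a name have higher versions */
    unsigned int ino;
    unsigned char type;
    const char *name;
};

enum lookup_model { LOOKUP_NEGATIVE, LOOKUP_WHITEOUT, LOOKUP_POSITIVE };

static enum lookup_model model_jffs2_lookup(const struct full_dirent_model *list,
                                            const char *name, unsigned int *ino)
{
    const struct full_dirent_model *fd = NULL;

    /* As in jffs2_lookup(): the highest-version matching dirent wins. */
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0 &&
            (fd == NULL || list->version > fd->version))
            fd = list;

    if (fd == NULL)
        return LOOKUP_NEGATIVE;
    if (fd->type == MODEL_DT_WHT)
        return LOOKUP_WHITEOUT;  /* mark the dentry, do not read an inode */
    *ino = fd->ino;
    return LOOKUP_POSITIVE;
}
```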
Add support for whiteout dentries to tmpfs. This includes adding
support for whiteouts to d_genocide(), which is called to tear down
pinned tmpfs dentries. Whiteouts have to be persistent, so they have an
extra pinning ref count that needs to be dropped by d_genocide().
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: linux-mm@kvack.org
---
fs/dcache.c | 13 +++++-
mm/shmem.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 143 insertions(+), 15 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 80f059b..79b9f6a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2229,7 +2229,18 @@ resume:
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- if (d_unhashed(dentry)||!dentry->d_inode)
+ /*
+ * Skip unhashed and negative dentries, but process
+ * positive dentries and whiteouts. A whiteout looks
+ * kind of like a negative dentry for purposes of
+ * lookup, but it has an extra pinning ref count
+ * because it can't be evicted like a negative dentry
+ * can. What we care about here is ref counts - and
+ * we need to drop the ref count on a whiteout before
+ * we can evict it.
+ */
+ if (d_unhashed(dentry)||(!dentry->d_inode &&
+ !d_is_whiteout(dentry)))
continue;
if (!list_empty(&dentry->d_subdirs)) {
this_parent = dentry;
diff --git a/mm/shmem.c b/mm/shmem.c
index f65f840..a0a4fa5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1805,6 +1805,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
return 0;
}
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per ...
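The new d_genocide() skip condition can be checked in isolation. A userspace model with a hypothetical struct dentry_model; in the real function each child that is not skipped has its reference count decremented, which is why whiteouts, with their extra pinning reference, must now be visited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags standing in for d_unhashed(), d_inode, d_is_whiteout(). */
struct dentry_model {
    int hashed;     /* 0 if d_unhashed() would be true */
    int has_inode;  /* 0 for negative dentries and for whiteouts */
    int whiteout;   /* 1 if d_is_whiteout() would be true */
};

/* Mirrors the patched condition: skip unhashed and plain negative dentries. */
static int model_genocide_skips(const struct dentry_model *d)
{
    return !d->hashed || (!d->has_inode && !d->whiteout);
}

/* Count the children whose ref count d_genocide() would drop. */
static size_t model_genocide_count(const struct dentry_model *children,
                                   size_t n)
{
    size_t visited = 0;

    for (size_t i = 0; i < n; i++)
        if (!model_genocide_skips(&children[i]))
            visited++;
    return visited;
}
```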
From: Jan Blunck <jblunck@suse.de>
This patch adds whiteout support to EXT2. A whiteout is an empty
directory entry (inode == 0) with the file type set to EXT2_FT_WHT.
Therefore it occupies space in the directory. Because it is implemented
as a file type, the EXT2_FEATURE_INCOMPAT_FILETYPE flag must be set.
XXX - Needs serious review. Al wonders: What happens with a delete at
the beginning of a block? Will we find the matching dentry or the first
empty space?
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
fs/ext2/dir.c | 96 +++++++++++++++++++++++++++++++++++++++++++++--
fs/ext2/ext2.h | 3 +
fs/ext2/inode.c | 11 ++++-
fs/ext2/namei.c | 63 +++++++++++++++++++++++++++++-
fs/ext2/super.c | 5 ++
include/linux/ext2_fs.h | 4 ++
6 files changed, 172 insertions(+), 10 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 57207a9..030bd46 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,7 @@ static inline int ext2_match (int len, const char * const name,
{
if (len != de->name_len)
return 0;
- if (!de->inode)
+ if (!de->inode && (de->file_type != EXT2_FT_WHT))
return 0;
return !memcmp(name, de->name, len);
}
@@ -255,6 +255,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
[EXT2_FT_FIFO] = DT_FIFO,
[EXT2_FT_SOCK] = DT_SOCK,
[EXT2_FT_SYMLINK] = DT_LNK,
+ [EXT2_FT_WHT] = DT_WHT,
};
#define S_SHIFT 12
@@ -448,6 +449,26 @@ ino_t ext2_inode_by_name(struct inode *dir, struct qstr *child)
return res;
}
+/* Special version for filetype based whiteout support */
+ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
+{
+ ino_t res = 0;
+ struct ext2_dir_entry_2 *de;
+ struct page *page;
+
+ de = ext2_find_entry (dir, &dentry->d_name, &page);
+ if (de) {
+ res = le32_to_cpu(de->inode);
+ if (!res ...
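The patched ext2_match() lets an entry with inode == 0 still match by name when its file type is EXT2_FT_WHT. A userspace model of that check; the struct only mirrors the field layout of ext2_dir_entry_2 (host-endian, no le32_to_cpu()), and MODEL_EXT2_FT_WHT = 8 is a hypothetical value, since the include/linux/ext2_fs.h hunk that defines EXT2_FT_WHT is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_EXT2_FT_WHT 8   /* hypothetical; real value is in the patch */

/* Same fields as ext2_dir_entry_2, but host-endian for the model. */
struct ext2_dirent_model {
    uint32_t inode;      /* 0 for unused entries and for whiteouts */
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
    char     name[255];
};

/* Mirrors the patched ext2_match(): whiteouts match by name despite inode 0. */
static int model_ext2_match(int len, const char *name,
                            const struct ext2_dirent_model *de)
{
    if (len != de->name_len)
        return 0;
    if (!de->inode && de->file_type != MODEL_EXT2_FT_WHT)
        return 0;
    return !memcmp(name, de->name, len);
}

static int model_is_whiteout(const struct ext2_dirent_model *de)
{
    return de->inode == 0 && de->file_type == MODEL_EXT2_FT_WHT;
}
```

Because a whiteout is a real record with a name, it consumes rec_len bytes in the directory block just like a live entry, which is the space cost the commit message mentions.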
From: Jan Blunck <jblunck@suse.de>
ext2_append_entry() is used by a later patch to find or append a
directory entry for a whiteout.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
fs/ext2/dir.c | 70 ++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 50 insertions(+), 20 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 7516957..57207a9 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -472,9 +472,10 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
}
/*
- * Parent is locked.
+ * Find or append a given dentry to the parent directory
*/
-int ext2_add_link (struct dentry *dentry, struct inode *inode)
+static ext2_dirent * ext2_append_entry(struct dentry * dentry,
+ struct page ** page)
{
struct inode *dir = dentry->d_parent->d_inode;
const char *name = dentry->d_name.name;
@@ -482,13 +483,10 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
unsigned chunk_size = ext2_chunk_size(dir);
unsigned reclen = EXT2_DIR_REC_LEN(namelen);
unsigned short rec_len, name_len;
- struct page *page = NULL;
- ext2_dirent * de;
+ ext2_dirent * de = NULL;
unsigned long npages = dir_pages(dir);
unsigned long n;
char *kaddr;
- loff_t pos;
- int err;
/*
* We take care of directory expansion in the same loop.
@@ -498,20 +496,19 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
for (n = 0; n <= npages; n++) {
char *dir_end;
- page = ext2_get_page(dir, n, 0);
- err = PTR_ERR(page);
- if (IS_ERR(page))
+ *page = ext2_get_page(dir, n, 0);
+ de = ERR_PTR(PTR_ERR(*page));
+ if (IS_ERR(*page))
goto out;
- lock_page(page);
- kaddr = page_address(page);
+ lock_page(*page);
+ kaddr = page_address(*page);
dir_end = kaddr + ext2_last_byte(dir, n);
de = (ext2_dirent *)kaddr;
kaddr += PAGE_CACHE_SIZE - reclen;
while ((char ...
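The loop being moved into ext2_append_entry() walks the records of a directory block looking for room for a record of EXT2_DIR_REC_LEN(namelen) bytes. A simplified userspace model of that search order, based on the pre-existing ext2_add_link() logic rather than on the truncated new code; struct rec_model, enum slot_kind and model_find_slot() are hypothetical, and the block is modeled as an array of records instead of raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte header (inode, rec_len, name_len, file_type), rounded up to 4. */
#define MODEL_EXT2_DIR_REC_LEN(name_len) (((name_len) + 8 + 3) & ~3u)

struct rec_model {
    uint32_t inode;        /* 0 means the record is unused */
    unsigned name_len;
    unsigned rec_len;      /* bytes this record spans, including slack */
    const char *name;
};

enum slot_kind { SLOT_NONE, SLOT_REUSE, SLOT_SPLIT, SLOT_EXISTS };

/* Returns how @name could be placed and, via @idx, at which record. */
static enum slot_kind model_find_slot(const struct rec_model *recs, size_t n,
                                      const char *name, size_t *idx)
{
    unsigned namelen = (unsigned)strlen(name);
    unsigned reclen = MODEL_EXT2_DIR_REC_LEN(namelen);

    for (size_t i = 0; i < n; i++) {
        const struct rec_model *de = &recs[i];

        if (de->inode && de->name_len == namelen &&
            memcmp(de->name, name, namelen) == 0) {
            *idx = i;
            return SLOT_EXISTS;         /* ext2_add_link() returns -EEXIST */
        }
        if (!de->inode && de->rec_len >= reclen) {
            *idx = i;
            return SLOT_REUSE;          /* unused record big enough */
        }
        if (de->inode &&
            de->rec_len >= MODEL_EXT2_DIR_REC_LEN(de->name_len) + reclen) {
            *idx = i;
            return SLOT_SPLIT;          /* carve the slack off a live record */
        }
    }
    return SLOT_NONE;   /* the real loop would move on to a new chunk */
}
```

Records are examined strictly in order, so a free slot early in the block is taken before any later record is compared by name; this is the ordering behind Al's question in the ext2 whiteout patch.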
If the process doing the lookup doesn't have write permission on the
top-level directory, then the lookup will fail. This is not intended,
is it?
Thanks,
Miklos
--
Does it fail? I'm not checking permissions before calling
->fallthru(). But I can't test this because the code doesn't set the
owner of the copied-up directory correctly. :) Don't bother doing any
permission testing on this version; it's known to be buggy and I will
fix it in the next release.
Thanks,
-VAL
--
It fails because everything, including copyup, is done with the
credentials of the user doing the lookup/copyup. This is wrong: for
the duration of the copyup, the credentials need to be raised so that
the lower file or directory can be created and copied into the upper
filesystem even when the current process doesn't have enough
privileges for that.
Thanks,
Miklos
--
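Miklos's point can be made concrete with a toy permission model: with the caller's own credentials, a write into an upper directory the caller cannot write fails, while a temporarily raised credential succeeds and is restored afterwards. Everything below is hypothetical (one uid, owner/other mode bits only, no groups); in the kernel a temporary credential switch is typically done with override_creds() and revert_creds(), but the thread does not settle how this series should do it.

```c
#include <assert.h>

/* Hypothetical credential model: a uid plus a single "bypass DAC" bit. */
struct cred_model {
    unsigned int uid;
    int bypass_dac;
};

/* Hypothetical directory: owner and permission bits only (no groups). */
struct dir_model {
    unsigned int owner;
    unsigned int mode;
};

static int model_may_write(const struct cred_model *c,
                           const struct dir_model *d)
{
    if (c->bypass_dac)
        return 1;
    if (c->uid == d->owner)
        return (d->mode & 0200) != 0;
    return (d->mode & 0002) != 0;
}

/*
 * Copy up into @upper. With @raise set, credentials are raised only for
 * the duration of the copy and then restored. Returns 1 if the write
 * into @upper would be permitted.
 */
static int model_copyup(struct cred_model *cur, const struct dir_model *upper,
                        int raise)
{
    struct cred_model saved = *cur;
    int ok;

    if (raise)
        cur->bypass_dac = 1;
    ok = model_may_write(cur, upper);
    *cur = saved;               /* always restore the caller's creds */
    return ok;
}
```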
