Major changes since last version:

* Separate MS_FALLTHRU instead of piggybacking on MS_WHITEOUT
* Renumbering of ext2 flags - backwards incompatible
* Bug fix for > 2 layers
* Better mount error messages (from Miklos Szeredi)
* Rebase against 2.6.35

This branch is named "ms_fallthru" and is in the usual git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git

You will have to update your util-linux-ng and e2fsprogs to get the
new magic numbers for ext2 feature flags and mount options. You have
to throw away any existing union mount ext2 disk images and build new
ones (all two of you who have them). Branch "union_mount" of both:

git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git
git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git

-VAL

Felix Fietkau (2):
  whiteout: jffs2 whiteout support
  fallthru: jffs2 fallthru support

Jan Blunck (10):
  VFS: Make lookup_hash() return a struct path
  autofs4: Save autofs trigger's vfsmount in super block info
  whiteout/NFSD: Don't return information about whiteouts to userspace
  whiteout: Add vfs_whiteout() and whiteout inode operation
  whiteout: Set opaque flag if new directory was previously a whiteout
  whiteout: Allow removal of a directory with whiteouts
  whiteout: Split of ext2_append_link() from ext2_add_link()
  whiteout: ext2 whiteout support
  union-mount: Introduce MNT_UNION and MS_UNION flags
  union-mount: Call do_whiteout() on unlink and rmdir in unions

Valerie Aurora (26):
  VFS: Comment follow_mount() and friends
  VFS: Add read-only users count to superblock
  whiteout: tmpfs whiteout support
  fallthru: Basic fallthru definitions
  fallthru: ext2 fallthru support
  fallthru: tmpfs fallthru support
  union-mount: Union mounts documentation
  union-mount: Introduce union_dir structure and basic operations
  union-mount: Free union dirs on removal from dcache
  union-mount: Support for union mounting file systems
  union-mount: Implement union ...
From: Jan Blunck <jblunck@suse.de>

This patch adds whiteout support to EXT2. A whiteout is an empty
directory entry (inode == 0) with the file type set to EXT2_FT_WHT.
Therefore it allocates space in directories. Due to being implemented
as a filetype it is necessary to have the
EXT2_FEATURE_INCOMPAT_FILETYPE flag set.

XXX - Needs serious review. Al wonders: What happens with a delete at
the beginning of a block? Will we find the matching dentry or the
first empty space?

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
fs/ext2/dir.c | 96 +++++++++++++++++++++++++++++++++++++++++++++--
fs/ext2/ext2.h | 3 +
fs/ext2/inode.c | 11 ++++-
fs/ext2/namei.c | 63 +++++++++++++++++++++++++++++-
fs/ext2/super.c | 5 ++
include/linux/ext2_fs.h | 4 ++
6 files changed, 172 insertions(+), 10 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 57207a9..030bd46 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,7 @@ static inline int ext2_match (int len, const char * const name,
{
if (len != de->name_len)
return 0;
- if (!de->inode)
+ if (!de->inode && (de->file_type != EXT2_FT_WHT))
return 0;
return !memcmp(name, de->name, len);
}
@@ -255,6 +255,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
[EXT2_FT_FIFO] = DT_FIFO,
[EXT2_FT_SOCK] = DT_SOCK,
[EXT2_FT_SYMLINK] = DT_LNK,
+ [EXT2_FT_WHT] = DT_WHT,
};
#define S_SHIFT 12
@@ -448,6 +449,26 @@ ino_t ext2_inode_by_name(struct inode *dir, struct qstr *child)
return res;
}
+/* Special version for filetype based whiteout support */
+ino_t ext2_inode_by_dentry(struct inode *dir, struct dentry *dentry)
+{
+ ino_t res = 0;
+ struct ext2_dir_entry_2 *de;
+ struct page *page;
+
+ de = ext2_find_entry (dir, &dentry->d_name, &page);
+ if (de) {
+ res = le32_to_cpu(de->inode);
+ if (!res ...
Document design and implementation of union mounts (a.k.a. writable
overlays).

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
Documentation/filesystems/union-mounts.txt | 752 ++++++++++++++++++++++++++++
1 files changed, 752 insertions(+), 0 deletions(-)
create mode 100644 Documentation/filesystems/union-mounts.txt
diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt
new file mode 100644
index 0000000..977a2b5
--- /dev/null
+++ b/Documentation/filesystems/union-mounts.txt
@@ -0,0 +1,752 @@
+Union mounts (a.k.a. writable overlays)
+=======================================
+
+This document describes the architecture and current status of union
+mounts, also known as writable overlays.
+
+In this document:
+ - Overview of union mounts
+ - Terminology
+ - VFS implementation
+ - Locking strategy
+ - VFS/file system interface
+ - Userland interface
+ - NFS interaction
+ - Status
+ - Contributing to union mounts
+
+Overview
+========
+
+A union mount layers one read-write file system over one or more
+read-only file systems, with all writes going to the writable file
+system. The namespace of both file systems appears as a combined
+whole to userland, with files and directories on the writable file
+system covering up any files or directories with matching pathnames on
+the read-only file system. The read-write file system is the
+"topmost" or "upper" file system and the read-only file systems are
+the "lower" file systems. A few use cases:
+
+- Root file system on CD with writes saved to hard drive (LiveCD)
+- Multiple virtual machines with the same starting root file system
+- Cluster with NFS mounted root on clients
+
+Most if not all of these problems could be solved with a COW block
+device or a clustered file system (including NFS mounts). However, for
+some use cases, sharing is more efficient and better performing if
+done at the file system namespace level. COW block devices only
+increase ...
From: Jan Blunck <jblunck@suse.de>
Add a per-mountpoint flag for union mount support. You need additional
patches to util-linux-ng for this to work; see:
git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
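Note for readers (not part of the commit message): the do_mount() hunk
below can be modelled in userspace as follows. This is only a sketch.
MS_UNION is 256 as in the fs.h hunk, but the SKETCH_MNT_* values and
translate_mount_flags() are hypothetical stand-ins; the real MNT_UNION
value is set in the mount.h hunk.

```c
/* MS_RDONLY as in mainline fs.h; MS_UNION as defined by this patch. */
#define MS_RDONLY	1
#define MS_UNION	256

/* Hypothetical stand-ins; the real values live in include/linux/mount.h. */
#define SKETCH_MNT_READONLY	0x40
#define SKETCH_MNT_UNION	0x100

/*
 * Model of the do_mount() logic: derive the per-mount flags from the
 * mount(2) flags, then clear the per-mount-only bit so that it is not
 * passed down to the superblock code.
 */
static int translate_mount_flags(unsigned long *flags)
{
	int mnt_flags = 0;

	if (*flags & MS_RDONLY)
		mnt_flags |= SKETCH_MNT_READONLY;
	if (*flags & MS_UNION)
		mnt_flags |= SKETCH_MNT_UNION;

	/* MS_UNION is consumed here; MS_RDONLY stays for the superblock. */
	*flags &= ~(unsigned long)MS_UNION;
	return mnt_flags;
}
```

Calling translate_mount_flags() with MS_RDONLY | MS_UNION returns both
sketch mount flags and leaves only MS_RDONLY in the caller's flags.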
fs/namespace.c | 5 ++++-
include/linux/fs.h | 1 +
include/linux/mount.h | 4 ++--
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 984c331..f115cb6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -809,6 +809,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
{ MNT_NODIRATIME, ",nodiratime" },
{ MNT_RELATIME, ",relatime" },
{ MNT_STRICTATIME, ",strictatime" },
+ { MNT_UNION, ",union" },
{ 0, NULL }
};
const struct proc_fs_info *fs_infop;
@@ -2008,10 +2009,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
if (flags & MS_RDONLY)
mnt_flags |= MNT_READONLY;
+ if (flags & MS_UNION)
+ mnt_flags |= MNT_UNION;
flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
- MS_STRICTATIME);
+ MS_STRICTATIME | MS_UNION);
if (flags & MS_REMOUNT)
retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 71ee74e..31cfa48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -192,6 +192,7 @@ struct inodes_stat_t {
#define MS_REMOUNT 32 /* Alter flags of a mounted FS */
#define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */
#define MS_DIRSYNC 128 /* Directory modifications are synchronous */
+#define MS_UNION 256 /* Merge namespace with FS mounted below */
#define MS_NOATIME 1024 /* Do not update access times. */
#define MS_NODIRATIME 2048 /* Do not update directory access times */
#define MS_BIND 4096
diff --git ...

This patch adds the basic structures and operations of VFS-based union
mounts (but not the ability to mount or lookup unioned file systems).
Each directory in a unioned file system has an associated union stack
created when the directory is first looked up. The union stack is a
union_dir structure kept in a hash table indexed by mount and dentry
of the directory; thus, specific paths are unioned, not dentries
alone. The union_dir keeps a pointer to the upper path and the lower
path and can be looked up by either path. Currently only two layers
are supported, but the union_dir struct is flexible enough to allow
more than two layers.

This particular version of union mounts is based on ideas by Jan
Blunck, Bharata B. Rao, and many others.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/Kconfig | 13 +++++
fs/Makefile | 1 +
fs/dcache.c | 3 +
fs/union.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/union.h | 66 ++++++++++++++++++++++++++
include/linux/dcache.h | 5 ++-
6 files changed, 206 insertions(+), 1 deletions(-)
create mode 100644 fs/union.c
create mode 100644 fs/union.h
diff --git a/fs/Kconfig b/fs/Kconfig
index 5f85b59..47409c9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,6 +59,19 @@ source "fs/notify/Kconfig"
source "fs/quota/Kconfig"
+config UNION_MOUNT
+ bool "Union mounts (writable overlays) (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ Union mounts allow you to mount a transparent writable
+ layer over a read-only file system, for example, an ext3
+ partition on a hard drive over a CD-ROM root file system
+ image.
+
+ See <file:Documentation/filesystems/union-mounts.txt> for details.
+
+ If unsure, say N.
+
source "fs/autofs/Kconfig"
source "fs/autofs4/Kconfig"
source "fs/fuse/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index e6ec1d3..936acf0 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ ...
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index e7b02fa..5b22cc5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2942,16 +2942,18 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
{
struct dentry *new_dentry;
struct nameidata nd;
+ struct nameidata old_nd;
struct path old_path;
int error;
char *to;
+ char *from;
if ((flags & ~AT_SYMLINK_FOLLOW) != 0)
return -EINVAL;
- error = user_path_at(olddfd, oldname,
+ error = user_path_nd(olddfd, oldname,
flags & AT_SYMLINK_FOLLOW ? LOOKUP_FOLLOW : 0,
- &old_path);
+ &old_nd, &old_path, &from);
if (error)
return error;
@@ -2959,8 +2961,20 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
if (error)
goto out;
error = -EXDEV;
- if (old_path.mnt != nd.path.mnt)
- goto out_release;
+ if (old_path.mnt != nd.path.mnt) {
+ if (IS_DIR_UNIONED(old_nd.path.dentry) &&
+ (old_nd.path.mnt == nd.path.mnt)) {
+ error = mnt_want_write(old_nd.path.mnt);
+ if (error)
+ goto out_release;
+ error = union_copyup(&old_nd, &old_path);
+ mnt_drop_write(old_nd.path.mnt);
+ if (error)
+ goto out_release;
+ } else {
+ goto out_release;
+ }
+ }
new_dentry = lookup_create(&nd, 0);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
@@ -2983,6 +2997,8 @@ out_release:
putname(to);
out:
path_put(&old_path);
+ path_put(&old_nd.path);
+ putname(from);
return error;
}
--
1.6.3.3
--
Copy up a file when opened with write permissions. Does not copy up
the file data when O_TRUNC is specified.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
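Note for readers (not part of the commit message): the O_TRUNC rule
above amounts to choosing how much file data to copy. A minimal
userspace sketch, assuming that union_copyup() copies the whole file
and that union_copyup_len() with length 0 copies no data;
copyup_bytes_for_open() is a hypothetical helper, not a kernel
function.

```c
#include <fcntl.h>

/*
 * Model of the choice made in open_union_copyup(): how many bytes of
 * the lower-layer file need to be copied to the topmost layer.
 */
static long long copyup_bytes_for_open(int open_flag, long long lower_size)
{
	/* O_TRUNC throws the old contents away, so copy no file data. */
	if (open_flag & O_TRUNC)
		return 0;
	return lower_size;
}
```

The point of the special case is to avoid copying data that the open
itself is about to discard.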
fs/namei.c | 28 ++++++++++++++++++++++++++++
1 files changed, 28 insertions(+), 0 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 67ebf4a..88d1a79 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1911,6 +1911,24 @@ exit:
return ERR_PTR(error);
}
+static int open_union_copyup(struct nameidata *nd, struct path *path,
+ int open_flag)
+{
+ struct vfsmount *oldmnt = path->mnt;
+ int error;
+
+ if (open_flag & O_TRUNC)
+ error = union_copyup_len(nd, path, 0);
+ else
+ error = union_copyup(nd, path);
+ if (error)
+ return error;
+ if (oldmnt != path->mnt)
+ mntput(nd->path.mnt);
+
+ return error;
+}
+
static struct file *do_last(struct nameidata *nd, struct path *path,
int open_flag, int acc_mode,
int mode, const char *pathname)
@@ -1962,6 +1980,11 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
if (!path->dentry->d_inode->i_op->lookup)
goto exit_dput;
}
+ if (acc_mode & MAY_WRITE) {
+ error = open_union_copyup(nd, path, open_flag);
+ if (error)
+ goto exit_dput;
+ }
path_to_nameidata(path, nd);
audit_inode(pathname, nd->path.dentry);
goto ok;
@@ -2033,6 +2056,11 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
if (path->dentry->d_inode->i_op->follow_link)
return NULL;
+ if (acc_mode & MAY_WRITE) {
+ error = open_union_copyup(nd, path, open_flag);
+ if (error)
+ goto exit_dput;
+ }
path_to_nameidata(path, nd);
error = -EISDIR;
if (S_ISDIR(path->dentry->d_inode->i_mode))
--
1.6.3.3
--
On rename() of a file on a union mount, copy up and white out the source
file. Both are done under the rename mutex. I believe this is
actually atomic.
XXX - May not need to do file copyup under the lock.
XXX - Convert newly empty unioned dirs to not-unioned
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
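Note for readers (not part of the commit message): a toy userspace
model of "copyup and whiteout the source" on a two-layer union. All
types and functions here are hypothetical and only illustrate the
namespace effect; none of the locking discussed above is modelled, and
the sketch always leaves a whiteout even when the lower layer has no
entry to hide.

```c
#include <string.h>

#define MAX_ENTRIES 16

enum entry_type { ENT_FILE, ENT_WHITEOUT };

struct entry {
	char name[32];
	enum entry_type type;
};

struct layer {
	struct entry ents[MAX_ENTRIES];
	int count;
};

static struct entry *layer_find(struct layer *l, const char *name)
{
	for (int i = 0; i < l->count; i++)
		if (strcmp(l->ents[i].name, name) == 0)
			return &l->ents[i];
	return NULL;
}

/* Add or overwrite an entry; returns 0 on success, -1 if the layer is full. */
static int layer_set(struct layer *l, const char *name, enum entry_type type)
{
	struct entry *e = layer_find(l, name);

	if (!e) {
		if (l->count == MAX_ENTRIES)
			return -1;
		e = &l->ents[l->count++];
		strncpy(e->name, name, sizeof(e->name) - 1);
		e->name[sizeof(e->name) - 1] = '\0';
	}
	e->type = type;
	return 0;
}

/* Returns 1 if the name is visible through the union, 0 otherwise. */
static int union_visible(struct layer *top, struct layer *lower,
			 const char *name)
{
	struct entry *e = layer_find(top, name);

	if (e)
		return e->type == ENT_FILE;	/* a whiteout hides the lower file */
	return layer_find(lower, name) != NULL;
}

/* Rename in the union: copy up under the new name, white out the old one. */
static int union_rename(struct layer *top, struct layer *lower,
			const char *from, const char *to)
{
	if (!union_visible(top, lower, from))
		return -1;
	if (layer_set(top, to, ENT_FILE))
		return -1;
	return layer_set(top, from, ENT_WHITEOUT);
}
```

The lower layer is never modified: after union_rename() the old name
still exists there but is hidden by the whiteout in the top layer.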
fs/namei.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 70 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 5b22cc5..67ebf4a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3159,6 +3159,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
{
struct dentry *old_dir, *new_dir;
struct path old, new;
+ struct path to_whiteout = {NULL, NULL};
struct dentry *trap;
struct nameidata oldnd, newnd;
char *from;
@@ -3174,13 +3175,9 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
goto exit1;
error = -EXDEV;
+ /* Union mounts will pass below test - dirs always on topmost */
if (oldnd.path.mnt != newnd.path.mnt)
goto exit2;
- /* Rename on union mounts not implemented yet */
- /* XXX much harsher check than necessary - can do some renames */
- if (IS_DIR_UNIONED(oldnd.path.dentry) ||
- IS_DIR_UNIONED(newnd.path.dentry))
- goto exit2;
old_dir = oldnd.path.dentry;
error = -EBUSY;
if (oldnd.last_type != LAST_NORM)
@@ -3203,7 +3200,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
error = -ENOENT;
if (!old.dentry->d_inode)
goto exit4;
- /* unless the source is a directory trailing slashes give -ENOTDIR */
+ /* unless the source is a directory, trailing slashes give -ENOTDIR */
if (!S_ISDIR(old.dentry->d_inode->i_mode)) {
error = -ENOTDIR;
if (oldnd.last.name[oldnd.last.len])
@@ -3215,6 +3212,11 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
error = -EINVAL;
if (old.dentry == trap)
goto exit4;
+ error = -EXDEV;
+ /* Can't rename a directory from a lower layer ...

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/open.c | 23 ++++++++++++++++++++---
1 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index fc56da0..8588b31 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -552,18 +552,35 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
SYSCALL_DEFINE3(chown, const char __user *, filename, uid_t, user, gid_t, group)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
+ char *tmp;
int error;
- error = user_path(filename, &path);
+ error = user_path_nd(AT_FDCWD, filename, LOOKUP_FOLLOW,
+ &nd, &path, &tmp);
if (error)
goto out;
- error = mnt_want_write(path.mnt);
+
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
if (error)
goto out_release;
+
+ error = union_copyup(&nd, &path);
+ if (error)
+ goto out_drop_write;
error = chown_common(&path, user, group);
- mnt_drop_write(path.mnt);
+out_drop_write:
+ mnt_drop_write(mnt);
out_release:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
out:
return error;
}
--
1.6.3.3
--
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/open.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 8588b31..e4fc8e5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -64,14 +64,17 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
static long do_sys_truncate(const char __user *pathname, loff_t length)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
struct inode *inode;
+ char *tmp;
int error;
error = -EINVAL;
if (length < 0) /* sorry, but loff_t says... */
goto out;
- error = user_path(pathname, &path);
+ error = user_path_nd(AT_FDCWD, pathname, 0, &nd, &path, &tmp);
if (error)
goto out;
inode = path.dentry->d_inode;
@@ -85,11 +88,16 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
if (!S_ISREG(inode->i_mode))
goto dput_and_out;
- error = mnt_want_write(path.mnt);
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
if (error)
goto dput_and_out;
- error = inode_permission(inode, MAY_WRITE);
+ error = path_permission(&path, &nd.path, MAY_WRITE);
if (error)
goto mnt_drop_write_and_out;
@@ -97,6 +105,12 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
if (IS_APPEND(inode))
goto mnt_drop_write_and_out;
+ error = union_copyup_len(&nd, &path, length);
+ if (error)
+ goto mnt_drop_write_and_out;
+
+ /* path may have changed after copyup */
+ inode = path.dentry->d_inode;
error = get_write_access(inode);
if (error)
goto mnt_drop_write_and_out;
@@ -118,9 +132,11 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
put_write_and_out:
put_write_access(inode);
mnt_drop_write_and_out:
- mnt_drop_write(path.mnt);
+ mnt_drop_write(mnt);
dput_and_out:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
out:
...

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/utimes.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/utimes.c b/fs/utimes.c
index e4c75db..e83b6bd 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -8,8 +8,10 @@
#include <linux/stat.h>
#include <linux/utime.h>
#include <linux/syscalls.h>
+#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
+#include "union.h"
#ifdef __ARCH_WANT_SYS_UTIME
@@ -152,18 +154,26 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags
error = utimes_common(&file->f_path, times);
fput(file);
} else {
+ struct nameidata nd;
+ char *tmp;
struct path path;
int lookup_flags = 0;
if (!(flags & AT_SYMLINK_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
- error = user_path_at(dfd, filename, lookup_flags, &path);
+ error = user_path_nd(dfd, filename, lookup_flags, &nd, &path,
+ &tmp);
if (error)
goto out;
- error = utimes_common(&path, times);
+ error = union_copyup(&nd, &path);
+
+ if (!error)
+ error = utimes_common(&path, times);
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
}
out:
--
1.6.3.3
--
Split inode_permission() into inode and file-system-dependent parts.
Create path_permission() to check permission based on the path to the
inode. This is for union mounts, in which an inode can be located on
a read-only lower layer file system but is still writable, since we
will copy it up to the writable top layer file system. So in that
case, we want to ignore MS_RDONLY on the lower layer. To make this
decision, we must know the path (vfsmount, dentry) of both the target
and its parent.
XXX - so ugly!
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
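Note for readers (not part of the commit message): the decision
described above can be sketched in userspace as below. The structs and
sketch_path_write_check() are hypothetical simplifications; the sketch
only mirrors the "use the parent's mount if the parent directory is
unioned" pattern that the callers in this series use, and is not the
body of path_permission().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's vfsmount and path. */
struct sketch_mount {
	bool read_only;
};

struct sketch_path {
	struct sketch_mount *mnt;
	bool dir_is_unioned;	/* meaningful for the parent path only */
};

/*
 * Decide whether a write to @target must fail with EROFS.  If the
 * parent directory is unioned, the target would be copied up into the
 * parent's (topmost) mount, so that mount's read-only state is what
 * matters, not the lower layer's.
 */
static int sketch_path_write_check(const struct sketch_path *target,
				   const struct sketch_path *parent)
{
	const struct sketch_mount *mnt;

	mnt = parent->dir_is_unioned ? parent->mnt : target->mnt;
	return mnt->read_only ? -EROFS : 0;
}
```

With a writable top layer and a read-only lower layer, the check
passes for a unioned parent and returns -EROFS otherwise.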
fs/namei.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--------
include/linux/fs.h | 1 +
2 files changed, 79 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 2d30a5b..74d6852 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -241,29 +241,20 @@ int generic_permission(struct inode *inode, int mask,
}
/**
- * inode_permission - check for access rights to a given inode
+ * __inode_permission - check for access rights to a given inode
* @inode: inode to check permission on
* @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
*
* Used to check for read/write/execute permissions on an inode.
- * We use "fsuid" for this, letting us set arbitrary permissions
- * for filesystem access without changing the "normal" uids which
- * are used for other things.
+ *
+ * This does not check for a read-only file system. You probably want
+ * inode_permission().
*/
-int inode_permission(struct inode *inode, int mask)
+static int __inode_permission(struct inode *inode, int mask)
{
int retval;
if (mask & MAY_WRITE) {
- umode_t mode = inode->i_mode;
-
- /*
- * Nobody gets write access to a read-only fs.
- */
- if (IS_RDONLY(inode) &&
- (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
- return -EROFS;
-
/*
* Nobody gets write access to an immutable file.
*/
@@ -288,6 +279,79 @@ int inode_permission(struct inode ...

For union mounts, an access check on a file located on the lower layer
incorrectly returns EROFS. To fix this, use the new path_permission()
call, which ignores a read-only lower layer file system if the target
will be copied up to the topmost file system.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/open.c | 21 +++++++++++++++++----
1 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 5463266..fc56da0 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -31,6 +31,7 @@
#include <linux/ima.h>
#include "internal.h"
+#include "union.h"
int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
struct file *filp)
@@ -288,7 +289,10 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
const struct cred *old_cred;
struct cred *override_cred;
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
struct inode *inode;
+ char *tmp;
int res;
if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */
@@ -312,10 +316,17 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
old_cred = override_creds(override_cred);
- res = user_path_at(dfd, filename, LOOKUP_FOLLOW, &path);
+ res = user_path_nd(dfd, filename, LOOKUP_FOLLOW,
+ &nd, &path, &tmp);
if (res)
goto out;
+ /* For union mounts, use the topmost mnt's permissions */
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
inode = path.dentry->d_inode;
if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
@@ -324,11 +335,11 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
* with the "noexec" flag.
*/
res = -EACCES;
- if (path.mnt->mnt_flags & MNT_NOEXEC)
+ if (mnt->mnt_flags & MNT_NOEXEC)
goto out_path_release;
}
- res = inode_permission(inode, mode | MAY_ACCESS);
+ res = path_permission(&path, &nd.path, mode | MAY_ACCESS);
/* SuS v2 requires ...

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/xattr.c | 31 +++++++++++++++++++++++++------
1 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/fs/xattr.c b/fs/xattr.c
index 7869788..67815eb 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -320,17 +320,36 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
size_t, size, int, flags)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
+ char *tmp;
int error;
- error = user_lpath(pathname, &path);
+ error = user_path_nd(AT_FDCWD, pathname, 0, &nd, &path, &tmp);
if (error)
return error;
- error = mnt_want_write(path.mnt);
- if (!error) {
- error = setxattr(path.dentry, name, value, size, flags);
- mnt_drop_write(path.mnt);
- }
+
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
+ if (error)
+ goto out;
+
+ error = union_copyup(&nd, &path);
+ if (error)
+ goto out_drop_write;
+
+ error = setxattr(path.dentry, name, value, size, flags);
+
+out_drop_write:
+ mnt_drop_write(mnt);
+out:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
return error;
}
--
1.6.3.3
--
When a file on the read-only layer of a union mount is altered, it
must be copied up to the topmost read-write layer. This patch creates
union_copyup() and its supporting routines.

Thanks to Valdis Kletnieks for a bug fix.

XXX - Miklos Szeredi points out: What happens if we crash halfway
through the file copyup? Answer: A bug, the file is truncated. Needs
fixing.

Cc: Valdis.Kletnieks@vt.edu
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/union.c | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/union.h | 7 +-
2 files changed, 330 insertions(+), 1 deletions(-)
diff --git a/fs/union.c b/fs/union.c
index 917248d..ee358f9 100644
--- a/fs/union.c
+++ b/fs/union.c
@@ -24,6 +24,8 @@
#include <linux/namei.h>
#include <linux/file.h>
#include <linux/security.h>
+#include <linux/splice.h>
+#include <linux/xattr.h>
#include "union.h"
@@ -191,6 +193,72 @@ int needs_lookup_union(struct path *parent_path, struct path *path)
return 1;
}
+/**
+ * union_copyup_xattr
+ *
+ * @old: dentry of original file
+ * @new: dentry of new copy
+ *
+ * Copy up extended attributes from the original file to the new one.
+ *
+ * XXX - Permissions? For now, copying up every xattr.
+ */
+
+static int union_copyup_xattr(struct dentry *old, struct dentry *new)
+{
+ ssize_t list_size, size;
+ char *buf, *name, *value;
+ int error;
+
+ /* Check for xattr support */
+ if (!old->d_inode->i_op->getxattr ||
+ !new->d_inode->i_op->getxattr)
+ return 0;
+
+ /* Find out how big the list of xattrs is */
+ list_size = vfs_listxattr(old, NULL, 0);
+ if (list_size <= 0)
+ return list_size;
+
+ /* Allocate memory for the list */
+ buf = kzalloc(list_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Allocate memory for the xattr's value */
+ error = -ENOMEM;
+ value = kmalloc(XATTR_SIZE_MAX, GFP_KERNEL);
+ if (!value)
+ goto out;
+
+ /* Actually get the list of xattrs */
+ list_size = vfs_listxattr(old, buf, ...
Proof-of-concept implementation of user_path_nd(). Look up both the
parent and the target of a user-supplied filename, so that both can be
passed to the union copyup routines later.
XXX - Inefficient, racy, gets the parent of the symlink instead of the
parent of the target. Al Viro would like to see something more like
this:
user_path_mumble() looks up and returns:
parent nameidata
positive topmost dentry of target
negative dentry of target from the topmost layer (if it doesn't exist on top)
Both the positive lower dentry and negative topmost dentry are passed
to the following code, like do_chown(). The tests for permissions and
such-like are performed on the positive lower dentry. When it comes
time to actually modify the target, we call union_copyup() with both
positive and negative dentries (and the parent nameidata).
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 31 +++++++++++++++++++++++++++++++
include/linux/namei.h | 2 ++
2 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 74d6852..e7b02fa 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1546,6 +1546,37 @@ static int user_path_parent(int dfd, const char __user *path,
return error;
}
+int user_path_nd(int dfd, const char __user *filename,
+ unsigned flags, struct nameidata *parent_nd,
+ struct path *child, char **tmp)
+{
+ struct nameidata child_nd;
+ char *s = getname(filename);
+ int error;
+
+ if (IS_ERR(s))
+ return PTR_ERR(s);
+
+ /* Lookup parent */
+ error = do_path_lookup(dfd, s, LOOKUP_PARENT, parent_nd);
+ if (error)
+ goto out_putname;
+
+ /* Lookup child - XXX optimize, racy */
+ error = do_path_lookup(dfd, s, flags, &child_nd);
+ if (error)
+ goto out_path_put;
+ *child = child_nd.path;
+ *tmp = s;
+ return 0;
+
+out_path_put:
+ path_put(&parent_nd->path);
+out_putname:
+ putname(s);
+ return error;
+}
+
/*
* It's inline, so penalty for filesystems that don't use sticky bit is
* ...

Implement unioned directories, whiteouts, and fallthrus in pathname
lookup routines. do_lookup() and lookup_hash() call lookup_union()
after looking up the dentry from the top-level file system.
lookup_union() is centered around __lookup_hash(), which does cached
and/or real lookups and revalidates each dentry in the union stack.

XXX - implement negative union cache entries
XXX - handle different permissions on directories

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/union.c | 94 ++++++++++++++++++++++++++++++++
fs/union.h | 7 +++
3 files changed, 274 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 0b6378e..0821544 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@
#include <asm/uaccess.h>
#include "internal.h"
+#include "union.h"
/* [Feb-1997 T. Schoebel-Theuer]
* Fundamental changes in the pathname lookup mechanisms (namei)
@@ -723,6 +724,163 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
follow_mount(&nd->path);
}
+static struct dentry *__lookup_hash(struct qstr *name, struct dentry *base,
+ struct nameidata *nd);
+
+/*
+ * __lookup_union - Given a path from the topmost layer, lookup and
+ * revalidate each dentry in its union stack, building it if necessary
+ *
+ * @nd - nameidata for the parent of @topmost
+ * @name - pathname from this element on
+ * @topmost - path of the topmost matching dentry
+ *
+ * Given the nameidata and the path of the topmost dentry for this
+ * pathname, lookup, revalidate, and build the associated union stack.
+ * @topmost must be either a negative dentry or a directory, and not a
+ * whiteout.
+ *
+ * This function may stomp nd->path with the path of the parent
+ * directory of lower layer, so the caller must save nd->path and
+ * restore it afterwards. You probably want to use lookup_union(),
+ * not __lookup_union().
+ */
+
+static int ...
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/xattr.c | 34 +++++++++++++++++++++++++++-------
1 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/fs/xattr.c b/fs/xattr.c
index 01bb813..7869788 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -19,7 +19,7 @@
#include <linux/fsnotify.h>
#include <linux/audit.h>
#include <asm/uaccess.h>
-
+#include "union.h"
/*
* Check permissions for extended attribute access. This is a bit complicated
@@ -281,17 +281,37 @@ SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
size_t, size, int, flags)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
+ char *tmp;
int error;
- error = user_path(pathname, &path);
+ error = user_path_nd(AT_FDCWD, pathname, LOOKUP_FOLLOW, &nd, &path,
+ &tmp);
if (error)
return error;
- error = mnt_want_write(path.mnt);
- if (!error) {
- error = setxattr(path.dentry, name, value, size, flags);
- mnt_drop_write(path.mnt);
- }
+
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
+ if (error)
+ goto out;
+
+ error = union_copyup(&nd, &path);
+ if (error)
+ goto out_drop_write;
+
+ error = setxattr(path.dentry, name, value, size, flags);
+
+out_drop_write:
+ mnt_drop_write(mnt);
+out:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
return error;
}
--
1.6.3.3
--
From: Jan Blunck <jblunck@suse.de>
Call do_whiteout() when removing files and directories from a union
mounted file system.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 0821544..2d30a5b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2596,6 +2596,10 @@ static long do_rmdir(int dfd, const char __user *pathname)
error = security_path_rmdir(&nd.path, path.dentry);
if (error)
goto exit4;
+ if (IS_DIR_UNIONED(nd.path.dentry)) {
+ error = do_whiteout(&nd, &path, 1);
+ goto exit4;
+ }
error = vfs_rmdir(nd.path.dentry->d_inode, path.dentry);
exit4:
mnt_drop_write(nd.path.mnt);
@@ -2685,6 +2689,10 @@ static long do_unlinkat(int dfd, const char __user *pathname)
error = security_path_unlink(&nd.path, path.dentry);
if (error)
goto exit3;
+ if (IS_DIR_UNIONED(nd.path.dentry)) {
+ error = do_whiteout(&nd, &path, 0);
+ goto exit3;
+ }
error = vfs_unlink(nd.path.dentry->d_inode, path.dentry);
exit3:
mnt_drop_write(nd.path.mnt);
--
1.6.3.3
--
If a dentry is removed from the dentry cache because its usage count
drops to zero, the union_dirs in its union stack are freed too.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/dcache.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index c7b6e67..4fe51a9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -34,6 +34,7 @@
#include <linux/fs_struct.h>
#include <linux/hardirq.h>
#include "internal.h"
+#include "union.h"
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
@@ -175,6 +176,7 @@ static struct dentry *d_kill(struct dentry *dentry)
dentry_stat.nr_dentry--; /* For d_free, below */
/*drops the locks, at that point nobody can reach this dentry */
dentry_iput(dentry);
+ d_free_unions(dentry);
if (IS_ROOT(dentry))
parent = NULL;
else
@@ -695,6 +697,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
iput(inode);
}
+ d_free_unions(dentry);
d_free(dentry);
/* finished when we fall off the top of the tree,
@@ -1535,6 +1538,7 @@ void d_delete(struct dentry * dentry)
if (atomic_read(&dentry->d_count) == 1) {
dentry->d_flags &= ~DCACHE_CANT_MOUNT;
dentry_iput(dentry);
+ d_free_unions(dentry);
fsnotify_nameremove(dentry, isdir);
return;
}
@@ -1545,6 +1549,13 @@ void d_delete(struct dentry * dentry)
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
+ /*
+ * Remove any associated unions. While someone still has this
+ * directory open (ref count > 0), we could not have deleted
+ * it unless it was empty, and therefore has no references to
+ * directories below it. So we don't need the unions.
+ */
+ d_free_unions(dentry);
fsnotify_nameremove(dentry, isdir);
}
EXPORT_SYMBOL(d_delete);
--
1.6.3.3
--
Create and tear down union mount structures on mount. Check
requirements for union mounts. This version clones the read-only
mounts as one big tree and points to them from the superblock of the
topmost layer file system.
Thanks to Felix Fietkau <nbd@openwrt.org> for a bug fix and Miklos
Szeredi <miklos@szeredi.hu> for better mount error messages.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namespace.c | 255 ++++++++++++++++++++++++++++++++++++++++++++++++-
fs/super.c | 1 +
include/linux/fs.h | 7 ++
include/linux/mount.h | 2 +
4 files changed, 263 insertions(+), 2 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index f115cb6..aa6a132 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -33,6 +33,7 @@
#include <asm/unistd.h>
#include "pnode.h"
#include "internal.h"
+#include "union.h"
#define HASH_SHIFT ilog2(PAGE_SIZE / sizeof(struct list_head))
#define HASH_SIZE (1UL << HASH_SHIFT)
@@ -1050,6 +1051,7 @@ void umount_tree(struct vfsmount *mnt, int propagate, struct list_head *kill)
propagate_umount(kill);
list_for_each_entry(p, kill, mnt_hash) {
+ d_free_unions(p->mnt_root);
list_del_init(&p->mnt_expire);
list_del_init(&p->mnt_list);
__touch_mnt_namespace(p->mnt_ns);
@@ -1333,6 +1335,217 @@ static int invent_group_ids(struct vfsmount *mnt, bool recurse)
return 0;
}
+/**
+ * check_mnt_union - mount-time checks for union mount
+ *
+ * @mntpnt: path of the mountpoint the new mount will be on
+ * @topmost_mnt: vfsmount of the new file system to be mounted
+ * @mnt_flags: mount flags for the new file system
+ *
+ * Mount-time check of upper and lower layer file systems to see if we
+ * can union mount one on the other.
+ *
+ * The rules:
+ *
+ * Lower layer(s) and submounts read-only: We can't deal with
+ * namespace changes in the lower layers of a union, so the lower
+ * layer must be read-only. Note that we could possibly convert a
+ * read-write unioned mount into a ...
From: Felix Fietkau <nbd@openwrt.org>
Add support for whiteout dentries to jffs2.
XXX - David Woodhouse suggests several changes and provides an
untested patch. See:
http://patchwork.ozlabs.org/patch/50466/
XXX - Backward compatibility? Creating a whiteout on a JFFS2 file
system can only happen if it is deliberately mounted "-o union" so
there is some way to prevent creation of whiteouts on a file system
you want to later mount with an earlier (no support for whiteout) file
system. However, ext2/3 has much more robust methods (explicit fs
feature flag) to prevent such an occurrence.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
---
fs/jffs2/dir.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++-
fs/jffs2/fs.c | 4 +++
fs/jffs2/super.c | 2 +-
include/linux/jffs2.h | 2 +
4 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 166062a..4798586 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -34,6 +34,8 @@ static int jffs2_mknod (struct inode *,struct dentry *,int,dev_t);
static int jffs2_rename (struct inode *, struct dentry *,
struct inode *, struct dentry *);
+static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+
const struct file_operations jffs2_dir_operations =
{
.read = generic_read_dir,
@@ -56,6 +58,7 @@ const struct inode_operations jffs2_dir_inode_operations =
.mknod = jffs2_mknod,
.rename = jffs2_rename,
.check_acl = jffs2_check_acl,
+ .whiteout = jffs2_whiteout,
.setattr = jffs2_setattr,
.setxattr = jffs2_setxattr,
.getxattr = jffs2_getxattr,
@@ -98,8 +101,14 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
fd = fd_list;
}
}
- if (fd)
- ino = fd->ino;
+ if (fd) {
+ spin_lock(&target->d_lock);
+ if (fd->type == ...
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/open.c | 23 ++++++++++++++++++++---
1 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 5c9933f..693258f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -646,18 +646,35 @@ out:
SYSCALL_DEFINE3(lchown, const char __user *, filename, uid_t, user, gid_t, group)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
+ char *tmp;
int error;
- error = user_lpath(filename, &path);
+ error = user_path_nd(AT_FDCWD, filename, 0, &nd, &path, &tmp);
if (error)
goto out;
- error = mnt_want_write(path.mnt);
+
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
if (error)
goto out_release;
+
+ error = union_copyup(&nd, &path);
+ if (error)
+ goto out_drop_write;
+
error = chown_common(&path, user, group);
- mnt_drop_write(path.mnt);
+out_drop_write:
+ mnt_drop_write(mnt);
out_release:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
out:
return error;
}
--
1.6.3.3
--
Add support for fallthru directory entries to tmpfs
XXX - Makes up inode number for dirent
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/dcache.c | 3 +-
fs/libfs.c | 21 ++++++++++++++++--
mm/shmem.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 76 insertions(+), 12 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 249d077..2cd367a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2240,7 +2240,8 @@ resume:
* we can evict it.
*/
if (d_unhashed(dentry)||(!dentry->d_inode &&
- !d_is_whiteout(dentry)))
+ !d_is_whiteout(dentry) &&
+ !d_is_fallthru(dentry)))
continue;
if (!list_empty(&dentry->d_subdirs)) {
this_parent = dentry;
diff --git a/fs/libfs.c b/fs/libfs.c
index dcaf972..20e9a49 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -130,6 +130,7 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
struct dentry *cursor = filp->private_data;
struct list_head *p, *q = &cursor->d_u.d_child;
ino_t ino;
+ int d_type;
int i = filp->f_pos;
switch (i) {
@@ -155,14 +156,28 @@ int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
struct dentry *next;
next = list_entry(p, struct dentry, d_u.d_child);
- if (d_unhashed(next) || !next->d_inode)
+ if (d_unhashed(next) || (!next->d_inode && !d_is_fallthru(next)))
continue;
+ if (d_is_fallthru(next)) {
+ /* XXX We don't know the inode
+ * number of the directory
+ * entry in the underlying
+ * file system. Should look
+ * it up, either on fallthru
+ * creation at first readdir
+ * or now at filldir time. */
+ ino = 123; /* Made up ino */
+ d_type = DT_UNKNOWN;
+ } else {
+ ino = next->d_inode->i_ino;
+ d_type = dt_type(next->d_inode);
+ }
+
spin_unlock(&dcache_lock);
if (filldir(dirent, next->d_name.name,
...
From: Felix Fietkau <nbd@openwrt.org>
Add support for fallthru dentries to jffs2.
XXX - See comment on jffs2 whiteout commit about backwards
compatibility concerns.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/jffs2/dir.c | 36 +++++++++++++++++++++++++++++++++---
include/linux/jffs2.h | 6 ++++++
2 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 4798586..244a642 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -35,6 +35,7 @@ static int jffs2_rename (struct inode *, struct dentry *,
struct inode *, struct dentry *);
static int jffs2_whiteout (struct inode *, struct dentry *, struct dentry *);
+static int jffs2_fallthru (struct inode *, struct dentry *);
const struct file_operations jffs2_dir_operations =
{
@@ -59,6 +60,7 @@ const struct inode_operations jffs2_dir_inode_operations =
.rename = jffs2_rename,
.check_acl = jffs2_check_acl,
.whiteout = jffs2_whiteout,
+ .fallthru = jffs2_fallthru,
.setattr = jffs2_setattr,
.setxattr = jffs2_setxattr,
.getxattr = jffs2_getxattr,
@@ -103,10 +105,14 @@ static struct dentry *jffs2_lookup(struct inode *dir_i, struct dentry *target,
}
if (fd) {
spin_lock(&target->d_lock);
- if (fd->type == DT_WHT)
+ switch (fd->type) {
+ case DT_WHT:
target->d_flags |= DCACHE_WHITEOUT;
- else
+ case JFFS2_DT_FALLTHRU:
+ target->d_flags |= DCACHE_FALLTHRU;
+ default:
ino = fd->ino;
+ }
spin_unlock(&target->d_lock);
}
mutex_unlock(&dir_f->sem);
@@ -164,7 +170,10 @@ static int jffs2_readdir(struct file *filp, void *dirent, filldir_t filldir)
fd->name, fd->ino, fd->type, curofs, offset));
continue;
}
- if (!fd->ino) {
+ if (fd->type == JFFS2_DT_FALLTHRU)
+ /* XXX Should really do a lookup for the real inode number here */
+ fd->ino = 100;
+ else ...
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/open.c | 25 +++++++++++++++++++++----
1 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index e4fc8e5..5c9933f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -503,18 +503,32 @@ out:
SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, mode_t, mode)
{
struct path path;
+ struct nameidata nd;
+ struct vfsmount *mnt;
struct inode *inode;
+ char *tmp;
int error;
struct iattr newattrs;
- error = user_path_at(dfd, filename, LOOKUP_FOLLOW, &path);
+ error = user_path_nd(dfd, filename, LOOKUP_FOLLOW, &nd,
+ &path, &tmp);
if (error)
goto out;
- inode = path.dentry->d_inode;
- error = mnt_want_write(path.mnt);
+ if (IS_DIR_UNIONED(nd.path.dentry))
+ mnt = nd.path.mnt;
+ else
+ mnt = path.mnt;
+
+ error = mnt_want_write(mnt);
if (error)
goto dput_and_out;
+
+ error = union_copyup(&nd, &path);
+ if (error)
+ goto mnt_drop_write_and_out;
+
+ inode = path.dentry->d_inode;
mutex_lock(&inode->i_mutex);
error = security_path_chmod(path.dentry, path.mnt, mode);
if (error)
@@ -526,9 +540,12 @@ SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, mode_t, mode)
error = notify_change(path.dentry, &newattrs);
out_unlock:
mutex_unlock(&inode->i_mutex);
- mnt_drop_write(path.mnt);
+mnt_drop_write_and_out:
+ mnt_drop_write(mnt);
dput_and_out:
path_put(&path);
+ path_put(&nd.path);
+ putname(tmp);
out:
return error;
}
--
1.6.3.3
--
Add support for fallthru directory entries to ext2.
XXX What to do for d_ino for fallthrus? If we return the inode from
the underlying file system, it comes from a different inode
"namespace" and that will produce spurious matches. This argues for
implementation of fallthrus as symlinks because they have to allocate
an inode (and inode number) anyway, and we can later reuse it if we
copy the file up.
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Jan Blunck <jblunck@suse.de>
---
fs/ext2/dir.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--
fs/ext2/ext2.h | 1 +
fs/ext2/namei.c | 22 +++++++++++
fs/ext2/super.c | 2 +
include/linux/ext2_fs.h | 4 ++
5 files changed, 117 insertions(+), 4 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 030bd46..f3b4aff 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -219,7 +219,8 @@ static inline int ext2_match (int len, const char * const name,
{
if (len != de->name_len)
return 0;
- if (!de->inode && (de->file_type != EXT2_FT_WHT))
+ if (!de->inode && ((de->file_type != EXT2_FT_WHT) &&
+ (de->file_type != EXT2_FT_FALLTHRU)))
return 0;
return !memcmp(name, de->name, len);
}
@@ -256,6 +257,7 @@ static unsigned char ext2_filetype_table[EXT2_FT_MAX] = {
[EXT2_FT_SOCK] = DT_SOCK,
[EXT2_FT_SYMLINK] = DT_LNK,
[EXT2_FT_WHT] = DT_WHT,
+ [EXT2_FT_FALLTHRU] = DT_UNKNOWN,
};
#define S_SHIFT 12
@@ -342,6 +344,24 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
ext2_put_page(page);
return 0;
}
+ } else if (de->file_type == EXT2_FT_FALLTHRU) {
+ int over;
+ unsigned char d_type = DT_UNKNOWN;
+
+ offset = (char *)de - kaddr;
+ /* XXX We don't know the inode number
+ * of the directory entry in the
+ * underlying file system. Should
+ * look it up, either on fallthru
+ * creation at ...
I don't think it makes sense to use "123" for the inode number. This is
a valid inode number, and almost certainly one that will be in use in
most filesystems. One option for extN is to use EXT2_BAD_INO (1).
Cheers, Andreas
--
The next version (Subject: Union mounts - return d_ino from lower fs)
fixed this. Take a look and tell me what you think?
-VAL
--
Define the fallthru dcache flag and file system op. Mask out the
DCACHE_FALLTHRU flag on dentry creation. Actual users and changes to
lookup come in later patches.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
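As a rough illustration of the flag masking described above, here is a
userspace sketch. The struct model_dentry type, the MODEL_DCACHE_* values
and model_instantiate() are hypothetical stand-ins chosen for this
example; the real flag bits and __d_instantiate() are defined by the
kernel and the patch below. The point is only that attaching an inode to
a dentry clears both the whiteout and the fallthru markers, while a
negative instantiate leaves them alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real DCACHE_* bits come from the kernel. */
#define MODEL_DCACHE_WHITEOUT 0x1u
#define MODEL_DCACHE_FALLTHRU 0x2u
#define MODEL_DCACHE_OTHER    0x4u

struct model_dentry {
	unsigned int flags;
	const void *inode;	/* NULL means a negative dentry */
};

/*
 * Attach an inode to a dentry.  A dentry that gains a real inode can no
 * longer be a whiteout or a fallthru, so both bits are masked out.
 * Unrelated flags are preserved.
 */
static void model_instantiate(struct model_dentry *d, const void *inode)
{
	assert(d != NULL);
	if (inode)
		d->flags &= ~(MODEL_DCACHE_WHITEOUT | MODEL_DCACHE_FALLTHRU);
	d->inode = inode;
}
```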
Documentation/filesystems/vfs.txt | 6 ++++++
fs/dcache.c | 2 +-
include/linux/dcache.h | 7 +++++++
include/linux/fs.h | 2 ++
4 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 964e0fc..bbaefa9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -320,6 +320,7 @@ struct inode_operations {
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,int,dev_t);
int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
+ int (*fallthru) (struct inode *, struct dentry *);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
int (*readlink) (struct dentry *, char __user *,int);
@@ -390,6 +391,11 @@ otherwise noted.
second is the dentry for the whiteout itself. This method
must unlink() or rmdir() the original entry if it exists.
+ fallthru: called by the readdir(2) system call on a layered file
+ system. Only required if you want to support fallthrus.
+ Fallthrus are place-holders for directory entries visible from
+ a lower level file system.
+
rename: called by the rename(2) system call to rename the object to
have the parent and name given by the second inode and dentry.
diff --git a/fs/dcache.c b/fs/dcache.c
index 79b9f6a..249d077 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -992,7 +992,7 @@ EXPORT_SYMBOL(d_alloc_name);
static void __d_instantiate(struct dentry *dentry, struct inode *inode)
{
if (inode) {
- dentry->d_flags &= ~DCACHE_WHITEOUT;
+ dentry->d_flags &= ~(DCACHE_WHITEOUT|DCACHE_FALLTHRU);
...
From: Jan Blunck <jblunck@suse.de>
Whiteout a given directory entry. File systems that support whiteouts
must implement the new ->whiteout() directory inode operation.
XXX - Only whiteout when there is a matching entry in a lower layer.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
Documentation/filesystems/vfs.txt | 10 +++++-
fs/dcache.c | 4 ++-
fs/namei.c | 73 ++++++++++++++++++++++++++++++++++++-
include/linux/dcache.h | 7 ++++
include/linux/fs.h | 2 +
5 files changed, 93 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 94677e7..964e0fc 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -308,7 +308,7 @@ struct inode_operations
-----------------------
This describes how the VFS can manipulate an inode in your
-filesystem. As of kernel 2.6.22, the following members are defined:
+filesystem. As of kernel 2.6.34, the following members are defined:
struct inode_operations {
int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
@@ -319,6 +319,7 @@ struct inode_operations {
int (*mkdir) (struct inode *,struct dentry *,int);
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+ int (*whiteout) (struct inode *, struct dentry *, struct dentry *);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
int (*readlink) (struct dentry *, char __user *,int);
@@ -382,6 +383,13 @@ otherwise noted.
will probably need to call d_instantiate() just as you would
in the create() method
+ whiteout: called by the rmdir(2) and unlink(2) system calls on a
+ layered file system. Only required if you want to support
+ whiteouts. The first dentry ...
From: Jan Blunck <jblunck@suse.de>
The ext2_append_link() is later used to find or append a directory
entry to whiteout.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
---
fs/ext2/dir.c | 70 ++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 50 insertions(+), 20 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 7516957..57207a9 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -472,9 +472,10 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
}
/*
- * Parent is locked.
+ * Find or append a given dentry to the parent directory
*/
-int ext2_add_link (struct dentry *dentry, struct inode *inode)
+static ext2_dirent * ext2_append_entry(struct dentry * dentry,
+ struct page ** page)
{
struct inode *dir = dentry->d_parent->d_inode;
const char *name = dentry->d_name.name;
@@ -482,13 +483,10 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
unsigned chunk_size = ext2_chunk_size(dir);
unsigned reclen = EXT2_DIR_REC_LEN(namelen);
unsigned short rec_len, name_len;
- struct page *page = NULL;
- ext2_dirent * de;
+ ext2_dirent * de = NULL;
unsigned long npages = dir_pages(dir);
unsigned long n;
char *kaddr;
- loff_t pos;
- int err;
/*
* We take care of directory expansion in the same loop.
@@ -498,20 +496,19 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
for (n = 0; n <= npages; n++) {
char *dir_end;
- page = ext2_get_page(dir, n, 0);
- err = PTR_ERR(page);
- if (IS_ERR(page))
+ *page = ext2_get_page(dir, n, 0);
+ de = ERR_PTR(PTR_ERR(*page));
+ if (IS_ERR(*page))
goto out;
- lock_page(page);
- kaddr = page_address(page);
+ lock_page(*page);
+ kaddr = page_address(*page);
dir_end = kaddr + ext2_last_byte(dir, n);
de = (ext2_dirent *)kaddr;
kaddr += PAGE_CACHE_SIZE - reclen;
while ((char ...
Add support for whiteout dentries to tmpfs. This includes adding
support for whiteouts to d_genocide(), which is called to tear down
pinned tmpfs dentries. Whiteouts have to be persistent, so they have
a pinning extra ref count that needs to be dropped by d_genocide().
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: linux-mm@kvack.org
---
fs/dcache.c | 13 +++++-
mm/shmem.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 143 insertions(+), 15 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index 80f059b..79b9f6a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2229,7 +2229,18 @@ resume:
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- if (d_unhashed(dentry)||!dentry->d_inode)
+ /*
+ * Skip unhashed and negative dentries, but process
+ * positive dentries and whiteouts. A whiteout looks
+ * kind of like a negative dentry for purposes of
+ * lookup, but it has an extra pinning ref count
+ * because it can't be evicted like a negative dentry
+ * can. What we care about here is ref counts - and
+ * we need to drop the ref count on a whiteout before
+ * we can evict it.
+ */
+ if (d_unhashed(dentry)||(!dentry->d_inode &&
+ !d_is_whiteout(dentry)))
continue;
if (!list_empty(&dentry->d_subdirs)) {
this_parent = dentry;
diff --git a/mm/shmem.c b/mm/shmem.c
index f65f840..a0a4fa5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1805,6 +1805,76 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
return 0;
}
+static int shmem_rmdir(struct inode *dir, struct dentry *dentry);
+static int shmem_unlink(struct inode *dir, struct dentry *dentry);
+
+/*
+ * This is the whiteout support for tmpfs. It uses one singleton whiteout
+ * inode per ...
From: Jan Blunck <jblunck@suse.de>
Userspace isn't ready for handling another file type, so silently drop
whiteout directory entries before they leave the kernel.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: linux-nfs@vger.kernel.org
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
---
fs/compat.c | 9 +++++++++
fs/nfsd/nfs3xdr.c | 5 +++++
fs/nfsd/nfs4xdr.c | 5 +++++
fs/nfsd/nfsxdr.c | 4 ++++
fs/readdir.c | 9 +++++++++
5 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/fs/compat.c b/fs/compat.c
index 6490d21..7e7b3a4 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -912,6 +912,9 @@ static int compat_fillonedir(void *__buf, const char *name, int namlen,
struct compat_old_linux_dirent __user *dirent;
compat_ulong_t d_ino;
+ if (d_type == DT_WHT)
+ return 0;
+
if (buf->result)
return -EINVAL;
d_ino = ino;
@@ -983,6 +986,9 @@ static int compat_filldir(void *__buf, const char *name, int namlen,
compat_ulong_t d_ino;
int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(compat_long_t));
+ if (d_type == DT_WHT)
+ return 0;
+
buf->error = -EINVAL; /* only used if we fail.. */
if (reclen > buf->count)
return -EINVAL;
@@ -1072,6 +1078,9 @@ static int compat_filldir64(void * __buf, const char * name, int namlen, loff_t
int reclen = ALIGN(jj + namlen + 1, sizeof(u64));
u64 off;
+ if (d_type == DT_WHT)
+ return 0;
+
buf->error = -EINVAL; /* only used if we fail.. */
if (reclen > buf->count)
return -EINVAL;
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 2a533a0..9b96f5a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -885,6 +885,11 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
int elen; /* estimated entry length in words */
int num_entry_words = 0; /* actual number of words */
+ if (d_type ...
For what it's worth:
Acked-by: J. Bruce Fields <bfields@redhat.com>
(Like Neil I kinda hoped we wouldn't need the check in every callback,
but probably you're right that it wouldn't be worth the extra layer of
indirection.)
--
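For readers following the per-callback check discussed in this
sub-thread, a userspace sketch of the pattern might look like the code
below. All names here (enum model_dtype, struct emit_buf,
model_filldir(), model_readdir()) are hypothetical and exist only for
this example; the real callbacks are the filldir and encode_entry
functions touched by the patch. The sketch shows a filldir-style
callback that returns success for a whiteout without emitting it, so
the directory walk continues and the entry never reaches the consumer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry types; MODEL_DT_WHT stands in for DT_WHT. */
enum model_dtype { MODEL_DT_REG, MODEL_DT_DIR, MODEL_DT_WHT };

struct model_dirent {
	const char *name;
	enum model_dtype type;
};

#define EMIT_MAX 16

/* Stand-in for the user buffer that a filldir callback fills. */
struct emit_buf {
	const char *names[EMIT_MAX];
	size_t count;
};

/* filldir-style callback: 0 means "keep going", nonzero means "stop". */
static int model_filldir(struct emit_buf *buf, const char *name,
			 enum model_dtype type)
{
	assert(buf != NULL);
	if (type == MODEL_DT_WHT)
		return 0;	/* silently skip; the walk continues */
	if (buf->count == EMIT_MAX)
		return -1;	/* buffer full */
	buf->names[buf->count++] = name;
	return 0;
}

/* Walk all entries, feeding each to the callback; returns entries emitted. */
static size_t model_readdir(const struct model_dirent *ents, size_t n,
			    struct emit_buf *buf)
{
	for (size_t i = 0; i < n; i++)
		if (model_filldir(buf, ents[i].name, ents[i].type) != 0)
			break;
	return buf->count;
}
```

Putting the check in the callback, rather than in a wrapper around it,
matches the trade-off mentioned above: every callback repeats two lines,
but no extra layer of indirection is needed.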
While we can check if a file system is currently read-only, we can't
guarantee that it will stay read-only. The file system can be
remounted read-write at any time; it's also conceivable that a file
system can be mounted a second time and converted to read-write if the
underlying fs allows it. This is a problem for union mounts, which
require the underlying file system to be read-only. Add a read-only
users count, and while there are any read-only users, refuse both
remounting the file system read-write and new read-write mounts of it.
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
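The counting scheme described above can be sketched in userspace as
follows. This is an illustrative model, not the kernel code: struct
sb_model and the sb_get_hard_readonly(), sb_put_hard_readonly() and
sb_remount() helpers are hypothetical names, and the choice of -EROFS as
the error is an assumption made for the example. The assert() calls play
the role of the BUG_ON() checks in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock state the patch tracks. */
struct sb_model {
	bool readonly;
	unsigned int hard_readonly_users;
};

/* A user (e.g. a union mount) starts depending on the fs staying read-only. */
static void sb_get_hard_readonly(struct sb_model *sb)
{
	assert(sb->readonly);	/* only a read-only fs can be pinned */
	sb->hard_readonly_users++;
}

/* The user goes away; the count must never underflow. */
static void sb_put_hard_readonly(struct sb_model *sb)
{
	assert(sb->readonly);
	assert(sb->hard_readonly_users > 0);
	sb->hard_readonly_users--;
}

/*
 * Remount request.  Going read-write is refused while any hard
 * read-only user remains (error code assumed for this sketch).
 */
static int sb_remount(struct sb_model *sb, bool want_readonly)
{
	if (!want_readonly && sb->hard_readonly_users > 0)
		return -EROFS;
	sb->readonly = want_readonly;
	return 0;
}
```

A counter rather than a flag is used so that several union mounts can
share one lower file system, and the restriction lifts only when the
last of them is gone.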
fs/namespace.c | 13 +++++++++++++
fs/super.c | 23 +++++++++++++++++++++++
include/linux/fs.h | 8 ++++++++
3 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index b8a66db..984c331 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -200,6 +200,19 @@ int __mnt_is_readonly(struct vfsmount *mnt)
}
EXPORT_SYMBOL_GPL(__mnt_is_readonly);
+static void inc_hard_readonly_users(struct vfsmount *mnt)
+{
+ BUG_ON(!__mnt_is_readonly(mnt));
+ mnt->mnt_sb->s_hard_readonly_users++;
+}
+
+static void dec_hard_readonly_users(struct vfsmount *mnt)
+{
+ BUG_ON(!__mnt_is_readonly(mnt));
+ BUG_ON(mnt->mnt_sb->s_hard_readonly_users == 0);
+ mnt->mnt_sb->s_hard_readonly_users--;
+}
+
static inline void inc_mnt_writers(struct vfsmount *mnt)
{
#ifdef CONFIG_SMP
diff --git a/fs/super.c b/fs/super.c
index 938119a..86bdf1f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -108,6 +108,7 @@ out:
*/
static inline void destroy_super(struct super_block *s)
{
+ BUG_ON(s->s_hard_readonly_users);
security_sb_free(s);
kfree(s->s_subtype);
kfree(s->s_options);
@@ -512,6 +513,21 @@ rescan:
return NULL;
}
+/*
+ * Some uses of file systems require that they never be mounted
+ * read-write anywhere (e.g., the lower layers of union mounts must
+ * always be read-only). If there are ...
From: Jan Blunck <jblunck@suse.de>
If we mkdir() a directory on the top layer of a union, we don't want
entries from a matching directory on the lower layer to "show through"
suddenly. To prevent this, we set the opaque flag on a directory if
there was previously a white-out with the same name. (If there is no
white-out and the directory exists in a lower layer, then mkdir() will
fail with EEXIST.)
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Valerie Aurora <vaurora@redhat.com>
---
fs/namei.c | 11 ++++++++++-
include/linux/fs.h | 5 +++++
2 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 665d394..cd8b0d0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2108,6 +2108,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, int, mode, unsigned, dev)
int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
int error = may_create(dir, dentry);
+ int opaque = 0;
if (error)
return error;
@@ -2120,9 +2121,17 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
if (error)
return error;
+ if (d_is_whiteout(dentry))
+ opaque = 1;
+
error = dir->i_op->mkdir(dir, dentry, mode);
- if (!error)
+ if (!error) {
fsnotify_mkdir(dir, dentry);
+ if (opaque) {
+ dentry->d_inode->i_flags |= S_OPAQUE;
+ mark_inode_dirty(dentry->d_inode);
+ }
+ }
return error;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1f80897..1dbe156 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -236,6 +236,7 @@ struct inodes_stat_t {
#define S_NOCMTIME 128 /* Do not update file c/mtime */
#define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
#define S_PRIVATE 512 /* Inode is fs-internal */
+#define S_OPAQUE 1024 /* Directory is opaque */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -270,6 +271,7 @@ struct inodes_stat_t {
#define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
...