[PATCH 14/14] Unionfs: update lower mnts on rmdir with copyup

Previous thread: none

Next thread: kmemcheck caught read from freed memory (cfq_free_io_context) by Vegard Nossum on Tuesday, April 1, 2008 - 2:08 pm. (69 messages)
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

The following is a series of patchsets related to Unionfs.  Two of the
patches are VFS changes (and one of those two is rather minor).  There are
two major changes to Unionfs in this release:

1. Based on recommendations we've received while at LSF'08, we've changed
   Unionfs to use the vm_operations->fault method, instead of
   address_space_operations.  This makes unionfs's code smaller, runs faster
   (reduce memory pressure), and removes a number of potential races.  This
   change is significant, but is only an internal implementation change: it
   shouldn't change any user-visible behavior.  (Note: this change was
   inspired in part by Junjiro Okajima's work and suggestions -- thank you.)

   This change required some workarounds to VFS deficiencies for filesystems
   that want to use vm_ops->fault instead of address_space_operations.  In
   the next few weeks we'll be submitting patches that try to address these
   small deficiencies more cleanly.  One of those changes was necessary now
   (patch 02 of this patchset which exports vfs_splice* helpers).

2. Unionfs now implements a "delete-all" policy.  This means that if an
   object (e.g., a regular file) exists in multiple writable branches,
   unionfs will attempt to delete all instances of the same object, not just
   the first one it finds.  This helps to reduce the total number of
   whiteouts that unionfs has to create, and saves on the number of inodes
   consumed.  Of course, if a branch is marked or mounted read-only, then
   unionfs won't delete an object there, and only then will it fall back to
   creating a whiteout.  Since most users use only one writable branch in
   the unionfs, this change will probably affect only a small subset of
   people.  (Note: unionfs-odf already implements this delete-all policy, so
   this change here helps to synchronize the behavior of the two releases.)

These patches were tested (where appropriate) on 2.6.25-rc7, MM (mmotm
stamp-2008-04-01-02-39), as well as the ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/namei.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 00078e4..e421fb9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -399,7 +399,7 @@ void release_open_intent(struct nameidata *nd)
 	else
 		fput(nd->intent.open.file);
 }
-EXPORT_SYMBOL(release_open_intent);
+EXPORT_SYMBOL_GPL(release_open_intent);
 
 static inline struct dentry *
 do_revalidate(struct dentry *dentry, struct nameidata *nd)
-- 
1.5.2.2

--

From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

A stackable file system which uses vm_ops->fault, and does not implement
address_space_operations, cannot use generic_file_splice_read/write, but has
to implement ->splice_read/write itself.  These two helper functions are
very useful to such a module.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/splice.c            |   20 +++++++++++---------
 include/linux/splice.h |    5 +++++
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 0670c91..62b0db2 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -899,8 +899,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
 /*
  * Attempt to initiate a splice from pipe to file.
  */
-static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
-			   loff_t *ppos, size_t len, unsigned int flags)
+long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
+		     loff_t *ppos, size_t len, unsigned int flags)
 {
 	int ret;
 
@@ -916,13 +916,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
 
 	return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_from);
 
 /*
  * Attempt to initiate a splice from a file to a pipe.
  */
-static long do_splice_to(struct file *in, loff_t *ppos,
-			 struct pipe_inode_info *pipe, size_t len,
-			 unsigned int flags)
+long vfs_splice_to(struct file *in, loff_t *ppos,
+		   struct pipe_inode_info *pipe, size_t len,
+		   unsigned int flags)
 {
 	int ret;
 
@@ -938,6 +939,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 
 	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
+EXPORT_SYMBOL_GPL(vfs_splice_to);
 
 /**
  * splice_direct_to_actor - splices data directly between two non-pipes
@@ -1007,7 +1009,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		size_t read_len;
 		loff_t pos = sd->pos;
 
-		ret = do_splice_to(in, &pos, pipe, len, flags);
+		ret = vfs_splice_to(in, &pos, pipe, len, flags);
 		if ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Optimize the unlinking of non-dir objects in unionfs by deleting all
possible lower inode objects from all writable lower branches.  This may
consume a bit more processing, but on average reduces overall inode
consumption and further saves a lot by reducing the total number of
whiteouts needed.  We create a whiteout now only if we could not delete all
lower objects, or if one of the lower branches was explicitly marked
read-only.

Signed-off-by: Himanshu Kanda <hkanda@cs.sunysb.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 Documentation/filesystems/unionfs/concepts.txt |   25 ++++++
 fs/unionfs/lookup.c                            |    8 +-
 fs/unionfs/unlink.c                            |  107 +++++++++++++++---------
 3 files changed, 97 insertions(+), 43 deletions(-)

diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 8d9a1c5..0bc1a19 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -62,6 +62,31 @@ simplest solution is to take the instance from the highest priority
 (numerically lowest value) and "hide" the others.
 
 
+Unlinking:
+=========
+
+Unlink operation on non-directory instances is optimized to remove the
+maximum possible objects in case multiple underlying branches have the same
+file name.  The unlink operation will first try to delete file instances
+from highest priority branch and then move further to delete from remaining
+branches in order of their decreasing priority.  Consider a case (F..D..F),
+where F is a file and D is a directory of the same name; here, some
+intermediate branch could have an empty directory instance with the same
+name, so this operation also tries to delete this directory instance and
+proceed further to delete from next possible lower priority branch.  The
+unionfs unlink operation will smoothly delete the files with same name from
+all possible underlying branches.  In case if some error ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/lookup.c |    2 +-
 fs/unionfs/super.c  |   24 ++++++++++++++----------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
index 7618716..7f512c2 100644
--- a/fs/unionfs/lookup.c
+++ b/fs/unionfs/lookup.c
@@ -32,7 +32,7 @@ static int is_validname(const char *name)
 }
 
 /* The rest of these are utility functions for lookup. */
-static noinline int is_opaque_dir(struct dentry *dentry, int bindex)
+static noinline_for_stack int is_opaque_dir(struct dentry *dentry, int bindex)
 {
 	int err = 0;
 	struct dentry *lower_dentry;
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index fba1598..2456654 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -189,9 +189,11 @@ out:
 }
 
 /* handle mode changing during remount */
-static noinline int do_remount_mode_option(char *optarg, int cur_branches,
-					   struct unionfs_data *new_data,
-					   struct path *new_lower_paths)
+static noinline_for_stack int do_remount_mode_option(
+					char *optarg,
+					int cur_branches,
+					struct unionfs_data *new_data,
+					struct path *new_lower_paths)
 {
 	int err = -EINVAL;
 	int perms, idx;
@@ -250,9 +252,10 @@ out:
 }
 
 /* handle branch deletion during remount */
-static noinline int do_remount_del_option(char *optarg, int cur_branches,
-					  struct unionfs_data *new_data,
-					  struct path *new_lower_paths)
+static noinline_for_stack int do_remount_del_option(
+					char *optarg, int cur_branches,
+					struct unionfs_data *new_data,
+					struct path *new_lower_paths)
 {
 	int err = -EINVAL;
 	int idx;
@@ -313,10 +316,11 @@ out:
 }
 
 /* handle branch insertion during remount */
-static noinline int do_remount_add_option(char *optarg, int cur_branches,
-					  struct unionfs_data *new_data,
-					  struct path *new_lower_paths,
-					  int *high_branch_id)
+static noinline_for_stack int do_remount_add_option(
+					char *optarg, ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/dentry.c |    2 +-
 fs/unionfs/union.h  |   18 +++++++++---------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index f8f65e1..5e498c2 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -406,7 +406,7 @@ bool __unionfs_d_revalidate_chain(struct dentry *dentry, struct nameidata *nd,
 	chain = kzalloc(chain_len * sizeof(struct dentry *), GFP_KERNEL);
 	if (unlikely(!chain)) {
 		printk(KERN_CRIT "unionfs: no more memory in %s\n",
-		       __FUNCTION__);
+		       __func__);
 		goto out;
 	}
 
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 63f3b21..e93b9ef 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -557,24 +557,24 @@ static inline void unionfs_mntput(struct dentry *dentry, int bindex)
 #ifdef CONFIG_UNION_FS_DEBUG
 
 /* useful for tracking code reachability */
-#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__)
+#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
 
 #define unionfs_check_inode(i)	__unionfs_check_inode((i),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_dentry(d)	__unionfs_check_dentry((d),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_file(f)	__unionfs_check_file((f),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define unionfs_check_nd(n)	__unionfs_check_nd((n),		\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_branch_counts(sb)	__show_branch_counts((sb),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_inode_times(i)	__show_inode_times((i),		\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define show_dinode_times(d)	__show_dinode_times((d),	\
-	__FILE__, __FUNCTION__, __LINE__)
+	__FILE__, __func__, __LINE__)
 #define ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Must implement splice_read/write directly, using VFS helpers, because we can
no longer rely on generic_file_splice_read/write: they need
address_space_operations implemented, which we no longer have.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/file.c  |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/unionfs/union.h |    1 +
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 0c424f6..1fcaf8e 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -165,6 +165,61 @@ out:
 	return err;
 }
 
+static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
+				   struct pipe_inode_info *pipe, size_t len,
+				   unsigned int flags)
+{
+	ssize_t err;
+	struct file *lower_file;
+
+	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	err = unionfs_file_revalidate(file, false);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
+	/* update our inode atime upon a successful lower splice-read */
+	if (err >= 0) {
+		fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_check_file(file);
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
+static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
+				    struct file *file, loff_t *ppos,
+				    size_t len, unsigned int flags)
+{
+	ssize_t err = 0;
+	struct file *lower_file;
+
+	unionfs_read_lock(file->f_path.dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	err = unionfs_file_revalidate(file, true);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
+	/* update our inode times+sizes upon a successful lower write */
+	if (err >= 0) ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 Documentation/filesystems/unionfs/concepts.txt |   36 ++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
index 0bc1a19..b853788 100644
--- a/Documentation/filesystems/unionfs/concepts.txt
+++ b/Documentation/filesystems/unionfs/concepts.txt
@@ -52,6 +52,42 @@ Later, when Unionfs traverses branches (due to lookup or readdir), it
 eliminate 'foo' from the namespace (as well as the whiteout itself.)
 
 
+Opaque Directories:
+===================
+
+Assume we have a unionfs mount comprising of two branches.  Branch 0 is
+empty; branch 1 has the directory /a and file /a/f.  Let's say we mount a
+union of branch 0 as read-write and branch 1 as read-only.  Now, let's say
+we try to perform the following operation in the union:
+
+	rm -fr a
+
+Because branch 1 is not writable, we cannot physically remove the file /a/f
+or the directory /a.  So instead, we will create a whiteout in branch 0
+named /.wh.a, masking out the name "a" from branch 1.  Next, let's say we
+try to create a directory named "a" as follows:
+
+	mkdir a
+
+Because we have a whiteout for "a" already, Unionfs behaves as if "a"
+doesn't exist, and thus will delete the whiteout and replace it with an
+actual directory named "a".
+
+The problem now is that if you try to "ls" in the union, Unionfs will
+perform is normal directory name unification, for *all* directories named
+"a" in all branches.  This will cause the file /a/f from branch 1 to
+re-appear in the union's namespace, which violates Unix semantics.
+
+To avoid this problem, we have a different form of whiteouts for
+directories, called "opaque directories" (same as BSD Union Mount does).
+Whenever we replace a whiteout with a directory, that directory is marked as
+opaque.  In Unionfs 2.x, it means that we create a file named
+/a/.wh.__dir_opaque in branch 0, after ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/inode.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index 640969d..0dc07ec 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -1064,8 +1064,7 @@ static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
 	if (ia->ia_valid & ATTR_ATIME_SET)
 		inode->i_atime = lower_inode->i_atime;
 	fsstack_copy_inode_size(inode, lower_inode);
-	/* if setattr succeeded, then parent dir may have changed */
-	unionfs_copy_attr_times(dentry->d_parent->d_inode);
+
 out:
 	if (!err)
 		unionfs_check_dentry(dentry);
-- 
1.5.2.2

--

From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

CC: Dave Miller <justdave@mozilla.com>

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/main.c  |    7 ++++++-
 fs/unionfs/super.c |    4 +++-
 fs/unionfs/union.h |    1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
index 8f59fb5..b76264a 100644
--- a/fs/unionfs/main.c
+++ b/fs/unionfs/main.c
@@ -736,7 +736,12 @@ static int unionfs_get_sb(struct file_system_type *fs_type,
 			  int flags, const char *dev_name,
 			  void *raw_data, struct vfsmount *mnt)
 {
-	return get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+	int err;
+	err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
+	if (!err)
+		UNIONFS_SB(mnt->mnt_sb)->dev_name =
+			kstrdup(dev_name, GFP_KERNEL);
+	return err;
 }
 
 static struct file_system_type unionfs_fs_type = {
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 2456654..e5cb235 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -134,6 +134,7 @@ static void unionfs_put_super(struct super_block *sb)
 		atomic_dec(&s->s_active);
 	}
 
+	kfree(spd->dev_name);
 	kfree(spd->data);
 	kfree(spd);
 	sb->s_fs_info = NULL;
@@ -805,7 +806,8 @@ out_no_change:
 	atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
 	atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
 	if (!(*flags & MS_SILENT))
-		pr_info("unionfs: new generation number %d\n", i);
+		pr_info("unionfs: %s: new generation number %d\n",
+			UNIONFS_SB(sb)->dev_name, i);
 	/* finally, update the root dentry's times */
 	unionfs_copy_attr_times(sb->s_root->d_inode);
 	err = 0;		/* reset to success */
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index e93b9ef..4ca4e76 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -159,6 +159,7 @@ struct unionfs_sb_info {
 	struct rw_semaphore rwsem;
 	pid_t write_lock_owner;	/* PID of rw_sem owner (write lock) */
 	int high_branch_id;	/* last unique branch ID given */
+	char *dev_name;		/* to identify ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/commonfops.c |   32 ++++++++++-----------------
 fs/unionfs/dirfops.c    |   18 ++++++++++-----
 fs/unionfs/file.c       |   55 +++++++++++++++++++++++++++++------------------
 fs/unionfs/union.h      |    1 -
 4 files changed, 58 insertions(+), 48 deletions(-)

diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
index 2add167..50f4eda 100644
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -302,7 +302,7 @@ out:
  * @willwrite: true if caller may cause changes to the file; false otherwise.
  * Caller must lock/unlock dentry's branch configuration.
  */
-int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
+int unionfs_file_revalidate(struct file *file, bool willwrite)
 {
 	struct super_block *sb;
 	struct dentry *dentry;
@@ -312,6 +312,7 @@ int unionfs_file_revalidate_locked(struct file *file, bool willwrite)
 	int err = 0;
 
 	dentry = file->f_path.dentry;
+	verify_locked(dentry);
 	sb = dentry->d_sb;
 
 	/*
@@ -419,17 +420,6 @@ out_nofree:
 	return err;
 }
 
-int unionfs_file_revalidate(struct file *file, bool willwrite)
-{
-	int err;
-
-	unionfs_lock_dentry(file->f_path.dentry, UNIONFS_DMUTEX_CHILD);
-	err = unionfs_file_revalidate_locked(file, willwrite);
-	unionfs_unlock_dentry(file->f_path.dentry);
-
-	return err;
-}
-
 /* unionfs_open helper function: open a directory */
 static int __open_dir(struct inode *inode, struct file *file)
 {
@@ -632,6 +622,8 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 	int fgen, err = 0;
 
 	unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+
 	/*
 	 * Yes, we have to revalidate this file even if it's being released.
 	 * This is important for open-but-unlinked files, as well as mmap
@@ -650,7 +642,6 @@ int unionfs_file_release(struct inode *inode, struct file *file)
 	bstart = fbstart(file);
 	bend = fbend(file);
 ...
From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

This is no longer needed, as we don't have upper and lower pages.  Plus this
was racy, requiring the unexported inode_lock variable.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/dentry.c |   11 -----------
 fs/unionfs/super.c  |    8 +++++---
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index 5e498c2..ee0da4f 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -273,17 +273,6 @@ static inline void purge_inode_data(struct inode *inode)
 	 */
 }
 
-void purge_sb_data(struct super_block *sb)
-{
-	struct inode *inode;
-
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
-			continue;
-		purge_inode_data(inode);
-	}
-}
-
 /*
  * Revalidate a single file/symlink/special dentry.  Assume that info nodes
  * of the dentry and its parent are locked.  Assume that parent(s) are all
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 4cddc83..82b4045 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -747,10 +747,12 @@ out_no_change:
 	 * function).  This revalidation will happen lazily and
 	 * incrementally, as users perform operations on cached inodes.  We
 	 * would like to encourage this revalidation to happen sooner if
-	 * possible, so we try to invalidate as many other pages in our
-	 * superblock as we can.
+	 * possible, so we like to try to invalidate as many other pages in
+	 * our superblock as we can.  We used to call drop_pagecache_sb() or
+	 * a variant thereof, but either method was racy (drop_caches alone
+	 * is known to be racy).  So now we let the revalidation happen on a
+	 * per file basis in ->d_revalidate.
 	 */
-	purge_sb_data(sb);
 
 	/* grab new lower super references; release old ones */
 	for (i = 0; i < new_branches; i++)
-- 
1.5.2.2

--

From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/super.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index e5cb235..4cddc83 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -755,7 +755,7 @@ out_no_change:
 	/* grab new lower super references; release old ones */
 	for (i = 0; i < new_branches; i++)
 		atomic_inc(&new_data[i].sb->s_active);
-	for (i = 0; i < new_branches; i++)
+	for (i = 0; i < sbmax(sb); i++)
 		atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
 
 	/* copy new vectors into their correct place */
-- 
1.5.2.2

--

From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/unlink.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
index c66bb3e..cad0386 100644
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -280,6 +280,8 @@ out:
 		    !unionfs_lower_inode_idx(inode, dend))
 			ibstart(inode) = ibend(inode) = -1;
 		d_drop(dentry);
+		/* update our lower vfsmnts, in case a copyup took place */
+		unionfs_postcopyup_setmnt(dentry);
 	}
 
 	if (namelist)
-- 
1.5.2.2

--

From: Erez Zadok
Date: Tuesday, April 1, 2008 - 2:06 pm

As per recommendations made at LSF'08, a stackable file system which does
not change the data across layers, should try to use vm_operations instead
of address_space_operations.  This means we now have to implement out own
->read and ->write methods because we cannot rely on VFS helpers which
require us to have a ->readpage method.  Either way there are two caveats:

(1) It's not possible currently to set inode->i_mapping->a_ops to NULL,
because too many code paths expect i_mapping to be non-NULL.

(2) a small/dummy ->readpage is still needed because generic_file_mmap,
which we used in unionfs_mmap, still check for the existence of the
->readpage method.  These code paths may have to be changed to remove the
need for readpage().

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
---
 fs/unionfs/file.c  |  109 +++++++++++++++--
 fs/unionfs/mmap.c  |  338 +++++++---------------------------------------------
 fs/unionfs/union.h |    4 +-
 3 files changed, 147 insertions(+), 304 deletions(-)

diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
index 1fcaf8e..9397ff3 100644
--- a/fs/unionfs/file.c
+++ b/fs/unionfs/file.c
@@ -18,17 +18,83 @@
 
 #include "union.h"
 
+static ssize_t unionfs_read(struct file *file, char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	int err;
+	struct file *lower_file;
+	struct dentry *dentry = file->f_path.dentry;
+
+	unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
+	unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
+	err = unionfs_file_revalidate_locked(file, false);
+	if (unlikely(err))
+		goto out;
+
+	lower_file = unionfs_lower_file(file);
+	err = vfs_read(lower_file, buf, count, ppos);
+	/* update our inode atime upon a successful lower read */
+	if (err >= 0) {
+		fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+					lower_file->f_path.dentry->d_inode);
+		unionfs_check_file(file);
+	}
+
+out:
+	unionfs_unlock_dentry(dentry);
+	unionfs_read_unlock(file->f_path.dentry->d_sb);
+	return err;
+}
+
+static ...
From: Tomas M
Date: Wednesday, April 2, 2008 - 6:48 am

Interesting to see you're finally switching to the approach,
which is in AUFS for ages.

May I ask you, why did it took so long?
Mainly, why did you refused to make the changes last year?

Thank you


Tomas M

--

From: Erez Zadok
Date: Wednesday, April 2, 2008 - 9:54 am

The current ->fault API requires (IMHO) an ugly hack such as this:

1. get the upper file pointer from vma->vm_file
2. get the lower file from the upper file
3. *override* the vma->vm_file pointer with the lower file
4. call the lower ->fault op with the changed vma->vm_file pointer
5. upon return, *reset* the vma->vm_file pointer back to its original value
   (i.e., pointing to the upper file)

Steps 3 and 5 require this "pointer flipping" that I was sure was
unacceptable to kernel developers who might be reviewing the code.  I didn't
want to use this technique before I've had the sanctioning of the
appropriate kernel maintainers, along with an acceptable path to fix the
->fault API.  LSF was an excellent opportunity to discuss this and many
other related issues in person.

Cheers,
Erez.
--

Previous thread: none

Next thread: kmemcheck caught read from freed memory (cfq_free_io_context) by Vegard Nossum on Tuesday, April 1, 2008 - 2:08 pm. (69 messages)