Re: [RFC] Reinstate NFS exportability for JFFS2.

Previous thread: [RFC] Reinstate NFS exportability for JFFS2. by David Woodhouse on Thursday, May 1, 2008 - 12:42 pm. (3 messages)

Next thread: RFC: case-insensitive rename support in the VFS by Barry Naujok on Thursday, May 1, 2008 - 9:14 pm. (2 messages)
From: Neil Brown
Date: Thursday, May 1, 2008 - 6:38 pm

Actually it is the kernel that doesn't need to do anything....
Mapping between the filesystem and the filesystem part of the
filehandle is done entirely in user space.
The kernel says "Here is a filehandle fragement, what filesystem
should I be accessing".

So what you really want is to teach nfs-utils to recognise JFFS2 and
extract an appropriate uuid.
It already uses libblkid to get uuids for ext3 and XFS and others.

Why is there a deadlock here?
Both readdir and lookup are called with i_mutex held on the directory
so there should need to do any extra locking (he said, naively).  In
the readdirplus cases, i_mutex is held across both the readdir and the
lookup....

One problem with your proposed solution is that filehandles aren't all
the same length, so you cannot reliably leave space for them.

Awkward.

NeilBrown
--

From: David Woodhouse
Date: Friday, May 2, 2008 - 4:37 am

For JFFS2, there is no UUID; only i_sb->s_dev. Actually. if we just set
FS_REQUIRES_DEV then it would work out OK -- at least for the NFS
export. We don't do that though, because it gives behaviour we don't

Many file systems have their own locking, and lookup() can end up trying
to re-take a lock which readdir() is already holding. In the JFFS2 case,
it's the fs-internal inode mutex, which is required because the garbage
collector can't use i_mutex for lock ordering reasons.

See also the readdir implementation and surrounding comments in
fs/xfs/linux-2.6/xfs_file.c -- and the way GFS2 uses
gfs2_glock_is_locked_by_me() to avoid the deadlock.

The annoying thing is that JFFS2 doesn't even _implement_ i_generation,

Not without moving stuff around during the postprocessing, I suppose.
Which isn't very pretty -- but it's prettier than some of the hacks we
have at the moment to avoid the deadlock.

-- 
dwmw2

--

From: J. Bruce Fields
Date: Friday, May 2, 2008 - 7:08 am

This comes up periodically.  It would be great to finally get it fixed.
The last conversation I remember started at about:

	http://marc.info/?l=linux-kernel&m=119506226209056&w=2

Summary: one approach would be to define a ->readdirplus() that passes a
dentry to its equivalent of the ->filldir callback.  (Christoph points
out that returning a stat struct would be simpler.  But unfortunately we
need to check for mountpoints here, so that's not sufficient.)

--b.
--

From: David Woodhouse
Date: Thursday, July 31, 2008 - 2:54 pm

Yeah. I think the sanest plan for the short term is, as hch suggests,
just to transplant the existing XFS hack into the nfsd code. That way,
at least we can avoid using the hack for local users. And it makes NFS
export from other file systems (jffs2, btrfs, etc.) easier without
having to put the same hacks in each one.

Git tree at git.infradead.org/users/dwmw2/nfsexport-2.6.git; patch
sequence follows...

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

--

From: David Woodhouse
Date: Thursday, July 31, 2008 - 2:54 pm

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 fs/nfsd/vfs.c |   40 ++++++++++++++++++++++++----------------
 1 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 18060be..3e22634 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1814,6 +1814,29 @@ out:
 	return err;
 }
 
+static int nfsd_do_readdir(struct file *file, filldir_t func,
+			   struct readdir_cd *cdp, loff_t *offsetp)
+{
+	int host_err;
+
+	/*
+	 * Read the directory entries. This silly loop is necessary because
+	 * readdir() is not guaranteed to fill up the entire buffer, but
+	 * may choose to do less.
+	 */
+	do {
+		cdp->err = nfserr_eof; /* will be cleared on successful read */
+		host_err = vfs_readdir(file, func, cdp);
+	} while (host_err >=0 && cdp->err == nfs_ok);
+
+	*offsetp = vfs_llseek(file, 0, 1);
+
+	if (host_err)
+		return nfserrno(host_err);
+	else
+		return cdp->err;
+}
+
 /*
  * Read entries from a directory.
  * The  NFSv3/4 verifier we ignore for now.
@@ -1823,7 +1846,6 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
 	     struct readdir_cd *cdp, filldir_t func)
 {
 	__be32		err;
-	int 		host_err;
 	struct file	*file;
 	loff_t		offset = *offsetp;
 
@@ -1837,21 +1859,7 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
 		goto out_close;
 	}
 
-	/*
-	 * Read the directory entries. This silly loop is necessary because
-	 * readdir() is not guaranteed to fill up the entire buffer, but
-	 * may choose to do less.
-	 */
-
-	do {
-		cdp->err = nfserr_eof; /* will be cleared on successful read */
-		host_err = vfs_readdir(file, func, cdp);
-	} while (host_err >=0 && cdp->err == nfs_ok);
-	if (host_err)
-		err = nfserrno(host_err);
-	else
-		err = cdp->err;
-	*offsetp = vfs_llseek(file, 0, 1);
+	err = nfsd_do_readdir(file, func, cdp, offsetp);
 
 	if (err == nfserr_eof || err == nfserr_toosmall)
 		err = nfs_ok; /* can still be found in ->err ...
From: David Woodhouse
Date: Thursday, July 31, 2008 - 2:54 pm

Some file systems with their own internal locking have problems with the
way that nfsd calls the ->lookup() method from within a filldir function
called from their ->readdir() method. The recursion back into the file
system code can cause deadlock.

XFS has a fairly hackish solution to this which involves doing the
readdir() into a locally-allocated buffer, then going back through it
calling the filldir function afterwards. It's not ideal, but it works.

It's particularly suboptimal because XFS does this for local file
systems too, where it's completely unnecessary.

Copy this hack into the NFS code where it can be used only for NFS, and
only for file systems which indicate that they need it by setting
FS_NO_LOOKUP_IN_READDIR in their fs_type flags.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 fs/nfsd/vfs.c      |  111 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h |    2 +
 2 files changed, 112 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3e22634..81c9411 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1814,6 +1814,111 @@ out:
 	return err;
 }
 
+struct hack_dirent {
+	u64		ino;
+	loff_t		offset;
+	int		namlen;
+	unsigned int	d_type;
+	char		name[];
+};
+
+struct hack_callback {
+	char		*dirent;
+	size_t		len;
+	size_t		used;
+};
+
+static int nfsd_hack_filldir(void *__buf, const char *name, int namlen,
+			     loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct hack_callback *buf = __buf;
+	struct hack_dirent *de = (void *)(buf->dirent + buf->used);
+	unsigned int reclen;
+
+	reclen = ALIGN(sizeof(struct hack_dirent) + namlen, sizeof(u64));
+	if (buf->used + reclen > buf->len)
+		return -EINVAL;
+
+	de->namlen = namlen;
+	de->offset = offset;
+	de->ino = ino;
+	de->d_type = d_type;
+	memcpy(de->name, name, namlen);
+	buf->used += reclen;
+
+	return 0;
+}
+
+static int nfsd_hack_readdir(struct file *file, filldir_t func, 
+			     struct readdir_cd *cdp, ...
From: David Woodhouse
Date: Thursday, July 31, 2008 - 2:55 pm

Now that we've moved the readdir hack to the nfsd code, we can just set
the FS_NO_LOOKUP_IN_READDIR flag and then remove the local hack from the
xfs code.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 fs/xfs/linux-2.6/xfs_file.c  |  120 ------------------------------------------
 fs/xfs/linux-2.6/xfs_super.c |    2 +-
 2 files changed, 1 insertions(+), 121 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 5f60363..d65d377 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -212,7 +212,6 @@ xfs_file_fsync(
  * Hopefully we'll find a better workaround that allows to use the optimal
  * version at least for local readdirs for 2.6.25.
  */
-#if 0
 STATIC int
 xfs_file_readdir(
 	struct file	*filp,
@@ -244,125 +243,6 @@ xfs_file_readdir(
 		return -error;
 	return 0;
 }
-#else
-
-struct hack_dirent {
-	u64		ino;
-	loff_t		offset;
-	int		namlen;
-	unsigned int	d_type;
-	char		name[];
-};
-
-struct hack_callback {
-	char		*dirent;
-	size_t		len;
-	size_t		used;
-};
-
-STATIC int
-xfs_hack_filldir(
-	void		*__buf,
-	const char	*name,
-	int		namlen,
-	loff_t		offset,
-	u64		ino,
-	unsigned int	d_type)
-{
-	struct hack_callback *buf = __buf;
-	struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
-	unsigned int reclen;
-
-	reclen = ALIGN(sizeof(struct hack_dirent) + namlen, sizeof(u64));
-	if (buf->used + reclen > buf->len)
-		return -EINVAL;
-
-	de->namlen = namlen;
-	de->offset = offset;
-	de->ino = ino;
-	de->d_type = d_type;
-	memcpy(de->name, name, namlen);
-	buf->used += reclen;
-	return 0;
-}
-
-STATIC int
-xfs_file_readdir(
-	struct file	*filp,
-	void		*dirent,
-	filldir_t	filldir)
-{
-	struct inode	*inode = filp->f_path.dentry->d_inode;
-	xfs_inode_t	*ip = XFS_I(inode);
-	struct hack_callback buf;
-	struct hack_dirent *de;
-	int		error;
-	loff_t		size;
-	int		eof = 0;
-	xfs_off_t       start_offset, curr_offset, ...
From: David Woodhouse
Date: Thursday, July 31, 2008 - 2:55 pm

Now that the readdir/lookup deadlock issues have been dealt with, we can
export JFFS2 file systems again.

(For now, you have to specify fsid manually; we should add a method to
the export_ops to handle that too.)

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 fs/jffs2/super.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index efd4012..2138b26 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -22,6 +22,7 @@
 #include <linux/mtd/super.h>
 #include <linux/ctype.h>
 #include <linux/namei.h>
+#include <linux/exportfs.h>
 #include "compr.h"
 #include "nodelist.h"
 
@@ -62,6 +63,64 @@ static int jffs2_sync_fs(struct super_block *sb, int wait)
 	return 0;
 }
 
+static struct inode *jffs2_nfs_get_inode(struct super_block *sb, uint64_t ino,
+					 uint32_t generation)
+{
+	return jffs2_iget(sb, ino);
+}
+
+static struct dentry *jffs2_fh_to_dentry(struct super_block *sb, struct fid *fid,
+					 int fh_len, int fh_type)
+{
+        return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+                                    jffs2_nfs_get_inode);
+}
+
+static struct dentry *jffs2_fh_to_parent(struct super_block *sb, struct fid *fid,
+					 int fh_len, int fh_type)
+{
+        return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+                                    jffs2_nfs_get_inode);
+}
+
+static struct dentry *jffs2_get_parent(struct dentry *child)
+{
+	struct jffs2_inode_info *f;
+	struct dentry *parent;
+	struct inode *inode;
+	uint32_t pino;
+
+	if (!S_ISDIR(child->d_inode->i_mode))
+		return ERR_PTR(-EACCES);
+
+	f = JFFS2_INODE_INFO(child->d_inode);
+
+	pino = f->inocache->pino_nlink;
+
+	JFFS2_DEBUG("Parent of directory ino #%u is #%u\n",
+		    f->inocache->ino, pino);
+
+	inode = jffs2_iget(child->d_inode->i_sb, pino);
+	if (IS_ERR(inode)) {
+		JFFS2_ERROR("Failed to get inode #%u: %ld\n", pino,
+			    ...
Previous thread: [RFC] Reinstate NFS exportability for JFFS2. by David Woodhouse on Thursday, May 1, 2008 - 12:42 pm. (3 messages)

Next thread: RFC: case-insensitive rename support in the VFS by Barry Naujok on Thursday, May 1, 2008 - 9:14 pm. (2 messages)