Actually it is the kernel that doesn't need to do anything.... Mapping between the filesystem and the filesystem part of the filehandle is done entirely in user space. The kernel says "Here is a filehandle fragement, what filesystem should I be accessing". So what you really want is to teach nfs-utils to recognise JFFS2 and extract an appropriate uuid. It already uses libblkid to get uuids for ext3 and XFS and others. Why is there a deadlock here? Both readdir and lookup are called with i_mutex held on the directory so there should need to do any extra locking (he said, naively). In the readdirplus cases, i_mutex is held across both the readdir and the lookup.... One problem with your proposed solution is that filehandles aren't all the same length, so you cannot reliably leave space for them. Awkward. NeilBrown --
For JFFS2, there is no UUID; only i_sb->s_dev. Actually. if we just set FS_REQUIRES_DEV then it would work out OK -- at least for the NFS export. We don't do that though, because it gives behaviour we don't Many file systems have their own locking, and lookup() can end up trying to re-take a lock which readdir() is already holding. In the JFFS2 case, it's the fs-internal inode mutex, which is required because the garbage collector can't use i_mutex for lock ordering reasons. See also the readdir implementation and surrounding comments in fs/xfs/linux-2.6/xfs_file.c -- and the way GFS2 uses gfs2_glock_is_locked_by_me() to avoid the deadlock. The annoying thing is that JFFS2 doesn't even _implement_ i_generation, Not without moving stuff around during the postprocessing, I suppose. Which isn't very pretty -- but it's prettier than some of the hacks we have at the moment to avoid the deadlock. -- dwmw2 --
This comes up periodically. It would be great to finally get it fixed. The last conversation I remember started at about: http://marc.info/?l=linux-kernel&m=119506226209056&w=2 Summary: one approach would be to define a ->readdirplus() that passes a dentry to its equivalent of the ->filldir callback. (Christoph points out that returning a stat struct would be simpler. But unfortunately we need to check for mountpoints here, so that's not sufficient.) --b. --
Yeah. I think the sanest plan for the short term is, as hch suggests, just to transplant the existing XFS hack into the nfsd code. That way, at least we can avoid using the hack for local users. And it makes NFS export from other file systems (jffs2, btrfs, etc.) easier without having to put the same hacks in each one. Git tree at git.infradead.org/users/dwmw2/nfsexport-2.6.git; patch sequence follows... -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
fs/nfsd/vfs.c | 40 ++++++++++++++++++++++++----------------
1 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 18060be..3e22634 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1814,6 +1814,29 @@ out:
return err;
}
+static int nfsd_do_readdir(struct file *file, filldir_t func,
+ struct readdir_cd *cdp, loff_t *offsetp)
+{
+ int host_err;
+
+ /*
+ * Read the directory entries. This silly loop is necessary because
+ * readdir() is not guaranteed to fill up the entire buffer, but
+ * may choose to do less.
+ */
+ do {
+ cdp->err = nfserr_eof; /* will be cleared on successful read */
+ host_err = vfs_readdir(file, func, cdp);
+ } while (host_err >=0 && cdp->err == nfs_ok);
+
+ *offsetp = vfs_llseek(file, 0, 1);
+
+ if (host_err)
+ return nfserrno(host_err);
+ else
+ return cdp->err;
+}
+
/*
* Read entries from a directory.
* The NFSv3/4 verifier we ignore for now.
@@ -1823,7 +1846,6 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
struct readdir_cd *cdp, filldir_t func)
{
__be32 err;
- int host_err;
struct file *file;
loff_t offset = *offsetp;
@@ -1837,21 +1859,7 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
goto out_close;
}
- /*
- * Read the directory entries. This silly loop is necessary because
- * readdir() is not guaranteed to fill up the entire buffer, but
- * may choose to do less.
- */
-
- do {
- cdp->err = nfserr_eof; /* will be cleared on successful read */
- host_err = vfs_readdir(file, func, cdp);
- } while (host_err >=0 && cdp->err == nfs_ok);
- if (host_err)
- err = nfserrno(host_err);
- else
- err = cdp->err;
- *offsetp = vfs_llseek(file, 0, 1);
+ err = nfsd_do_readdir(file, func, cdp, offsetp);
if (err == nfserr_eof || err == nfserr_toosmall)
err = nfs_ok; /* can still be found in ->err ...Some file systems with their own internal locking have problems with the
way that nfsd calls the ->lookup() method from within a filldir function
called from their ->readdir() method. The recursion back into the file
system code can cause deadlock.
XFS has a fairly hackish solution to this which involves doing the
readdir() into a locally-allocated buffer, then going back through it
calling the filldir function afterwards. It's not ideal, but it works.
It's particularly suboptimal because XFS does this for local file
systems too, where it's completely unnecessary.
Copy this hack into the NFS code where it can be used only for NFS, and
only for file systems which indicate that they need it by setting
FS_NO_LOOKUP_IN_READDIR in their fs_type flags.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
fs/nfsd/vfs.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/fs.h | 2 +
2 files changed, 112 insertions(+), 1 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3e22634..81c9411 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1814,6 +1814,111 @@ out:
return err;
}
+struct hack_dirent {
+ u64 ino;
+ loff_t offset;
+ int namlen;
+ unsigned int d_type;
+ char name[];
+};
+
+struct hack_callback {
+ char *dirent;
+ size_t len;
+ size_t used;
+};
+
+static int nfsd_hack_filldir(void *__buf, const char *name, int namlen,
+ loff_t offset, u64 ino, unsigned int d_type)
+{
+ struct hack_callback *buf = __buf;
+ struct hack_dirent *de = (void *)(buf->dirent + buf->used);
+ unsigned int reclen;
+
+ reclen = ALIGN(sizeof(struct hack_dirent) + namlen, sizeof(u64));
+ if (buf->used + reclen > buf->len)
+ return -EINVAL;
+
+ de->namlen = namlen;
+ de->offset = offset;
+ de->ino = ino;
+ de->d_type = d_type;
+ memcpy(de->name, name, namlen);
+ buf->used += reclen;
+
+ return 0;
+}
+
+static int nfsd_hack_readdir(struct file *file, filldir_t func,
+ struct readdir_cd *cdp, ...Now that we've moved the readdir hack to the nfsd code, we can just set
the FS_NO_LOOKUP_IN_READDIR flag and then remove the local hack from the
xfs code.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
fs/xfs/linux-2.6/xfs_file.c | 120 ------------------------------------------
fs/xfs/linux-2.6/xfs_super.c | 2 +-
2 files changed, 1 insertions(+), 121 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 5f60363..d65d377 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -212,7 +212,6 @@ xfs_file_fsync(
* Hopefully we'll find a better workaround that allows to use the optimal
* version at least for local readdirs for 2.6.25.
*/
-#if 0
STATIC int
xfs_file_readdir(
struct file *filp,
@@ -244,125 +243,6 @@ xfs_file_readdir(
return -error;
return 0;
}
-#else
-
-struct hack_dirent {
- u64 ino;
- loff_t offset;
- int namlen;
- unsigned int d_type;
- char name[];
-};
-
-struct hack_callback {
- char *dirent;
- size_t len;
- size_t used;
-};
-
-STATIC int
-xfs_hack_filldir(
- void *__buf,
- const char *name,
- int namlen,
- loff_t offset,
- u64 ino,
- unsigned int d_type)
-{
- struct hack_callback *buf = __buf;
- struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
- unsigned int reclen;
-
- reclen = ALIGN(sizeof(struct hack_dirent) + namlen, sizeof(u64));
- if (buf->used + reclen > buf->len)
- return -EINVAL;
-
- de->namlen = namlen;
- de->offset = offset;
- de->ino = ino;
- de->d_type = d_type;
- memcpy(de->name, name, namlen);
- buf->used += reclen;
- return 0;
-}
-
-STATIC int
-xfs_file_readdir(
- struct file *filp,
- void *dirent,
- filldir_t filldir)
-{
- struct inode *inode = filp->f_path.dentry->d_inode;
- xfs_inode_t *ip = XFS_I(inode);
- struct hack_callback buf;
- struct hack_dirent *de;
- int error;
- loff_t size;
- int eof = 0;
- xfs_off_t start_offset, curr_offset, ...Now that the readdir/lookup deadlock issues have been dealt with, we can
export JFFS2 file systems again.
(For now, you have to specify fsid manually; we should add a method to
the export_ops to handle that too.)
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
fs/jffs2/super.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 61 insertions(+), 0 deletions(-)
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index efd4012..2138b26 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -22,6 +22,7 @@
#include <linux/mtd/super.h>
#include <linux/ctype.h>
#include <linux/namei.h>
+#include <linux/exportfs.h>
#include "compr.h"
#include "nodelist.h"
@@ -62,6 +63,64 @@ static int jffs2_sync_fs(struct super_block *sb, int wait)
return 0;
}
+static struct inode *jffs2_nfs_get_inode(struct super_block *sb, uint64_t ino,
+ uint32_t generation)
+{
+ return jffs2_iget(sb, ino);
+}
+
+static struct dentry *jffs2_fh_to_dentry(struct super_block *sb, struct fid *fid,
+ int fh_len, int fh_type)
+{
+ return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+ jffs2_nfs_get_inode);
+}
+
+static struct dentry *jffs2_fh_to_parent(struct super_block *sb, struct fid *fid,
+ int fh_len, int fh_type)
+{
+ return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+ jffs2_nfs_get_inode);
+}
+
+static struct dentry *jffs2_get_parent(struct dentry *child)
+{
+ struct jffs2_inode_info *f;
+ struct dentry *parent;
+ struct inode *inode;
+ uint32_t pino;
+
+ if (!S_ISDIR(child->d_inode->i_mode))
+ return ERR_PTR(-EACCES);
+
+ f = JFFS2_INODE_INFO(child->d_inode);
+
+ pino = f->inocache->pino_nlink;
+
+ JFFS2_DEBUG("Parent of directory ino #%u is #%u\n",
+ f->inocache->ino, pino);
+
+ inode = jffs2_iget(child->d_inode->i_sb, pino);
+ if (IS_ERR(inode)) {
+ JFFS2_ERROR("Failed to get inode #%u: %ld\n", pino,
+ ...