Re: [PATCH 00/46] rcu-walk and dcache scaling

Previous thread: [PATCH 27/46] fs: reduce dcache_inode_lock width in lru scanning by Nick Piggin on Saturday, November 27, 2010 - 2:44 am. (1 message)

Next thread: Ensure Your Account by bruno.dondero on Saturday, November 27, 2010 - 3:13 am. (1 message)
From: Nick Piggin
Date: Saturday, November 27, 2010 - 3:15 am

git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git vfs-scale-working

Here is an new set of vfs patches for review, not that there was much interest
last time they were posted. It is structured like:

* preparation patches
* introduce new locks to take over dcache_lock, then remove it
* cleaning up and reworking things for new locks
* rcu-walk path walking
* start on some fine grained locking steps

Thanks,
Nick


Nick Piggin (46):
  Revert "fs: use RCU read side protection in d_validate"
  fs: d_validate fixes
  kernel: kmem_ptr_validate considered harmful
  fs: dcache documentation cleanup
  fs: change d_delete semantics
  cifs: dont overwrite dentry name in d_revalidate
  jfs: dont overwrite dentry name in d_revalidate
  fs: change d_compare for rcu-walk
  fs: change d_hash for rcu-walk
  hostfs: simplify locking
  fs: dcache scale hash
  fs: dcache scale lru
  fs: dcache scale dentry refcount
  fs: dcache scale d_unhashed
  fs: dcache scale subdirs
  fs: scale inode alias list
  fs: Use rename lock and RCU for multi-step operations
  fs: increase d_name lock coverage
  fs: dcache remove dcache_lock
  fs: dcache avoid starvation in dcache multi-step operations
  fs: dcache reduce dput locking
  fs: dcache reduce locking in d_alloc
  fs: dcache reduce dcache_inode_lock
  fs: dcache rationalise dget variants
  fs: dcache reduce d_parent locking
  fs: dcache reduce prune_one_dentry locking
  fs: reduce dcache_inode_lock width in lru scanning
  fs: use RCU in shrink_dentry_list to reduce lock nesting
  fs: consolidate dentry kill sequence
  fs: icache RCU free inodes
  fs: avoid inode RCU freeing for pseudo fs
  kernel: optimise seqlock
  fs: rcu-walk for path lookup
  fs: fs_struct use seqlock
  fs: dcache remove d_mounted
  fs: dcache reduce branches in lookup path
  fs: cache optimise dentry and inode for rcu-walk
  fs: prefetch inode data in dcache lookup
  fs: d_revalidate_rcu for rcu-walk
  fs: provide rcu-walk aware permission ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 Documentation/filesystems/porting         |   14 ++++++++++++++
 arch/powerpc/platforms/cell/spufs/inode.c |   10 ++++++++--
 drivers/staging/pohmelfs/inode.c          |   11 +++++++++--
 fs/9p/vfs_inode.c                         |    9 ++++++++-
 fs/adfs/super.c                           |    9 ++++++++-
 fs/affs/super.c                           |    9 ++++++++-
 fs/afs/super.c                            |   10 +++++++++-
 fs/befs/linuxvfs.c                        |   10 ++++++++--
 fs/bfs/inode.c                            |    9 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

Rather than keep a d_mounted count in the dentry, set a dentry flag instead.
The flag can be cleared by checking the hash table to see if there are any
mounts left, which is not time critical because it is performed at detach time.

The mounted state of a dentry is only used to speculatively take a look in the
mount hash table if it is set -- before following the mount, vfsmount lock is
taken and mount re-checked without races.

This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I
might use later (and some configs have larger than 32-bit spinlocks which might
make use of the hole).

Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>:
In autofs4, when expring direct (or offset) mounts we need to ensure that we
block user path walks into the autofs mount, which is covered by another mount.
To do this we clear the mounted status so that follows stop before walking into
the mount and are essentially blocked until the expire is completed. The
automount daemon still finds the correct dentry for the umount due to the
follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as
an inode operation for direct and offset mounts only and is called following
the lookup that stopped at the covered mount.

At the end of the expire the covering mount probably has gone away so the
mounted status need not be restored. But we need to check this and only restore
the mounted status if the expire failed.

XXX: autofs may not work right if we have other mounts go over the top of it?

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/autofs4/expire.c    |   13 +++++++++++--
 fs/dcache.c            |    1 -
 fs/namespace.c         |   29 ++++++++++++++++++++++++++---
 include/linux/dcache.h |   42 +++++++++++++++++++++---------------------
 4 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 7869b3a..d9f9a15 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -295,7 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -l "d_op =" | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 arch/ia64/kernel/perfmon.c    |    2 +-
 drivers/staging/autofs/root.c |    2 +-
 fs/9p/vfs_inode.c             |   26 +++++++++++++-------------
 fs/adfs/dir.c                 |    2 +-
 fs/adfs/super.c               |    2 +-
 fs/affs/namei.c               |    2 +-
 fs/affs/super.c               |    2 +-
 fs/afs/dir.c                  |    2 +-
 fs/anon_inodes.c              |    2 +-
 fs/autofs4/inode.c            |    2 +-
 fs/autofs4/root.c             |   10 +++++-----
 fs/btrfs/export.c             |    4 ++--
 fs/btrfs/inode.c              |    2 +-
 fs/ceph/dir.c                 |    6 +++---
 fs/cifs/dir.c                 |   16 ++++++++--------
 fs/cifs/inode.c               |    8 ++++----
 fs/cifs/link.c                |    4 ++--
 fs/cifs/readdir.c             |    4 ++--
 fs/coda/dir.c                 |    2 +-
 fs/configfs/dir.c             |    8 ++++----
 fs/dcache.c                   |   25 +++++++++++++++++++++----
 fs/ecryptfs/inode.c           |    2 +-
 fs/ecryptfs/main.c            |    4 ++--
 fs/fat/inode.c                |    4 ++--
 fs/fat/namei_msdos.c          |    6 +++---
 fs/fat/namei_vfat.c           |    8 ++++----
 fs/fuse/dir.c                 |    2 +-
 fs/fuse/inode.c               |    4 ++--
 fs/gfs2/export.c              |    4 ++--
 fs/gfs2/ops_fstype.c          |    2 +-
 fs/gfs2/ops_inode.c           |    2 +-
 fs/hfs/dir.c                  |    2 +-
 fs/hfs/super.c          ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/9p/vfs_inode.c      |    4 +-
 fs/affs/amigaffs.c     |    4 +-
 fs/dcache.c            |   84 +++++++++++++++++++++++++----------------------
 fs/exportfs/expfs.c    |   12 ++++---
 fs/nfs/getroot.c       |    4 +-
 fs/notify/fsnotify.c   |    4 +-
 fs/ocfs2/dcache.c      |    4 +-
 include/linux/dcache.h |    1 -
 8 files changed, 62 insertions(+), 55 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index df8bbb3..5978298 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -277,11 +277,11 @@ static struct dentry *v9fs_dentry_from_dir_inode(struct inode *inode)
 {
 	struct dentry *dentry;
 
-	spin_lock(&dcache_inode_lock);
+	spin_lock(&inode->i_lock);
 	/* Directory should have only one entry. */
 	BUG_ON(S_ISDIR(inode->i_mode) && !list_is_singular(&inode->i_dentry));
 	dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-	spin_unlock(&dcache_inode_lock);
+	spin_unlock(&inode->i_lock);
 	return dentry;
 }
 
diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 600101a..3a4557e 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -128,7 +128,7 @@ affs_fix_dcache(struct dentry *dentry, u32 entry_ino)
 	void *data = dentry->d_fsdata;
 	struct list_head *head, *next;
 
-	spin_lock(&dcache_inode_lock);
+	spin_lock(&inode->i_lock);
 	head = &inode->i_dentry;
 	next = head->next;
 	while (next != head) {
@@ -139,7 +139,7 @@ affs_fix_dcache(struct dentry *dentry, u32 entry_ino)
 		}
 		next = next->next;
 	}
-	spin_unlock(&dcache_inode_lock);
+	spin_unlock(&inode->i_lock);
 }
 
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 5e19940..809ec46 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -39,8 +39,8 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

We can turn the dcache hash locking from a global dcache_hash_lock into
per-bucket locking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c            |  131 ++++++++++++++++++++++++++++++++---------------
 fs/super.c             |    3 +-
 include/linux/dcache.h |   23 ++-------
 include/linux/fs.h     |    3 +-
 4 files changed, 97 insertions(+), 63 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 67a08d4..5e19940 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -33,13 +33,15 @@
 #include <linux/bootmem.h>
 #include <linux/fs_struct.h>
 #include <linux/hardirq.h>
+#include <linux/bit_spinlock.h>
+#include <linux/rculist_bl.h>
 #include "internal.h"
 
 /*
  * Usage:
  * dcache_inode_lock protects:
  *   - i_dentry, d_alias, d_inode
- * dcache_hash_lock protects:
+ * dcache_hash_bucket lock protects:
  *   - the dcache hash table
  * dcache_lru_lock protects:
  *   - the dcache lru lists and counters
@@ -57,7 +59,7 @@
  * dcache_inode_lock
  *   dentry->d_lock
  *     dcache_lru_lock
- *     dcache_hash_lock
+ *     dcache_hash_bucket lock
  *
  * If there is an ancestor relationship:
  * dentry->d_parent->...->d_parent->d_lock
@@ -74,13 +76,11 @@ int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
-__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
 EXPORT_SYMBOL(rename_lock);
 EXPORT_SYMBOL(dcache_inode_lock);
-EXPORT_SYMBOL(dcache_hash_lock);
 
 static struct kmem_cache *dentry_cache __read_mostly;
 
@@ -97,13 +97,35 @@ static struct kmem_cache *dentry_cache __read_mostly;
 
 static unsigned int d_hash_mask __read_mostly;
 static unsigned int d_hash_shift __read_mostly;
-static struct hlist_head *dentry_hashtable __read_mostly;
+
+struct dcache_hash_bucket {
+	struct ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:45 am

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/namei.c         |  131 ++++++++++++++++++++++++++++++++++++++++------------
 include/linux/fs.h |    6 ++
 2 files changed, 107 insertions(+), 30 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d8f7ece..7fa6119 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -169,8 +169,36 @@ EXPORT_SYMBOL(putname);
 /*
  * This does basic POSIX ACL permission checking
  */
-static inline int __acl_permission_check(struct inode *inode, int mask,
-		int (*check_acl)(struct inode *inode, int mask), int rcu)
+static int acl_permission_check_rcu(struct inode *inode, int mask, unsigned int flags,
+		int (*check_acl_rcu)(struct inode *inode, int mask, unsigned int flags))
+{
+	umode_t			mode = inode->i_mode;
+
+	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
+
+	if (current_fsuid() == inode->i_uid)
+		mode >>= 6;
+	else {
+		if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl_rcu) {
+			int error = check_acl_rcu(inode, mask, flags);
+			if (error != -EAGAIN)
+				return error;
+		}
+
+		if (in_group_p(inode->i_gid))
+			mode >>= 3;
+	}
+
+	/*
+	 * If the DACs are ok we don't need any capability check.
+	 */
+	if ((mask & ~mode) == 0)
+		return 0;
+	return -EACCES;
+}
+
+static int acl_permission_check(struct inode *inode, int mask, unsigned int flags,
+		int (*check_acl)(struct inode *inode, int mask))
 {
 	umode_t			mode = inode->i_mode;
 
@@ -180,7 +208,7 @@ static inline int __acl_permission_check(struct inode *inode, int mask,
 		mode >>= 6;
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) {
-			if (rcu) {
+			if (flags) {
 				return -ECHILD;
 			} else {
 				int error = check_acl(inode, mask);
@@ -201,10 +229,52 @@ static inline int __acl_permission_check(struct inode *inode, int mask,
 	return -EACCES;
 }
 
-static inline int acl_permission_check(struct inode *inode, int mask,
-		int (*check_acl)(struct inode *inode, int mask))
+/**
+ * generic_permission_rcu  -  check for access ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

prune_one_dentry can avoid quite a bit of locking in the common case where
ancestors have an elevated refcount. Alternatively, we could have gone the
other way and made fewer trylocks in the case where d_count goes to zero, but
is probably less common.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c |   27 +++++++++++++++------------
 1 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 23f6fed..4fa27b5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -547,26 +547,29 @@ static void prune_one_dentry(struct dentry *dentry, struct dentry *parent)
 	 * Prune ancestors.
 	 */
 	while (dentry) {
-		spin_lock(&dcache_inode_lock);
-again:
+relock:
 		spin_lock(&dentry->d_lock);
+		if (dentry->d_count > 1) {
+			dentry->d_count--;
+			spin_unlock(&dentry->d_lock);
+	 		return;
+	 	}
+		if (!spin_trylock(&dcache_inode_lock)) {
+relock2:
+			spin_unlock(&dentry->d_lock);
+			cpu_relax();
+			goto relock;
+		}
+
 		if (IS_ROOT(dentry))
 			parent = NULL;
 		else
 			parent = dentry->d_parent;
 		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dentry->d_lock);
-			goto again;
-		}
-		dentry->d_count--;
-		if (dentry->d_count) {
-			if (parent)
-				spin_unlock(&parent->d_lock);
-			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_inode_lock);
-			return;
+			goto relock2;
 		}
-
+		dentry->d_count--;
 		dentry_lru_del(dentry);
 		__d_drop(dentry);
 		dentry = d_kill(dentry, parent);
-- 
1.7.1

--

From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

dcache_inode_lock can be avoided in d_delete() and d_materialise_unique()
in cases where it is not required.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3e4f7c1..6fe387d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1794,10 +1794,15 @@ void d_delete(struct dentry * dentry)
 	/*
 	 * Are we the only user?
 	 */
-	spin_lock(&dcache_inode_lock);
+again:
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (dentry->d_count == 1) {
+		if (!spin_trylock(&dcache_inode_lock)) {
+			spin_unlock(&dentry->d_lock);
+			cpu_relax();
+			goto again;
+		}
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_iput(dentry);
 		fsnotify_nameremove(dentry, isdir);
@@ -1808,7 +1813,6 @@ void d_delete(struct dentry * dentry)
 		__d_drop(dentry);
 
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_inode_lock);
 
 	fsnotify_nameremove(dentry, isdir);
 }
@@ -2106,14 +2110,15 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
 
 	BUG_ON(!d_unhashed(dentry));
 
-	spin_lock(&dcache_inode_lock);
-
 	if (!inode) {
 		actual = dentry;
 		__d_instantiate(dentry, NULL);
-		goto found_lock;
+		d_rehash(actual);
+		goto out_nolock;
 	}
 
+	spin_lock(&dcache_inode_lock);
+
 	if (S_ISDIR(inode->i_mode)) {
 		struct dentry *alias;
 
@@ -2145,10 +2150,9 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
 	actual = __d_instantiate_unique(dentry, inode);
 	if (!actual)
 		actual = dentry;
-	else if (unlikely(!d_unhashed(actual)))
-		goto shouldnt_be_hashed;
+	else
+		BUG_ON(!d_unhashed(actual));
 
-found_lock:
 	spin_lock(&actual->d_lock);
 found:
 	spin_lock(&dcache_hash_lock);
@@ -2164,10 +2168,6 @@ out_nolock:
 
 	iput(inode);
 	return actual;
-
-shouldnt_be_hashed:
-	spin_unlock(&dcache_inode_lock);
-	BUG();
 }
 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 drivers/staging/smbfs/cache.c |    4 +
 drivers/usb/core/inode.c      |    8 +-
 fs/autofs4/autofs_i.h         |   11 +++
 fs/autofs4/expire.c           |  129 +++++++++++++--------------
 fs/autofs4/root.c             |   18 ++++-
 fs/ceph/dir.c                 |    6 +-
 fs/ceph/inode.c               |    8 ++-
 fs/coda/cache.c               |    2 +
 fs/dcache.c                   |  195 +++++++++++++++++++++++++++++++----------
 fs/libfs.c                    |   24 ++++--
 fs/ncpfs/dir.c                |    3 +
 fs/ncpfs/ncplib_kernel.h      |    4 +
 fs/notify/fsnotify.c          |    4 +-
 include/linux/dcache.h        |    1 +
 kernel/cgroup.c               |   19 ++++-
 security/selinux/selinuxfs.c  |   12 ++-
 16 files changed, 314 insertions(+), 134 deletions(-)

diff --git a/drivers/staging/smbfs/cache.c b/drivers/staging/smbfs/cache.c
index dbb9865..29afb05 100644
--- a/drivers/staging/smbfs/cache.c
+++ b/drivers/staging/smbfs/cache.c
@@ -63,6 +63,7 @@ smb_invalidate_dircache_entries(struct dentry *parent)
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dentry = list_entry(next, struct dentry, d_u.d_child);
@@ -70,6 +71,7 @@ smb_invalidate_dircache_entries(struct dentry *parent)
 		smb_age_dentry(server, dentry);
 		next = next->next;
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
@@ -97,6 +99,7 @@ smb_dget_fpos(struct dentry *dentry, struct dentry ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

The remaining usages for dcache_lock is to allow atomic, multi-step read-side
operations over the directory tree by excluding modifications to the tree.
Also, to walk in the leaf->root direction in the tree where we don't have
a natural d_lock ordering.

This could be accomplished by taking every d_lock, but this would mean a
huge number of locks and actually gets very tricky.

Solve this instead by using the rename seqlock for multi-step read-side
operations, retry in case of a rename so we don't walk up the wrong parent.
Concurrent dentry insertions are not serialised against.  Concurrent deletes
are tricky when walking up the directory: our parent might have been deleted
when dropping locks so also need to check and retry for that.

We can also use the rename lock in cases where livelock is a worry (and it
is introduced in subsequent patch).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 drivers/staging/pohmelfs/path_entry.c |   15 +++-
 fs/autofs4/waitq.c                    |   18 ++++-
 fs/dcache.c                           |  136 +++++++++++++++++++++++++++------
 fs/nfs/namespace.c                    |   14 +++-
 include/linux/dcache.h                |    1 +
 5 files changed, 153 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/pohmelfs/path_entry.c b/drivers/staging/pohmelfs/path_entry.c
index 8ec83d2..bbe42f4 100644
--- a/drivers/staging/pohmelfs/path_entry.c
+++ b/drivers/staging/pohmelfs/path_entry.c
@@ -83,10 +83,11 @@ out:
 int pohmelfs_path_length(struct pohmelfs_inode *pi)
 {
 	struct dentry *d, *root, *first;
-	int len = 1; /* Root slash */
+	int len;
+	unsigned seq;
 
-	first = d = d_find_alias(&pi->vfs_inode);
-	if (!d) {
+	first = d_find_alias(&pi->vfs_inode);
+	if (!first) {
 		dprintk("%s: ino: %llu, mode: %o.\n", __func__, pi->ino, pi->vfs_inode.i_mode);
 		return -ENOENT;
 	}
@@ -95,6 +96,11 @@ int pohmelfs_path_length(struct pohmelfs_inode *pi)
 	root = dget(current->fs->root.dentry);
 	spin_unlock(&current->fs->lock);
 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/9p/vfs_inode.c      |    2 +
 fs/affs/amigaffs.c     |    2 +
 fs/cifs/inode.c        |    3 ++
 fs/dcache.c            |   66 ++++++++++++++++++++++++++++++++++++++++++------
 fs/exportfs/expfs.c    |    4 +++
 fs/nfs/getroot.c       |    4 +++
 fs/notify/fsnotify.c   |    2 +
 fs/ocfs2/dcache.c      |    3 +-
 include/linux/dcache.h |    1 +
 9 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 34bf71b..47dfd5d 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -271,9 +271,11 @@ static struct dentry *v9fs_dentry_from_dir_inode(struct inode *inode)
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	/* Directory should have only one entry. */
 	BUG_ON(S_ISDIR(inode->i_mode) && !list_is_singular(&inode->i_dentry));
 	dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	return dentry;
 }
diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 7d0f0a3..2321cc9 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -129,6 +129,7 @@ affs_fix_dcache(struct dentry *dentry, u32 entry_ino)
 	struct list_head *head, *next;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	head = &inode->i_dentry;
 	next = head->next;
 	while (next != head) {
@@ -139,6 +140,7 @@ affs_fix_dcache(struct dentry *dentry, u32 entry_ino)
 		}
 		next = next->next;
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 }
 
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index ef3a55b..0ee1767 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -805,12 +805,15 @@ inode_has_hashed_dentries(struct inode *inode)
 	struct dentry *dentry;
 
 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 Documentation/filesystems/Locking |    5 +++--
 Documentation/filesystems/porting |    7 +++++++
 Documentation/filesystems/vfs.txt |    8 ++++++--
 fs/adfs/dir.c                     |    3 ++-
 fs/affs/namei.c                   |   20 ++++++++++++--------
 fs/cifs/dir.c                     |    5 +++--
 fs/cifs/readdir.c                 |    2 +-
 fs/dcache.c                       |    2 +-
 fs/ecryptfs/inode.c               |    4 ++--
 fs/fat/namei_msdos.c              |    3 ++-
 fs/fat/namei_vfat.c               |    8 +++++---
 fs/gfs2/dentry.c                  |    3 ++-
 fs/hfs/hfs_fs.h                   |    3 ++-
 fs/hfs/string.c                   |    3 ++-
 fs/hfsplus/hfsplus_fs.h           |    3 ++-
 fs/hfsplus/unicode.c              |    3 ++-
 fs/hpfs/dentry.c                  |    3 ++-
 fs/isofs/inode.c                  |   28 ++++++++++++++++++----------
 fs/jfs/namei.c                    |    3 ++-
 fs/namei.c                        |    5 +++--
 fs/ncpfs/dir.c                    |   10 +++++-----
 fs/sysv/namei.c                   |    3 ++-
 include/linux/dcache.h            |    3 ++-
 23 files changed, 88 insertions(+), 49 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index fc5e1b7..5bceb19 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -10,7 +10,8 @@ be able to use diff(1).
 --------------------------- dentry_operations --------------------------
 prototypes:
 	int (*d_revalidate)(struct dentry *, int);
-	int (*d_hash) (struct dentry *, struct qstr *);
+	int (*d_hash)(const struct dentry *, const struct inode *,
+			struct qstr *);
 	int (*d_compare)(const struct dentry *,
 			const struct dentry *, const struct inode *,
 ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 Documentation/filesystems/Locking |    4 +-
 Documentation/filesystems/porting |    7 +++
 Documentation/filesystems/vfs.txt |   25 +++++++++-
 fs/adfs/dir.c                     |    8 ++-
 fs/affs/namei.c                   |   44 +++++++++++--------
 fs/cifs/dir.c                     |   11 +++--
 fs/dcache.c                       |    4 +-
 fs/fat/namei_msdos.c              |   15 ++++---
 fs/fat/namei_vfat.c               |   39 +++++++++++-----
 fs/hfs/hfs_fs.h                   |    4 +-
 fs/hfs/string.c                   |   14 +++---
 fs/hfsplus/hfsplus_fs.h           |    4 +-
 fs/hfsplus/unicode.c              |   14 +++---
 fs/hpfs/dentry.c                  |   21 +++++---
 fs/isofs/inode.c                  |   88 ++++++++++++++++++-------------------
 fs/isofs/namei.c                  |    3 +-
 fs/jfs/namei.c                    |   10 +++--
 fs/ncpfs/dir.c                    |   29 ++++++++----
 fs/ncpfs/ncplib_kernel.h          |    8 ++--
 fs/proc/proc_sysctl.c             |   12 +++---
 include/linux/dcache.h            |   12 ++---
 include/linux/ncp_fs.h            |    4 +-
 22 files changed, 228 insertions(+), 152 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index a91f308..fc5e1b7 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -11,7 +11,9 @@ be able to use diff(1).
 prototypes:
 	int (*d_revalidate)(struct dentry *, int);
 	int (*d_hash) (struct dentry ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Add a new lock, dcache_hash_lock, to protect the dcache hash table from
concurrent modification. d_hash is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c            |   38 +++++++++++++++++++++++++++-----------
 include/linux/dcache.h |    3 +++
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 4f9ccbe..50c65c7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -35,12 +35,27 @@
 #include <linux/hardirq.h>
 #include "internal.h"
 
+/*
+ * Usage:
+ * dcache_hash_lock protects dcache hash table
+ *
+ * Ordering:
+ * dcache_lock
+ *   dentry->d_lock
+ *     dcache_hash_lock
+ *
+ * if (dentry1 < dentry2)
+ *   dentry1->d_lock
+ *     dentry2->d_lock
+ */
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
- __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
+EXPORT_SYMBOL(dcache_hash_lock);
 EXPORT_SYMBOL(dcache_lock);
 
 static struct kmem_cache *dentry_cache __read_mostly;
@@ -1195,7 +1210,9 @@ struct dentry *d_obtain_alias(struct inode *inode)
 	tmp->d_flags |= DCACHE_DISCONNECTED;
 	tmp->d_flags &= ~DCACHE_UNHASHED;
 	list_add(&tmp->d_alias, &inode->i_dentry);
+	spin_lock(&dcache_hash_lock);
 	hlist_add_head(&tmp->d_hash, &inode->i_sb->s_anon);
+	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&tmp->d_lock);
 
 	spin_unlock(&dcache_lock);
@@ -1581,7 +1598,9 @@ void d_rehash(struct dentry * entry)
 {
 	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
+	spin_lock(&dcache_hash_lock);
 	_d_rehash(entry);
+	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&entry->d_lock);
 	spin_unlock(&dcache_lock);
 }
@@ -1661,8 +1680,6 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
  */
 static void d_move_locked(struct ...
From: Dave Chinner
Date: Wednesday, December 8, 2010 - 11:09 pm

What locking is used to keep DCACHE_UNHASHED/d_unhashed() in check
with the whether the dentry is on the hash list or not? It looks to
me that to make any hash modification, you have to hold both the
dentry->d_lock and the dcache_hash_lock to keep them in step. If

Perhaps the places where we need to lock two dentries should use a
wrapper like we do for other objects. Such as:

void dentry_dlock_two(struct dentry *d1, struct dentry *d2)
{
	if (d1 < d2) {
		spin_lock(&d1->d_lock);
		spin_lock_nested(&d2->d_lock, DENTRY_D_LOCK_NESTED);
	} else {
		spin_lock(&d2->d_lock);
		spin_lock_nested(&d1->d_lock, DENTRY_D_LOCK_NESTED);
	}

Shouldn't we really kill _d_rehash() by replacing all the callers
with direct calls to __d_rehash() first? There doesn't seem to be much

Un-inline __d_drop so you don't need to make the dcache_hash_lock
visible outside of fs/dcache.c. That happens later in the series
anyway, so may as well do it now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, December 8, 2010 - 11:28 pm

No, dcache_lock still does that. d_unhashed is protected with

It only happens once in rename, so I don't think it's useful.

No. Several filesystems are using it, and it's an exported symbol. I'm
focusing on changed to locking, and keeping APIs the same, where
possible. I don't want just more and more depencendies on pushing
through filesystem changes before this series.

Like I said, there are infinite cleanups or improvements you can make.
It does not particularly matter that they happen before or after the
scaling work, except if there are classes of APIs that the new locking

Yeah that makes sense.

Thanks,
Nick

--

From: Dave Chinner
Date: Thursday, December 9, 2010 - 1:17 am

d_unhashed() is just a flag bit in ->d_flags, which is apparently
protected by ->d_lock in the next the LRU patch. I say apparently
because that patch doesn't appear to protect anything to do with
d_flags. I'm struggling to work out what is being protected in each
patch because it is not clearexactly what is being done.

It makes much more sense to me to start by making all access to
d_flags atomic (i.e. protected by ->d_lock) in a separate patch than
to do it hodge-podge across multiple patches as you currently are.
Its hard to follow when things that are intimately related are



That's __d_rehash().

_d_rehash() (single underscore) is static and only called by
d_rehash() and d_materialise_unique() And is one line of code

We do plenty of cleanups when changing code when the result gives us
simpler and easier to understand code. It's a trivial change that,
IMO, makes the code more consistent and easier to follow.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Thursday, December 9, 2010 - 5:53 am

No, d_flags in upstream kernel is protected by d_lock (and also you

Well hopefully the code and changelog is more clear. The lockorder
doc changes probably aren't a good source for a running commentary.
Not that it's wrong, it just may have a few nits like this where an existing

It might be easier to follow if you know d_flags is already protected
by d_lock.

The d_unhashed patch uses d_lock to keep _both_ the d_flags and
the hash list membership status in sync. It does not make the d_flags

Anyway, cleanup. It can equally well be done before or after, and seeing as
we're being nice and not breaking my tree with minutiae, can you base it on


That doesn't belong in this patch either, it's shuffling. Anyway I like

Unrelated "cleanups" in the same patch as non trivial locking change
is stupid.

Necessary changes to prevent bad ugliness resulting, or preventing
repeated steps for the particular changes, etc. of course. Killing un
related functions no.
--

From: Dave Chinner
Date: Thursday, December 9, 2010 - 4:42 pm

So put it in another prepartory patch. It makes the locking changes

Ok, I get the picture. You don't want a code review, you want a
rubber stamp. Find someone else to get it from.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Thursday, December 9, 2010 - 7:35 pm

I didn't change that, though, the ordering of locking unrelated
dentries and the code is already in rename code and is not touched

Of course I want code review. I am not going to just do everything
you say that I don't agree with, but I will explain why every time
(as I have done to all your points).

I would prefer more in-depth review than from someone who doesn't know
d_lock protects d_flags, but any and all help is welcome. Even minor
nitpicking or cleanups are welcome if they are relevant to the patches.

Thanks,
Nick

PS. don't accuse me of not wanting a code review, because you're just
projecting. To paraphrase you:

 I don't have to justify myself to you, nick, only the maintainers, so
 I'm not answering.

In response to my questions.

--

From: Dave Chinner
Date: Friday, December 10, 2010 - 2:01 am

Which generally comes down to "I disagree with you". That's hard to
argue against because you aren't willing to compromise.

So, to address your next comment, I'll restate what I was proposing.
That is, to ensure all the d_flags accesses protected by d_lock as
an initial patch rather than cleaning it up in an ad-hoc fashion
later on, such as this later patch in your series:

[PATCH 14/46] fs: dcache scale d_unhashed

which has the description:

	Protect d_unhashed(dentry) condition with d_lock.

which illustrates my point that not all accesses to d_flags are

Your implication about my competence is incorrect and entirely
inappropriate.  Ad hominen attacks don't improve your argument or

If _you_ decide they are relevant.

Nick, in the past couple of months you've burnt everyone who has
tried to review your changes in any meaningful way. Nobody wants to
engage with you because you've aggressively disagreed with every
significant change that has been requested. You have shown no desire
to compromise, instead you argue that you are right until you've had
the last word, and you have frequently resorted to condesending and
disrespectful attacks on reviewers. You would do well to keep that
in mind next time you wonder why nobody is stepping up to review
your code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Sunday, December 12, 2010 - 9:48 pm

I have been trying to explain my reasoning. For example, the suggestion
to change _d_rehash and to put by-address ordering of locking 2 dentries
in its own function I simply said that I don't want to pull in such
changes because they're not related or really touched by the patches.

I think that's reasonable, and so if I have a reasonable objection to a

It depends what you mean by accesses to d_flags.

No, not all of them are, because there are in fact some cases
where d_flags is read without any locking, when races don't matter
or aren't applicable.

But all writes to d_flags, in code where the dentry is live and
there can be concurrent writes to d_flags *are* protected by d_lock.

d_unhashed() is defined to:
 Returns true if the dentry passed is not currently hashed.

So what I have called the d_unhashed condition, I mean the combination
of DCACHE_UNHASHED and dentry membership on the hash list.

I'll improve that changelog because now you've brought it to my

Well you said that my patch adds d_flags protection in bits and
pieces in a random manner. d_flags is already protected by d_lock

Well you keep escalating it too, like you swore at me when I try
several times to explain an issue.



This is exactly what you and Christoph did, to me, actually. And you're
wrong, nobody was reviewing my code long before that little episode. I
certainly did compromise with Al, regarding the merging of the inode
lock stuff, and although I disagreed with some parts, I said OK fine.

You can't seem to concede a single time that I am right or have a valid
point. The best you can possibly manage to to go silent (and then maybe
bring it up again a few weeks later). This is perhaps why I appear so
insolent, because when I disagree with you, I'm wrong so my reasoning
must be irrelevant. It just keeps happening (recently again with the vfs
percpu counters thread).

So as far as I can see, there never was a bridge there to begin with. I
wish we could work together because I don't in ...
From: Nick Piggin
Date: Sunday, December 12, 2010 - 10:05 pm

I'll also just add that if you don't like ad-hominen attacks, you
shouldn't have made implications about my integrity and honesty by
accusing me of wanting a rubber stamp, rather than a real review.
Which was before I suggested that you were confused about d_flags
locking, you'll note.

--

From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Remove dcache_lock locking from hostfs filesystem, and move it into dcache
helpers. All that should really be required is a coherent path name, protection
from concurrent modification is not provided outside path name generation
because dcache_lock is dropped before the path is used.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c             |   15 +++++++++++++--
 fs/hostfs/hostfs_kern.c |   24 ++++++++++--------------
 include/linux/dcache.h  |    2 +-
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9fd5180..4f9ccbe 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2140,7 +2140,7 @@ char *dynamic_dname(struct dentry *dentry, char *buffer, int buflen,
 /*
  * Write full pathname from the root of the filesystem into the buffer.
  */
-char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
+static char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
 {
 	char *end = buf + buflen;
 	char *retval;
@@ -2167,7 +2167,18 @@ char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
 Elong:
 	return ERR_PTR(-ENAMETOOLONG);
 }
-EXPORT_SYMBOL(__dentry_path);
+
+char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
+{
+	char *retval;
+
+	spin_lock(&dcache_lock);
+	retval = __dentry_path(dentry, buf, buflen);
+	spin_unlock(&dcache_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(dentry_path_raw);
 
 char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 {
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index cfe8bc7..39dc505 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -92,12 +92,10 @@ __uml_setup("hostfs=", hostfs_args,
 
 static char *__dentry_name(struct dentry *dentry, char *name)
 {
-	char *p = __dentry_path(dentry, name, PATH_MAX);
+	char *p = dentry_path_raw(dentry, name, PATH_MAX);
 	char *root;
 	size_t len;
 
-	spin_unlock(&dcache_lock);
-
 	root = dentry->d_sb->s_fs_info;
 	len = strlen(root);
 	if ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Use vfat's method for dealing with negative dentries to preserve case,
rather than overwrite dentry name in d_revalidate, which is a bit ugly
and also gets in the way of doing lock-free path walking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/jfs/namei.c |   43 +++++++++++++++++++++++++++++++++++--------
 1 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 231ca4a..2da1546 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -18,6 +18,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include <linux/ctype.h>
 #include <linux/quotaops.h>
 #include <linux/exportfs.h>
@@ -1597,21 +1598,47 @@ static int jfs_ci_compare(struct dentry *dir, struct qstr *a, struct qstr *b)
 			goto out;
 	}
 	result = 0;
+out:
+	return result;
+}
 
+static int jfs_ci_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
 	/*
-	 * We want creates to preserve case.  A negative dentry, a, that
-	 * has a different case than b may cause a new entry to be created
-	 * with the wrong case.  Since we can't tell if a comes from a negative
-	 * dentry, we blindly replace it with b.  This should be harmless if
-	 * a is not a negative dentry.
+	 * This is not negative dentry. Always valid.
+	 *
+	 * Note, rename() to existing directory entry will have ->d_inode,
+	 * and will use existing name which isn't specified name by user.
+	 *
+	 * We may be able to drop this positive dentry here. But dropping
+	 * positive dentry isn't good idea. So it's unsupported like
+	 * rename("filename", "FILENAME") for now.
 	 */
-	memcpy((unsigned char *)a->name, b->name, a->len);
-out:
-	return result;
+	if (dentry->d_inode)
+		return 1;
+
+	/*
+	 * This may be nfsd (or something), anyway, we can't see the
+	 * intent of this. So, since this can be for creation, drop it.
+	 */
+	if (!nd)
+		return 0;
+
+	/*
+	 * Drop the negative dentry, in order to make sure to use the
+	 * case sensitive name which is specified by user ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

Use vfat's method for dealing with negative dentries to preserve case,
rather than overwrite dentry name in d_revalidate, which is a bit ugly
and also gets in the way of doing lock-free path walking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/cifs/dir.c |   43 ++++++++++++++++++++++++-------------------
 1 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 3840edd..521d841 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -656,22 +656,34 @@ lookup_out:
 static int
 cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd)
 {
-	int isValid = 1;
-
 	if (direntry->d_inode) {
 		if (cifs_revalidate_dentry(direntry))
 			return 0;
-	} else {
-		cFYI(1, "neg dentry 0x%p name = %s",
-			 direntry, direntry->d_name.name);
-		if (time_after(jiffies, direntry->d_time + HZ) ||
-			!lookupCacheEnabled) {
-			d_drop(direntry);
-			isValid = 0;
-		}
+		else
+			return 1;
 	}
 
-	return isValid;
+	/*
+	 * This may be nfsd (or something), anyway, we can't see the
+	 * intent of this. So, since this can be for creation, drop it.
+	 */
+	if (!nd)
+		return 0;
+
+	/*
+	 * Drop the negative dentry, in order to make sure to use the
+	 * case sensitive name which is specified by user if this is
+	 * for creation.
+	 */
+	if (!(nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))) {
+		if (nd->flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
+			return 0;
+	}
+
+	if (time_after(jiffies, direntry->d_time + HZ) || !lookupCacheEnabled)
+		return 0;
+
+	return 1;
 }
 
 /* static int cifs_d_delete(struct dentry *direntry)
@@ -709,15 +721,8 @@ static int cifs_ci_compare(struct dentry *dentry, struct qstr *a,
 	struct nls_table *codepage = CIFS_SB(dentry->d_inode->i_sb)->local_nls;
 
 	if ((a->len == b->len) &&
-	    (nls_strnicmp(codepage, a->name, b->name, a->len) == 0)) {
-		/*
-		 * To preserve case, don't let an existing negative dentry's
-		 * case take precedence.  If a is not a negative dentry, this
-		 * ...
From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:44 am

d_validate has been broken for a long time.

kmem_ptr_validate does not guarantee that a pointer can be dereferenced
if it can go away at any time. Even rcu_read_lock doesn't help, because
the pointer might be queued in RCU callbacks but not executed yet.

So the parent cannot be checked, nor the name hashed. The dentry pointer
can not be touched until it can be verified under lock. Hashing simply
cannot be used.

Instead, verify the parent/child relationship by traversing parent's
d_child list. It's slow, but only ncpfs and the destaged smbfs care
about it, at this point.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c |   25 +++++++------------------
 1 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index cc2b938..9d1a59d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1483,41 +1483,30 @@ out:
 }
 
 /**
- * d_validate - verify dentry provided from insecure source
+ * d_validate - verify dentry provided from insecure source (deprecated)
  * @dentry: The dentry alleged to be valid child of @dparent
  * @dparent: The parent dentry (known to be valid)
  *
  * An insecure source has sent us a dentry, here we verify it and dget() it.
  * This is used by ncpfs in its readdir implementation.
  * Zero is returned in the dentry is invalid.
+ *
+ * This function is slow for big directories, and deprecated, do not use it.
  */
- 
 int d_validate(struct dentry *dentry, struct dentry *dparent)
 {
-	struct hlist_head *base;
-	struct hlist_node *lhp;
-
-	/* Check whether the ptr might be valid at all.. */
-	if (!kmem_ptr_validate(dentry_cache, dentry))
-		goto out;
-
-	if (dentry->d_parent != dparent)
-		goto out;
+	struct dentry *child;
 
 	spin_lock(&dcache_lock);
-	base = d_hash(dparent, dentry->d_name.hash);
-	hlist_for_each(lhp,base) { 
-		/* hlist_for_each_entry_rcu() not required for d_hash list
-		 * as it is parsed under dcache_lock
-		 */
-		if (dentry == hlist_entry(lhp, struct dentry, d_hash)) ...
From: Dave Chinner
Date: Tuesday, December 7, 2010 - 6:53 pm

I'd drop the previous revert patch and just convert the RCU hash
traversal straight to the d_child traversal code you introduce here.
This is a much better explanation of why the d_validate mechanism
needs to be changed, and the revert is really an unnecessary extra
step...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Tuesday, December 7, 2010 - 11:59 pm

Has to be backported, though. Patch that is to be reverted obviously
adds more brokenness and is a good example that you cannot dget() under
rcu read protection even if the rest of the surrounding function is
bugfree. I wouldn't have thought it's a big deal.



--

From: Dave Chinner
Date: Wednesday, December 8, 2010 - 5:50 pm

Reverting something broken to something already broken just to fix
to the less broken version seems like an unnecessary step. Just
fix the brokenneѕs in a single patch - no need to indirect the real
fix through a revert. One less patch to worry about.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, December 8, 2010 - 9:50 pm

Backported to stable/distro kernels I suppose. I'm not sure what your

OK but I disagree. Firstly, reverting that patch gives a good record of
that particular pattern of bug (that Christoph and Al both missed).
With more RCU going into the vfs, people need to be pretty clear about
the pitfalls.

Secondly, as I said, reverting means that I can use exact same patch
for upstream and stable kernels.

And finally, it gives better bisectability. If somebody hits a bug in
my patch, I would rather have them bisect into the well-worn (if buggy)
version of the code than bisect into a different type of brokenness.

It isn't indirecting the real fix through a revert, they are broken in
different ways. My fix is for the bug that it doesn't guarantee the
persistence of *memory* we are using, and the revert is for the bug that
it doesn't guarantee the persistence/validity of the *object*, and which
is actually more likely to be a problem if you think about it, because
the window is much larger.

Git has no problem with lots of patches, so I don't see any advantage
to doing one patch, and you lose the advantages above.


--

From: Nick Piggin
Date: Saturday, November 27, 2010 - 2:56 am

This reverts commit 3825bdb7ed920845961f32f364454bee5f469abb.

Patch is broken, you can't dget() without holding any locks!

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
---
 fs/dcache.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 23702a9..cc2b938 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1491,26 +1491,33 @@ out:
  * This is used by ncpfs in its readdir implementation.
  * Zero is returned in the dentry is invalid.
  */
-int d_validate(struct dentry *dentry, struct dentry *parent)
+ 
+int d_validate(struct dentry *dentry, struct dentry *dparent)
 {
-	struct hlist_head *head = d_hash(parent, dentry->d_name.hash);
-	struct hlist_node *node;
-	struct dentry *d;
+	struct hlist_head *base;
+	struct hlist_node *lhp;
 
 	/* Check whether the ptr might be valid at all.. */
 	if (!kmem_ptr_validate(dentry_cache, dentry))
-		return 0;
-	if (dentry->d_parent != parent)
-		return 0;
+		goto out;
 
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(d, node, head, d_hash) {
-		if (d == dentry) {
-			dget(dentry);
+	if (dentry->d_parent != dparent)
+		goto out;
+
+	spin_lock(&dcache_lock);
+	base = d_hash(dparent, dentry->d_name.hash);
+	hlist_for_each(lhp,base) { 
+		/* hlist_for_each_entry_rcu() not required for d_hash list
+		 * as it is parsed under dcache_lock
+		 */
+		if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
+			__dget_locked(dentry);
+			spin_unlock(&dcache_lock);
 			return 1;
 		}
 	}
-	rcu_read_unlock();
+	spin_unlock(&dcache_lock);
+out:
 	return 0;
 }
 EXPORT_SYMBOL(d_validate);
-- 
1.7.1

--

From: Dave Chinner
Date: Tuesday, December 7, 2010 - 6:16 pm

I believe you can - for the same reasons we can take a reference to
an inode without holding the inode_lock. That is, as long as the
caller already holds an active reference to the dentry,
dget() can be used to take another reference without needing the
dcache_lock.

Such usage appears to be described in the comment above dget() and
there's a BUG_ON() in dget() to catch callers that don't already
have an active reference. An example of a valid unlocked dget():
d_alloc() does an unlocked dget() to take a reference to the parent
dentry which we already are guaranteed to have a reference to.

As to d_validate() - it depends on the caller behaviour as to
whether the unlocked dget() is valid or not.  From a cursory check
of the NCP and SMB readdir caches, both appear to hold an active
reference to the dentry it is passing to d_validate(). If that is
the case then there is nothing wrong with the way d_validate uses
dget(). Can someone with more SMB/NCP expertise than me validate the
use of cached dentries?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, December 8, 2010 - 2:38 am

I don't see where? Can you point to where the refcount is taken?
AFAIKS it drops the reference 3 lines after it puts the pointer

Then why would it have to use d_validate if it has a reference?
That is supposed to be for an "untrusted" pointer (which is why
it had all the crazy checks that it's in kmem and in the right
slab etc).

--

From: Dave Chinner
Date: Wednesday, December 8, 2010 - 5:44 pm

Right, so the commit message is wrong. Can you update it to tell us why
dget() can't be used there - the commit message from the second

Yeah, you're right, I missed that one - I spent more tiem checking
the validation part of the code than the initial insertion. Hence

Code changes. It may not be doing what it was originally
needed/intended to be doing - I don't need to waste time on code
archeology and second guessing when there are others around that can
tell me this off the top oftheir head. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, December 8, 2010 - 9:38 pm

I suppose if you're not reading it in the context of d_validate,
then yes. And as an historical record, I'll clarify.

Obviously if we do have a reference, then we can take another,
and if we don't, then we need more than RCU because RCU only
provides persistence guarantee for the memory, not any persistence


Well the d_validate API is meant to provide that, so it's broken
whether or not its callers use it correctly. It's also exported
to external modules...

Yes we should remove smbfs and rip the cache out of ncpfs and
remove d_validate entirely when possible (or, provide a more
reasonable API and caching library entirely in the dcache code
that a filesystem might use). But this is the right first step.


--

From: Nick Piggin
Date: Wednesday, December 8, 2010 - 10:16 pm

The reason why I don't remove the caching crap entirely is because
it is not a trivial change to properly remove it, and I don't have
the time or inclination to do it properly and have it tested when
it's easy to provide a correct, simple, and back compatible API.

The patch Christoph posted for smbfs and ncpfs just hacks out where the
dentry is used from the cache, but leaves a lot of cache infrastructure

For the record, if anyone cares, the sanest and simplest way to do this
would be to store the dentry's hash along with its pointer, and have
interfaces to allocate, destroy, insert, and delete from cache, and
lookup based on a supplied compare function. (but obviously we shouldn't
bother unless some good numbers turn up)

--

From: Anca Emanuel
Date: Saturday, November 27, 2010 - 8:04 am

I get:
  CC [M]  fs/cifs/inode.o
fs/cifs/inode.c: In function ‘inode_has_hashed_dentries’:
fs/cifs/inode.c:807: error: ‘dcache_inode_lock’ undeclared (first use
in this function)
fs/cifs/inode.c:807: error: (Each undeclared identifier is reported only once
fs/cifs/inode.c:807: error: for each function it appears in.)
make[3]: *** [fs/cifs/inode.o] Error 1
make[2]: *** [fs/cifs] Error 2
make[1]: *** [fs] Error 2

I used the latest mainline.
--

From: Nick Piggin
Date: Saturday, November 27, 2010 - 8:28 pm

Sorry, missed a conversion, it just needs to be changed to
inode->i_lock. Pushed the fix to git.

Thanks,
Nick

--

From: Sedat Dilek
Date: Saturday, November 27, 2010 - 11:24 pm

I attached a patch to my posting in [1] but it was somehow "eaten"
(here in my mbox the patch is definitely attached).
Patchwork is also not listing my patch.
Anyway, it's fixed - that's good.

I have to check why I get a Call trace with systemd-v15 but not with
sysvinit package here on Debian.

- Sedat -

[1] http://lkml.org/lkml/2010/11/27/145
From: David Miller
Date: Wednesday, December 1, 2010 - 11:03 am

From: Nick Piggin <npiggin@kernel.dk>

Just want to say that I've been running this code for the past few days on
my 128 cpu box and it seems quite sturdy.

If there are any kinds of vfs benchmarks you want me to run on this
machine both with and without the scaling changes, just let me know.
--

From: Nick Piggin
Date: Friday, December 3, 2010 - 9:55 am

Hi Dave, great, thanks!

I am mostly interested in single-thread performance on different ISAs
and different microarchitectures.

Microbenchmarks are easy, I attached a couple of quick ones I use in
an offline mail.

Single-threaded (preloadindex=false), cached and uncached, git diff on
an unmodified tree is slightly more real world.

And then anything else you can think of.

Thanks,
Nick

--

From: Nick Piggin
Date: Tuesday, December 7, 2010 - 8:24 am

So I vfs-scale-working branch may not be entirely in the clear, seeing
as it touches
the code lower in the call chain. However I don't know what can cause
lockdep to go off
the rails like this.

There is a sequence I used to hack around lockdep nesting
restrictions, following this
pattern:

 repeat:
    spin_lock(&parent->d_lock);
    spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
    /* do stuff */
    spin_unlock(&parent->d_lock);
    spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
    parent = dentry;
    spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
    goto repeat;

It's not directly in this call chain, but I wonder if it could have
caused any problem?
--

From: Peter Zijlstra
Date: Tuesday, December 7, 2010 - 8:49 am

shouldn't that be s/this_parent/parent/ ?

So what you're trying to do is:

  A -> B -> C -> ...

lock A
lock B, nested
unlock A
flip B from nested to top
lock C, nested
unlock B
flip C from nested to top
lock ...

Anyway, the way to write that is something like:

  lock_set_subclass(&detry->d_lock.dep_map, 0, _RET_IP_);

Which will reset the subclass of the held lock from DENTRY_D_LOCK_NESTED
to 0.

This is also used in double_unlock_balance(), we go into
double_lock_balance() with this_rq locked and want to lock busiest,
because of the lock ordering we might need to unlock this_rq and lock
busiest first, at which point this_rq is nested.

On unlock we thus need to map this_rq back to subclass 0 (which it had
before double_lock_balance(), because otherwise subsequent lock
operations will be done against the subclass and confuse things.


--

From: Nick Piggin
Date: Tuesday, December 7, 2010 - 8:59 am

OK, thanks. My version should not have caused any problems though,

Thanks,
Nick
--

From: Peter Zijlstra
Date: Tuesday, December 7, 2010 - 9:23 am

I tihnk so, yes, altough looking at it again I wonder why you use
spin_aquire(.trylock=1) -- but that too shouldn't cause anything like

Not directly, no. Usually lockdep crashes indicate use after free like
things, where we try to lock a lock that's been scribbled on. But that
usually explodes a bit earlier.

You faulting in the middle of that breath-first-search does suggest some
data corruption, but I'm not quite sure what kind, I can't remember ever
having seem something like this before.

I've CC'ed Ming Lei who wrote the bfs search, maybe he's got an idea.


Copy of the splat:
--

From: Nick Piggin
Date: Tuesday, December 7, 2010 - 8:28 pm

Ah, may have been a stupid little bug. The list entry check was being done
and then the pointer reloaded to be used.

What does the asm for shrink_dentry_list look like (before this patch)?

Thanks,
Nick

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-12-08 12:41:35.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-12-08 14:23:46.000000000 +1100
@@ -657,10 +657,10 @@ static void shrink_dentry_list(struct li
 	struct dentry *dentry;
 
 	rcu_read_lock();
-	while (!list_empty(list)) {
-		dentry = list_entry(list->prev, struct dentry, d_lru);
-
-		/* Don't need RCU dereference because we recheck under lock */
+	for (;;) {
+		dentry = list_entry_rcu(list->prev, struct dentry, d_lru);
+		if (&dentry->d_lru == list)
+			break;
 		spin_lock(&dentry->d_lock);
 		if (dentry != list_entry(list->prev, struct dentry, d_lru)) {
 			spin_unlock(&dentry->d_lock);
--

From: Dave Chinner
Date: Tuesday, December 7, 2010 - 2:56 pm

Stress test doing:

	single thread 50M inode create
	single thread rm -rf
	2-way 50M inode create
	2-way rm -rf
	4-way 50M inode create
	4-way rm -rf
	8-way 50M inode create
	8-way rm -rf
	8-way 250M inode create
	8-way rm -rf

Failed about 5 minutes into the "4-way rm -rf" (~3 hours into the test)
with a CPU stuck spinning on here:

[37372.084012] NMI backtrace for cpu 5
[37372.084012] CPU 5 
[37372.084012] Modules linked in:
[37372.084012] 
[37372.084012] Pid: 15214, comm: rm Not tainted 2.6.37-rc4-dgc+ #797 /Bochs
[37372.084012] RIP: 0010:[<ffffffff810643c4>]  [<ffffffff810643c4>] __ticket_spin_lock+0x14/0x20
[37372.084012] RSP: 0018:ffff880114643c98  EFLAGS: 00000213
[37372.084012] RAX: 0000000000008801 RBX: ffff8800687be6c0 RCX: ffff8800c4eb2688
[37372.084012] RDX: ffff880114643d38 RSI: ffff8800dfd4ea80 RDI: ffff880114643d14
[37372.084012] RBP: ffff880114643c98 R08: 0000000000000003 R09: 0000000000000000
[37372.084012] R10: 0000000000000000 R11: dead000000200200 R12: ffff880114643d14
[37372.084012] R13: ffff880114643cb8 R14: ffff880114643d38 R15: ffff8800687be71c
[37372.084012] FS:  00007fd6d7c93700(0000) GS:ffff8800dfd40000(0000) knlGS:0000000000000000
[37372.084012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37372.084012] CR2: 0000000000bbd108 CR3: 0000000107146000 CR4: 00000000000006e0
[37372.084012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37372.084012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[37372.084012] Process rm (pid: 15214, threadinfo ffff880114642000, task ffff88011b16f890)
[37372.084012] Stack:
[37372.084012]  ffff880114643ca8 ffffffff81ad044e ffff880114643cf8 ffffffff81167ae7
[37372.084012]  0000000000000000 ffff880114643d38 000000000000000e ffff88011901d800
[37372.084012]  ffff8800cdb7cf5c ffff88011901d8e0 0000000000000000 0000000000000000
[37372.084012] Call Trace:
[37372.084012]  [<ffffffff81ad044e>] _raw_spin_lock+0xe/0x20
[37372.084012]  [<ffffffff81167ae7>] ...
From: Nick Piggin
Date: Tuesday, December 7, 2010 - 6:47 pm

OK good, with any luck, that's the same bug.

Is this XFS? Is there any concurrent activity happening on the same dentries?
Ie. are the rm -rf threads running on the same directories, or is
there any reclaim
happening in the background?

Thanks,
Nick
--

From: Dave Chinner
Date: Tuesday, December 7, 2010 - 8:32 pm

IIRC, kswapd was consuming about 5-10% of a CPU during parallel
unlink tests. Mainly reclaiming XFS inodes, I think, but there may
be dentry cache reclaim going as well.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Dave Chinner
Date: Tuesday, December 7, 2010 - 9:28 pm

Turns out that the kswapd peaks are upwards of 50% of a CPU for a
few seconds, then idle for 10-15s. Typical perf top output of kswapd
while it is active during unlinks is:

             samples  pcnt function                    DSO
             _______ _____ ___________________________ _________________

            17168.00 10.2% __call_rcu                  [kernel.kallsyms]
            13223.00  7.8% kmem_cache_free             [kernel.kallsyms]
            12917.00  7.6% down_write                  [kernel.kallsyms]
            12665.00  7.5% xfs_iunlock                 [kernel.kallsyms]
            10493.00  6.2% xfs_reclaim_inode_grab      [kernel.kallsyms]
             9314.00  5.5% __lookup_tag                [kernel.kallsyms]
             9040.00  5.4% radix_tree_delete           [kernel.kallsyms]
             8694.00  5.1% is_bad_inode                [kernel.kallsyms]
             7639.00  4.5% __ticket_spin_lock          [kernel.kallsyms]
             6821.00  4.0% _raw_spin_unlock_irqrestore [kernel.kallsyms]
             5484.00  3.2% __d_drop                    [kernel.kallsyms]
             5114.00  3.0% xfs_reclaim_inode           [kernel.kallsyms]
             4626.00  2.7% __rcu_process_callbacks     [kernel.kallsyms]
             3556.00  2.1% up_write                    [kernel.kallsyms]
             3206.00  1.9% _cond_resched               [kernel.kallsyms]
             3129.00  1.9% xfs_qm_dqdetach             [kernel.kallsyms]
             2327.00  1.4% radix_tree_tag_clear        [kernel.kallsyms]
             2327.00  1.4% call_rcu_sched              [kernel.kallsyms]
             2262.00  1.3% __ticket_spin_unlock        [kernel.kallsyms]
             2215.00  1.3% xfs_ilock                   [kernel.kallsyms]
             2200.00  1.3% radix_tree_gang_lookup_tag  [kernel.kallsyms]
             1982.00  1.2% xfs_reclaim_inodes_ag       [kernel.kallsyms]
             1736.00  1.0% xfs_trans_unlocked_item     [kernel.kallsyms]
             1707.00  1.0% ...
From: Nick Piggin
Date: Wednesday, December 8, 2010 - 12:09 am

call_rcu shouldn't be doing much, except for disabling irqs and linking
the object into the list. I have a patch somewhere to reduce the irq
disable overhead a bit, but it really shouldn't be doing a lot of work.

Sometimes you find that touching the rcu head field needs to get a
cacheline exclusive, so a bit of work gets transferred there....

But it may also be something going a bit wrong in RCU. I blew it up
once already, after the files_lock splitup that enabled all CPUs to
create and destroy files :)

--

From: Paul E. McKenney
Date: Friday, December 10, 2010 - 1:32 pm

Could you please enable CONFIG_RCU_TRACE, mount debugfs somewhere, and
look at rcu/rcudata?  There will be a "ql=" number printed for each
CPU, and if that number is too large, __call_rcu() does take what it
considers to be corrective action, which can incur some overhead.

If this is the problem, then increasing the value of the qhimark module
parameter might help.

If this is not the problem, I could make a patch that disables some of
__call_rcu()'s grace-period acceleration code if you are willing to try

I would certainly like the opportunity to fix any bugs that might be
in RCU...  ;-)

							Thanx, Paul
--

From: Paul E. McKenney
Date: Sunday, December 12, 2010 - 7:54 am

Another thing that might help is to reduce the value of CONFIG_RCU_FANOUT
to something like 16.  If this does help, then there is a reasonably
straightforward change I can make to RCU.

--

Previous thread: [PATCH 27/46] fs: reduce dcache_inode_lock width in lru scanning by Nick Piggin on Saturday, November 27, 2010 - 2:44 am. (1 message)

Next thread: Ensure Your Account by bruno.dondero on Saturday, November 27, 2010 - 3:13 am. (1 message)