[PATCH 1/5] inode: Make unused inode LRU per superblock

Previous thread: [PATCH 2/5] mm: add context argument to shrinker callback by Dave Chinner on Tuesday, May 25, 2010 - 1:53 am. (1 message)

Next thread: [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure by Dave Chinner on Tuesday, May 25, 2010 - 1:53 am. (16 messages)
From: Dave Chinner
Date: Tuesday, May 25, 2010 - 1:53 am

From: Dave Chinner <dchinner@redhat.com>

The inode unused list is currently a global LRU. This does not match
the other global filesystem cache - the dentry cache - which uses
per-superblock LRU lists. Hence we have related filesystem object
types using different LRU reclaimatin schemes.

To enable a per-superblock filesystem cache shrinker, both of these
caches need to have per-sb unused object LRU lists. Hence this patch
converts the global inode LRU to per-sb LRUs.

The patch only does rudimentary per-sb propotioning in the shrinker
infrastructure, as this gets removed when the per-sb shrinker
callouts are introduced later on.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c         |    2 +-
 fs/inode.c                |   87 +++++++++++++++++++++++++++++++++++++++-----
 fs/super.c                |    1 +
 include/linux/fs.h        |    4 ++
 include/linux/writeback.h |    1 -
 5 files changed, 83 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5c4161f..b1e76ef 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -565,7 +565,7 @@ select_queue:
 			/*
 			 * The inode is clean, unused
 			 */
-			list_move(&inode->i_list, &inode_unused);
+			list_move(&inode->i_list, &inode->i_sb->s_inode_lru);
 		}
 	}
 	inode_sync_complete(inode);
diff --git a/fs/inode.c b/fs/inode.c
index 2bee20a..3caa758 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -25,6 +25,7 @@
 #include <linux/mount.h>
 #include <linux/async.h>
 #include <linux/posix_acl.h>
+#include "internal.h"
 
 /*
  * This is needed for the following functions:
@@ -74,7 +75,6 @@ static unsigned int i_hash_shift __read_mostly;
  */
 
 LIST_HEAD(inode_in_use);
-LIST_HEAD(inode_unused);
 static struct hlist_head *inode_hashtable __read_mostly;
 
 /*
@@ -292,6 +292,7 @@ void __iget(struct inode *inode)
 	if (!(inode->i_state & (I_DIRTY|I_SYNC)))
 		list_move(&inode->i_list, &inode_in_use);
 ...
From: Nick Piggin
Date: Wednesday, May 26, 2010 - 9:17 am

Is this an improvement I wonder? The dcache is using per sb lists
because it specifically requires sb traversal.

What allocation/reclaim really wants (for good scalability and NUMA
characteristics) is per-zone lists for these things. It's easy to
convert a single list into per-zone lists.

It is much harder to convert per-sb lists into per-sb x per-zone lists.

--

From: Dave Chinner
Date: Wednesday, May 26, 2010 - 4:01 pm

Right - I originally implemented the per-sb dentry lists for
scalability purposes. i.e. to avoid monopolising the dentry_lock
during unmount looking for dentries on a specific sb and hanging the
system for several minutes.

However, the reason for doing this to the inode cache is not for
scalability, it's because we have a tight relationship between the
dentry and inode cacheѕ. That is, reclaim from the dentry LRU grows
the inode LRU.  Like the registration of the shrinkers, this is kind
of an implicit, undocumented behavour of the current shrinker
implemenation.

What this patch series does is take that implicit relationship and
make it explicit.  It also allows other filesystem caches to tie
into the relationship if they need to (e.g. the XFS inode cache).
What it _doesn't do_ is change the macro level behaviour of the

No it's not. Just convert the s_{dentry,inode}_lru lists on each
superblock and call the shrinker with a new zone mask field to pick
the correct LRU. That's no harder than converting a global LRU.
Anyway, you'd still have to do per-sb x per-zone lists for the dentry LRUs,
so changing the inode cache to per-sb makes no difference.

However, this is a moot point because we don't have per-zone shrinker
interfaces. That's an entirely separate discussion because of the
macro-level behavioural changes it implies....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Dave Chinner
Date: Wednesday, May 26, 2010 - 9:02 pm

I've done some testing showing parity. They've been along the lines
of:
	- populate cache with 1m dentries + inodes
	- run 'time echo 2 > /proc/sys/vm/drop_caches'

I've used different methods of populating the caches to have them
non-sequential in the LRU (i.e. trigger fragmentation), have dirty
backing inodes (e.g. the VFS inode clean, the xfs inode dirty
because transactions haven't completed), etc.

The variation on the test is around +-10%, with the per-sb shrinkers
averaging about 5% lower time to reclaim. This is within the error
margin of the test, so it's not really a conclusive win, but it is
certainly shows that it does not slow anything down. If you've got a

You've still got to allocate that extra memory on the per-sb dentry
LRUs so it's not really a valid argument. IOWs, if it's too much
memory for per-sb inode LRUs, then it's too much memory for the

The same vague questions wondering about the benefit of per-sb
dentry LRUs were raised when I first proposed them years ago, and
look where we are now.  Besides, focussing on whether this one patch
is a benefit or not is really missing the point because it's the
benefits of this patchset as a whole that need to be considered....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Wednesday, May 26, 2010 - 9:23 pm

I guess the problem is that inode LRU cache isn't very useful as
long as there are dentries in the way (which is most of the time,
isn't it?). I think nfsd will exercise them better? Dont know of

Well it would be per-zone, per-sb list, but I don't think that

Not about how much is too much, it's about more cost or memory

To be fair that is because there were specific needs to do per-sb

I would indeed like to focus on the benefits of the patchset as a
whole. Leaving aside the xfs changes, it would be interesting to
have at least a few numbers for dcache/icache heavy workloads.


--

From: Andrew Morton
Date: Thursday, May 27, 2010 - 1:32 pm

On Tue, 25 May 2010 18:53:04 +1000

It's a shape that s_inode_lru is still protected by inode_lock.  One
day we're going to get in trouble over that lock.  Migrating to a
per-sb lock would be logical and might help.

Did you look into this?  I expect we'd end up taking both inode_lock
and the new sb->lru_lock in several places, which wouldn't be of any
help, at least in the interim.  Long-term, the locking for
fs-writeback.c should move to the per-superblock one also, at which
time this problem largely goes away I think.  Unfortunately the
writeback inode lists got moved into the backing_dev_info, whcih messes

It's regrettable to be counting the same thing twice.  Did you look

May as well fix the typo while we're there.

Please review all these comments to ensure that they are still accurate
and complete.

--

From: Dave Chinner
Date: Thursday, May 27, 2010 - 3:54 pm

Yes, I have. Yes, it's possible.  It's solving a different problem,


Sort of. The complexity is the stats are userspace visible, so they
can't just be removed. Replacing the current stats means that when
they are read from /proc we would need to walk all the superblocks
to aggregate them. The bit I haven't looked at yet is whether
walking superblocks is allowed in a proc handler.

So in the mean time, I just copied what was done for the
dentry_stats. If it's ok to do this walk, then we can change both

Will do.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--

From: Nick Piggin
Date: Friday, May 28, 2010 - 3:07 am

It almost all goes away in my inode lock splitup patches. Inode lru
and dirty lists were the last things protected by the global lock
there.

I am actually going to do per-zone lrus for these guys and per-zone
locks (which is actually better than per-sb because it gives NUMA
scalability within a single sb).

The dirty/writeback lists should probably be per-bdi locked.


--

Previous thread: [PATCH 2/5] mm: add context argument to shrinker callback by Dave Chinner on Tuesday, May 25, 2010 - 1:53 am. (1 message)

Next thread: [PATCH 3/5] superblock: introduce per-sb cache shrinker infrastructure by Dave Chinner on Tuesday, May 25, 2010 - 1:53 am. (16 messages)