Re: [patch 31/35] fs: icache per-zone inode LRU

From: Dave Chinner
Date: Wednesday, October 20, 2010 - 3:19 am

On Wed, Oct 20, 2010 at 02:20:24PM +1100, Nick Piggin wrote:

<sigh>

I don't think anyone wants per-AG x per-zone reclaim lists on a
1024-node machine with a 1,000 AG (1PB) filesystem.

As I have already said, the XFS inode caches are optimised in
structure to minimise IO and maximise internal filesystem
parallelism. They are not optimised for per-cpu or NUMA scalability
because if you don't have filesystem level parallelism, you can't
scale to large numbers of concurrent operations across large numbers
of CPUs in the first place.

In the case of XFS, per-allocation group is the way we scale
internal parallelism and as long as you have more AGs than you have
CPUs, there is very good per-CPU scalability through the filesystem
because most operations are isolated to a single AG.  That is how we
scale parallelism in XFS, and it has proven to scale pretty well for
even the largest of NUMA machines. 
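
To illustrate the per-AG isolation described above, here is a minimal
userspace sketch in C. It is not XFS code: the names (ag_cache,
inode_stub, ino_to_ag, NR_AGS) are hypothetical, the modulo mapping is a
stand-in for however the filesystem really maps an inode to its AG, and
the atomic_flag spin lock only approximates a kernel spinlock. The point
is just that each AG owns its own reclaim list and lock, so work on
inodes in different AGs never touches the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_AGS 4                /* hypothetical AG count, illustration only */

struct inode_stub {             /* hypothetical stand-in for a cached inode */
	unsigned long ino;
	struct inode_stub *next;
};

struct ag_cache {               /* one reclaim list and one lock per AG */
	atomic_flag lock;
	struct inode_stub *reclaim_head;
	unsigned long nr_reclaimable;
};

static struct ag_cache ag_caches[NR_AGS];

static void ag_lock(struct ag_cache *ag)
{
	/* Spin until we own this AG's lock; other AGs are unaffected. */
	while (atomic_flag_test_and_set_explicit(&ag->lock,
						 memory_order_acquire))
		;
}

static void ag_unlock(struct ag_cache *ag)
{
	atomic_flag_clear_explicit(&ag->lock, memory_order_release);
}

static void ag_caches_init(void)
{
	for (int i = 0; i < NR_AGS; i++) {
		atomic_flag_clear(&ag_caches[i].lock);
		ag_caches[i].reclaim_head = NULL;
		ag_caches[i].nr_reclaimable = 0;
	}
}

/* Assumed mapping for the sketch only: inode number modulo AG count. */
static unsigned int ino_to_ag(unsigned long ino)
{
	return (unsigned int)(ino % NR_AGS);
}

/* Queue an inode for reclaim on the list of the AG it belongs to. */
static void ag_cache_add(struct inode_stub *ip)
{
	struct ag_cache *ag = &ag_caches[ino_to_ag(ip->ino)];

	ag_lock(ag);
	ip->next = ag->reclaim_head;
	ag->reclaim_head = ip;
	ag->nr_reclaimable++;
	ag_unlock(ag);
}

/* Unlink up to nr_to_scan inodes from one AG; returns how many. */
static unsigned long ag_cache_shrink(unsigned int agno,
				     unsigned long nr_to_scan)
{
	assert(agno < NR_AGS);

	struct ag_cache *ag = &ag_caches[agno];
	unsigned long freed = 0;

	ag_lock(ag);
	while (ag->reclaim_head && freed < nr_to_scan) {
		struct inode_stub *ip = ag->reclaim_head;

		ag->reclaim_head = ip->next;
		ip->next = NULL;
		ag->nr_reclaimable--;
		freed++;
	}
	ag_unlock(ag);
	return freed;
}
```

In this sketch a shrinker call for one AG takes only that AG's lock, so
concurrent callers contend only when they land on the same AG. Adding a
zone dimension on top would multiply the number of lists by the number
of zones.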

This is what I mean about there being an impedance mismatch between
the way the VM and the VFS/filesystem caches scale. Fundamentally,
the way filesystems want their caches to operate for optimal
performance can be vastly different to the way you want shrinkers to
operate for VM scalability. Forcing the MM way of doing stuff down
into the LRUs and shrinkers is not a good way of solving this
problem.


Having a global lock in a shrinker is already a major point of
contention because shrinkers have unbounded parallelism.  Hence all
shrinkers need to be converted to use scalable structures. What we
need _first_ is the infrastructure to do this in a sane manner, not
tie a couple of shrinkers tightly into the mm structures and then
walk away.

And FWIW, most subsystems that use shrinkers can be compiled in as
modules or not compiled in at all. That'll probably leave #ifdef
CONFIG_ crap all through the struct zone definition as they are
converted to use your current method....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com