On Thu, Apr 15, 2010 at 11:28:37AM +0100, Mel Gorman wrote:
vi fs/direct-reclaim-helper.c; it has a few placeholders for where the
real code needs to go... just look for the ~ marks.
I mostly meant that the bdi helper threads were the best place to add
knowledge about which pages we want to write for reclaim. We might need
to add a thread dedicated to just doing the VM's dirty work, but that's
where I would start discussing fancy new interfaces.
I'd like to add one more:
5. Don't dive into filesystem locks during reclaim.
This is different from splicing code paths together, but
the filesystem writepage code has become the center of our attempts at
doing big fat contiguous writes on disk. We push off work as late as we
can, right up until just before the pages go down to disk.
I'll pick on ext4 and btrfs for a minute, just to broaden the scope
outside of XFS. Writepage comes along and the filesystem needs to
actually find blocks on disk for all the dirty pages it has promised to
write.
So, we start a transaction, we take various allocator locks, modify
different metadata, log changed blocks, take a break (logging is hard
work, you know; need_resched() has probably triggered by now), stuff it
all into the file's metadata, log that, and finally return.
Each of the steps above can block for a long time. Ext4 solves
this by not doing them. ext4_writepage only writes pages that
are already fully allocated on disk.
Btrfs is much more efficient at not doing them, it just returns right
away for PF_MEMALLOC.
This is a long way of saying the filesystem writepage code is the
opposite of what direct reclaim wants. Direct reclaim wants to
find free RAM now, and if it does end up in the mess described above,
it'll just get stuck for a long time on work entirely unrelated to
finding free pages.
-chris