On Fri, Apr 23, 2010 at 02:38:01AM +1000, Nick Piggin wrote:
Well, I've gone and done this global shrinker because I need a fix
for the problem before .34 releases, not because I like it.
Now my problem is that the accepted method of using global shrinkers
(i.e. split nr_to-scan into portions based on per-fs usage) is
causing a regression compared to not having a shrinker at all. The
context based shrinker did not cause this regression, either.
The regression is oom-killer panics with "no killable tasks" - it
kills my 1GB RAM VM dead. Without a shrinker or with the context
based shrinkers I will see one or two dd processes getting
OOM-killed maybe once every 10 or so runs on this VM, but the machine
continues to stay up. The global shrinker is turning this into a
panic, and it is happening about twice as often.
To fix this I've had to remove all the code that proportions the
reclaim across all the XFS filesystems in the system. Basically it
now walks from the first filesystem in the list to the last every
time and effectively it only reclaims from the first filesystem it
finds with reclaimable inodes.
This is exactly the behaviour the context based shrinkers give me,
without the need for adding global lists, additional locking and
traverses. Also, context based shrinkers won't re-traverse all the
filesystems, avoiding the potential for starving some filesystems of
shrinker based reclaim if filesystems earlier in the list are
putting more inodes into reclaim concurrently.
Given that this behaviour matches pretty closely to the reasons I've
already given for preferring context based per-fs shrinkers than a
global shrinker and list, can we please move forward with this API
change, Nick?
As it is, I'm going to cross my fingers and ship this global
shrinker because of time limitations, but I certainly hoping that
for .35 we can move to context based shrinking....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--