Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

To: Andrew Morton <akpm@...>
Cc: Chakri n <chakriin5@...>, linux-pm <linux-pm@...>, lkml <linux-kernel@...>, <nfs@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Friday, September 28, 2007 - 8:46 pm

On Thursday 27 September 2007 23:50, Andrew Morton wrote:

It is not necessary to restrict total dirty pages at all.  Instead it is 
necessary to restrict total writeout in flight.  This is evident from 
the fact that making progress is the one and only reason our kernel 
exists, and writeout is how we make progress clearing memory.  In other 
words, if we guarantee the progress of writeout, we will live happily 
ever after and not have to sell the farm.

The current situation has an eerily similar feeling to the VM 
instability in early 2.4, which was never solved until we convinced 
ourselves that the only way to deal with Moore's law as applied to 
number of memory pages was to implement positive control of swapout in 
the form of reverse mapping[1].  This time round, we need to add 
positive control of writeout in the form of rate limiting.

I _think_ Peter is with me on this, and not only that, but between the 
two of us, we already have patches for most of the subsystems that need
it, and we have both been busy testing (different subsets of) these 
patches to destruction for the better part of a year.

Anyway, to fix the immediate bug before the one true dirty_limit removal 
patch lands (promise) I think you are on the right track by noticing 
that balance_dirty_pages has to become aware of how congested the 
involved block device is, since blocking a writeout process on an 
underused block device is clearly a bad idea.  Note how much this idea 
looks like rate limiting.

[1] We lost the scent for a number of reasons, not least because the 
experimental implementation of reverse mapping at the time was buggy 
for reasons entirely unrelated to the reverse mapping itself.

Regards,

Daniel
-