Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

To: Chakri n <chakriin5@...>
Cc: linux-pm <linux-pm@...>, lkml <linux-kernel@...>, <nfs@...>, Peter Zijlstra <a.p.zijlstra@...>
Date: Friday, September 28, 2007 - 2:50 am

On Thu, 27 Sep 2007 23:32:36 -0700 "Chakri n" <chakriin5@gmail.com> wrote:


yup.


It's unrelated to the actual value of dirty_thresh: if the machine fills up
with dirty (or unstable) NFS pages then eventually new writers will block
until that condition clears.

2.4 doesn't have this problem at low levels of dirty data because 2.4
VFS/MM doesn't account for NFS pages at all.

I'm not sure what we can do about this from a design perspective, really. 
We have data floating about in memory which we're not allowed to discard
and if we allow it to increase without bound it will eventually either
wedge userspace _anyway_ or it will take the machine down, resulting in
data loss.

It would be nice to write that data to local disk if possible, then
reclaim it.  Perhaps David Howells' fscache code can do that (or could be
tweaked to do so).

If you really want to fill all memory with pages which are dirty against a
dead NFS server then you can manually increase
/proc/sys/vm/dirty_background_ratio and dirty_ratio - that should give you
the 2.4 behaviour.
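
For anyone who wants to script that, here is a minimal sketch of reading
and raising the two knobs.  The helper names and the `base` parameter are
made up for illustration (the parameter just lets you point the code at a
scratch directory), the 0-100 range check assumes the knobs are
percentages, and the background <= foreground check is my own sanity
check rather than something the kernel requires of you.  Writing to the
real /proc/sys/vm needs root.

```python
from pathlib import Path

VM_DIR = "/proc/sys/vm"  # real location; override `base` for a dry run


def read_vm_knob(name, base=VM_DIR):
    """Return the integer value currently stored in a vm sysctl file."""
    return int(Path(base, name).read_text().strip())


def write_vm_knob(name, value, base=VM_DIR):
    """Write an integer percentage (0-100) to a vm sysctl file."""
    if not 0 <= value <= 100:
        raise ValueError(f"{name} is a percentage, got {value}")
    Path(base, name).write_text(f"{value}\n")


def raise_dirty_limits(background, foreground, base=VM_DIR):
    """Set dirty_background_ratio and dirty_ratio; return the old values.

    Hypothetical helper: it only wraps the two writes and remembers the
    previous settings so they can be restored afterwards.
    """
    if background > foreground:
        raise ValueError("background ratio should not exceed dirty_ratio")
    old = {
        "dirty_background_ratio": read_vm_knob("dirty_background_ratio", base),
        "dirty_ratio": read_vm_knob("dirty_ratio", base),
    }
    write_vm_knob("dirty_ratio", foreground, base)
    write_vm_knob("dirty_background_ratio", background, base)
    return old
```

Something like `old = raise_dirty_limits(80, 90)` would then lift both
limits, and writing the values in `old` back restores the previous
settings.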


<thinks>

Actually we perhaps could address this at the VFS level in another way. 
Processes which are writing to the dead NFS server will eventually block in
balance_dirty_pages() once they've exceeded the memory limits and will
remain blocked until the server wakes up - that's the behaviour we want.

What we _don't_ want to happen is for other processes which are writing to
other, non-dead devices to get collaterally blocked.  We have patches which
might fix that queued for 2.6.24.  Peter?
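
To make the collateral-blocking problem concrete, here is a toy model in
Python.  It is not kernel code and it is not a description of the queued
patches: the class, its fields and the `per_device_cap` idea are all
invented for illustration.  It only shows why a single global dirty limit
lets one dead device stall everybody, and how some form of per-device
limit would avoid that.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyDirtyAccounting:
    """Toy page accounting; counts pages only, nothing like the real MM."""

    dirty_thresh: int                      # global limit on dirty pages
    per_device_cap: Optional[int] = None   # hypothetical per-device limit
    dirty: Dict[str, int] = field(default_factory=dict)

    def total_dirty(self):
        return sum(self.dirty.values())

    def try_write(self, device, pages=1):
        """Return True if the write proceeds, False if the writer blocks."""
        dev_dirty = self.dirty.get(device, 0)
        if (self.per_device_cap is not None
                and dev_dirty + pages > self.per_device_cap):
            return False  # this device alone is over its share
        if self.total_dirty() + pages > self.dirty_thresh:
            return False  # the machine as a whole is over the limit
        self.dirty[device] = dev_dirty + pages
        return True

    def writeback(self, device, pages):
        """A live device cleans pages; a dead server never gets here."""
        self.dirty[device] = max(0, self.dirty.get(device, 0) - pages)


# Global limit only: the dead NFS mount eats the whole allowance.
g = ToyDirtyAccounting(dirty_thresh=100)
while g.try_write("nfs"):
    pass
blocked_on_disk = not g.try_write("sda")   # innocent writer is stuck too

# Hypothetical per-device cap: NFS writers block, disk writers carry on.
p = ToyDirtyAccounting(dirty_thresh=100, per_device_cap=60)
while p.try_write("nfs"):
    pass
disk_ok = p.try_write("sda")
```

In the first case the writer to "sda" blocks even though its device is
healthy; in the second the NFS writers stop at their cap and the disk
writer still makes progress.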

Messages in current thread:
Re: A unresponsive file system can hang all I/O in the syste..., Andrew Morton, (Fri Sep 28, 2:50 am)
KDB?, Daniel Phillips, (Fri Sep 28, 9:51 pm)
[PATCH] lockstat: documentation, Peter Zijlstra, (Wed Oct 3, 5:28 am)
Re: [PATCH] lockstat: documentation, Ingo Molnar, (Wed Oct 3, 5:35 am)