On Mon, Oct 01, 2007 at 07:14:57PM -0700, Andrew Morton wrote:
There are only two 'break' conditions in the loop:
1. nr_dirty + nr_unstable + nr_writeback < dirty_limit
   => *mostly* FALSE for a busy system
   => *always* FALSE in Chakri's stuck NFS case
2. nr_written >= 6MB
   for a light-load bdi:
   => *never* TRUE until many new writers arrive, contributing
      more dirty pages to sync
   => worse still, those new writers will also get stuck here...
The obvious imbalance here is:
   each writer contributes only 32KB of new dirty pages, but
   wants to consume (not necessarily available) 6MB
So loooong = min(global-less-busy-time, bdi-many-new-writers-arrival-time).
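
The two break conditions can be illustrated with a toy model. This is only a
sketch, not the kernel code: the function names, the `stuck_kb` parameter
(pages pinned by the busy or stuck device) and the `max_rounds` cutoff are
assumptions made up for the illustration. Only the 32KB and 6MB figures come
from the text above.

```python
WRITE_CHUNK_KB = 6 * 1024   # the 6MB each throttled writer wants to write
PER_WRITER_KB = 32          # new dirty pages contributed by each writer


def throttle_rounds(stuck_kb, dirty_limit_kb, bdi_dirty_kb, max_rounds=1000):
    """Toy model of the throttling loop for a writer on a light-load bdi.

    Returns the round in which the writer leaves the loop, or None if it
    is still looping after max_rounds (hypothetical cutoff, i.e. "stuck").
    """
    written_kb = 0
    for rounds in range(1, max_rounds + 1):
        # Break condition 1: the global counters are below the limit.
        if stuck_kb + bdi_dirty_kb < dirty_limit_kb:
            return rounds
        # Write back whatever this bdi has, up to the remaining chunk.
        flushed = min(bdi_dirty_kb, WRITE_CHUNK_KB - written_kb)
        bdi_dirty_kb -= flushed
        written_kb += flushed
        # Break condition 2: this writer has written its full chunk.
        if written_kb >= WRITE_CHUNK_KB:
            return rounds
    return None


def writers_needed():
    """Number of 32KB writers needed before 6MB is available to write."""
    return -(-WRITE_CHUNK_KB // PER_WRITER_KB)  # ceiling division
```

With the other device pinning more than dirty_limit and only 32KB dirty on
the light bdi, neither condition ever fires in this model, which matches the
reasoning above: it would take on the order of 6MB/32KB new writers before
condition 2 could become true.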
You are right in the reasoning. The exact consequence is:
the light-load sdb is made as _unresponsive_ as the busy sda
Hence Chakri's case: whenever NFS is stuck, every device gets stuck.
In theory, every CPU/parallel writer could contribute 8 pages of error.
Hence we get 1MB/32KB = 32 (CPUs/writers).
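
The arithmetic behind that figure, as a small sketch. The 4KB page size is
an assumption (it is what makes 8 pages equal 32KB); the constant and
function names are made up for the illustration.

```python
PAGE_KB = 4               # assumption: 4KB pages
ERROR_PAGES_PER_CPU = 8   # per-CPU/writer error, from the text
OVERSHOOT_KB = 1024       # the 1MB figure from the text


def per_cpu_error_kb():
    """Error one CPU/parallel writer can contribute, in KB."""
    return ERROR_PAGES_PER_CPU * PAGE_KB


def cpus_for_overshoot():
    """CPUs/writers needed for the per-CPU errors to add up to 1MB."""
    return OVERSHOOT_KB // per_cpu_error_kb()
```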
One more serious problem is that a busy writer could also drain all the
dirty pages and make (nr_writeback == dirty_limit+1MB). In that case,
I suspect the light-load sdb writer still has a good chance to
make progress (needs confirmation).
Not well tested so far. My system becomes unusable soon after
starting the NFS write (even before plugging the network). I'm seeing
large latencies in try_to_wake_up(). Hopefully Ingo can help out.
Yeah, Peter and I were both aware of the timing.
This patch is only meant for 2.6.23 and 2.6.22.10.
Fengguang