From this, if we have more_io on one superblock and we skip pages on a
different superblock, the combination of the two will cause us to stop
To me it reads as:
    while (!done) {
        /* sync all data or until one inode skips */
        congestion_wait(up to 100ms);
    }
and it ignores that we might have more superblocks with dirty data
on them that we haven't flushed because we skipped pages on
If that's the worst case, then it's far better than the current
"wait 30s for every 4MB". ;)
But it takes a modern SATA disk ~40-50ms to write 4MB (80-100MB/s).
IOWs, what you've timed above is a burst workload, not steady-state
behaviour. And it actually shows that the elevator queues are
growing, contrary to your goal of preventing them from growing.
In more detail, the first half of the trace shows no pages under
writeback, which tends to imply that all I/O is complete by the
time wb_kupdate is woken - it's been sucked into the drive
cache as fast as possible.
About half way through we start to see the number of pages under
writeback wind up by about 800-900 pages per printk. That's 1024
pages minus one or two 512k I/Os. This implies that the disk cache
is now full and the disk has reached saturation, so I/O is now
being queued in the elevator. The last trace has 13051 pages under
writeback, which at 128 pages per I/O is ~100 queued 512k I/Os.
The default queue depth with cfq is 128 requests, and IIRC it
congests at 7/8 full, or 112 requests. IOWs, the file that you
wrote was only about 10 requests (~5MB) short of what is needed to
see congestion on your test rig.
So the trace shows we slept on neither congestion nor more_io,
and it points towards congestion being the thing that will typically
block us on large file I/O. Before drawing any conclusions on
whether wbc.more_io is needed or not, do you have any way of
Are you using ext3? That would be my guess based simply on the write
rate - ext3 has long been stuck at about that speed for buffered
writes even on much faster ...