Re: [PATCH 5/5] writeback: introduce writeback_control.more_io to indicate more io

From: David Chinner
Date: Friday, October 5, 2007 - 12:41 am

From this, if we have more_io on one superblock and we skip pages on a
different superblock, the combination of the two will cause us to stop

To me it reads as:

	while (!done) {
		/* sync all data or until one inode skips */
		congestion_wait(up to 100ms);
	}

and it ignores that we might have more superblocks with dirty data
on them that we haven't flushed because we skipped pages on

If that's the worst case, then it's far better than the current
"wait 30s for every 4MB".  ;)




But it takes a modern SATA disk ~40-50ms to write 4MB (80-100MB/s).
IOWs, what you've timed above is a burst workload, not a steady
state behaviour. And it actually shows that the elevator queues
are growing, in contrast to your goal of preventing them from
growing.
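
As a rough check of the arithmetic above, here is a minimal sketch that
converts a sustained write rate into the time needed for one 4MB batch.
The function names are made up for illustration; only the 4MB and
80-100MB/s figures come from the message.

```python
def write_time_ms(size_mb: float, rate_mb_per_s: float) -> float:
    """Time in milliseconds to write size_mb at a sustained rate."""
    return size_mb * 1000 / rate_mb_per_s


def looks_like_burst(size_mb: float, elapsed_ms: float,
                     max_rate_mb_per_s: float) -> bool:
    """True if the batch finished faster than the disk could sustain.

    A batch that completes sooner than the sustained rate allows must
    have been absorbed by a cache or queue rather than written to media.
    """
    return elapsed_ms < write_time_ms(size_mb, max_rate_mb_per_s)


if __name__ == "__main__":
    for rate in (80, 100):
        print(f"{rate} MB/s -> {write_time_ms(4, rate):.0f} ms per 4MB")
```

Any measured interval well under 40ms per 4MB would, under these
assumptions, point to a burst rather than steady-state behaviour.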

In more detail, the first half of the trace indicates no pages under
writeback, which tends to imply that all I/O is complete by the
time wb_kupdate is woken - it's been sucked into the drive
cache as fast as possible.

About half way through we start to see windup of the number of
pages under writeback of about 800-900 pages per printk.  That's
1024 pages minus 1 or 2 512k I/Os. This implies that the disk cache
is now full and the disk has reached saturation. I/O is now
being queued in the elevator. The last trace has 13051 pages under
writeback, which at 128 pages per I/O is ~100 queued 512k I/Os.
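
The page counts above can be reproduced with a few lines. This is only a
sketch: it assumes 4KiB pages (which is what makes a 512k I/O equal to
128 pages), and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4KiB pages, so 512k == 128 pages


def pages_per_io(io_size_kb: int) -> int:
    """Number of pages carried by one I/O of io_size_kb kilobytes."""
    return io_size_kb * 1024 // PAGE_SIZE


def queued_ios(pages_under_writeback: int, io_size_kb: int = 512) -> float:
    """Approximate number of queued I/Os for a given writeback page count."""
    return pages_under_writeback / pages_per_io(io_size_kb)


if __name__ == "__main__":
    per_io = pages_per_io(512)
    # A 1024-page batch minus one or two completed 512k I/Os:
    print(1024 - per_io, 1024 - 2 * per_io)
    print(round(queued_ios(13051)))
```

The first line of output brackets the "800-900 pages per printk" growth,
and the second gives the roughly 100 queued I/Os quoted for the last
trace line.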

The default queue depth with cfq is 128 requests, and IIRC it
congests at 7/8s full, or 112 requests. IOWs, the file that you
wrote was about 10MB short of what is needed to see congestion on
your test rig.
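
A small sketch of the threshold arithmetic, taking the 7/8 figure from
the message at face value (it is stated as "IIRC", and the actual
block-layer threshold may differ slightly). The function names are
illustrative only.

```python
def congestion_threshold(nr_requests: int = 128) -> int:
    """Request count at which the queue is treated as congested.

    Assumption: congestion starts at 7/8 of the queue depth, as
    recalled in the message; this is not taken from kernel source.
    """
    return nr_requests * 7 // 8


def requests_until_congested(queued: int, nr_requests: int = 128) -> int:
    """How many more requests fit before the threshold is reached."""
    return max(0, congestion_threshold(nr_requests) - queued)


if __name__ == "__main__":
    print(congestion_threshold(128))
    print(requests_until_congested(102))
```

With about 102 requests queued at the end of the trace, the queue was
still below the 112-request threshold, which is why no congestion sleep
shows up.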

So the trace shows we slept on neither congestion nor more_io
and it points towards congestion being the thing that will typically
block us on large file I/O. Before drawing any conclusions on
whether wbc.more_io is needed or not, do you have any way of

You are using ext3? That would be my guess based simply on the write
rate - ext3 has long been stuck at about that speed for buffered
writes even on much faster ...
From: Fengguang Wu
Date: Friday, October 5, 2007 - 4:55 am

No, the two cases will occur at the same time to a super_block.

AFAIK, generic_sync_sb_inodes() will simply skip the inode in trouble
and _continue_ to sync other inodes:

                if (wbc->pages_skipped != pages_skipped) {
                        /*
                         * writeback is not making progress due to locked
                         * buffers.  Skip this inode for now.
                         */
                        redirty_tail(inode);
                }
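
To make the "skip and continue" behaviour concrete, here is a toy model
in Python. It is not the kernel code: the inode names, the `locked` set
and the function itself are hypothetical, and it only mirrors the control
flow described above (an inode that skips pages is requeued, and the loop
moves on to the next inode instead of stopping).

```python
from collections import deque


def sync_inodes(inodes, locked):
    """Toy model of one pass over a superblock's dirty inodes.

    inodes: iterable of inode names in dirty-list order (hypothetical).
    locked: set of inodes whose buffers are locked, so pages get skipped.
    Returns (written, redirtied).
    """
    queue = deque(inodes)
    written, redirtied = [], []
    for _ in range(len(queue)):
        inode = queue.popleft()
        if inode in locked:
            # Stands in for redirty_tail(inode): retry this inode later,
            # but keep syncing the remaining inodes in this pass.
            redirtied.append(inode)
        else:
            written.append(inode)
    return written, redirtied


if __name__ == "__main__":
    print(sync_inodes(["a", "b", "c"], locked={"b"}))
```

In this model a single troubled inode does not stop the pass; the other
inodes are still written.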




Exactly.

wfg ~% cat /sys/block/sda/queue/nr_requests   
128
wfg ~% cat /sys/block/sda/queue/max_sectors_kb
512
 
More exactly, I was writing a huge file. It produces
balance_dirty_pages, background_writeout, and at last wb_kupdate. The
trace messages are collected after the copy completes, when


Yes, I was running ext3.  It seems that XFS is about the same speed:

[ 1427.278454] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 37974 global 16727 0 0 wc _M tw -4 sk 0
[ 1427.293653] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 36946 global 15704 0 0 wc _M tw -3 sk 0
[ 1427.308891] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 35919 global 14650 0 0 wc _M tw -13 sk 0
[ 1427.322462] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 34882 global 13937 0 0 wc _M tw 300 sk 0
[ 1427.338194] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 34158 global 12914 0 0 wc _M tw -9 sk 0
[ 1427.353473] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 33125 global 11860 0 0 wc _M tw -12 sk 0
[ 1427.362984] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 32089 global 11860 0 0 wc _M tw 1018 sk 0

That's 14ms per 4MB.  Maybe it's a VFS issue.
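
The 14ms figure can be recomputed from the printk timestamps. A minimal
sketch follows; it parses only the leading timestamps, treats each line
as one 4MB batch (as the message does), and makes no assumption about the
meaning of the other fields.

```python
import re

TRACE = """\
[ 1427.278454] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 37974 global 16727 0 0 wc _M tw -4 sk 0
[ 1427.293653] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 36946 global 15704 0 0 wc _M tw -3 sk 0
[ 1427.308891] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 35919 global 14650 0 0 wc _M tw -13 sk 0
[ 1427.322462] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 34882 global 13937 0 0 wc _M tw 300 sk 0
[ 1427.338194] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 34158 global 12914 0 0 wc _M tw -9 sk 0
[ 1427.353473] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 33125 global 11860 0 0 wc _M tw -12 sk 0
[ 1427.362984] mm/page-writeback.c 668 wb_kupdate: pdflush(5606) 32089 global 11860 0 0 wc _M tw 1018 sk 0
"""


def timestamps(trace: str) -> list[float]:
    """Extract the bracketed printk timestamps (seconds) from each line."""
    return [float(t) for t in re.findall(r"^\[\s*(\d+\.\d+)\]", trace, flags=re.M)]


def mean_interval_ms(ts: list[float]) -> float:
    """Average gap between consecutive timestamps, in milliseconds."""
    return (ts[-1] - ts[0]) * 1000 / (len(ts) - 1)


if __name__ == "__main__":
    ts = timestamps(TRACE)
    ms = mean_interval_ms(ts)
    # Assumption from the message: each trace line covers one 4MB batch.
    print(f"{ms:.1f} ms per line, ~{4 * 1000 / ms:.0f} MB/s if each line is 4MB")
```

The average gap comes out at about 14ms per line, consistent with the
figure quoted above.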
