Re: Performance testing of various barrier reduction patches [was: Re: [RFC v4] ext4: Coordinate fsync requests]

From: Darrick J. Wong
Date: Thursday, September 23, 2010 - 4:25 pm

Hi all,

I just retested with 2.6.36-rc5 and the same set of patches as before
(flush_fua, fsync_coordination, etc) and have an even larger spreadsheet:
http://bit.ly/ahdhyk

This time, however, I instrumented the kernel to report the amount of time it
takes to complete the flush operation.  The test setups elm3a63, elm3c44_sas,
and elm3c71_sas are all arrays that have battery backed write-back cache; it
should not be a huge shock that the average flush time generally stays under
8ms for these setups.  elm3c65 and elm3c75_ide are single-disk setups (SAS and
IDE, respectively; no write cache), and the other setups all use md-raid arrays
backed by SCSI disks (also no write cache).  The flush_times tab in the
spreadsheet lists average, max, and min flush times.
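
For anyone who wants to collect the same kind of numbers, here is a minimal
userspace sketch of an average/max/min accumulator.  It is not the
instrumentation behind the spreadsheet; the struct and function names are
hypothetical, and it assumes the caller has already timed each flush (for
example by taking a timestamp before and after the flush call) and passes in
the elapsed nanoseconds.

```c
#include <stdint.h>

/* Hypothetical accumulator for flush durations (illustration only). */
struct flush_stats {
	uint64_t count;     /* number of flushes recorded */
	uint64_t total_ns;  /* sum of all recorded durations */
	uint64_t min_ns;    /* shortest flush seen so far */
	uint64_t max_ns;    /* longest flush seen so far */
};

static void flush_stats_init(struct flush_stats *s)
{
	s->count = 0;
	s->total_ns = 0;
	s->min_ns = UINT64_MAX;  /* any real sample will be smaller */
	s->max_ns = 0;
}

/* Record one flush that took 'ns' nanoseconds. */
static void flush_stats_record(struct flush_stats *s, uint64_t ns)
{
	s->count++;
	s->total_ns += ns;
	if (ns < s->min_ns)
		s->min_ns = ns;
	if (ns > s->max_ns)
		s->max_ns = ns;
}

/* Mean flush time in nanoseconds; 0 if nothing has been recorded. */
static uint64_t flush_stats_avg_ns(const struct flush_stats *s)
{
	return s->count ? s->total_ns / s->count : 0;
}
```

A real in-kernel version would also need locking or per-CPU counters, which
this sketch leaves out.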

Turning to the ffsb scores, I can see some of the same results that I saw while
testing 2.6.36-rc1 a few weeks ago.  Now that I've had the time to look at how
the code works and evaluate a lot more setups, I think I can speculate further
about the cause of the regression that I see with the fsync coordination patch.
Because I'm testing the effects of varying the fsync_delay values, I've bolded
the highest score for each unique (directio, nojan, nodj) configuration, and it
appears that the winning cases are most often fsync_delay=0, which corresponds
to the old fsync behavior (every caller issues a flush), and fsync_delay=-1,
which corresponds to a coordination delay equal to the average flush duration.
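
To make the two special values concrete, here is a rough sketch of how an
fsync_delay setting could be turned into an actual wait time.  Only the
meanings of 0 and -1 come from the description above; the function name, the
choice of microseconds as the unit, and the handling of positive values as a
fixed delay are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map an fsync_delay setting to a coordination
 * delay in microseconds.
 *   0   -> no coordination (old behavior, every caller issues a flush)
 *   -1  -> wait for one average flush duration
 *   > 0 -> ASSUMPTION: a fixed delay, taken here to be in microseconds
 * Any other negative value is treated like 0 in this sketch.
 */
static uint64_t coordination_delay_us(int fsync_delay, uint64_t avg_flush_us)
{
	if (fsync_delay == -1)
		return avg_flush_us;
	if (fsync_delay <= 0)
		return 0;
	return (uint64_t)fsync_delay;
}
```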

To try to find an explanation, I started looking for connections between fsync
delay values and average flush times.  I noticed that the setups with low (<
8ms) flush times exhibit better performance when fsync coordination is not
attempted, and the setups with higher flush times exhibit better performance
when fsync coordination happens.  This also is no surprise, as it seems
perfectly reasonable that the more time-consuming a flush is, the more desirable
it is to spend a little time coordinating those flushes across CPUs.

I think a reasonable next step would be to alter this patch so that
ext4_sync_file always measures the duration of the flushes that it issues, but
only enables the coordination steps if it detects the flushes taking more than
about 8ms.  One thing I don't know for sure is whether 8ms is an artifact of
the timer tick (two jiffies at HZ=250, the current setting, is 8ms) or a
hardware property.
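
Here is a minimal userspace sketch of that idea, together with the tick
arithmetic behind the 8ms question.  None of this is from the patch: the names
are hypothetical, the moving average with a 1/8 weight is my own choice of
smoothing, and the 8000us cutoff is just the rough figure from these results.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough cutoff taken from the results above (about 8ms). */
#define COORD_THRESHOLD_US 8000u

/* Length of 'n' timer ticks in microseconds at a tick rate of 'hz'. */
static uint64_t ticks_to_us(uint64_t n, unsigned int hz)
{
	return n * 1000000u / hz;
}

/* Hypothetical running estimate of flush duration. */
struct flush_avg {
	uint64_t avg_us;  /* smoothed flush duration */
	bool seeded;      /* false until the first sample arrives */
};

/*
 * Fold one measured flush into the estimate.  ASSUMPTION: an
 * exponentially weighted moving average where each new sample
 * contributes 1/8 of the result.
 */
static void flush_avg_update(struct flush_avg *a, uint64_t sample_us)
{
	if (!a->seeded) {
		a->avg_us = sample_us;
		a->seeded = true;
		return;
	}
	a->avg_us = (a->avg_us * 7 + sample_us) / 8;
}

/* Coordinate only once flushes look slower than the cutoff. */
static bool should_coordinate(const struct flush_avg *a)
{
	return a->seeded && a->avg_us > COORD_THRESHOLD_US;
}
```

With HZ=250, ticks_to_us(1, 250) is 4000 and ticks_to_us(2, 250) is 8000, so
two ticks land exactly on the 8ms figure.  Rerunning the tests with a
different HZ would be one way to tell a tick artifact from a hardware
property.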

As for safety testing, I've been running power-fail tests on the single-disk
systems with the same ffsb profile.  So far I've observed a lot of fsck
complaints about orphaned inodes being truncated ("Truncating orphaned inode
1352607 (uid=0, gid=0, mode=0100700, size=4096)") though this happens
regardless of whether I run with this 2.6.36 test kernel of mine or a plain
vanilla 2.6.35 configuration.  I've not seen any serious corruption yet.

So, what do people think of these latest results?

--D

On Mon, Aug 23, 2010 at 11:31:19AM -0700, Darrick J. Wong wrote:
