Re: ext4 benchmark questions

Previous thread: Question about e2fsck and HTree by Evgeniy Ivanov on Thursday, April 22, 2010 - 12:43 pm. (4 messages)

Next thread: [RFC][PATCH] Journal superblock update should send a barrier by Jan Kara on Thursday, April 22, 2010 - 5:25 pm. (3 messages)

From: Steve Brown
Date: Thursday, April 22, 2010 - 2:38 pm

I'm in the process of evaluating various storage options for a large
array (12TB) I'm creating.  First, I will preface all of this by
saying that I understand the note in the kernel docs about comparing
file systems under various workloads, and I acknowledge that my exact
methodology isn't perfect.  But it works for what I'm doing. :)  This
array will be used for storage of large media files (up to 20-30GB per
file).  I'm testing using iozone with various file sizes ranging from
4GB to 32GB.  I'm pretty much settled on a RAID50 (128KB stripe size)
running ext4 on top of LVM (for snapshots, future expansion, etc.).
I'm running kernel 2.6.33.2, e2fsprogs 1.41.11 and util-linux-ng 2.16.

The file system in question was created with the following options:

mkfs -t ext4 -T large -i 524288 -b 4096 -I 256 \
    -E stride=32,stripe-width=192 /dev/vg/lv
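
For reference, the stride and stripe-width values above can be derived from the RAID chunk size and the file system block size. The sketch below is only illustrative: the function name is made up, and the figure of 6 data-bearing disks per stripe is an assumption inferred from 192 / 32, not something stated in this thread.

```python
from typing import Tuple


def ext4_raid_geometry(chunk_kib: int, block_bytes: int, data_disks: int) -> Tuple[int, int]:
    """Return (stride, stripe_width), both in file system blocks.

    stride       = RAID chunk size / file system block size
    stripe_width = stride * number of data-bearing disks in one stripe
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_bytes // block_bytes
    return stride, stride * data_disks


# 128KB chunk, 4096-byte blocks, 6 data disks (assumed, see above)
stride, stripe_width = ext4_raid_geometry(128, 4096, 6)
print(f"-E stride={stride},stripe-width={stripe_width}")
# prints: -E stride=32,stripe-width=192
```

If the array layout differs from the assumed one, only the data_disks argument needs to change.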

Currently, I'm testing the effect of various mount options on an ext4
file system and my results are not what I would have expected based on
the docs I have read.  I wanted to bounce some of them off the list to
find out if I'm completely missing something, or if my expectations
were off.

I'll start with the craziest one: noatime.  Everything I have read
says that the noatime option should increase both read and write
performance.  My results are finding that write speeds are comparable
with or without this option, but read speeds are significantly faster
*without* the noatime option.  For example, a 16GB file reads about
210MB/s with noatime but reads closer to 250MB/s without the noatime
option.

Next is the write barrier.  I'm in a fully battery-backed
environment, so I'm not worried about disabling it.  From my testing,
setting barrier=0 will improve write performance on large files
(>10GB), but hurts performance on smaller files (<10GB).  Read
performance is affected similarly.  Is this to be expected with files
of this size?

Next is the data option.  I am seeing a significant increase in read
performance when using ...
From: Eric Sandeen
Date: Thursday, April 22, 2010 - 2:52 pm

Steve Brown wrote:

the kernel uses "relatime" now by default, which gives you most of the

not expected by me; barriers == drive write cache flushes, which I

data=writeback is not safe for data integrity; unless you can handle

not sure offhand what to make of decreased write performance with a longer
commit time...

-Eric
--

From: Steve Brown
Date: Thursday, April 22, 2010 - 3:11 pm

hmmm... this would seem to conflict with the docs in the kernel, especially:

"Write barriers enforce proper on-disk ordering
of journal commits, making volatile disk write caches
safe to use, at some performance penalty.  If
your disks are battery-backed in one way or another,

I'm not worried about power loss.  The kernel docs seem to imply that
data=[journal,ordered] come with a performance hit.  My results
would indicate otherwise.  Should I be seeing this kind of

Steve
--

From: Eric Sandeen
Date: Thursday, April 22, 2010 - 3:20 pm

they are not exactly the same thing, so noatime may be -slightly-

what you saw is in conflict with what is expected, yes; I don't know
why barriers would ever increase performance.

(my description of barriers as drive write caches isn't in conflict

Sorry, I misread...  I also don't know why reading would be much affected
at all by the journalling mode, which journals -writes- (reading can
update metadata, but not much, esp. if you have noatime/relatime).


--

From: Ric Wheeler
Date: Friday, April 23, 2010 - 7:42 am

Barriers, when working, should never make things faster; at best, we
should have parity.

Also important to note that barriers should be disabled if your hardware
RAID card exports itself as a "write through" cache, even if you enable 
barriers on the command line.

What controller are you using and what kind of drives do you have in the 
back end?


--

From: Steve Brown
Date: Friday, April 23, 2010 - 8:38 am

That's good to know about the write barriers with WT cache.  I'm still
setting everything manually in /etc/fstab because, well... I don't
always trust software. ;)

The controller is an LSI 9280-8e (megaraid_sas kernel module).  Drives
are 1TB Seagate ES.2s, 16 of them in the chassis.

Steve
--

From: Ric Wheeler
Date: Friday, April 23, 2010 - 8:45 am

If you have the boot time log messages for the disks you use, you can 
see how the cache is advertised to the kernel.
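
As a rough illustration of what to look for, the Linux sd driver normally logs a "Write cache:" line for each disk at boot. The sketch below pulls the device name and cache state out of such lines. The sample log line is illustrative only and was not taken from the system discussed here, and the function name is hypothetical.

```python
import re
from typing import Dict

# Matches sd driver lines such as:
#   sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, ...
WRITE_CACHE_RE = re.compile(r"\[(sd[a-z]+)\] Write cache: (enabled|disabled)")


def advertised_write_cache(log_text: str) -> Dict[str, str]:
    """Map each disk name found in the log text to its advertised write cache state."""
    states = {}
    for line in log_text.splitlines():
        match = WRITE_CACHE_RE.search(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


# Illustrative sample, not output from the array in this thread
sample = (
    "sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, "
    "doesn't support DPO or FUA\n"
)
print(advertised_write_cache(sample))
```

In practice the input would be the saved output of dmesg or the boot log. With a hardware RAID controller, the logged state describes the logical volume exported by the controller, not the cache on the individual drives behind it.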

Also note that having battery backed RAID cards does not mean that your 
drive's write cache will survive a power outage. You need to use vendor 
specific tools usually to poke at the drives and make sure that the 
write cache on the S-ATA disks is properly disabled (unless the LSI 
firmware does something to manage the write cache on the drives).

Thanks!

Ric


--

From: Steve Brown
Date: Friday, April 23, 2010 - 8:49 am

The server is fully battery backed for up to 45 minutes.  Also, LSI
does provide tools to disable the cache when the BBU fails.  It's one
of the array config parameters.
--
