Hardware:
1. Six 400 GB SATA hard drives.
2. Everything is on PCI-e (965 chipset & a 2-port SATA card).

Used the following 'optimizations' for all tests.

# Set read-ahead.
# Note: --setra counts 512-byte sectors, so 65536 is 32 MiB.
echo "Setting read-ahead to 64 MiB for /dev/md3"
blockdev --setra 65536 /dev/md3

# Set stripe_cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size

# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
    echo "Disabling NCQ on $i"
    echo 1 > /sys/block/"$i"/device/queue_depth
done

Software:
Kernel: 2.6.23.1 x86_64
Filesystem: XFS
Mount options: defaults,noatime

Results:
http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.html
http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.txt

Note: 'deg' means degraded, and the number after it is the number of disks
failed. I did not test degraded raid10 because there are many ways you can
degrade a raid10; however, the three raid10 layouts (f2, n2, o2) were
benchmarked.

Each test was run 3 times and averaged--FYI.
Justin.
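
A note on the units of the settings above, as a minimal sketch: blockdev --setra counts 512-byte sectors, and stripe_cache_size counts cache entries of one page per member device rather than bytes. The helper names and the 4 KiB page size below are assumptions made for illustration, not part of the original script.

```shell
#!/bin/sh
# Convert a blockdev --setra value (512-byte sectors) to MiB.
setra_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

# Estimate stripe cache memory in MiB:
# entries * page size * number of member disks.
# The 4096-byte page size is an assumption (typical for x86_64).
stripe_cache_mib() {
    entries=$1
    disks=$2
    echo $(( entries * 4096 * disks / 1024 / 1024 ))
}

setra_to_mib 65536          # read-ahead value used in the tests
stripe_cache_mib 16384 6    # stripe cache entries on a 6-disk array
```

On that reading, 65536 sectors is 32 MiB of read-ahead, and 16384 entries across six disks is roughly 384 MiB of stripe cache.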
--
Why is the Sequential Output (Block) for raid6 165719 and for raid5 only
86797? I would have thought that raid6 was always a bit slower in writing
due to having to write double the amount of parity data.

Holger
--
RAID5 (2nd test of 3 averaged runs) & Single disk added:
http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.html
--
Other than repeating my (possibly lost) comment that this would be
vastly easier to read if the numbers were aligned and all had the same
number of decimal places in a single column, good stuff. For sequential
i/o the winners and losers are clear, and you can set cost and
performance to pick the winners. It seems obvious that raid-1 is the loser
for single-threaded load; I suspect it would also be poor against other
levels in multithreaded loads, though not so much for reads.

Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark
--
Also on my wishlist for Justin: the performance of the raid10 layouts
in degraded mode.

And then I note that raid1 performs well on random seeks, 702/s,
while the raid10,f2 (my pet) only performs 520/s - but this is on a
2.6.23 kernel without the seek performance patch for raid10,f2.

I wonder if the random seeks are related to random read (and write) - they
probably are, but there seems to be a difference between the results
found with bonnie++ and my tests as reported on the
http://linux-raid.osdl.org/index.php/Performance page.

Best regards
keld
--
I will re-run the RAID5 test and also run the test on a single disk and
update the results later.

Justin.
--
I have two tiny nits to pick with this information. One is the
readahead, which as someone else mentioned is in sectors. The other is
the unaligned display of the numbers, leading the eye to believe that
values with a similar number of digits can be compared. In truth there's
a decimal, but only sometimes. I imported the csv file, formatted all
the numbers to an equal number of places after the decimal, and it is
far easier to read.

Okay, and a half-nit: there were some patches to improve raid-1
performance, I think by running io on multiple drives when you can, and
by doing reads from the outer tracks if there are two idle drives.
That's not in the stable version you used, I assume, and it may not be in
2.6.26 either; I'm doing other things at the moment.

A very nice bit of work. My only question is, if you ever feel motivated
to repeat this test, it would be fun to do it with ext3 (or ext4) using
the stride= parameter. I did limited testing and it really seemed to
help, but nothing remotely as formal as your test.

Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
--
Speaking of which, it would probably be good to adjust a little how
the filesystem is created and mounted (in both the xfs and ext3/4 cases).
E.g. lazy-count=1 was still not the default last time I checked mkfs.xfs.
And even ext4.txt from the kernel documentation recommends mounting it with
data=writeback,nobh when doing comparisons with metadata journaling
filesystems (the same would go for ext3).

Along with different journal sizes, keeping an eye on stripe &
stripe-width, and other settings that might be of interest.
--
I added this in the wiki performance section.
I think it would have been informative if a test with one drive in
a non-raid setup was also described.

Are there any particular findings you want to highlight?
Is there some way to estimate random read and writes from this test?
Are the XFS file systems completely new when running the tests?

Best regards
keld
--
Since the performance of bonnie++ deals with single threads/a raid1 would p[...]
Not in particular, just I could never find this information provided anywhere
that showed all of the raid variation/types in one location that was easy to [...]
Yes, after the creation of each array, mkfs.xfs -f /dev/md3 was run to ensure [...]
--
Either the RAID 1 read speed must be wrong, or something is odd in the
Linux implementation. There are six drives that can be used for reading
at the same time, as they contain the very same data. 63 MB/s
sequential looks like what you would get from a single drive.
--
On Wed, 28 May 2008 18:34:00 +0200
Which is fairly typical of a cheap desktop PC where the limitation is the
memory and PCI bridge as much as the drive.

Alan
--
I really don't think that's any part of the issue, the same memory and
bridge went 4-5x faster in other read cases. The truth is that the
raid-1 performance is really bad, and it's the code causing it AFAIK. If
you track the actual io it seems to read one drive at a time, in order,
without overlap.

Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
--
Make sure the readahead is set to be a fair bit over the stripe size if
you are doing bulk data tests for a single file. (Or indeed in the real
world for that specific case ;))
--
IIRC Justin has readahead at 16MB and chunk at 256k. I would think that
if multiple devices were used at all by the md code, that the chunk
rather than stripe size would be the issue. In this case the RA seems
large enough to trigger good behavior, were there are available.

Note: this testing was done with an old(er) kernel, as were all of mine.
Since my one large raid array has become more mission critical I'm not
comfortable playing with new kernels. The fate of big, fast, and stable
machines is to slide into production use. :-(
I suppose that's not a bad way to do it, I now have faith in what I'm
running.

Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark
--
The RAID1 is correct. As has been discussed on this list before, you will
only see raid speed > 1 disk if you run 2(?, or 3 minimal) threads
from the same device (raid1).

Justin.
--
The RAID 1 read speed metrics do not depict multithreaded
processes reading from the array simultaneously. I would suspect
that the read performance metrics would look better if 2 bonnie
simulations were run together (for RAID 1 that is).

Bryan
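
One way to approximate the multi-reader case described above without a second bonnie++ run is to read several files at once with dd. This is a rough sketch, not the method used for the published numbers; the function name and the file paths in the usage line are hypothetical.

```shell
#!/bin/sh
# Read every file given as an argument concurrently, discarding the data.
# Returns non-zero if any reader fails.
parallel_read() {
    pids=""
    for f in "$@"; do
        # One background dd per file, reading in 1 MiB blocks.
        dd if="$f" of=/dev/null bs=1048576 2>/dev/null &
        pids="$pids $!"
    done
    status=0
    for p in $pids; do
        wait "$p" || status=1
    done
    return $status
}
```

It could be timed with something like `time parallel_read /mnt/raid1/file1 /mnt/raid1/file2`. The files should be much larger than RAM, or the page cache dropped beforehand, otherwise the reads may be served from memory rather than from the array.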
--
The test is a single thread reading one block at a time, so this is not
surprising. If you get this doing multi-megabyte readahead, or with
several threads, something is very wrong.

-- Chris
--
Given that one of the greatest benefits of NCQ/TCQ is with parity RAID,
I'd be fascinated to see how enabling NCQ changes your results. Of
course, you'd want to use a single SATA controller with a known good NCQ
implementation, and hard drives known to not do stupid things like
disable readahead when NCQ is enabled.

-- Chris
--
Only/usually on multi-threaded jobs/tasks, yes?
Also, I turn off NCQ on all of my hosts that have it enabled by default, because
many bugs occur when NCQ is on. They are working on it in the libata layer, but
IMO it is not safe at all to run SATA disks with NCQ: with it on I have seen
drives drop out of the array (with it off, no problems).
--
Generally, yes, but there's caching and readahead at various layers in
software that can expose the benefit on certain single-threaded [...]

Are you using SATA drives with RAID-optimized firmware? Most SATA
manufacturers have variants of their drives for a few dollars more that
have firmware that provides bounded latency for error recovery
operations, for precisely this reason.

-- Chris
--
I see--however, as I understood it there were bugs utilizing NCQ in libata?
But FYI--
In this test, they were regular SATA drives, not special RAID ones (RE2, etc.).

Thanks for the info!
Justin.
--
You wouldn't happen to have some more information about this? I haven't
personally had problems yet, but I haven't used it for very long - but
since it comes activated by DEFAULT, I would assume it to be relatively [...]
--
Not off-hand; check LKML and my email address from early this year or last
year and/or the ide-list.

Justin.
--
I have done NCQ measurements in the past; for single-threaded apps, NCQ off
is the way to go. Check this out from earlier (10 raptors raid5):
http://home.comcast.net/~jpiszcz/ncq_vs_noncq/
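
When comparing NCQ on and off, it can help to record the queue depth each disk actually had during a run. The following is a small sketch under that assumption; the function name is made up, and the directory argument exists only so the function can be pointed somewhere other than /sys/block.

```shell
#!/bin/sh
# Print "<disk> <queue_depth>" for each disk under a sysfs block directory.
# A depth of 1 means NCQ is effectively off, as set in the test script.
report_queue_depth() {
    root=${1:-/sys/block}
    for f in "$root"/*/device/queue_depth; do
        # Skip devices without a readable queue_depth attribute.
        [ -r "$f" ] || continue
        disk=${f#"$root"/}
        disk=${disk%%/*}
        echo "$disk $(cat "$f")"
    done
}

report_queue_depth
```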
--
Results are meaningless without a crucial detail - what was the chunk size
used during array creation time? Otherwise interesting test :)

Cheers
Peter
--
Indeed, the chunk size used was 256 KiB for all tests.
Justin.
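
With the chunk size known, the ext3/ext4 stride= value suggested earlier in the thread can be worked out. This is a sketch only: the 4 KiB filesystem block size and the helper names are assumptions, and the data-disk counts assume the six-disk arrays used in these tests.

```shell
#!/bin/sh
# stride = chunk size / filesystem block size (both in KiB here).
stride_blocks() {
    chunk_kib=$1
    block_kib=$2
    echo $(( chunk_kib / block_kib ))
}

# Full-stripe width in filesystem blocks = stride * data disks.
stripe_width_blocks() {
    stride=$1
    data_disks=$2
    echo $(( stride * data_disks ))
}

stride=$(stride_blocks 256 4)    # 256 KiB chunk, assumed 4 KiB blocks
echo "stride=$stride"
echo "raid5 (5 data disks): stripe width $(stripe_width_blocks "$stride" 5) blocks"
echo "raid6 (4 data disks): stripe width $(stripe_width_blocks "$stride" 4) blocks"
```

Under those assumptions the stride comes out at 64 blocks, which would be passed at filesystem creation time, for example as mke2fs -E stride=64.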
--
