Hardware:

1. Utilized (6) 400 gigabyte sata hard drives.
2. Everything is on PCI-e (965 chipset & a 2port sata card)

Used the following 'optimizations' for all tests.

    # Set read-ahead.
    echo "Setting read-ahead to 64 MiB for /dev/md3"
    blockdev --setra 65536 /dev/md3

    # Set stripe-cache_size for RAID5.
    echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
    echo 16384 > /sys/block/md3/md/stripe_cache_size

    # Disable NCQ on all disks.
    echo "Disabling NCQ on all disks..."
    for i in $DISKS
    do
      echo "Disabling NCQ on $i"
      echo 1 > /sys/block/"$i"/device/queue_depth
    done

Software:

Kernel: 2.6.23.1 x86_64
Filesystem: XFS
Mount options: defaults,noatime

Results:

http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.html
http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.txt

Note: 'deg' means degraded and the number after is the number of disks failed, I did not test degraded raid10 because there are many ways you can degrade a raid10; however, the 3 types of raid10 were benchmarked f2,n2,o2.

Each test was run 3 times and averaged--FYI.

Justin. --
Results are meaningless without a crucial detail - what was the chunk size used during array creation time? Otherwise interesting test :) Cheers Peter --
Indeed, the chunk size used was 256 KiB for all tests. Justin. --
Given that one of the greatest benefits of NCQ/TCQ is with parity RAID, I'd be fascinated to see how enabling NCQ changes your results. Of course, you'd want to use a single SATA controller with a known good NCQ implementation, and hard drives known to not do stupid things like disable readahead when NCQ is enabled. -- Chris --
Only/usually on multi-threaded jobs/tasks, yes? Also, I turn off NCQ on all of my hosts that have it enabled by default because there are many bugs that occur when NCQ is on. They are working on it in the libata layer, but IMO it is not safe at all to run SATA disks w/NCQ: with it on I have seen drives drop out of the array (with it off, no problems). --
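To confirm what queue depth each disk is actually running with, the same sysfs attribute written by the script in the original post can be read back. A minimal sketch: `show_queue_depth` is a made-up helper name, and the optional directory argument exists only so the function can be pointed somewhere other than /sys/block. A depth of 1 corresponds to the "NCQ disabled" setting used for these tests.

```shell
#!/bin/sh
# show_queue_depth [ROOT]
# Print "<disk> <queue_depth>" for every disk under ROOT (default /sys/block)
# that exposes device/queue_depth. Hypothetical helper, for illustration only.
show_queue_depth() {
    root=${1:-/sys/block}
    for f in "$root"/*/device/queue_depth; do
        [ -r "$f" ] || continue          # skip devices without the attribute
        disk=${f#"$root"/}               # strip the leading ROOT/
        disk=${disk%%/*}                 # keep only the disk name
        echo "$disk $(cat "$f")"
    done
}
```

Running `show_queue_depth` with no argument lists the real disks on the machine.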
I have done NCQ measurements in the past, for single threaded apps NCQ off is the way to go, check this out from earlier (10 raptors raid5): http://home.comcast.net/~jpiszcz/ncq_vs_noncq/ --
Generally, yes, but there's caching and readahead at various layers in software that can expose the benefit on certain single-threaded Are you using SATA drives with RAID-optimized firmware? Most SATA manufacturers have variants of their drives for a few dollars more that have firmware that provides bounded latency for error recovery operations, for precisely this reason. -- Chris --
I see--however, as I understood it there were bugs utilizing NCQ in libata? But FYI-- In this test, they were regular SATA drives, not special raid-ones (RE2,etc). Thanks for the info! Justin. --
You wouldn't happen to have some more information about this? I haven't personally had problems yet, but I haven't used it for very long - but since it comes activated by DEFAULT, I would assume it to be relatively --
Not off-hand, check LKML and my email address from early this year or last year and/or the ide-list. Justin. --
Either the RAID 1 read speed must be wrong, or something is odd in the Linux implementation. There are six drives that can be used for reading at the same time, as they contain the very same data. 63MB/s sequential looks like what you would get from a single drive. --
The test is a single thread reading one block at a time, so this is not surprising. If you get this doing multi-megabyte readahead, or with several threads, something is very wrong. -- Chris --
The RAID 1 read speed metrics do not depict multithreaded processes reading from the array simultaneously. I would suspect that the read performance metrics would look better if 2 bonnie simulations were run together (for RAID 1 that is). Bryan --
The RAID1 is correct. As has been discussed on this list before, you will only see raid speed > 1 disk if you run 2(?, or 3 minimal) threads from the same device (raid1). Justin.
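One rough way to see the multi-reader effect described above without bonnie++ is to start several sequential readers at different offsets and time them together. This is a sketch using plain `dd` rather than the benchmark used in this thread; `parallel_read`, the device name and the sizes in the example are assumptions for illustration. The page cache would need to be dropped, or the amount read kept well above RAM size, for the timing to mean anything.

```shell
#!/bin/sh
# parallel_read TARGET READERS MIB_EACH
# Start READERS dd processes, each reading MIB_EACH MiB from its own
# offset of TARGET, then wait for all of them to finish.
# Hypothetical helper; not the method used for the posted results.
parallel_read() {
    target=$1
    readers=$2
    mib=$3
    i=0
    while [ "$i" -lt "$readers" ]; do
        # skip= is counted in bs-sized blocks, so reader i starts at i*mib MiB
        dd if="$target" of=/dev/null bs=1048576 \
            skip=$(( i * mib )) count="$mib" 2>/dev/null &
        i=$(( i + 1 ))
    done
    wait
}

# Example (hypothetical device and sizes):
#   time parallel_read /dev/md3 3 4096
```

Comparing the wall-clock time for 1, 2 and 3 readers gives a crude view of whether the array serves concurrent reads from more than one disk.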
On Wed, 28 May 2008 18:34:00 +0200 Which is fairly typical of a cheap desktop PC where the limitation is the memory and PCI bridge as much as the drive. Alan --
I really don't think that's any part of the issue, the same memory and bridge went 4-5x faster in other read cases. The truth is that the raid-1 performance is really bad, and it's the code causing it AFAIK. If you track the actual io it seems to read one drive at a time, in order, without overlap. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot --
Make sure the readahead is set to be a fair bit over the stripe size if you are doing bulk data tests for a single file. (Or indeed in the real world for that specific case ;)) --
IIRC Justin has readahead at 16MB and chunk at 256k. I would think that if multiple devices were used at all by the md code, that the chunk rather than stripe size would be the issue. In this case the RA seems large enough to trigger good behavior, were there are available. Note: this testing was done with an old(er) kernel, as were all of mine. Since my one large raid array has become more mission critical I'm not comfortable playing with new kernels. The fate of big, fast, and stable machines is to slide into production use. :-( I suppose that's not a bad way to do it, I now have faith in what I'm running. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark --
I added this in the wiki performance section. I think it would have been informative if also a test with one drive in a non-raid setup was described. Are there any particular findings you want to highlight? Is there some way to estimate random read and writes from this test? Are the XFS file systems completely new when running the tests? Best regards keld --
Since the performance of bonnie++ deals with single threads/a raid1 would p[...]

Not in particular, just I could never find this information provided anywhere that showed all of the raid variation/types in one location that was easy to [...]

Yes, after the creation of each array, mkfs.xfs -f /dev/md3 was run to ensure [...]
I have two tiny nits to pick with this information. One is the readahead, which as someone else mentioned is in sectors. The other is the unaligned display of the numbers, leading the eye to believe that values with a similar number of digits can be compared. In truth there's a decimal, but only sometimes. I imported the csv file, formatted all the numbers to an equal number of places after the decimal, and it is far easier to read. Okay, and a half-nit, there were some patches to improve raid-1 performance, I think by running io on multiple drives when you can, and by doing reads from the outer tracks if there are two idle drives. That's not in the stable version you used, I assume, it may not be in 2.6.26 either, I'm doing other things at the moment. A very nice bit of work. My only question is, if you ever feel motivated to repeat this test, it would be fun to do it with ext3 (or ext4) using the stride= parameter. I did limited testing and it really seemed to help, but nothing remotely as formal as your test. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot --
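On the readahead nit: `blockdev --setra` takes its argument in 512-byte sectors, so the value in the original post can be converted as follows. A minimal sketch; `setra_to_mib` is a made-up helper name.

```shell
#!/bin/sh
# Convert a blockdev --setra value (counted in 512-byte sectors) to MiB.
# setra_to_mib is a hypothetical helper name, not part of any tool.
setra_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

setra_to_mib 65536   # value passed to blockdev --setra in the original post; prints 32
```

By that arithmetic the setting used for these tests is 32 MiB of readahead, not the 64 MiB announced by the echo line in the script.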
Speaking about which, it would probably be good to adjust a little how the filesystem is created and mounted (both in xfs and ext3/4 cases). E.g. lazy-count=1 is still not the default last time I checked mkfs.xfs. And even ext4.txt from kernel documentation recommends mounting it with data=writeback,nobh when doing comparison with metadata journaling filesystems (the same would go for ext3). Along with different journal sizes, keeping an eye on stripe & stripe-width, and other settings that might be of interest. --
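For the ext3/ext4 case, the usual rule of thumb is stride = chunk size / filesystem block size, and stripe width = stride x number of data disks. The sketch below works that out for the 256 KiB chunk used in this thread, assuming a 4 KiB filesystem block size (an assumption, not stated anywhere above) and six disks. `ext_geometry` is a made-up helper name, and the exact spelling of the corresponding mke2fs -E options should be checked against the man page of the installed e2fsprogs.

```shell
#!/bin/sh
# ext_geometry CHUNK_KIB BLOCK_KIB DATA_DISKS
# Print the stride and stripe width, both in filesystem blocks, for an
# array with the given chunk size and number of data-bearing disks.
# Hypothetical helper, for illustration only.
ext_geometry() {
    stride=$(( $1 / $2 ))
    stripe_width=$(( stride * $3 ))
    echo "stride=$stride stripe-width=$stripe_width"
}

ext_geometry 256 4 5   # 6-disk RAID5: 5 data disks -> stride=64 stripe-width=320
ext_geometry 256 4 4   # 6-disk RAID6: 4 data disks -> stride=64 stripe-width=256
```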
Why is the Sequential Output (Block) for raid6 165719 and for raid5 only 86797? I would have thought that raid6 was always a bit slower in writing due to having to write double the amount of parity data. Holger --
I will re-run the RAID5 test and also run the test on a single disk and update the results later. Justin. --
RAID5 (2nd test of 3 averaged runs) & Single disk added: http://home.comcast.net/~jpiszcz/raid/20080528/raid-levels.html --
Other than repeating my (possibly lost) comment that this would be vastly easier to read if the numbers were aligned and all had the same number of decimal places in a single column, good stuff. For sequential i/o the winners and losers are clear, and you can set cost and performance to pick the winners. Seems obvious that raid-1 is the loser for single threaded load, I suspect that it would be poor against other levels in multithread loads, but not so much for read. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark --
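The alignment suggested above can be done without a spreadsheet. The sketch below is one possible approach with awk, not the procedure actually used: every numeric field is printed right-aligned with two decimal places, and labels are left-aligned. `align_csv`, the 12-character column width and the sample row are all made up for illustration.

```shell
#!/bin/sh
# align_csv: read comma-separated values on stdin and print each numeric
# field right-aligned in a 12-character column with 2 decimal places;
# non-numeric fields (labels) are left-aligned in the same width.
# Hypothetical helper; the column width is an arbitrary choice.
align_csv() {
    awk -F, '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+(\.[0-9]+)?$/)
                printf "%12.2f", $i
            else
                printf "%-12s", $i
        }
        printf "\n"
    }'
}

# Made-up sample row, not taken from the posted results:
printf 'example,1234.5,67\n' | align_csv
```

Fed the raid-levels csv on stdin, this would line up the decimal points so that columns can be compared by eye.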
On my wishlist to Justin is also what is the performance of the raid10's in degraded mode. And then I note that raid1 performs well on random seeks 702/s while the raid10,f2 (my pet) only performs 520/s - but this is on a 2.6.23 kernel without the seek performance patch for raid10,f2. I wonder if the random seeks are related to random read (and write) - it probably is, but there seems to be a difference between the results found with bonnie++ and my tests as reported on the http://linux-raid.osdl.org/index.php/Performance page. Best regards keld --
