On Tue, Aug 24, 2010 at 11:43 AM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
While I can't tell you where the bottlenecks are, I can share some
performance numbers...
4 initiators can get >600K random 4KB IOPS off a single target...
which is ~150% of what the Emulex/Intel/Microsoft results show using 8
targets at 4KB (their 1M IOPS was at 512 byte blocks, which is not a
realistic test point) here:
http://itbrandpulse.com/Documents/Test2010001%20-%20The%20Sun%20Rises%20on%20CNAs%20Te...
The blog referenced earlier used 10 targets... and I'm not sure how
many 10G ports per target.
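
To put the block-size point in bandwidth terms, here is a small Python sketch that converts the IOPS figures quoted above into bytes per second. The function name is made up for illustration, and it assumes 4KB means 4096 bytes and "K"/"M" mean decimal thousands/millions.

```python
def iops_to_bytes_per_sec(iops, block_bytes):
    """Payload bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes

# Figures quoted in this mail (">600K" is treated as exactly 600K here).
mine_4k = iops_to_bytes_per_sec(600_000, 4096)      # 4KB random, single target
theirs_512 = iops_to_bytes_per_sec(1_000_000, 512)  # 1M IOPS at 512 bytes

print(f"600K IOPS @ 4KB  = {mine_4k / 1e9:.2f} GB/s")
print(f"1M IOPS   @ 512B = {theirs_512 / 1e9:.2f} GB/s")
print(f"ratio            = {mine_4k / theirs_512:.1f}x")
```

So the 512-byte headline number moves far less data per second than the 4KB result, even though the IOPS count is higher.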
In general, my target seems capable of 65% of the local small-block
random write performance over IB, and 85% of the local small-block
random read performance. For large block performance, ~95% efficiency
is easily achievable, read or write (i.e. 5.6GB/s over fabric, where
6GB/s is achievable on the drives locally at 1MB random blocks).
These small-block efficiencies are achievable only when tested with
multiple initiators.
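
The efficiency figures here are just fabric throughput divided by local throughput. A minimal Python sketch using the large-block numbers from this mail (the helper name is illustrative only):

```python
def fabric_efficiency(fabric_rate, local_rate):
    """Fraction of local drive performance that survives the trip over the fabric."""
    if local_rate <= 0:
        raise ValueError("local_rate must be positive")
    return fabric_rate / local_rate

# Large-block case from this mail: 5.6GB/s over the fabric vs. 6GB/s locally.
large_block = fabric_efficiency(5.6, 6.0)
print(f"large-block efficiency: {large_block:.1%}")
```

That works out to a bit over 93%, which is in line with the "~95%" quoted for 1MB random blocks.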
A single initiator is only capable of <150K 4KB IOPS... but gets
full bandwidth with larger blocks.
If I were to choose my problem, target or initiator bottleneck, I'd
certainly rather have an initiator bottleneck than Microsoft's
target bottleneck.
The numbers are suspicious for other reasons. "Random" is often used
loosely (and the blog referenced earlier doesn't even claim "random").
If there is any merging/coalescing going on, then the "IOPS" are
going to look vastly better. If I allow coalescing, I can easily get
4M 4KB IOPS, but can't honestly call those 4KB IOPS (even if the
benchmark thinks it's doing 4KB I/O). They need to show that their
advertised block size is maintained end-to-end.
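
One way to check for coalescing is to compare the average request size seen at the backing device against the advertised block size. The Python sketch below assumes a Linux target and the standard /proc/diskstats field layout (512-byte sectors). The sample lines are invented for illustration, not measurements, and this is not necessarily how any of the numbers above were verified.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def parse_diskstats_line(line):
    """Pick the completed/merged/sector counters out of one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),
        "reads_merged": int(f[4]),
        "sectors_read": int(f[5]),
        "writes": int(f[7]),
        "writes_merged": int(f[8]),
        "sectors_written": int(f[9]),
    }

def avg_io_bytes(before, after):
    """Average size of the requests the device completed between two samples."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    sectors = (after["sectors_read"] - before["sectors_read"]) + (
        after["sectors_written"] - before["sectors_written"]
    )
    return sectors * SECTOR_BYTES / ios if ios else 0.0

# Invented samples: the benchmark issued 4000 reads of 4KB, but 3000 of them
# were merged, so the device only completed 1000 (larger) requests.
before = parse_diskstats_line("8 0 sda 1000 0 8000 0 0 0 0 0 0 0 0")
after = parse_diskstats_line("8 0 sda 2000 3000 40000 0 0 0 0 0 0 0 0")

advertised = 4096
actual = avg_io_bytes(before, after)
print(f"advertised {advertised} bytes, device saw {actual:.0f} bytes per I/O")
print(f"coalescing factor: {actual / advertised:.1f}x")
```

If the average size at the device is well above the advertised block size, or the merged counters are moving, the "4KB IOPS" figure is not really a 4KB figure.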
Chris
--