On Thu, 2008-08-14 at 22:24 +0400, Vladislav Bolkhovitin wrote:
Well, the first step wrt this for us software folks is getting the
Slicing-by-8 CRC32C algorithm into the kernel. This would be a great
benefit not just for traditional iSCSI/TCP, but also for the Linux/SCTP
and Linux/iWARP software codebases.
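
For reference, here is a minimal userspace sketch of what Slicing-by-8
looks like for CRC32C. It is only an illustration under my own
assumptions, not the kernel patch itself: the names crc32c_sb8_init()
and crc32c_sb8() are made up for this example, and the tables are built
at runtime instead of being precomputed. The only fixed facts are the
reflected Castagnoli polynomial 0x82F63B78 and the usual pre/post
inversion of the CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial used for CRC32C. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Eight 256-entry tables; table[k][b] advances byte b through k extra zero bytes. */
static uint32_t crc32c_sb8_table[8][256];

/* Hypothetical helper: build the lookup tables once before first use. */
static void crc32c_sb8_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? CRC32C_POLY_REFLECTED : 0u);
        crc32c_sb8_table[0][i] = crc;
    }
    for (uint32_t i = 0; i < 256; i++) {
        for (int k = 1; k < 8; k++) {
            uint32_t prev = crc32c_sb8_table[k - 1][i];
            crc32c_sb8_table[k][i] =
                (prev >> 8) ^ crc32c_sb8_table[0][prev & 0xffu];
        }
    }
}

/* Load 4 bytes as little-endian, independent of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: update a CRC32C over buf. Pass 0 for a new
 * digest, or a previous return value to continue over more data.
 */
static uint32_t crc32c_sb8(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    crc = ~crc;
    /* Main loop: consume 8 bytes per iteration with 8 table lookups. */
    while (len >= 8) {
        uint32_t lo = crc ^ load_le32(p);
        uint32_t hi = load_le32(p + 4);

        crc = crc32c_sb8_table[7][lo & 0xffu] ^
              crc32c_sb8_table[6][(lo >> 8) & 0xffu] ^
              crc32c_sb8_table[5][(lo >> 16) & 0xffu] ^
              crc32c_sb8_table[4][lo >> 24] ^
              crc32c_sb8_table[3][hi & 0xffu] ^
              crc32c_sb8_table[2][(hi >> 8) & 0xffu] ^
              crc32c_sb8_table[1][(hi >> 16) & 0xffu] ^
              crc32c_sb8_table[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    /* Tail: plain byte-at-a-time update for the remaining 0-7 bytes. */
    while (len--)
        crc = (crc >> 8) ^ crc32c_sb8_table[0][(crc ^ *p++) & 0xffu];
    return ~crc;
}
```

After crc32c_sb8_init(), crc32c_sb8(0, "123456789", 9) returns
0xE3069283, the standard CRC32C check value. The win over the classic
byte-at-a-time loop is that each iteration retires 8 bytes with
independent table lookups instead of 8 serially dependent ones.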
I have always found this to be the historical case wrt iSCSI on x86
hardware. The rough estimate was that given identical hardware and
network configuration, an iSCSI target talking to a SCSI subsystem layer
would be able to handle 2x throughput compared to an iSCSI Initiator,
obviously as long as the actual storage could handle it.
Heh, I think the pace of designing new ASICs for traditional iSCSI
offload is probably slowing. Aside from the actual difficulty of doing
this, such designs have to compete with software iSCSI on commodity x86
4x & 8x core (8x and 16x thread) microprocessors, with a highly
efficient software implementation that can do BOTH traditional iSCSI
offload (where available) and real deal OS independent connection
recovery (ErrorRecoveryLevel=2) between multiple stateless iSER
iWARP/TCP connections across both hardware *AND* software iWARP RNICs.
With traditional iSCSI, I definitely agree on this.
With iWARP and iSER however, I believe the end balance of simplicity is
greater for both hardware and software, and allows both to scale more
effectively. The simple gain is having a framed PDU on top of legacy
TCP (RFC 504[0-4]) to determine the offload of the received packet,
which is then mapped to storage subsystem layer memory for eventual
hardware DMA on a vast array of Linux supported storage hardware and
CPU architectures.
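
To make the "framed PDU" point concrete, here is a rough sketch of the
MPA framing arithmetic as I read RFC 5044: a 2-byte ULPDU_Length in
network byte order, the ULPDU itself, 0-3 pad bytes so that the FPDU is
a multiple of 4 bytes, then a 4-byte CRC32C. The function and macro
names are made up for this example, and MPA markers are deliberately
ignored to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

#define MPA_ULPDU_LEN_FIELD 2u  /* size of the ULPDU_Length header field */
#define MPA_CRC_FIELD       4u  /* size of the trailing CRC32C field */

/* Hypothetical helper: read ULPDU_Length (big-endian) from the wire. */
static uint16_t mpa_read_ulpdu_len(const uint8_t *hdr)
{
    return (uint16_t)(((uint16_t)hdr[0] << 8) | hdr[1]);
}

/* Hypothetical helper: pad bytes needed to 4-byte align header + ULPDU. */
static size_t mpa_pad_len(uint16_t ulpdu_len)
{
    return (4u - ((MPA_ULPDU_LEN_FIELD + ulpdu_len) & 3u)) & 3u;
}

/* Hypothetical helper: total FPDU size on the wire, markers not counted. */
static size_t mpa_fpdu_len(uint16_t ulpdu_len)
{
    return MPA_ULPDU_LEN_FIELD + ulpdu_len + mpa_pad_len(ulpdu_len) +
           MPA_CRC_FIELD;
}
```

With this, a receiver that has the first two bytes of an FPDU knows
where the frame ends without having to look inside the ULPDU, which is
what lets an RNIC (hardware or software) find the DDP header and decide
where the payload should land.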
So yes, we are talking about quite a few possible cases:
I) Traditional iSCSI:
1) Complete hardware offload for legacy HBAs
2) Hybrid of hardware/software
As mentioned, reducing application layer checksum overhead for current
software implementations is very important for our quickly increasing
user base. Using the Slicing-by-8 CRC32C will help the current code, but
I think the only other real optimization open to network ASIC design
folks would be to do for traditional iSCSI at the application layer
something along the lines of what, say, the e1000 driver does with
transport and network layer checksums today. IMHO, the complexity and
time to market considerations of a complete traditional iSCSI offload
solution still outweigh its benefit when compared to highly optimized
software iSCSI on dedicated commodity cores.
Not that I am saying there is no room for improvement from the current
set of iSCSI Initiator TOEs. Again, I could build a children's fortress
from the iSCSI TOEs and their retail boxes that I have collected in my
office over the years. I would definitely like to see them running on
the LIO production fabric and VHACS bare-metal storage clouds at some
point for validation purposes, et al. But as for new designs, this is
still a very difficult proposition, and I am glad to see it being
discussed here.
II) iWARP/TCP and iSER
1) Hardware RNIC w/ iWARP/TCP with software iSER
2) Software RNIC w/ iWARP/TCP with software iSER
3) More possible iSER logic in hardware for latency/performance
optimizations (We won't know this until #1 and #2 happen)
Ahh, now this is the interesting case for scaling vendor independent IP
storage fabric to multiple port full duplex 10 Gb/sec fabrics. As this
hardware on PCIe gets out (yes, I have some AMSO1100 goodness too
Steve :-), and iSER initiator/targets on iWARP/TCP come online, I
believe the common code between the different flavours of
implementations will be much larger here. For example, I previously
mentioned ERL=2 in the context of traditional iSCSI/iSER. This logic is
independent of what RFC 5045 knows as a network fabric capable of direct
data placement.
I will also make this code independent in lio-target-2.6.git for my
upstream work.
Well, I think a lot of this depends on hardware. For example, there is
the X3100 adapter from Neterion today that can do 10 Gb/sec line rate
with x86_64 virtualization. Obviously, the Linux kernel (and my
project, Linux-iSCSI.org) wants to be able to support this in as vendor
neutral a way as possible, which is why we make extensive use of
multiple technologies in our production fabrics, and in the VHACS
stack. :-) Also, the Nested Page Tables would be a big win for this
particular case, but I am not familiar with the exact numbers.
<nod> :-)
--nab