"As you probably know there is a trend in enterprise computing towards networked storage. This is illustrated by the emergence during the past few years of standards like SRP (SCSI RDMA Protocol), iSCSI (Internet SCSI) and iSER (iSCSI Extensions for RDMA)," began Bart Van Assche, proposing that SCST be merged into the mainline kernel. He suggested that while similar to the STGT project which has been part of the mainline kernel since 2.6.20, "SCST is superior to STGT with respect to features, performance, maturity, stability, and number of existing target drivers. Unfortunately the SCST kernel code lives outside the kernel tree, which makes SCST harder to use than STGT."
SCSI subsystem maintainer James Bottomley was not convinced, explaining:
"The two target architectures perform essentially identical functions, so there's only really room for one in the kernel. Right at the moment, it's STGT. Problems in STGT come from the user<->kernel boundary which can be mitigated in a variety of ways. The fact that the figures are pretty much comparable on non IB networks shows this. I really need a whole lot more evidence than at worst a 20% performance difference on IB to pull one implementation out and replace it with another. Particularly as there's no real evidence that STGT can't be tweaked to recover the 20% even on IB."
From: Bart Van Assche <bart.vanassche@...> Subject: Integration of SCST in the mainstream Linux kernel Date: Jan 23, 10:22 am 2008

As you probably know there is a trend in enterprise computing towards
networked storage. This is illustrated by the emergence during the
past few years of standards like SRP (SCSI RDMA Protocol), iSCSI
(Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different
pieces of software are necessary to make networked storage possible:
initiator software and target software. As far as I know there exist
three different SCSI target implementations for Linux:
- The iSCSI Enterprise Target Daemon (IETD,
http://iscsitarget.sourceforge.net/);
- The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
- The Generic SCSI Target Middle Level for Linux project (SCST,
http://scst.sourceforge.net/).
Since I was wondering which SCSI target software would be best suited
for an InfiniBand network, I started evaluating the STGT and SCST SCSI
target implementations. Apparently the performance difference between
STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks,
but the SCST target software outperforms the STGT software on an
InfiniBand network. See also the following thread for the details:
http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170....

About the design of the SCST software: while one of the goals of the
STGT project was to keep the in-kernel code minimal, the SCST project
implements the whole SCSI target in kernel space. SCST is implemented
as a set of new kernel modules, only minimal changes to the existing
kernel are necessary before the SCST kernel modules can be used. This
is the same approach that will be followed in the very near future in
the OpenSolaris kernel (see also
http://opensolaris.org/os/project/comstar/). More information about
the design of SCST can be found here:
http://scst.sourceforge.net/doc/scst_pg.html.

My impression is that both the STGT and SCST projects are well
designed, well maintained and have a considerable user base. According
to the SCST maintainer (Vladislav Bolkhovitin), SCST is superior to
STGT with respect to features, performance, maturity, stability, and
number of existing target drivers. Unfortunately the SCST kernel code
lives outside the kernel tree, which makes SCST harder to use than
STGT.

As an SCST user, I would like to see the SCST kernel code integrated
in the mainstream kernel because of its excellent performance on an
InfiniBand network. Since the SCST project comprises about 14 KLOC,
reviewing the SCST code will take considerable time. Who will do this
reviewing work ? And with regard to the comments made by the
reviewers: Vladislav, do you have the time to carry out the
modifications requested by the reviewers ? I expect a.o. that
reviewers will ask to move SCST's configuration pseudofiles from
procfs to sysfs.

Bart Van Assche.
--
From: James Bottomley <James.Bottomley@...> Subject: Re: Integration of SCST in the mainstream Linux kernel Date: Jan 29, 4:42 pm 2008

On Wed, 2008-01-23 at 15:22 +0100, Bart Van Assche wrote:
> As you probably know there is a trend in enterprise computing towards
> networked storage. This is illustrated by the emergence during the
> past few years of standards like SRP (SCSI RDMA Protocol), iSCSI
> (Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different
> pieces of software are necessary to make networked storage possible:
> initiator software and target software. As far as I know there exist
> three different SCSI target implementations for Linux:
> - The iSCSI Enterprise Target Daemon (IETD,
> http://iscsitarget.sourceforge.net/);
> - The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
> - The Generic SCSI Target Middle Level for Linux project (SCST,
> http://scst.sourceforge.net/).
> Since I was wondering which SCSI target software would be best suited
> for an InfiniBand network, I started evaluating the STGT and SCST SCSI
> target implementations. Apparently the performance difference between
> STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks,
> but the SCST target software outperforms the STGT software on an
> InfiniBand network. See also the following thread for the details:
> http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170....

That doesn't seem to pull up a thread. However, I assume it's these
figures:

.............................................................................................
.                           .   STGT read      SCST read    .   STGT read      SCST read    .
.                           .  performance    performance   .  performance    performance   .
.                           .  (0.5K, MB/s)   (0.5K, MB/s)  .  (1 MB, MB/s)   (1 MB, MB/s)  .
.............................................................................................
. Ethernet (1 Gb/s network) .       77             78       .       77             89       .
. IPoIB (8 Gb/s network)    .      163            185       .      201            239       .
. iSER (8 Gb/s network)     .      250            N/A       .      360            N/A       .
. SRP (8 Gb/s network)      .      N/A            421       .      N/A            683       .
.............................................................................................

On the comparable figures, which only seem to be IPoIB they're showing a
13-18% variance, aren't they? Which isn't an incredible difference.

> About the design of the SCST software: while one of the goals of the
> STGT project was to keep the in-kernel code minimal, the SCST project
> implements the whole SCSI target in kernel space. SCST is implemented
> as a set of new kernel modules, only minimal changes to the existing
> kernel are necessary before the SCST kernel modules can be used. This
> is the same approach that will be followed in the very near future in
> the OpenSolaris kernel (see also
> http://opensolaris.org/os/project/comstar/). More information about
> the design of SCST can be found here:
> http://scst.sourceforge.net/doc/scst_pg.html.
>
> My impression is that both the STGT and SCST projects are well
> designed, well maintained and have a considerable user base. According
> to the SCST maintainer (Vladislav Bolkhovitin), SCST is superior to
> STGT with respect to features, performance, maturity, stability, and
> number of existing target drivers. Unfortunately the SCST kernel code
> lives outside the kernel tree, which makes SCST harder to use than
> STGT.
>
> As an SCST user, I would like to see the SCST kernel code integrated
> in the mainstream kernel because of its excellent performance on an
> InfiniBand network. Since the SCST project comprises about 14 KLOC,
> reviewing the SCST code will take considerable time. Who will do this
> reviewing work ? And with regard to the comments made by the
> reviewers: Vladislav, do you have the time to carry out the
> modifications requested by the reviewers ? I expect a.o. that
> reviewers will ask to move SCST's configuration pseudofiles from
> procfs to sysfs.

The two target architectures perform essentially identical functions, so
there's only really room for one in the kernel. Right at the moment,
it's STGT. Problems in STGT come from the user<->kernel boundary which
can be mitigated in a variety of ways. The fact that the figures are
pretty much comparable on non IB networks shows this.

I really need a whole lot more evidence than at worst a 20% performance
difference on IB to pull one implementation out and replace it with
another. Particularly as there's no real evidence that STGT can't be
tweaked to recover the 20% even on IB.

James
--
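The percentages discussed in the reply above can be rechecked from the read-throughput figures in the quoted table. The following is only an editorial sketch: the dictionary layout and the names `FIGURES`, `relative_gain` and `summarize` are illustrative, and only rows where both targets were measured over the same transport are compared.

```python
# Read throughput in MB/s, copied from the table quoted in the thread.
# Each entry is (STGT, SCST) for the 0.5K and 1 MB block sizes.
FIGURES = {
    "Ethernet": {"0.5K": (77, 78), "1 MB": (77, 89)},
    "IPoIB": {"0.5K": (163, 185), "1 MB": (201, 239)},
}


def relative_gain(stgt: float, scst: float) -> float:
    """Return how much faster SCST is than STGT, as a percentage of STGT."""
    return (scst - stgt) / stgt * 100.0


def summarize(figures=FIGURES):
    """Compute the SCST-over-STGT gain for every comparable measurement."""
    return {
        (transport, block): round(relative_gain(*pair), 1)
        for transport, sizes in figures.items()
        for block, pair in sizes.items()
    }


if __name__ == "__main__":
    for (transport, block), gain in summarize().items():
        print(f"{transport:8s} {block:5s} SCST ahead by {gain:.1f}%")
```

On the IPoIB rows this gives about 13.5% and 18.9%, close to the 13-18% range mentioned in the reply. The Ethernet rows come out at about 1.3% for 0.5K reads and 15.6% for 1 MB reads.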
From: Bart Van Assche <bart.vanassche@...> Subject: Re: Integration of SCST in the mainstream Linux kernel Date: Jan 30, 4:29 am 2008

On Jan 29, 2008 9:42 PM, James Bottomley wrote:
> > As an SCST user, I would like to see the SCST kernel code integrated
> > in the mainstream kernel because of its excellent performance on an
> > InfiniBand network. Since the SCST project comprises about 14 KLOC,
> > reviewing the SCST code will take considerable time. Who will do this
> > reviewing work ? And with regard to the comments made by the
> > reviewers: Vladislav, do you have the time to carry out the
> > modifications requested by the reviewers ? I expect a.o. that
> > reviewers will ask to move SCST's configuration pseudofiles from
> > procfs to sysfs.
>
> The two target architectures perform essentially identical functions, so
> there's only really room for one in the kernel. Right at the moment,
> it's STGT. Problems in STGT come from the user<->kernel boundary which
> can be mitigated in a variety of ways. The fact that the figures are
> pretty much comparable on non IB networks shows this.

Are you saying that users who need an efficient iSCSI implementation
should switch to OpenSolaris ? The OpenSolaris COMSTAR project involves
the migration of the existing OpenSolaris iSCSI target daemon from
userspace to their kernel. The OpenSolaris developers are
spending time on this because they expect a significant performance
improvement.

> I really need a whole lot more evidence than at worst a 20% performance
> difference on IB to pull one implementation out and replace it with
> another. Particularly as there's no real evidence that STGT can't be
> tweaked to recover the 20% even on IB.

My measurements on a 1 GB/s InfiniBand network have shown that the current
SCST implementation is able to read data via direct I/O at a rate of 811 MB/s
(via SRP) and that the current STGT implementation is able to transfer data at a
rate of 589 MB/s (via iSER). That's a performance difference of 38%.

And even more important, the I/O latency of SCST is significantly lower than
that of STGT. This is very important for database workloads -- the I/O pattern
caused by database software is close to random I/O, and database software needs
low latency I/O in order to run efficiently.

In the thread with the title "Performance of SCST versus STGT" on the
SCST-devel / STGT-devel mailing lists not only the raw performance numbers
were discussed but also which further performance improvements are possible.
It became clear that the SCST performance can be improved further by
implementing a well known optimization (zero-copy I/O). Fujita Tomonori
explained in the same thread that it is possible to improve the performance
of STGT further, but that this would require a lot of effort (implementing
asynchronous I/O in the kernel and also implementing a new caching mechanism
using pre-registered buffers).

See also:
http://sourceforge.net/mailarchive/forum.php?forum_name=scst-devel&viewm...

Bart Van Assche.
--
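The 38% figure in the message above follows directly from the two direct I/O rates it quotes. A minimal sketch of that arithmetic is below; the helper names are illustrative, and the link-utilization calculation assumes that the "1 GB/s" link means 1000 MB/s of usable bandwidth, which the message does not spell out.

```python
def percent_faster(fast_mb_s: float, slow_mb_s: float) -> float:
    """Percentage by which the faster rate exceeds the slower one."""
    return (fast_mb_s - slow_mb_s) / slow_mb_s * 100.0


def link_utilization(rate_mb_s: float, link_mb_s: float = 1000.0) -> float:
    """Share of the link bandwidth used, in percent (link_mb_s is an assumption)."""
    return rate_mb_s / link_mb_s * 100.0


if __name__ == "__main__":
    scst_srp, stgt_iser = 811.0, 589.0  # MB/s, as quoted in the message
    print(f"SCST/SRP over STGT/iSER: {percent_faster(scst_srp, stgt_iser):.1f}%")
    print(f"SCST/SRP link utilization:  {link_utilization(scst_srp):.1f}%")
    print(f"STGT/iSER link utilization: {link_utilization(stgt_iser):.1f}%")
```

Note that the two rates were measured over different transports (SRP and iSER), so the difference reflects the combination of target and protocol rather than the target framework alone.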
From: James Bottomley <James.Bottomley@...> Subject: Re: Integration of SCST in the mainstream Linux kernel Date: Jan 30, 12:22 pm 2008

On Wed, 2008-01-30 at 09:29 +0100, Bart Van Assche wrote:
> On Jan 29, 2008 9:42 PM, James Bottomley wrote:
> > > As an SCST user, I would like to see the SCST kernel code integrated
> > > in the mainstream kernel because of its excellent performance on an
> > > InfiniBand network. Since the SCST project comprises about 14 KLOC,
> > > reviewing the SCST code will take considerable time. Who will do this
> > > reviewing work ? And with regard to the comments made by the
> > > reviewers: Vladislav, do you have the time to carry out the
> > > modifications requested by the reviewers ? I expect a.o. that
> > > reviewers will ask to move SCST's configuration pseudofiles from
> > > procfs to sysfs.
> >
> > The two target architectures perform essentially identical functions, so
> > there's only really room for one in the kernel. Right at the moment,
> > it's STGT. Problems in STGT come from the user<->kernel boundary which
> > can be mitigated in a variety of ways. The fact that the figures are
> > pretty much comparable on non IB networks shows this.
>
> Are you saying that users who need an efficient iSCSI implementation
> should switch to OpenSolaris ?

I'd certainly say that's a totally unsupported conclusion.

> The OpenSolaris COMSTAR project involves
> the migration of the existing OpenSolaris iSCSI target daemon from
> userspace to their kernel. The OpenSolaris developers are
> spending time on this because they expect a significant performance
> improvement.

Just because Solaris takes a particular design decision doesn't
automatically make it the right course of action.

Microsoft once pulled huge gobs of the C library and their windowing
system into the kernel in the name of efficiency. It proved not only to
be less efficient, but also to degrade their security model.

Deciding what lives in userspace and what should be in the kernel lies
at the very heart of architectural decisions. However, the argument
that "it should be in the kernel because that would make it faster" is
pretty much a discredited one. To prevail on that argument, you have to
demonstrate that there's no way to enable user space to do the same
thing at the same speed. Further, it was the same argument used the
last time around when the STGT vs SCST investigation was done. Your own
results on non-IB networks show that both architectures perform at the
same speed. That tends to support the conclusion that there's something
specific about IB that needs to be tweaked or improved for STGT to get
it to perform correctly.

Furthermore, if you have already decided before testing that SCST is
right and that STGT is wrong based on the architectures, it isn't
exactly going to increase my confidence in your measurement methodology
claiming to show this, now is it?

> > I really need a whole lot more evidence than at worst a 20% performance
> > difference on IB to pull one implementation out and replace it with
> > another. Particularly as there's no real evidence that STGT can't be
> > tweaked to recover the 20% even on IB.
>
> My measurements on a 1 GB/s InfiniBand network have shown that the current
> SCST implementation is able to read data via direct I/O at a rate of 811 MB/s
> (via SRP) and that the current STGT implementation is able to transfer data at a
> rate of 589 MB/s (via iSER). That's a performance difference of 38%.
>
> And even more important, the I/O latency of SCST is significantly lower than
> that of STGT. This is very important for database workloads -- the I/O pattern
> caused by database software is close to random I/O, and database software needs
> low latency I/O in order to run efficiently.
>
> In the thread with the title "Performance of SCST versus STGT" on the
> SCST-devel / STGT-devel mailing lists not only the raw performance numbers
> were discussed but also which further performance improvements are possible.
> It became clear that the SCST performance can be improved further by
> implementing a well known optimization (zero-copy I/O). Fujita Tomonori
> explained in the same thread that it is possible to improve the performance
> of STGT further, but that this would require a lot of effort (implementing
> asynchronous I/O in the kernel and also implementing a new caching mechanism
> using pre-registered buffers).

These are both features being independently worked on, are they not?
Even if they weren't, the combination of the size of SCST in kernel plus
the problem of having to find a migration path for the current STGT
users still looks to me to involve the greater amount of work.

James
--
From: Vladislav Bolkhovitin <vst@...> Subject: Re: Integration of SCST in the mainstream Linux kernel Date: Jan 30, 7:17 am 2008

James Bottomley wrote:
> The two target architectures perform essentially identical functions, so
> there's only really room for one in the kernel. Right at the moment,
> it's STGT. Problems in STGT come from the user<->kernel boundary which
> can be mitigated in a variety of ways. The fact that the figures are
> pretty much comparable on non IB networks shows this.
>
> I really need a whole lot more evidence than at worst a 20% performance
> difference on IB to pull one implementation out and replace it with
> another. Particularly as there's no real evidence that STGT can't be
> tweaked to recover the 20% even on IB.

James,

Although the performance difference between STGT and SCST is apparent,
this isn't the only point why SCST is better. I've already written about
it many times in various mailing lists, but let me summarize it one more
time here.

As you know, almost all kernel parts can be done in user space,
including all the drivers, networking, I/O management with block/SCSI
initiator subsystem and disk cache manager. But does it mean that
currently Linux kernel is bad and all the above should be (re)done in
user space instead? I believe, not. Linux isn't a microkernel for very
pragmatic reasons: simplicity and performance. So, additional important
point why SCST is better is simplicity.

For SCSI target, especially with hardware target card, data come
from the kernel and are eventually served by the kernel, which does actual I/O or
getting/putting data from/to cache. Dividing requests processing between
user and kernel space creates unnecessary interface layer(s) and
effectively makes the requests processing job distributed with all its
complexity and reliability problems. From my point of view, having such
distribution, where user space is master side and kernel is slave is
rather wrong, because:

1. It makes kernel depend from user program, which services it and
provides for it its routines, while the regular paradigm is the
opposite: kernel services user space applications. As a direct
consequence from it that there is no real protection for the kernel from
faults in the STGT core code without excessive effort, which, no
surprise, wasn't currently done and, seems, is never going to be done.
So, on practice debugging and developing under STGT isn't easier, than
if the whole code was in the kernel space, but, actually, harder (see
below why).

2. It requires new complicated interface between kernel and user spaces
that creates additional maintenance and debugging headaches, which don't
exist for kernel only code. Linus Torvalds some time ago perfectly
described why it is bad, see http://lkml.org/lkml/2007/4/24/451,
http://lkml.org/lkml/2006/7/1/41 and http://lkml.org/lkml/2007/4/24/364.

3. It makes for SCSI target impossible to use (at least, on a simple and
sane way) many effective optimizations: zero-copy cached I/O, more
control over read-ahead, device queue unplugging-plugging, etc. One
example of already implemented such features is zero-copy network data
transmission, done in simple 260 lines put_page_callback patch. This
optimization is especially important for the user space gate (scst_user
module), see below for details.

The whole point that development for kernel is harder, than for user
space, is totally nonsense nowadays. It's different, yes, in some ways
more limited, yes, but not harder. For ones who need gdb (I for many
years - don't) kernel has kgdb, plus it also has many not available for
user space or more limited there debug facilities like lockdep, lockup
detection, oprofile, etc. (I don't mention wider choice of more
effectively implemented synchronization primitives and not only them).

For people who need complicated target devices emulation, like, e.g., in
case of VTL (Virtual Tape Library), where there is a need to operate
with large mmap'ed memory areas, SCST provides gateway to the user space
(scst_user module), but, in contrast with STGT, it's done in regular
"kernel - master, user application - slave" paradigm, so it's reliable
and no fault in user space device emulator can break kernel and other
user space applications. Plus, since SCSI target state machine and
memory management are in the kernel, it's very effective and allows only
one kernel-user space switch per SCSI command.

Also, I should note here, that in the current state STGT in many aspects
doesn't fully conform SCSI specifications, especially in area of
management events, like Unit Attentions generation and processing, and
it doesn't look like somebody cares about it. At the same time, SCST
pays big attention to fully conform SCSI specifications, because price
of non-conformance is a possible user's data corruption.

Returning to performance, modern SCSI transports, e.g. InfiniBand, have
as low link latency as 1(!) microsecond. For comparison, the
inter-thread context switch time on a modern system is about the same,
syscall time - about 0.1 microsecond. So, only ten empty syscalls or one
context switch add the same latency as the link. Even 1Gbps Ethernet has
less, than 100 microseconds of round-trip latency.

You, probably, know, that QLogic Fibre Channel target driver for SCST
allows commands being executed either directly from soft IRQ, or from
the corresponding thread. There is a steady 5-7% difference in IOPS
between those modes on 512 bytes reads on nullio using 4Gbps link. So, a
single additional inter-kernel-thread context switch costs 5-7% of IOPS.

Another source of additional unavoidable with the user space approach
latency is data copy to/from cache. With the fully kernel space
approach, cache can be used directly, so no extra copy will be needed.
We can estimate how much latency the data copying adds. On the modern
systems memory copy throughput is less than 2GB/s, so on 20Gbps
InfiniBand link it almost doubles data transfer latency.

So, putting code in the user space you should accept the extra latency
it adds. Many, if not most, real-life workloads are more or less latency,
not throughput, bound, so there shouldn't be surprise that single stream
"dd if=/dev/sdX of=/dev/null" on initiator gives too low values. Such
"benchmark" isn't less important and practical, than all the
multithreaded latency insensitive benchmarks, which people like running,
because it does essentially the same as most Linux processes do when
they read data from files.

You may object me that the target's backstorage device(s) latency is a
lot more, than 1 microsecond, but that is relevant only if data are
read/written from/to the actual backstorage media, not from the cache,
even from the backstorage device's cache. Nothing prevents target from
having 8 or even 64GB of cache, so most even random accesses could be
served by it. This is especially important for sync writes.

Thus, why SCST is better:

1. It is more simple, because it's monolithic, so all its components are
in one place and communicate using direct function calls. Hence, it is
smaller, faster, more reliable and maintainable. Currently it's bigger,
than STGT, just because it supports more features, see (2).

2. It supports more features: 1 to many pass-through support with all
necessary for it functionality, including support for non-disk SCSI
devices, like tapes, SGV cache, BLOCKIO, where requests converted to
bio's and directly sent to block level (this mode is effective for
random mostly workloads with data set size >> memory size on the
target), etc.

3. It has better performance and going to have it even better. SCST only
now enters in the phase, where it starts exploiting all advantages of
being in the kernel. Particularly, zero-copy cached I/O is currently
being implemented.

4. It provides safer and more effective interface to emulate target
devices in the user space via scst_user module.

5. It conforms much more closely to SCSI specifications (see above).

Vlad
--
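The latency argument in the message above rests on a few rough figures: about 1 microsecond of link latency, about the same for a context switch, about 0.1 microsecond per empty syscall, and memory copy throughput of about 2 GB/s against a 20 Gbps link. The sketch below only restates that arithmetic; the class and function names are illustrative, the figures are the message's own estimates rather than measurements, and the copy is assumed not to overlap with the wire transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyFigures:
    """Rough figures quoted in the message, in microseconds."""

    link_us: float = 1.0            # InfiniBand link latency
    context_switch_us: float = 1.0  # inter-thread context switch ("about the same")
    syscall_us: float = 0.1         # one empty syscall


def syscalls_per_link_latency(f: LatencyFigures) -> float:
    """Number of empty syscalls that cost as much as one link traversal."""
    return f.link_us / f.syscall_us


def copy_to_wire_ratio(link_gbit_s: float, copy_gbyte_s: float) -> float:
    """Time to copy a buffer in memory divided by time to send it over the link.

    A ratio of 1.0 means the copy takes as long as the transfer itself, so a
    non-overlapped copy doubles the total time.
    """
    link_gbyte_s = link_gbit_s / 8.0
    return link_gbyte_s / copy_gbyte_s


if __name__ == "__main__":
    f = LatencyFigures()
    print(f"syscalls per link latency: {syscalls_per_link_latency(f):.0f}")
    # 20 Gbit/s taken as the raw rate (2.5 GB/s).
    print(f"copy/wire at 20 Gbit/s: {copy_to_wire_ratio(20.0, 2.0):.2f}")
    # 16 Gbit/s, assuming 8b/10b encoding overhead on a 20 Gbit/s link.
    print(f"copy/wire at 16 Gbit/s: {copy_to_wire_ratio(16.0, 2.0):.2f}")
```

Under either reading of the link rate, the copy costs at least as much time as the wire transfer, which is the sense in which the message says the copy roughly doubles data transfer latency.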
