Re: [PATCH 8/19]: SCST SYSFS interface implementation

Previous thread: [RESEND PATCH 0/2] Fix IRQ round-robing w/o irqbalance on pseries by Nishanth Aravamudan on Friday, October 1, 2010 - 2:26 pm. (9 messages)

Next thread: 2.6.35-rc6 REGRESSION: Dirtiable inode bdi default != sb bdi ext2/ext3/ext4 by Theodore Ts'o on Friday, October 1, 2010 - 2:35 pm. (1 message)
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:34 pm

Hi All,

Please review the next iteration of the patch set of the new (although,
in fact, the oldest) SCSI target framework for Linux SCST with a set of
dev handlers and 3 target drivers: for local access (scst_local), for
Infiniband SRP (srpt) and IBM POWER Virtual SCSI .

SCST is the most advanced and features rich SCSI target subsystem for
Linux. It allows to build the fastest devices and clusters delivering
millions of IOPS. Many companies are using it as a foundation for their
storage products and solutions. List of some of them you can find on
http://scst.sourceforge.net/users.html. There are also many other who
not yet added there or prefer for some reasons to not be listed on this
page.

The previous iterations of the SCST patch set you can find on
http://lkml.org/lkml/2008 and http://lkml.org/lkml/2010/4/13/146.

We believe that code is fully mainline ready (considering that ibmvstgt
driver for SCST not yet tested on hardware).

This iteration for simplicity contains only 2 target drivers: for local
access (scst_local) and SRP (ib_srpt). If SCST accepted, we will prepare
and submit patches for other target drivers: for iSCSI, FCoE, QLogic
Fibre Channel QLA 2xxx, Emulex Fibre Channel/FCoE and Marvell
88SE63xx/64xx/68xx/94xx SAS hardware.

Also this patchset contains port of ibmvstgt driver for SCST, many
thanks to Bart Van Assche for performing it!

This patchset is for kernel 2.6.35. On request we will rebase it for any
other kernel tree. Since SCST is quite self-contained and almost doesn't
touch outside kernel code, there are no problems in it.

As Dmitry Torokhov requested (thanks for the review!), in this iteration
we added more detail descriptions of each patch. Since the space for
description is too limited, we can't describe full SCST internals (it's
worth a book). But we hope that those description will be a good
starting point.

More detail description of SCST, its drivers and utilities you can find
on SCST home page http://scst.sourceforge.net. ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:36 pm

This patch contains SCST core's Makefile and Kconfig.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 Kconfig  |  244 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Makefile |   11 ++
 2 files changed, 255 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/Kconfig linux-2.6.35/drivers/scst/Kconfig
--- orig/linux-2.6.35/drivers/scst/Kconfig
+++ linux-2.6.35/drivers/scst/Kconfig
@@ -0,0 +1,244 @@
+menu "SCSI target (SCST) support"
+
+config SCST
+	tristate "SCSI target (SCST) support"
+	depends on SCSI
+	help
+	  SCSI target (SCST) is designed to provide unified, consistent
+	  interface between SCSI target drivers and Linux kernel and
+	  simplify target drivers development as much as possible. Visit
+	  http://scst.sourceforge.net for more info about it.
+
+config SCST_DISK
+	tristate "SCSI target disk support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for disk device.
+
+config SCST_TAPE
+	tristate "SCSI target tape support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for tape device.
+
+config SCST_CDROM
+	tristate "SCSI target CDROM support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for CDROM device.
+
+config SCST_MODISK
+	tristate "SCSI target MO disk support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for MO disk device.
+
+config SCST_CHANGER
+	tristate "SCSI target changer support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for changer device.
+
+config SCST_PROCESSOR
+	tristate "SCSI target processor support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for processor device.
+
+config SCST_RAID
+	tristate "SCSI target storage array controller (RAID) support"
+	default SCST
+	depends on SCSI && SCST
+	help
+	  SCST pass-through device handler for raid ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:36 pm

This patch contains changes in drivers/Kconfig and drivers/Makefile
to integrate SCST into the Linux kernel tree.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 Kconfig  |    2 ++
 Makefile |    1 +
 2 files changed, 3 insertions(+)

diff -upkr -X linux-2.6.35/Documentation/dontdiff linux-2.6.35/drivers/Kconfig linux-2.6.35/drivers/Kconfig
--- orig/linux-2.6.35/drivers/Kconfig 01:51:29.000000000 +0400
+++ linux-2.6.35/drivers/Kconfig 14:14:46.000000000 +0400
@@ -22,6 +22,8 @@ source "drivers/ide/Kconfig"
 
 source "drivers/scsi/Kconfig"
 
+source "drivers/scst/Kconfig"
+
 source "drivers/ata/Kconfig"
 
 source "drivers/md/Kconfig"
diff -upkr -X linux-2.6.35/Documentation/dontdiff linux-2.6.35/drivers/Makefile linux-2.6.35/drivers/Makefile
--- orig/linux-2.6.35/drivers/Makefile 01:51:29.000000000 +0400
+++ linux-2.6.35/drivers/Makefile 14:15:29.000000000 +0400
@@ -43,6 +43,7 @@ obj-$(CONFIG_ATM)		+= atm/
 obj-y				+= macintosh/
 obj-$(CONFIG_IDE)		+= ide/
 obj-$(CONFIG_SCSI)		+= scsi/
+obj-$(CONFIG_SCST)		+= scst/
 obj-$(CONFIG_ATA)		+= ata/
 obj-y				+= net/
 obj-$(CONFIG_ATM)		+= atm/

--

From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:38 pm

This patch contains declarations of all externally visible SCST
constants, types and functions prototypes.

Particularly, it has definitions of all core SCST structures:

 - scst_tgt_template - defines SCST target driver, its callbacks
   and SCST core behavior dealing with it.

 - scst_dev_type - defines SCST backend driver, its callbacks and
   SCST core behavior dealing with it.

 - scst_tgt - defines SCST target (SCSI target port). Analog of Scsi_Host on
   the initiator side.

 - scst_session - defines SCST session (SCSI I_T nexus)

 - scst_cmd - defines SCST command (SCSI I_T_L_Q nexus)

 - scst_device - defines SCST device (SCSI device)

 - scst_tgt_dev - defines SCSI I_T_L nexus

 - scst_acg  - defines SCST access control group. Such groups define
   which initiators see which LUNs.

as well as prototypes for all the functions to manage those objects.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst.h       | 3642 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scst_const.h |  376 ++++++
 2 files changed, 4018 insertions(+)

diff -uprN orig/linux-2.6.35/include/scst/scst_const.h linux-2.6.35/include/scst/scst_const.h
--- orig/linux-2.6.35/include/scst/scst_const.h
+++ linux-2.6.35/include/scst/scst_const.h
@@ -0,0 +1,376 @@
+/*
+ *  include/scst_const.h
+ *
+ *  Copyright (C) 2004 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *  Copyright (C) 2007 - 2010 ID7 Ltd.
+ *
+ *  Contains common SCST constants.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation, version 2
+ *  of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#ifndef __SCST_CONST_H
+#define ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:39 pm

This patch contains main management files and private headers.

File scst_main.c contains management functions to create and destroy
target drivers, targets, sessions, device drivers, devices and threads.

A typical life-circle of a target driver:

1. scst_register_target_template()

2. scst_register_target() for each target

3. scst_register_session() for each incoming session.

4. Serving commands:

4.1. scst_rx_cmd() on a new command

4.2. scst_cmd_init_done() after the command's initialization finished

4.3. For WRITE-direction commands only:

4.3.1. rdy_to_xfer() callback to receive data into prepared buffer

4.3.2. scst_rx_data() to notify SCST core that all the data received

4.4. xmit_response() callback to transfer response with (possibly) data
to the initiator

4.5. scst_tgt_cmd_done() - to notify SCST core that the response has sent

4.6. on_free_cmd() callback to notify that the command is about to be freed

5. scst_unregister_session() when initiator disconnects

6. scst_unregister_target()

7. scst_unregister_target_template()


A typical lifecircle of a virtual dev handler:

1. scst_register_virtual_dev_driver()

2. scst_register_virtual_device() for each virtual device as requested user space
configurator

3. Serving commands:

3.1. parse() callback to ensure that command initialized correctly and initialize
what's not initialized yet. Particularly, data transfer direction and data transfer size.

3.2. exec() callback to execute the command

3.3. Calling cmd->scst_cmd_done() to notify SCST core that the command finished.

3.4. dev_done() callback to notify that SCST core finished post processing of the
command

3.5. on_free_cmd() callback to notify that the command is about to be freed

4. scst_unregister_virtual_device()

5. scst_register_virtual_dev_driver()

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst_main.c   | 2105 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scst_module.c |   63 +
 ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:42 pm

This patch contains implementation of the SCSI target state machine as well
as functions to interact with target drivers and dev handlers during commands
processing.

The core of SCSI target state machine is scst_process_active_cmd(), which
depending on the current processing state (cmd->state) schedules the next
processing function.

All the processing supposed to be asynchronous, i.e. with no sleeping states.
Each command can be processed only by a single thread at time.

There are the following processing states:

1. SCST_CMD_STATE_INIT_WAIT - waiting for the target driver to call scst_cmd_init_done()

2. SCST_CMD_STATE_INIT - LUN is being translated to the corresponding SCST device

3. SCST_CMD_STATE_PRE_PARSE - internal parsing of the command to do the most
common work of it in scst_pre_parse()

4. SCST_CMD_STATE_DEV_PARSE - the dev handler's parse() is going to be called

5. SCST_CMD_STATE_PREPARE_SPACE - allocation of the cmd's data buffer

6. SCST_CMD_STATE_PREPROCESSING_DONE - calling the target driver's preprocessing_done(),
if requested by it.

7. SCST_CMD_STATE_PREPROCESSING_DONE_CALLED - waiting for the target driver to call
scst_restart_cmd()

8. SCST_CMD_STATE_RDY_TO_XFER - the target driver's rdy_to_xfer() is going to be called

9. SCST_CMD_STATE_DATA_WAIT - waiting for the target driver to call scst_rx_data()

10. SCST_CMD_STATE_TGT_PRE_EXEC - the target driver's pre_exec() is going to be called

11. SCST_CMD_STATE_SEND_FOR_EXEC - cmd is going to be sent for execution

12. SCST_CMD_STATE_LOCAL_EXEC - cmd is being checked if it should be executed locally
(e.g. REPORT LUNS or reservation commands) and, if yes, execute them

13. SCST_CMD_STATE_REAL_EXEC - cmd is being sent for execution

14. SCST_CMD_STATE_REAL_EXECUTING - waiting for the dev handler to call cmd->scst_cmd_done()

15. SCST_CMD_STATE_PRE_DEV_DONE - internal post-exec checks

16. SCST_CMD_STATE_MODE_SELECT_CHECKS - internal MODE SELECT pages related checks for
MODE SELECT command to ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:43 pm

This patch contains SCST internal library functions. For instance to create
and destroy various internal objects, like commands, targets, sessions, etc.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst_lib.c | 7007 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 7007 insertions(+)
diff -uprN orig/linux-2.6.35/drivers/scst/scst_lib.c linux-2.6.35/drivers/scst/scst_lib.c
--- orig/linux-2.6.35/drivers/scst/scst_lib.c
+++ linux-2.6.35/drivers/scst/scst_lib.c
@@ -0,0 +1,7007 @@
+/*
+ *  scst_lib.c
+ *
+ *  Copyright (C) 2004 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *  Copyright (C) 2004 - 2005 Leonid Stoljar
+ *  Copyright (C) 2007 - 2010 ID7 Ltd.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation, version 2
+ *  of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/kthread.h>
+#include <linux/cdrom.h>
+#include <linux/unistd.h>
+#include <linux/string.h>
+#include <linux/ctype.h>
+#include <linux/delay.h>
+#include <linux/vmalloc.h>
+#include <asm/kmap_types.h>
+#include <asm/unaligned.h>
+
+#include <scst/scst.h>
+#include "scst_priv.h"
+#include "scst_mem.h"
+#include "scst_pres.h"
+
+struct scsi_io_context {
+	void *data;
+	void (*done)(void *data, char *sense, int result, int resid);
+	char sense[SCST_SENSE_BUFFERSIZE];
+};
+static struct kmem_cache *scsi_io_context_cache;
+
+/* get_trans_len_x extract x bytes from cdb as length starting from off ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:44 pm

This patch contains SCST Persistent Reservations implementation.

SCST implements Persistent Reservations with full set of capabilities,
including "Persistence Through Power Loss".
   
The "Persistence Through Power Loss" data are saved in /var/lib/scst/pr
with files with names the same as the names of the corresponding
devices. Also this directory contains backup versions of those files
with suffix ".1". Those backup files are used in case of power or other
failure to prevent Persistent Reservation information from corruption
during update.

Signed-off-by: Alexey Obitotskiy <alexeyo1@open-e.com>
Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst_pres.c | 2498 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 scst_pres.h |  159 +++
 2 files changed, 2657 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/scst_pres.c linux-2.6.35/drivers/scst/scst_pres.c
--- orig/linux-2.6.35/drivers/scst/scst_pres.c
+++ linux-2.6.35/drivers/scst/scst_pres.c
@@ -0,0 +1,2498 @@
+/*
+ *  scst_pres.c
+ *
+ *  Copyright (C) 2009 - 2010 Alexey Obitotskiy <alexeyo1@open-e.com>
+ *  Copyright (C) 2009 - 2010 Open-E, Inc.
+ *  Copyright (C) 2009 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation, version 2
+ *  of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/smp_lock.h>
+#include <linux/unistd.h>
+#include <linux/string.h>
+#include ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:46 pm

This patch contains SCST debugging support routines.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 drivers/scst/scst_debug.c |  223 ++++++++++++++++++++++++++++++++++++
 include/scst/scst_debug.h |  284 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 507 insertions(+)

diff -uprN orig/linux-2.6.35/include/scst/scst_debug.h linux-2.6.35/include/scst/scst_debug.h
--- orig/linux-2.6.35/include/scst/scst_debug.h
+++ linux-2.6.35/include/scst/scst_debug.h
@@ -0,0 +1,284 @@
+/*
+ *  include/scst_debug.h
+ *
+ *  Copyright (C) 2004 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *  Copyright (C) 2004 - 2005 Leonid Stoljar
+ *  Copyright (C) 2007 - 2010 ID7 Ltd.
+ *
+ *  Contains macroses for execution tracing and error reporting
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation, version 2
+ *  of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#ifndef __SCST_DEBUG_H
+#define __SCST_DEBUG_H
+
+#include <generated/autoconf.h>	/* for CONFIG_* */
+
+#include <linux/bug.h>		/* for WARN_ON_ONCE */
+
+#ifdef CONFIG_SCST_EXTRACHECKS
+#define EXTRACHECKS_BUG_ON(a)		BUG_ON(a)
+#define EXTRACHECKS_WARN_ON(a)		WARN_ON(a)
+#define EXTRACHECKS_WARN_ON_ONCE(a)	WARN_ON_ONCE(a)
+#else
+#define EXTRACHECKS_BUG_ON(a)		do { } while (0)
+#define EXTRACHECKS_WARN_ON(a)		do { } while (0)
+#define EXTRACHECKS_WARN_ON_ONCE(a)	do { } while (0)
+#endif
+
+#define TRACE_NULL           0x00000000
+#define TRACE_DEBUG          0x00000001
+#define TRACE_FUNCTION       0x00000002
+#define TRACE_LINE           0x00000004
+#define TRACE_PID            0x00000008
+#define TRACE_BUFF           ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:46 pm

This patch contains SYSFS interface implementation.

This interface provides possibility for a user to configure his/her SCST
server configuration: add/delete/manage target drivers, targets, dev handlers,
virtual devices and access control to them as well as it allows to see current
SCST configuration with necessary statistical and debug info (e.g. SGV cache
statistics).

For some management events processing is redirected to dedicated thread for
2 reasons:

1. It naturally serializes all SYSFS management operations, so allows to
simplify locking in target drivers and dev handlers. For instance, it allows
in add_target() callback don't worry if del_target() or another add_target()
for the same target name called simultaneously.

2. To make processing be done outside of the internal SYSFS locking.

All internal SCST management is done for simplicity under scst_mutex.
It's simple, robust and worked well even under the highest load for ages.
But in 2.6.35 sysfs was improved to make lockdep to check s_active related
deadlocks and we discovered potential circular locking dependency
between scst_mutex and s_active. On some management operations lockdep triggered
output like:

[ 2036.926891] =======================================================
[ 2036.927670] [ INFO: possible circular locking dependency detected ]
[ 2036.927670] 2.6.35-scst-dbg #15
[ 2036.927670] -------------------------------------------------------
[ 2036.927670] rmmod/4715 is trying to acquire lock:
[ 2036.927670]  (s_active#230){++++.+}, at: [<78240a24>] sysfs_hash_and_remove+0x63/0x67
[ 2036.927670] 
[ 2036.927670] but task is already holding lock:
[ 2036.927670]  (&scst_mutex){+.+.+.}, at: [<fefd7fe2>] scst_unregister_virtual_device+0x58/0x216 [scst]
[ 2036.927670] 
[ 2036.927670] which lock already depends on the new lock.
[ 2036.927670] 
[ 2036.927670] 
[ 2036.927670] the existing dependency chain (in reverse order) is:
[ 2036.927670] 
[ 2036.927670] -> #2 (&scst_mutex){+.+.+.}:
[ 2036.927670]      ...
From: Greg KH
Date: Saturday, October 9, 2010 - 2:20 pm

Nice, but you forgot to document it.  All sysfs changes need to be
documented in Documentation/ABI/




And here.

thanks,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Monday, October 11, 2010 - 12:29 pm

Thanks for the review. In all those functions kobjects for simplicity
are embedded into the outer objects, so they will be freed as part of
the outer objects free. Hence, kfree() for the kobjects in the release
functions are not needed.

Thanks again,
Vlad
--

From: Greg KH
Date: Monday, October 11, 2010 - 2:32 pm

Sweet, you now have opened yourself up to public ridicule as per the
documentation in the kernel for how to use kobjects!

Nice job :)

Seriously, you CAN NOT DO THIS!  If you embed a kobject in a different
structure, then you have to rely on the kobject to handle the reference
counting for that larger structure.  To do ANYTHING else is a bug and
wrong.

Please read the kobject documentation and fix this code up before
submitting it again.

thanks,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Tuesday, October 12, 2010 - 11:53 am

Sure, I have read it and we rely on the kobject to handle the reference
counting for the larger structure. It's only done not in a
straightforward way, because the way it is implemented is simpler for us
+ for some other reasons.

For instance, for structure scst_tgt it is done using
tgt_kobj_release_cmpl completion. When a target driver calls
scst_unregister_target(), scst_unregister_target() in the end calls
scst_tgt_sysfs_del(), which calls kobject_put(&tgt->tgt_kobj) and wait
for tgt_kobj_release_cmpl to complete. At this point tgt_kobj can be
taken only by the SYSFS. Scst_tgt_sysfs_del() can wait as much as needed
until the SYSFS code released it. As far as I can see, it can't be
forever, so it's OK. Then, after scst_tgt_sysfs_del() returned,
scst_unregister_target() will free scst_tgt together with embedded tgt_kobj.

Sure, if you insist, I can convert tgt_kobj and other similar kobjects
to pointers, but it would be just a formal code introducing additional
kmalloc()/kfree() pair per each kobject without changing any logic anywhere.

Thanks,
Vlad
--

From: Greg KH
Date: Tuesday, October 12, 2010 - 12:03 pm

I don't understand, why can't you just free the memory, what are you
having to wait for?

You are only having one kobject for your structure, right?  If so, then
free the memory in the release, to wait for something else to free the

As no other kernel code is like this, I don't think it's valid to be
doing so, sorry.

Please fix this.

good luck,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Thursday, October 14, 2010 - 12:48 pm

I'm sorry, but after I started implementing it I'm confused..

Originally I thought you are asking to make tgt_kobj be not embedded in
struct scst_tgt, but a pointer in it, so scst_tgtt_release() will
kfree() tgt_kobj. Hence, all the above I wrote about why we have
tgt_kobj embedded.

But now I feel like you are asking that scst_tgtt_release() should
kfree() tgt, not tgt_kobj.

Is it correct?

If yes, we did it this way to make errors processing uniform and
straightforward. In the code (simplified) our current errors recovery
looks like:

void scst_tgt_sysfs_del(struct scst_tgt *tgt)
{
...
	kobject_put(&tgt->tgt_kobj);
	wait_for_completion(&tgt->tgt_kobj_release_cmpl);
}

int scst_tgt_sysfs_create(struct scst_tgt *tgt)
{
	init_completion(&tgt->tgt_kobj_release_cmpl);

	res = kobject_init_and_add(&tgt->tgt_kobj, &tgt_ktype,
			&tgt->tgtt->tgtt_kobj, tgt->tgt_name);
	if (res != 0)
		goto out;

	res = sysfs_create_file(&tgt->tgt_kobj,
			&tgt_enable_attr.attr);
	if (res != 0)
		goto out_err;
...

out:
	return res;

out_err:
	scst_tgt_sysfs_del(tgt);
	goto out;
}

struct scst_tgt *scst_register_target()
{
	struct scst_tgt *tgt;
	int rc = 0;

	TRACE_ENTRY();

	rc = scst_alloc_tgt();
	if (rc != 0)
		goto out;
...
	tgt->tgt_name = kmalloc(strlen(target_name) + 1, GFP_KERNEL);
	if (tgt->tgt_name == NULL)
		goto out_free_tgt;
...
	mutex_lock();

	rc = scst_tgt_sysfs_create(tgt);
	if (rc < 0)
		goto out_unlock;

	tgt->default_acg = scst_alloc_add_acg();
	if (tgt->default_acg == NULL)
		goto out_sysfs_del;

...

out:
	return tgt;

out_sysfs_del:
	mutex_unlock();
	scst_tgt_sysfs_del(tgt)
	goto out_free_tgt;

out_unlock:
	mutex_unlock();

out_free_tgt:
	scst_free_tgt(tgt);
}

We have a simple and straightforward errors recovery semantic: if
scst_tgt_sysfs_create() failed, then tgt_kobj returned deinited as if
scst_tgt_sysfs_create() has never been called. This is a regular
practice in the kernel: don't return half-initialized ...
From: Greg KH
Date: Thursday, October 14, 2010 - 1:04 pm

I am asking that ANY kobject release function call kfree to release the
memory that object is embedded in.  That is how kobjects work, please


That's not "error prone", it's "people don't read the provided
documentation about how to use the API".

And yes, one could argue to make the API easier to use, and patches are

There are good reasons, but the most important one being, if you pass
off an object, as of that moment in time, you had better handle the
reference counting correctly.  If you think of it this way, cleanup
logic is even simpler as that's the only rule you ever need to think

Again, yes.  Please read the documentation.

thanks,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Friday, October 22, 2010 - 10:30 am

Here is the patch. It converts all the cases except 3:

 - struct scst_tgt_template and struct scst_dev_type, because they are supplied by
target drivers and dev handlers correspondingly and usually static.

 - struct scst_session, because sometimes we need to create several sessions for the
same initiator (for iSCSI sessions reinstatement, for instance) and for such cases need
a logic to have different SYSFS names for them. To have this logic functioning correctly,
we need to keep sessions alive longer than life of their kobject (see scst_sess_sysfs_create()
and scst_free_session() for more details).

I don't like this patch, because it makes the code in almost 100 lines of code bigger and
worse by making objects initialization/freeing a lot more complicated and harder to audit.
But, since you insist, I applied it.

It isn't late to revert it, just say a word.

Thanks,
Vlad

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 drivers/scst/scst_lib.c   |  102 ++++++++++++++----
 drivers/scst/scst_main.c  |   26 +---
 drivers/scst/scst_priv.h  |   37 +++++-
 drivers/scst/scst_sysfs.c |  247 ++++++++++++++++++++++++----------------------
 include/scst/scst.h       |   22 +++-
 5 files changed, 261 insertions(+), 173 deletions(-)

diff -upr linux-2.6.35/include/scst/scst.h linux-2.6.35/include/scst/scst.h
--- include/scst/scst.h
+++ include/scst/scst.h
@@ -1431,8 +1431,10 @@ struct scst_tgt {
 	char *default_group_name;
 #endif
 
+	unsigned int tgt_kobj_initialized:1;
+
 	/* sysfs release completion */
-	struct completion tgt_kobj_release_cmpl;
+	struct completion *tgt_kobj_release_cmpl;
 
 	struct kobject tgt_kobj; /* main targets/target kobject */
 	struct kobject *tgt_sess_kobj; /* target/sessions/ */
@@ -2077,6 +2079,9 @@ struct scst_device {
 	/* If set, dev is read only */
 	unsigned short rd_only:1;
 
+	/* If set, dev_kobj was initialized */
+	unsigned short dev_kobj_initialized:1;
+
 ...
From: Greg KH
Date: Friday, October 22, 2010 - 10:56 am

It's the middle of the merge window, and I'm about to go on vacation, so
I didn't read this patch after this line.

It's obvious that this patch is wrong, you shouldn't need to worry about
this.  And even if you did, you don't need this flag.

Why are you trying to do something that no one else needs?  Why make
things harder than they have to be.

{sigh}

greg k-h
--

From: Vladislav Bolkhovitin
Date: Friday, October 22, 2010 - 11:40 am

I tried to explain that to you in http://lkml.org/lkml/2010/10/14/291
and mentioned there the need to create this flag to track
half-initialized kobjects. You agreed
(http://lkml.org/lkml/2010/10/14/299) that don't return half-initialized
objects is a regular kernel practice, but then requested to strictly
bound the larger object freeing to its kobject release(), which means
that all SYSFS creating functions now have to return half-initialized
SYSFS hierarchy in case of any error. Hence the flag to track it.

Simply, any SCST object has a lot of other things to initialize besides
its kobject, hence the need either to free it independently from its
kobject, or track by a flag if its kobjects initialized.

For illustration, here is the simplified code from my previous example,
i.e. without this patch. I added to it scst_unregister_target() to make
it more complete.

void scst_tgt_sysfs_del(struct scst_tgt *tgt)
{
...
	kobject_put(&tgt->tgt_kobj);
	wait_for_completion(&tgt->tgt_kobj_release_cmpl);
}

int scst_tgt_sysfs_create(struct scst_tgt *tgt)
{
	init_completion(&tgt->tgt_kobj_release_cmpl);

	res = kobject_init_and_add(&tgt->tgt_kobj, &tgt_ktype,
			&tgt->tgtt->tgtt_kobj, tgt->tgt_name);
	if (res != 0)
		goto out;

	res = sysfs_create_file(&tgt->tgt_kobj,
			&tgt_enable_attr.attr);
	if (res != 0)
		goto out_err;
...
out:
	return res;

out_err:
	scst_tgt_sysfs_del(tgt);
	goto out;
}

struct scst_tgt *scst_register_target()
{
	struct scst_tgt *tgt;
	int rc = 0;

	rc = scst_alloc_tgt();
	if (rc != 0)
		goto out;
...
	tgt->tgt_name = kmalloc(strlen(target_name) + 1, GFP_KERNEL);
	if (tgt->tgt_name == NULL)
		goto out_free_tgt;
...

	mutex_lock();

	rc = scst_tgt_sysfs_create(tgt);
	if (rc < 0)
		goto out_unlock;

	tgt->default_acg = scst_alloc_add_acg();
	if (tgt->default_acg == NULL)
		goto out_sysfs_del;
...

out:
	return tgt;

out_sysfs_del:
	mutex_unlock();
	scst_tgt_sysfs_del(tgt)
	goto ...
From: Greg KH
Date: Friday, October 22, 2010 - 11:54 am

I agreed that you needed to do something about it, not that this is the
correct way to do it.

Think for a second as to why your code path looks different from EVERY
other kobject user in the kernel.  Perhaps it is incorrect?  You don't
need all this completion mess, in fact, it's wrong.

Just do what everyone else does please, as that is the simpler, and
correct, way to do it.

Oh, and why are you using a kobject at all anyway?  Shouldn't you be
using a 'struct device'?

Anyway, I don't have time for this anymore for the next 2 weeks, good
luck.

greg k-h
--

From: Vladislav Bolkhovitin
Date: Monday, November 8, 2010 - 12:58 pm

Hello Greg,

Why SCST objects are different from most other kernel objects and why
they can't be implemented the same way as them is exactly what I'm
trying to explain you. Let me try again from a bit different angle.

SCST objects are different from the most other kernel objects, because
they are very complex, hence complex to initialize and delete with
dependencies in order how initialization and delete actions should be
performed. Particularly, SCST objects have a lot of attributes and
sub-objects, so their kobjects can't be inited and exposed to SYSFS or
removed from it until they reach some point during initialization or
delete correspondingly, otherwise their attributes' reads and writes can
crash.

I can elaborate all those on examples, if you request.

I understand you are very busy and appreciate very much your review and
comments, so be glad to implement what you are requesting. But I don't
see how to implement that in an acceptable manner with neither the
completion-based delete as now, nor with x_initialized flag as in the
previous patch.

Note, SCST's kobjects are not the only kobjects in the kernel using the
completion based delete. See, for instance, ktype_cpufreq, ktype_cpuidle
or ext4_ktype.

Also, elevator (struct elevator_queue) uses "registered" flag to see if
it was added to the SYSFS.

Thus, you can see, both the completion and the flag approaches already
used in the kernel. Hence, I believe, the way how SCST is doing it
should also be acceptable.

I'm not a person who, as you, probably, guessed, at first doing, only
then thinking and reading docs. I before writing a code line at first
read all the available docs and carefully consider possible
alternatives, so there isn't a well thought bit in SCST.


We need only to represent our internal SCST objects on the SYSFS. The
objects are not devices, but special entities for special purposes. For
instance, struct scst_session is a representation of SCSI I_T nexuses.
Struct scst_tgt_dev - I_T_L ...
From: Greg KH
Date: Monday, November 8, 2010 - 5:28 pm

Then don't abuse kobjects with this "different" type of kobject, as that
is not how the kobject code was designed to be used.


That sounds like an implementation error.  No other kernel code has that

ext4 is using this to get stuff under the /sys/fs/ext4 location.  And
even there, I don't think it is using kobjects correctly, but I really
don't want to go audit that code at the moment.

cpufreq and cpuidle is also probably incorrect, but again, I don't feel
like auditing it right now.



No, you should be using a struct device as you are putting stuff into
the device tree.  NEVER put a kobject into the device tree, that is just
wrong and will cause problems.

thanks,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Tuesday, November 9, 2010 - 1:06 pm

SCST objects are generally 1:1 mapping of SCSI Architecture Model (SAM)
objects. Is SAM insane and should have been designed with kobjects in

"To get stuff under /sys/..." is exactly for what SCST needs kobjects.
SCST objects are user interface agnostic and can be equally well exposed
to user space via SYSFS, PROCFS, or anything else, like CONFIGFS or

Sorry, but what is incorrect in the working implementation without any
bugs doing its job in the simplest, smallest and clearest way?

If those objects remade to free themselves in the kobjects release(),
what value would it add to them? Would the implementation be simpler,
smaller or clearer? Not, I believe, new implementation would be only

Hmm, we never put any stuff into the device tree. Why do you think we
do? None of SCST objects has any relation to any hardware device, hence
there's no need to touch the device tree.

Thanks,
Vlad
--

From: Boaz Harrosh
Date: Wednesday, November 10, 2010 - 2:58 am

Totally theoretically speaking, since I have not inspected the code.

If today you wait for the count to reach zero, then unregister
and send an event to some other subsystem to free the object.

Is it not the same as if you take an extra refcount, unregister and
send the event at count=1. Then at that other place decrement the last
count to cause the object to be freed.

I agree that it is hard to do lockless. what some places do is have
an extra kref. The kobj has a single ref on it. everything takes the
other kref. when that reaches zero the unregister and event fires
and at free you decrement the only kobj ref to deallocate. This is one
way. In some situations you can manage with a single counter it all
depends.

Boaz
--

From: Vladislav Bolkhovitin
Date: Wednesday, November 10, 2010 - 1:19 pm

Thanks for sharing your thoughts with us. But the question isn't about
if it's possible to implement what we need locklessly. The question is
in two approaches how to synchronously delete objects with entries on SYSFS:

1. struct object_x {
	...
	struct kobject kobj;
	struct completion *release_completion;
};

static void x_release(struct kobject *kobj)
{
	struct object_x *x;
	struct completion *c;

	x = container_of(kobj, struct object_x, kobj);
	c = x->release_completion;
	kfree(x);
	complete_all(c);
}

void del_object(struct object_x *x)
{
	DECLARE_COMPLETION_ONSTACK(completion);

	...
	x->release_completion = &completion;
	kobject_put(&x->kobj);
	wait_for_completion(&completion);
}

and

2. struct object_x {
	...
	struct kobject kobj;
	struct completion release_completion;
};

static void x_release(struct kobject *kobj)
{
	struct object_x *x;

	x = container_of(kobj, struct object_x, kobj);
	complete_all(&x->release_completion);
}

void del_object(struct object_x *x)
{
	...
	kobject_put(&x->kobj);
	wait_for_completion(&completion);
	...
	kfree(x);
}

Greg asserts that (1) is the only correct approach while (2) is
incorrect, and I'm trying to justify that (2) is correct too and
sometimes could be better, i.e. simpler and clearer, because it
decouples object_x from SYSFS and its kobj. Then kobj becomes an
ordinary member of struct object_x without any special treatment and
with the same lifetime rules as other members of struct object_x. While
in (1) all lifetime of struct object_x is strictly attached to kobj, so
it needs be specially handled with additional code for that if struct
object_x has many other members which needed to be initialized/deleted
_before and after_ kobj as we have in SCST.

Vlad
--

From: Joe Eykholt
Date: Wednesday, November 10, 2010 - 1:29 pm

I'll admit I don't understand this all that well, but
why not just have x_release() (based on (2))
do free(x), and have del_object
do the kobject_put(&x->kobj) as its very last thing?
--

From: Vladislav Bolkhovitin
Date: Wednesday, November 10, 2010 - 1:38 pm

We are discussing _synchronous_ delete of x, so need to wait until
x->kobj released, hence the completion is needed in both cases.

For instance, the sync delete is needed for targets to let the
corresponding target driver be safely unloaded after its target
unregistered.

Vlad
--

From: Joe Eykholt
Date: Wednesday, November 10, 2010 - 1:42 pm

Ah, well to answer my own question, I guess that's (1).
Never mind.

--

From: Boaz Harrosh
Date: Thursday, November 11, 2010 - 2:59 am

Thanks for putting up an example we can now speak more specifically.
(And saved me the time to actually look at code)
I'll first comment on your code below and I have some questions, please
see if I understood you correctly. Later below I'll try to explain what I



I don't see the unregister of the object_x.kobj, where do

This put might not be the last put on the object, IOs in flight
and/or open files might have extra reference on the object.
We release our initial ref, and below wait for all operations
 	DECLARE_COMPLETION_ONSTACK(completion);
	x->release_completion = &completion;

One possibility (There are others)

3. struct object_x {
	...
	struct kref kref;
	struct kobject kobj;
	struct completion *release_completion;
};

Every body takes kref_put(&object_x.kref) and  kref_get(&object_x.kref)
I hope you have x_get/x_put, Yes?

static void x_kref_release(struct kref *kref)
{
	struct object_x *x = container_of(kref, struct object_x, kref);

	complete_all(&x->release_completion);
}

static void x_obj_release(struct kobject *kobj)
{
	struct object_x *x = container_of(kobj, struct object_x, kobj);

	kfree(x);
}

int x_put(struct object_x *x)
{
	return kref_put(&x->kref, x_kref_release);
}

void del_object(struct object_x *x)
{
	DECLARE_COMPLETION_ONSTACK(completion);

	...
	x->release_completion = &completion;
	x_put(x)
	wait_for_completion(&completion);
	kobject_put(&x->kobj);
}

Or

4. Exactly Like 3 but without the extra kref member
   Only x_put() changes and x_kref_release() now receives
   an x_object

int x_put(struct object_x *x)
{
	if (kobject_put(&x->kobj) == 1)
		// Like above [3] x_kref_release()
		x_kref_release(x);
}

Note that in 4 you don't actually have a kref member, and that you have
one extra ref on kobj from the beginning. In del_object above the first
x_put(x) makes it possible to reach the "1" count and then the final
kobject_put(&x->kobj); frees the object.
(You need to be carfull with [4] because it must ...
From: Greg KH
Date: Thursday, November 11, 2010 - 5:04 am

This is racy, please never do this.

thanks,

greg k-h
--

From: Boaz Harrosh
Date: Thursday, November 11, 2010 - 7:05 am

The last ref belongs to the core code. 1 means there are no
more external clients on the object. So it can not race with
decrements. But I guess there is a possibility that it can
race with new increments. If it is the case that new increments
can only come from, say, sysfs access, then if we call the
x_put() == 1 after we are unregistered from sysfs and no new
users are allowed then the counter can only go down and we
have the last reference. No?


Thanks
Boaz
--

From: Greg KH
Date: Thursday, November 11, 2010 - 7:16 am

Just don't do this, it's not worth it and will break over time when
others mess with the code.

Also note that kobject_put() does not even return a value, so the code
above will not even compile, let alone work.

thanks,

greg k-h
--

From: Boaz Harrosh
Date: Thursday, November 11, 2010 - 7:19 am

OK Point taken, it is fragile. So there is option [3] then, with the extra

Thanks
Boaz
--

From: Vladislav Bolkhovitin
Date: Thursday, November 11, 2010 - 1:50 pm

This is the last internal put. All other references are from outsiders.

OK, I see. You know, all non-trivial things can be done in >1 correct
way ;) (Although (4) is not too correct as Greg already wrote)

Vlad
--

From: Dmitry Torokhov
Date: Thursday, November 11, 2010 - 6:23 pm

The question is why do you need to wait here? I presume it is module
unloading path, but then it is quite bad - you can easily wedge your
subsystem if you make something to take a reference to your kobject
while module is ytying to be unloaded. Back when sysfs attributes tied
kobjects the easiest thing was to do:

	rmmod <module> < / sys/devices/..../attribute


If you are done with the kobject - just proceed with what you were doing
and let it die its own peaceful death some time later. You just need to
make sure release code sticks around to free it and your subsystem core
can be tasked with this. Use module counter to prevent unloading of the
subsystem core until all kobjects belonging to the subsystem are
destroyed.

-- 
Dmitry
--

From: Bart Van Assche
Date: Friday, November 12, 2010 - 5:09 am

On Fri, Nov 12, 2010 at 2:23 AM, Dmitry Torokhov

Do you mean keeping a kref object in the kernel module, invoking
kref_get() every time a kobject has been created and invoking
kref_put() from the kobject/ktype release method ? That would help to
reduce the race window but would not eliminate all races: as soon as
the last kref_put() has been invoked from the release method, the
module can get unloaded. And module unloading involves freeing all
module code sections, including the section that contains the
implementation of the release method. Which is a race condition.

I'm not sure that it is even possible with the current kobject
implementation to solve this race. I haven't found any information
about this race in Documentation/kobject.txt. And it seems to me that
the code in samples/kobject/kobject-example.c is vulnerable to this
race: methods like foo_show() and foo_store() can access statically
allocated memory ("static int foo") after the module has been
unloaded. Although the race window is small, this makes me wonder
whether module unloading been overlooked at the time the kobject
subsystem has been designed and implemented ?

Bart.
--

From: Dmitry Torokhov
Date: Friday, November 12, 2010 - 11:44 am

No, you do not add a kref, but rather manipulate module use counter:

static void blah_blah_release(struct kobject *kobj)
{
	struct blah_blah *b = to_blah_blah(kobj);

	...
	kfree(kobj);

	module_put(THIS_MODULE);
}

int blah_blah_register(struct blah_blah *blah)
{
	...

	__module_get(THIS_MODULE);

	...

	return 0;
}

The above should reside in subsystem _core_ and it will pin the core
module until last kobject belonging to the subsystem is released.
Once all users are gone module counter will go to 0 and rmmod will
allow core to unload. Note that no new kobjects will be created while
module usage count is 0 because there are no users of the core - all of
them have to be unloaded already, otherwise module loader would have


-- 
Dmitry
--

From: Bart Van Assche
Date: Saturday, November 13, 2010 - 3:52 am

On Fri, Nov 12, 2010 at 7:44 PM, Dmitry Torokhov

Thanks for replying, but sorry, it's still not clear to me. The use
counter of which module should be manipulated ? Manipulating the use
counter of the module that contains the kobject/ktype callback
function implementations would make it impossible to unload that
module because rmmod refuses to unload any module whose use counter is
above zero. And manipulating the use counter of a parent module would
not help in any way.

It would help if you could tell us where in the kernel tree we can
find a good example. I have tried to find such an example, but all I
found are examples of kobject release method implementations in which
no attempt is made to prevent the aforementioned race:
* iscsi_boot_kobj_release() in drivers/scsi/iscsi_boot_sysfs.c
* pkt_kobj_release() in drivers/block/pktcdvd.c

Bart.
--

From: Vladislav Bolkhovitin
Date: Saturday, November 13, 2010 - 10:20 am

This is a very good question. During implementation I spent a lot of
time working on it.

In fact, the first implementation was asynchronous similarly as you
proposing, i.e. it just proceed with what you were doing and let it die
its own peaceful death some time later. But soon the implementation was
becoming so complicated, so it started getting out of control. For
instance, some of the tasks to solve with this approach were:

1. What to do if another SCST object is being created with the same name
as supposed to be deleted, but not completely dead yet?

2. What to do if a dieing object is found on some list and reference for
is supposed to be taken? If the object deleted from the list before it
marked dieing, i.e. the latest internal put() done, it made additional
problems during deleting it after the latest external put done.

...

So, I decided to reimplement it to be completely synchronous. SYSFS
authors did really great job and thanks to the excellent internal SYSFS
design and implementation it is absolutely safe. See:

[root@tgt ~]# modprobe scst
[root@tgt ~]# cd /sys/kernel/scst_tgt/
[root@tgt scst_tgt]# ls -l
total 0
drwxr-xr-x 4 root root    0 Nov 13 21:31 devices
drwxr-xr-x 2 root root    0 Nov 13 21:31 handlers
-r--r--r-- 1 root root 4096 Nov 13 21:30 last_sysfs_mgmt_res
-rw-r--r-- 1 root root 4096 Nov 13 21:30 setup_id
drwxr-xr-x 5 root root    0 Nov 13 21:31 sgv
drwxr-xr-x 2 root root    0 Nov 13 21:31 targets
-rw-r--r-- 1 root root 4096 Nov 13 21:30 threads
-rw-r--r-- 1 root root 4096 Nov 13 21:30 trace_level
-r--r--r-- 1 root root 4096 Nov 13 21:30 version
[root@tgt scst_tgt]# cat version
2.1.0-pre1
EXTRACHECKS
DEBUG
[root@tgt scst_tgt]# rmmod scst </sys/kernel/scst_tgt/version
[root@tgt scst_tgt]# ls -l
total 0
[root@tgt scst_tgt]# pwd
/sys/kernel/scst_tgt
[root@tgt scst_tgt]# lsmod
Module                  Size  Used by
scsi_debug             65188  0
w83627hf               22424  0
hwmon_vid               2207  1 w83627hf
adm1021            ...
From: Greg KH
Date: Saturday, November 13, 2010 - 4:59 pm

Sorry, but no, you can't put this in /sys/kernel/ without getting the
approval of the sysfs maintainer.

I really don't understand why you are using kobjects in the first place,
why isn't this in the main device tree in the kernel, using 'struct
device'?

In the end, I guess it really doesn't matter as this code isn't getting
merged so I shouldn't worry about it, right?

thanks,

greg k-h
--

From: Dmitry Torokhov
Date: Sunday, November 14, 2010 - 11:59 pm

It is my understanding that Vlad is able to reflect the topology by

This is quite unfortunate as I still have not seen the public comparison
of the 2 implementations and the lists of benefits and shortfalls for
both of them.

-- 
Dmitry
--

From: Bart Van Assche
Date: Monday, November 15, 2010 - 10:53 am

On Mon, Nov 15, 2010 at 7:59 AM, Dmitry Torokhov

The reason why we are proposing SCST for upstream inclusion are:
- It has a large user base.
- The project is known for its high performance, low latency I/O and
excellent stability.
- A feature-wise overview can be found here:
http://scst.sourceforge.net/comparison.html.

Bart.
--

From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:36 pm

Correct. As I wrote in the previous e-mail, SCST doesn't deal with

Indeed, it is unfortunate :(. Undercover political games continue...

Vlad
--

From: Boaz Harrosh
Date: Monday, November 15, 2010 - 2:46 am

This is not nice and is uncharacteristic of you.

This project, even though out-of-tree, is an old and mature project that
has many users. These are all *Linux* users. The authors and community
have come to us for help, and advice on making this code acceptable for
mainline and hardening the code the way, only one project on the planet
can do, the Linux community. I think it is our courtesy and obligation
to the Linux users of this Project to comment where they are doing wrong
and where they should do better.

It is not of their choice to be out-of-tree. It is ours. The least we can
do. Is give then some assistance if we can, and have 5 minutes of our time.

All these issues we were discussing are interesting and are real Kernel
problems. For instance the last comment you made was that for such a dynamic
system and life time problems, and functionality. A better and expected
solution might be the device tree and not sysfs. And for such big additions
the sysfs maintainer must give his blessings. This is most valuable information
regardless of if we accept their code or not at the end.

Sincerely yours
Boaz
--

From: Greg KH
Date: Monday, November 15, 2010 - 9:16 am

Why, I'm not allowed to get frustrated at repeated attempts to get the
original poster to change their code to something that is acceptable and
just give up and walk away?


It is also the job of the kernel community to say "No, what you are
doing is wrong, please don't do that."



Yes, that is what I have been saying for a while now.

Again:
	This code is using kobjects incorrectly.
	This code should not be using kobjects.

this is my last response to this thread now, and I'm sure you can
understand why.

thanks,

greg k-h
--

From: Boaz Harrosh
Date: Monday, November 15, 2010 - 10:19 am

Thank you Greg for your time and most valuable input.
I'm sorry for not understanding your position. I needed the
clear cut statement:

	This code should not be using kobjects. i.e not belong in sysfs

SCST guys. This sounds pretty clear cut to me. Sysfs was not built
in mind for such dynamic systems, and it will cause never ending
conflicts with future maintenance of sysfs vs SCST.

Perhaps consider a new alternative like the device tree as Greg suggested
or maybe finally accept the harsh realities of ConfigFS, and come join us
in the LIO project. SCST is a most valuable project and community which
we would like to join forces with in making Linux the best.

Lets call it Linux-Target and unify all our efforts.

Boaz
--

From: Bart Van Assche
Date: Monday, November 15, 2010 - 10:49 am

I think that Vlad has already explained several times why ConfigFS is
not suited for the needs of a storage target: a storage target must
not only be able to accept configuration information from userspace
but must also be able to create new directories and file nodes itself.
See e.g. this message from October 6:
http://kerneltrap.org/mailarchive/linux-kernel/2010/10/6/4628664.

Bart.
--

From: Nicholas A. Bellinger
Date: Monday, November 15, 2010 - 1:19 pm

<Trimming CC list>


Sorry, but this post explains nothing but a single misguided and
uninformed opinion, with no hard facts on the actual usage of a native
configfs control plane within target mode infrastructure.  

The hard facts are:

Using configfs works to drive the generation/release, real time
configuration, dependancy relationships because:

*) configfs represents the individual target data structures who's
creation/deletion are driven from enterily userspace.

*) The parent/child relationships of dependant data structures is
handled transparently to the configfs consumer (eg: no hard requirement
internal reference counting)

*) The module reference counting of target core -> fabric module is
handled transparently to configfs consumers *and* TCM fabric modules

*) The current implementation of TCM/ConfigFS contains no global locks.
Each /sys/kernel/config/target/$FABRIC_1 operates independently of
/sys/kernel/config/target/$FABRIC_2

*) Expecting target fabric module developers to (by-hand) add struct
config_groups, struct kobjects or anything else directly to their fabric
module code is really clumsy and stupid.  This problem was solved
earlier this year in TCM v4.0 with the advent of:

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=blob;f=drivers/target...

What this code means is that target fabric module developers no longer
has to duplicate complex control path code, and all of the interesting
port attributes (like ALUA secondary access state for example) are
transparent to new fabric modules, and 'just work'.  We can even
generate *functional* configfs skeletons for new fabric modules using
tcm_mod_builder.py discussed here:

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=blob;f=Documentation/...

For the specific case of the target control path, until someone from
SCST can present some hard facts *and* running code about ...
From: Vladislav Bolkhovitin
Date: Tuesday, November 16, 2010 - 6:12 am

What is "misguided and uninformed opinion"? That you can't with ConfigFS
serve real life needs for target driver developers and can't create
targets for hardware target ports from withing the kernel and instead
enforce users to clumsy workarounds to somehow magically know their
names to manually perform "mkdir target_name"?

The same problem exists for all other objects where you need to create
ConfigFS entries (directories), like for per-session/per-initiator
statistics or default ACLs.

ConfigFS is just too simple to serve real life needs of a SCSI target
subsystem.

Vlad
--

From: Richard Williams
Date: Tuesday, November 16, 2010 - 4:59 am

I'm just an outsider - but maybe my perspective has value - it seems there are two sides to this debate:

1) sysfs is great for scst due to certain stability concerns and code concerns
2) sysfs is bad for scst due to the intended role of sysfs and its namespace

Maybe I misunderstand -
But if both sides have merit then wouldn't a compromise be appropriate?

Maybe the sensical compromise is to use sysfs code to create a new namespace that would fit this purpose?  It seems that I am also hearing that the alternatives to sysfs aren't always adequate - so why not use sysfs, but have a place where it's appropriate to use it?  

Apologies in advance if I'm just way off base here... 

- Richard Williams--

From: Vladislav Bolkhovitin
Date: Tuesday, November 16, 2010 - 6:17 am

(Since this discussion goes to a quite fundamental scope, I let myself
to add Joel Becker and Linus Torvalds on CC)


Your questions are very good, so let's summarize what we need to serve
the needs of a SCSI target subsystem (not necessary SCST) and see what
can fit them.

So, the needs:

1. Be capable to represent to user space internal configuration to let
user space be able to see and analyze it, including various statistics.

2. Let user space manage the internal configuration.

3. Desired: possibility to send to user space events about important
internal actions, like I/O failures, which may need user space
intervention to recover, like switching from active to passive role in a
cluster.

So, what can we do with ConfigFS:

(1): Only partially, because by design ConfigFS isn't supposed to
represent internal configuration, it can only manage it. Extending
ConfigFS to be capable to do that would be, in my understanding, a
strong violation of its purpose and, hence, design and if went this way
eventually ConfigFS would become just a duplication of the SYSFS
functionality.

(2): ConfigFS can do that. This is exactly for what it was designed and
implemented. But in this particular application it would have some
limitations derived from (1): to manage harware-related entries a user
should magically know from somewhen names of those entries to create
them by "mkdir" command.

For instance, consider a user has a Fibre Channel HBA and want to use it
in the target mode. Before he can configure it, he should somehow know
its ports names and for each of them run:

# mkdir /sys/kernel/config/.../50:50:00:00:00:00:00:11
# mkdir /sys/kernel/config/.../50:50:00:00:00:00:00:12
...

where 50:50:00:00:00:00:00:1x are the ports' names. Only after that
those ports appear on the ConfigFS and can be managed.

(3): No events at all.

Now consider SYSFS:

(1): Easily. This is exactly for what it was designed and implemented.

(2): Possible without any limitations and side ...
From: Vladislav Bolkhovitin
Date: Thursday, November 18, 2010 - 2:02 pm

Since nobody objected, Greg, could you consider to ACK SCST SYSFS
management interface in /sys/kernel/scst_tgt/, please? Please find the
SCST SYSFS ABI documentation file you requested below.

We are also preparing a new patch to free our objects in kobjects.release()
without explicit x_initialized flags as you requested.

Thank you for your effort,
Vlad

Documentation/ABI/testing/sysfs-scst:

What:		/sys/kernel/scst_tgt/
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Contains SCST management interface entries.

What:		/sys/kernel/scst_tgt/devices/
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Contains subdirectories for all SCST devices

What:		/sys/kernel/scst_tgt/devices/<device>/exported/exportX
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Links to LUNs in the LUNs group where <device> exported, e.g. to
		/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt/luns/11

What:		/sys/kernel/scst_tgt/devices/<device>/handler
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Link to dev handler of this device, if assigned, e.g. to
		/sys/kernel/scst_tgt/handlers/dev_disk

What:		/sys/kernel/scst_tgt/devices/<device>/type
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		SCSI type of this device as define by SAM.

What:		/sys/kernel/scst_tgt/handlers/
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Contains all SCST dev handlers.

What:		/sys/kernel/scst_tgt/handlers/<handler>/<device>
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Links to <device> managed by this dev handler, e.g.
		ext3_disk1_4K -> /sys/kernel/scst_tgt/devices/ext3_disk1_4K

What:		/sys/kernel/scst_tgt/handlers/<handler>/mgmt
Date:		November 2010
Contact:	Vladislav Bolkhovitin <vst@vlnb.net>
Description:
		Management entry, which allows to ...
From: Greg KH
Date: Thursday, November 18, 2010 - 2:46 pm

No, sorry, again, you should not be using kobjects, and do not polute
the main /sys/kernel/ namespace with this.

Use 'struct device' please, that is what it is there for, and is what
the rest of the kernel is using.  And use the rest of the
driver/bus/device infrastructure as your model will work with it just
fine.

Yes, I know you said you didn't think you could use it, and that your
code was different than everyone elses, but I still do not believe it,
sorry.

good luck,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Friday, November 19, 2010 - 11:00 am

Greg, sorry, I don't understand your requirements and, because of this,
we can't go ahead implementing them. Could you explain your position,
please?

None of the SCST objects are Linux devices. None of them has entries in
/dev, none of them needs to send any events to udev and none of them
sends or receives data from DMA, hence has any DMA parameters or
restrictions. So, how can them fit into the driver/bus/device model you
are enforcing?

For instance:

 - struct sgv_pool (/sys/kernel/scst_tgt/sgv/<cache>) is an SG vectors
cache. Isn't it a nonsense to make it device?

 - struct scst_acg
(/sys/kernel/scst_tgt/targets/<target_driver>/<target>/ini_groups/<group>/)
is an access control group to define which initiators see which LUNs
from target <target>. Same, isn't it a nonsense to make it device?

 - struct scst_acn
(/sys/kernel/scst_tgt/targets/<target_driver>/<target>/ini_groups/<group>/initiators/<initiator_name>)
is an entry in the access control group <group> to define which
initiators should use group <group>. Again, isn't it a nonsense to make
it device?

Etc.

How could they fit in the driver/bus/device model?

I guess, your confusion comes from word "device" you see in the SCST
SYSFS tree and SCST supposed to work with "devices" and "dev handlers"?
But those devices are not Linux devices, they are just objects to couple
initiators' requests and backstorage which stores data.

In other words, SCST devices are just redirection points passing
incoming requests to Linux files and devices.

SCST devices are the same as NFS exported directories. Do you believe,
that struct device must be created for each NFS export?

In the cases when we need to present SCST devices as Linux devices we
use scst_local driver [1], which performs it exactly as you requesting,
i.e. doing:

root_device_register()
bus_register()
driver_register()

What can we do to make you believe it? Shall we write a document
describing all SCST objects and their relationship between each ...
From: Dmitry Torokhov
Date: Friday, November 19, 2010 - 1:22 pm

Note that the entities in /sys/devices/... tree and not necessarily
physical devices bit rather interface abstractionss. Consider, for
example, /sys/class/input/*. None of the "devices" there talk directly
to hardware, do DMA or other things. Some of them don't even talk to
usrespace directly but rather through additional interfaces (evdev.
mousedev, ect). Still they are represented there and even have suspend
and resume methods (because even for logical devices it makes sense to

Maybe not all of them are. Some of them could probably be represented by
attributes of other devices. And some of them are, fitting into the
overall /sys/devices hierarchy and describing physical and logical
relations between them.

-- 
Dmitry
--

From: Vladislav Bolkhovitin
Date: Friday, November 19, 2010 - 1:50 pm

But all of them still place from where to events received and where
requests from Linux sent, aren't them?

SCST devices are not even logical devices. As I wrote, "devices" word is
misleading. SCST devices are converse of what Linux means under this
word. SCST devices are like NFS exports: a place where those events
generated and those requests received.

Think of SCST device as if it sits on the opposite side of the PCI bus
of the corresponding SCSI device Linux sees in /sys/class and /sys/bus.

So, if we need Linux devices for SCST devices, we create them using
scst_local driver. And then, of course, all them have their place in
/sys/class/ and /sys/bus. Although more common use of them from remote
systems via iSCSI, Fibre Channel, InfiniBand, etc. The remote systems
create devices in /sys/class/ and /sys/bus (if they are Linux), we serve
them.

Vlad
--

From: Greg KH
Date: Friday, November 19, 2010 - 2:16 pm

No, just /sys/bus/ which will cause them to become part of the big
device tree in /sys/devices/

Again, use this interface, it is what it is there for, to not use it is
just wrong.

good luck,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Wednesday, November 24, 2010 - 1:35 pm

USB gadgets are a good example, but not quite.

I'm objecting not the possibility to implement SCST objects as devices.
We are creating software, so all what hardware allows can be implemented.

I'm arguing usage of devices view for something which fundamentally not
devices.

USB gadgets are subset of what SCST allows. On the external side they
are tightly coupled to hardware, on the backend side they don't need any
management interface, because (at least for storage) all devices
configuration they have is static via module parameters. The only
attributes file_storage has are to change FUA and file path, which can
be placed anywhere, including /sys/module/g_file_storage/parameters.

So, for USB gadgets the config interface and place in /sys doesn't
matter much. I guess, making its LUNs as Linux devices were just part of
the ritual to get accepted into the kernel. On the USB gadgets
developers place I wouldn't mind too to go this way.

But SCST is a _big_, _complete_, _new_ Linux subsystem. Actually, USB
storage gadgets should be SCST targets to use full power of all possible
virtual and real backends SCST provides, hence stay inside SCST subtree.

For SCST it is very important that all its functionality concentrated in
one place and not confused with other client side devices Linux has.
Important for clearness, easy to use, easy to understand, flexibility,
maintainability, etc.

It's like as you created rules to keep and train dogs. Then you need to
keep and train cats as well. Would it be wise to keep cats in the same
place together with dogs and train them the same tricks using the same

If you have such wide devices definition, why file systems have separate
/sys/fs namespace? Or ksm place in /sys/kernel/mm/ksm? Or hugepages in
/sys/kernel/mm/hugepages?

If NFS exports are devices, file systems must also be devices, correct?

Like /sys/fs/ext4/sda11. Isn't it a full analogy to NFS export

Well, if I don't agree with you it doesn't mean I'm not listening to
you. ...
From: Greg KH
Date: Friday, November 19, 2010 - 2:19 pm

That doesn't matter.  They are still "devices" that the kernel knows

It's the duty of a subsystem's maintainer to enforce the correct model
of the kernel, and that is what I am doing.

Again, this is the last email I'm writing on this topic, as none of the
previous ones seem to be sinking in.

good luck,

greg k-h
--

From: Bart Van Assche
Date: Friday, December 10, 2010 - 5:06 am

How about using 'struct device' as follows ?
* Move /sys/kernel/scst_tgt/targets to /sys/class/target_driver.
* Move /sys/kernel/scst_tgt/handlers to /sys/class/device_driver.
* Move /sys/kernel/scst_tgt/devices to /sys/class/target_device.
* Move the attributes of /sys/kernel/scst_tgt to /sys/devices/scst.
* Move /sys/kernel/scst_tgt/sgv to /sys/devices/scst/sgv

Some quickly hacked up code (that needs further polishing) that
implements this scheme can be found here:
https://scst.svn.sourceforge.net/svnroot/scst/branches/sysfs-tree-changes.

An example of an SCST configuration file and the corresponding sysfs hierarchy:

# uname -r
2.6.37-rc5-scst+

# cat /etc/scst.conf
HANDLER vdisk_blockio {
	DEVICE disk01 {
		filename /dev/ram0
		nv_cache 1
	}
	DEVICE disk02 {
		filename /dev/ram1
		nv_cache 1
	}
}

HANDLER vdisk_fileio {
	DEVICE disk03 {
		filename /dev/vdisk
		nv_cache 1
	}
	DEVICE disk04 {
		filename /dev/vdisk
		nv_cache 1
	}
}

HANDLER vdisk_nullio {
	DEVICE disk05
	DEVICE disk06
}

TARGET_DRIVER scst_local {
	TARGET local {
		session_name local

		LUN 0 disk01
		LUN 1 disk02
		LUN 2 disk03
		LUN 3 disk04
		LUN 4 disk05
		LUN 5 disk06
	}
}

TARGET_DRIVER ib_srpt {
	TARGET ib_srpt_target_0 {
		rel_tgt_id 1
		enabled 1

		LUN 0 disk01
		LUN 1 disk02
		LUN 2 disk03
		LUN 3 disk04
		LUN 4 disk05
		LUN 5 disk06
	}

	TARGET ib_srpt_target_1 {
		rel_tgt_id 2
		enabled 1

		LUN 0 disk01
		LUN 1 disk02
		LUN 2 disk03
		LUN 3 disk04
		LUN 4 disk05
		LUN 5 disk06
	}
}

TARGET_DRIVER iscsi {
	enabled 1

	TARGET iqn.2005-03.org.open-iscsi:dbc01e1792b:storage {
		rel_tgt_id 4
		LUN 0 disk01
		LUN 1 disk02
		LUN 2 disk03
		LUN 3 disk04
		LUN 4 disk05
		LUN 5 disk06
		enabled 1
	}
}

# find /sys/{class,devices/virtual}/{target_driver,device_driver,target_device}
/sys/devices/scst | while read f; do if [ -h $f ]; then echo "$f ->
$(readlink $f)"; else echo "$f"; fi; ...
From: Greg KH
Date: Friday, December 10, 2010 - 12:36 pm

Please never create a new class, use 'struct bus_type' instead and
create a bus and have drivers and devices on it.

thanks,

greg k-h
--

From: Bart Van Assche
Date: Tuesday, December 14, 2010 - 7:10 am

How about the hierarchy illustrated below, using the two new bus types
scsi_tgt_dev and scsi_tgt_port ? Note: this time I neither have loaded
the scst_local kernel module nor iscsi-scst in order to keep the
output short.

# find /sys/bus/scsi_tgt_* /sys/devices/scst | while read f; do if [
-h $f ]; then echo "$f -> $(readlink $f)"; else echo $f; fi; done
/sys/bus/scsi_tgt_dev
/sys/bus/scsi_tgt_dev/uevent
/sys/bus/scsi_tgt_dev/devices
/sys/bus/scsi_tgt_dev/devices/2:0:0:0 -> ../../../devices/2:0:0:0
/sys/bus/scsi_tgt_dev/devices/2:0:1:0 -> ../../../devices/2:0:1:0
/sys/bus/scsi_tgt_dev/devices/3:0:0:0 -> ../../../devices/3:0:0:0
/sys/bus/scsi_tgt_dev/devices/disk01 -> ../../../devices/disk01
/sys/bus/scsi_tgt_dev/devices/disk02 -> ../../../devices/disk02
/sys/bus/scsi_tgt_dev/devices/disk03 -> ../../../devices/disk03
/sys/bus/scsi_tgt_dev/devices/disk04 -> ../../../devices/disk04
/sys/bus/scsi_tgt_dev/devices/disk05 -> ../../../devices/disk05
/sys/bus/scsi_tgt_dev/devices/disk06 -> ../../../devices/disk06
/sys/bus/scsi_tgt_dev/drivers
/sys/bus/scsi_tgt_dev/drivers/dev_disk
/sys/bus/scsi_tgt_dev/drivers/dev_disk/module -> ../../../../module/scst_disk
/sys/bus/scsi_tgt_dev/drivers/dev_disk/uevent
/sys/bus/scsi_tgt_dev/drivers/dev_disk/type
/sys/bus/scsi_tgt_dev/drivers/dev_disk/trace_level
/sys/bus/scsi_tgt_dev/drivers/dev_disk_perf
/sys/bus/scsi_tgt_dev/drivers/dev_disk_perf/module ->
../../../../module/scst_disk
/sys/bus/scsi_tgt_dev/drivers/dev_disk_perf/uevent
/sys/bus/scsi_tgt_dev/drivers/dev_disk_perf/type
/sys/bus/scsi_tgt_dev/drivers/dev_disk_perf/trace_level
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/module ->
../../../../module/scst_vdisk
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/uevent
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/type
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/add_device_parameters
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/trace_level
/sys/bus/scsi_tgt_dev/drivers/vdisk_fileio/disk01 -> ...
From: Bart Van Assche
Date: Friday, November 19, 2010 - 11:01 am

Hello Greg,

As you can see in recent messages (e.g.
http://lkml.org/lkml/2010/11/18/578 or
http://lkml.org/lkml/2010/11/15/296), the abstractions represented in
SCST are:
* Target templates (scst_tgt_template), e.g. scst_local or ib_srpt.
These are drivers that implement a storage protocol.
* Target ports (scst_tgt), e.g. ib_srpt_target_0, which represent a
communication interface controlled by a storage protocol driver.
* Sessions (scst_session), e.g. 0x00000000000000000002c9030005f34b. A
session corresponds to a single initiator-target nexus.
* Device handlers (scst_dev_type), e.g. scst_disk or scst_vdisk, which
are drivers that allow SCST to export storage.
* Target devices (scst_device), e.g. disk01, which represent exported
storage. Each instance is controlled by a single device handler. This
concept includes e.g. block devices and files. Some but not all target
devices have a corresponding device node in /dev.
* Access control groups (scst_acg), which allow e.g. to implement LUN masking.
* Device-specific information such as the SCSI LUN (scst_acg_dev).

As we all know the driver, bus and device abstractions were invented
to model how peripheral devices are connected to a system. What a
storage target does is the converse - define devices and make it
possible for other systems to use these devices. As a result,
unfortunately, the driver, bus and device abstractions do not map
one-to-one on all of the concepts used in a storage target. So I'm
still not sure whether it is a good idea to use the driver, bus and
or/device concepts for all of the above concepts.

Also, using the driver, bus or device concept for one or more of the
storage target concepts would open up a potential for naming
conflicts. There is already e.g. the well-known null device. SCST e.g.
defines a vdisk_nullio target template. Having these all as device
nodes under /dev might not only confuse users but also creates a huge
potential for naming conflicts. That's why we prefer separate
namespaces ...
From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:39 pm

As I explained in the previous e-mail, I believe, SYSFS perfectly suits
SCST and SCST perfectly suits SYSFS.

If you think it isn't so, let's discuss each showstopper for that, one

Looks like a great idea!

Thanks,
Vlad
--

From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:39 pm

Hmm, frankly, I decided that you agreed with my arguments..

As I wrote, I'm willing to make any changes you requests. I only asked
why this should be done.

I really don't understand why we and other similar in-kernel developers
should treat kobjects in the different way than any other subobjects of
our outer objects and make for them _additional code_ to specially treat
them as life-time center (http://lkml.org/lkml/2010/11/10/421)? You have
not explained it anywhere in any doc I can find.

This is just small "why" question. Greg, don't we have a right to ask


It is REALLY frustrating you are refusing to explain why. I guess, I'm
too stupid to figure out that alone. Don't you want we rise as highly
skilled kernel developers? I believe, not only SCST developers are very
interested to know background behind particular moves in the kernel.

Thanks,
Vlad
--

From: Bart Van Assche
Date: Monday, November 15, 2010 - 10:45 am

We might have missed something, but as far as we know it has not yet
been explained in this thread why using 'struct device' would be an
advantage over using 'struct kobject'. All I can see are the
disadvantages of such a transition: instead of having a single
hierarchy that represents all SCST-related information, there would be
multiple, and the hierarchical relationship between objects would be
lost. Also, during startup, once all SCST-related kernel modules have
been loaded, configuration happens by writing values to individual
sysfs variables. There is a user-space tool included with SCST that
not only can restore a configuration from a file but also can save an
existing configuration to file. I'm afraid that saving an existing
configuration would be made considerably more difficult by
transforming the single SCST sysfs-tree into multiple. Below you can
find an example of a sysfs-tree created by SCST:

# (cd /sys/kernel/scst_tgt && find | cut ...
From: Greg KH
Date: Monday, November 15, 2010 - 11:44 am

It's very simple.

You want your device to show up in the global device tree in the kernel,
not off to one side, unconnected to anything else.

Please use 'struct device', it is what you want to do here.

good luck,

greg k-h
--

From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:39 pm

But we don't have any device to show up in the global device tree! We
don't have any devices in the struct device's understanding at all!

Vlad
--

From: Greg KH
Date: Monday, November 15, 2010 - 3:13 pm

Then create them just like you are doing so for your kobject use.

The first device would be the root one, and then everything trickles
down from there.

And use configfs for your configuration stuff, that's what it is there
for, and if it doesn't somehow work properly for you, please work with
the configfs developers to fix that up.

thanks,

greg k-h
--

From: Joe Eykholt
Date: Monday, November 15, 2010 - 10:04 pm

I don't have any opinion on the above, but I don't see why sysfs can't be
used for configuration as well as its other roles. It seems to me wasteful
to require configfs to be used in order to change configuration when
sysfs works fine for this.

Here are a couple of existing examples where sysfs is used in a role that
would seem similar to SCST's usage:

1) scsi_transport_fc already has an sysfs file, fc_host/vport_create, which
can be written to create new fc_host and scsi_host instances.

2) fcoe.ko uses a write to the sysfs file /sys/module/fcoe/parameters/create
to start the protocol on a particular ethernet interface.  There's another
file for destroy.

I'll bet there are other examples in other subsystems, and I don't think
there is anything wrong with the above usages of sysfs.

I agree with Vladislav's point that configfs doesn't work instead of sysfs
because configs doesn't make it easy for the kernel side to create nodes for
dynamic information like connected initiators and statistics.
So, if the above examples are considered a misuse of sysfs,
SCST would need to use both sysfs and configfs.  It would use configfs to
do configuration, and sysfs to display and access the dynamic information
like connected initiators.  That seems a minor role for configfs,
which is easily handled in sysfs as SCST currently does.

Just my two cents.

	Joe
--

From: Nicholas A. Bellinger
Date: Monday, November 15, 2010 - 11:03 pm

I disagree that conversion of 'demo mode' struct se_node_acls to explict
userspace syscall driven configfs registered struct
se_node_acl->acl_group is overly complex, or otherwise difficult in
context of a configfs consumer for a transition demo mode -> explict
NodeACL via:

mkdir -p /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/acls/$INITIATOR_WWPN.

Note the current v4.0 code supports this for LIO-Target and TCM/FC
fabric module code, and will just work 'out of the box' for v4.0
compatible fabric modules using target_core_fabric_configfs.c logic.

As for statistics point, this is also currently handled by TCM in struct
se_node_acl in target_core_mib.c:scsi_auth_intr_seq_show() for both
explict NodeACL and TPG context 'demo-mode' generated struct se_node_acl

I think the question is more along the lines of: does sysfs have a hard
requirement for target mode using native TCM/ConfigFS logic.?.  Early in
the development of TCM/LIO v3.0 I thought that driving creation of
configfs struct_groups from kernel code was in fact useful for my own
development and for future mainline code, but with the help of foresight
of jlbec, gregkh and others with our modern v4.0 code I am no longer
convinced we need any direct kernel level interaction between native
v4.0 target mode configfs code and existing sysfs code.

So far using interpreted userspace python/shell code has served this
purpose really quite well in terms of interaction between target mode
configfs and existing sysfs design, and I do not currently see any
inherient limitations from a userspace code driven persective for a full
native configfs target implementation.  I would be happy to be proven
wrong on this point, but so far the original decision to drive target
mode completely from userspace with native configfs code has in fact
proven to be a better decision than a hybrid kernel level configfs +
sysfs implementation from the perspective of the userspace developer
driving /sys/kernel/config/target/* code.

So on this ...
From: Florian Mickler
Date: Tuesday, November 16, 2010 - 1:49 am

On Mon, 15 Nov 2010 21:04:19 -0800

Well, I'm not involved here.. but to me it makes sense. It's just a
useful principle in software design. You don't take something that is
designed for one purpose and mis-use it. It will grow warts and
mutate, stealing flexibility. 

it's one of the reasons many software projects become
unmaintainable... nobody can change anything anymore because the


Flo
--

From: Vladislav Bolkhovitin
Date: Tuesday, November 16, 2010 - 6:18 am

Thank you, Joe, for those good examples. I may not know for what SYSFS
was _originally_ designed, but the current reality is that SYSFS is
_widely_ used to both represent the kernel internal configuration as
well as to change it. And SCST is just following that.

After all, if SYSFS supposed to only represent the kernel internal
configuration, why wasn't it from the beginning made read-only?

Vlad
--

From: Bart Van Assche
Date: Tuesday, November 16, 2010 - 12:15 am

Hello Greg,

The reason why we use sysfs instead of configfs is that we do not only
want to export kernel objects to user space but because we also want
to allow configuration from user space. As far as I can see configfs
has been designed to allow configuration from user space only and not
for exporting kernel objects. A quote from
Documentation/filesystems/configfs/configfs.txt:

12 [What is configfs?]
13
14 configfs is a ram-based filesystem that provides the converse of
15 sysfs's functionality.  Where sysfs is a filesystem-based view of
16 kernel objects, configfs is a filesystem-based manager of kernel
17 objects, or config_items.

Bart.
--

From: Vladislav Bolkhovitin
Date: Tuesday, November 16, 2010 - 6:19 am

Why?

Greg, sorry, you keep writing as if we all are idiots, but keep refusing
to explain us, idiots, why we can't do what we did. It isn't very

Sorry, I can't understand what you mean. What would be the purpose of
this configuration and what benefits it would bring?

Anyway, let's forget about SCSI. Do you believe that struct device
should be created for all IP addresses on which an NFS server listen and

I explained why it isn't a too good option in another e-mail.

Thanks,
Vlad
--

From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:36 pm

Hmm, I thought we had the approvement in the "[RFC]: SCST sysfs layout"
thread (http://osdir.com/ml/linux-kernel/2009-04/msg07822.html). Particularly,
in the message http://osdir.com/ml/linux-kernel/2009-04/msg08557.html.

But, looks like it isn't so and I should have asked you, the SYSFS maintainer.
Sorry.

Could you consider it, please?


I'll try to explain. It's a bit long story involving deep SCSI target specific
knowledge, but I'll try to make it as simple and short as possible.

SCST is a SCSI _target_ side subsystem, i.e. it is a _server_ side sybsystem.
It exports SCSI-speaking devices, not using them. You can consider it as an NFS
server. What usually meant by "SCSI subsystem" is SCSI _initiator_ subsystem,
i.e. client side subsystem (like NFS client), which is using SCSI-speaking
devices by sending SCSI commands to them.

Any SCSI-speaking protocol can be used with SCST: parallel (wide) SCSI, Fibre Channel,
iSCSI, SAS, SRP, iSER, etc. (Also, non-SCSI speaking protocols, like AoE and
NBD can be used, but that's another story.)

Strictly as required by SCSI Architecture Model (SAM [1]), SCST doesn't deal with
hardware devices. The closest to hardware things SCST deals with are SCSI target
ports and SCSI target devices.

SCSI target port is an abstract concept reflecting path through which SCST
exports devices. You can consider it as an IP network (network, not interface!)
through which an NFS server's exports can be used. For instance, for iSCSI
such ports are iSCSI targets. For Fibre Channel - virtual or hardware Fibre
Channel ports.

SCSI target device is an abstract concept, which provides a way to reach real storage
(files, block devices, etc.) and contains internal state information (reservations,
configuration parameters, etc.). You can consider it as an NFS export. Please don't
confuse it with SCSI _initiator_ device, which is a place to generate SCSI commands
and send them via one or more SCSI initiator ports (MPIO). On the target side
they will be ...
From: Dmitry Torokhov
Date: Monday, November 15, 2010 - 12:04 am

The same rules as with files - the object disappears from the
"directories" so no new users can get it but is not destroyed till last

You delete the object from the list, then mark it as dead, notify users,
drop refcount. No new users will get it (as it is not on the list
anymore) and existing ones should notice that it is dead and stop using

Right, Tejun plugged this particular (and very annoying) attributes
behavior, but that does not mean that this is the only way kobject's
reference might be pinned.

-- 
Dmitry
--

From: Vladislav Bolkhovitin
Date: Monday, November 15, 2010 - 1:37 pm

Those are good in theory, but on practice, you know, devil is in the

This behavior isn't annoying, it's GREAT, because it allows to use SYSFS

Could you be more specific and point out on exact ways for that? From my
quite deep SYSFS source code study I see such cases should not exist.

Thanks,
Vlad
--

From: Dmitry Torokhov
Date: Monday, November 15, 2010 - 2:14 pm

Right, I mean that _before_ Tejun plugged that hole the behavior _was_

While I do not know offhand I am sure there are such scenarios. Isn't
there any way for the users that you are waiting on descend back into
your module that is waiting for kobject removal and get stuck on some
resource?

Even if it isn't possible now the scheme is quite fragile. Kobjects are
refcounted so work with them appropriately (rely on refcount, do not
wait, etc).

-- 
Dmitry
--

From: Vladislav Bolkhovitin
Date: Tuesday, November 16, 2010 - 6:13 am

No, I don't see any, because SYSFS implements atomic "all or nothing"

The same is true for other SCST objects. For instance, a target can't be
destroyed until there are commands from it being processed. So, kobjects
are only one of the objects we wait for all their ref counters reach
zero, hence addition of kobjects to SCST objects changed nothing in this
area.

Vlad


--

From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:48 pm

This patch contains SCST SGV cache.

SCST SGV cache is a memory management subsystem in SCST. One can call it
a "memory pool", but Linux kernel already have a mempool interface,
which serves different purposes. SGV cache provides to SCST core, target
drivers and backend dev handlers facilities to allocate, build and cache
SG vectors for data buffers. The main advantage of it is the caching
facility, when it doesn't free to the system each vector, which is not
used anymore, but keeps it for a while (possibly indefinitely) to let it
be reused by the next consecutive command. This allows to:

 - Reduce commands processing latencies and, hence, improve performance;

 - Make commands processing latencies predictable, which is essential
   for RT applications.

The freed SG vectors are kept by the SGV cache either for some (possibly
indefinite) time, or, optionally, until the system needs more memory and
asks to free some using the set_shrinker() interface. Also the SGV cache
allows to:

  - Cluster pages together. "Cluster" means merging adjacent pages in a
single SG entry. It allows to have less SG entries in the resulting SG
vector, hence improve performance handling it as well as allow to
work with bigger buffers on hardware with limited SG capabilities.

  - Set custom page allocator functions. For instance, scst_user device
handler uses this facility to eliminate unneeded mapping/unmapping of
user space pages and avoid unneeded IOCTL calls for buffers allocations.
In fileio_tgt application, which uses a regular malloc() function to
allocate data buffers, this facility allows ~30% less CPU load and
considerable performance increase.

 - Prevent each initiator or all initiators altogether to allocate too
much memory and DoS the target. Consider 10 initiators, which can have
access to 10 devices each. Any of them can queue up to 64 commands, each
can transfer up to 1MB of data. So, all of them in a peak can allocate
up to 10*10*64 = ~6.5GB of memory for data buffers. This ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:48 pm

This patch contains SCST core's docs.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 README.scst | 1473 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 SysfsRules  |  942 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 2415 insertions(+)

diff -uprN orig/linux-2.6.35/Documentation/scst/README.scst linux-2.6.35/Documentation/scst/README.scst
--- orig/linux-2.6.35/Documentation/scst/README.scst
+++ linux-2.6.35/Documentation/scst/README.scst
@@ -0,0 +1,1473 @@
+Generic SCSI target mid-level for Linux (SCST)
+==============================================
+
+SCST is designed to provide unified, consistent interface between SCSI
+target drivers and Linux kernel and simplify target drivers development
+as much as possible. Detail description of SCST's features and internals
+could be found on its Internet page http://scst.sourceforge.net.
+
+SCST supports the following I/O modes:
+
+ * Pass-through mode with one to many relationship, i.e. when multiple
+   initiators can connect to the exported pass-through devices, for
+   the following SCSI devices types: disks (type 0), tapes (type 1),
+   processors (type 3), CDROMs (type 5), MO disks (type 7), medium
+   changers (type 8) and RAID controllers (type 0xC).
+
+ * FILEIO mode, which allows to use files on file systems or block
+   devices as virtual remotely available SCSI disks or CDROMs with
+   benefits of the Linux page cache.
+
+ * BLOCKIO mode, which performs direct block IO with a block device,
+   bypassing page-cache for all operations. This mode works ideally with
+   high-end storage HBAs and for applications that either do not need
+   caching between application and disk or need the large block
+   throughput.
+
+ * "Performance" device handlers, which provide in pseudo pass-through
+   mode a way for direct performance measurements without overhead of
+   actual data transferring from/to underlying SCSI device.
+
+In addition, SCST supports advanced per-initiator access and ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:49 pm

This patch contains SCST dev handlers' Makefile.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 Makefile |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/dev_handlers/Makefile linux-2.6.35/drivers/scst/dev_handlers/Makefile
--- orig/linux-2.6.35/drivers/scst/dev_handlers/Makefile
+++ linux-2.6.35/drivers/scst/dev_handlers/Makefile
@@ -0,0 +1,13 @@
+ccflags-y += -Wno-unused-parameter
+
+obj-m := scst_cdrom.o scst_changer.o scst_disk.o scst_modisk.o scst_tape.o \
+	scst_vdisk.o scst_raid.o scst_processor.o
+
+obj-$(CONFIG_SCST_DISK)		+= scst_disk.o
+obj-$(CONFIG_SCST_TAPE)		+= scst_tape.o
+obj-$(CONFIG_SCST_CDROM)	+= scst_cdrom.o
+obj-$(CONFIG_SCST_MODISK)	+= scst_modisk.o
+obj-$(CONFIG_SCST_CHANGER)	+= scst_changer.o
+obj-$(CONFIG_SCST_RAID)		+= scst_raid.o
+obj-$(CONFIG_SCST_PROCESSOR)	+= scst_processor.o
+obj-$(CONFIG_SCST_VDISK)	+= scst_vdisk.o


--

From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:50 pm

This patch contains SCST vdisk dev handler. This dev handler allows
to create virtual disks and CDROMs from files on file system.

It supports the following modes:

 - FILEIO mode, which allows to use files on file systems or block
   devices as virtual remotely available SCSI disks or CDROMs with
   benefits of the Linux page cache.

 - BLOCKIO mode, which performs direct block IO with a block device,
   bypassing page-cache for all operations. This mode works ideally with
   high-end storage HBAs and for applications that either do not need
   caching between application and disk or need the large block
   throughput.

 - NULLIO mode, in which all the commands immediately completed. This mode
   is intended for performance measurements without overhead of actual data
   transfers from/to underlying SCSI device.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst_vdisk.c | 4186 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 4186 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/dev_handlers/scst_vdisk.c linux-2.6.35/drivers/scst/dev_handlers/scst_vdisk.c
--- orig/linux-2.6.35/drivers/scst/dev_handlers/scst_vdisk.c
+++ linux-2.6.35/drivers/scst/dev_handlers/scst_vdisk.c
@@ -0,0 +1,4186 @@
+/*
+ *  scst_vdisk.c
+ *
+ *  Copyright (C) 2004 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *  Copyright (C) 2004 - 2005 Leonid Stoljar
+ *  Copyright (C) 2007 Ming Zhang <blackmagic02881 at gmail dot com>
+ *  Copyright (C) 2007 Ross Walker <rswwalker at hotmail dot com>
+ *  Copyright (C) 2007 - 2010 ID7 Ltd.
+ *
+ *  SCSI disk (type 0) and CDROM (type 5) dev handler using files
+ *  on file systems or block devices (VDISK)
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation, version 2
+ *  of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:51 pm

This patch contains SCST pass-through dev handlers.

Those handlers allows to export local SCSI-capable devices to remote
initiators. 1:many relationship with remote initiators is supported,
i.e. many initiators can connect to single exported local SCSI-capable
device at the same time.

It is possible, because SCST core emulates necessary functionality of
SCSI host adapter. Such emulation is needed, because from remote initiators'
point of view a SCSI target acts as a SCSI host with its own devices. You can
find more deep elaboration why it is needed in
http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg06911.html.

Some of the emulated functions are:

 * Generation of necessary UNIT ATTENTIONs, their storage and delivery to all
   connected remote initiators.

 * RESERVE/RELEASE functionality.

 * All types of RESETs and other task management functions.

 * REPORT LUNS command as well as SCSI address space management in order to have
   consistent address space on all remote initiators, since local SCSI devices
   could not know about each other to report via REPORT LUNS command.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 scst_cdrom.c       |  245 ++++++++++++++++++++
 scst_changer.c     |  167 +++++++++++++
 scst_dev_handler.h |   27 ++
 scst_disk.c        |  647 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scst_modisk.c      |  328 ++++++++++++++++++++++++++
 scst_processor.c   |  167 +++++++++++++
 scst_raid.c        |  168 +++++++++++++
 scst_tape.c        |  361 +++++++++++++++++++++++++++++
 8 files changed, 2110 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/dev_handlers/scst_cdrom.c linux-2.6.35/drivers/scst/dev_handlers/scst_cdrom.c
--- orig/linux-2.6.35/drivers/scst/dev_handlers/scst_cdrom.c
+++ linux-2.6.35/drivers/scst/dev_handlers/scst_cdrom.c
@@ -0,0 +1,245 @@
+/*
+ *  scst_cdrom.c
+ *
+ *  Copyright (C) 2004 - 2010 Vladislav Bolkhovitin <vst@vlnb.net>
+ *  Copyright (C) 2004 - 2005 Leonid Stoljar
+ *  ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:53 pm

This patch implements function blk_rq_map_kern_sg(), which allows to
map a kernel-originated SG vector to a block request. It is necessary
to execute SCSI commands with from the kernel going SG buffer. SCST needs
this functionality, because its target drivers, which are, basically,
SCSI drivers, can deal only with SGs, not with BIOs.

Highlight of this implementations:

 - It uses BIOs chaining instead of kmalloc()'ing the whole bio, which is
more reliable.

 - It uses SGs chaining instead of kmalloc()'ing one big SG in case if
direct mapping failed (e.g. because of DMA alignment or padding), which is
also more reliable.

 - When needed, copy_page() is used instead of memcpy() to copy the
whole pages faster.

Also this patch adds and exports function sg_copy(), which copies
one SG vector to another.

This patch has already been reviewed in LKML/linux-scsi in
http://lkml.org/lkml/2009/8/12/304. Since then it was only been updated for
2.6.35.

All this functionality can be a part of the internal SCST library, but looks
like a better place will be in the common place near similar routines.

Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 block/blk-map.c             |  334 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h      |    3 
 include/linux/scatterlist.h |    5 
 lib/scatterlist.c           |  129 ++++++++++++++++
 4 files changed, 471 insertions(+)

diff -upkr linux-2.6.35/block/blk-map.c linux-2.6.35/block/blk-map.c
--- linux-2.6.35/block/blk-map.c	2010-05-17 01:17:36.000000000 +0400
+++ linux-2.6.35/block/blk-map.c	2010-05-24 15:19:49.000000000 +0400
@@ -5,6 +5,8 @@
 #include <linux/module.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
 #include <scsi/sg.h>		/* for struct sg_iovec */
 
 #include "blk.h"
@@ -271,6 +273,338 @@ int blk_rq_unmap_user(struct bio *bio)
 }
 EXPORT_SYMBOL(blk_rq_unmap_user);
 
+struct blk_kern_sg_work {
+	atomic_t ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:57 pm

This patchset contains scst_local target driver with documentation.

This driver allows to access devices that are exported via SCST directly
on the same Linux system that they are exported from. Some highlights:

1. Fully zero-copy implementation with in-kernel dev handlers

2. Support for BIDI commands and long CDBs

3. Easy way to locate sg/bsg devices (LUNs) for each scst_local's session (Scsi_Host)

4. Allows to create several targets with several sessions (Scsi_Hosts) each

5. Allows for each target to set transport type (SCSI and physical transport
   version descriptors)

6. Allows for each session set TransportID, used, e.g., to identify the 'I' part of
I_T nexus in Persistent Reservation commands

(3) means that each scst_local's session's SYSFS entry contains a link to the
corresponding SCSI host. Using it one can find local sg/bsg/sd/etc. devices of
this session. For instance, this links points out to host12, so you can find your
sg devices by:

$ lsscsi -g|grep "\[12:"
[12:0:0:0]   disk    SCST_FIO rd1               200  /dev/sdc  /dev/sg2
[12:0:0:1]   disk    SCST_FIO nullio            200  /dev/sdd  /dev/sg3

They are /dev/sg2 and /dev/sg3.

(4)-(6) allows creation of full features multi-transport/multi-initiator (I_T nexus)
SCST target drivers in user space, which can share devices with in-kernel target
drivers, i.e. all the reservations, including Persistent Reservations, and other
similar per-I_T nexus functionality will be done correctly.

The recommended workflow for a user space target driver:

1. For each SCSI target a user space target driver should create an
   scst_local's target using "add_target" SYSFS command.
   
2. Then the user space target driver should, if needed, set its SCSI and
   physical transport version descriptors (by default it is SAS) using SYSFS
   attributes scsi_transport_version and phys_transport_version correspondingly in
   /sys/kernel/scst_tgt/targets/scst_local/target_name directory.
   
3. For incoming session ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 2:58 pm

This patch contains SCST InfiniBand SRP target driver.

This driver works directly on top of InfiniBand stack and SCST.

It is a high performance driver capable of handling 600K+ 4K random write
IOPS by a single target as well as 2.5+GB/s sequential throughput from
a single QDR IB port.

It was originally developed by Vu Pham/Mellanox, currently
Bart Van Assche is maintaining and improving it.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 Documentation/scst/README.srpt |  112 +
 drivers/scst/srpt/Kconfig      |   12 
 drivers/scst/srpt/Makefile     |    1 
 drivers/scst/srpt/ib_dm_mad.h  |  139 +
 drivers/scst/srpt/ib_srpt.c    | 3809 +++++++++++++++++++++++++++++++++++++++++
 drivers/scst/srpt/ib_srpt.h    |  370 +++
 6 files changed, 4443 insertions(+)

diff -uprN orig/linux-2.6.35/drivers/scst/srpt/Kconfig linux-2.6.35/drivers/scst/srpt/Kconfig
--- orig/linux-2.6.35/drivers/scst/srpt/Kconfig
+++ linux-2.6.35/drivers/scst/srpt/Kconfig
@@ -0,0 +1,12 @@
+config SCST_SRPT
+	tristate "InfiniBand SCSI RDMA Protocol target support"
+	depends on INFINIBAND && SCST
+	---help---
+
+	  Support for the SCSI RDMA Protocol (SRP) Target driver. The
+	  SRP protocol is a protocol that allows an initiator to access
+	  a block storage device on another host (target) over a network
+	  that supports the RDMA protocol. Currently the RDMA protocol is
+	  supported by InfiniBand and by iWarp network hardware. More
+	  information about the SRP protocol can be found on the website
+	  of the INCITS T10 technical committee (http://www.t10.org/).
diff -uprN orig/linux-2.6.35/drivers/scst/srpt/Makefile linux-2.6.35/drivers/scst/srpt/Makefile
--- orig/linux-2.6.35/drivers/scst/srpt/Makefile
+++ linux-2.6.35/drivers/scst/srpt/Makefile
@@ -0,0 +1,1 @@
+obj-$(CONFIG_SCST_SRPT)			+= ib_srpt.o
diff -uprN orig/linux-2.6.35/drivers/scst/srpt/ib_dm_mad.h ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 3:04 pm

The current ibmvstgt and libsrp kernel modules are based on the tgt
infrastructure. Both modules need the scsi_tgt kernel module and the tgtd user
space process in order to function properly. This patch modifies the ibmvstgt
and libsrp kernel modules such that both use the SCST storage target framework
instead of tgt.

This patch introduces one backwards-incompatible change, namely that the path
of the ibmvstgt sysfs attributes is modified. This change is unavoidable
because this patch dissociates ibmvstgt SRP sessions from a SCSI host instance.

Notes:
- ibmvstgt is the only user of libsrp.
- A 2.6.35 kernel tree with this patch applied does compile cleanly on the
systems supported by the ibmvstgt kernel module, the patch itself is checkpatch
clean and does not introduce any new sparse warnings. This patch has not been
tested in any other way however. The primary purpose of this patch is to invite
feedback about the chosen approach.

<IMPORTANT>

We are looking for hardware to complete this driver. Any help will be greatly
appreciated!

</IMPORTANT>

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 Documentation/powerpc/ibmvstgt.txt |    2 
 drivers/scsi/ibmvscsi/ibmvstgt.c   |  609 +++++++++++++++++++++++++------------
 drivers/scsi/libsrp.c              |   87 ++---
 include/scsi/libsrp.h              |   16 
 4 files changed, 474 insertions(+), 240 deletions(-)

--- orig/linux-2.6.35/drivers/scsi/ibmvscsi/ibmvstgt.c 16:47:55.220115813 +0400
+++ linux-2.6.35/drivers/scsi/ibmvscsi/ibmvstgt.c 15:50:36.616855875 +0400
@@ -5,6 +5,7 @@
  *			   Linda Xie (lxie@us.ibm.com) IBM Corp.
  *
  * Copyright (C) 2005-2006 FUJITA Tomonori <tomof@acm.org>
+ * Copyright (C) 2010 Bart Van Assche <bvanassche@acm.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -24,15 +25,13 @@
 #include <linux/interrupt.h>
 #include ...
From: Vladislav Bolkhovitin
Date: Friday, October 1, 2010 - 3:05 pm

Because of the conversion of the ibmvstgt driver from tgt to SCST, and because
the ibmvstgt driver was the only user of scsi_tgt, the scsi_tgt kernel module,
the CONFIG_SCSI_TGT, CONFIG_SCSI_SRP_TGT_ATTRS and CONFIG_SCSI_FC_TGT_ATTRS
kbuild variable, the scsi_host_template member variables transfer_response,
supportedmode and active_mode and the constants MODE_UNKNOWN, MODE_INITIATOR
and MODE_TARGET are no longer needed.

Note: this patch applies cleanly on a 2.6.35 kernel tree. The patch tool
however complains about the defconfig changes when trying to apply this patch
on a 2.6.36 kernel tree.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Vladislav Bolkhovitin <vst@vlnb.net>
---
 arch/arm/configs/at572d940hfek_defconfig         |    1 
 arch/arm/configs/cam60_defconfig                 |    1 
 arch/arm/configs/s3c2410_defconfig               |    1 
 arch/m68k/configs/amiga_defconfig                |    2 
 arch/m68k/configs/apollo_defconfig               |    2 
 arch/m68k/configs/atari_defconfig                |    2 
 arch/m68k/configs/bvme6000_defconfig             |    2 
 arch/m68k/configs/hp300_defconfig                |    2 
 arch/m68k/configs/mac_defconfig                  |    2 
 arch/m68k/configs/multi_defconfig                |    2 
 arch/m68k/configs/mvme147_defconfig              |    2 
 arch/m68k/configs/mvme16x_defconfig              |    2 
 arch/m68k/configs/q40_defconfig                  |    2 
 arch/m68k/configs/sun3_defconfig                 |    2 
 arch/m68k/configs/sun3x_defconfig                |    2 
 arch/mips/configs/bcm47xx_defconfig              |    1 
 arch/mips/configs/decstation_defconfig           |    1 
 arch/mips/configs/ip22_defconfig                 |    1 
 arch/mips/configs/ip27_defconfig                 |    1 
 arch/mips/configs/ip32_defconfig                 |    1 
 arch/mips/configs/jazz_defconfig                 |    1 
 arch/mips/configs/malta_defconfig                |    1 
 ...
From: Bart Van Assche
Date: Saturday, October 2, 2010 - 12:40 am

Note: as far as I know currently SCST is the first and only in-kernel
target storage framework that satisfies both criteria (1) and (2) as
defined during the LSF 2010 summit storage track (see also
http://lwn.net/Articles/400589/):

1. Being a drop in replacement for STGT (the current in-kernel target
mode driver).
2. Using a modern sysfs based control and configuration plane.

Regarding criterion (3), that the code was reviewed as clean enough
for inclusion: any feedback is welcome.

Bart.
--

From: Steve Modica
Date: Wednesday, October 6, 2010 - 1:21 pm

Hi All,

Small Tree would very much like to see the SCST framework included in the mainline kernel.  It's used in one of the products we've been working on as well as in some products we sell.  It seems reliable and the performance seems good.  Having it in the mainline kernel would save us a great deal of headache.

Steve


--
Steve Modica
CTO -  Small Tree Communications
www.small-tree.com
phone: 651-209-6509 ext 301
mobile: 651-261-3201






--

Previous thread: [RESEND PATCH 0/2] Fix IRQ round-robing w/o irqbalance on pseries by Nishanth Aravamudan on Friday, October 1, 2010 - 2:26 pm. (9 messages)

Next thread: 2.6.35-rc6 REGRESSION: Dirtiable inode bdi default != sb bdi ext2/ext3/ext4 by Theodore Ts'o on Friday, October 1, 2010 - 2:35 pm. (1 message)