Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

Previous thread: [PATCH] watchdog: spin_lock_init() fixes by Alexey Dobriyan on Monday, October 29, 2007 - 10:40 am. (1 message)

Next thread: drivers/net/sunhme.c patch by Tom Southerland on Monday, October 29, 2007 - 10:54 am. (6 messages)
To: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>
Cc: <akpm@...>
Date: Monday, October 29, 2007 - 10:42 am

This is the next revision of the SCSI event notification infrastructure
patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD
devices that support it.

For devices that support SATA AN (only very recent ones do), this means
that HAL and other userspace utilities no longer need to repeatedly poll
the CD/DVD device to determine if the user has changed the media.

This revision takes into account James' comments from earlier today,
modulo the following notes:

* I think the various event attributes should always be present,
for all devices at all times. If various events are not supported,
the attribute will of course return zero (false, not supported).

* I do not think this work should be blocked behind a revamp
of the attribute group interface.

* I was slack and did not bother to implement the 'set' operation
for the attributes. This can easily be done at a later time in a
separate patch. It is not a merge stopper to have the driver
exclusively control the event mask, rather than driver+sysfs.

-

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 11:43 am

Actually, I don't think so. We have precedent for this in the transport
classes: if a device doesn't support a feature, we don't export the flag
for that feature through sysfs. This allows not only feature control,
but an immediate view of the device capabilities simply by viewing the
sysfs directory.

I think this functionality is very easy to layer in, so there's no

James

-

To: James Bottomley <James.Bottomley@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 11:58 am

Think about about the values being exported by these sysfs attributes:
they indicate whether or not that feature is supported.

Thus, using the presence/absence of an attribute to communicate the same
thing would be redundant.

This suggestion adds a whole lot of complexity -- mirroring every change
to sdev->supported_events by dynamically adding or removing attributes.

The current nice, simple, elegant bitops-based interface is suddenly a
lot more cumbersome if forced to deal with attribute creation and disposal.

Finally, this additional complexity of dynamic attribute management also
eliminates some key information: userland can test the existence of the
attribute to determine if that support is present in the kernel.

Jeff

-

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:10 pm

Ah, OK; I haven't communicated what we need very clearly. We need a way
to see if the event is supported by the device, as well as a way to turn
it off. For some of the events (possibly not the SATA AN one, since I
know all SATA devices will be well behaved) there's going to be a need
to deal with berserk or broken devices that become trigger happy, so
turning off the event will be a useful (and possibly essential) way of

James

-

To: James Bottomley <James.Bottomley@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:24 pm

That's possible with the presented interface[1]:

# see if event is supported
cat $path/evt_media_change

# turn off event to deal with broken/beserk devices
echo 0 > $path/evt_media_change

Some sillyhead can always do

echo 1 > $path/evt_some_event_my_device_does_not_support

but that will be obviously be a no-op because their device simply will
not send such events.

Granted ls(1) is no longer a method for viewing supported-at-boot-time
list of events -- ls(1) in the presented interface lists what events the
_kernel_ supports, and cat(1) is used to discover which events are
actually enabled.

I think that is the only difference between our two positions: [if I
understand you correctly] you want ls(1) to be able to list the device's
supported events. However, I feel that is inconsistent: for your
proposal, userspace must perform two checks in order to determine a
feature's availability: 1) does the file exist? 2) is the file context
non-zero?

Regards,

Jeff

-

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:34 pm

Yes, I agree ... however, open file is one op for the user -ENXIO means
device doesn't support the event; value indicates whether the event is
currently triggering.

I just would rather we use the file exists if device supports event,

James

-

To: James Bottomley <James.Bottomley@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:48 pm

Two problems with what you just described:

1) "value indicates current event state" is a new concept in this thread
(maybe you were thinking this all along, but I didn't get that from your
writing).

Watching the sysfs node for event activity is definitely outside the
scope of this work, and IMO not very useful. The time from when LLDD
calls sdev_evt_notify() until uevent completion is very short, so the
time window for actually receiving a useful value in your scenario is
also short.

My patch presented the attributes purely as control nodes, only affected
sdev->supported_events and nothing more. You seem to be suggesting
exporting the true-for-only-a-few-milliseconds activity state, rather
then enable/disable state.

2) Event support itself is dynamic, which causes me to revisit the
"complexity" argument. In libata, for example, we only note that the
media-change event is supported after some time passes -- not in the
initial slave_config. Or error handler may disable it at runtime
because that event is problematic.

As such, that implies that the LLDD (with help from scsi_lib) is
dynamically adding and removing these attributes at runtime -- a lot
more complexity than is really needed AFAICS.

It is easy and straightforward for the driver to set a bit.

We cannot assume the state of event support bits are constant from
modprobe/slave_config time.

Jeff

-

To: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>
Cc: <akpm@...>
Date: Monday, October 29, 2007 - 10:42 am

An end to CD-ROM polling (if you have a device that supports AN)...
hooray!

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
---
drivers/ata/libata-scsi.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index f752edd..e6d5627 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -773,6 +773,9 @@ static void ata_scsi_dev_config(struct scsi_device *sdev,
blk_queue_max_hw_segments(q, q->max_hw_segments - 1);
}

+ if (dev->flags & ATA_DFLAG_AN)
+ set_bit(SDEV_EVT_MEDIA_CHANGE, sdev->supported_events);
+
if (dev->flags & ATA_DFLAG_NCQ) {
int depth;

@@ -3225,10 +3228,11 @@ static void ata_scsi_handle_link_detach(struct ata_link *link)
*/
void ata_scsi_media_change_notify(struct ata_device *dev)
{
-#ifdef OTHER_AN_PATCHES_HAVE_BEEN_APPLIED
- if (dev->sdev)
- scsi_device_event_notify(dev->sdev, SDEV_MEDIA_CHANGE);
-#endif
+ if (dev->sdev) {
+ DECLARE_BITMAP(map, SDEV_EVT_MAXBITS);
+ __set_bit(SDEV_EVT_MEDIA_CHANGE, map);
+ sdev_evt_notify(dev->sdev, map);
+ }
}

/**
--
1.5.2.4

-

To: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>
Cc: <akpm@...>
Date: Monday, October 29, 2007 - 10:42 am

Originally based on a patch by Kristen Carlson Accardi @ Intel.
Copious input from James Bottomley.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
---
drivers/scsi/scsi_lib.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_scan.c | 2 +
drivers/scsi/scsi_sysfs.c | 20 +++++++++++++
include/scsi/scsi_device.h | 12 ++++++++
4 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..f55ec80 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -18,6 +18,7 @@
#include <linux/delay.h>
#include <linux/hardirq.h>
#include <linux/scatterlist.h>
+#include <linux/bitmap.h>

#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -2115,6 +2116,71 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
EXPORT_SYMBOL(scsi_device_set_state);

/**
+ * sdev_evt_thread - send a uevent for each scsi event
+ * @work: work struct for scsi_device
+ *
+ * Emit all queued media events as environment variables
+ * sent during a uevent.
+ */
+void scsi_evt_thread(struct work_struct *work)
+{
+ struct scsi_device *sdev;
+ char *envp[SDEV_EVT_LAST + 2];
+ DECLARE_BITMAP(mask, SDEV_EVT_MAXBITS);
+ unsigned long flags;
+ int evt, idx;
+
+ sdev = container_of(work, struct scsi_device, event_work);
+
+ spin_lock_irqsave(&sdev->list_lock, flags);
+ bitmap_copy(mask, sdev->event_mask, SDEV_EVT_MAXBITS);
+ bitmap_zero(sdev->event_mask, SDEV_EVT_MAXBITS);
+ spin_unlock_irqrestore(&sdev->list_lock, flags);
+
+ idx = 0;
+ for (evt = 0; evt < SDEV_EVT_LAST; evt++) {
+ if (!test_bit(evt, mask))
+ continue;
+
+ switch (evt) {
+ case SDEV_EVT_MEDIA_CHANGE:
+ envp[idx++] = "SDEV_MEDIA_CHANGE=1";
+ break;
+ }
+ }
+ envp[idx++] = NULL;
+
+ kobject_uevent_env(&sdev->sdev_gendev.kobj, KOBJ_CHANGE, envp);
+}
+
+/**
+ * sdev_evt_notify - send asserted events...

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 11:51 am

This still doesn't solve the fundamental corruption problem:
sdev->event_work has to contain the work entry until the workqueue has
finished executing it (which is some unspecified time in the future).
As soon as you drop the sdev->list_lock, the system thinks
sdev->event_work is available for reuse. If we fire another event
before the work queue finished processing the prior event, the queue
will be corrupted.

Although I hate GFP_ATOMIC allocations, I think that's the only viable
way to get out of this corruption problem (using a mechanism similar to
what I proposed yesterday).

Also, I think Kristin's initial use of execute_in_user_context() was a
good call .. if we already have a user context, there's no need to
bother the workqueue ... some of these events will likely trigger from
thread backed kernel daemons.

James

-

To: James Bottomley <James.Bottomley@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:07 pm

I think you're misunderstanding the workqueue code? You can call
schedule_work(&sdev->event_work) from anywhere, any time you like, as

Quite agreed that sdev_evt_notify() might be called from kernel daemons,
but in general this is a fire-and-forget API that is -likely- to be
called from interrupt or completion context of many drivers, just like
scsi_done or other completion APIs. It is a fundamentally parallel
interface.

If thread-backed kernel daemons want to use this, it is trivial for them
to schedule work, then sync.

Jeff

-

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:17 pm

OK, take me through it slowly then ... I think schedule_work(work)
inserts work->entry onto the workqueue list (in
workqueue.c:insert_work()). If the event hasn't fired, it will already
be on the list, so adding the same entry to a list twice causes a list
corruption problem.

Plus, unfortunately, the CC/UA events are going to have to carry extra
sense data; they're not simply going to be triggers saying something
happened.

James

-

To: James Bottomley <James.Bottomley@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 12:29 pm

It does a test_and_set_bit() first thing in queue_work(). Similar
exclusivity logic is found in net device land. Ah, the fun of locking

OK this is a fair criticism.

If additional data must be carried, then I must ditch the beloved bitmap
implementation and go back to a list (with associated GFP_ATOMIC alloc).

I will fix this, unless I receive email to the contrary...

Jeff

-

To: Jeff Garzik <jeff@...>
Cc: LKML <linux-kernel@...>, Linux-SCSI <linux-scsi@...>, <akpm@...>
Date: Monday, October 29, 2007 - 1:01 pm

Yes, unfortunately, thanks. If all events were a simple number, it's
easy, but the CC/UA events carry data as well.

James

-

Previous thread: [PATCH] watchdog: spin_lock_init() fixes by Alexey Dobriyan on Monday, October 29, 2007 - 10:40 am. (1 message)

Next thread: drivers/net/sunhme.c patch by Tom Southerland on Monday, October 29, 2007 - 10:54 am. (6 messages)