This is the next revision of the SCSI event notification infrastructure patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD devices that support it.

For devices that support SATA AN (only very recent ones do), this means that HAL and other userspace utilities no longer need to repeatedly poll the CD/DVD device to determine if the user has changed the media.

This revision takes into account James' comments from earlier today, modulo the following notes:

* I think the various event attributes should always be present, for all devices at all times. If various events are not supported, the attribute will of course return zero (false, not supported).

* I do not think this work should be blocked behind a revamp of the attribute group interface.

* I was slack and did not bother to implement the 'set' operation for the attributes. This can easily be done at a later time in a separate patch. It is not a merge stopper to have the driver exclusively control the event mask, rather than driver+sysfs.

-
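As an illustration of what replaces polling on the userspace side, here is a minimal sketch of checking a received uevent for the media-change variable. It assumes the listener already holds the raw uevent payload as a buffer of NUL-separated strings (the layout used for kernel uevents delivered over netlink); setting up the socket is omitted. The function name uevent_has_entry() is hypothetical and not part of this patchset; only the "SDEV_MEDIA_CHANGE=1" string comes from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the NUL-separated uevent payload in buf[0..len) contains
 * an entry exactly equal to `entry` (e.g. "SDEV_MEDIA_CHANGE=1"),
 * otherwise return 0.  Hypothetical helper, for illustration only.
 */
int uevent_has_entry(const char *buf, size_t len, const char *entry)
{
	size_t want = strlen(entry);
	size_t pos = 0;

	while (pos < len) {
		/* Find the end of the current string without leaving the buffer. */
		const char *end = memchr(buf + pos, '\0', len - pos);
		size_t n = end ? (size_t)(end - (buf + pos)) : len - pos;

		if (n == want && memcmp(buf + pos, entry, n) == 0)
			return 1;

		/* Skip the string and its terminating NUL. */
		pos += n + 1;
	}
	return 0;
}
```

A listener would call this on each received message and re-read the media state only when it returns 1, instead of waking up periodically to probe the drive.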
Originally based on a patch by Kristen Carlson Accardi @ Intel.
Copious input from James Bottomley.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
---
drivers/scsi/scsi_lib.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_scan.c | 2 +
drivers/scsi/scsi_sysfs.c | 20 +++++++++++++
include/scsi/scsi_device.h | 12 ++++++++
4 files changed, 100 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..f55ec80 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -18,6 +18,7 @@
#include <linux/delay.h>
#include <linux/hardirq.h>
#include <linux/scatterlist.h>
+#include <linux/bitmap.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -2115,6 +2116,71 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
EXPORT_SYMBOL(scsi_device_set_state);
/**
+ * sdev_evt_thread - send a uevent for each scsi event
+ * @work: work struct for scsi_device
+ *
+ * Emit all queued media events as environment variables
+ * sent during a uevent.
+ */
+void scsi_evt_thread(struct work_struct *work)
+{
+ struct scsi_device *sdev;
+ char *envp[SDEV_EVT_LAST + 2];
+ DECLARE_BITMAP(mask, SDEV_EVT_MAXBITS);
+ unsigned long flags;
+ int evt, idx;
+
+ sdev = container_of(work, struct scsi_device, event_work);
+
+ spin_lock_irqsave(&sdev->list_lock, flags);
+ bitmap_copy(mask, sdev->event_mask, SDEV_EVT_MAXBITS);
+ bitmap_zero(sdev->event_mask, SDEV_EVT_MAXBITS);
+ spin_unlock_irqrestore(&sdev->list_lock, flags);
+
+ idx = 0;
+ for (evt = 0; evt < SDEV_EVT_LAST; evt++) {
+ if (!test_bit(evt, mask))
+ continue;
+
+ switch (evt) {
+ case SDEV_EVT_MEDIA_CHANGE:
+ envp[idx++] = "SDEV_MEDIA_CHANGE=1";
+ break;
+ }
+ }
+ envp[idx++] = NULL;
+
+ kobject_uevent_env(&sdev->sdev_gendev.kobj, KOBJ_CHANGE, envp);
+}
+
+/**
+ * sdev_evt_notify - send asserted events to uevent thread
+ * @sdev: scsi_device event occurred on
+ ...

This still doesn't solve the fundamental corruption problem: sdev->event_work has to contain the work entry until the workqueue has finished executing it (which is some unspecified time in the future). As soon as you drop the sdev->list_lock, the system thinks sdev->event_work is available for reuse. If we fire another event before the work queue has finished processing the prior event, the queue will be corrupted.

Although I hate GFP_ATOMIC allocations, I think that's the only viable way to get out of this corruption problem (using a mechanism similar to what I proposed yesterday).

Also, I think Kristen's initial use of execute_in_user_context() was a good call .. if we already have a user context, there's no need to bother the workqueue ... some of these events will likely trigger from thread backed kernel daemons.

James
-
I think you're misunderstanding the workqueue code? You can call schedule_work(&sdev->event_work) from anywhere, any time you like, as

Quite agreed that sdev_evt_notify() might be called from kernel daemons, but in general this is a fire-and-forget API that is -likely- to be called from interrupt or completion context of many drivers, just like scsi_done or other completion APIs. It is a fundamentally parallel interface.

If thread-backed kernel daemons want to use this, it is trivial for them to schedule work, then sync.

Jeff
-
OK, take me through it slowly then ... I think schedule_work(work) inserts work->entry onto the workqueue list (in workqueue.c:insert_work()). If the event hasn't fired, it will already be on the list, so adding the same entry to a list twice causes a list corruption problem.

Plus, unfortunately, the CC/UA events are going to have to carry extra sense data; they're not simply going to be triggers saying something happened.

James
-
It does a test_and_set_bit() first thing in queue_work(). Similar exclusivity logic is found in net device land. Ah, the fun of locking

OK, this is a fair criticism. If additional data must be carried, then I must ditch the beloved bitmap implementation and go back to a list (with associated GFP_ATOMIC alloc).

I will fix this, unless I receive email to the contrary...

Jeff
-
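The exclusivity described above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the workqueue code itself: C11 atomic_flag stands in for the pending bit that queue_work() tests and sets, and a singly linked list stands in for the work list. All names here (struct work, struct queue, queue_work_once(), dequeue_work()) are hypothetical. The list is not protected by a lock, so the sketch is only meaningful single-threaded; it is meant to show why a second submission of the same item does not insert it twice.

```c
#include <stdatomic.h>
#include <stddef.h>

struct work {
	atomic_flag pending;	/* stand-in for the kernel's pending bit */
	struct work *next;	/* stand-in for the list entry */
};

struct queue {
	struct work *head;
	int length;
};

void work_init(struct work *w)
{
	atomic_flag_clear(&w->pending);
	w->next = NULL;
}

/* Returns 1 if the item was queued, 0 if it was already pending. */
int queue_work_once(struct queue *q, struct work *w)
{
	/* Already pending: leave the list alone, nothing is inserted twice. */
	if (atomic_flag_test_and_set(&w->pending))
		return 0;

	w->next = q->head;
	q->head = w;
	q->length++;
	return 1;
}

/* Remove one item and clear its pending flag so it may be queued again. */
struct work *dequeue_work(struct queue *q)
{
	struct work *w = q->head;

	if (!w)
		return NULL;
	q->head = w->next;
	w->next = NULL;
	q->length--;
	atomic_flag_clear(&w->pending);
	return w;
}
```

Note what this does and does not give you: repeated notifications collapse into a single queued item, which is fine for a pure "something happened" trigger but loses information if each notification carries its own payload.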
Yes, unfortunately, thanks. If all events were a simple number, it's easy, but the CC/UA events carry data as well.

James
-
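To make the bitmap-versus-list trade-off concrete, here is a minimal userspace sketch of a per-event list in which every notification carries its own data. It is not the follow-up patch: the names (struct evt, evt_post(), evt_take(), the EVT_* constants and SENSE_MAX) are hypothetical, and calloc() merely stands in for the GFP_ATOMIC allocation that the kernel version would need when called from interrupt context. Locking is omitted.

```c
#include <stdlib.h>
#include <string.h>

#define SENSE_MAX 32	/* hypothetical upper bound on carried sense data */

enum evt_type { EVT_MEDIA_CHANGE = 1, EVT_UNIT_ATTENTION = 2 };

struct evt {
	struct evt *next;
	enum evt_type type;
	unsigned char sense[SENSE_MAX];
	size_t sense_len;
};

struct evt_queue {
	struct evt *head, *tail;
};

/* Append one event with its payload; returns 0 on success, -1 on failure. */
int evt_post(struct evt_queue *q, enum evt_type type,
	     const void *sense, size_t len)
{
	struct evt *e;

	if (len > SENSE_MAX)
		return -1;

	e = calloc(1, sizeof(*e));	/* kernel: an atomic allocation */
	if (!e)
		return -1;

	e->type = type;
	if (len)
		memcpy(e->sense, sense, len);
	e->sense_len = len;

	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
	return 0;
}

/* Detach and return the oldest event, or NULL; the caller frees it. */
struct evt *evt_take(struct evt_queue *q)
{
	struct evt *e = q->head;

	if (e) {
		q->head = e->next;
		if (!q->head)
			q->tail = NULL;
		e->next = NULL;
	}
	return e;
}
```

Unlike a bitmap, two events of the same type posted back to back remain two separate entries, each with its own data; the cost is an allocation that can fail at post time.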
An end to CD-ROM polling (if you have a device that supports AN)...
hooray!
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
---
drivers/ata/libata-scsi.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index f752edd..e6d5627 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -773,6 +773,9 @@ static void ata_scsi_dev_config(struct scsi_device *sdev,
blk_queue_max_hw_segments(q, q->max_hw_segments - 1);
}
+ if (dev->flags & ATA_DFLAG_AN)
+ set_bit(SDEV_EVT_MEDIA_CHANGE, sdev->supported_events);
+
if (dev->flags & ATA_DFLAG_NCQ) {
int depth;
@@ -3225,10 +3228,11 @@ static void ata_scsi_handle_link_detach(struct ata_link *link)
*/
void ata_scsi_media_change_notify(struct ata_device *dev)
{
-#ifdef OTHER_AN_PATCHES_HAVE_BEEN_APPLIED
- if (dev->sdev)
- scsi_device_event_notify(dev->sdev, SDEV_MEDIA_CHANGE);
-#endif
+ if (dev->sdev) {
+ DECLARE_BITMAP(map, SDEV_EVT_MAXBITS);
+ __set_bit(SDEV_EVT_MEDIA_CHANGE, map);
+ sdev_evt_notify(dev->sdev, map);
+ }
}
/**
--
1.5.2.4
-
Actually, I don't think so. We have precedent for this in the transport classes: if a device doesn't support a feature, we don't export the flag for that feature through sysfs. This allows not only feature control, but an immediate view of the device capabilities simply by viewing the sysfs directory.

I think this functionality is very easy to layer in, so there's no

James
-
Think about the values being exported by these sysfs attributes: they indicate whether or not that feature is supported. Thus, using the presence/absence of an attribute to communicate the same thing would be redundant.

This suggestion adds a whole lot of complexity -- mirroring every change to sdev->supported_events by dynamically adding or removing attributes. The current nice, simple, elegant bitops-based interface is suddenly a lot more cumbersome if forced to deal with attribute creation and disposal.

Finally, this additional complexity of dynamic attribute management also eliminates some key information: userland can test the existence of the attribute to determine if that support is present in the kernel.

Jeff
-
Ah, OK; I haven't communicated what we need very clearly. We need a way to see if the event is supported by the device, as well as a way to turn it off.

For some of the events (possibly not the SATA AN one, since I know all SATA devices will be well behaved) there's going to be a need to deal with berserk or broken devices that become trigger happy, so turning off the event will be a useful (and possibly essential) way of

James
-
That's possible with the presented interface[1]:

    # see if event is supported
    cat $path/evt_media_change

    # turn off event to deal with broken/berserk devices
    echo 0 > $path/evt_media_change

Some sillyhead can always do

    echo 1 > $path/evt_some_event_my_device_does_not_support

but that will obviously be a no-op because their device simply will not send such events.

Granted ls(1) is no longer a method for viewing the supported-at-boot-time list of events -- ls(1) in the presented interface lists what events the _kernel_ supports, and cat(1) is used to discover which events are actually enabled.

I think that is the only difference between our two positions: [if I understand you correctly] you want ls(1) to be able to list the device's supported events.

However, I feel that is inconsistent: for your proposal, userspace must perform two checks in order to determine a feature's availability:

1) does the file exist?
2) is the file content non-zero?

Regards,

Jeff
-
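For reference, the userspace side of the always-present-attribute scheme could look roughly like the sketch below. It assumes the attribute file holds a single integer (zero or non-zero), as in the cat/echo examples above; the function name event_attr_state() and its return convention are hypothetical, and the caller supplies the full path to the attribute.

```c
#include <stdio.h>

/*
 * Read an event attribute such as ".../evt_media_change".
 * Returns  1 if the attribute reads non-zero (event enabled),
 *          0 if it reads zero (event not supported or turned off),
 *         -1 if the file cannot be opened or parsed, e.g. because the
 *            running kernel does not provide the attribute at all.
 */
int event_attr_state(const char *attr_path)
{
	FILE *f = fopen(attr_path, "r");
	int value, ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%d", &value) == 1);
	fclose(f);

	if (!ok)
		return -1;
	return value != 0;
}
```

With this shape a single call separates "kernel lacks the attribute" (-1) from "device does not have the event enabled" (0), which is the distinction being argued about in this subthread.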
Yes, I agree ... however, open file is one op for the user -ENXIO means device doesn't support the event; value indicates whether the event is currently triggering.

I just would rather we use the file exists if device supports event,

James
-
Two problems with what you just described:

1) "value indicates current event state" is a new concept in this thread (maybe you were thinking this all along, but I didn't get that from your writing).

Watching the sysfs node for event activity is definitely outside the scope of this work, and IMO not very useful. The time from when LLDD calls sdev_evt_notify() until uevent completion is very short, so the time window for actually receiving a useful value in your scenario is also short.

My patch presented the attributes purely as control nodes, only affected sdev->supported_events and nothing more. You seem to be suggesting exporting the true-for-only-a-few-milliseconds activity state, rather than enable/disable state.

2) Event support itself is dynamic, which causes me to revisit the "complexity" argument.

In libata, for example, we only note that the media-change event is supported after some time passes -- not in the initial slave_config. Or the error handler may disable it at runtime because that event is problematic.

As such, that implies that the LLDD (with help from scsi_lib) is dynamically adding and removing these attributes at runtime -- a lot more complexity than is really needed AFAICS. It is easy and straightforward for the driver to set a bit.

We cannot assume the state of event support bits are constant from modprobe/slave_config time.

Jeff
-
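The "driver just sets a bit" model from point 2 can be sketched in plain C as follows. This is a userspace illustration, not the scsi_sysfs.c code from the patch: struct dev, evt_enable(), evt_disable(), evt_show() and the numeric value of EVT_MEDIA_CHANGE are all hypothetical, and a single unsigned long stands in for the sdev->supported_events bitmap.

```c
#include <stdio.h>

enum { EVT_MEDIA_CHANGE = 1 };	/* hypothetical bit number */

struct dev {
	unsigned long supported_events;	/* stand-in for the event bitmap */
};

/* Driver side: support can be switched on or off at any time. */
void evt_enable(struct dev *d, int evt)
{
	d->supported_events |= 1UL << evt;
}

void evt_disable(struct dev *d, int evt)
{
	d->supported_events &= ~(1UL << evt);
}

/*
 * Attribute side: the file always exists and simply reports the bit
 * as "1\n" or "0\n".  Returns the number of characters written.
 */
int evt_show(const struct dev *d, int evt, char *buf, size_t len)
{
	int set = ((d->supported_events >> evt) & 1UL) ? 1 : 0;

	return snprintf(buf, len, "%d\n", set);
}
```

Because the attribute only reflects the current bit, enabling the event late (after initial configuration) or disabling it from an error handler needs no attribute creation or removal; the next read just returns the new value.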
