From: Alan Stern <stern@rowland.harvard.edu>
Add support for autosuspend/autoresume. Lowlevel driver can use it to
spin the disk down and power down its SATA link, to turn off the USB
interface, etc.
Spinning down the disk is useful - saves ~0.5W here. Powering down
SATA controller is even better -- should save ~1W.
Now, I guess the patch will need to be split to small pieces for
merge... I tried to rearrange it so that the documentation and hooks
go before stuff that needs the hooks, and before Kconfig enabler. If
it looks reasonably good, I'll split it into smaller pieces.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Tejun Heo <teheo@novell.com>
diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index a6d5354..98dc005 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -782,6 +782,8 @@ In some cases more detail is given in sc
The interface functions are listed below in alphabetical order.
Summary:
+ autoresume - perform dynamic (runtime) host resume
+ autosuspend - perform dynamic (runtime) host suspend
bios_param - fetch head, sector, cylinder info for a disk
detect - detects HBAs this driver wants to control
eh_timed_out - notify the host that a command timer expired
@@ -802,6 +804,54 @@ Summary:
Details:
/**
+ * autoresume - perform dynamic (runtime) host resume
+ * @shp: host to resume
+ *
+ * Resume (return to an operational power level) the specified host.
+ * Return 0 if the resume was successful, otherwise a negative
+ * error code.
+ *
+ * Locks: struct Scsi_Host::pm_mutex held throughout the call.
+ *
+ * Calling context: process
+ *
+ * Notes: If the host is not currently suspended, this method does
+ * not need to do anything.
+ *
+ * Optionally defined in: LLD
+ **/
+ int autoresume(struct Scsi_Host *shp)
+
+
+/**
+ * autosuspend - perform dynamic ...
James had a number of objections to my original patch; you can read them here: https://lists.linux-foundation.org/pipermail/linux-pm/2008-March/016849.html I haven't had time yet to work on an improved version. Alan Stern --
Very well. I see a basic problem here. For USB it is necessary that child devices be suspended before anything higher up in the tree is suspended. SATA seems to be able to power down a link while the device is not suspended. In fact in true SCSI busses can be shared. So are we using the correct approach? Regards Oliver --
Is the USB transport unique in its requirement that all the child devices must be suspended before the link can be powered down? Maybe that requirement should be made an explicit property of the transport This is a good question. Most USB mass-storage devices do not act as a true SCSI bus, but I believe there are a few non-standard ones that do -- the USB device really contains a SCSI host and arbitrary SCSI targets can be attached to it. For the moment, we should be safe enough using a model in which there are no other initiators on a USB-type SCSI transport, but it's something to keep in mind. Alan Stern --
All children that are USB must be powered down. We know in fact that most drives don't care that the device is suspended. The problem was drive So do we really want to do autosuspend on the device level? Or do we work on hosts and just use the suspend()/resume() support of the sd, sr, ... etc? Regards Oliver --
You misunderstood my question. Are there SCSI transports other than USB sharing the requirement that all child devices must be suspended For transports which are like USB, we should do autosuspend at the target (not device) level. This means invoking the suspend/resume routines of the ULDs like sd and sr. The transport gets notified when all of the targets are suspended. (Or maybe the host driver gets notified instead; there probably isn't any advantage to using the transport class here.) For other transports, we should only do idle-timeout detection. The transport gets notified when any target has been idle for sufficiently long, so that it can power down the link. The ULDs are not involved. Does that sound okay? Alan Stern --
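The split described above can be sketched roughly as follows. This is a user-space illustration only, not code from the patch: `struct transport_policy`, `needs_child_suspend`, `struct target_state` and `link_may_power_down()` are hypothetical names, and the sketch assumes the link is powered down only once every target on it qualifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target state, for illustration only. */
struct target_state {
	bool suspended;	/* ULD suspend methods have run for all LUNs */
	bool idle;	/* idle timeout has expired */
};

/* Hypothetical transport property: must children be suspended first? */
struct transport_policy {
	bool needs_child_suspend;	/* true for USB-like transports */
};

/*
 * Decide whether the link may be powered down.
 * USB-like transports: every target must already be suspended.
 * Other transports: every target merely has to be idle (assumption).
 */
static bool link_may_power_down(const struct transport_policy *tp,
				const struct target_state *t, size_t n)
{
	assert(tp != NULL && (t != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		bool ok = tp->needs_child_suspend ? t[i].suspended : t[i].idle;

		if (!ok)
			return false;
	}
	return true;
}
```

With one idle but unsuspended target, a policy with `needs_child_suspend` set keeps the link up, while a policy without it allows the power-down.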
Yes in case of FireWire; it's necessary there too (but not sufficient). (It's a bad example though since I have no good idea whether power management beyond (a) system suspend and (b) disk spindown is feasible in reality at all.) Minor correction: The ULD suspend/resume methods necessarily work on logical units, not targets. -- Stefan Richter -=====-==--- =--- -==-= http://arcgraph.de/sr/ --
Yes; I should have said that the suspend/resume methods of the ULD for each of the target's LUNs get invoked. Alan Stern --
I dispute that USB in general has this property. Some storage devices need their caches flushed. USB itself is perfectly happy with autosuspending the storage device (host) without telling the disks (devices) You could even argue that these storage devices violate the USB spec. Regards Oliver --
How can you dispute that? You said it yourself, in the top quote Oliver, you can't have it both ways. Either we do spin down disks and drain device caches before autosuspending usb-storage or we don't. For safety's sake, obviously we should. The overhead is minimal since this happens only after the idle timeout has expired. And for devices that don't support it (like flash storage), sd skips the spin-down command anyway. At any rate, Stefan Richter has answered my original question. Firewire has essentially the same restrictions as USB. Alan Stern --
But you cannot make the conclusion that the ultimate children should have any autosuspend attributes. We can implement autosuspend in usb storage and propagate the suspend calls down the tree without SCSI knowing about autosuspend. Such a system would have its drawbacks, but it'd be a lot simpler. Regards Oliver --
Oh, I see. All right, yes. However USB in general _does_ have the property that child devices might not be able to accomplish much while the USB link is suspended, particularly if they are bus-powered. This The way I designed the autosuspend framework, you _can't_ do that. In my framework autosuspend and autoresume events propagate _up_ the device tree, not _down_. This means an autosuspend has to be initiated by the child SCSI layer, not by the USB layer. Which is as it should be, since the USB layer doesn't know when it is appropriate for a SCSI It would be a layering violation. Alan Stern --
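The upward propagation described above might be modelled like this. It is a simplified user-space sketch under my own assumptions, not the framework's actual code: `struct pm_node`, `pm_node_suspend()` and `pm_node_resume()` are hypothetical names, and the parent here suspends as soon as its last child does, whereas a real driver would decide for itself after being notified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node carrying only PM bookkeeping. */
struct pm_node {
	struct pm_node *parent;
	int active_children;	/* children that are not suspended */
	bool suspended;
};

/* Suspend a node; the event then propagates up to its parent. */
static void pm_node_suspend(struct pm_node *n)
{
	assert(n != NULL);

	if (n->suspended || n->active_children > 0)
		return;		/* nothing to do, or children still active */

	n->suspended = true;
	if (n->parent) {
		n->parent->active_children--;
		pm_node_suspend(n->parent);	/* parent may be idle now */
	}
}

/* Resume a node; every ancestor has to be resumed first. */
static void pm_node_resume(struct pm_node *n)
{
	assert(n != NULL);

	if (!n->suspended)
		return;

	if (n->parent) {
		pm_node_resume(n->parent);
		n->parent->active_children++;
	}
	n->suspended = false;
}
```

The point of the sketch is the direction of the calls: a child never gets suspended by its parent, it only reports its own state changes upward.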
Hmm... but suspended devices have very little power budget, right? So unless you have external power supply (2.5" frames generally don't), you can't really suspend and stay spun up... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
True, but the spec says that no state shall be lost. I don't really argue against flushing the caches. But I cannot see that this would demand that we implement autosuspend for SCSI. It seems like overengineering to me. Regards Oliver --
What can we do?... Real world devices don't always obey the spec. You could argue that the suspend current should be sufficient to maintain the contents of the cache, which would then be written out after resume. But even if that is true, it's a very fragile guarantee Think of it in two parts: idle-timeout detection and autosuspend. Presumably you don't object to the idle-timeout detection (which is needed for powering down links in general), and you don't argue against the cache-flushing part of autosuspend. Taken together, that's about 90% of my proposal. So what is the objectionable 10%? Alan Stern --
The core problem is that you insist on a rigid bottom-to-top flow of autosuspensions. That's good for systems like USB and PCI which are trees for PM purposes. It makes no sense for true busses with equal members on the bus. Regards Oliver --
My framework is tree-oriented because it's based on the driver model, which uses a tree of devices. Even on a true bus, the members can't be entirely equal -- one of them has to be closer to the CPU than the others are. If that one member is in a low-power state then the CPU can't communicate with anything on the bus, unlike when one of the other members is in a low-power state. (I suppose in theory there could be a situation in which the CPU has direct communication with a bunch of devices, which can also communicate among themselves over some other bus. In such a situation we would represent the devices as members of separate branches in the device tree, so that suspending one would have no impact on suspending the others. The presence of the interconnecting bus would be ignored.) Alan Stern --
Yes, that means under some circumstances you cannot suspend the member closest to the CPU, but under others you can. In a tree this question is very simply answered, on a bus you will actually need to compute whether you need the connection to the bus. It is true that you won't need the bus if all other members on the bus have been suspended, but that's not very good because physically spinning down and up a disk is a very expensive operation, while suspending a host adapter can be trivial. Regards Oliver --
How do you know? Is that just a guess based on some of Greg KH's and I don't see why any computation is needed. If the CPU will need to communicate with any devices on the bus (i.e., if any of these devices are not idle) then you need the connection to the bus, otherwise you don't. It's exactly the same with a tree. The fact that the interconnections form a bus rather than a tree is irrelevant. (Viewed in logical terms, even a true bus can be described as a tree. The nodes are partially ordered by their communication paths to the CPU.) More to the point is whether you should ever suspend any of these devices if there can be multiple initiators. But that's a separate What is your point? You seem to be saying that it would be nice to suspend a host adapter at times when some of the SCSI targets beneath it are not suspended. I agree, but how would you determine whether such a thing was safe? Alan Stern --
I suggest by talking to the HLDs. It seems to me that, abstractly speaking, there are three criteria for suspension:
- the cpu needs to talk to the device now
- the device may need to talk to the CPU at unpredictable times
- suspending has side effects
Suspension in USB always has side effects. That's not true for other subsystems. It seems to me that for the general case we need to divorce the notion of a child being suspended itself from a child agreeing to its parent being suspended. Regards Oliver --
One possibility is to have an attribute flag for SCSI transport classes, indicating whether the transport supports multiple initiators. Besides, isn't this already an issue? What happens when someone does a system suspend or hibernate? Don't the attached disk drives get spun down, even if there are other initiators on the same SCSI bus? (And is this really a problem? If an error occurs because a drive is spun down when some other device tries to access it, that other device Why would the HLD (= ULD?) know? For example, consider a USB disk drive. How is sd.c (the HLD) supposed to know that it's not safe to suspend the USB link without spinning down the drive? Or consider a traditional SCSI parallel interface drive. How is sd.c supposed to know that it is safe to suspend the I'm not sure what you mean by that. Suspension always has side effects Name one. At the very least, suspending a device means you can't use it again without first calling the driver's resume method. That's a side effect. Hopefully, in most subsystems suspending a device would reduce its This is already possible. For example, you may remember a couple of years ago I posted a patch for usb-storage which would autosuspend it without regard for the state of its child devices. The patch didn't work out, because some devices really did need to have their caches drained or disks spun down. There's nothing about my suspend framework to prevent a driver from autosuspending its device while the children are still active. Rather, the framework insists on notifications going the other way: The driver has to be told whenever one of its device's children is suspended or resumed. Alan Stern --
In (fw-)sbp2, we have for example this simple code:
static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
{
	...
	if (sbp2_param_exclusive_login)
		sdev->manage_start_stop = 1;
	...
By setting the exclusive_login module parameter from Y (default) to N,
multiple initiators per logical unit become possible. We are too lazy
to check whether there are actually other initiators at a given moment;
after all they can come and go all the time. So the simplest strategy
is to suppress managed START STOP when concurrent initiators are _possible_.
I suppose though that all multiple initiator capable transports have
ways to query the presence of other initiators at any given time; but I
The high latency may be a problem.
--
Stefan Richter
-=====-==--- =--- =-=--
http://arcgraph.de/sr/
--
IDE, actually. I don't think it is relevant, but you can do hdparm -y, and it will automatically spin up when you try to talk to it next time. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
It's a matter of definitions... "hdparm -y" doesn't call the driver's suspend method, so in some sense it isn't truly a suspend. But it's true that some systems can power down more or less transparently (with restart latency as the only visible side effect). Alan Stern --
The HLD is responsible for suspending the disk in case the system is suspended. The HLD must know how to safely suspend a device. It may be I am talking about correctness for controllers. So remote wakeup may or may not But not outside the controller. If you suspend the root hub of a usb bus, you suspend everything on the bus. It's a feature of the hardware. Other That's the problem. You don't tell the children when the parent might want to suspend. Regards Oliver --
You didn't answer my question: How does the HLD know whether it's okay to suspend the link without suspending the device? I should think that it _doesn't_ know. The transport class code might know, or the link's driver -- but not the HLD. The HLD probably doesn't even know what type of transport is I don't understand. Are you saying that whether or not it's correct to suspend a link depends on whether the device may need to talk to the CPU at unpredictable times? And if so, isn't that the same as saying that remote wakeup for the link can be enabled? As for predicting how long the link will be idle... I doubt it is Why should the children need to know? If the children are already suspended then we certainly don't need to tell them the link is going down. If the children are active, then the link's driver or the transport class must already have given the okay for suspending the link while leaving the children active. So again, why consult the children's drivers? Alan Stern --
There's some truth to that. Unfortunately the transport does not know whether a device or link may be suspended. Take the case of a CD playing sound. The transport may know what the consequences of suspending a link will be to the devices, but only the devices know whether the Remote wakeup is a concept specific to USB. If you are writing for a generic system the question is indeed whether devices may want to talk to the host and whether they can. It seems to me that the ULD will know whether its devices will need Because the transport class may not know either. Regards Oliver --
Even the device (or more properly, the driver) might not know! In your example the driver might realize that playing had been started, but it That's not true at all. Maybe the name is specific to USB, but the concept isn't. Notice how we have power/wakeup files in the sysfs directory for every device, even non-USB devices? Requesting a In general, the link or transport class will know whether it is possible for a device to initiate communication with the CPU. If it is possible then the link would probably want to have remote wakeup enabled before autosuspending, even if none of the devices currently So sd.c might, in theory, want to respond in two different ways to an autosuspend request: (A) Drain the cache, (B) Drain the cache and spin down the drive. How does it know which to do? Ask the transport class for help choosing? (A) would leave us in an awkward "half-suspended" state. Is the device suspended or not? It is, in the sense that now the link can safely be suspended. But it isn't, in the sense that a system sleep would still require the drive to be spun down. It's kind of like the state we have following a PMSG_FREEZE -- quiescent but not suspended. Somehow this extra state needs to be incorporated into the autosuspend framework. Alan Stern --
(C) Do nothing (D) Refuse (i.e. the user has opened a block device and used a vendor Why? Unless the device can be skipped for purposes of autosuspend and system sleep, isn't it active? Regards Oliver --
I don't know -- it would depend on the particular transport. In any Also possible, although I don't think your example is a good one since To my mind, if the driver has to do something special to prepare for the link going down (such as draining the cache), then afterward the device is in a special state -- not the same as the active state. The difference between the two states is that in one the link may be autosuspended and in the other it mustn't. I see the driver making the transition between these states in response to autosuspend and autoresume calls. This means a driver such as sd.c has to respond in different sorts of ways to various autosuspend scenarios, either doing a real power-down or merely preparing for the link to go down. The implication is that we might want to send the driver two different autosuspend calls: One to prepare for the link to go down (after, say, a couple of seconds of idleness) and another to power-down the device (after, say, 15 minutes of idleness). Thus, there would be two "autosuspended" states: a shallow autosuspend (cache is drained) and a deep autosuspend (disk is spun down). Such an approach could be made to work, even though it seems slightly artificial. Alan Stern --
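The two-level scheme above could be modelled as a small state function. This is only a sketch: the enum, the function names and the threshold macros are hypothetical, and the thresholds simply reuse the "say" figures from the text (a couple of seconds, 15 minutes).

```c
#include <assert.h>

/* Hypothetical states: active, shallow autosuspend, deep autosuspend. */
enum disk_pm_state {
	DISK_ACTIVE,	/* link must stay up */
	DISK_SHALLOW,	/* cache drained; link may be autosuspended */
	DISK_DEEP,	/* disk spun down as well */
};

/* Illustrative thresholds, taken from the example figures in the text. */
#define SHALLOW_IDLE_SECS	2u
#define DEEP_IDLE_SECS		(15u * 60u)

static_assert(SHALLOW_IDLE_SECS < DEEP_IDLE_SECS,
	      "shallow autosuspend must be reached before deep autosuspend");

/* Map the time since the last request to the state the driver should be in. */
static enum disk_pm_state disk_state_for_idle(unsigned int idle_secs)
{
	if (idle_secs >= DEEP_IDLE_SECS)
		return DISK_DEEP;
	if (idle_secs >= SHALLOW_IDLE_SECS)
		return DISK_SHALLOW;
	return DISK_ACTIVE;
}

/* The link may go down in either autosuspended state, never when active. */
static int link_may_autosuspend(enum disk_pm_state s)
{
	return s != DISK_ACTIVE;
}
```

A system sleep would still need the transition to `DISK_DEEP`, which is what makes the shallow state "half-suspended" in the sense discussed earlier in the thread.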
OK, but does it make sense to have SCSI autosuspend? Or should autosuspend operate on the bus the _host_ is connected to (usb, pci, ...)? Regards Oliver --
[Quoting Oliver: true SCSI busses can be shared. So are we using the In Alan's patch, SCSI calls scsi_host_template methods (if the LLD provides ones) to suspend and resume a Scsi_Host. The LLD can use them to work with the underlying infrastructure to determine what can be done at that time. I.e. are there other protocols or other initiator-like nodes sharing the link? If yes or if "maybe yes", the infrastructure keeps the link up. If not, it can move it into a low-power state. -- Stefan Richter -=====-==--- =--- -==-= http://arcgraph.de/sr/ --
That is a peculiar way of viewing it. Alan's patch introduces runtime pm attributes to the devices. Quoting:
+/**
+ * scsi_suspend_sdev - suspend a SCSI device
+ * @sdev: the scsi_device to suspend
+ * @msg: Power Management message describing this state transition
+ *
+ * SCSI devices can't actually be suspended in a literal sense,
+ * because SCSI doesn't have any notion of power management. Instead
+ * this routine drains the request queue and calls the ULD's suspend
+ * method to flush caches, spin-down drives, and so on.
+ *
+ * If the suspend succeeds, we call scsi_autosuspend_host to decrement
+ * the host's count of unsuspended devices and invoke the LLD's suspend
+ * method.
So you cannot operate on the link independent from the devices. Regards Oliver --
With the original patch, you can't operate on the link independent from the devices. But with the revised patch (whenever I manage to find time to write it!), you _will_ be able to. Alan Stern --
That sounds great .. if you link it through the transport class, that can implement the policy you want (as in power all devices down before the link for USB, but just power down the link for SAS/SATA). James --
Assuming we have a transport class for USB/Firewire! That's the reason I proposed adding such a thing. Alan Stern --
Regarding these scsi suspend patches, there's a general problem to drop power on disk devices on a running system. I discussed it in: http://www.gossamer-threads.com/lists/linux/kernel/811598 We have a sequence: a) stop further block requests b) sync the disk (and sync the cache -- there was talk on and off for several years about sync not really syncing) c) drop power on the disk We tweak the ext3 mount timeout and the /proc/sys/vm settings to put the computer into laptop mode. When a read comes along, we reverse the process...(or a forced write which generally won't happen). But we need to add patches to the device driver and the block layer to enable this...it seems useful if there was a more generic way to handle it...maybe registering a callback to reenable power and a mechanism to start the poweroff sequence... We've done this in 2.6.20, I wonder if there's any work along these lines in recent kernels (I'm going to look at 2.6.2[67]...) --
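The a) b) c) sequence above, and its reversal on the next access, can be illustrated with a toy model. Everything here is hypothetical (`struct sim_disk` and both functions are made up for the example, and `sync_fails` is just a knob to exercise the error path); it shows only the ordering, not how the block layer or a driver would implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated disk; stands in for driver and block-layer state. */
struct sim_disk {
	bool queue_stopped;	/* no new block requests accepted */
	bool cache_dirty;	/* unwritten data in the drive's cache */
	bool powered;
	bool sync_fails;	/* knob: make the cache flush fail */
};

/* Power the disk down in the order given in the text. */
static int sim_disk_power_down(struct sim_disk *d)
{
	assert(d != NULL);

	d->queue_stopped = true;	/* a) stop further block requests */

	if (d->sync_fails) {		/* b) sync the disk and its cache */
		d->queue_stopped = false;	/* flush failed: stay powered */
		return -1;
	}
	d->cache_dirty = false;

	d->powered = false;		/* c) drop power on the disk */
	return 0;
}

/* Reverse the process when a request arrives for a powered-off disk. */
static void sim_disk_power_up(struct sim_disk *d)
{
	assert(d != NULL);

	d->powered = true;		/* restore power first */
	d->queue_stopped = false;	/* then accept requests again */
}
```

If the flush in step b) fails, the sketch backs out and leaves the disk powered, so dirty data is never stranded behind a power cut.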
There's a much worse problem which that thread completely ignored: When you turn off power to a disk device, to the system it looks like a hot-unplug event. Any mounted filesystems or memory mappings on that disk will be lost. Alan Stern --
Being able to drop power on the disk on demand is a useful concept. We do it, but need a number of custom patches in a number of places. When the system WANTS to access the disk, it does what is necessary to get the disk to spin up... I'm not sure dropping disk power in a controlled way should trigger a hot-plug event --if everyone EXPECTS it. I'm just looking for a more generic way to enable this...I'm going to be looking at 2.6.26/27 for this soon... --
That's the situation we're in now. Autosuspend operates on the USB bus, but it can't do anything with usb-storage because the child SCSI devices don't do a SCSI autosuspend. Alan Stern --
Ok, I see, "it's done at the wrong level" sounds pretty serious. First the general comments/questions:
#
#1. It's done at the wrong level: suspend "device" is actually a target
#function. There's no way on a multi-lun device we want to keep the
#flags and last_busy anywhere but in the target
So... if there's one device with Lun0==cdrom1 and Lun1==cdrom2, it is a single target, and we want to keep flags/last busy common to all that? What is good data structure to add? I see scsi_tgt*.h, but it is very short, and there does not seem to be good structure to hook into.
#2. As you say in the comment, the thing we're trying to power down is
#the link. In most SCSI implementations, the link has a rather complex
#relationship to the target, what we want to do in
#periodic_autosuspend_scan() is run over the devices on each link, and
#if they're not busy suspend the link? What's probably needed is a set of
#adjunct helpers for the transport classes to do this.
So the host suspend/resume stuff should go into struct scsi_transport_template?
#3. The link power down is much faster than device spin down ... in
#your patch these two things seem to be coupled ... we really need to keep
#them separate.
#
ACK.
#4. The entanglement with error handling is incredibly problematic
#(since eh is a nastily complex state machine in its own right). What do
#transports that use eh_strategy_handler do about all of this?
/me scared...
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Is this step in the right direction? Moved autosuspend from
scsi_host_template to scsi_transport_template...
Pavel
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index e4864d9..2b8cf09 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -320,14 +320,14 @@ static struct device_attribute *ahci_sde
};
struct pci_dev *my_pdev;
-int autosuspend_enabled = 0; /* HERE */
+int autosuspend_enabled = 1; /* HERE */
struct sleep_disabled_reason ahci_active = {
"ahci"
};
/* The host and its devices are all idle so we can autosuspend */
-static int autosuspend(struct Scsi_Host *host)
+int ahci_autosuspend(struct Scsi_Host *host)
{
if (my_pdev && autosuspend_enabled) {
printk("ahci: should autosuspend\n");
@@ -340,7 +340,7 @@ static int autosuspend(struct Scsi_Host
}
/* The host needs to be autoresumed */
-static int autoresume(struct Scsi_Host *host)
+int ahci_autoresume(struct Scsi_Host *host)
{
if (my_pdev && autosuspend_enabled) {
printk("ahci: should autoresume\n");
@@ -360,8 +360,8 @@ static struct scsi_host_template ahci_sh
.sg_tablesize = AHCI_MAX_SG,
.dma_boundary = AHCI_DMA_BOUNDARY,
.shost_attrs = ahci_shost_attrs,
- .autosuspend = autosuspend,
- .autoresume = autoresume,
+// .autosuspend = autosuspend,
+// .autoresume = autoresume,
.sdev_attrs = ahci_sdev_attrs,
};
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9d3ba4..d3526a0 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -103,6 +103,9 @@ static const u8 def_control_mpage[CONTRO
0, 30 /* extended self test time, see 05-359r1 */
};
+int ahci_autosuspend(struct Scsi_Host *host);
+int ahci_autoresume(struct Scsi_Host *host);
+
/*
* libata transport template. libata doesn't do real transport stuff.
* It just needs the eh_timed_out hook.
@@ -111,6 +114,8 @@ static struct scsi_transport_template at
.eh_strategy_handler = ata_scsi_error,
.eh_timed_out = ...
Actually a command set driver like sd surely wants last_busy (time of last use) separate for each LU for auto-spindown, doesn't it? include/scsi/scsi_tgt*.h are for local target implementations. The representation of "remote" targets, as seen by local initiators, is include/scsi/scsi_device.h's struct scsi_target. -- Stefan Richter -=====-==--- =--- -===- http://arcgraph.de/sr/ --
Would it make sense to split the patches into "autosuspend for SCSI devices" and "autosuspend for SCSI controllers" for easier review/merge? I guess I'll start with the devices, they seem easier... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Runtime PM will have to leave devices alone while error handling is active. Unfortunately error handling is done at controller level. So I am afraid this would be very difficult. Regards Oliver --
It really should be split up differently. The topics of interest are: idle detection for SCSI devices, autosuspend for SCSI targets and its relation to suspend for SCSI devices, passing suspend & resume notifications to the transport class, adding a USB transport class so that usb-storage can respond to those notifications, modifying other transport classes as needed so that the LLDs can power-down links or host adapters. I think that covers it. Oliver may have some additional suggestions. Alan Stern --
