Dear me, I totally forgot about that, didn't I. Anyway, I meant to ask you about that when you mentioned it the last time round, so thanks for Well, I can see your point. Technically, we are talking about magnitudes in the order of seconds rather than milliseconds here because the specs only guarantee command completion for head unload in 300 or even 500 msecs. This means that the daemon should always schedule timeouts well above this limit. That's the reason why we have only accepted timeouts in seconds rather than milliseconds at the user's request. When reading from sysfs, we have returned seconds for consistency. I'm a bit torn between the options now: 1. Switch the interface completely to msecs: consistent with the rest of libata but slightly misleading because it may promise more accuracy than we can actually provide for; 2. keep it the way it was (i.e. seconds on read and write): we don't promise too much as far as accuracy is concerned, but it is inconsistent with the rest of libata. Besides, user space can still issue a 0 and another nonzero timeout within a very short time and we don't protect against that anyway; 3. only switch to msecs on read: probably the worst of all options. Well, it is strange, but it pretty much reflects reality as close as it can get. Devices can only opt in / out of actually issuing the unload command but they will always stop I/O and thus be affected by the First of all, we cannot do a proper per-dev implementation internally. Admittedly, we could do it per-link rather than per-port, but the point I'm making is this: there really is just *one* grobal timeout (per-port now or perhaps per-link in the long run). The confusing thing right now is that you can read the current timeout on any device, but you can only set a timeout on a device that actually supports head unloading. Perhaps we should return something like "n/a" when reading the sysfs attribute for a device that doesn't support head unloads, even ...
My favorite is #1. Millisecond is small amount of time but it's also not hard to imagine some future cases where, say, 0.5 sec of Not yet but I think we should move toward per-queue EH which will enable fine-grained exception handling like this. Such approach would also help things like ATAPI CHECK_SENSE behind PMP. I think it's better to define the interface which suits the problem best rather If the timeout is global, it's best to have one knob. If the timeout is per-port, it's best to have one knob per-port, and so on. I can't think of a good reason to implement per-port timeout with per-device opt out instead of doing per-device timeout from the beginning. It just doesn't make much sense interface-wise to me. As this is an interface which is gonna stick around for a long time, I really think it should be done as straight forward as possible even though the current implementation of the feature has to do it in more crude manner. Thanks. -- tejun --
Does the following patch look like what you've had in mind (still
applies to next-20080903)?
Regards,
Elias
Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---
drivers/ata/ahci.c | 1
drivers/ata/libata-eh.c | 89 ++++++++++++++++++++++++++++++++++++-
drivers/ata/libata-scsi.c | 109 +++++++++++++++++++++++++++++++++++++++++++++
drivers/ata/libata.h | 1
include/linux/libata.h | 12 ++++-
5 files changed, 209 insertions(+), 3 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
static struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
+ &dev_attr_unload_heads,
NULL
};
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..c1a4060 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2447,6 +2447,79 @@ int ata_eh_reset(struct ata_link *link, int classify,
goto retry;
}
+static unsigned long ata_eh_park_devs(struct ata_port *ap)
+{
+ struct ata_link *link;
+ struct ata_device *dev;
+ struct ata_taskfile tf;
+ unsigned int err_mask;
+ unsigned long deadline = jiffies;
+
+ ata_port_for_each_link(link, ap) {
+ ata_link_for_each_dev(dev, link) {
+ struct ata_eh_context *ehc = &link->eh_context;
+ struct ata_eh_info *ehi = &link->eh_info;
+
+ if (dev->class != ATA_DEV_ATA ||
+ dev->flags & ATA_DFLAG_NO_UNLOAD)
+ continue;
+
+ if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
+ ehi->dev_action[dev->devno] & ATA_EH_PARK) {
+ unsigned long tmp = dev->unpark_deadline;
+
+ if (time_before(deadline, tmp))
+ deadline = tmp;
+ else if (time_before_eq(tmp, jiffies))
+ continue;
+ }
+
+ if (ehc->did_unload_mask & (1 << dev->devno))
+ continue;
+
+ ata_tf_init(dev, &tf);
+ tf.command = ATA_CMD_IDLEIMMEDIATE;
+ tf.feature = ...Hello, Elias. The correct way to do this is ata_eh_about_to_do(). After that, you can just look at ehc->i.dev_action[]. Also, you'll need to call And it's probably better to have ehc->unloaded_mask instead of ehc->did_unload_mask and clear it here so that if unload is scheduled after this point but before EH completes, it does unloading again. ie. Something like the following. ata_eh_done(ATA_EH_UNLOAD); I think it would be better to put timeout computation and handling out here instead of inside ata_eh_park_devs(). ata_eh_park_devs() just parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it Can't we just drop ATA_DFLAG_NO_UNLOAD? It doesn't provide any real functionality anymore. Thanks. -- tejun --
We have a problem here, I'm afraid, because we may keep looping in EH context and still want to pick up ATA_EH_PARK requests. Imagine that ATA_EH_PARK has been scheduled for device A and the EH thread has reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK is scheduled for device B on the same port. This will wake up the EH thread, but ATA_EH_PARK is only recorded in link->eh_info, not in link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the flag in eh_info, but checking ehc->i.dev_action afterwards will only tell us whether this flag was set when we entered EH, not whether it had been set since. Should I change ata_eh_about_to_do() so that it will record the action No need for that because link->eh_context is cleared in I was afraid you'd say something like that in the end ;-). Well, we can't. We really should only issue the unload command if we know that it's safe, i.e., the device supports that feature. We assume it to be safe if ata_id_has_unload() returns true or if the user told us that the device does support the command. ATA_DFLAG_NO_UNLOAD is initialised during device setup by ata_id_has_unload(). For pre-ATA-7 devices (like mine), the user can manually clear that flag afterwards. Regards, Elias --
That's what ata_eh_about_to_do() currently does, exactly. Actually, that's the whole reason it's there as the described problem exists for No, for example, later steps of EH could fail in which case eh_recover Oh I see, so it's initialized during dev_configure (I missed that) and the user needs to be able to override it. Alright, no objection then. Thanks. -- tejun --
Sounds reasonable enough. Much as I regret it, though, I really can't find that this is what actually happens. Where exactly is the action propagated from ehi to ehc->i? (Checked next-20080903, v2.6.27-rc5 and v2.6.26). On another matter: I don't particularly like the idea that there should appear an "EH complete" in the logs every time a head unload request has been processed. Is it safe to set ATA_EHI_QUIET when scheduling unload Alright then. Regards, Elias --
Oops, that was me being stupid. I can't find it either. Right, it's
never pulled in as for all other actions, it's enough to make sure
that EH is repeated if an action gets scheduled after
ata_eh_about_to_do(). Sorry about the confusion. Can you please use
the following function before ata_eh_about_to_do()?
static void ata_eh_pull_action(struct ata_link *link, struct ata_device
*dev,
unsigned int action)
{
...
struct ata_eh_info *ehi = &link->eh_info;
struct ata_eh_context *ehc = &link->eh_context;
...
spin_lock_irqsave(ap->lock, flags);
if (dev)
ehc->i.dev_action[dev->devno] |=
ehi->dev_action[dev->devno] & action;
ehc->i.action |= ehi->action & action;
spin_unlock_irqrestore(ap->lock, flags);
}
And add comment explaning why the operation is needed for unload
Hmmm... ATA_EHI_QUIET masks all EH reporting and as error conditions
are not unlikely under physical shocks, I don't think suppressing them
all is a good idea. How about adding ATA_EH_QUIET_MASK or a boolean
parameter to ata_eh_about_to_do() such that unload action doesn't set
RECOVERED flag?
Thanks. :-)
--
tejun
--
We mustn't release the lock between pulling the actions into eh_context
and clearing eh_info, so I've designed ata_eh_pull_action() to be used
instead of ata_eh_about_to_do().
What about the following patch? Also, I've slipped in a minor comment
fix to ata_eh_done() which probably doesn't warrant a separate patch; on
the other hand, mixing things like that isn't quite the right thing
either, so, perhaps I should drop it in the final version?
Regards,
Elias
From: Elias Oltmanns <eo@nebensachen.de>
Subject: [PATCH] libata: Implement disk shock protection support
On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.
Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---
drivers/ata/ahci.c | 1
drivers/ata/libata-eh.c | 122 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/ata/libata-scsi.c | 109 ++++++++++++++++++++++++++++++++++++++++
drivers/ata/libata.h | 1
include/linux/libata.h | 12 ++++
5 files changed, 241 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
static struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
+ &dev_attr_unload_heads,
NULL
};
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..04b762d 100644
--- ...Hmmm... I think either way is okay but there's no harm in splitting Acked-by: Tejun Heo <tj@kernel.org> -- tejun --
Ah.. just one more thing. I think it would be easier on the application if the written timeout value is cropped if it's over the maximum instead of failing the write. Thanks. -- tejun --
Which is better, failing the write so the application *knows* there is a problem, or letting the application proceed with a totally incorrect idea of what the value is set to? For instance, what happens if the program tries to set 100, it's silently clamped to 10, and it then tries to set a timer for itself to '90% of the value'? It might be in for an unpleasant surprise when it finds out that it's overshot by 81....
It depends. As -EINVAL either results in program failure or no Hitting the limit would be a pretty rare occasion and which way we go it's not gonna be too pretty. e.g. Let's say a program calculates timeout according to some algorithm which 99.9% of the time stays in the limit but once in the blue moon hits the ceiling. Given the characteristics of the problem and very high limit value, I think it's better to have cropped value. How about returning -OVERFLOW while still setting the timeout to the maximum? Thanks. -- tejun --
Yes, that makes sense. I'll take care to document it that way: -EINVAL for values < -2, -EOVERFLOW for values > 30000. Once we have smoothed things out wrt the ide patch, I'll repost the whole series against a recent linux-next tree, so it can be queued up for 2.6.28. By the way, the ide patch I've just posted does pretty much what libata should be aiming for in the long run. Hopefully, it'll work out as expected. Anyway, I'm really grateful to Bart and you for holding my hand so patiently with all this. Also, I'll backport the patches to 2.6.27 and make appropriate changes to hdapsd, so we can get feedback from a wider range of testers. Since there has been no comment from Jeff as yet, I'll wait with that until the patches have been merged though. Regards, Elias --
