Re: [PATCH 2/4] libata: Implement disk shock protection support

Previous thread: frame unwinder patches by Bernd Schubert on Thursday, September 4, 2008 - 9:46 am. (12 messages)

Next thread: Building Kernel with -O0 by Keith A. Prickett on Thursday, September 4, 2008 - 10:50 am. (10 messages)
From: Elias Oltmanns
Date: Thursday, September 4, 2008 - 10:32 am

Dear me, I totally forgot about that, didn't I. Anyway, I meant to ask
you about that when you mentioned it the last time round, so thanks for


Well, I can see your point. Technically, we are talking about magnitudes
in the order of seconds rather than milliseconds here because the specs
only guarantee command completion for head unload in 300 or even 500
msecs. This means that the daemon should always schedule timeouts well
above this limit. That's the reason why we have only accepted timeouts
in seconds rather than milliseconds at the user's request. When reading
from sysfs, we have returned seconds for consistency. I'm a bit torn
between the options now:

1. Switch the interface completely to msecs: consistent with the rest of
   libata but slightly misleading because it may promise more accuracy
   than we can actually provide for;
2. keep it the way it was (i.e. seconds on read and write): we don't
   promise too much as far as accuracy is concerned, but it is
   inconsistent with the rest of libata. Besides, user space can still
   issue a 0 and another nonzero timeout within a very short time and we
   don't protect against that anyway;
3. only switch to msecs on read: probably the worst of all options.






Well, it is strange, but it pretty much reflects reality as close as it
can get. Devices can only opt in / out of actually issuing the unload
command but they will always stop I/O and thus be affected by the

First of all, we cannot do a proper per-dev implementation internally.
Admittedly, we could do it per-link rather than per-port, but the point
I'm making is this: there really is just *one* grobal timeout (per-port
now or perhaps per-link in the long run). The confusing thing right now
is that you can read the current timeout on any device, but you can only
set a timeout on a device that actually supports head unloading. Perhaps
we should return something like "n/a" when reading the sysfs attribute
for a device that doesn't support head unloads, even ...
From: Tejun Heo
Date: Friday, September 5, 2008 - 1:51 am

My favorite is #1.  Millisecond is small amount of time but it's also
not hard to imagine some future cases where, say, 0.5 sec of

Not yet but I think we should move toward per-queue EH which will
enable fine-grained exception handling like this.  Such approach would
also help things like ATAPI CHECK_SENSE behind PMP.  I think it's
better to define the interface which suits the problem best rather

If the timeout is global, it's best to have one knob.  If the timeout
is per-port, it's best to have one knob per-port, and so on.  I can't
think of a good reason to implement per-port timeout with per-device
opt out instead of doing per-device timeout from the beginning.  It
just doesn't make much sense interface-wise to me.  As this is an
interface which is gonna stick around for a long time, I really think
it should be done as straight forward as possible even though the
current implementation of the feature has to do it in more crude
manner.

Thanks.

-- 
tejun
--

From: Elias Oltmanns
Date: Wednesday, September 10, 2008 - 6:53 am

Does the following patch look like what you've had in mind (still
applies to next-20080903)?

Regards,

Elias

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    1 
 drivers/ata/libata-eh.c   |   89 ++++++++++++++++++++++++++++++++++++-
 drivers/ata/libata-scsi.c |  109 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    1 
 include/linux/libata.h    |   12 ++++-
 5 files changed, 209 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..c1a4060 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2447,6 +2447,79 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	goto retry;
 }
 
+static unsigned long ata_eh_park_devs(struct ata_port *ap)
+{
+	struct ata_link *link;
+	struct ata_device *dev;
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+	unsigned long deadline = jiffies;
+
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			struct ata_eh_context *ehc = &link->eh_context;
+			struct ata_eh_info *ehi = &link->eh_info;
+
+			if (dev->class != ATA_DEV_ATA ||
+			    dev->flags & ATA_DFLAG_NO_UNLOAD)
+				continue;
+
+			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
+			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
+				unsigned long tmp = dev->unpark_deadline;
+
+				if (time_before(deadline, tmp))
+					deadline = tmp;
+				else if (time_before_eq(tmp, jiffies))
+					continue;
+			}
+
+			if (ehc->did_unload_mask & (1 << dev->devno))
+				continue;
+
+			ata_tf_init(dev, &tf);
+			tf.command = ATA_CMD_IDLEIMMEDIATE;
+			tf.feature = ...
From: Tejun Heo
Date: Wednesday, September 10, 2008 - 7:40 am

Hello, Elias.



The correct way to do this is ata_eh_about_to_do().  After that, you
can just look at ehc->i.dev_action[].  Also, you'll need to call

And it's probably better to have ehc->unloaded_mask instead of
ehc->did_unload_mask and clear it here so that if unload is scheduled
after this point but before EH completes, it does unloading again.
ie. Something like the following.

	ata_eh_done(ATA_EH_UNLOAD);

I think it would be better to put timeout computation and handling out
here instead of inside ata_eh_park_devs().  ata_eh_park_devs() just
parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it

Can't we just drop ATA_DFLAG_NO_UNLOAD?  It doesn't provide any real
functionality anymore.

Thanks.

-- 
tejun
--

From: Elias Oltmanns
Date: Wednesday, September 10, 2008 - 12:28 pm

We have a problem here, I'm afraid, because we may keep looping in EH
context and still want to pick up ATA_EH_PARK requests. Imagine that
ATA_EH_PARK has been scheduled for device A and the EH thread has
reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
is scheduled for device B on the same port. This will wake up the EH
thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
flag in eh_info, but checking ehc->i.dev_action afterwards will only
tell us whether this flag was set when we entered EH, not whether it had
been set since.

Should I change ata_eh_about_to_do() so that it will record the action

No need for that because link->eh_context is cleared in


I was afraid you'd say something like that in the end ;-). Well, we
can't. We really should only issue the unload command if we know that
it's safe, i.e., the device supports that feature. We assume it to be
safe if ata_id_has_unload() returns true or if the user told us that the
device does support the command. ATA_DFLAG_NO_UNLOAD is initialised
during device setup by ata_id_has_unload(). For pre-ATA-7 devices (like
mine), the user can manually clear that flag afterwards.

Regards,

Elias
--

From: Tejun Heo
Date: Wednesday, September 10, 2008 - 1:23 pm

That's what ata_eh_about_to_do() currently does, exactly.  Actually,
that's the whole reason it's there as the described problem exists for

No, for example, later steps of EH could fail in which case eh_recover

Oh I see, so it's initialized during dev_configure (I missed that) and
the user needs to be able to override it.  Alright, no objection then.

Thanks.

-- 
tejun
--

From: Elias Oltmanns
Date: Wednesday, September 10, 2008 - 2:04 pm

Sounds reasonable enough. Much as I regret it, though, I really can't
find that this is what actually happens. Where exactly is the action
propagated from ehi to ehc->i? (Checked next-20080903, v2.6.27-rc5 and
v2.6.26).

On another matter: I don't particularly like the idea that there should
appear an "EH complete" in the logs every time a head unload request has
been processed. Is it safe to set ATA_EHI_QUIET when scheduling unload

Alright then.

Regards,

Elias
--

From: Tejun Heo
Date: Wednesday, September 10, 2008 - 3:56 pm

Oops, that was me being stupid.  I can't find it either.  Right, it's
never pulled in as for all other actions, it's enough to make sure
that EH is repeated if an action gets scheduled after
ata_eh_about_to_do().  Sorry about the confusion.  Can you please use
the following function before ata_eh_about_to_do()?

static void ata_eh_pull_action(struct ata_link *link, struct ata_device
*dev,
			       unsigned int action)
{
	...
	struct ata_eh_info *ehi = &link->eh_info;
	struct ata_eh_context *ehc = &link->eh_context;
	...

	spin_lock_irqsave(ap->lock, flags);

	if (dev)
		ehc->i.dev_action[dev->devno] |=
			ehi->dev_action[dev->devno] & action;
	ehc->i.action |= ehi->action & action;

	spin_unlock_irqrestore(ap->lock, flags);
}

And add comment explaning why the operation is needed for unload

Hmmm... ATA_EHI_QUIET masks all EH reporting and as error conditions
are not unlikely under physical shocks, I don't think suppressing them
all is a good idea.  How about adding ATA_EH_QUIET_MASK or a boolean
parameter to ata_eh_about_to_do() such that unload action doesn't set
RECOVERED flag?

Thanks.  :-)

-- 
tejun
--

From: Elias Oltmanns
Date: Thursday, September 11, 2008 - 5:26 am

We mustn't release the lock between pulling the actions into eh_context
and clearing eh_info, so I've designed ata_eh_pull_action() to be used
instead of ata_eh_about_to_do().

What about the following patch? Also, I've slipped in a minor comment
fix to ata_eh_done() which probably doesn't warrant a separate patch; on
the other hand, mixing things like that isn't quite the right thing
either, so, perhaps I should drop it in the final version?

Regards,

Elias

From: Elias Oltmanns <eo@nebensachen.de>
Subject: [PATCH] libata: Implement disk shock protection support

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    1 
 drivers/ata/libata-eh.c   |  122 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/ata/libata-scsi.c |  109 ++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    1 
 include/linux/libata.h    |   12 ++++
 5 files changed, 241 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..04b762d 100644
--- ...
From: Tejun Heo
Date: Thursday, September 11, 2008 - 5:51 am

Hmmm... I think either way is okay but there's no harm in splitting

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun
--

From: Tejun Heo
Date: Thursday, September 11, 2008 - 6:01 am

Ah.. just one more thing.

I think it would be easier on the application if the written timeout
value is cropped if it's over the maximum instead of failing the
write.

Thanks.

-- 
tejun
--

From: Valdis.Kletnieks
Date: Thursday, September 11, 2008 - 11:28 am

Which is better, failing the write so the application *knows* there is a
problem, or letting the application proceed with a totally incorrect idea of
what the value is set to?

For instance, what happens if the program tries to set 100, it's silently
clamped to 10, and it then tries to set a timer for itself to '90% of the
value'?  It might be in for an unpleasant surprise when it finds out that
it's overshot by 81....


From: Tejun Heo
Date: Thursday, September 11, 2008 - 4:25 pm

It depends.  As -EINVAL either results in program failure or no

Hitting the limit would be a pretty rare occasion and which way we go
it's not gonna be too pretty.  e.g. Let's say a program calculates
timeout according to some algorithm which 99.9% of the time stays in
the limit but once in the blue moon hits the ceiling.  Given the
characteristics of the problem and very high limit value, I think it's
better to have cropped value.

How about returning -OVERFLOW while still setting the timeout to the
maximum?

Thanks.

-- 
tejun
--

From: Elias Oltmanns
Date: Friday, September 12, 2008 - 3:15 am

Yes, that makes sense. I'll take care to document it that way:
-EINVAL for values < -2, -EOVERFLOW for values > 30000.

Once we have smoothed things out wrt the ide patch, I'll repost the
whole series against a recent linux-next tree, so it can be queued up
for 2.6.28. By the way, the ide patch I've just posted does pretty much
what libata should be aiming for in the long run. Hopefully, it'll work
out as expected. Anyway, I'm really grateful to Bart and you for holding
my hand so patiently with all this.

Also, I'll backport the patches to 2.6.27 and make appropriate changes
to hdapsd, so we can get feedback from a wider range of testers. Since
there has been no comment from Jeff as yet, I'll wait with that until
the patches have been merged though.

Regards,

Elias
--

From: Valdis.Kletnieks
Date: Friday, September 12, 2008 - 11:11 am

Works for me...
Previous thread: frame unwinder patches by Bernd Schubert on Thursday, September 4, 2008 - 9:46 am. (12 messages)

Next thread: Building Kernel with -O0 by Keith A. Prickett on Thursday, September 4, 2008 - 10:50 am. (10 messages)