Right, will do. Maybe I'm missing something but I don't see how I could use msleep() Well, I really am concerned about dev->sdev. So far, I haven't quite figured out yet whether under which circumstances I can safely assume that the scsi counter part of dev including the block layer request queue has been completely set up and configured so there won't be any null pointer dereferences. However, if you think that I needn't bother with stopping the request queue anyway, checking for ATA_DEV_ATA (what Quite right. It's just that it will be un- and replugged every (3 * HZ) / 1000, so I thought it might be worthwhile to stop the queue anyway. Perhaps it really isn't worth bothering and the code would Just to be sure, are you just referring to the queue lock, or to the host lock as well? Am I wrong in thinking that we have to protect all access to dev->flags because bit operations are performed non atomically Probably, I should insert a comment somewhere. The problem is that device internal power management will be disabled until the next command is received. If you have laptop mode enabled and the device has received the unload command while spinning with no more commands in the queue to follow, the device may keep spinning for quite a while and won't go into standby which rather defeats the purpose of laptop mode. This behaviour Since it's a user process that controls when we have to keep the heads off the platter, a suspend operation has to be blocked *before* process Ah, I can see that this while loop can replace my call to wait_event() in the eh sequence earlier on. However, I wonder how I am to replace the call to mod_timer(). As you can see, we perform different actions depending on whether the timer has merely been updated, or a new timer has been started. Only in the latter case we want to schedule eh. In order to achieve the equivalent in your setting while preventing any races, I'd have to protect the deadline field in struct ata_port by the host lock, ...
Hello,
Ah.. you need to part ATAPI too? If so, just test for
ata_dev_enabled(). One way or the other, there's no need to care
While the EH is running, nothing gets through other than commands
issued by EH itself, so no need to worry about how upper layers would
Yes, when modifying the flags. You don't need to when testing a
Ah.. Okay. I somehow thought the command would spin down the disk.
Can you please elaborate a bit? The reloading is done by the kernel
after a timeout, right? What kind of precarious situation can the
You can just do
spin_lock_irq(ap->lock);
ap->deadline = jiffies + timeout;
ap->link.eh_info.action |= ATA_EH_PARK;
ata_port_schedule_eh(ap);
Yeah, this looks about right, but you can make it a bit simpler...
while (time_before((now = jiffies), ap->deadline))
schedule_timeout_uninterruptible(ap->deadline - now);
As locking on reader side doesn't mean much in cases like this. The
deadline update which triggered EH is guaranteed to be visible and the
the window between waking up from schedule_timeout and ap->deadline
Oh.. what I meant was whether we need a separate sysfs node to
indicate whether unload feature is enabled or not but now I come to
think about it, that is per-device and the timer is per-port. :-)
Thanks.
--
tejun
--
Well, I'm not quite sure really. Perhaps you are right and I'd better leave ATAPI alone, especially given the problem that the unload command might mess up a CD/DVD write operation. As long as no laptop HDs are Right. Sorry, I haven't expressed myself very clearly there, it seems. The user space process detects some acceleration and starts writing timeouts to the sysfs file. This causes the unload command to be issued to the device and stops all I/O until the user space daemon decides that the danger has passed, writes 0 to the sysfs file and leaves it alone afterwards. Now, if the daemon happens to request head parking right at the beginning of a suspend sequence, this means that we are in danger of falling, i.e. we have to make sure that I/O is stopped until that user space daemon gives the all-clear. However, suspending implies freezing all processes which means that the daemon can't keep checking and signalling to the kernel. The last timeout received before the daemon has been frozen will expire and the suspend procedure goes ahead. By means of the notifiers we can make sure that suspend is blocked until Please note that I want to schedule EH (and thus the head unload - check power command sequence) only once in the event of overlapping timeouts. For instance, when the daemon sets a timeout of 2 seconds and does so again after one second has elapsed, I want the following to happen: [ Daemon writes 2 to the unload_heads file ] 1. Set timer / deadline; 2. eh_info.action |= ATA_EH_PARK; 3. schedule EH; 4. execute EH and sleep, waiting for the timeout to expire; [ Daemon writes 2 to the unload_heads file before the previous timeout has expired. ] 5. update timer / deadline; 6. the EH thread keeps blocking until the new timeout has expired. In particular, I don't want to reschedule EH in response to the second write to the unload_heads file. Also, we have to consider the case where the daemon signals to resume I/O prematurely by writing a timeout of ...
Hello, Is it really worth protecting against that? What if the machine started to fall after the userland tasks have been frozen? And how long the usual timeout would be? If the machine has been falling for 10 secs, there really isn't much point in trying to protect anything unless there also is ATA DEPLOY PARACHUTE command somewhere in the new revision. In libata, as with any other exceptions, suspend/resume are handled by EH so while emergency head unload is in progress, suspend won't commence which is about the same effect as the posted code sans the timeout extension part. I don't really think there's any significant danger in not being able to extend timeout while suspend is in progress. It's not a very big window after all. If you're really worried about it, you can also let libata reject suspend if head unload is in progress. Also, the suspend operation is unloading the head and spin down the drive which sound like a good thing to do before crashing. Maybe we can modify the suspend sequence such that it always unload the head first and then issue spindown. That will ensure the head is in safe position as soon as possible. If it's done this way, it'll be probably a good idea to split unloading and loading operations and do loading only when EH is being finished and the disk is not spun down. To me, much more serious problem seems to be during hibernation. The kernel is actively writing memory image to userland and it takes quite Whether EH is scheduled multiple times or not doesn't matter at all. EH can be happily scheduled without any actual action to do and that does happen from time to time due to asynchronous nature of events. libata EH doesn't have any problem with that. The only thing that's required is there's at least one ata_schedule_eh() after the latest EH-worthy event. So, the simpler code might enter EH one more time once in the blue moon, but it won't do any harm. EH will just look around and realize that there's nothing much to do ...
Hi, Seconded. To be honest I also don't like the change to issue UNLOAD to all devices on the port (it only needlessly increases complexity right now since the _only_ use case at the moment is ThinkPad w/ hdaps + 1 HD). Which also brings again the question whether it is really the best to use user-space solution instead of kernel thread? After taking the look into the deamon program and hdaps driver I tend to "Nope." answer. The kernel-space solution would be more reliable, should result in significatly less code and would free us from having a special purpose libata/ide interfaces. It should also make the maintainance and future enhancements (i.e. making hibernation unload friendly) a lot easier. I imagine that this comes a bit late but can we at least give it an another thought, please? Thanks, Bart --
We are not just protecting against free fall. Think of a series of minor
Personaaly, I'm very much against plain rejection because of the odd
head unload. A delay in the suspend procedure, on the other hand, is
Well, scsi midlayer will also issue a flush cache command. Besides, with
previous implementations I have observed occasional lock ups when
suspending while the unload timer was running. Once we have settled the
timer vs deadline issue, I'm willing to do some more investigation in
this area if you really insist that pm_notifiers should be avoided. But
then I am still not too sure about your reasoning and do feel happier
That's right. The first requirement to protect against this problem is
to have the policy all in kernel space which isn't going to happen for
some time yet. This really is a *best effort* solution rather than a
The whole EH machinery is a very complex beast. Any user of the
emergency head park facility has a particular interest that the system
spends as little time as possible in the EH code even if it's real error
recovery that matters most. Perhaps we could agree on the following
compromise:
spin_lock_irq(ap->lock);
old_deadline = ap->deadline;
ap->deadline = jiffies + timeout;
if (old_deadline < jiffies) {
ap->link.eh_info.action |= ATA_EH_PARK;
ata_port_schedule_eh(ap);
}
spin_unlock_irq(ap->lock);
There is still a race but it is very unlikely to trigger.
Still, you have dismissed my point about the equivalent of stopping a
running timer by specifying a 0 timeout. In fact, whenever the new
deadline is *before* the old deadline, we have to signal the sleeping EH
thread to wake up in time. This way we end up with something like
wait_event().
Regards,
Elias
--
Hello, I'm not particularly against pm_notifiers. I just can't see what advantages it have given the added complexity. The only race window it closes is the one between suspend start and userland task freeze, which is a relatively short one and there are other much larger gaping holes there, so I don't really see much benefit in the particular pm_notifiers usage. Maybe we need to keep the task running till the power is pulled? If we can do that in sane manner which is no easy Yeap, it is. I just thought pm_notifiers bit didn't really contribute The logic is quite complex due to the wonderful ATA but for users requesting actions, it really is quite simple. Well, at least I think Really, it doesn't matter at all. That's just an over optimization. The whole EH machinery pretty much expects spurious EH events and schedules and deals with them quite well. No need to add extra Yes, right, reducing the timeout. How about doing the following? wait_event(ata_scsi_part_wq, time_before(jiffies, ap->unload_deadline)); Heh... then again, it's not much different from your original code. I won't object strongly to the original code but I still prefer just setting deadline and kicking EH from the userside instead of directly manipulating the timer. That way, implementation is more separate from the interface and to me it seems easier to follow the code. Thanks. -- tejun --
I doubt that is feasible without substantial work especially as long as part of it all is implemented in user space. With regard to the pm_notifiers, I'll try to figure out exactly what the problem was without them (I thought it was related to timers not firing anymore after process freezing, but Pavel doubts that and I'm not too sure anymore). Yes, you would ;-). Seriously though, I do agree that it is easy for ... because of its complexity it is hard for me to estimate timing impacts and constraints. The questions I'm concerned about are: What is the average and worst case time it takes to get the heads parked and what can I do if not to improve either of them, then at least not to make things worse. In particular, I don't really have a clue about how much time it takes to go through EH if no action is requested in comparison to, say, the average read / write command. Obviously, I don't want to schedule EH unnecessarily if that would mean that I won't be able to issue another head unload for considerably longer than during normal I/O or, indeed, on an idle system. Arguably, I don't even want to do anything that causes more logging than absolutely necessary because this will ultimately result in the disk spinning up from standby. But then I believe that I only came across this logging issue when I was still playing around with eh_revalidate and the like. So, can you set my mind at rest that timing is no issue with spurious EH sequences? Now that I come to think of it, I suppose it would harm performance anyway, so That's fine with me. All I want is that my code doesn't end up leaving the system in an unresponsive state (to a head unload request, that is) more often than before by spuriously scheduling EH. If that is not a problem, I'm content. Regards, Elias --
For STR, I think it shouldn't be too difficult. For STD, it's more difficult but it would be easy if we used a kexec kernel for Hmm... I don't think pm notifiers would be necessary to prevent things I can't tell you the exact delays, but sans times for actual actions, there is no real delay. It just involves scheduling a few times and jumps through various functions in EH. I don't think that would be anything measureable. Also, generating duplicate events. Events would be duplicate only when those events occur during EH is in progress, right? In that case, as EH finishes it would see that there's another EH action requested and re-enter EH. The second invocation of EH would go through the diagnostic steps (doesn't involve issuing any command, just checks data structures) and find out that there's nothing to do and just exit. For those bogus runs, EH won't print out anything either. So, really, nothing to worry about there. If that's not enough assurance, even ATAPI CHECK SENSE is done via EH. That is, many ATAPI commands invoke EH after completion but nobody No, it won't leave the system unresponsive to unload request or anything else. If it does, it's a bug in EH core and should be fixed. So, there should be no problem in making the implementation simple. Thanks. -- tejun --
As yet, I haven't been able to reproduce the problems with an
implementation without a dedicated park_timer. So, I'm fine with
Yes, but that will be the rule rather than the exception because the
daemon will hardly ever let the timeout expire. Eitehr it decides that
it needs to extend the timeout, or it recons that everything's going to
be alright and issues a 0 timeout in order to resume normal operation
Right then. Here is another patch where, hopefully, most of your
concerns have been addressed. Please tell me what you make of it
(applies to next-20080903).
Regards,
Elias
---
drivers/ata/ahci.c | 1 +
drivers/ata/ata_piix.c | 6 +++
drivers/ata/libata-eh.c | 48 ++++++++++++++++++++++++++
drivers/ata/libata-scsi.c | 84 +++++++++++++++++++++++++++++++++++++++++++++
drivers/ata/libata.h | 1 +
include/linux/libata.h | 4 ++
6 files changed, 144 insertions(+), 0 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
static struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
+ &dev_attr_unload_heads,
NULL
};
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b1d08a8..1b470ad 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -298,8 +298,14 @@ static struct pci_driver piix_pci_driver = {
#endif
};
+static struct device_attribute *piix_sdev_attrs[] = {
+ &dev_attr_unload_heads,
+ NULL
+};
+
static struct scsi_host_template piix_sht = {
ATA_BMDMA_SHT(DRV_NAME),
+ .sdev_attrs = piix_sdev_attrs,
};
static struct ata_port_operations piix_pata_ops = {
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..7754f32 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2447,6 +2447,40 @@ int ata_eh_reset(struct ata_link *link, int ...Oh.. I see. Either way, it should be fine. There is no reason to Hmm... I meant more like extern struct device_attribute **libata_sdev_attrs; #define ATA_BASE_SHT(name) \ .... .sdev_attrs = libata_sdev_attrs; \ .... Which will give unload_heads to all libata drivers. As ahci needs its Nitpicking: Do you mind taking the schedule_timeout out of the while condition? It's just not very customary to put a statement with that level of side effect into a condition clause. Also, it would force Isn't seconds a bit too crude? Or it just doesn't matter as it's usually adjusted before expiring? For most time interval values (except for transfer timings of course) in ATA land, millisecs seem to ata_scsi_find_dev() should be inside ap->lock. Looking through the You'll probably want to use spin_lock_irqsave and restore. It's a It doesn't really matter as all these are under the lock but maybe moving ata_port_schedule_eh() below unpark_deadline is a good idea just for clarification - you know, set the state and trigger the Hmmm... Sorry to bring another issue with it but I think the interface is a bit convoluted. The unpark node is per-dev but the action is per-port but devices can opt out by writing -2. Also, although the sysfs nodes are per-dev, writing to a node changes the value of park node in the device sharing the port except when the value is -1 or -2. That's strange, right? How about something like the following? * In park_store: set dev->unpark_timeout, kick and wake up EH. * In park EH action: until the latest of all unpark_timeout are passed, park all drives whose unpark_timeout is in future. When none of the drives needs to be parked (all timers expired), the action completes. * There probably needs to be a flag to indicate that the timeout is valid; otherwise, we could get spurious head unparking after jiffies wraps (or maybe just use jiffies_64?). With something like the above, the interface is cleanly per-dev ...
