Calling ap->ops->set_piomode(ap, dev) on a device/controller which got
already removed, locks the system hard. Reproducibly on an X60 attached to
a dock station containing a cdrom device with doing
$ echo 1 > /sys/devices/platform/dock.0/undock && echo 123 > /dev/sr0
This calls ata_eh_reset(...) which in turn tries to force PIO mode 0. But
the device is already gone.
Bisecting revealed the following commit as culprit:
commit cdeab1140799f09c5f728a5ff85e0bdfa5679cd2
Author: Tejun Heo <htejun@gmail.com>
Date: Mon Oct 29 16:41:09 2007 +0900
libata: relocate forcing PIO0 on reset
Forcing PIO0 on reset was done inside ata_bus_softreset(), which is a
bit out of place as it should be applied to all resets - hard, soft
and implementation which don't use ata_bus_softreset(). Relocate it
such that...
* For new EH, it's done in ata_eh_reset() before calling prereset.
* For old EH, it's done before calling ap->ops->phy_reset() in
ata_bus_probe().
This makes PIO0 forced after all resets. Another difference is that
reset itself is done after PIO0 is forced.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ATTENTION! The following patch solves the problem on my system, but please
be aware that I don't really know what I'm doing because I don't have the
big picture. There's surely a better way to check if the device/controller
is still functional than calling ata_link_{online,offline}.
Signed-off-by: Holger Macht <hmacht@suse.de>
---
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 4e31071..d6a7c57 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2131,23 +2131,25 @@ int ata_eh_reset(struct ata_link *link, int classify,
ata_eh_about_to_do(link, NULL, ehc->i.action & ATA_EH_RESET_MASK);
- ata_link_for_each_dev(dev, ...Hello, In the above example, even the reset sequence itself can cause hang if the hardware is implemented slightly differently. The reason why set_piomode() locks up but reset sequence doesn't is simple dumb luck. I think the proper fix is to tell libata to detach the cdrom before NACK. -- tejun --
Wouldn't the proper fix be to call ata_acpi_handle_hotplug _somewhere_? (which is currently called nowhere AFAICS) Anyway, kernel hackers keep telling me that the kernel should just do the right thing. Shouldn't userspace never be able to freeze the system? It's completely ok for me to handle this from userspace, if that's the position of the libata developers. In this case, we should change the dock driver to default to immediate_undock=false, because otherwise it's far too risky to freeze the system. Regards, Holger --
Hello,
It should be called via ata_acpi_{ap|dev}_notify() callbacks installed
via acpi_install_notify_handler(). Can you add dump_stack() in the
function and verify that it actually is being called? It could be that
the method is called too late or libata takes too long to actually
unplug the device. Hmmm... It seems what ata_acpi_handle_hotplug() does
isn't enough for undock. It probably should request detaching the
device instead of just notifying hotplug event. Anyways, please lemme
Yeah, I think most things should be done automatically but it's true
that somethings are a bit awkward to handle in kernel. Also, if you're
I'm not too familiar with how docks work. Can you please explain
briefly what immediate_undock is?
Thanks.
--
tejun
--
I already checked, it's never called AFAICS. And I couldn't find a place where it should be installed, otherwise, I would have sent a patch. The dock driver already calls the notify methods on devices in the dock immediate_undock=1: User presses undock button on the dock station, dock driver calls ACPI undock method immediately. immediate_undock=0: User presses undock button on the dock station, dock driver throws uevent and waits for userland to undock the system via sysfs. immediate_undock is currently set to 1 by default. Regards, Holger --
ata_acpi_associate() calls acpi_install_notify_handler() for each Thanks for the explanation. -- tejun --
It should be. I tried once more and noticed that ata_acpi_handle_hotplug(...) is called when the cdrom is about to be removed via the bay driver (just removing the device, not the whole dock station). Actually there is a connection between the bay and the dock driver, and one of them should notice that the cdrom/bay device is dependent on the dock, but I don't know what's going wrong here. Kristen (CC) should definitely know more about this interaction... On a related note, shouldn't ata_acpi_handle_hotplug delete the device like what is done when doing echo 1 > /sys/devices/.../block/sr0/device/delete ? Regards, Holger --
Yeap, when the ata_acpi_handle_hotplug() was added, the focus was supporting hotplug when the controller itself doesn't have hotplug notification. Removing device for undocking would need explicitly killing it. -- tejun --
I don't think it is. When the device/bay is inside a dock station, we need to register for dock events, too. I've sent a patch in a different mail: http://lkml.org/lkml/2008/2/14/123 Regards, Holger --
I think so. The T61 at least generated ACPI dock and undock messages for IDE master/slaves and we can use those. --
Another thing, whether it's poor luck or not, it worked like this than at least 2.6.22. And it was _heavily_ tested. The above commit broke it between 2.6.24-rc1 and 2.6.24-rc2, which is at least a regression. Regards, Holger --
Hello, Yes, in a sense but the change is in the area where there's no defined behavior, so I don't think it's a good idea to revert the change here. Let's concentrate on finding the proper solution. Thanks. -- tejun --
Not neccessarily. It may just be timing chance on your box. I agree however we should be doing the reset after the PIO0 switch if it turns out that the precise order of the two events matters. Alan --
Just a thought but which ICH variant is this. For the older ones we have to turn IORDY off in software before an eject and they cannot fully support hot unplug Alan --
