Re: [PATCH] libata: Forcing PIO0 mode on reset must not freeze system

Previous thread: [PATCH] udf: fix udf_add_free_space by Marcin Slusarz on Sunday, February 10, 2008 - 12:27 pm. (3 messages)

Next thread: [PATCH 3/8] drivers/net/e1000: Use FIELD_SIZEOF by Julia Lawall on Sunday, February 10, 2008 - 1:11 pm. (1 message)
From: Holger Macht
Date: Sunday, February 10, 2008 - 12:55 pm

Calling ap->ops->set_piomode(ap, dev) on a device/controller which got
already removed, locks the system hard. Reproducibly on an X60 attached to
a dock station containing a cdrom device with doing

  $ echo 1 > /sys/devices/platform/dock.0/undock && echo 123 > /dev/sr0

This calls ata_eh_reset(...) which in turn tries to force PIO mode 0. But
the device is already gone.

Bisecting revealed the following commit as culprit:

  commit cdeab1140799f09c5f728a5ff85e0bdfa5679cd2
  Author: Tejun Heo <htejun@gmail.com>
  Date:   Mon Oct 29 16:41:09 2007 +0900
  
      libata: relocate forcing PIO0 on reset
      
      Forcing PIO0 on reset was done inside ata_bus_softreset(), which is a
      bit out of place as it should be applied to all resets - hard, soft
      and implementation which don't use ata_bus_softreset().  Relocate it
      such that...
      
      * For new EH, it's done in ata_eh_reset() before calling prereset.
      
      * For old EH, it's done before calling ap->ops->phy_reset() in
        ata_bus_probe().
      
      This makes PIO0 forced after all resets.  Another difference is that
      reset itself is done after PIO0 is forced.
      
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>


ATTENTION! The following patch solves the problem on my system, but please
be aware that I don't really know what I'm doing because I don't have the
big picture. There's surely a better way to check if the device/controller
is still functional than calling ata_link_{online,offline}.


Signed-off-by: Holger Macht <hmacht@suse.de>
---

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 4e31071..d6a7c57 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2131,23 +2131,25 @@ int ata_eh_reset(struct ata_link *link, int classify,
 
 	ata_eh_about_to_do(link, NULL, ehc->i.action & ATA_EH_RESET_MASK);
 
-	ata_link_for_each_dev(dev, ...
From: Tejun Heo
Date: Sunday, February 10, 2008 - 7:37 pm

Hello,


In the above example, even the reset sequence itself can cause hang if
the hardware is implemented slightly differently.  The reason why
set_piomode() locks up but reset sequence doesn't is simple dumb luck.
I think the proper fix is to tell libata to detach the cdrom before

NACK.

-- 
tejun
--

From: Holger Macht
Date: Monday, February 11, 2008 - 3:04 am

Wouldn't the proper fix be to call ata_acpi_handle_hotplug _somewhere_?
(which is currently called nowhere AFAICS)

Anyway, kernel hackers keep telling me that the kernel should just do the
right thing. Shouldn't userspace never be able to freeze the system?

It's completely ok for me to handle this from userspace, if that's the
position of the libata developers.

In this case, we should change the dock driver to default to
immediate_undock=false, because otherwise it's far too risky to freeze the
system.

Regards,
	Holger
--

From: Tejun Heo
Date: Monday, February 11, 2008 - 4:16 am

Hello,


It should be called via ata_acpi_{ap|dev}_notify() callbacks installed
via acpi_install_notify_handler().  Can you add dump_stack() in the
function and verify that it actually is being called?  It could be that
the method is called too late or libata takes too long to actually
unplug the device.  Hmmm... It seems what ata_acpi_handle_hotplug() does
isn't enough for undock.  It probably should request detaching the
device instead of just notifying hotplug event.  Anyways, please lemme

Yeah, I think most things should be done automatically but it's true
that somethings are a bit awkward to handle in kernel.  Also, if you're


I'm not too familiar with how docks work.  Can you please explain
briefly what immediate_undock is?

Thanks.

-- 
tejun
--

From: Holger Macht
Date: Monday, February 11, 2008 - 4:28 am

I already checked, it's never called AFAICS. And I couldn't find a place
where it should be installed, otherwise, I would have sent a patch. The
dock driver already calls the notify methods on devices in the dock


immediate_undock=1:
 User presses undock button on the dock station, dock driver calls ACPI
 undock method immediately.

immediate_undock=0:
 User presses undock button on the dock station, dock driver throws uevent
 and waits for userland to undock the system via sysfs.

immediate_undock is currently set to 1 by default.

Regards,
	Holger
--

From: Tejun Heo
Date: Monday, February 11, 2008 - 6:11 am

ata_acpi_associate() calls acpi_install_notify_handler() for each

Thanks for the explanation.

-- 
tejun
--

From: Holger Macht
Date: Monday, February 11, 2008 - 7:06 am

It should be. I tried once more and noticed that
ata_acpi_handle_hotplug(...) is called when the cdrom is about to be
removed via the bay driver (just removing the device, not the whole dock
station). Actually there is a connection between the bay and the dock
driver, and one of them should notice that the cdrom/bay device is
dependent on the dock, but I don't know what's going wrong here. Kristen
(CC) should definitely know more about this interaction...

On a related note, shouldn't ata_acpi_handle_hotplug delete the device
like what is done when doing

  echo 1 > /sys/devices/.../block/sr0/device/delete

?

Regards,
	Holger
--

From: Tejun Heo
Date: Monday, February 11, 2008 - 4:31 pm

Yeap, when the ata_acpi_handle_hotplug() was added, the focus was
supporting hotplug when the controller itself doesn't have hotplug
notification.  Removing device for undocking would need explicitly
killing it.

-- 
tejun
--

From: Holger Macht
Date: Thursday, February 14, 2008 - 5:42 am

I don't think it is. When the device/bay is inside a dock station, we need
to register for dock events, too. I've sent a patch in a different mail:

 http://lkml.org/lkml/2008/2/14/123

Regards,
	Holger
--

From: Alan Cox
Date: Monday, February 11, 2008 - 4:25 am

I think so. The T61 at least generated ACPI dock and undock messages for
IDE master/slaves and we can use those.

--

From: Holger Macht
Date: Monday, February 11, 2008 - 3:29 am

Another thing, whether it's poor luck or not, it worked like this than at
least 2.6.22. And it was _heavily_ tested. The above commit broke it
between 2.6.24-rc1 and 2.6.24-rc2, which is at least a regression.

Regards,
	Holger
--

From: Tejun Heo
Date: Monday, February 11, 2008 - 3:49 am

Hello,


Yes, in a sense but the change is in the area where there's no defined
behavior, so I don't think it's a good idea to revert the change here.
Let's concentrate on finding the proper solution.

Thanks.

-- 
tejun
--

From: Alan Cox
Date: Monday, February 11, 2008 - 4:32 am

Not neccessarily. It may just be timing chance on your box.

I agree however we should be doing the reset after the PIO0 switch if it
turns out that the precise order of the two events matters.

Alan

--

From: Alan Cox
Date: Monday, February 11, 2008 - 4:50 am

Just a thought but which ICH variant is this. For the older ones we have
to turn IORDY off in software before an eject and they cannot fully
support hot unplug

Alan

--

Previous thread: [PATCH] udf: fix udf_add_free_space by Marcin Slusarz on Sunday, February 10, 2008 - 12:27 pm. (3 messages)

Next thread: [PATCH 3/8] drivers/net/e1000: Use FIELD_SIZEOF by