Re: [PATCH] libata: Handle bay devices in dock stations

Previous thread: Q: Is the "x86: clear DF before calling signal handler" issue affecting User Mode Linux, too? by Ronald Wahl on Wednesday, May 28, 2008 - 6:39 am. (1 message)

Next thread: [PATCH] FRV: Specify the minimum slab/kmalloc alignment by David Howells on Wednesday, May 28, 2008 - 7:36 am. (3 messages)
From: Holger Macht
Date: Wednesday, May 28, 2008 - 7:38 am

* Differentiate between bay devices in dock stations and others:

 - When an ACPI_NOTIFY_EJECT_REQUEST appears, just signal uevent to
   userspace (that is when the optional eject button on a bay device is
   pressed/pulled) giving the possibility to unmount file systems and to
   clean up. Also, only send uevent in case we get an EJECT_REQUEST
   without doing anything else. In other cases, you'll get an add/remove
   event because libata attaches/detaches the device.

 - In case of a dock event, which in turn signals an
   ACPI_NOTIFY_EJECT_REQUEST, immediately detach the device, because it
   may already have been gone

* In case of an ACPI_NOTIFY_DEVICE/BUS_CHECK, evaluate _STA to check if
  the device has been plugged or unplugged. If plugged, hotplug it, if
  unplugged, just signal event to userspace
  (initial patch by Matthew Garrett <mjg59@srcf.ucam.org>)

* Call ACPI _EJ0 for detached devices

Signed-off-by: Holger Macht <hmacht@suse.de>
---

Changes regarding the previous patch:

 * Make sure kobj does not go away outside locking
 * Coding style cleanups
 * Call _EJ0 for detached devices

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index dbf6ca7..e0e49fd 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -118,8 +118,40 @@ static void ata_acpi_associate_ide_port(struct ata_port *ap)
 		ap->pflags |= ATA_PFLAG_INIT_GTM_VALID;
 }
 
+static void ata_acpi_eject_device(acpi_handle handle)
+{
+	struct acpi_object_list arg_list;
+	union acpi_object arg;
+
+	arg_list.count = 1;
+	arg_list.pointer = &arg;
+	arg.type = ACPI_TYPE_INTEGER;
+	arg.integer.value = 1;
+
+	if (ACPI_FAILURE(acpi_evaluate_object(handle, "_EJ0",
+					      &arg_list, NULL)))
+		printk(KERN_ERR "Failed to evaluate _EJ0!\n");
+}
+
+static void ata_acpi_detach_device(struct ata_port *ap, struct ata_device *dev)
+{
+	if (dev)
+		dev->flags |= ATA_DFLAG_DETACH;
+	else {
+		struct ata_link *tlink;
+		struct ata_device ...
From: Tejun Heo
Date: Wednesday, May 28, 2008 - 7:39 am

Acked-by: Tejun Heo <htejun@gmail.com>

-- 
tejun
--

From: Andrew Morton
Date: Wednesday, May 28, 2008 - 8:02 pm

This might look nicer (and safer) if it used the

	union acpi_object arg = {
		.foo = bar,
		...
	};


It would be better if the printk were to self-identify where it is
coming from.  If your kernel just blurts "Failed to evaluate _EJ0!"
it's a bit of a head-scratcher.


I guess the significance of `dev==0' is known to those who are steeped
in ata_acpi_handle_hotplug() knowledge, but it's pretty inscrutable to
the occasional visitor.  Some comments would help.


--

From: Tejun Heo
Date: Wednesday, May 28, 2008 - 8:08 pm

ata_port_printk() would be better.

Thanks.

-- 
tejun
--

From: Matthew Garrett
Date: Thursday, May 29, 2008 - 6:22 am

Hm. This deadlocks my Thinkpad when I pull a device from the internal 
bay.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Holger Macht
Date: Thursday, May 29, 2008 - 6:33 am

Can you please give a bit more details? Does your userspace react on the
eject request? I tested with Thinkpads, too.

 - Holger

--

From: Matthew Garrett
Date: Thursday, May 29, 2008 - 6:32 am

No, I'm not doing anything with userspace - just pulling the drive out 
directly. I'm looking into it now.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Holger Macht
Date: Thursday, May 29, 2008 - 6:39 am

This was one of my test scenarios. Just a hint for debugging, before my
patch, HAL accessed /dev/sr0 directly after removing the device, and that
caused the lockup. Hopefully that doesn't bite us again...

 - Holger
--

From: Matthew Garrett
Date: Thursday, May 29, 2008 - 6:40 am

Running with init=/bin/bash, I still get it. The only kernel output I 
get is libata telling me that the device is disabled, and the machine is 
dead after that.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Tejun Heo
Date: Thursday, May 29, 2008 - 6:44 am

Does this fix the problem?

---
 drivers/ata/libata-acpi.c |  166 +++++++++++++++++++++++++++++++---------------
 1 file changed, 115 insertions(+), 51 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index dbf6ca7..1ac3e8f 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -118,12 +118,63 @@ static void ata_acpi_associate_ide_port(struct ata_port *ap)
 		ap->pflags |= ATA_PFLAG_INIT_GTM_VALID;
 }
 
-static void ata_acpi_handle_hotplug(struct ata_port *ap, struct ata_device
-				    *dev, u32 event)
+static void ata_acpi_eject_device(acpi_handle handle)
+{
+	struct acpi_object_list arg_list;
+	union acpi_object arg;
+
+	arg_list.count = 1;
+	arg_list.pointer = &arg;
+	arg.type = ACPI_TYPE_INTEGER;
+	arg.integer.value = 1;
+
+	if (ACPI_FAILURE(acpi_evaluate_object(handle, "_EJ0",
+					      &arg_list, NULL)))
+		printk(KERN_ERR "Failed to evaluate _EJ0!\n");
+}
+
+/* @ap and @dev are the same as ata_acpi_handle_hotplug() */
+static void ata_acpi_detach_device(struct ata_port *ap, struct ata_device *dev)
+{
+	if (dev)
+		dev->flags |= ATA_DFLAG_DETACH;
+	else {
+		struct ata_link *tlink;
+		struct ata_device *tdev;
+
+		ata_port_for_each_link(tlink, ap)
+			ata_link_for_each_dev(tdev, tlink)
+				tdev->flags |= ATA_DFLAG_DETACH;
+	}
+
+	ata_port_freeze(ap);
+	ata_port_schedule_eh(ap);
+}
+
+/**
+ * ata_acpi_handle_hotplug - ACPI event handler backend
+ * @ap: ATA port ACPI event occurred
+ * @dev: ATA device ACPI event occurred (can be NULL)
+ * @event: ACPI event which occurred
+ * @is_dock_event: boolean indicating whether the event was a dock one
+ *
+ * All ACPI bay / device realted events end up in this function.  If
+ * the event is port-wide @dev is NULL.  If the event is specific to a
+ * device, @dev points to it.
+ *
+ * Hotplug (as opposed to unplug) notification is always handled as
+ * port-wide while unplug only kills the target device on device-wide
+ * event.
+ *
+ * ...
From: Matthew Garrett
Date: Thursday, May 29, 2008 - 7:02 am

No, I'm seeing an identical hang. Mainline works fine.
-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Holger Macht
Date: Thursday, May 29, 2008 - 7:14 am

Maybe you could also try this on top of Tejun's patch?

--- linux-2.6/drivers/ata/libata-acpi.c.orig	2008-05-29 16:13:00.000000000 +0200
+++ linux-2.6/drivers/ata/libata-acpi.c	2008-05-29 16:13:04.000000000 +0200
@@ -237,10 +237,8 @@
 
 	spin_unlock_irqrestore(ap->lock, flags);
 
-	if (wait) {
+	if (wait)
 		ata_port_wait_eh(ap);
-		ata_acpi_eject_device(handle);
-	}
 
 	if (kobj && !is_dock_event) {
 		sprintf(event_string, "BAY_EVENT=%d", event);

--

From: Matthew Garrett
Date: Thursday, May 29, 2008 - 7:35 am

No luck there either, I'm afraid.
-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Tejun Heo
Date: Thursday, May 29, 2008 - 7:37 am

That means device isn't being detached or something afterwards triggers
hotplug event on the detached port.  Can you please sprinkle printks
around and see what ata_acpi_handle_hotplug() does?

Thanks.

-- 
tejun
--

From: Matthew Garrett
Date: Thursday, May 29, 2008 - 7:49 am

Got it. It works if I lose the ata_port_freeze() from 
ata_acpi_detach_device. 

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Holger Macht
Date: Thursday, May 29, 2008 - 9:32 am

Tejun, any idea? This is in for quite some time now. Maybe there's some
involved...

Regards,
	Holger

--

From: Tejun Heo
Date: Thursday, May 29, 2008 - 9:40 am

Does changing it to ata_port_schedule_eh() make any difference?

-- 
tejun
--

From: Holger Macht
Date: Sunday, June 1, 2008 - 9:05 am

Finally found a system to reproduce the freeze...

And yes, removing ata_port_freeze() and only doing ata_port_schedule_eh()
in the case where the device has already gone works and prevents the
freeze. Also the docking case works. If this change is not only for
finding out what's going wrong, I'm going resend the patch with your
documentation additions, ok?

Regards,
	Holger

--

From: Holger Macht
Date: Thursday, May 29, 2008 - 9:46 am

The last sentence misses the one important word 'race'.

 - Holger
--

From: Henrique de Moraes Holschuh
Date: Thursday, May 29, 2008 - 10:51 am

<naivë question> Do we handle the possible fact that _EJ0 might want the
port (and maybe the device) to be active when it is called AND that _EJ0
might attempt to shutdown the port or the device itself?  </naivë
question>

Please ignore if it is something not even worth answering.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--

From: tom
Date: Friday, May 30, 2008 - 4:07 am

Will this patch or the other bay/dock related patches you send in the  
past days allow me to undock my laptop and still be able to  
suspend/resume without locking the laptop up? And without having to  
run any userspace scripts?

My dock is not a simple port replicator, it has an USB hub and an ATA  
bay in it. My tests showed that I need to 'echo 1 >  
/sys/.../scsi/.../eject' or something like that before I can take the  
laptop out of the docking station. If I don't do that and try to  
access the cdrom in the bay (or even rescaning the scsi bus) after I  
have taken the laptop out of the dock it results in a hard lockup.  
That in itself would not be a problem, because it's just a simple  
command that I can do in an acpid script. But far worse is that even  
if I do that, the computer locks up when I resume from suspend. I run  
2.6.24.7 which locks up every time I resume. In recent versions from  
git it has somehow improved, there are situations where it doesn't  
lock up, but there are still a few left where it does (I don't  
remember the exact actions/commands I have to take).

tom


--

From: Holger Macht
Date: Sunday, June 1, 2008 - 9:06 am

Which kernel versions did you already try? But yes, those patches will
most likely help. But not sure about the resume issues...

Regards,
	Holger
--

From: Jeff Garzik
Date: Tuesday, June 3, 2008 - 11:07 am

Matthew, any comments?

It would be nice if you and Holger could work together to produce a 
single patch[set]...  right now I have patches from both of you, and I 
was sorta waiting on the thread to die down to see if competing patches 
might merge into a single set

--

From: Matthew Garrett
Date: Tuesday, June 3, 2008 - 11:13 am

I think we were waiting for feedback from Tejun as to why removing the 
port freeze call fixed the hang I was seeing? Beyond that, Holger's 
latest patch looked good to me.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Holger Macht
Date: Tuesday, June 3, 2008 - 11:23 am

My suspicion is just that calling ata_port_freeze() on an already frozen
port is not good ;-)

I'll resend the merged patch.

Regards,
	Holger
--

From: Tejun Heo
Date: Sunday, June 8, 2008 - 6:44 pm

Sorry, was off for the last week.

The difference between freezing and scheduling EH is that the former 
immediately aborts all in-flight commands and resets the port while the 
latter waits till the in-flight commands finish or time out (EH 
scheduling kicks fast-drain and the timeout is reduced to three seconds).

TF-based ATA controllers are very sensitive to how the registers are 
accessed and sometimes lock up the whole machine when they are not happy 
by indefinitely holding the PCI bus.  This could have been the case if 
IOs were in flight when the dock event occurred.  Were they?

Thanks.

-- 
tejun
--

From: Matthew Garrett
Date: Sunday, June 8, 2008 - 6:48 pm

I'd stopped hal, so I can't imagine that userspace was causing any to be 
generated at that point.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Tejun Heo
Date: Sunday, June 8, 2008 - 9:56 pm

Ah... okay.  Stupid me.  libata EH always resets a frozen port to 
un-freeze it even if it's unoccupied to listen for hotplug events.  So, 
if dock notifies device removal after the actual device is gone && the 
port is frozen as a result, libata EH will try to reset the port after 
the device is gone and in this case the controller locks up the whole 
machine for that.  If schedule_eh is used, libata EH just removes the 
device and does nothing else and the controller is happy.

This isn't too safe tho.  There can be other things which can trigger 
port reset.  ie. hotplug request from userland, in-flight IOs at the 
time of dock removal, etc...  Maybe we need to implement a flag to 
indicate that the port is dead and shouldn't be accessed in any way.

Thanks.

-- 
tejun
--

From: Holger Macht
Date: Tuesday, June 3, 2008 - 11:27 am

* Differentiate between bay devices in dock stations and others:

 - When an ACPI_NOTIFY_EJECT_REQUEST appears, just signal uevent to
   userspace (that is when the optional eject button on a bay device is
   pressed/pulled) giving the possibility to unmount file systems and to
   clean up. Also, only send uevent in case we get an EJECT_REQUEST
   without doing anything else. In other cases, you'll get an add/remove
   event because libata attaches/detaches the device.

 - In case of a dock event, which in turn signals an
   ACPI_NOTIFY_EJECT_REQUEST, immediately detach the device, because it
   may already have been gone

* In case of an ACPI_NOTIFY_DEVICE/BUS_CHECK, evaluate _STA to check if
  the device has been plugged or unplugged. If plugged, hotplug it, if
  unplugged, just signal event to userspace
  (initial patch by Matthew Garrett <mjg59@srcf.ucam.org>)

* Call ACPI _EJ0 for detached devices

Signed-off-by: Holger Macht <hmacht@suse.de>
---

Previous patch differences:
 * Remove one call to ata_port_freeze
 * Add some documentation from Tejun

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index dbf6ca7..3ff8b14 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -118,12 +118,62 @@ static void ata_acpi_associate_ide_port(struct ata_port *ap)
 		ap->pflags |= ATA_PFLAG_INIT_GTM_VALID;
 }
 
-static void ata_acpi_handle_hotplug(struct ata_port *ap, struct ata_device
-				    *dev, u32 event)
+static void ata_acpi_eject_device(acpi_handle handle)
+{
+	struct acpi_object_list arg_list;
+	union acpi_object arg;
+
+	arg_list.count = 1;
+	arg_list.pointer = &arg;
+	arg.type = ACPI_TYPE_INTEGER;
+	arg.integer.value = 1;
+
+	if (ACPI_FAILURE(acpi_evaluate_object(handle, "_EJ0",
+					      &arg_list, NULL)))
+		printk(KERN_ERR "Failed to evaluate _EJ0!\n");
+}
+
+/* @ap and @dev are the same as ata_acpi_handle_hotplug() */
+static void ata_acpi_detach_device(struct ata_port *ap, struct ata_device *dev)
+{
+	if ...
From: Matthew Garrett
Date: Tuesday, June 3, 2008 - 11:29 am

Acked-by: Matthew Garrett <mjg@redhat.com>

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Jeff Garzik
Date: Tuesday, June 3, 2008 - 11:54 am

thanks, both!


--

From: Jeff Garzik
Date: Wednesday, June 4, 2008 - 3:29 am

applied


--

Previous thread: Q: Is the "x86: clear DF before calling signal handler" issue affecting User Mode Linux, too? by Ronald Wahl on Wednesday, May 28, 2008 - 6:39 am. (1 message)

Next thread: [PATCH] FRV: Specify the minimum slab/kmalloc alignment by David Howells on Wednesday, May 28, 2008 - 7:36 am. (3 messages)