[PATCH] PM: Acquire device locks on suspend

Previous thread: Why is "GoTop Super_Q2/GogoPen/PenPower tablets" treated as touchscreen? by JustFillBug on Saturday, January 5, 2008 - 10:48 am. (2 messages)

Next thread: Supports for new slab allocator now in latest release by Mark Seger on Saturday, January 5, 2008 - 12:29 pm. (3 messages)
From: Rafael J. Wysocki
Date: Saturday, January 5, 2008 - 11:36 am

Greg, Andrew,

The appended patch is a replacement for
gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch that deadlocked
suspend and hibernation on some systems.

Please consider for applying.

Thanks,
Rafael

---
From: Alan Stern <stern@rowland.harvard.edu>, Rafael J. Wysocki <rjw@sisk.pl>

This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail.

It also provides a way to safely remove a suspended device with the help of
the PM core, by using the destroy_suspended_device() callback introduced
specifically for this purpose, and updates two drivers (msr and cpuid) that need
to do that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/cpuid.c    |    6 
 arch/x86/kernel/msr.c      |    6 
 drivers/base/core.c        |   67 +++++-
 drivers/base/power/main.c  |  452 ++++++++++++++++++++++++++-------------------
 drivers/base/power/power.h |   12 +
 include/linux/device.h     |    8 
 6 files changed, 352 insertions(+), 199 deletions(-)

Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -726,11 +726,20 @@ int device_add(struct device *dev)
 {
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
-	int error = -EINVAL;
+	int error;
+
+	error = pm_sleep_lock();
+	if (error) {
+		dev_warn(dev, "Illegal %s during suspend\n", __FUNCTION__);
+		dump_stack();
+		return error;
+	}
 
 	dev = get_device(dev);
-	if (!dev || !strlen(dev->bus_id))
-		goto Error;
+	if (!dev || !strlen(dev->bus_id)) {
+		error = -EINVAL;
+		goto Done;
+	}
 
 	pr_debug("DEV: registering device: ID = '%s'\n", dev->bus_id);
 
@@ -795,6 +804,7 @@ int device_add(struct device *dev)
 	}
  Done:
 ...
From: Alan Stern
Date: Saturday, January 5, 2008 - 1:08 pm

since the call to device_del() will occur while the pm_sleep_rwsem is
still locked for writing.  That's why I suggested not unregistering
these devices until after everything else has been resumed and the
rwsem has been dropped.

Another thing to watch out for: Just in case somebody ends up calling
destroy_suspended_device(dev) from within dev's own resume method, you 
should interchange the resume_device() and the list_move_tail() 
calls in dpm_resume().

Alan Stern



--

From: Rafael J. Wysocki
Date: Saturday, January 5, 2008 - 1:19 pm

However, if we unregister them all at once after releasing pm_sleep_rwsem,
that shouldn't be necessary, right?

Rafael
--

From: Alan Stern
Date: Saturday, January 5, 2008 - 1:39 pm

It's still necessary, because destroy_suspended_device() still has to
move the device from one list to another.  You don't want it to end up 
on the dpm_locked list.

Alan Stern

--

From: Rafael J. Wysocki
Date: Saturday, January 5, 2008 - 2:13 pm

Hmm.  That means we'd have to do the same thing in dpm_power_up() in case
someone calls destroy_suspended_device() from resume_device_early(dev).

Still, even doing that is not enough, since someone can call
destroy_suspended_device() from a .suspend() routine and then the device
will end up on a wrong list just as well.

Greetings,
Rafael


--

From: Alan Stern
Date: Saturday, January 5, 2008 - 2:41 pm

That should never happen.  The whole idea of destroy_suspended_device()
is that the device couldn't be resumed and in fact should be
unregistered because it is no longer working or no longer present.  A
suspend routine won't detect this sort of thing since it doesn't try to
resume the device.

But it wouldn't hurt to mention in the kerneldoc that 
destroy_suspended_device() is meant to be called only during a system 
resume.

Alan Stern

--

From: Rafael J. Wysocki
Date: Saturday, January 5, 2008 - 2:58 pm

Hmm.  Please have a look at the appended patch.

I have removed the warning from device_del() and used list_empty() to detect
removed devices in the .suspend() routines.  Is that viable?

Rafael


---
 arch/x86/kernel/cpuid.c    |    6 
 arch/x86/kernel/msr.c      |    6 
 drivers/base/core.c        |   67 +++++-
 drivers/base/power/main.c  |  454 ++++++++++++++++++++++++++-------------------
 drivers/base/power/power.h |   12 +
 include/linux/device.h     |    8 
 6 files changed, 354 insertions(+), 199 deletions(-)

Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -726,11 +726,20 @@ int device_add(struct device *dev)
 {
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
-	int error = -EINVAL;
+	int error;
+
+	error = pm_sleep_lock();
+	if (error) {
+		dev_warn(dev, "Illegal %s during suspend\n", __FUNCTION__);
+		dump_stack();
+		return error;
+	}
 
 	dev = get_device(dev);
-	if (!dev || !strlen(dev->bus_id))
-		goto Error;
+	if (!dev || !strlen(dev->bus_id)) {
+		error = -EINVAL;
+		goto Done;
+	}
 
 	pr_debug("DEV: registering device: ID = '%s'\n", dev->bus_id);
 
@@ -795,6 +804,7 @@ int device_add(struct device *dev)
 	}
  Done:
 	put_device(dev);
+	pm_sleep_unlock();
 	return error;
  BusError:
 	device_pm_remove(dev);
@@ -1156,14 +1173,11 @@ error:
 EXPORT_SYMBOL_GPL(device_create);
 
 /**
- * device_destroy - removes a device that was created with device_create()
+ * find_device - finds a device that was created with device_create()
  * @class: pointer to the struct class that this device was registered with
  * @devt: the dev_t of the device that was previously registered
- *
- * This call unregisters and cleans up a device that was created with a
- * call to device_create().
  */
-void device_destroy(struct class *class, dev_t devt)
+static struct device *find_device(struct class ...
From: Alan Stern
Date: Saturday, January 5, 2008 - 9:04 pm

It's not good.

The warning in device_del() is vital.  It's what will tell people where
the problem is when a deadlock occurs during system resume because some
driver has mistakenly tried to unregister a device at the wrong time.  
It would have pointed immediately to the msr driver in the case of the
bug Andrew found, for instance.

If you can figure out a way to disable the warning in device_del() for 
just the one device being unregistered by 
device_pm_destroy_suspended(), I suppose that would be okay.

Alan Stern



--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 6:19 am

Something like this, perhaps:

@@ -905,6 +915,18 @@ void device_del(struct device * dev)
 	struct device * parent = dev->parent;
 	struct class_interface *class_intf;
 
+	if (down_trylock(&dev->sem)) {
+		if (pm_sleep_lock()) {
+			dev_warn(dev, "Illegal %s during suspend\n",
+				__FUNCTION__);
+			dump_stack();
+		} else {
+			pm_sleep_unlock();
+		}
+	} else {
+		up(&dev->sem);
+	}
+
 	if (parent)
 		klist_del(&dev->knode_parent);

Rafael
--

From: Alan Stern
Date: Sunday, January 6, 2008 - 10:06 am

Bizarre, but it should work.  Be sure to include plenty of explanatory 
comments -- otherwise nobody will be able to figure it out!  :-)

Alan Stern

--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 12:05 pm

OK

Still, shouldn't we fail the removal of the device apart from giving the

I will.

I think that code can be moved to its own function in

Well, I guess so. :-)

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 12:57 pm

Actually, having thought about it a bit more, I don't see the point in
preventing the removal of the device from the list in device_pm_remove() if
we allow all of the operations in device_del() preceding it to be performed.

Shouldn't we just take pm_sleep_rwsem in device_del() upfront and block on that
if locked?

Rafael
--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 3:19 pm

Ugh, the $subject patch looks like a city of races.  I'm struggling to close
them all, but it's getting complicated.

I'll post the result in a new thread.

Thanks,
Rafael
--

From: Alan Stern
Date: Sunday, January 6, 2008 - 3:21 pm

That's not the issue.  We _don't_ allow all of the operations in 
device_del() preceding the call to device_pm_remove().  In particular, 
the call to the device's driver's remove method will deadlock because 

No -- the whole idea here is to print an error message in the system
log if a driver's resume method tries to call device_del().  Deadlock 
is unavoidable in this case, but at least we'll know which driver is 
guilty.

Alan Stern

--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 3:34 pm

Still, if we do that, we won't need to acquire dev->sem in device_pm_remove()
any more.  Apart from this, by acqiring pm_sleep_rwsem for reading in
device_del() we can prevent a suspend from starting while the device is being
removed.

Consider, for example, the scenario possible with the $subject patch:
- device_del() starts and notices pm_sleep_rwsem unlocked, so the warning is
  not printed
- it proceeds and everything before device_pm_remove() succeeds
- now, device_suspend() is called and locks dev->sem
- device_del() calls device_pm_remove() and blocks on that with the device
  partialy removed
I think we should prevent this from happening.

Rafael
--

From: Rafael J. Wysocki
Date: Sunday, January 6, 2008 - 3:47 pm

Not if pm_sleep_rwsem is held by device_del(), since in that case we won't
reach lock_all_devices() (device_add() calls device_pm_remove() under

In that case, we'll attempt to call the device's .suspend() and .resume()
routines, but we shouldn't do that, IMHO.

Rafael
--

From: Alan Stern
Date: Monday, January 7, 2008 - 9:16 am

Let's try to summarize the main issues here:

     1. We want the PM core to lock all devices during suspend and
	hibernation.  This implies that registration and unregistration
	at such times can't work, because they need to lock the
	device sem in order to make probe and remove method calls.

     2. Registration calls can be failed, with an error message in the
	system log.  However unregistration calls cannot fail.  They
	_can_ block until the system resumes, but if the unregistration
	call was made from within a suspend or resume method it will
	deadlock.  This seems inescapable, but at least we should print
	an error in the log so the offending driver can be identified.

     3. In response to 2, the PM core should have a special routine for
	unregistering devices while a suspend is in progress.  Rafael
	proposed that the core should unlock the device to permit the
	call to go through.  This seems dangerous to me; I would prefer
	to leave the locks in place and defer the unregistration until
	after the system is back up and the locks have all been 
	dropped.  This would avoid all sorts of locking, deadlock, and 
	mutual exclusion problems.

(As a side note: destroy_suspended_device() has a rather limited
interface anyway, since it can handle only devices that were created by
create_device().)

     4. Rafael pointed out that unregistration can occur concurrently
	with system suspend.  When this happens we can end up trying to
	suspend a device which has already been through 
	bus_remove_device(), because it hasn't yet been removed from 
	the dpm_active list.  He proposes we make unregistration block
	system suspend, just as registration does.

I don't see 4 as a real problem.  Starting an unregistration before
the suspend and finishing it afterward should be okay.  Once a device
has gone through bus_remove_device() it hasn't got a suspend method any
more, so trying to suspend it won't do anything at all -- the tests in
suspend_device() will all fail.  Conversely, if ...
From: Rafael J. Wysocki
Date: Monday, January 7, 2008 - 9:51 am

Please see the patch at: http://lkml.org/lkml/2008/1/6/298 .  It represents my
current idea about how to do that.

Thanks,
Rafael
--

From: Alan Stern
Date: Monday, January 7, 2008 - 10:23 am

It has some problems.

First, note that the list manipulations in dpm_suspend(), 
device_power_down(), and so on aren't protected by dpm_list_mtx.  So 
your patch could corrupt the list pointers.  Are you assuming that no 
other threads can be running at this time?

Note also that device_pm_destroy_suspended() does up(&dev->sem), but it 
doesn't know whether or not dev->sem was locked to begin with.

Do you want to rule out the possibility of a driver's suspend or remove 
methods calling destroy_suspended_device() on its own device?  With 
your synchronous approach, this would mean that the suspend/resume 
method would indirectly end up calling the remove method.  This is 
dangerous at best; with USB it would be a lockdep violation.  With an 
asynchronous approach, on the other hand, this wouldn't be a problem.

Alan Stern

--

From: Rafael J. Wysocki
Date: Monday, January 7, 2008 - 11:01 am

Yes, they need the locking.  I have overlooked that, mostly because the locking
was removed by gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch
too (because you assumed there woundn't be any need to remove a device during


Do you mean it might have been released already by another thread

Well, the asynchronous apprach has the problem that the device may end up
on a wrong list when removed by one of the .suspend() callbacks (and I don't
see how to avoid that without extra complexity).  Perhaps that's something we
can live with, though.

One more question: is there any particular reason not to call
device_pm_remove() at the beginning of device_del()?

Rafael
--

From: Alan Stern
Date: Monday, January 7, 2008 - 12:29 pm

I was thinking that it might be called before lock_all_devices().

However let's ignore that possibility and simplify the discussion by 
assuming that destroy_suspended_device() is never called except by a 
suspend or resume method for that device or one of its ancestors.  
(This still leaves the possibility that it might get called by mistake 

The same problem affects the synchronous approach.  We can fix it by
having dpm_suspend() do the list_move() before calling

I think it's done this way to avoid having a window where the device 
isn't on a PM list and is still owned by the bus and the driver.  But 
if a suspend occurs during that window, it shouldn't matter that the 
device will be left unsuspended.  After all, the same thing would have 
happened if the suspend occurred after bus_remove_device().

So no, there shouldn't be a problem with moving the call.

Alan Stern

--

From: Rafael J. Wysocki
Date: Monday, January 7, 2008 - 1:37 pm

I've added pm_sleep_start_end_mtx and the locking dance in




Okay, well, now I'm leaning towards the asynchronous approach.

I'll prepare a new patch and send it later today.

Thanks,
Rafael
--

From: Alan Stern
Date: Monday, January 7, 2008 - 2:32 pm

Yes, I see.  What about the fact that device_suspend() locks 
pm_sleep_start_end_mtx first and pm_sleep_rwsem second, whereas 
device_pm_destroy_suspended() locks pm_sleep_start_end_mtx while 

This suggests another approach, simpler but not as general.  So far all
the problems we've seen have been associated with those CPU notifiers.  
Suppose the notifications about CPUs that failed to come back up were
delayed until after the resume was complete?  Drivers like msr would
then have to check in their resume handler whether the CPU was actually 
up, but no other changes would be needed.

In this way we could fix the immediate problem.  It wouldn't help with 
other sorts of devices that need to be unregistered during a suspend, 

Okay.

Alan Stern

--

From: Rafael J. Wysocki
Date: Monday, January 7, 2008 - 5:25 pm

Appended is what I managed to put together today.

It probably still has some problems, but I'm not seeing them right now (too
tired).  At least, it doesn't break my system. ;-)

Please review.

Thanks,
Rafael

---
From: Alan Stern <stern@rowland.harvard.edu>, Rafael J. Wysocki <rjw@sisk.pl>

This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.

It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/cpuid.c    |    6 
 arch/x86/kernel/msr.c      |    6 
 drivers/base/core.c        |   60 ++++-
 drivers/base/power/main.c  |  504 +++++++++++++++++++++++++++++----------------
 drivers/base/power/power.h |   12 +
 include/linux/device.h     |    8 
 6 files changed, 408 insertions(+), 188 deletions(-)

Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -726,11 +726,20 @@ int device_add(struct device *dev)
 {
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
-	int error = -EINVAL;
+	int error;
+
+	error = pm_sleep_lock();
+	if (error) {
+		dev_warn(dev, "Suspicious %s during suspend\n", __FUNCTION__);
+		dump_stack();
+		return error;
+	}
 
 	dev = get_device(dev);
-	if (!dev || !strlen(dev->bus_id))
+	if (!dev || !strlen(dev->bus_id)) {
+		error = -EINVAL;
 		goto Error;
+	}
 
 	pr_debug("DEV: registering device: ID = '%s'\n", dev->bus_id);
 
@@ -795,6 +804,7 @@ int device_add(struct device *dev)
 	}
  Done:
 ...
From: Alan Stern
Date: Wednesday, January 9, 2008 - 2:01 pm

Okay, this seems to be better.  I like the way the complicated tests 
are all localized in power/main.c.

In dpm_resume() you shouldn't need to use dpm_list_mtx at all, because
the list_move_tail() comes before the resume_device().  It's the same
as in dpm_power_up().

The same is true for dpm_suspend().  Once all the device have been 
locked, there shouldn't be any other tasks accessing the dpm lists.  
Hence there should be no need to protect the list.

Which reminds me, the kerneldoc for device_pm_schedule_removal() is 
inaccurate.  The routine always just moves the device to dpm_destroy 
list for later processing.

Also, the kerneldoc for destroy_suspended_device() should contain an 
extra paragraph warning that the routine should never be called except 
within the scope of a system sleep transition.  In practice this means 
it has to be directly or indirectly invoked by a suspend or resume 
method.

It looks good.

Alan Stern

--

From: Rafael J. Wysocki
Date: Wednesday, January 9, 2008 - 3:14 pm

Still, device_pm_schedule_removal() can (in theory) be called concurrently
with dpm_resume() by another thread and this might corrupt the list without




Thanks for the review.

I'll fix the comments and repost the patch from scratch for merging in a
separate thread.

Greetings,
Rafael
--

From: Alan Stern
Date: Wednesday, January 9, 2008 - 3:46 pm

Any thread doing that would be in violation of the restrictions you're 
going to add to the kerneldoc for destroy_suspended_device().

However the overhead for the locking isn't critical.  There won't be
any contention (if everything is working right) and it isn't a hot path
anyway.  So you can leave the extra locking in if you want.  But then
you should put it in all the routines where the lists get manipulated,
not just some of them.  That is: device_power_down(), dpm_power_up(),

In your patch the call is made in response to a CPU_UP_CANCELED_FROZEN
notification.  Isn't it true that this notification is issued only as
part of a system sleep transition?  We wouldn't want to allow
destroy_suspended_device() to be called when an arbitrary CPU hotplug
notification occurs.

Alan Stern

--

From: Rafael J. Wysocki
Date: Wednesday, January 9, 2008 - 4:29 pm

Of course.

Greetings,
Rafael
--

From: Rafael J. Wysocki
Date: Thursday, January 10, 2008 - 9:59 am

Rather, by the resume core.  Technically, it's invoked by
enable_nonboot_cpus(), which is not a resume method literally.

Greetings,
Rafael
--

From: Alan Stern
Date: Thursday, January 10, 2008 - 10:04 am

Okay, then the routine should only be called directly or indirectly 
from a suspend or resume method or from the suspend or resume core.

Alan Stern

--

From: Alan Stern
Date: Sunday, January 6, 2008 - 3:11 pm

We can't.  device_del() can't fail -- it returns void.  Besides, how 
can a driver hope to deal with an unregistration failure?

Alan Stern

--

From: Alan Stern
Date: Sunday, January 6, 2008 - 3:31 pm

We could, however I don't think it's dangerous to allow it.  The two
problems to avoid are (1) messing up the PM device list pointers, and
(2) calling a driver's suspend/resume methods while its remove method
is running.  (1) is handled by the pm_list_mutex and (2) is handled by
locking dev->sem.

Alan Stern

--

Previous thread: Why is "GoTop Super_Q2/GogoPen/PenPower tablets" treated as touchscreen? by JustFillBug on Saturday, January 5, 2008 - 10:48 am. (2 messages)

Next thread: Supports for new slab allocator now in latest release by Mark Seger on Saturday, January 5, 2008 - 12:29 pm. (3 messages)