Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Rafael J. Wysocki <rjw@...>
Cc: Linux-pm mailing list <linux-pm@...>, Kernel development list <linux-kernel@...>
Date: Saturday, March 1, 2008 - 11:30 am

On Sat, 1 Mar 2008, Rafael J. Wysocki wrote:


That's right.


Yes, but it's still wrong.  It's also wrong to register a new device 
(even one that has no parent) during the suspend_late stage, because 
this device wouldn't get suspended before the system went to sleep.


It's worse than you describe, because suspend_device() is unable to
tell whether a concurrent child registration is valid or invalid.  By
"valid", I mean that it took place during the window before the suspend
method was called or before the method was able to prevent new child
registrations.

Valid child registrations must be allowed to proceed.  There's no
reason to fail them, because the driver has done nothing wrong -- and
it wouldn't be safe since drivers don't check for failure.  Likewise,
blocking them is likely to cause the suspend method to deadlock; see 
below.

But since suspend_device() can't tell which concurrent child
registrations are valid, it has no choice but to allow all of them.  
This means our only option for recovering from a concurrent child
registration is to resume the parent (don't move it to dpm_off) and 
continue.  That way the list ordering will remain correct.

Once the parent is fully suspended, suspend_device() has returned, and
the parent has been moved to dpm_off, the situation is different.  
Then we _know_ that child registrations are invalid, so either blocking
or failing them would be okay.


In view of my comments above, we _must not_ prevent registration of
children of suspending devices.  It's okay to prevent registration of
suspended devices.  There's nothing wrong with using a lock for this
purpose, but the lock should not be acquired until after
suspend_device() has returned and we have verified that no children
were added while suspend_device() was running.


Consider a situation where a kernel thread is used by a driver for 
several purposes, including registering new children.  The suspend 
method will have to synchronize with the thread for obvious reasons: 
you don't want the thread to be using the device at the same time as 
the device is being suspended.

A typical synchronization approach is for the suspend method to wait
until the other thread is idle (the sort of thing that
flush_scheduled_work() does).  But if the other thread is blocked on
dev->power.lock, it will never become idle and the suspend will
deadlock.

A better approach IMO is for the other thread to always acquire 
dev->sem before doing anything.  (That's how khubd works.)  Then it is 
automatically mutually exclusive with the suspend method, with no need 
to add dev->power.lock at all.  In fact, I have long believed that any 
thread adding a child device should hold the parent's semaphore.

But until all drivers are carefully designed in accordance with these
ideas, we have to assume it is dangerous to block child registrations
before the parent is fully suspended.



That's not the issue.  The only problem I have with your approach is
that it blocks child registrations before the parent is fully
suspended.


I believe it isn't achievable without modifying the drivers.  That was 
the case with the MMC subsystem, for instance.


There must some strange interactions going on.  For instance, what if a
device sends a remote wakeup request before the system is fully asleep?


When the USB design is complete it could be used as a model.  But I 
have to admit that it is rather intricate.


Oh, I agree.  But this protection simply isn't possible without either 
allowing devices to resume after the target sleep state has been set or 
else modifying an unknown number of drivers.

Alan Stern

--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: Fundamental flaw in system suspend, exposed by freezer r..., Benjamin Herrenschmidt, (Wed Feb 27, 4:36 pm)
Re: Fundamental flaw in system suspend, exposed by freezer r..., Rafael J. Wysocki, (Mon Feb 25, 6:24 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Mon Feb 25, 6:25 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Mon Feb 25, 8:07 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Tue Feb 26, 7:17 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Wed Feb 27, 3:50 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Fri Feb 29, 10:26 am)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Fri Feb 29, 1:02 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Fri Feb 29, 5:57 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Fri Feb 29, 8:13 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Alan Stern, (Sat Mar 1, 11:30 am)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Mon Mar 3, 12:32 pm)
Re: [linux-pm] Fundamental flaw in system suspend, exposed b..., Rafael J. Wysocki, (Thu Feb 28, 8:01 pm)