Cc: Nigel Cunningham <nigel@...>, Benjamin Herrenschmidt <benh@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>, Matthew Garrett <mjg59@...>, <linux-kernel@...>, <linux-pm@...>
For that matter, shutting down a CPU and hibernation are fundamentally
different operations -- but they both use the freezer. Is that a big
muddy mess? But never mind; I won't discuss hibernation.
You don't understand my point. If a wakeup request arrives before the
system goes to sleep, and it is serviced, then a device which ought to
have been suspended will in fact be awake. This will (if the parent's
driver is written correctly) cause the sleep transition to abort.
Not that there's necessarily anything wrong with that. I just wanted
to be sure you were aware of the potential problems.
I didn't say "bind", I said "registered". Admittedly, they are rather
similar.
Still, there are difficulties. Let's say a driver has set the NO_BIND
flag for one of its devices. A bind request comes in, and the driver
puts it on a waitqueue. Note that the binding thread holds the device
semaphore; this is always true when a driver's probe routine is called.
Later on it comes time for the PM core to resume the device, which will
wake up the threads on the waitqueue. Before doing so it must acquire
the device semaphore, which the binding thread is still holding.
Deadlock!
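To make the lock cycle concrete, here is a userspace sketch with pthreads. It is only a model under stated assumptions, not kernel code: the names model_dev, binding_thread, resume_device and run_no_bind_scenario are hypothetical. A mutex (dev_sem) stands in for the device semaphore and a condition variable stands in for the driver's waitqueue. The resume path uses trylock so the demo reports EBUSY where a blocking lock would hang forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code:
 *   dev_sem           stands in for the device semaphore held around probe()
 *   wq_lock + wq_cond stand in for the driver's waitqueue of parked binds */
struct model_dev {
    pthread_mutex_t dev_sem;
    pthread_mutex_t wq_lock;
    pthread_cond_t wq_cond;
    bool binder_parked;   /* binding thread is asleep on the waitqueue */
    bool resumed;         /* PM core has resumed the device */
};

/* Binding thread: takes dev_sem (as the driver core does before probe()),
 * sees NO_BIND, and parks on the waitqueue while still holding dev_sem. */
static void *binding_thread(void *arg)
{
    struct model_dev *d = arg;

    pthread_mutex_lock(&d->dev_sem);
    pthread_mutex_lock(&d->wq_lock);
    d->binder_parked = true;
    pthread_cond_broadcast(&d->wq_cond);
    while (!d->resumed)
        pthread_cond_wait(&d->wq_cond, &d->wq_lock);
    pthread_mutex_unlock(&d->wq_lock);
    pthread_mutex_unlock(&d->dev_sem);
    return NULL;
}

/* Resume path: needs dev_sem before it may wake the waitqueue.  trylock is
 * used so the demo returns EBUSY where a blocking lock would deadlock. */
static int resume_device(struct model_dev *d)
{
    int err = pthread_mutex_trylock(&d->dev_sem);

    if (err != 0)
        return err;
    pthread_mutex_lock(&d->wq_lock);
    d->resumed = true;
    pthread_cond_broadcast(&d->wq_cond);
    pthread_mutex_unlock(&d->wq_lock);
    pthread_mutex_unlock(&d->dev_sem);
    return 0;
}

/* Runs the scenario and returns what resume_device() reported. */
int run_no_bind_scenario(void)
{
    struct model_dev d = { .binder_parked = false, .resumed = false };
    pthread_t binder;
    int rc;

    pthread_mutex_init(&d.dev_sem, NULL);
    pthread_mutex_init(&d.wq_lock, NULL);
    pthread_cond_init(&d.wq_cond, NULL);

    rc = pthread_create(&binder, NULL, binding_thread, &d);
    assert(rc == 0);

    /* Wait until the binder is parked (and therefore holds dev_sem). */
    pthread_mutex_lock(&d.wq_lock);
    while (!d.binder_parked)
        pthread_cond_wait(&d.wq_cond, &d.wq_lock);
    pthread_mutex_unlock(&d.wq_lock);

    rc = resume_device(&d);   /* EBUSY: resume cannot get dev_sem */

    /* Demo only: break the cycle by waking the binder without dev_sem. */
    pthread_mutex_lock(&d.wq_lock);
    d.resumed = true;
    pthread_cond_broadcast(&d.wq_cond);
    pthread_mutex_unlock(&d.wq_lock);

    pthread_join(binder, NULL);
    pthread_cond_destroy(&d.wq_cond);
    pthread_mutex_destroy(&d.wq_lock);
    pthread_mutex_destroy(&d.dev_sem);
    return rc;
}
```

The binder sleeps while holding the lock that the only thread able to wake it needs first, so neither can make progress.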
Why do you say that? A "process freezer" can prevent bind and
registration calls from occurring, since these calls have to run in
process context. Ergo a freezer _can_ fix some of these problems.
Who mentioned network packets? And who says a remote wakeup event will
get dropped once interrupts are disabled? More likely it will set a
bit somewhere that causes the system to wake up immediately after it
has gone to sleep.
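A small sketch of the "set a bit somewhere" behavior, assuming a latched wakeup-pending flag. The names (wakeup_pending, signal_remote_wakeup, enter_sleep) are hypothetical and the model ignores what real hardware does. The point is only that the event is recorded rather than serviced, and the final sleep step consumes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a latched "wakeup pending" status bit.  The names
 * are illustrative; real hardware and platform code differ. */
static atomic_bool wakeup_pending = false;

/* Called by the wakeup source.  It only sets the latch, so it still works
 * after interrupts have been disabled and nobody services the event. */
void signal_remote_wakeup(void)
{
    atomic_store(&wakeup_pending, true);
}

enum sleep_result { SLEPT_UNTIL_NEXT_EVENT, WOKE_IMMEDIATELY };

/* Last step of the modeled suspend.  The latched bit is consumed here, so an
 * event that arrived late makes the system wake right away, not vanish. */
enum sleep_result enter_sleep(void)
{
    if (atomic_exchange(&wakeup_pending, false))
        return WOKE_IMMEDIATELY;
    return SLEPT_UNTIL_NEXT_EVENT;
}
```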
One of your conditions (embodied in the pseudocode you posted earlier)
was that drivers should be told to prevent binding and registration
before the child devices are suspended. Currently the PM core doesn't
do anything like that. You can't blame the drivers for this lack.
Of course it could be added. Or perhaps more easily, the drivers that
support asynchronous probing could be notified when a suspend is about
to start so they could begin blocking bindings/registrations then.
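One way the "notify drivers before suspend starts" idea might look, as a userspace sketch. Every name here (async_driver, register_suspend_listener, notify_drivers, request_bind, the PM_EV_* events) is hypothetical. Later kernels do provide a real mechanism of roughly this shape, the PM notifier chain registered with register_pm_notifier(). In the sketch a blocked bind is recorded and replayed after resume instead of sleeping while holding the device lock, so it is postponed rather than failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "suspend is about to start" notification for drivers that
 * probe asynchronously.  All names here are illustrative only. */
enum pm_event { PM_EV_SUSPEND_PREPARE, PM_EV_POST_SUSPEND };

struct async_driver {
    const char *name;
    bool block_binding;        /* set between prepare and post-suspend */
    int deferred;              /* bind requests parked during suspend */
    int replayed;              /* parked requests re-issued on resume */
    struct async_driver *next;
};

static struct async_driver *listeners;

void register_suspend_listener(struct async_driver *drv)
{
    assert(drv != NULL);
    drv->next = listeners;
    listeners = drv;
}

/* Called by the (modeled) PM core before any child device is suspended,
 * and again after the system has resumed. */
void notify_drivers(enum pm_event ev)
{
    for (struct async_driver *d = listeners; d != NULL; d = d->next) {
        if (ev == PM_EV_SUSPEND_PREPARE) {
            d->block_binding = true;
        } else {
            d->block_binding = false;
            d->replayed += d->deferred;   /* re-issue parked requests */
            d->deferred = 0;
        }
    }
}

/* Returns 0 if the bind may run now.  While blocked, the request is recorded
 * for later and -EAGAIN is returned instead of sleeping, so the caller does
 * not sit on the device lock; the bind is postponed, not dropped. */
int request_bind(struct async_driver *drv)
{
    if (drv->block_binding) {
        drv->deferred++;
        return -EAGAIN;
    }
    return 0;
}
```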
("Parent device"? Do you mean the device being bound? If so then I
agree. Or do you mean the device's parent? If so then your statement
is not clear at all. There is special-case code in the driver core to
make sure it is true for USB devices, and it looks ugly as can be.)
My question referred to drivers trying to bind or unbind a device
_after_ the device has been suspended. I suppose you'll say that's
covered by the NO_BIND flag. But now we have the locking problem
mentioned above: The thread trying to bind is holding a lock which is
needed for resuming.
That won't cause binding to block or be postponed; it will cause it to
fail. Not the same thing at all.
As one of the people responsible for the USB power management
implementation, I would appreciate more details about this. For
example, a dmesg log with CONFIG_USB_DEBUG turned on together with a
complete description of the actions you took to provoke the bug.
(I wonder how much of this "buginess" is caused by the lack of the
freezer in PPC.)
The USB drivers (at least, the ones with runtime PM support) rely on
the freezer to block I/O during suspend. As far as I know, they do
suspend properly, on systems where the freezer is used.
Is reproducibility really a problem at this stage? A bug which bites
50% of the time might not be quite as easy to fix as one which occurs
every time, but it isn't terribly bad either.
Alan Stern