Hi,
This is the 3rd revision of the patch introducing new callbacks for suspend
and hibernation. It has been tested on x86-64.I did my best to address the Alan's comments regarding the 2nd revision of the
patch. In particular:
* The ->prepare() callbacks are now executed in the forward order to avoid
races between the PM core and the registration of new devices. Analogously,
the ->complete() callbacks are executed in the reverse order.
* Only one list, called dpm_list, is used by the PM core its operations and
the dev->power.status field is used to mark the current status of the device
wrt the suspend/hibernation operations.
* The registrations of parentless devices are disabled before the first
->prepare() method is called and enabled before the first ->resume() method
is called
* The registration of new devices with a parent is disabled right after the
parent's ->prepare() is called and enabled just prior to calling the parent's
->resume().The patch is on top of the "PM: Remove destroy_suspended_device()" patch
I sent yesterday.Please review.
Thanks,
Rafael---
From: Rafael J. Wysocki <rjw@sisk.pl>Introduce 'struct pm_ops' and 'struct pm_noirq_ops' representing
suspend and hibernation operations for bus types, device classes and
device types.Modify the PM core to use 'struct pm_ops' and 'struct pm_noirq_ops'
objects, if defined, instead of the ->suspend() and ->resume() or,
respectively, ->suspend_late() and ->resume_early() callbacks that
will be considered as legacy and gradually phased out.The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified. This has been requested for multiple
times by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, ...
It would be okay to wait until after the last prepare() method is
Can we also have a DPM_PREPARING state, set when ->prepare() is about
to be called? The PM core wouldn't make use of it but some drivers
would. (I can't think of any use at all for the analogousClearer to say: if (dev->parent->power.status >= DPM_SUSPENDING) {
Log a warning here? If this ever happened, it would be the sort of
Clearer to say: if (dev->power.status >= DPM_OFF) {
Note that if dev->power.status is equal to DPM_SUSPENDING then you
don't want to call resume_device(), but you still do want to change
dev->power.status to DPM_RESUMING so that new children can beThis isn't the way I imagined doing it (your extra "list"), but it's
fine.Instead you could eliminate the list_splice_init() above and put here:
list_splice(&list, dpm_list->prev);
On the whole it looks quite good.
Alan Stern
--
The patch below should address all of your comments above. It also seems to
work.If it looks good to you, please let me know and I'll repost it in a new thread
along with the platform and PCI patches adapted to it.Thanks,
RafaelSigned-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---arch/x86/kernel/apm_32.c | 4
drivers/base/power/main.c | 654 ++++++++++++++++++++++++++++++++++-----------
drivers/base/power/power.h | 2
drivers/base/power/trace.c | 4
include/linux/device.h | 8
include/linux/pm.h | 282 +++++++++++++++++--
kernel/power/disk.c | 14
kernel/power/main.c | 4
8 files changed, 772 insertions(+), 200 deletions(-)Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -114,7 +114,9 @@ typedef struct pm_message {
int event;
} pm_message_t;-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
* Several driver power state transitions are externally visible, affecting
* the state of pending I/O queues and (for drivers that touch hardware)
* interrupts, wakeups, DMA, and other hardware state. There may also be
@@ -122,6 +124,254 @@ typedef struct pm_message {
* to the rest of the driver stack (such as a driver that's ON gating off
* clocks which are not in active use).
*
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ * its hardware state. Prevent new children of the device from being
+ * registered and prevent new calls to the probe method from being made
+ * after @prepare() returns. If @prepare() detects a situation it cannot
+ * handle (e.g. registration of a child already in progress), it may return
+ * -EAGAIN, so that the PM core can execute it once again (e.g. after the
+ * new chi...
I don't see anything else to change. ACK.
Alan Stern
--
Thanks.
I'd like to fix the problem with dpm_list_mtx vs the disabling of interrupts
you pointed out in the other message, though.Rafael
--
How is a driver supposed to prevent calls to probe()? You can block in probe()
or you can fail it, but how do you prevent it from being called? User space
can trigger probe via sysfs. To be useful to driver writers, this hasProbably it should be mentioned that this is the time to allocate memory
if you have to. And it is too late to request firmware.Regards
Oliver
--
The comment is supposed to mean that probe() should be prevented from running
Well, not exactly. Afterwards you cannot use GFP_KERNEL allocations, but
GFP_NOIO should still work, although for hibernation it's quite possible thatYes.
Thanks,
Rafael
--
The driver isn't supposed to prevent calls to its own probe(). The
comment means that the subsystem -- or the rest of the kernel generally
-- is supposed to avoid binding a driver to the device (thereby calling
the probe routine), assuming the device isn't already bound.I don't expect this sort of thing to be very common. Mostly it happens
when new kernel modules are loaded and new drivers are registered; we
will have to block module loading during a sleep transition. It also
happens when the user writes to a driver's "bind" attribute in sysfs;
the code there will have to block during a sleep transition.Alan Stern
--
I see no locking that would would prevent disconnect() in the window between
This is better documented explicitly.
Regards
Oliver--
There is no such locking. It's perfectly legal for a device to be
unregistered between prepare() and suspend(). I suppose it wouldn't
hurt to add a general comment explaining that a device can be
unregistered at any time except when one of its methods is running.Alan Stern
--
Hmm. dev->power.status is protected by dpm_list_mtx. Do you think it would be
The device can't be removed at this point, because we hold dpm_list_mtx, which
is needed by device_del(). Still, I'll move the put_device() to avoidOkay, thanks for the comments.
Rafael
--
I don't think so. What I have in mind is situations where there
accessed has already been synchronized by other means, while the
prepare() method is running. For example:Task 0 Task 1
------ ------
->prepare() is called
Waits for currently-running
registration in task 1
to finish
Does other stuff
Receives a request to register
a new child under dev
Sees that dev->power.state is
still DPM_ON, so goes ahead
with the child's registration
->prepare() returns
dev->power.state is set to
DPM_SUSPENDING
device_pm_add() checks
dev->power.state and fails
the registrationIf dev->power.state had been set to DPM_PREPARING before ->prepare()
was called, then task 1 would have avoided trying to register theTrue, it can't be removed at this point. But it _can_ be removed
It will still work correctly. If dpm_list is empty then dpm_list->prev
is equal to &dpm_list, so it will do the same thing as your current
code does.I just thought of another problem. At the point where
local_irq_disable() is called, in between device_suspend() and
device_power_down(), it is possible in a preemptible kernel that
another task is holding dpm_list_mtx and is in the middle of updating
the list pointers. This would mess up the traversal in
device_power_down().I'm not sure about the best way to prevent this. Is it legal to call
unlock_mutex() while interrupts or preemption are disabled?Alan Stern
--
Well, I think it is, but I'm not sure how that can help.
To prevent the race from happening, we can lock dpm_list_mtx before switching
interrupts off in kernel/power/main.c:suspend_enter() and analogously in
kernel/power/disk.c .Thanks,
Rafael
--
That's right. And once interrupts are turned off you should unlock
dpm_list_mtx again, in case a noirq method wants to unregister a
device. Hence my question: Is it legal to call unlock_mutex() while
interrupts are disabled?Alan Stern
--
Why would a noirq method want to do that? IMO, it's not a big deal if noirq
Well, I suspect that will confuse lockdep quite a bit. Otherwise, I don't see
a problem with it (it's just changing the value of a shared variable after
all).Thanks,
Rafael
--
Then you have your answer. Perhaps have device_suspend() exit with the
mutex held and have device_resume() release it (with appropriate
handling for error situations, of course).Alan Stern
--
That wouldn't work, because enable_nonboot_cpus() is called before
device_resume() and the notifiers in there may want to unregister devices
if some CPUs fail to go online.I added two accessor functions device_pm_lock() and device_pm_unlock()
to be called just prior to disabling interrupts and right after enabling them,
respectively, in the higher-level PM core (ie. kernel/power/main(disk).c).Thanks,
Rafael
--
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Matt Mackall | Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Natalie Protasevich | [BUG] New Kernel Bugs |
git: | |
