This patch series adds a suspend-block api that provides the same functionality as the android wakelock api. This version fixes a race in suspend blocking work, has some documentation changes and opportunistic suspend now uses the same workqueue as runtime pm. -- Arve Hjønnevåg <arve@android.com> --
Add a misc device, "suspend_blocker", that allows user-space processes to block auto suspend. The device has ioctls to create a suspend_blocker, and to block and unblock suspend. To delete the suspend_blocker, close the device. Signed-off-by: Arve Hjønnevåg <arve@android.com> --- Documentation/ioctl/ioctl-number.txt | 3 +- Documentation/power/opportunistic-suspend.txt | 17 ++++ include/linux/suspend_block_dev.h | 25 +++++ kernel/power/Kconfig | 9 ++ kernel/power/Makefile | 1 + kernel/power/user_suspend_blocker.c | 128 +++++++++++++++++++++++++ 6 files changed, 182 insertions(+), 1 deletions(-) create mode 100644 include/linux/suspend_block_dev.h create mode 100644 kernel/power/user_suspend_blocker.c diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index dd5806f..e2458f7 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -254,7 +254,8 @@ Code Seq#(hex) Include File Comments 'q' 80-FF linux/telephony.h Internet PhoneJACK, Internet LineJACK linux/ixjuser.h <http://www.quicknet.net> 'r' 00-1F linux/msdos_fs.h and fs/fat/dir.c -'s' all linux/cdk.h +'s' all linux/cdk.h conflict! +'s' all linux/suspend_block_dev.h conflict! 't' 00-7F linux/if_ppp.h 't' 80-8F linux/isdn_ppp.h 't' 90 linux/toshiba.h diff --git a/Documentation/power/opportunistic-suspend.txt b/Documentation/power/opportunistic-suspend.txt index 3d060e8..f2b145e 100644 --- a/Documentation/power/opportunistic-suspend.txt +++ b/Documentation/power/opportunistic-suspend.txt @@ -117,3 +117,20 @@ if (list_empty(&state->pending_work)) else suspend_block(&state->suspend_blocker); +User-space API +============== + +To create a suspend_blocker from user-space, open the suspend_blocker device: + fd = open("/dev/suspend_blocker", O_RDWR | O_CLOEXEC); +then call: + ioctl(fd, ...
When connecting usb or the charger the device would often go back to sleep
before the charge led and screen turned on.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
drivers/power/power_supply_core.c | 9 ++++++---
include/linux/power_supply.h | 3 ++-
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/power/power_supply_core.c b/drivers/power/power_supply_core.c
index cce75b4..577a131 100644
--- a/drivers/power/power_supply_core.c
+++ b/drivers/power/power_supply_core.c
@@ -39,7 +39,7 @@ static int __power_supply_changed_work(struct device *dev, void *data)
static void power_supply_changed_work(struct work_struct *work)
{
struct power_supply *psy = container_of(work, struct power_supply,
- changed_work);
+ changed_work.work);
dev_dbg(psy->dev, "%s\n", __func__);
@@ -55,7 +55,7 @@ void power_supply_changed(struct power_supply *psy)
{
dev_dbg(psy->dev, "%s\n", __func__);
- schedule_work(&psy->changed_work);
+ schedule_suspend_blocking_work(&psy->changed_work);
}
EXPORT_SYMBOL_GPL(power_supply_changed);
@@ -155,7 +155,8 @@ int power_supply_register(struct device *parent, struct power_supply *psy)
goto dev_create_failed;
}
- INIT_WORK(&psy->changed_work, power_supply_changed_work);
+ suspend_blocking_work_init(&psy->changed_work,
+ power_supply_changed_work, "power-supply");
rc = power_supply_create_attrs(psy);
if (rc)
@@ -172,6 +173,7 @@ int power_supply_register(struct device *parent, struct power_supply *psy)
create_triggers_failed:
power_supply_remove_attrs(psy);
create_attrs_failed:
+ suspend_blocking_work_destroy(&psy->changed_work);
device_unregister(psy->dev);
dev_create_failed:
success:
@@ -184,6 +186,7 @@ void power_supply_unregister(struct power_supply *psy)
flush_scheduled_work();
power_supply_remove_triggers(psy);
power_supply_remove_attrs(psy);
+ suspend_blocking_work_destroy(&psy->changed_work);
device_unregister(psy->dev);
}
...Add an ioctl, EVIOCSSUSPENDBLOCK, to enable a suspend_blocker that will block
suspend while the event queue is not empty. This allows userspace code to
process input events while the device appears to be asleep.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
drivers/input/evdev.c | 22 ++++++++++++++++++++++
include/linux/input.h | 3 +++
2 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 2ee6c7a..66e0d16 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -20,6 +20,7 @@
#include <linux/input.h>
#include <linux/major.h>
#include <linux/device.h>
+#include <linux/suspend_blocker.h>
#include "input-compat.h"
struct evdev {
@@ -43,6 +44,8 @@ struct evdev_client {
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
+ struct suspend_blocker suspend_blocker;
+ bool use_suspend_blocker;
};
static struct evdev *evdev_table[EVDEV_MINORS];
@@ -55,6 +58,8 @@ static void evdev_pass_event(struct evdev_client *client,
* Interrupts are disabled, just acquire the lock
*/
spin_lock(&client->buffer_lock);
+ if (client->use_suspend_blocker)
+ suspend_block(&client->suspend_blocker);
client->buffer[client->head++] = *event;
client->head &= EVDEV_BUFFER_SIZE - 1;
spin_unlock(&client->buffer_lock);
@@ -234,6 +239,8 @@ static int evdev_release(struct inode *inode, struct file *file)
mutex_unlock(&evdev->mutex);
evdev_detach_client(evdev, client);
+ if (client->use_suspend_blocker)
+ suspend_blocker_destroy(&client->suspend_blocker);
kfree(client);
evdev_close_device(evdev);
@@ -335,6 +342,8 @@ static int evdev_fetch_next_event(struct evdev_client *client,
if (have_event) {
*event = client->buffer[client->tail++];
client->tail &= EVDEV_BUFFER_SIZE - 1;
+ if (client->use_suspend_blocker && client->head == client->tail)
+ suspend_unblock(&client->suspend_blocker);
}
...Allow work to be queued that will block suspend while it is pending
or executing. To get the same functionality in the calling code often
requires a separate suspend_blocker for pending and executing work, or
additional state and locking. This implementation does add additional
state and locking, but this can be removed later if we add support for
suspend blocking work to the core workqueue code.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
include/linux/suspend_blocker.h | 67 ++++++++++++++++++++++++
kernel/power/suspend_blocker.c | 109 +++++++++++++++++++++++++++++++++++++++
2 files changed, 176 insertions(+), 0 deletions(-)
diff --git a/include/linux/suspend_blocker.h b/include/linux/suspend_blocker.h
index c80764c..bf41a57 100755
--- a/include/linux/suspend_blocker.h
+++ b/include/linux/suspend_blocker.h
@@ -18,6 +18,7 @@
#include <linux/list.h>
#include <linux/ktime.h>
+#include <linux/workqueue.h>
/**
* struct suspend_blocker - the basic suspend_blocker structure
@@ -57,6 +58,38 @@ struct suspend_blocker {
#endif
};
+/**
+ * struct suspend_blocking_work - the basic suspend_blocking_work structure
+ * @work: Standard work struct.
+ * @suspend_blocker: Suspend blocker.
+ * @func: Callback.
+ * @lock: Spinlock protecting pending and running state.
+ * @active: Number of cpu workqueues where work is pending or
+ * callback is running.
+ *
+ * When suspend blocking work is pending or its callback is running it prevents
+ * the system from entering opportunistic suspend.
+ *
+ * The suspend_blocking_work structure must be initialized by
+ * suspend_blocking_work_init().
+ */
+
+struct suspend_blocking_work {
+ struct work_struct work;
+#ifdef CONFIG_OPPORTUNISTIC_SUSPEND
+ struct suspend_blocker suspend_blocker;
+ work_func_t func;
+ spinlock_t lock;
+ int active;
+#endif
+};
+
+static inline struct suspend_blocking_work *to_suspend_blocking_work(
+ struct work_struct *work)
+{
+ return container_of(work, struct ...Reviewed-by: Tejun Heo <tj@kernel.org> Thank you. -- tejun --
Looks ok. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Report suspend block stats in /sys/kernel/debug/suspend_blockers.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
include/linux/suspend_blocker.h | 21 ++++-
kernel/power/Kconfig | 7 ++
kernel/power/power.h | 5 +
kernel/power/suspend.c | 4 +-
kernel/power/suspend_blocker.c | 190 +++++++++++++++++++++++++++++++++++++--
5 files changed, 218 insertions(+), 9 deletions(-)
diff --git a/include/linux/suspend_blocker.h b/include/linux/suspend_blocker.h
index f9928cc..c80764c 100755
--- a/include/linux/suspend_blocker.h
+++ b/include/linux/suspend_blocker.h
@@ -17,12 +17,21 @@
#define _LINUX_SUSPEND_BLOCKER_H
#include <linux/list.h>
+#include <linux/ktime.h>
/**
* struct suspend_blocker - the basic suspend_blocker structure
* @link: List entry for active or inactive list.
- * @flags: Tracks initialized and active state.
+ * @flags: Tracks initialized, active and stats state.
* @name: Name used for debugging.
+ * @count: Number of times this blocker has been deacivated
+ * @wakeup_count: Number of times this blocker was the first to block suspend
+ * after resume.
+ * @total_time: Total time this suspend blocker has prevented suspend.
+ * @prevent_suspend_time: Time this suspend blocker has prevented suspend while
+ * user-space requested suspend.
+ * @max_time: Max time this suspend blocker has been continuously active
+ * @last_time: Monotonic clock when the active state last changed.
*
* When a suspend_blocker is active it prevents the system from entering
* opportunistic suspend.
@@ -35,6 +44,16 @@ struct suspend_blocker {
struct list_head link;
int flags;
const char *name;
+#ifdef CONFIG_SUSPEND_BLOCKER_STATS
+ struct {
+ int count;
+ int wakeup_count;
+ ktime_t total_time;
+ ktime_t prevent_suspend_time;
+ ktime_t max_time;
+ ktime_t last_time;
+ } stat;
+#endif
#endif
};
diff --git ...Report active and inactive suspend blockers in
/sys/kernel/debug/suspend_blockers.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
kernel/power/suspend_blocker.c | 43 ++++++++++++++++++++++++++++++++++++++-
1 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/kernel/power/suspend_blocker.c b/kernel/power/suspend_blocker.c
index 2c58b21..ced4993 100644
--- a/kernel/power/suspend_blocker.c
+++ b/kernel/power/suspend_blocker.c
@@ -17,6 +17,7 @@
#include <linux/rtc.h>
#include <linux/suspend.h>
#include <linux/suspend_blocker.h>
+#include <linux/debugfs.h>
#include "power.h"
extern struct workqueue_struct *pm_wq;
@@ -42,6 +43,7 @@ static int current_event_num;
struct suspend_blocker main_suspend_blocker;
static suspend_state_t requested_suspend_state = PM_SUSPEND_MEM;
static bool enable_suspend_blockers;
+static struct dentry *suspend_blocker_stats_dentry;
#define pr_info_time(fmt, args...) \
do { \
@@ -55,6 +57,21 @@ static bool enable_suspend_blockers;
tm.tm_hour, tm.tm_min, tm.tm_sec, ts.tv_nsec); \
} while (0);
+static int suspend_blocker_stats_show(struct seq_file *m, void *unused)
+{
+ unsigned long irqflags;
+ struct suspend_blocker *blocker;
+
+ seq_puts(m, "name\tactive\n");
+ spin_lock_irqsave(&list_lock, irqflags);
+ list_for_each_entry(blocker, &inactive_blockers, link)
+ seq_printf(m, "\"%s\"\t0\n", blocker->name);
+ list_for_each_entry(blocker, &active_blockers, link)
+ seq_printf(m, "\"%s\"\t1\n", blocker->name);
+ spin_unlock_irqrestore(&list_lock, irqflags);
+ return 0;
+}
+
static void print_active_blockers_locked(void)
{
struct suspend_blocker *blocker;
@@ -106,8 +123,8 @@ static DECLARE_WORK(suspend_work, suspend_worker);
/**
* suspend_blocker_init() - Initialize a suspend blocker
* @blocker: The suspend blocker to initialize.
- * @name: The name of the suspend blocker to show in debug messages.
- *
+ * @name: The name of the suspend blocker to show in debug messages and
+ ...Could you report the pid here too? The name set by the application might be meaningless or duplicated. -Andi -- ak@linux.intel.com -- Speaking for myself only. --
Currently all the names set by android user-space are unique but the pids are not. We can add columns, or extend the name of user space suspend blockers in the ioctl interface, later if needed. -- Arve Hjønnevåg --
If a suspend_blocker is active, suspend will fail anyway. Since
try_to_freeze_tasks can take up to 20 seconds to complete or fail, aborting
as soon as someone blocks suspend (e.g. from an interrupt handler) improves
the worst case wakeup latency.
On an older kernel where task freezing could fail for processes attached
to a debugger, this fixed a problem where the device sometimes hung for
20 seconds before the screen turned on.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
kernel/power/process.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 71ae290..d8ebd50 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -15,6 +15,7 @@
#include <linux/syscalls.h>
#include <linux/freezer.h>
#include <linux/delay.h>
+#include <linux/suspend_blocker.h>
/*
* Timeout for stopping processes
@@ -38,6 +39,7 @@ static int try_to_freeze_tasks(bool sig_only)
struct timeval start, end;
u64 elapsed_csecs64;
unsigned int elapsed_csecs;
+ bool wakeup = false;
do_gettimeofday(&start);
@@ -63,6 +65,10 @@ static int try_to_freeze_tasks(bool sig_only)
todo++;
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
+ if (todo && suspend_is_blocked()) {
+ wakeup = true;
+ break;
+ }
if (!todo || time_after(jiffies, end_time))
break;
@@ -85,13 +91,15 @@ static int try_to_freeze_tasks(bool sig_only)
* but it cleans up leftover PF_FREEZE requests.
*/
printk("\n");
- printk(KERN_ERR "Freezing of tasks failed after %d.%02d seconds "
+ printk(KERN_ERR "Freezing of tasks %s after %d.%02d seconds "
"(%d tasks refusing to freeze):\n",
+ wakeup ? "aborted" : "failed",
elapsed_csecs / 100, elapsed_csecs % 100, todo);
read_lock(&tasklist_lock);
do_each_thread(g, p) {
task_lock(p);
- if (freezing(p) && !freezer_should_skip(p))
+ if (freezing(p) && !freezer_should_skip(p)
+ && elapsed_csecs > ...I acked this one before, please preserve such tags. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Yeah, this one is overly complex; instead of having 'open blocks suspend' semantics and lsof listing active blockers, it adds strange ioctl based interface passing names, and then adds debugfs infrastructure listing those back. I guess this is why you are getying 'it should be in /proc, no in /sys, no in debugfs, no in /proc' kind of feedback. This should simply -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Hmm. It doesn't seem to be possible to create two different suspend blockers using the same file handle. So, what exactly is a process supposed to do to use two suspend blockers at the same time? Rafael --
Adds /sys/power/policy that selects the behaviour of /sys/power/state. After setting the policy to opportunistic, writes to /sys/power/state become non-blocking requests that specify which suspend state to enter when no suspend blockers are active. A special state, "on", stops the process by activating the "main" suspend blocker. Signed-off-by: Arve Hjønnevåg <arve@android.com> --- Documentation/power/opportunistic-suspend.txt | 119 +++++++++++ include/linux/suspend_blocker.h | 64 ++++++ kernel/power/Kconfig | 16 ++ kernel/power/Makefile | 1 + kernel/power/main.c | 92 ++++++++- kernel/power/power.h | 9 + kernel/power/suspend.c | 4 +- kernel/power/suspend_blocker.c | 261 +++++++++++++++++++++++++ 8 files changed, 559 insertions(+), 7 deletions(-) create mode 100644 Documentation/power/opportunistic-suspend.txt create mode 100755 include/linux/suspend_blocker.h create mode 100644 kernel/power/suspend_blocker.c diff --git a/Documentation/power/opportunistic-suspend.txt b/Documentation/power/opportunistic-suspend.txt new file mode 100644 index 0000000..3d060e8 --- /dev/null +++ b/Documentation/power/opportunistic-suspend.txt @@ -0,0 +1,119 @@ +Opportunistic Suspend +===================== + +Opportunistic suspend is a feature allowing the system to be suspended (ie. put +into one of the available sleep states) automatically whenever it is regarded +as idle. The suspend blockers framework described below is used to determine +when that happens. + +The /sys/power/policy sysfs attribute is used to switch the system between the +opportunistic and "forced" suspend behavior, where in the latter case the +system is only suspended if a specific value, corresponding to one of the +available system sleep states, is written into /sys/power/state. However, in +the former, opportunistic, case the system is ...
As I explained before (and got no reply), the proposed interface is NAK. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
In fact this behavior was discussed at the LF Collab Summit and no one Ignored. Thanks, Rafael --
Well, I explained why I disliked in previous mail in more details, and neither you nor Arve explained why it is good solution. WTF? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Because it's less confusing. Having two different attributes returning almost the same contents and working in a slightly different way wouldn't be too clean IMO. Literally. I'm not going to take that NAK into consideration. Rafael --
No, I don't think it is similar to pm_test. pm_test is debug-only, and orthogonal to state -- all combinations make sense. With 'oportunistic > policy', state changes from blocking to nonblocking (surprise!). Plus, it is not orthogonal: (assume we added s-t-flash on android for powersaving... or imagine I get oportunistic suspend working on PC --I was there with limited config on x60). policy: oportunistic forced state: on mem disk First disadvantage of proposed interface is that while 'opportunistic mem' is active, I can't do 'forced disk' to save bit more power. Next, not all combinations make sense. oportunistic on == forced <nothing> oportunistic disk -- probably something that will not be implemented any time soon. oportunistic mem -- makes sense. forced on -- NOP forced mem -- makes sense. forced disk -- makes sense. So we have matrix of 7 possibilities, but only 4 make sense... IMO its confusing. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
That's correct, but the "opportunistic on" thing is there to avoid switching back and forth from "opportunistic" to "forced". So that's a matter of The main problem is that the entire suspend subsystem is going to work in a different way when suspend blockers are enforced. Thus IMO it makes sense to provide a switch between the "opportunistic" and "forced" modes, because that clearly indicates to the user (or user space in general) how the whole suspend subsystem actually works at the moment. As long as it's "opportunistic", the system will autosuspend if suspend blockers are not active and the behavior of "state" reflects that. If you want to enforce a transition, switch to "forced" first. That's not at all confusing if you know what you're doing. The defailt mode is "forced", so the suspend subsystem works "as usual" by default. You have to directly switch it to "opportunistic" to change the behavior and once you've done that, you shouldn't really be surprised that the behavior has changed. That's what you've requested after all. So no, I don't really think it's confusing. Thanks, Rafael --
Also note that this patch mixes up adding new in-kernel interface with adding new userland API. The in-kernel interface seems to be sane and it would be nice to merge -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
You know things would be so much easier if the policy was a one-shot styled thing. i.e. when enabled it does what it does, but upon resume the policy must be re-enabled by user mode to get the opportunistic behavior. That way we don't need to grab the suspend blocker from the wake up irq handler all the way up to user mode as the example below calls out. I suppose doing this would put a burden on the user mode code to keep track of if it has no pending blockers registered after a wake up from the suspend. but that seems nicer to me than sprinkling overlapping blocker critical sections from the mettle up to user mode. Please consider making the policy a one shot API that needs to be re-enabled after resume by user mode. That would remove my issue with the design. --
Making it one shot does not change where kernel code needs to block suspend, but it does force user space to poll trying to suspend while suspend is blocked by a driver. -- Arve Hjønnevåg --
Earlier this month, several folks intersted in embedded PM had a BoF as part of the Embedded Linux Conference[1] in San Francisco. Many of us had concerns about wakelocks/suspend-blockers and I wanted to share some of mine here, since I don't know if embedded folks (other than Google) were included in discussions during the LF Collab summmit. I hope other embedded folks will chime in here as well. My background is in embedded as one of the kernel developers on the TI OMAP SoCs, and I work primarily on PM stuff. My comments are not about this implementation of suspend blockers in particular, but rather on the potential implications of suspend blockers in general. Sorry for the lengthy mail, it's broken up in to 3 parts: - suspend blockers vs. runtime PM - how to handle PM aware drivers? - what about dumb or untrusted apps Suspend blockers vs runtime PM ------------------------------ My primary concern is that suspend blockers attempt to address the same problem(s) as runtime PM, but with a very different approach. Suspend blockers use one very large hammer whereas runtime PM hands out many little hammers. Since I believe power management to be a problem of many little nails, I think many little hammers are better suited for the job. Currently in the kernel, we have two main forms of PM - static PM (system PM, traditional suspend/resume etc.) - dynamic PM (runtime PM, CPUfreq, CPUidle, etc.) And with the addition of suspend blockers we have something in between. In my simple world, I think of suspend_blockers as static PM with a retrofit of some basic dynamic capabilities. In my view, a poor man's dynamic PM. The current design of suspend blockers was (presumably) taken due to major limitations and/or problems in dynamic PM when it was designed. However, since then, some very signifcant improvements in dynamic PM have come along, particularily in the form of runtime PM. What I still feel is missing from this discussion are details about why ...
The other concern here is that for many mobile systems the actual semantic intended by "suspend" as it's currently used is more runtime PM like than full suspend - the classic example of this is that when suspending while on a call in a phone you don't want to suspend the modem or audio CODEC, you want to leave them running. If you use a full system suspend then the drivers for affected components have to play guessing games (or add currently non-standard knobs for apps to twiddle) to decide if the system intends them to actually implement the suspend or not but with runtime PM it all falls out very naturally without any I fully agree with this. We do need to ensure that a runtime PM based system can suspend the CPU core and RAM as well as system suspend can but that seems doable. --
I _think_ it would be hard at least. On ACPI-based systems it's not doable at all AFAICS. However, the real question is whether or not the opportunistic suspend feature is worth adding to the kernel as such and I think it is. To me, it doesn't duplicate the runtime PM framework which is aimed at the power management of individual devices rather than the system as a whole. Thanks, Rafael --
Please forgive the ignorance of ACPI (in embedded, we thankfully live in magical world without ACPI) but doesn't that already happen with CPUidle and C-states? I think of CPUidle as basically runtime PM for the CPU. IOW, runtime PM manages the devices, CPUidle manages the CPU (via C-states), resulting in dynaimc PM for the entire system. What From the use cases presented, the *usage* of suspend blockers is aimed at power management of individual devices or subsystems, just like usage of runtime PM. So I still see a large duplication in the usage and the goals of both frameworks. The goal of both is to always enter lowest-power state except - if there's activity (runtime PM for devices, CPUidle for CPU) - if there's a suspend blocker (opportunitic suspend) In addition, it will likely cause duplicate work to be done in drivers. Presumably, PM aware drivers will want to know if the system is in opportunistic mode. For example, for many drivers, doing runtime PM may not be worth the effort if the system is in opportunistic mode. This last point is especially troubling. I don't find it a comforting path to go down if the drivers have to start caring about which PM policy is currently in use. Kevin --
On Mon, May 3, 2010 at 4:37 PM, Kevin Hilman I'm not that familiar with ACPI either, but I think the S-states entered by suspend are much lower power than the C-states entered by No, suspend blockers are mostly used to ensure wakeup events are not ignored, and to ensure tasks triggered by these wakeup events Why? If a device is not in use it should be off regardless of what -- Arve Hjønnevåg --
I'll echo Arve here -- all drivers should seek to be in the lowest power state possible at all times. We've never suggested that suspend_block is a substitute for that. Suspend blockers give us some flexibility on systems where runtime pm will not get us all the way there. If you can meet your power needs without needing suspend blockers, awesome, you don't need them on that platform. The patchset Arve sent out makes this feature an off-by-default kernel configuration option that compiles out to no-ops when disabled. In our experience, periodic timers and polling, both userspace side and kernelside make suspend blockers a win even on platforms like OMAP which have pretty flexible hardware power management. Going to low power states in idle results in higher average power consumption than going to the same states in full suspend, at least on Android devices shipping today. Brian --
Looking at this from a subsystem/driver author point of view the problem I'm faced with is that as a result of using system suspend much more aggressively the subsystem and driver layers are getting conflicting I think a big part of this for me is that this approach changes the intended use of the system-wide suspend a bit, complicating it a bit further than it already is. We end up doing a suspend (which in the normal course of affairs means that the driver should shut everything down) partly as a substitute for runtime PM on the core system devices and partly because our runtime PM isn't doing as well as it should on the individual devices. I'd be a lot more comfortable with this if it came with a mechanism for communicating the intended effect of a suspend on a per-device level so that the drivers have a clear way of figuring out what to do when the system suspends. If there isn't one we can add subsystem or driver specific hooks, of course, but it'd be better to address this at the There's definite work to be done here, yes. --
Exactly. With runtime PM, there is flexibility in choosing the lowest power state at the device/subsystem level, based on activity, timeouts, bitrate, dependencies, latency/throughput constraints, etc. With opportunistic suspend, all of this flexibility is gone, and the device/subsystem is told to go into the lowest power, highest latency Agreed, and because of this, part of my concern is that opportunistic suspend will take the place of doing the "right thing" which (IMHO) I agree. I think there needs to be more discussion on the indended usage of suspend blockers by drivers/subsystems, especially those PM aware And I've admitted this to be the only compelling reason for opportunistic suspend so far. But the driver/subsystem implications of opportunistic suspend still need some fleshing out IMO. Kevin --
Well, half the problem I have is that unfortunately it's not a case of doing that period. The prime example I'm familiar with is that for understandable reasons users become irate when you power down the audio CODEC while they're in the middle of a call so if opportunistic PM is in use then the audio subsystem needs some additional help interpreting a suspend request so that it can figure out how to handle it. Similar issues apply to PMICs, though less pressingly for various reasons. Just to be clear, I do understand and mostly agree with the idea that opportunistic suspend presents a reasonable workaround for our current inability to deliver good power savings with runtime PM methods on many important platforms but I do think that if we're going to make this standard Linux PM functionality then we need to be clearer about how everything is intended to hang together. --
At the moment the rule of thumb is: if you don't need the opportunistic suspend, don't use it. It is not going to be enabled by default on anything other than Android right now. However, since Android is a legitimate user of the Linux kernel, I see no reason to reject this feature right away. There are many kernel features that aren't used by all platforms. Rafael --
Sure, but there are driver authors and subsystem maintainers who care about optimal PM in "normal" Linux *and* in Android. As a PM maintainer for an embedded platform (OMAP) used in both, I certainly care about both, so "disable it if you don't like it" is not really an option. IMO, driver/subsystem authors should not have to care if the userspace is Android or not. We should be working towards a world where Android is not a special case. Kevin --
I agree, but "working on towards a world where ..." need not mean "be there from day one" IMO. Also, I guess you'll agree that using the same _binary_ kernel with Android and non-Android user spaces doesn't necessarily make sense (regardless of the fact that Android kernels have hardware-specific board files included), so probably you'll still disable opportunistic suspend building the kernel for non-Android systems (at least for the time being). It looks like you're thinking the opportunistic suspend somehow is (or can be used as) a replacement for runtime PM, but this is not the case. The reason why it's not the case is that runtime PM works with the assumption that user space is not frozen and it works on individual devices. OTOH, generally, there is a limit on the amount of energy savings you can achieve with runtime PM alone and something like the opportunistic suspend is necessary to go beyond that limit. Now, I can easily imagine using suspend blockers and runtime PM in the same driver with the general rule that you'll probably want to call suspend_unblock() whenever the device is suspended with the help of the runtime PM framework. That really depends on the device and the driver in question, though. Rafael --
Guys, please. The opportunistic suspend feature is _not_ to replace runtime PM by any means! However, there are situations in which runtime PM is clearly insufficient. The idea behind runtime PM is that subsystems and device drivers will know when to put devices into low power states and save energy this way. Still, even if all subsystems do that 100% efficiently, there may be more savings possible by putting the entire system into a sleep state (like on ACPI-based It really can't, at least on some platforms. For example, resuming a PC notebook from suspend to RAM takes more than 1s (with an SSD; rotational disks add more latency here) which is way too much for such a replacement. Besides, there also is room for runtime PM in the Android framework, because it needs to suspend certain devices without freezing processes, for example. The freezing of processes is the most important difference to me. Runtime PM works with the assumption that processes are not frozen (hence the name), but arguably energy savings you can get without going behind that point are At the moment, if you're not on Android, there are none. Thanks, Rafael --
Certainly in my case and I think Kevin's I agree with the need for the bodge at the minute if we can get a clearer idea of how it's supposed to Right, this is likely to be a win for PCs - for embedded systems we rarely have other software to interoperate with so if you can make the hardware do it then it's unlikely there would be any fundamental issue The use case that causes serious issues with this at the minute in the domains I work is this: On a mobile phone when the system is in a voice call the data flow for the call is entirely handled outside the CPU (normally referred to as Applications Processor or AP here) by the baseband and audio CODEC, which are either integrated or directly connected by analogue or digital audio links on the board. If the user has the phone to their ear and is talking away with the screen blanked then the AP is just waiting for input, will appear idle and so trigger an opportunistic suspend. If the audio CODEC is managed by Linux (as is standard) then running through the suspend process will as things stand cause the audio subsystem to be suspended. Since in the normal course of affairs suspend means power down that's what will happen, but this is clearly highly undesirable in this situation. Now, the solution here if we're going to use opportunistic suspend like this is to provide some method for the audio subsystem to figure out that the system doesn't actually want it to suspend when it gets told do do so. Like I say ideally we'd provide some standard interface in the PM subsystem for userspace to communicate this to subsystems and drivers so that we've got a joined up story when people run into issues in cases like this, though obviously if this goes in then we'll have to put in something subsystem or driver specific for affected areas. I know what I'd implement generically for audio, but I've held back since the option is fairly messy when not used in conjunction with a deliberate choice to use opportunistic suspend and I was ...
Evidently, the Android developers had a problem with that. Of course, you can argue that they didn't consider using runtime PM for this purpose, but the real question is how much time it would take to achieve the same level of energy In that case someone (either a driver or, most likely, user space) will need to My understanding is that on Android suspend blockers are used for this Suspend really is a sledgehammer thing. Either you suspend the whole system or you don't suspend at all. You can prevent opportunistic suspend from happening using suspend blockers (that's what they are for), but that's pretty much everything you can do about it. If you want power savings while some parts of the system are active, use runtime PM for that. What I'd use opportunistic suspend for is not the saving of energy when there's a call in progress, because in that case some parts of the system need to be active, but to save energy when the device is completely idle, ie. the user interface is not used, music is not played "in the background" etc. (my phone is in such a state 90% of the time at least). Rafael --
We actually do use opportunistic suspend for cases like this on Android today. It mostly comes down to a question of latency for return from full suspend. For example: Some MSM7201 based devices typically resume from power collapse in 3-5ms, but absolute worst case (because the L1 radio services are higher priority than the wake-the-apps-cpu services) is ~90ms. On these devices we hold a suspend_blocker while playing audio. Before discovery of this we went into full suspend while playing audio which worked fine, but in certain network conditions we would drop buffers if we could not wake fast enough. So, it's somewhat system dependent, but it can be very helpful in a lot of cases. Brian --
That is not actually what Android systems are doing, and if it is what's supposed to happen then I'd really expect to see a patch to ASoC as part of this series which shows how this is supposed to be integrated - it's the sort of thing I'd expect to see some kernel space management for. Honestly I don't think that's a very good solution for actual systems, though. A part of the reasoning behind designing systems in this way is allowing the AP to go into the lowest possible power state while on a voice call so it doesn't seem at all unreasonable for the system integrator to expect that the AP will be driven into the standard low power state the system uses for it during a call and in a system using On the one hand that's the answer that works well with the existing Linux design here so great. On the other hand as discussed above that Remember that even in your full system suspend the system is not actually fully idle - the baseband is still sitting there looking after itself, keeping the phone on the network. The only difference between on call and off call from the point of view of the Linux system is that there is an active audio path which happens to have been set up by the Linux system. --
On Wed, May 5, 2010 at 4:06 AM, Mark Brown Yup. And that's exactly what happens on the platforms we've shipped on so far -- the apps side of the world fully suspends while in a voice call. Some events (prox sensor, buttons, modem sending a state notification) will wake it up, of course. I haven't spent much time looking at alsa/asoc yet, but it's on my list for 2010 -- I'm hoping to migrate the very simple audio drivers I wrote for the msm platform originally to actually be alsa drivers as part of the general "try to use existing interfaces if they fit" plan we've been working toward. The suspend_block system gets us what we need today (well what we needed three years ago too!) to ship and hit reasonable power targets on a number of platforms. There's been a lot of various handwaving about "android kernel forks" and what have you, but here we are again, trying to work out perhaps the only "new subsystem" type change that our driver code depends on (almost all other contentious stuff is self contained and you can take or leave it). Our hope here is to get something out there in the near term so that the various drivers we maintain would "just work" in mainline (your choice of if you use suspend_block or not -- it's designed to be an option) and we can move forward. If in the future runtime power management or other systems completely obsolete suspend_blockers, awesome, we remove 'em. Brian --
Yup, that'd be good - even with the AP/CP/CODEC SoCs like the MSM devices we really need to get ASoC integration since systems are being built hanging external components such as speaker drivers off the line outputs, and some of those have registers so really do benefit from the sequencing, automated PM and so on that ASoC offers. There was some work on MSM ASoC support posted last year back but there were a number of review issues with it. Daniel Walker also talked about submitting stuff just before Christmas, quite possibly independently of the other work which looked like a community effort, but I've not seen Like I've said a few times here I broadly agree with your goals - they seem to be a reasonable solution to the engineering problems we face presently, even though they're not where we actually want to be. What I do miss from the current proposal is more consideration of how things that do need to ignore the suspend should integrate. If the conclusion is that we don't have anything generic within the kernel then it'd be good to at least have this explicitly spelled out so that we're clear what everyone thinks is going on here and how things are supposed to work. At the minute it doesn't feel like we've had the discussion so we could end up working at cross purposes. I don't want to end up in the situation where I have to work around the APIs I'm using without the relevant maintainers having sight of that since that not only am I likely to be missing some existing solution to the problem but is also prone to causing problems maintaining the underlying API. This hasn't been a pressing issue while the Android code is out of tree since we can just say it's an out of tree patch that has a different design approach to current mainline and it's fairly straightforward for users to introduce suitable system-specific changes. Once it's mainline and part of the standard Linux PM toolkit then obviously subsystems and drivers need to support opportunistic suspend properly ...
We seem to have ended up managing most of our PM infrastructure iteratively. If the concern is more about best practices than intrinsic incompatibilities, I'd lean towards us being better off merging this now and then working things out over the next few releases as we get a better understanding of the implications. The main thing that we have to get right in the short term is the userspace API - everything else is easier to fix up as we go along. -- Matthew Garrett | mjg59@srcf.ucam.org --
Right, the big issue for me is that there's likely to be some form of userspace API visible for controlling this. I think I can keep it mostly in-kernel for audio (by providing new APIs to mark some inputs and outputs as being live during suspend) but need to check. --
OK, but my point was that their *usage* is at the level of inidividual Not necessarily. If a device is not in use, what power state it goes into depends on many device/subsystem specific things. For example, recent activity (timeouts), whether it will be busy soon (pending transfers), latency/throughput constraints, dependency on other devices, or any other device/subsystem specific reason. All of these can be handled with runtime PM. None of which are taken into consideration with opportunistic suspend. Kevin --
ACPI doesn't provide any functionality for cutting power to most devices other than shifting into full system suspend. The number of wakeup events available to us on a given machine is usually small and the wakeup latency large, so it's not terribly practical to do this transparently on most hardware. -- Matthew Garrett | mjg59@srcf.ucam.org --
OK, that's a major difference with embedded SoCs where the kernel must directly manage the power state of all devices using runtime PM. So basically, on ACPI systems, runtime PM doesn't get you any power savings for most devices. Kevin --
I'd say it does for most devices, but the power savings may not be as great as they would be with fine-grained control over power rails. -- Matthew Garrett | mjg59@srcf.ucam.org --
I'd say that this is certainly the main issue, though the remaining I considered this. The problem is that not all of your wakeup events pass through trusted code. Assume we've used a freezer cgroup and the applications are now frozen. One of them is blocking on a network socket. A packet arrives and triggers a wakeup of the hardware. How do we unfreeze the userspace app? I agree that the runtime scenario is a far more appealing one from an aesthetic standpoint, but so far we don't have a very compelling argument for dealing with the starting and stopping of userspace. The use-cases that Google have provided are valid and they have an implementation that addresses them, and while we're unable to provide an alternative that provides the same level of functionality I think we're in a poor position to prevent this from going in. -- Matthew Garrett | mjg59@srcf.ucam.org --
Although both are OMAP3430 and run 2.6.29 you cannot compare the N900 and Droid's perceived user battery life to one another to evaluate opportunistic suspend. There are many factors uncounted for such as: network reception, screen brightness (and size), button back-light, keyboard back-light, modem stack (CDMA vs UMTS). Also the difference Opportunistic suspend is an extension to the current suspend model, not a replacement dynamic / run-time PM. If you can replace good old suspend then this would be a different story. As you mention, Droid uses opportunistic suspend + dynamic pm + cpuidle + freq. So I decided to do some measurements on a Droid using our 2.9.32 kernel (http://android.git.kernel.org/?p=kernel/omap.git;a=summary). For a little better apples to apples comparison. Droid (idle system, airplane mode, screen off, 3 min interval): measured average current - with opportunity suspend: 3.19mA - without opportunistic suspend: 3.5mA Stock userspace build, all I did was replace the kernel. We are hitting retention on idle as well as suspend for omap (instead of full off-mode). Also, your point about wifi, the 15 min timeout is in the framework and is configurable in the code and via UI, nothing to do with kernel, opportunistic suspend or run time suspend. --
Yes, but what does it extend? What aspect it makes the current suspend That's implementation specific. If CPUIdle implemented CPU deep sleep, You don't quite get it :) This should NOT at all be timeout driven. This should be activity driven or constraint driven which perfectly fits into the runtime PM paradigm but is in no way cleanly implemented within the "pure" Android PM. Thanks, Vitaly --
Okay, I measured with and without using the scientific method. With a 1390mAh battery, if you set your device in airplane mode you will get 435.7 hours of standby vs 397.14 hours of standby. Network configuration and cell reception is a big factor here. You can easily get +4-8mA on the original numbers I gave below, so its large The important part here is not the absolute value, but how they compare relatively. If I added off-mode support then I the averages would both drop, but they would not be the same (key point). I simply wanted show with real numbers that there is a difference in opportunistic suspend and without. Instead of purely hypothesizing I admit, our timeout approach is a bit hacky, but there is method for the madness. In order to properly determine when you should turn off wifi for "perfect" power management, you need to know a few things. 1) When, in the 'future' is the user going to turn on the screen. 2) How many network packets will be sent to the device while the screen is off. Some of these factors are Android specific, in particular the device always has a TCP connection open to google servers. So switching from WIFI -> 3G for example causes us to generate extra network traffic as we try to establish our SSL connection to google servers. Likewise this cost is paid when moving from 3G -> wifi. Its *really* hard to predict the future, so we use this timeout based off of various heuristics, usability studies and power measurements. So we attempt to minimize excessive network io for both power reasons and network reasons (ie: cell carriers), as well as user experience (latency to connect / disconnect to a wifi access point). However we are not talking about kernel power management here, we are talking about Android policy that might be dedicated by various OEM / carrier requirements and user experience. I wouldn't necessarily go so far and say its flat out "wrong" what we are doing, but I do feel that it is Android specific enough that ...
Agreed. I was reluctant to even make the comparison for all those reasons, but with the lack of real numbers, it was all I had to show a very rough subjective guess, and at least show that with and without Excellent. Thanks for some real numbers. [at risk of stating the obvious] Since both approaches hit the same power states, if the system was truly idle, you'd expect the numbers to be the same, right? So what the difference shows is that the system is not fully idle, IOW userspace and/or kernel wakeups are cusing the difference. It might also be instructive to see these numbers with a "noop" userspace, like just booting into busybox init=/bin/sh (or the android equivalent.) That would show how much of that difference is due to kernel idleness (or lack thereof.) Even still, to me this all boils down to the lack of definition of the problem and clear description of the solution The fundamental problem is one of idleness. What we want is the system to be idle in order to hit the lowest power states. We can't get there because the system is not truly idle. So, there are basically two solutions: 1) find and fix the problem spots so we can be idle more often 2) add a new definition of idle so we can be idle more often Opporuntistic suspend takes the second approach where the new definition of idle is "no suspend blockers held." Of course, I clearly prefer the former, but it's also becoming clear that since I'm the only one still whining about it, I must be too much of an idealist to keep hoping for it. Kevin --
On Mon, May 17, 2010 at 3:55 PM, Kevin Hilman I'd love to see the former work, but it is not something that I think is going to be fixed rapidly. It potentially involves many different subsystems, and still requires some need for managing arbitrary userspace agents which may have non-ideal behaviors (if we're going to solve the problem for general purpose devices that can run arbitrary user-installed software). Incremental improvements are great though (for example, that we can now be in lowest power idle for periods greater than 2 seconds due to the change in .32). Even when operating in opportunistic suspend, it is still advantageous for idle to be as idle as possible and we should keep working toward that goal. If we get to the point where the power difference between suspend-in-idle and opportunistic suspend is zero, then we no longer need the latter. I don't think anybody on the Google/Android side is arguing that we *shouldn't* pursue dynamic pm and overall making idle more idle more of the time. We're just saying that here and now we are not at the ideal and opportunistic suspend gains us real power savings and is useful. Brian --
Uhuh? "We have this ugly code here, but it works and we don't have better one, so lets merge it"? I don't really like this line of reasoning. I would not want to judge wakelocks here, but... "it works, merge it" should not be used as argument. And btw I do have wakelock-less implementation of autosleep, that only sleeped the machine when nothing was ready to run. It was called "sleepy linux". Should I dig it out? Major difference was that it only sleeped the machine when it was absolutely certain machine is idle and no timers are close to firing -- needing elimination or at least markup of all short timers. It erred on side of not sleeping the machine when it would break something. Still I believe it is better design than wakelocks -- that need markup/fixes to all places where machine must not sleep -- effectively sleeping the machine too often than fixing stuff with wakelocks all over kernel and userspace... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
As has been explained several times now, that's not equivalent. The difference is what makes this a hard problem. If a usecase is valid and nobody can come up with a convincing explanation of what an elegant solution would look like, then the ugly solution wins. -- Matthew Garrett | mjg59@srcf.ucam.org --
I decided to go to sleep with interrupts disabled... it was prototype on x86, so it was rather tricky. I'd expect external events after sleep decision to just wakeup machine immediately, similar to entering deep CPU sleep mode... -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
I also think we need to take a hard look at the process here. --
Hello,
I think many folks are still confused about exactly the problem being
solved by this series as well as mixed up between opportunistic
suspend and suspend blockers. Also, how this series impatcs the rest
of the kernel (especially PM-aware drivers and subsystems) has caused
a bit of confusion.
To help with the confusion, I think a much clearer description of the
problem being solved and the proposed solution is needed.
To that end, I created a starting point for that below which
summarizes how I understand the problem and the proposed solution, but
of course this should be filled out in more detail and updated as part
of the documentation that goes with this series.
Hope this helps improve the understanding of this feature,
Kevin
Table of Contents
=================
1 Problem Statement
2 Solution: Opportunistic suspend
2.1 When to use a suspend blocker
2.2 Usage in PM aware drivers
1 Problem Statement
~~~~~~~~~~~~~~~~~~~~
Want to to hit deep power state, even when the system is not actually
idle.
Why?
- some hardware is not capable of deep power states in idle
- difficulty getting userspace and/or kernel to be idle
2 Solution: Opportunistic suspend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Create an additional "idle path" which has new rules for determining
idleness. When this new "idle" is reached, trigger full-system
suspend. Since a suspend is triggered whenever the opportunity
arises, this is called opportunistic suspend.
The new rules for making the idleness decision are simple:
1. system may suspend if and only if no suspend blockers are held
2.1 When to use a suspend blocker
==================================
[A list of reasons why suspend blockers might be used would be very
helpful here.]
- ensure wakeup events propagate to userspace (e.g. keypad example in doc)
- screen is on
- someone mentioned "Google use cases"
(would be nice to hear about more of these)
2.2 Usage in PM aware drivers ...Yes, I think the Android developers know quite a few cases where suspend When we have any drivers using both in the tree, they will be used as examples here. Thanks, Rafael --
I agree, this is is the right way to handle when to enter suspend. Especially if the system needs to run while waiting for something To me it sounds like this should only be allowed to happen when you do: # echo 1 > /sys/power/suspend_while_idle As it kills the timers and leads to non-standard behaviour of the apps as they won't run :) And then the remaining question is how to make sure the use cases --
That's handled by the /sys/power/policy opportunistic/forced switch. -- Matthew Garrett | mjg59@srcf.ucam.org --
OK, so can the suspend blocker then become just: # Block suspend while idle, system stays running # echo default > /sys/power/policy and the when it's OK to suspend: # Allow suspend while idle, system suspends when it hits kernel idle loop # echo opportunistic > /sys/power/policy or do you still need something more to ensure the data gets into your app and be handled? The part I really don't like is the idea of patching all over the drivers and userspace for the wakelocks/suspendblocks. Regards, Tony --
I don't like the idea either, but given that nobody has actually provided any other ideas that would actually work then I don't think we've got a great deal of choice. -- Matthew Garrett | mjg59@srcf.ucam.org --
Maybe not if echo opportunistic > /sys/power/policy gets cleared by the kernel if the kernel idle loop can't make it. That means something has blocked the suspend attempt in a driver for example. The system If the opportunistic kernel flag is one time attempt only, then you could take care of the wakelock/suspendblock handling in the userspace completely. And once the userspace wakelock/suspendblock is cleared, the userspace can again echo opportunistic > /sys/power/policy. Regards, Tony --
So an event arrives just as userspace does this write. How do you avoid the race? Plausible answers mostly appear to end up looking like suspend blockers. -- Matthew Garrett | mjg59@srcf.ucam.org --
Assuming you attempt suspend in a custom pm_idle function, any driver handling the event can fail the suspend attempt. And that would clear the opportunistic suspend flag. And the userspace would be still running and could handle the event. And when the userspace is done, it can again echo opportunistic > /sys/power/policy. For the failed suspend path in the kernel, currently the kernel would unwind back all the drivers because of the failed driver, but that path should be possible to optimize. Regards, Tony --
If you think it's possible to make this work then feel free to. But at the point where you're adding code to every driver's suspend function to determine whether or not it's got any pending events that userspace hasn't consumed yet, and adding code to every bit of userspace to allow it to indicate whether or not it's busy consuming events or just busy drawing 3D bouncing cattle, I think you've reinvented suspend blocks. -- Matthew Garrett | mjg59@srcf.ucam.org --
Sorry, I have a working system that idles nicely and stays up on batteries for a long time while running. I don't need to implement anything like this :) Cheers, Tony --
Right, but your system will only idle nicely if all of your userspace is well-written. It's not reasonable to expect that all userspace will be well-written and thus it's necessary to implement a power management strategy that doesn't require that. Refusing an implementation that achieves that on the basis that there's hypothetically a better way of doing it is entirely unreasonable. -- Matthew Garrett | mjg59@srcf.ucam.org --
Yeah, pretty much so. Thanks, Rafael --
There are two parts to this. Suspend freezes tasks that don't idle at all, and it stops the monotonic clock which in turn stops tasks that Most uses of suspend blockers are in some way tied to a potential wakeup event. We use (a) suspend blocker(s) to make sure the event propagates to the code that will handle the event, and that code then uses another suspend blocker while it handles the event. Some examples: The battery monitor on Nexus One uses a periodic alarm to wake up the system. The alarm driver will block suspend so the timer can fire, and the battery monitor will block suspend while reading the battery status and changing the charge mode. Phone rings: We use a few suspend blockers to process the low level message from the modem which end up returning a message on a tty. The last step in the kernel currently uses a wakelock with a timeout since we have not modified the tty layer to block suspend. The user space ril process then blocks suspend while it handles this message. USB: We get a (wakeup) message from the modem that there is power on the usb connector and we block suspend while we detect what is connected. If we are connected to a USB host, we block suspend and An audio driver can block suspend while audio is playing. We don't currently use the runtime pm framework since this is new, but we do runtime pm by turning off clocks and power when the device is inactive. -- Arve Hjønnevåg --
Could you clarify what's going on here please? What do you mean by an "audio driver" here - there's several different classes of driver which work together to produce the embedded audio subsystem and I'm not clear which you're referring to here. I'd not expect a chip-specific audio driver to be taking suspend blockers at all except during things like accessory detect handling which can take a long time. A system-specific driver could reasonably block during playback but doing it in a chip specific driver sounds like a bit too much of a policy decision. The kernel runtime PM framwwork tends not to come into play for anything except MFD and SoC drivers with audio where they're a component of PM on the larger chip and it's useful for coordinating with the other drivers in play. Otherwise we already have very detailed automatic power management (especially for the CODECs) and there's no benefit in bouncing through runtime PM. --
Hello,
Some general comments on the suspend blockers/wakelock/opportunistic
suspend v6 patch series, posted here:
https://lists.linux-foundation.org/pipermail/linux-pm/2010-April/025146.html
The comments below are somewhat telegraphic in the interests of
readability - more specific comments to follow in later E-mails. I am
indebted to those of us who discussed these issues at LPC last year and
ELC this year for several stimulating discussions.
There are several general problems with the design of opportunistic
suspend and suspend-blocks.
1. The opportunistic suspend code bypasses existing Linux kernel code,
such as timers and the scheduler, that indicates when code
needs to run, and when the system is idle. This causes two problems:
a. When opportunistic suspend is enabled, the default mode is to
break all timers and scheduling on the system. This isn't
right: the default mode should be to preserve standard Linux
behavior. Exceptions can then be added for process groups that
should run with the non-standard timer and scheduler behavior.
b. The series introduces a de novo kernel API and userspace API
that are unrelated to timers and the scheduler, but if the point
is to modify the behavior of timers or the scheduler, the
existing timer or scheduler APIs should be extended. Any new
APIs will need to be widely spread throughout the kernel and
userspace.
2. The suspend-block kernel API tells the kernel _how_ to accomplish a
goal, rather than telling the kernel _what_ the goal is. This
results in layering violations, unstated assumptions, and is too
coarse-grained. These problems in turn will cause fragile kernel
code, kernel code with userspace dependencies, and power management
problems on modern hardware. Code should ask for what it wants.
For example, if a driver needs to place an upper bound on its
device wakeup latency, or if it needs to place an upper bound on
...I basically agree, except that despite having a year to do so none of us have come up with a different way that would actually work. Google have done this work. Who's going to prove that there is actually a different way to do this? -- Matthew Garrett | mjg59@srcf.ucam.org --
We all feel the pain of inelegance right? I think it's clear that this system will not last (but there's no other immediate option) .. That doesn't mean we should reject it, but we need to be clear that this system will get replaced. So we should format the patches appropriately. To me the userspace aspect is a permanent change .. If we could drop that (or put it into debugfs) then it would make this a lot easy to accept as a stepping stone. Daniel --
I'm in agreement on the first half of this -- from the Google/Android point of view, if we can someday accomplish everything we need with some different facility, we'll happily shift over to that. That seems like a normal operating mode for mainline -- new solutions arise, drivers are migrated to those new solutions, old solutions fall to the wayside. We fully expect that the world will change over time and one of our largest goals with trying to get work upstream is to reduce the pain of those changes for everyone, while trying to get to "you can build a kernel out of the box from mainline that Just Works with an android userspace". I'm not sure this necessitates using only debugfs for the userspace interface. A userspace interface is necessary to accomplish what we're trying to do here, otherwise we have only half a solution, and our hope is that it'd be a stable interface (as userspace interfaces are supposed to be) for as long as its needed. I could totally imagine the userspace interface eventually becoming a no-op or punching through to some other facility, depending on how this problem is solved long-term in the ideal post-suspend-block future. Brian --
The problem is that once this userspace interface is exposed, it's nearly permanent and has to be support for a long long time .. It might seen trivial to just remove something your not using, but we never know who is using what once the kernel is released. Daniel --
Deprecating sysfs interfaces can be done within 6 months or so, especially if there's only one real consumer. -- Matthew Garrett | mjg59@srcf.ucam.org --
I'll assume your right (can you give an example of this?), but why should we even add it if we know it's already going to get replaced. It's like it's pre-deprecated .. Daniel --
See feature-removal-schedule.txt. So far we have no indication that it's going to be replaced, because nobody has actually suggested a working way to do this better. If we had a concrete implementation proposal for that then we'd be in a pretty different position right now. -- Matthew Garrett | mjg59@srcf.ucam.org --
Ok, feature-removal-schedule.txt applies to everything tho. What your saying is that if this interface only last a short time it might take 6 months, if it last for a long time it would take longer. There's no easy way to know that Google is the only user after some amount of time passes. Daniel --
If the interface is there for a long time, it's because we haven't come up with anything better. And if we haven't come up with anything better, the interface deserves to be there. -- Matthew Garrett | mjg59@srcf.ucam.org --
Moreover, the interface is already in use out-of-tree and that usage is actually _growing_, so we have a growing number of out-of-tree drivers that aren't megrgeable for this very reason. I don't see any _realistic_ way of solving this problem other than merging the opportunistic suspend. If anyone sees one, and I mean _realistic_ and _practically_ _feasible_, please tell me. Rafael --
Why can't we merge the drivers? Drivers are just drivers, they don't need this to get merged. (They need it to work with Android) Daniel --
Because someone would have to remove suspend blockers (or rather wakelocks) from the drivers, test that they work correctly without suspend blockers and submit the modified versions. Going forward, every party responsible for such a driver would have to maintain an out-of-tree version with suspend blockers (or wakelocks) anyway, so the incentive to do that is zero. Practically, as long as the opportunistic suspend is out of tree, there will be a _growing_ number of out-of-tree drivers out there, which is not acceptable in the long run. Rafael --
They should work without wakelock since wakelock are optional .. I mean there's nothing in suspend blockers I've seen that indicates it's required for some drivers to work. So it's just a matter of patching out the wakelocks, with no need to re-test anything. You get the driver mainlined, then maintain a small patch to add wakelocks. Not hard at all , with lots of incentive to do so since you I don't see why your saying that. These driver should work with out all of this, which means they can get mainlined right now. Daniel --
I agree with Daniel here. We should keep merging the drivers separate from the suspend blocks issues. Regards, Tony --
Unfortunately, that's completely unrealistic. Please refer to the Greg's reply for details. Rafael --
Sorry, but it doesn't seem to work that way. Look at the large number of out-of-tree android device drivers that remain sitting there because of the lack of this interface being in the kernel. Also note that such a driver, without wakelocks, would never get tested, and so, things start quickly diverging. thanks, greg k-h --
Is that really the issue? The ones I've looked at have mostly suffered from being built on 2.6.29 and needing refreshes for current kernel APIs or from general code quality issues - I don't recall ever looking at one Chances are that if the driver is useful people will start using it in non-Android systems anyway. As people have already observed wakelocks needn't have any practical effect on the running system so if the drivers are broken without wakelocks they'd be broken anyway. If this really is a big concern with getting drivers into mainline then surely we could just add some noop wakelock functions for the drivers to call? It's not particularly pretty but it'd deal with the driver merge side of things. --
You're missing the point IMO. Even if they are only used on Android, there still is a problem if they don't go into the mainline, because that leads to code divergence that's growing over time and will ultimately create an entire ecosystem that's independent of the mainline. We've been successful in avoiding that for quite some time and I don't think we should allow that to happen right now because of the opportunistic suspend feature. I'm not a big fan of suspend blockers myself, but let's face it, _currently_ You need to prove the reverse, ie. that a driver working correctly with Well, I don't see a big difference between that and adding the feature Again, I don't see why you hate that feature so much. What is there to worry about? Rafael --
See my first paragraph there. My point here is that we appear to have the standard vendor BSP ecosystem problem here rather than a wakelock problem. It's fairly common in the embedded space to get a whole bunch of work done which doesn't make its way into mainline due to a SoC vendor having produced a BSP for their SoC which is based around a particular kernel version which never gets updated. This means users with that SoC can't boot anything newer until someone does the work of mainlining support for the system, meaning that development on systems using that SoC gets stuck on an old kernel which mainline drifts away from. Users find it hard to contribute back since they can't run current code easily and often have to jump through serious hoops to backport drivers from newer kernels. If the SoC is successful enough then you do get something of an ecosystem around the BSP, though eventually that usually results in This is still a work in progress in the embedded space (where wakelocks are primarily relevant). Many of the major vendors are working in the right direction here, but it's far from universal and it's not something I don't think this is a major part of the trend - like I say, the fact that people have been working with an old kernel version is generally If they can be compiled out then the updates really ought to be trivial. If not I really need to go back and reexamine what's going on here to make sure I understand what drivers are supposed to do, I have to As I have said previously I'm not actively opposed to merging wakelocks at this point. My major concern has been addressed, I now agree with most of what the PM guys are saying - it's not the nicest thing ever but it works for now. The issue was that when I originally noticed the patch series was being considered for mainline again was that one effect of using it in a mobile phone with the standard Linux embedded audio subsystem would be to break use cases such as audio on voice calls, which ...
On Thu, May 13, 2010 at 4:36 PM, Mark Brown Current published trees are based on .32 (used for the coming-soon froyo release that's been in late QA for a while now) and forward development is moving to .34 post final (or in the case of tegra2, tracking .34-rc series as it happens). We've been actively snapping up to track mainline since we started doing this around 2.6.16. We'd *love* to be able to get more stuff sanely upstream instead of I'd love to have a separate discussion on using standard linux embedded audio for mobile devices -- one of my goals for 2010 is to try to migrate from our "one off" approach on MSM to making use of ALSA and standard interfaces. I have a lot of questions about handing encoded data (mp3/aac/etc) that will be processed by the DSP, how to approach routing control, and how to best interact with the user/kernel interface, etc. Brian --
Yeah, I had noticed the development stuff was more up to date - it'll be good once it gets out of QA and into general use since system Having greatly cut down the number of out of tree drivers I'm carrying I Probably best to do that, this thread is already quite big enough without going off on a tangent. FWIW we can usually be found in #alsa-soc on freenode as well as on the lists. A couple of pointers This usually doesn't go through ALSA at all (the ALSA APIs can't cope with variable bitrate data), the data currently goes via DSP specific APIs and gets injected into the ALSA domain in much the same way as data Routing control for embedded systems is done by exposing the routing control to userspace via ALSA controls which can be set by apps - using a GUI to configure the routes interactively is a massive usability win in development. Abstraction for generic user-visible apps is intended to be handled by a layer on top of that: http://www.slimlogic.co.uk/?p=40 which is currently in development but I believe is expected to come to fruition this year (Liam is driving this one). --
Yes, it is an issue. See the mess that we have had in trying to get some of the original G1 drivers merged in the staging tree for proof of No, not usually. Do you really think that someone else is going to use I fail to see why getting the real functionality of the wakelocks into the kernel is a problem. Especially as they fit a real need, and work well for that. thanks, greg k-h --
Ah, that's surprising - I had thought most of the issues there were due to the substantial MSM architecture core code which most of the G1 stuff depended on (things like the DSP interface and so on) and the general need for staging-style updates which churned against the non-mainline versions rather than the wakelocks. It's true that the wakelocks weren't helping, though. Most of the cases I've seen have been off-CPU drivers that were either working to outdated APIs or having to jump through hoops because they Not if it's genuinely G1 specific, although presumably most of that driver is actually chip drivers for the various components of a camera subsystem which may well appear in other systems (not that I have the Well, like I've said I personally don't object to merging them now that the audio use case has been sorted. I suggested this because it would allow something to be put in place to facilitate driver merging which would avoid the core and userspace issues that people are still raising with the implementation and let people get on with at least the driver side. If wakelocks don't make the next merge window and there is a problem with drivers then it'd be nice to get the stubs in so that the APIs are present in subsystem trees for drivers to merge. --
Do wakelock enabled drivers require a wakelock aware user space to function properly? If the driver is added you want to make sure the benefit is there and testable for all userspaces. Early Android service managers did take/release userspace locks to ensure the handoff worked. Getting all these drivers is positive. Getting to some constraint mechanism is positive. It does need to exist end to end to make a difference at the battery. Regards, Richard W. --
Some of our drivers may not work correctly with forced suspend, but if you don't use suspend at all, the wakelocks have no effect and all the -- Arve Hjønnevåg --
Drivers with correct wakelock usage will play nice with opportunistic suspend. If you're not using opportunistic suspend, you probably don't need wakelocks at all, and the driver (unless it's broken) should work just Definitely. The fact that say the dsp audio driver or serial driver on MSM use wakelocks to play nice with opportunistic suspend should have no impact on, say, Debian-on-G1 which is not using that feature. Debian would still be able to play audio or write to the serial port. With wakelock support in the kernel, I'm able to maintain drivers that (provided they meet the normal style, correctness, etc requirements) that both can be submitted to mainline (yay!) and can ship on production hardware as-is (yay!). Porting other linux based environments to hardware like G1, N1, etc becomes that much easier too, which hopefully makes various folks happy. This helps get us ever closer to being able to build a production-ready kernel for various android devices "out of the box" from the mainline tree and gets me ever closer to not being in the business of maintaining a bunch of SoC-specific android-something-2.6.# trees, which seriously is not a business I particularly want to be in. Brian --
I don't think that's due to this interface tho. During your CELF presentation you noted several bits of code that could go in right now but haven't (and still haven't as far as I've seen). I'm actively pushing code related to Android (with wakelocks removed).. Putting a That's not totally true. For example the MMC driver had wakelocks (I think, or for sure mmc core does), and the MMC driver has been tested for G1 and works fine so far without them. I have code that is queued for my tree that will enable MMC on G1. I can merge Nexus one support through my tree also which would allow all the drivers for that device to eventually be used. With that device support in mainline then those drivers become nice things to have with or with out wakelocks. You don't need wakelocks to run Debian or anything else except Android. Daniel --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH |
