As noted, we already have this.
This is a little tricky. Rafael's model currently does not allow
wakeup events started by pm_wakeup_event() to be cancelled any way
other than by having their timer expire. This essentially means that
for some devices, expire_count will always be the same as count and for
others it will always be 0. To change this would require adding an
extra timer struct, which could be done (in fact, an earlier version of
the code included it). It would be nice if we could avoid the need.
Does Android use any kernel-internal wakelocks both with a timer and
with active cancellation?
This could be done easily enough, but if it's not very useful then
there's no point.
Easily added. But you didn't mention any field saying whether the
wakelock is currently active. That could be added too (although it
would be racy -- but for detecting unreleased wakelocks you wouldn't
care).
Also easily added.
Not applicable to general systems. Is there anything like it that
_would_ apply in general?
Again, easily added. The only drawback is that all these additions
will bloat the size of struct device. Of course, that's why you used
separately-allocated structures for your wakelocks. Maybe we can
change to do the same; it seems likely that the majority of device
structures won't ever be used for wakeup events.
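To make the separate-allocation idea concrete, here is a rough userspace sketch in C. It is not kernel code: struct device_model, struct wakeup_stats and the helper functions are hypothetical names invented for illustration. The only point is that a device carries a single pointer until it actually reports a wakeup event.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; field and function names are illustrative. */
struct wakeup_stats {
    unsigned long count;         /* wakeup events reported */
    unsigned long expire_count;  /* events that ended by timer expiry */
    int active;                  /* nonzero while an event is in progress */
};

struct device_model {
    const char *name;
    struct wakeup_stats *wakeup; /* NULL until the device reports an event */
};

/* Allocate the statistics lazily, on first use. Returns NULL on failure. */
static struct wakeup_stats *device_wakeup_stats(struct device_model *dev)
{
    assert(dev != NULL);
    if (!dev->wakeup)
        dev->wakeup = calloc(1, sizeof(*dev->wakeup));
    return dev->wakeup;
}

/* Record one wakeup event; does nothing if the allocation failed. */
static void device_report_wakeup(struct device_model *dev)
{
    struct wakeup_stats *ws = device_wakeup_stats(dev);

    if (ws)
        ws->count++;
}

/* Free the statistics when the device goes away. */
static void device_release(struct device_model *dev)
{
    free(dev->wakeup);
    dev->wakeup = NULL;
}
```

With this layout a device that never reports a wakeup event costs one pointer, and any further statistics fields grow only the separately allocated structure.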
Rafael doesn't _discourage_ drivers from doing this. However, you have
to keep in mind that many kernel developers are accustomed to working
on systems (mostly PCs) with a different range of hardware devices from
embedded systems like your phones. With PCI devices(*), for example,
there's no clear point where a wakeup event gets handed off to
userspace.
On the other hand, there's no reason the input layer shouldn't use
pm_stay_awake and pm_relax. It simply hasn't been implemented yet.
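As a rough illustration of what that might look like, here is a userspace model in C. Nothing here is real input-layer code: struct input_model and all the model_* functions are hypothetical, and the choice of "queue drained by the reader" as the handoff point is my assumption. model_stay_awake() and model_relax() merely stand in for the effect of pm_stay_awake() and pm_relax().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an input event queue; not kernel code. */
#define QUEUE_CAP 16

struct input_model {
    int queue[QUEUE_CAP];
    int head, len;
    int awake_refs;  /* stand-in for the effect of pm_stay_awake()/pm_relax() */
};

static void model_stay_awake(struct input_model *in)
{
    in->awake_refs++;
}

static void model_relax(struct input_model *in)
{
    assert(in->awake_refs > 0);
    in->awake_refs--;
}

/* Driver side: queue an event; block suspend when the queue becomes non-empty. */
static bool model_input_event(struct input_model *in, int code)
{
    if (in->len == QUEUE_CAP)
        return false;               /* queue full, event dropped */
    if (in->len == 0)
        model_stay_awake(in);
    in->queue[(in->head + in->len) % QUEUE_CAP] = code;
    in->len++;
    return true;
}

/* Reader side: take one event; allow suspend once the queue is drained. */
static bool model_input_read(struct input_model *in, int *code)
{
    if (in->len == 0)
        return false;
    *code = in->queue[in->head];
    in->head = (in->head + 1) % QUEUE_CAP;
    in->len--;
    if (in->len == 0)
        model_relax(in);
    return true;
}

/* Suspend is allowed only while nothing is holding the system awake. */
static bool input_may_suspend(const struct input_model *in)
{
    return in->awake_refs == 0;
}
```

In this model suspend is blocked from the moment the first event is queued until the reader has consumed the last one, with no timer involved.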
Alan Stern
(*) Speaking of PCI devices, I'm not convinced that the way Rafael is
using the pm_wakeup_event interface in the PCI core is entirely
correct. The idea is to resolve the race between wakeup events and
suspend. The code assumes that a wakeup event will be consumed in 100
ms or less, which is a reasonable assumption.
But what sorts of things qualify as wakeup events? Right now, the code
handles only events coming by way of the PME# signal (or its platform
equivalent). But that signal usually gets activated only when a PCI
device is in a low-power mode; if the device is at full power then it
simply generates an IRQ. It's the same event, but reported to the
kernel in a different way. So consider...
Case 1: The system is suspending and the PCI device has already been
        placed in D3hot when an event occurs. PME# is activated,
        the wakeup event is reported, the suspend is aborted, and the
        system won't try to suspend again for at least 100 ms. Good.

Case 2: The system is running normally and the PCI device is at full
        power when an event occurs. PME# isn't activated and
        pm_wakeup_event doesn't get called. Then when the system
        tries to suspend 25 ms later, there's nothing to prevent it
        even though the event is still being processed. Bad.
In case 2 the race has not been resolved. It seems to me that the
only proper solution is to call pm_wakeup_event for _every_ PCI
interrupt. This may be too much to add to a hot path, but what's the
alternative?
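The two cases can be modelled in a few lines of userspace C. This is only a sketch of the timing argument, not the PCI core: struct pm_model, the model_* functions and the report_irqs flag are hypothetical names, and the 100 ms constant is simply the window mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not kernel code. All names are illustrative. */
#define WAKEUP_HOLDOFF_MS 100   /* the 100 ms window discussed above */

struct pm_model {
    long holdoff_until_ms;      /* suspend is refused before this time */
};

/* Stand-in for pm_wakeup_event(): block suspend for msec from now_ms. */
static void model_wakeup_event(struct pm_model *pm, long now_ms, long msec)
{
    assert(msec >= 0);
    if (now_ms + msec > pm->holdoff_until_ms)
        pm->holdoff_until_ms = now_ms + msec;
}

/* Returns true if a suspend attempted at now_ms would be allowed. */
static bool pm_model_may_suspend(const struct pm_model *pm, long now_ms)
{
    return now_ms >= pm->holdoff_until_ms;
}

/*
 * Deliver a device event at now_ms. A device in a low-power state signals
 * PME#, which is reported as a wakeup event. A device at full power raises
 * an ordinary IRQ, which is reported only if report_irqs is set.
 */
static void model_device_event(struct pm_model *pm, long now_ms,
                               bool device_in_low_power, bool report_irqs)
{
    if (device_in_low_power || report_irqs)
        model_wakeup_event(pm, now_ms, WAKEUP_HOLDOFF_MS);
}
```

With report_irqs false, an event delivered to a full-power device at t = 1000 leaves pm_model_may_suspend() true at t = 1025, which is case 2. Setting report_irqs corresponds to calling pm_wakeup_event for every interrupt and closes the window in this model, at whatever cost that adds to the interrupt path.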