On Wed, May 26, 2010 at 02:53:16PM +0200, Peter Zijlstra wrote:
[...]
With my userspace developer hat on, I'd *kill* for a way to tell
the kernel that there are more important things for the system to
be doing than executing my runnable task. In some cases, the set of
"more important things" might include running other tasks, but it might
also include conserving power. I'd like to have my program
tell the kernel things like "wake me up in 0.1 seconds, plus or minus
a year if you have something better to do."
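For what it's worth, the nearest existing knob I know of is per-thread
timer slack, prctl(PR_SET_TIMERSLACK). It only lets the kernel fire a
timer late by a bounded amount, so it is a much weaker statement than
"plus or minus a year". A minimal sketch, assuming Linux with a libc
reachable through ctypes; the helper names set_timer_slack_ns and
get_timer_slack_ns are made up for illustration, and the two constants
are copied from <linux/prctl.h>:

```python
import ctypes
import os
import time

# Option numbers from <linux/prctl.h>.
PR_SET_TIMERSLACK = 29
PR_GET_TIMERSLACK = 30

# Assumption: the C library of the running process exports prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def _ul(value):
    # prctl() takes unsigned long arguments; pass them at full width.
    return ctypes.c_ulong(value)


def set_timer_slack_ns(slack_ns):
    """Let timers of the calling thread expire up to slack_ns late."""
    rc = _libc.prctl(PR_SET_TIMERSLACK, _ul(slack_ns), _ul(0), _ul(0), _ul(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_timer_slack_ns():
    """Return the calling thread's current timer slack in nanoseconds."""
    rc = _libc.prctl(PR_GET_TIMERSLACK, _ul(0), _ul(0), _ul(0), _ul(0))
    if rc < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return rc


if __name__ == "__main__":
    try:
        set_timer_slack_ns(100_000_000)  # tolerate 100 ms of lateness
        print("timer slack is now", get_timer_slack_ns(), "ns")
    except OSError as exc:
        print("kernel refused the request:", exc)
    time.sleep(0.1)  # "wake me up in 0.1 seconds", give or take the slack
```

Note that the slack only ever delays a wakeup, and it says nothing about
how important the task is once it becomes runnable.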
With my sysadmin hat on (which is nearly identical to my phone owner hat,
BTW), I'd like whatever syscall implements those features to take a PID
argument, so I can impose my importance decisions on other processes.
I'd also like to set the relative importance of keeping the CPU idle on
the same scale, so that I could raise or lower it as power availability
changes.

It's impossible in the general case for an application to know whether
it's important or not, so it's also impossible for the kernel to derive
this information from the application's behavior--and impossible, in the
general case, to decide whether the application is more important than the
battery or some other scarce resource the kernel might also be managing
(e.g. if the machine is running hot, heat dissipation might be scarce,
and we'd want to be idle then too). This is similar to niceness and
SCHED_RR/FIFO: there's no way for the kernel to automatically assign
those values either; they have to be specified by a user or administrator.
Of course, programs are free--within limits--to specify these values
about themselves.
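Niceness already has the shape I'm asking for: setpriority(2) takes a
PID, so the same call serves a program adjusting itself and an
administrator adjusting somebody else. A sketch in Python, where
deprioritize is a hypothetical helper name, and the "limits" are the
usual ones (an unprivileged caller can normally only raise niceness,
and only on processes it owns):

```python
import os
import subprocess
import sys


def deprioritize(pid, nice_value=19):
    """Set the niceness of process `pid`; return what the kernel reports.

    A pid of 0 means the calling process itself.
    """
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Administrator's view: impose a decision on some other process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(5)"]
    )
    try:
        print("child niceness:", deprioritize(child.pid))
    finally:
        child.terminate()
        child.wait()

    # Program's view: declare itself unimportant.
    print("own niceness:", deprioritize(0))
```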
Consider a traditional Unix program like "sort". Seriously, how is "sort"
supposed to know that it's the most important application on the system
(because I need my contacts list alphabetized *now*), or the least
(because the screensaver needs to know which is the oldest graphics
hack in the list)?

"sort" gets invoked from a shell, cooperating with other processes to do
its work. It knows very little about the context in which it is executing
(nor should it). Should "sort" sprout command-line arguments for every
possible scheduling latency and power management policy option, or should
"sort" not care, and defer such decisions to other command-line
tools which set these options before exec()ing "sort", or to a management
utility like "top" that implements policy across the entire system?
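The wrapper-tool option is what nice(1) already does for one policy
knob: adjust the current process, then exec the real program, which
inherits the setting and never learns about it. A minimal sketch,
assuming niceness stands in for whatever richer policy options might
exist; run_with_policy and the default increment are hypothetical:

```python
import os
import sys


def run_with_policy(argv, nice_increment=10):
    """Apply a scheduling policy to ourselves, then become `argv`.

    Niceness survives exec, so the target program needs no new options.
    This function only returns if the exec fails (by raising OSError).
    """
    os.nice(nice_increment)
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_with_policy(sys.argv[1:])
```

Invoked as, say, "python3 lowprio.py sort contacts.txt" (lowprio.py being
a made-up file name), "sort" runs at lower priority without sprouting a
single new command-line argument.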
--