Cc:
The linux scheduling algorithm is one of the simplest ones possible:
what essentially happens is roughly:
- every task gets a fixed number of ticks (100 Hz), defaulting to 15.
At each clocktick the timer interrupt decreases the current task
counter, and when the counter reaches zero, it sets the
'need_resched' flag. Note that this doesn't actually result in a
task switch yet.
- at every return to user mode, the low-level routines check the
'need_resched' flag, and call the scheduler if it is set, otherwise
just do a normal return.
- the scheduler just checks all processes, selects the one that has the
highest tick-number, and lets it run. If all runnable processes have
zero ticks remaining, the scheduler does an aging algorithm and tries
again.
- the aging algorithm just does a divide-by-two on the tick counter (0
for all runnable processes, but may be non-zero for sleeping
processes), and adds the base timer value (15 by default, but changed
by 'nice' etc) to the result.
The above is the basic logic - very simple, and results in round-robin
behaviour when all tasks are CPU-bound. If something sleeps for a
longer time, it will get a higher tick-counter due to the aging
mechanism, so that when it wakes up it can have up to 2*default ticks to
do any work (although I doubt the aging actually makes much difference).
In addition to the above, the linux 'wake_up()' functions also set the
'need_resched' flag if the task awakened has a higher tick counter than
the currently running task. This is done to get better IO throughput
and response times. Note that linux does not keep any state information
in the scheduler: this makes the actual scheduling a bit inefficient,
but gives more freedom in the sleep/wakup code to do some things that
would otherwise be hard to do cleanly.
The linux scheduler can be bogged down under heaveir load, but the logic
hasn't changed for some time, mainly due to:
- it's simple, and *ve...