Cc: In article <20jor2$2d0@hunter-05.cs.strath.ac.uk> gor@cs.strath.ac.uk (Gordon Russell) writes:The linux scheduling algorithm is one of the simplest ones possible: what essentially happens is roughly: - every task gets a fixed number of ticks (100 Hz), defaulting to 15. At each clocktick the timer interrupt decreases the current task counter, and when the counter reaches zero, it sets the 'need_resched' flag. Note that this doesn't actually result in a task switch yet. - at every return to user mode, the low-level routines check the 'need_resched' flag, and call the scheduler if it is set, otherwise just do a normal return. - the scheduler just checks all processes, selects the one that has the highest tick-number, and lets it run. If all runnable processes have zero ticks remaining, the scheduler does an aging algorithm and tries again. - the aging algorithm just does a divide-by-two on the tick counter (0 for all runnable processes, but may be non-zero for sleeping processes), and adds the base timer value (15 by default, but changed by 'nice' etc) to the result. The above is the basic logic - very simple, and results in round-robin behaviour when all tasks are CPU-bound. If something sleeps for a longer time, it will get a higher tick-counter due to the aging mechanism, so that when it wakes up it can have up to 2*default ticks to do any work (although I doubt the aging actually makes much difference). In addition to the above, the linux 'wake_up()' functions also set the 'need_resched' flag if the task awakened has a higher tick counter than the currently running task. This is done to get better IO throughput and response times. Note that linux does not keep any state information in the scheduler: this makes the actual scheduling a bit inefficient, but gives more freedom in the sleep/wakup code to do some things that would otherwise be hard to do cleanly. The linux scheduler can be bogged down under heaveir load, but the logic hasn't changed for some time, mainly due to: - it's simple, and *very* flexible. Anybody interested in why things are done as they are, should probably take a look at the "add_wait_queue()" and "remove_wait_queue()" functions, especially the way they are used by select() and the tty routines. It can be confusing to see a process set it's own state to TASK_INTERRUPTIBLE (ie sleeping), yet still continue to run, but there are very good reasons for it (races). - it actually gives reasonable performance and good responsetimes under most circumstances, and has little overhead due to various priority calculations. Jim Wissner actually did some testing on the algorithm and implemented a more clever version (with priority queues), but he also said that the simple version seemed to work very well in almost all circumstances. Hope that gave you some idea about it, Linus
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Christoph Lameter | [00/41] Large Blocksize Support V7 (adds memmap support) |
| Linus Torvalds | Linux 2.6.27-rc5 |
git: | |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Nick Piggin | Re: Mainline kernel OLTP performance update |
