Ingo Molnar [interview]'s Completely Fair Scheduler [story] has been merged into the Linux kernel for inclusion in the upcoming 2.6.23 release. A comment in the patch titled 'sched: cfs core code' noted, "apply the CFS core code. This change switches over the scheduler core to CFS's modular design and makes use of kernel/sched_fair/rt/idletask.c to implement Linux's scheduling policies." Another patch included documentation which described the new scheduler, "80% of CFS's design can be summed up in a single sentence: CFS basically models an 'ideal, precise multi-tasking CPU' on real hardware." It goes on to explain:
"CFS's task picking logic is based on this p->wait_runtime value and it is thus very simple: it always tries to run the task with the largest p->wait_runtime value. In other words, CFS tries to run the task with the 'gravest need' for more CPU time. So CFS always tries to split up CPU time between runnable tasks as close to 'ideal multitasking hardware' as possible.
"Most of the rest of CFS's design just falls out of this really simple concept, with a few add-on embellishments like nice levels, multiprocessing and various algorithm variants to recognize sleepers."
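The picking rule quoted above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel's code: the `Task` class, `pick_next`, `simulate` and the fixed 1 ms tick are hypothetical names and values chosen for illustration, all tasks are always runnable, and nice levels, SMP and sleeper handling are ignored. On every tick each task is credited the share an ideal multitasking CPU would have given it, and the task that actually ran is charged the full tick.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wait_runtime: float = 0.0  # CPU time the task is "owed" relative to the ideal CPU
    ran_ms: float = 0.0        # total CPU time actually received


def pick_next(tasks):
    """Run the task with the largest wait_runtime (the 'gravest need')."""
    return max(tasks, key=lambda t: t.wait_runtime)


def simulate(tasks, ticks, tick_ms=1.0):
    """Toy loop: assumes every task stays runnable for the whole run."""
    ideal_share = tick_ms / len(tasks)
    for _ in range(ticks):
        current = pick_next(tasks)
        for t in tasks:
            t.wait_runtime += ideal_share  # what an ideal CPU would have given each task
        current.wait_runtime -= tick_ms    # the task that really ran used the whole tick
        current.ran_ms += tick_ms
    return tasks


if __name__ == "__main__":
    for t in simulate([Task("a"), Task("b"), Task("c")], ticks=300):
        print(t.name, t.ran_ms, round(t.wait_runtime, 6))
```

In this model the wait_runtime values always sum to zero, so no task can fall far behind before it becomes the one with the largest value and gets picked.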
Credit was given to four developers other than Ingo Molnar: "Con Kolivas, for pioneering the fair-scheduling approach [story]; Peter Williams, for smpnice; Mike Galbraith, for interactivity tuning of CFS, Srivatsa Vaddagiri, for group scheduling enhancements [story]".
The CFS scheduler replaces Ingo's own O(1) scheduler, which was first proposed on January 3rd, 2002 [story], and was merged into the 2.5.2-pre10 kernel a few days later, on January 8th [story].
What really happened
Get real folks. Here's the inside scoop, plain and simple:
The 2.6 O(1) scheduler has severe issues, which really became obvious (to me, at least) on multi-core CPUs. A replacement was definitely needed.
Con has been doing a good job maintaining out-of-tree kernel patches that help with system performance. His initial SD scheduler was *usually* better than the O(1), but had some terrible interactivity failings. This is where Con screwed up: he refused to acknowledge/fix those failings, and just basically said "too bad", and "I'm too busy (health) to do much right now".
Just the type of support situation one doesn't want to have when introducing a new core kernel ingredient.
Given the small testing base, and lack of good support from Con, several other kernel developers proposed their own schedulers. Ingo actually wrote one: CFS. He published it, and actively solicited testers, bug reports, and fixes from Con and the entire Linux community. This contrasted with Con's much less open support model.
CFS started out with deficiencies compared to SD/DL, but over time those were identified, and the code was tweaked/redesigned as required, with no hesitation from Ingo. Excellent support, and a model example of how Linux kernel development is supposed to happen.
When Linus had to choose something to replace the outdated O(1) scheduler, the choice was made easy. CFS has had a much broader testing base, a lead designer dedicated to listening to others and incorporating their improvements, and in the end was *measured* to be the best available scheduler by any benchmarks that were used. Including the ever so important "support" benchmark.
Now it's getting even broader use than in the testing phase, so new problems/bugs will be discovered, and those will need to be fixed. Having Ingo actively digging for them and fixing them will help keep 2.6.23 on track.
Cheers
Re: Con has been doing a good
Con has been doing a good job maintaining out-of-tree kernel patches that help with system performance.
This is the only thing you got right in your post.
His initial SD scheduler was *usually* better than the O(1), but had some terrible interactivity failings.
Stop lying. Many tests have shown otherwise. In fact CFS kept trying (or still is trying) to keep up with the performance of SD in many ways.
This is where Con screwed up: he refused to acknowledge/fix those failings, and just basically said "too bad", and "I'm too busy (health) to do much right now".
Get a freaking clue. How can you code (or even be nice, for that matter), when your health is so bad that you're almost hospitalized?!
I've been on the ck-list for many years and Con has *always* been very responsive to bug reports. I can assure you that he is also a very nice person.
Given the small testing base, and lack of good support from Con,
Of course he had a small testing base, as the "insider boys" didn't let him in and didn't merge his work to mm-mainline.
several other kernel developers proposed their own schedulers. Ingo actually wrote one: CFS.
So, in fact Ingo took advantage of Con's bad health. Nice!
The absolute truth is that *truly* improving the kernel scheduler was *never* Ingo's mission until he noticed that Linus was really interested in Con's design.
So instead of helping Con (who was sick and needed help) he just "had to" make his own design... So he *took* Con's ideas as a basis for his design in a hurry, so that he could somehow keep his position in the Linux development Merry Go Round.
The fact is that while Ingo has been neglecting the scheduler (which he maintained!) all these years, it has been Con's true heart and passion to make it better.
And then comes Ingo and literally steals his place in the spotlight... then gives a little credit to Con and so comes off like an angel.
Oh man, Con must be pissed!
He published it, and actively solicited testers, bug reports, and fixes from Con and the entire Linux community.
And yet the SD scheduler was more stable and had fewer bugs far sooner than CFS...
Even if SD is not perfect, it's not as if it could not have been improved further by the support and testing from mainline developers and users. Also this would have given Con motivation to improve his design further.
This contrasted with Con's much less open support model.
This is utter and absolute bullshit! Man you really don't know what the hell you're talking about. Stop lying your ass out.
Give me a break. Are Linux
Give me a break. Are Linux kernel developers first and foremost engineers, or drama queens that require a spotlight on their great ideas?
CK, and everyone else, should be happy that the scheduler issues got fixed, no matter whose code it was in the end. Who cares? The product is better now.
they care not just for fame
They care not just for fame and ego, but also because their salaries and next jobs are based on this... So yeah, why would everyone be pink and smiley just because this is open source?
No, assholes, everywhere. Human nature.
Scheduling by hand
Con's scheduler was so much better than CFS that it needed *manual* adjustment of the priority level of X for things to work as expected. Now, there is some fine work, right there :-)
Will this help console response when the system is I/O bound?
The biggest complaint that I have always had with Linux is that when there is a significant amount of I/O happening on the system, the consoles really lag. This is something that Solaris always handled far better than Linux.
With the new scheduler, will this problem be a thing of the past?
Wrong scheduler.
That's really more of an I/O scheduler problem than a CPU scheduler problem, although the two certainly intertwine. For example, an I/O bound program won't use very much CPU, so it's very likely to get what CPU it does ask for, so that it can continue to generate I/O at a very high rate.
If the I/O scheduler doesn't allow other tasks to get around it, then they block. The lag you see on the console probably has to do with paging traffic and other file reads that get blocked up by heavy writers. Dependent reads (e.g. a sequence of reads that depend on each other, such as directory lookup -> inode lookup -> file) are the worst hit, because their latency goes up.
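To put rough numbers on why dependent reads suffer most, here is a back-of-the-envelope sketch. It assumes a deliberately crude model that is not taken from any real I/O scheduler: a single FIFO queue, a heavy writer that keeps a constant backlog ahead of every newly submitted read, and a fixed service time per read. The function names and the example figures (200 ms backlog, 10 ms per read) are made up for illustration.

```python
def dependent_chain_latency(n_reads, backlog_ms, service_ms):
    """Each read can only be issued once the previous one has completed,
    so every read queues behind a fresh backlog of writes."""
    return n_reads * (backlog_ms + service_ms)


def independent_batch_latency(n_reads, backlog_ms, service_ms):
    """All reads are submitted together, so they wait behind the backlog
    once and are then serviced back to back."""
    return backlog_ms + n_reads * service_ms


if __name__ == "__main__":
    # directory lookup -> inode lookup -> file data: three dependent reads
    print(dependent_chain_latency(3, backlog_ms=200, service_ms=10))    # 630
    print(independent_batch_latency(3, backlog_ms=200, service_ms=10))  # 230
```

Under these assumptions the dependent chain pays the queueing delay once per step, which is why its latency grows so much faster than that of unrelated reads issued at the same time.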
Personal experience
I agree with the decision to merge CFS over SD. From reading the LKML, IngoM has given credit to ConK and has been very genial about his contributions. In my eyes, SD has not seen the development and community support that CFS has, in sheer number of developers and patch release cycles. (I believe this is why the -mm series stopped having SD patches, but I'm not sure.)
CFS definitely has better latency and fairness characteristics on my SMP laptop, which is why when both were releasing patches, I preferred CFS over SD. It *feels* better, which is the point of an interactivity feature. But take it with a grain of salt, it's just this user's experience.
SMP laptop?! 2 different
SMP laptop?!
2 different processors in one laptop? Are you sure? Isn't it 2 cores instead?
And if it's 2 cores, I can report that with the -ck patch on my laptop, it's much better for watching movies while compiling kernels.
Multicore is SMP
Two cores, whether on the same piece of silicon or on two pieces of silicon, are still considered SMP. The memory architecture outside the CPU might be non-uniform, but it's still symmetric multi-processing from the standpoint of the CPU.
Asymmetric multiprocessing is where there are two noticeably different CPUs (e.g. an ARM and a DSP, such as you find on TI's OMAP and DaVinci CPUs).
Multithreading (symmetric or asymmetric) is yet a different beast. Hyperthreading fits in this category.
the development and
the development and community support that CFS has, in sheer number of developers and patch release cycles.
CFS support? You mean Ingo replying to himself and to the reports of "it doesn't work well under this, and this, and this condition"? Yes, it needs a lot of patches and releases. True.
Have you tried both? Looked at the code?
Least-Time-To-Go versus CFS
It is not clear whether the waiting tasks in CFS are sorted by their expected completion times, with the task that has the earliest completion time either running or preempting a current task that has a later one. In either case, some form of deadline inheritance would be needed if a task Y with a later completion time is blocking another task X with an earlier deadline that also needs access to a shared resource. Does CFS use deadline inheritance to avoid deadline inversion? That is, does it allow the blocking task Y to be sped up (given the same deadline as X) while it is within its critical region using the shared resource?
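For readers unfamiliar with the term, here is a minimal sketch of what deadline inheritance means in a generic earliest-deadline-first setting. It says nothing about what CFS actually does; the `Task` and `Resource` classes and `edf_pick` are hypothetical names invented for this illustration. Y holds a lock, X with an earlier deadline blocks on it, and a third task Z with a deadline between the two would otherwise run ahead of Y and delay X.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    deadline: float
    inherited: Optional[float] = None  # deadline borrowed from a blocked waiter

    @property
    def effective_deadline(self):
        if self.inherited is None:
            return self.deadline
        return min(self.deadline, self.inherited)


class Resource:
    def __init__(self):
        self.holder = None

    def acquire(self, task):
        """Return True if the lock was taken, False if the caller must block."""
        if self.holder is None:
            self.holder = task
            return True
        # Caller blocks: the holder inherits the earlier deadline.
        h = self.holder
        h.inherited = task.deadline if h.inherited is None else min(h.inherited, task.deadline)
        return False

    def release(self):
        self.holder.inherited = None  # boost ends with the critical region
        self.holder = None


def edf_pick(runnable):
    """Earliest (effective) deadline first."""
    return min(runnable, key=lambda t: t.effective_deadline)


if __name__ == "__main__":
    x, y, z = Task("X", 10), Task("Y", 100), Task("Z", 50)
    lock = Resource()
    lock.acquire(y)                # Y enters its critical region
    print(edf_pick([y, z]).name)   # Z: without a boost, Z runs ahead of Y
    lock.acquire(x)                # X blocks; Y now inherits deadline 10
    print(edf_pick([y, z]).name)   # Y: it can finish and unblock X
    lock.release()
    print(y.effective_deadline)    # 100: back to its own deadline
```

The point of the boost is that Y is scheduled as if it had X's deadline only while it holds the resource, so the inversion caused by Z is bounded by the length of Y's critical region.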
Are there any theoretical papers comparing CFS versus other hard real-time schedulers?
Good!
Nice to see some healthy arguing, even tho some parts are mainly flaming, but still, finally ppl are talking!
That's a Prince Mishkin (by
That's a Prince Mishkin (by Dostoyevsky) Scheduler, so it should be named PM(D)S :)
Actually, I would be interested to see how it behaves under a typical ERP database workload, for example DB2 with a lot of connections, interactive and batch applications running together, and a DB2 governor at work.