Ingo Molnar announced that version 24 of his Completely Fair Scheduler patch is now available, backported to the 2.6.24-rc3, 2.6.23.8, 2.6.22.13, and 2.6.21.7 kernels. He noted that there have been significant changes since the last backport: "36 files changed, 2359 insertions(+), 1082 deletions(-). That's 187 individual commits from 32 authors." He added, "99% of these changes are already upstream in Linus's git tree and they will be released as part of v2.6.24. (there are 4 pending commits that are in the small 2.6.24-rc3-v24 patch.)" He also highlighted some of the more significant improvements:
"Improved interactivity via Peter Zijlstra's 'virtual slices' feature. As load increases, the scheduler shortens the virtual timeslices that tasks get, so that applications observe the same constant latency for getting on the CPU. (This goes on until the slices reach a minimum granularity value).
"CONFIG_FAIR_USER_SCHED is now available across all backported kernels and the per user weights are configurable via /sys/kernel/uids/. Group scheduling got refined all around."
From: Ingo Molnar <mingo@...>
Subject: [patch/backport] CFS scheduler, -v24, for v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7
Date: Nov 19, 11:17 am 2007

By popular demand, here is release -v24 of the CFS scheduler patch.

It is a full backport of the latest & greatest scheduler code to
v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7. The patches can be
downloaded from the usual place:

http://people.redhat.com/mingo/cfs-scheduler/

There's tons of changes since v22 was released:

36 files changed, 2359 insertions(+), 1082 deletions(-)

that's 187 individual commits from 32 authors.

So even if CFS v22 worked well for you, please try this release too and
report regressions (if any).

There are countless improvements in -v24 (see the shortlog further below
for details), but here are a few highlights:

- improved interactivity via Peter Zijlstra's "virtual slices" feature.
  As load increases, the scheduler shortens the virtual timeslices that
  tasks get, so that applications observe the same constant latency for
  getting on the CPU. (This goes on until the slices reach a minimum
  granularity value)

- CONFIG_FAIR_USER_SCHED is now available across all backported
  kernels and the per user weights are configurable via
  /sys/kernel/uids/. Group scheduling got refined all around.

- performance improvements

- bugfixes

99% of these changes are already upstream in Linus's git tree and they
will be released as part of v2.6.24. (there are 4 pending commits that
are in the small 2.6.24-rc3-v24 patch.)

As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome!

Ingo
------------------>
Adrian Bunk (3):
sched: make kernel/sched.c:account_guest_time() static
sched: proper prototype for kernel/sched.c:migration_init()
sched: make sched_nr_latency static

Alexey Dobriyan (1):
sched: uninline scheduler

Andi Kleen (5):
sched: cleanup: remove unnecessary gotos
sched: cleanup: refactor common code of sleep_on / wait_for_completion
sched: cleanup: refactor normalize_rt_tasks
sched: remove stale comment from sched_group_set_shares()
sched: fix return value of wait_for_completion_interruptible()

Arjan van de Ven (1):
Make scheduler debug file operations const

Balbir Singh (1):
sched: fix delay accounting regression

Christian Borntraeger (1):
sched: fix accounting of interrupts during guest execution on s390

Cliff Wickman (1):
hotplug cpu: migrate a task within its cpuset

Dhaval Giani (1):
sched: group scheduling, sysfs tunables

Dmitry Adamushko (16):
sched: clean up struct load_stat
sched: clean up schedstat block in dequeue_entity()
sched: sched_setscheduler() fix
sched: add set_curr_task() calls
sched: do not keep current in the tree and get rid of sched_entity::fair_key
sched: optimize task_new_fair()
sched: simplify sched_class::yield_task()
sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
sched: yield fix
sched: fix __pick_next_entity()
sched: tidy up SCHED_RR
sched: cleanup, remove calc_weighted()
sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
sched: fix group scheduling for SCHED_BATCH
sched: fix __set_task_cpu() SMP race
sched: remove activate_idle_task()

Eric Dumazet (1):
sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC

Eugene Teo (1):
Fix tsk->exit_state usage

Gautham R Shenoy (1):
sched: fix rt ptracer monopolizing CPU

Hiroshi Shimamoto (1):
sched: clean up sched_fork()

Ingo Molnar (80):
sched: fix sysctl_sched_child_runs_first flag
sched: resched task in task_new_fair()
sched: small sched_debug cleanup
sched: debug: track maximum 'slice'
sched: uniform tunings
sched: use constants if !CONFIG_SCHED_DEBUG
sched: remove stat_gran
sched: remove precise CPU load
sched: remove precise CPU load calculations #2
sched: track cfs_rq->curr on !group-scheduling too
sched: cleanup: simplify cfs_rq_curr() methods
sched: uninline __enqueue_entity()/__dequeue_entity()
sched: speed up update_load_add/_sub()
sched: clean up calc_weighted()
sched: introduce se->vruntime
sched: move sched_feat() definitions
sched: optimize vruntime based scheduling
sched: simplify check_preempt() methods
sched: wakeup granularity increase
sched: add se->vruntime debugging
sched: remove SCHED_FEAT_SKIP_INITIAL
sched: add more vruntime statistics
sched: debug: update exec_clock only when SCHED_DEBUG
sched: remove wait_runtime limit
sched: remove wait_runtime fields and features
sched: fix delay accounting performance regression
sched: prettify /proc/sched_debug output
sched: enhance debug output
sched: kernel/sched_fair.c whitespace cleanups
sched debug: BKL usage statistics
sched: remove unneeded tunables
sched debug: print settings
sched debug: more width for parameter printouts
sched: entity_key() fix
sched: remove condition from set_task_cpu()
sched: remove last_min_vruntime effect
sched: undo some of the recent changes
sched: fix sign check error in place_entity()
sched: fix sched_fork()
sched: remove set_leftmost()
sched: clean up schedstats, cnt -> count
sched: cleanup, remove stale comment
sched: mark scheduling classes as const
sched: whitespace cleanups
sched: vslice fixups for non-0 nice levels
sched: optimize schedule() a bit on SMP
sched: tweak wakeup granularity
sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
sched: break out if printing a warning in sched_domain_debug()
sched: style cleanup
sched: kfree(NULL) is valid
sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
sched: cleanup: rename task_grp to task_group
sched: cleanup: function prototype cleanups
sched: fix: move the CPU check into ->task_new_fair()
sched: update comment
sched: clean up is_migration_thread()
sched: do not normalize kernel threads via SysRq-N
sched: do not wakeup-preempt with SCHED_BATCH tasks
sched: speed up context-switches a bit
sched: reintroduce cache-hot affinity
sched: debug: increase width of debug line
sched: debug, improve migration statistics
sched: allow the immediate migration of cache-cold tasks
sched: affine sync wakeups
sched: sync wakeups preempt too
sched: cleanup, fix spacing
sched: cleanup, make struct rq comments more consistent
sched: add KERN_CONT annotation
sched: fix fastcall mismatch in completion APIs
sched: clean up sched_domain_debug()
sched: fix style of swap() macro in kernel/sched_fair.c
sched: fix style in kernel/sched.c
sched: reintroduce SMP tunings again
sched: turn off PREEMPT_RESTRICT
sched: remove PREEMPT_RESTRICT
sched: wakeup preemption fix
sched: clean up the wakeup preempt check
sched: clean up the wakeup preempt check, #2
sched: reorder SCHED_FEAT_ bits

James Bottomley (1):
sched: fix incorrect assumption that cpu 0 exists

Ken Chen (2):
sched: fix improper load balance across sched domain
sched: reduce schedstat variable overhead a bit

Laurent Vivier (2):
sched: guest CPU accounting: maintain stats in account_system_time()
sched: don't clear PF_VCPU in scheduler

Matthias Kaehlcke (1):
sched: use list_for_each_entry_safe() in __wake_up_common()

Michael Neuling (2):
Add scaled time to taskstats based process accounting
kernel/sched.c: remove bogus comment from account_user_time

Mike Galbraith (3):
sched: fix SMP migration latencies
sched: fix formatting of /proc/sched_debug
sched: prevent wakeup over-scheduling

Milton Miller (7):
sched: domain sysctl fixes: use kcalloc()
sched: domain sysctl fixes: use for_each_online_cpu()
sched: domain sysctl fixes: unregister the sysctl table before domains
sched: domain sysctl fixes: do not crash on allocation failure
sched: domain sysctl fixes: add terminator comment
sched: more robust sd-sysctl entry freeing
sched: fix sched_domain sysctl registration again

Oleg Nesterov (3):
do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()
sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHED

Paul E. McKenney (1):
sched: export cpu_clock()

Paul Jackson (2):
cpuset: remove sched domain hooks from cpusets
cpuset sched_load_balance flag

Paul Menage (4):
Task Control Groups: example CPU accounting subsystem
Fix cpusets update_cpumask
sched: clean up some control group code
sched: report CPU usage in CFS cgroup directories

Pavel Emelyanov (3):
pid namespaces: changes to show virtual ids to user
Uninline find_task_by_xxx set of functions
Use helpers to obtain task pid in printks

Peter Williams (2):
sched: reduce balance-tasks overhead
sched: isolate SMP balancing code a bit more

Peter Zijlstra (21):
sched: simplify SCHED_FEAT_* code
sched: new task placement for vruntime
sched: simplify adaptive latency
sched: clean up new task placement
sched: add tree based averages
sched: handle vruntime 64-bit overflow
sched: better min_vruntime tracking
sched: add vslice
sched debug: check spread
sched: max_vruntime() simplification
sched: clean up min_vruntime use
sched: speed up and simplify vslice calculations
sched: another wakeup_granularity fix
sched: disable sleeper_fairness on SCHED_BATCH
sched: disable forced preemption by default
sched: activate task_hot() only on fair-scheduled tasks
sched: fix unconditional irq lock
sched: fix vslice
sched: documentation: place_entity() comments
sched: reintroduce the sched_min_granularity tunable
sched: avoid large irq-latencies in smp-balancing

S.Caglar Onur (1):
sched debug: BKL usage statistics, fix

Satyam Sharma (1):
sched: use show_regs() to improve __schedule_bug() output

Srivatsa Vaddagiri (16):
sched: group-scheduler core
sched: revert recent removal of set_curr_task()
sched: fix minor bug in yield
sched: print nr_running and load in /proc/sched_debug
sched: print &rq->cfs stats
sched: clean up code under CONFIG_FAIR_GROUP_SCHED
sched: add fair-user scheduler
sched: group scheduler wakeup latency fix
sched: group scheduler SMP migration fix
sched: group scheduler, fix coding style issues
sched: group scheduler, fix bloat
sched: group scheduler, fix latency
sched: fix new task startup crash
Hook up group scheduler with control groups
sched: move rcu_head to task_group struct
sched: fix copy_namespace() <-> sched_fork() dependency in do_fork

Zou Nan hai (1):
sched: some proc entries are missed in sched_domain sys_ctl debug code
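
As a rough illustration of the two highlights in the announcement, the following Python sketch models (a) slices that shrink as load grows until they hit a floor and (b) per-user weights turned into CPU fractions. It is a toy model, not kernel code: the function names, the 20 ms latency target and the 4 ms minimum granularity are assumptions chosen for the example, and all tasks are treated as having equal weight.

```python
"""Toy model of 'virtual slices' and per-user weights; not kernel code."""

# Assumed example values, not taken from the announcement.
TARGET_LATENCY_NS = 20_000_000   # period in which every runnable task runs once
MIN_GRANULARITY_NS = 4_000_000   # floor below which slices stop shrinking


def virtual_slice_ns(nr_running,
                     latency_ns=TARGET_LATENCY_NS,
                     min_gran_ns=MIN_GRANULARITY_NS):
    """Return the slice each of nr_running equal-weight tasks would get."""
    if nr_running < 1:
        raise ValueError("nr_running must be at least 1")
    # Split the latency target evenly, so the wait to get back on the CPU
    # stays roughly constant as more tasks become runnable...
    slice_ns = latency_ns // nr_running
    # ...until the slice would fall below the minimum granularity.
    return max(slice_ns, min_gran_ns)


def user_cpu_fractions(weights):
    """Map {user: weight} to {user: CPU fraction} when all users are busy."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {user: weight / total for user, weight in weights.items()}


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, "tasks ->", virtual_slice_ns(n), "ns per slice")
    print(user_cpu_fractions({"alice": 1024, "bob": 2048}))
```

With these assumed values, up to five runnable tasks share the 20 ms period; beyond that each slice stays at 4 ms and the period stretches instead. In the second function, a user whose weight is twice another's gets twice the CPU fraction when both are fully loaded.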

Thanks!!
That is precisely what I was waiting for to roll out the updates on the kernel I use (the 2.6.22 series). Many thanks to all kernel developers for the hard work.
smp-only
Make sure you compile with CONFIG_SMP=y even if you have only one core. The -v24 backport patch (at least the 2.6.23.8 variant) doesn't work for uniprocessor kernels.
If it doesn't work then it's a bug and should be reported to Ingo.
Works fine here on UP.
Sound familiar
Just like Roman's scheduler did months ago. Imagine that.
And your point is?
I really don't see a point in your statement, but you seem to be implying that Roman's scheduler was better just because it had one additional feature that is useful? Have you forgotten that, at that point, CFS had already had several more features compared to Roman's, such as group scheduling and instrumentation?
My point
My point is that the scheduler mafia routinely receives valuable contributions and ignores them. Then they deviously reimplement the ideas without giving proper attribution. For an outside contributor this is the worst place in the kernel to work. And, not surprisingly, this is technically the worst part of the kernel.
My POV
I'm seeing a completely inverse picture here.
In order to take advantage of Roman's contributions, the kernel team would have had to replace the whole CFS. That wouldn't have made much sense as I explained in my previous post.
Ironically, CFS was already fully functional when Roman ignored it and started writing his own reimplementation of CFS. Roman decided not to cooperate with other developers and add to CFS.
I personally found his exchanges with Ingo evasive, as if he didn't even want other developers to understand his scheduler. For instance, he was unwilling to break his work into a set of smaller patches, and this is absolutely essential to getting your code reviewed and accepted in the first place (even if it had made sense to throw out CFS completely at that point). Ingo even offered to do this work for him in order to learn from his scheduler, with the "RSDL".
Obviously nobody could force Roman to port his improvements over to CFS, so there was no other choice than to wait for someone else to do it, such as Peter Zijlstra.
I can't agree with this. Quoting Ingo's announcement: "That's 187 individual commits from 32 authors." Only 80 of these commits came from Ingo. None of these contributions were "ignored by the scheduler mafia".
Just because some people fail to get along with kernel developers and make a huge fuss about it doesn't mean that this is the case in general.
My POV
Nice POV, but why bother? Everyone should know all this by now. Yet some guys keep telling the same old story over and over again. It's something like football to them: they don't really care about arguments, facts and reality.
pfff... :)
Pot calling the kettle black
You mean like the way Con's SD scheduler was already fully functional when Molnar wrote his own reimplementation, CFS? The patches thing was a ruse. Molnar is known to stonewall contributors in this way, never honestly intending to merge their code. The question came down to whether Molnar was able or willing to understand Roman's work. The answer is a definitive no. A lot of us think that today's CFS scheduler is joe code.
Ahhh... When the first attack is refuted, try another one. Then another. Then another.
Oh, and only answer the paragraph where you think you have an edge.
You'd do fine in politics.
*Re*implementation
CFS was not a "reimplementation" of the SD, because the design of the two schedulers is nothing alike.
Roman's scheduler pretty much re-used the same approach as CFS, with various tweaks (many of which had already been implemented in CFS by Peter Zijlstra by the time Roman posted his scheduler).
Wrong
Just like Roman's scheduler did months ago.
No, the RFS patch did not do that at all.
Take a look at the check_preempt_curr_fair() function in kernel/sched_norm.c that Roman wrote: it's using the same static timeslices that CFS is using. "gran_norm" is not load-dependent at all, it's static. (It's a modified version of the original CFS code and it did not change CFS's time-slicing logic.)
So your argument does not even pass the sniff test.
Just like Roman's scheduler
Much like something I did as an exercise, seven years ago or so--I don't think it's a revolutionary idea, but it's nice to see it in the kernel.
How to tune the scheduling on 2.6
Hi,
I have been using kernel 2.4 for a long time and I installed 2.6.22.12 and
2.6.23.8 last week. I find that when the CPU usage is 100%, kernel 2.6
becomes non responsive (sluggish). Currently, I am running kernel 2.4.35,
the CPU usage is 100% and I don't even notice.
I pointed my browser on kerneltrap and the first thing I see is Ingo's
message.
Is there a simple explanation as to why scheduling on kernel 2.6 is not
as good as on kernel 2.4?
Or are there parameters that I can set to improve interactivity under high load?
Thanks
Richard
Just to point out here, CFS was merged into mainline for the 2.6.23 release, so you might want to check that kernel out. Other than that, there are quite a few reasons you could be having perceived sluggishness. One that seems common to me is not having proper DMA support in your kernel (IO slowness seems to make everything sluggish).
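
On the question of parameters: scheduler tunables that a kernel exposes through sysctl appear as sched_* entries under /proc/sys/kernel. Which entries exist depends on the kernel version and configuration, so the minimal Python sketch below simply lists whatever is present rather than assuming any particular name. The function name read_sched_tunables is made up for this example.

```python
import glob
import os


def read_sched_tunables(sysctl_dir="/proc/sys/kernel"):
    """Return {name: value} for every readable sched_* file in sysctl_dir."""
    tunables = {}
    for path in sorted(glob.glob(os.path.join(sysctl_dir, "sched_*"))):
        # Skip sub-directories; only plain files hold a value.
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                tunables[os.path.basename(path)] = f.read().strip()
        except OSError:
            # Some entries may not be readable without privileges.
            continue
    return tunables


if __name__ == "__main__":
    for name, value in read_sched_tunables().items():
        print(f"{name} = {value}")
```

Running it before and after a kernel upgrade gives a quick way to see which knobs exist and what their defaults are before changing anything.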
Lack of responsiveness at 100% CPU on 2.6 kernel
Thanks for the info. I am trying to get up to date.
What I am referring to by 'non responsiveness' is the lag
between the cursor movement on the screen and the mouse movement,
the time between typing a letter and seeing the character
on the screen, and general window operations such as getting
the focus on a window. All at 100% CPU.
Anyway I need to get used to 2.6. I was just surprised by
the difference in behavior between 2.4 and 2.6 on an otherwise
identical system and with mostly the same kernel parameters
(kernel 2.6 inherited most of the parameters from kernel 2.4
in my installation).
I noted already that DMA activation works differently on 2.6 than on 2.4.
I believe that DMA is activated but I still have to make sure.
Many of the options' names have changed between 2.4 and 2.6, so it is probably just as easy to start from scratch when configuring a 2.6 kernel if coming from 2.4.
The responsiveness of my pc is back to normal
Thanks for the nice comments. I replaced the hard disk IDE cable
with a 40 wire cable so my computer can now use dma5.
At last I think I have got the setup of kernel 2.6 right. My CPU is
currently running at 100% use (preparing a live dvd) and the response
is very good.
So I apologize for raising this issue.
But now, when I switch to a virtual console the screen becomes
dim. I tried 'setterm -half-bright off' with no effect.
After I boot, the brightness is normal, but after I switch
to another virtual console the text becomes almost unreadable.
I have searched the net and the kernel documentation without luck, yet.
Otherwise I feel comfortable running 2.6.