Linux: CFS Scheduler v19, Group Scheduling

Submitted by Jeremy
on July 7, 2007 - 11:46am

Ingo Molnar [interview] released version 19 of his CFS scheduler patchset [story], first released back in April [story]. He noted:

"The biggest user-visible change in -v19 is reworked sleeper fairness: it's similar in behavior to -v18 but works more consistently across nice levels. Fork-happy workloads (like kernel builds) should behave better as well. There are also a handful of speedups: unsigned math, 32-bit speedups, O(1) task pickup, debloating and other micro-optimizations."

Among the other changes found in this latest version of the scheduler patchset, Ingo noted, "merged the group-scheduling CFS-core changes from Srivatsa Vaddagiri [story]. This makes up for the bulk of the changes in -v19 but has no behavioral impact. The final group-fairness enabler patch is now a small and lean add-on patch to CFS."


From: Ingo Molnar [email blocked]
To: 	linux-kernel
Subject: [patch] CFS scheduler, -v19
Date:	Fri, 6 Jul 2007 19:33:19 +0200


i'm pleased to announce release -v19 of the CFS scheduler patchset.

The rolled-up CFS patch against today's -git kernel, v2.6.22-rc7, 
v2.6.22-rc6-mm1, v2.6.21.5 or v2.6.20.14 can be downloaded from the 
usual place:

    http://people.redhat.com/mingo/cfs-scheduler/
 
The biggest user-visible change in -v19 is reworked sleeper fairness: 
it's similar in behavior to -v18 but works more consistently across nice 
levels. Fork-happy workloads (like kernel builds) should behave better 
as well. There are also a handful of speedups: unsigned math, 32-bit 
speedups, O(1) task pickup, debloating and other micro-optimizations.

Changes since -v18:

 - merged the group-scheduling CFS-core changes from Srivatsa Vaddagiri. 
   This makes up for the bulk of the changes in -v19 but has no
   behavioral impact. The final group-fairness enabler patch is now a 
   small and lean add-on patch to CFS.

 - fix the bloat noticed by Andrew. On 32-bit it's now this:

      text    data     bss     dec     hex   filename
     24362    3905      24   28291    6e83   sched.o-rc7
     33015    2538      20   35573    8af5   sched.o-v18
     25805    2426      20   28251    6e5b   sched.o-v19

   so it's a net win compared to vanilla. On 64-bit it's even better:

      text    data     bss     dec     hex   filename
     35732   40314    2168   78214   13186   sched.o.x64-rc7
     41397   37642    2168   81207   13d37   sched.o.x64-v18
     36132   37410    2168   75710   127be   sched.o.x64-v19

   ( and there's also a +1.5K data win per CPU on x32, which is not
     shown here. [+3.0K data win per CPU on x64.] )

 - good number of core code updates, cleanups and streamlining.
   (Mike Galbraith, Srivatsa Vaddagiri, Dmitry Adamushko, me.)

 - use unsigned data types almost everywhere in CFS. This produces 
   faster and smaller code, and simplifies the logic.

 - turn as many 'u64' data types into 'unsigned long' as possible, to 
   reduce the 32-bit footprint and to reduce 64-bit arithmetics.

 - replaced the nr_running based 'sleep fairness' logic with a more 
   robust concept. The end-result is similar in behavior to v18, but 
   negative nice levels are handled much better in this scheme.

 - speedup: O(1) task pickup by Srivatsa Vaddagiri. [sleep/wakeup is
   O(log2(nr_running)).] This gives 5-10% better hackbench 100/500
   results on a 4-way box.

 - fix: set idle->sched_class back to &idle_sched_class in 
   migration_call(). (Dmitry Adamushko)

 - cleanup: use an enum for the sched_feature flags. (suggested by 
   Andrew Morton)

 - cleanup: turn the priority macros into inlines. (suggested by
   Andrew Morton)

 - (other cleanups suggested by Andrew Morton)

 - debug: split out the debugging data into CONFIG_SCHED_DEBUG.
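
The "O(1) task pickup" item above can be illustrated with a small sketch. The idea shown is to keep tasks in a sorted tree and cache a pointer to the leftmost node, so picking the next task is a single pointer read while enqueue and dequeue still walk the tree. Everything here is an assumption made for illustration: the names (task_node, run_queue, rq_enqueue, rq_pick_next, rq_dequeue_leftmost) are hypothetical and are not taken from the patch, and a plain unbalanced binary search tree stands in for a balanced one, so the O(log2(nr_running)) bound quoted above is not guaranteed by this code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task node: 'key' stands in for whatever value the
 * scheduler sorts runnable tasks by. */
struct task_node {
    unsigned long key;
    struct task_node *parent, *left, *right;
};

/* Hypothetical run queue: tree root plus a cached leftmost pointer. */
struct run_queue {
    struct task_node *root;
    struct task_node *leftmost;
    unsigned long nr_running;
};

static void rq_init(struct run_queue *rq)
{
    rq->root = NULL;
    rq->leftmost = NULL;
    rq->nr_running = 0;
}

/* Insert a task. The cache is updated only if the walk from the root
 * never turned right, i.e. the new node is the smallest one. */
static void rq_enqueue(struct run_queue *rq, struct task_node *t)
{
    struct task_node **link = &rq->root, *parent = NULL;
    int is_leftmost = 1;

    assert(t != NULL);
    while (*link) {
        parent = *link;
        if (t->key < parent->key) {
            link = &parent->left;
        } else {
            link = &parent->right;
            is_leftmost = 0;
        }
    }
    t->parent = parent;
    t->left = t->right = NULL;
    *link = t;
    if (is_leftmost)
        rq->leftmost = t;
    rq->nr_running++;
}

/* Pickup is O(1): just read the cached pointer. */
static struct task_node *rq_pick_next(const struct run_queue *rq)
{
    return rq->leftmost;
}

/* Remove the leftmost task and refresh the cache. */
static struct task_node *rq_dequeue_leftmost(struct run_queue *rq)
{
    struct task_node *t = rq->leftmost, *next;

    if (!t)
        return NULL;

    /* The leftmost node has no left child, so its right subtree
     * (possibly empty) takes its place. */
    if (t->right)
        t->right->parent = t->parent;
    if (t->parent)
        t->parent->left = t->right;
    else
        rq->root = t->right;

    /* New leftmost: smallest node of that right subtree, else the
     * old parent (NULL if the tree is now empty). */
    next = t->right;
    if (next) {
        while (next->left)
            next = next->left;
    } else {
        next = t->parent;
    }
    rq->leftmost = next;
    rq->nr_running--;
    t->parent = t->left = t->right = NULL;
    return t;
}
```

In this sketch a caller would initialise the queue with rq_init(), add tasks with rq_enqueue(), and call rq_pick_next() whenever it needs the task with the smallest key. The cost of keeping the cache current is paid at enqueue and dequeue time, which is where the tree walk already happens.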
 
As usual, any sort of feedback, bugreport, fix and suggestion is more 
than welcome!
 
	Ingo
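
The "net win compared to vanilla" claim in the size tables of the announcement can be checked with a little arithmetic. The sketch below copies the six rows from the announcement and compares totals; the struct and function names (obj_size, total_size, size_delta) are hypothetical helpers introduced only for this check.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the size tables, copied from the announcement. */
struct obj_size {
    const char *name;
    long text, data, bss, dec;
};

/* 32-bit rows: vanilla rc7, CFS -v18, CFS -v19. */
static const struct obj_size sizes32[] = {
    { "sched.o-rc7", 24362, 3905, 24, 28291 },
    { "sched.o-v18", 33015, 2538, 20, 35573 },
    { "sched.o-v19", 25805, 2426, 20, 28251 },
};

/* 64-bit rows, in the same order. */
static const struct obj_size sizes64[] = {
    { "sched.o.x64-rc7", 35732, 40314, 2168, 78214 },
    { "sched.o.x64-v18", 41397, 37642, 2168, 81207 },
    { "sched.o.x64-v19", 36132, 37410, 2168, 75710 },
};

/* text + data + bss; should match the 'dec' column of the same row. */
static long total_size(const struct obj_size *s)
{
    return s->text + s->data + s->bss;
}

/* Signed difference in total size of 'a' relative to 'base'
 * (negative means 'a' is smaller). */
static long size_delta(const struct obj_size *a, const struct obj_size *base)
{
    return total_size(a) - total_size(base);
}
```

With these numbers, size_delta() for -v19 against rc7 comes out at -40 bytes on 32-bit and -2504 bytes on 64-bit. On 32-bit the text section of -v19 is still larger than vanilla (25805 against 24362), so the win there comes from the smaller data section.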

With CFS Linux has got a

Anonymous (not verified)
on
July 7, 2007 - 12:35pm

With CFS Linux has got a quality scheduler like 4BSD of FreeBSD. Sacrificing some speed in favour of quality is always the best, it's UNIX not "hack the OS" :)

jeffr tech

FreeBSD? Does that even do

Anonymous (not verified)
on
July 7, 2007 - 1:10pm

FreeBSD? Does that even do SMP nowadays?

Are you trying to be funny

Anonymous (not verified)
on
July 8, 2007 - 3:45am

Are you trying to be funny or do you really not know? Of course FreeBSD runs on SMP machines.

The FreeBSD developers have worked for the last five years to implement a fine grained locking system similar to what is used in Linux to allow multiple threads to run concurrently in the kernel.

Linux is still somewhat ahead in this area as there are more people working on it, but for common hardware, I defy you to notice significant differences in all around performance.

Last I checked,

Anonymous (not verified)
on
July 8, 2007 - 4:52am

Last I checked, FreeBSD-current's new scheduler (ULE?) still used a pretty crude ticks/HZ-based sleep-bonus model to implement interactivity.

That is a far cry from the nanoseconds based fairness model implemented by the CFS scheduler, both in terms of precision and in terms of quality of scheduling.

Re: FreeBSD? Does that even do

Cabal
on
July 8, 2007 - 9:17pm

Not only has FreeBSD done SMP for a decade, it has done fine-grain locking for many years, and it looks like Linux is still playing catch-up:

http://people.freebsd.org/~kris/scaling/scaling.png

http://people.freebsd.org/~kris/scaling/mysql.html

Nice troll, though.

Heh...

Mr_Z
on
July 9, 2007 - 12:40am

Gotta love the fine print:

.... and an uncommitted patch from Jeff Roberson [1] that addresses poor scalability of file descriptor locking (using a new sleepable mutex primitive); this patch is responsible for almost all of the performance and scaling improvements measured.

If that's the case, then the benchmark measures that one aspect more than overall performance.

I'm all for healthy competition between Linux and FreeBSD. They keep each other honest, and tend to be neck and neck with each other even if they're leagues ahead of everyone else.

I'm all for healthy

samb
on
July 10, 2007 - 5:39am

I'm all for healthy competition between Linux and FreeBSD. They keep each other honest, and tend to be neck and neck with each other even if they're leagues ahead of everyone else.

Linux and FreeBSD haven't been neck and neck for many years. FreeBSD occupies pretty much the exact same niche it did ten years ago, primarily as a web hosting platform. In the meantime, Linux has established itself on everything from mainframes and supercomputers to handhelds. These days Linux even does a lot of the heavy lifting at what used to be a FreeBSD stronghold, Yahoo!.

That's exactly why I hate

Anonymous (not verified)
on
July 9, 2007 - 3:46am

That's exactly why I hate the FreeBSD community.. It took them several years to get something decent and now they present *ONE* benchmark and claim that Linux is playing catch-up.. That must have been great for your ego ;) Funny that this benchmark http://people.freebsd.org/~kris/scaling/nickel.png didn't test Postgres on Linux, which doesn't have the scalability issue.. So it looks like they wanted to embarrass Linux with one benchmark where it didn't perform that well..

And please keep in mind that:

1.) FreeBSD 7 isn't released yet.
2.) Linux is usable right now.
3.) sched_smp is not committed and still has issues (read the current mailing list).
4.) sched_ule still has a dropoff between 4 and 10 threads.
5.) the old sched_4bsd will be the default scheduler for FreeBSD 7.
6.) the author of this benchmark didn't test the latest Linux kernel (which includes an important fix by Nick Piggin) whereas he used bleeding-edge FreeBSD...
7.) several people in #kernelnewbies couldn't reproduce the problem (after Nick's patch which is in 2.6.22 and an update to glibc 2.6)
8.) MySQL does some funny system calls which fail on Linux and produce nothing but overhead.. (Remove #ifdef HAVE_PTHREAD_SETSCHEDPARAM from mysys/my_pthread.c [MySQL source code] and recompile.. gives a decent boost)
9.) Thanks for your time :]

Way to go fanboy! Your

Anonymous (not verified)
on
July 9, 2007 - 9:51am

Way to go fanboy!

Your benchmark is skewed by a bug in MySQL/libc which makes myriads of syscalls that shouldn't be there.

And no, FreeBSD hasn't had usable SMP support for a decade. Not even for a few years.
