Linux: Choosing A CPU Scheduler

Submitted by Jeremy
on June 28, 2004 - 9:44pm

Those interested in testing the various efforts to improve Linux desktop performance may wish to try the patches available from the CPU Scheduler Evaluation project page. Peter Williams announced his latest patch against the stable 2.6.7 Linux kernel, which provides the ability to select a CPU scheduler at runtime. The current version of the patch allows a user to select either Con Kolivas' [interview] staircase scheduler [story] or Peter's own priority based scheduler.

Other patchsets designed to improve performance and worth trying out include Con Kolivas' -ck patchset, currently at 2.6.7-ck3 [story], and Nick Piggin's [interview] -np patchset, currently at 2.6.7-np2 [story]. Con's patchset, which applies against the plain 2.6.7 kernel, is described as "patches designed to improve system responsiveness with specific emphasis on the desktop, but has scheduler changes suitable/configurable to any workload". Nick's patchset, which applies against the 2.6.7-mm3 kernel, provides a different CPU scheduler and additional memory management work. For those interested, Nick advises, "if anyone is having swapping or interactivity problems, please try it out."


From: Peter Williams [email blocked]
To: Linux Kernel Mailing List [email blocked]
Subject: [PATCH] CPU scheduler evaluation tool
Date: 	Mon, 28 Jun 2004 16:37:33 +1000

To facilitate the comparative evaluation of alternative CPU schedulers a 
patch that allows the run time selection of CPU scheduler between 
version 7.7 of Con Kolivas's staircase scheduler ("sc") and the priority 
based scheduler with interactive and throughput bonuses ("pb") has been 
created.  This patch is available for download at:

<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_hydra_FULL-v1.2?download>

The file /proc/sys/kernel/cpusched/mode has been added to those provided 
by the "pb" scheduler to control which of the schedulers is in control.  The 
string "sc" is used to select the staircase scheduler and "pb" to select 
the priority based scheduler described above.  The staircase scheduler 
control parameters "compute" and "interactive" have been moved into 
/proc/sys/kernel/cpusched along with the "pb" scheduler control 
parameters.  The scheduler starts in the "sc" mode. A primitive 
Glade/PyGTK GUI that provides the ability to switch between schedulers 
and to control scheduler parameters is available at:

<http://prdownloads.sourceforge.net/cpuse/gcpuctl_hydra-1.0.tar.gz?download>

This GUI should also work with a standard version of Con Kolivas's 
staircase scheduler as well as the version based on my single priority 
array patch:

<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_sc_FULL-v1.2?download>

and the basic priority based scheduler:

<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_pb_FULL-v1.2?download>

and, for those so inclined, these patches broken into smaller parts for 
easier digestion are available at:

<https://sourceforge.net/projects/cpuse/>

An entitlement based "eb" scheduler will be added to the selection 
available in the near future.

Controls:

base_promotion_interval -- (milliseconds) controls the interval between 
successive promotions (is multiplied by the number of active tasks on 
the CPU in question) NB no promotion occurs if there are less than 2 
active tasks

time_slice -- (milliseconds) the size of the time slice (i.e. how long 
it will be allowed to hold the CPU before it is kicked off to allow 
other tasks a chance to run) that is allocated to a task when it becomes 
active or finishes a time slice.  (min is 1 millisec and max is 1 second).

max_ia_bonus -- (a value between 0 and 10) that determines the maximum 
interactive bonus that a task can acquire

initial_ia_bonus -- (a value between 0 and 10) that determines the 
initial interactive bonus that a newly forked task will be given.  This 
value will be capped by the max_ia_bonus.

ia_threshold  -- (parts per thousand) is the sleep to (sleep + on_cpu) 
ratio above which a task will have its interactive bonus increased 
asymptotically towards the maximum

cpu_hog_threshold  -- (parts per thousand) is the usage rate above which 
a task will be considered a CPU hog and start to lose interactive bonus 
points if it has any

max_tpt_bonus -- (a value between 0 and 9) that determines the maximum 
throughput bonus that tasks may be awarded

log_at_exit -- (0 or 1) turns off/on the logging of tasks' scheduling 
statistics at exit.  This feature is useful for determining the 
scheduling characteristics of relatively short lived tasks that run as 
part of some larger job such as a kernel build where trying to get time 
series data is impractical.

compute -- (0 or 1) turns off/on the staircase scheduler's "compute" switch

interactive -- (0 or 1) turns off/on the staircase scheduler's 
"interactive" mode

Peter
-- 
Peter Williams [email blocked]

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
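
The arithmetic behind a few of the "pb" controls listed in the announcement can be illustrated with a short Python sketch. It only restates what the control descriptions say (the promotion interval scales with the number of active tasks, thresholds are expressed in parts per thousand, the initial bonus is capped by the maximum, and the time slice is limited to 1 ms to 1 second). The function names, the use of integer division, and the sample numbers are assumptions made for illustration; the patch itself may round or measure these quantities differently.

```python
def promotion_interval_ms(base_promotion_interval_ms, active_tasks):
    """Interval between successive promotions on one CPU.

    Per the announcement, the base interval is multiplied by the number of
    active tasks, and no promotion occurs with fewer than 2 active tasks
    (represented here by returning None).
    """
    if active_tasks < 2:
        return None
    return base_promotion_interval_ms * active_tasks


def sleep_ratio_ppt(sleep_ms, on_cpu_ms):
    """sleep / (sleep + on_cpu), expressed in parts per thousand.

    This is the quantity that ia_threshold is compared against. Integer
    division is an assumption of this sketch.
    """
    total = sleep_ms + on_cpu_ms
    if total == 0:
        return 0
    return 1000 * sleep_ms // total


def effective_initial_ia_bonus(initial_ia_bonus, max_ia_bonus):
    """initial_ia_bonus is capped by max_ia_bonus."""
    return min(initial_ia_bonus, max_ia_bonus)


def clamp_time_slice_ms(time_slice_ms):
    """time_slice is limited to between 1 millisecond and 1 second."""
    return max(1, min(time_slice_ms, 1000))


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(promotion_interval_ms(20, 3))      # 60
    print(sleep_ratio_ppt(900, 100))         # 900
    print(effective_initial_ia_bonus(7, 5))  # 5
    print(clamp_time_slice_ms(5000))         # 1000
```

With these definitions, a task that sleeps 900 ms for every 100 ms on the CPU has a ratio of 900 parts per thousand, so it would exceed any ia_threshold set below 900.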

runtime?

Anonymous
on
June 29, 2004 - 1:16am

now if it were only runtime selectable... like 'echo "ck" >/proc/scheduler' or something...

That's precisely what the patch here does

Anonymous
on
June 29, 2004 - 1:30am

That's precisely what the patch here does. It looks like there are only two available at the moment, however.
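
Going by the announcement above, selecting a scheduler amounts to writing "sc" or "pb" to /proc/sys/kernel/cpusched/mode. Here is a minimal Python sketch of that, assuming a kernel with the patch applied and that the file accepts the string followed by a newline, as a shell echo would send it. The helper names and the `root` parameter are illustrative and not part of the patch.

```python
from pathlib import Path

# Directory named in the announcement; it only exists on a patched kernel.
CPUSCHED_DIR = Path("/proc/sys/kernel/cpusched")

# The two modes the announcement describes: staircase and priority based.
MODES = ("sc", "pb")


def get_mode(root=CPUSCHED_DIR):
    """Return the currently selected scheduler mode as a string."""
    return (Path(root) / "mode").read_text().strip()


def set_mode(mode, root=CPUSCHED_DIR):
    """Select a scheduler by writing its name to the mode file.

    Writing under /proc/sys normally requires root privileges.
    """
    if mode not in MODES:
        raise ValueError(f"unknown scheduler mode {mode!r}, expected one of {MODES}")
    (Path(root) / "mode").write_text(mode + "\n")
```

On a patched system, `set_mode("pb")` followed by `get_mode()` should report "pb". The `root` argument lets the helpers be pointed at an ordinary directory for a dry run.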

np2 test

Anonymous
on
June 29, 2004 - 4:50am

I've just switched from 2.6.7-rc3 (which ran fine with the occasional load explosion) to -np2, which sadly gave this:

nnrpd: page allocation failure. order:0, mode:0x20
 [] __alloc_pages+0x2c0/0x2dc
 [] scheduler_tick+0x1b2/0x1bc
 [] kmem_cache_alloc+0x3d/0x44
 [] __get_free_pages+0x1d/0x30
 [] kmem_getpages+0x20/0xcc
 [] cache_grow+0x91/0x114
 [] cache_alloc_refill+0x1af/0x204
 [] __kmalloc+0x63/0x80
 [] alloc_skb+0x41/0xf8
 [] e1000_alloc_rx_buffers+0x56/0xf8
...

frequently in various processes (nnrpd, kswapd0, swapper). I'll see how things go. This is a news server with 4 GB RAM, >100 MB/s disk I/O and 400-500 Mb/s network traffic.

Allocation failure

Anonymous
on
June 29, 2004 - 5:01am

This is an atomic memory allocation failure from the network driver interrupt, not actually the process it mentions (that process is just the one that happens to be running when the network interrupt triggers).

-np has a couple of changes in the allocator, and it sounds like this needs some work. Those allocation failures shouldn't be harmful... but try increasing /proc/sys/vm/mapped_page_cost.

Nick

wait

Anonymous
on
June 29, 2004 - 5:03am

That should be /proc/sys/vm/min_free_kbytes

try increasing that 2-4 times.
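
A rough Python sketch of that adjustment, for anyone who wants to script it: read the current value of /proc/sys/vm/min_free_kbytes, multiply it, and write it back. The function name, the default factor of 2, and the `path` parameter are illustrative choices; writing the file requires root, and the change does not persist across reboots.

```python
from pathlib import Path

MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def raise_min_free_kbytes(factor=2, path=MIN_FREE_KBYTES):
    """Multiply min_free_kbytes by `factor` and return (old, new) values in KiB."""
    path = Path(path)
    # The file holds a single integer, in kilobytes.
    current = int(path.read_text().split()[0])
    new = int(current * factor)
    path.write_text(f"{new}\n")
    return current, new
```

Calling `raise_min_free_kbytes(4)` would quadruple the reserve, the upper end of the suggested range.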

Will try

Anonymous
on
June 29, 2004 - 10:22am

But tomorrow... now it is a bit busy on that server. Now trying plain 2.6.7 for the first time, should be a good baseline.

Thanks

Anonymous
on
June 29, 2004 - 11:23am

OK, thanks. Please email me personally if you'd like (or anyone else with problems). You'd be able to find my email address from the mailing list.

Thanks
Nick

Swappable scheduler

jayesh senjaliya (not verified)
on
September 11, 2005 - 2:16pm

Hello sir,
I read about your work on switching schedulers at runtime.
Could you please tell me what the advantages or performance issues are, when there is a need to change the scheduler, and where I can get the documentation and downloads?
Please reply to my mail address.

Nick Piggin

Anonymous
on
June 29, 2004 - 4:53am

Has anyone tried the -np patch set?

Can those modifications to the mm system be merged into mainline?

Use on mail server and mail address

Anonymous
on
July 1, 2004 - 2:51am

Your current mail is on yahoo in .au? Then you should have seen some updates already. FWIW after a reboot the problems crop up again:

Message from syslogd@testnews1 at Thu Jul  1 08:49:52 2004 ...
testnews1 kernel: flags:0x20480008 mapping:00000000 mapcount:0 count:0

Message from syslogd@testnews1 at Thu Jul  1 08:49:52 2004 ...
testnews1 kernel: Bad page state at free_hot_cold_page (in process 'kswapd0', page c13bd4c0)

Message from syslogd@testnews1 at Thu Jul  1 08:49:52 2004 ...
testnews1 kernel: Backtrace:

Message from syslogd@testnews1 at Thu Jul  1 08:49:53 2004 ...
testnews1 kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@testnews1 at Thu Jul  1 08:49:54 2004 ...
testnews1 kernel: Bad page state at prep_new_page (in process 'innd', page c13bd4c0)

Message from syslogd@testnews1 at Thu Jul  1 08:49:54 2004 ...
testnews1 kernel: flags:0x20480008 mapping:00000000 mapcount:0 count:0

Message from syslogd@testnews1 at Thu Jul  1 08:49:54 2004 ...
testnews1 kernel: Backtrace:

Message from syslogd@testnews1 at Thu Jul  1 08:49:55 2004 ...
testnews1 kernel: Trying to fix it up, but a reboot is needed

Thomas

4KiB-pages are not for the best performance!!!

Anonymous
on
July 2, 2004 - 5:00pm

How much do the i386 page sizes (4 KiB, 2 MiB and 4 MiB) matter, with swappiness set to 0, for minimizing TLB misses?

open4free ©
