Those interested in testing the various efforts to improve Linux desktop performance may wish to try the patches available from the CPU Scheduler Evaluation project page. Peter Williams announced his latest patch against the 2.6.7 stable Linux kernel, which provides the ability to select a CPU scheduler at runtime. The current version of the patch allows a user to select either Con Kolivas' [interview] staircase scheduler [story] or Peter's own priority-based scheduler.
Other patchsets designed to improve performance and worth trying out include Con Kolivas' -ck patchset, currently at 2.6.7-ck3 [story], and Nick Piggin's [interview] -np patchset, currently at 2.6.7-np2 [story]. Con's patchset, which applies against the plain 2.6.7 kernel, is described as "patches designed to improve system responsiveness with specific emphasis on the desktop, but has scheduler changes suitable/configurable to any workload". Nick's patchset, which applies against the 2.6.7-mm3 kernel, provides a different CPU scheduler and additional memory management work. For the interested, Nick advises, "if anyone is having swapping or interactivity problems, please try it out."
From: Peter Williams [email blocked]
To: Linux Kernel Mailing List [email blocked]
Subject: [PATCH] CPU scheduler evaluation tool
Date: Mon, 28 Jun 2004 16:37:33 +1000
To facilitate the comparative evaluation of alternative CPU schedulers a
patch that allows the run time selection of CPU scheduler between
version 7.7 of Con Kolivas's staircase scheduler ("sc") and the priority
based scheduler with interactive and throughput bonuses ("pb") has been
created. This patch is available for download at:
<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_hydra_FULL-v1.2?download>
The file /proc/sys/kernel/cpusched/mode has been added to those provided
by the "pb" to control which of the schedulers is in control. The
string "sc" is used to select the staircase scheduler and "pb" to select
the priority based scheduler described above. The staircase scheduler
control parameters "compute" and "interactive" have been moved into
/proc/sys/kernel/cpusched along with the "pb" scheduler control
parameters. The scheduler starts in the "sc" mode. A primitive
Glade/PyGTK GUI that provides the ability to switch between schedulers
and to control scheduler parameters is available at:
<http://prdownloads.sourceforge.net/cpuse/gcpuctl_hydra-1.0.tar.gz?download>
This GUI should also work with a standard version of Con Kolivas's
staircase scheduler as well as the version based on my single priority
array patch:
<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_sc_FULL-v1.2?download>
and the basic priority based scheduler:
<http://prdownloads.sourceforge.net/cpuse/patch-2.6.7-spa_pb_FULL-v1.2?download>
and, for those so inclined, these patches broken into smaller parts for
easier digestion are available at:
<https://sourceforge.net/projects/cpuse/>
An entitlement based "eb" scheduler will be added to the selection
available in the near future.
Controls:
base_promotion_interval -- (milliseconds) controls the interval between
successive promotions (is multiplied by the number of active tasks on
the CPU in question) NB no promotion occurs if there are less than 2
active tasks
time_slice -- (milliseconds) the size of the time slice (i.e. how long
it will be allowed to hold the CPU before it is kicked off to allow
other tasks a chance to run) that is allocated to a task when it becomes
active or finishes a time slice. (min is 1 millisec and max is 1 second).
max_ia_bonus -- (a value between 0 and 10) that determines the maximum
interactive bonus that a task can acquire
initial_ia_bonus -- (a value between 0 and 10) that determines the
initial interactive bonus that a newly forked task will be given. This
value will be capped by the max_ia_bonus.
ia_threshold -- (parts per thousand) is the sleep to (sleep + on_cpu)
ratio above which a task will have its interactive bonus increased
asymptotically towards the maximum
cpu_hog_threshold -- (parts per thousand) is the usage rate above which
a task will be considered a CPU hog and start to lose interactive bonus
points if it has any
max_tpt_bonus -- (a value between 0 and 9) that determines the maximum
throughput bonus that tasks may be awarded
log_at_exit -- (0 or 1) turns off/on the logging of tasks' scheduling
statistics at exit. This feature is useful for determining the
scheduling characteristics of relatively short lived tasks that run as
part of some larger job such as a kernel build where trying to get time
series data is impractical.
compute -- (0 or 1) turn on/off the staircase scheduler's "compute" switch
interactive -- (0 or 1) turn on/off the staircase scheduler's
"interactive" mode
Peter
--
Peter Williams [email blocked]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
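Two of the controls in the announcement involve simple arithmetic that is easy to misread: base_promotion_interval is scaled by the number of active tasks, and ia_threshold is compared against a sleep ratio expressed in parts per thousand. The following is only an illustrative sketch of that arithmetic as the email describes it, not code from the patch. The function names and the example numbers are made up for illustration.

```python
from typing import Optional


def effective_promotion_interval_ms(base_ms: int, active_tasks: int) -> Optional[int]:
    """Interval between promotions on one CPU, per the email's description.

    Returns None when fewer than 2 tasks are active, since the email says
    no promotion occurs in that case.
    """
    if active_tasks < 2:
        return None
    # The base interval is multiplied by the number of active tasks.
    return base_ms * active_tasks


def sleep_ratio_ppt(sleep_ms: int, on_cpu_ms: int) -> int:
    """sleep / (sleep + on_cpu), expressed in parts per thousand."""
    total = sleep_ms + on_cpu_ms
    if total <= 0:
        return 0  # no history yet; hypothetical choice for this sketch
    return 1000 * sleep_ms // total


# Hypothetical numbers: a 20 ms base interval with 3 active tasks.
print(effective_promotion_interval_ms(20, 3))  # 60
# A task that slept 900 ms and ran 100 ms has a ratio of 900 parts per thousand.
print(sleep_ratio_ppt(900, 100))  # 900
```

A task whose ratio is above ia_threshold would, according to the email, have its interactive bonus raised towards max_ia_bonus. How the patch measures sleep and on-CPU time is not stated in the announcement.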
runtime?
Now if it were only runtime selectable... like 'echo "ck" >/proc/scheduler' or something...
That's precisely what the patch here does. It looks like there are only two available at the moment, however.
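For readers who would rather script the switch than use the GUI, here is a minimal sketch based only on what the announcement says: the file /proc/sys/kernel/cpusched/mode accepts the strings "sc" and "pb". The helper names are hypothetical. The sketch also assumes that reading the file returns the current mode and that a trailing newline is accepted, as it would be with echo.

```python
from pathlib import Path

# Path taken from the announcement; only present on a kernel with the patch applied.
MODE_FILE = Path("/proc/sys/kernel/cpusched/mode")
# Mode strings named in the announcement: staircase and priority based.
KNOWN_MODES = ("sc", "pb")


def current_scheduler(mode_file: Path = MODE_FILE) -> str:
    """Return the mode string currently stored in the mode file (assumed readable)."""
    return Path(mode_file).read_text().strip()


def select_scheduler(mode: str, mode_file: Path = MODE_FILE) -> None:
    """Write a scheduler mode string, rejecting anything the email does not list."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown scheduler mode {mode!r}; expected one of {KNOWN_MODES}")
    # Equivalent in spirit to: echo pb > /proc/sys/kernel/cpusched/mode
    Path(mode_file).write_text(mode + "\n")
```

On a patched kernel, calling select_scheduler("pb") as root would be the scripted equivalent of the echo suggested above. Passing a different mode_file makes the helpers easy to try against an ordinary file first.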
np2 test
I've just switched from 2.6.7-rc3 (which ran fine with the occasional load explosion) to -np2, which sadly gave this:
frequently in various processes (nnrpd, kswapd0, swapper). I'll see how things go. This is a news server with 4 GB RAM, >100 MB/s disk I/O and 400-500 Mb/s network traffic.
Allocation failure
This is an atomic memory allocation failure from the network driver interrupt, not actually the process it mentions (that process is just the one that happens to be running when the network interrupt triggers).
-np has a couple of changes in the allocator; it sounds like this needs some work. Those allocation failures shouldn't be harmful... but try increasing /proc/sys/vm/mapped_page_cost.
Nick
wait
That should be /proc/sys/vm/min_free_kbytes
try increasing that 2-4 times.
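As a rough sketch of that suggestion, the snippet below reads the current value of /proc/sys/vm/min_free_kbytes and lists what 2, 3 and 4 times that value would be. The helper names are hypothetical, and the choice of factor is left to the administrator. Writing the new value back requires root and is not shown here.

```python
from pathlib import Path
from typing import List, Sequence

# Standard Linux sysctl file named in the comment above.
MIN_FREE_FILE = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path: Path = MIN_FREE_FILE) -> int:
    """Read the current reserve, in kilobytes."""
    return int(Path(path).read_text().split()[0])


def candidate_values(current_kb: int, factors: Sequence[int] = (2, 3, 4)) -> List[int]:
    """Values corresponding to 'increasing that 2-4 times'."""
    return [current_kb * factor for factor in factors]


def report(path: Path = MIN_FREE_FILE) -> str:
    """Format the current value and the candidate replacements as one line."""
    current = read_min_free_kbytes(path)
    options = ", ".join(str(v) for v in candidate_values(current))
    return f"current={current} kB; candidates: {options} kB"
```

Calling print(report()) on a Linux machine shows the candidates. One of them can then be written to the same file as root to test whether the allocation failures go away.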
Will try
But tomorrow... now it is a bit busy on that server. Now trying plain 2.6.7 for the first time, should be a good baseline.
Thanks
OK, thanks. Please email me personally if you'd like (or anyone else with problems). You'd be able to find my email address from the mailing list.
Thanks
Nick
Swappable scheduler
Hello sir,
I read about your work on switching schedulers at runtime.
Could you please tell me what the advantages or performance issues are, when there is a need to change the scheduler, and where I can get the documentation and downloads?
Please reply to my email address.
Nick Piggin
Has anyone tried the -np patch set?
Can those modifications to the mm system be merged into mainline?
Use on mail server and mail address
Is your current email address the one at Yahoo in .au? If so, you should have seen some updates already. FWIW, after a reboot the problems crop up again:
Thomas
4 KiB pages are not the best for performance!
What difference do 4 KiB, 2 MiB and 4 MiB pages make on the i386 architecture, with swappiness set to 0, when it comes to minimizing TLB misses?
open4free ©