Linux: Staircase Performance Improvements

Submitted by Jeremy
on July 1, 2004 - 5:15am

Con Kolivas [interview] released on updated version of his staircase CPU scheduler [story]. Con explains [blog], "Version 7.7 was nice and stable but probably underperformed about 4 minor versions before it. The stability was necessary, though, because a whole swag of little annoying starvation issues made it into 7.4. This version adds a few more planned features, and has improved the performance substantially, and improved the fairness of the non-interactive and computational scheduler settings."

The scheduler is available as a patch against the 2.6.7 stable Linux kernel [story], as well as within Con's performance enhancing -ck patchset [story]. Regular users of this CPU scheduler or patchset will also be interested in the newly formed ck mailing list. Read on for more information about the updated scheduler.


From: Con Kolivas [email blocked]
To: linux kernel mailing list [email blocked]
Subject: [PATCH] Staircase scheduler v7.8
Date: 	Wed, 30 Jun 2004 22:40:48 +1000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a scheduler policy rewrite designed to be interactive by design
without tweaks or tuning and be lean and extensible for all sorts of
settings. (see previous announcements for more detail).


Patches (including incrementals from previous versions) against 2.6.7
can be downloaded from:
http://ck.kolivas.org/patches/2.6/2.6.7

For those with -ck kernels, the ck patchset was updated to 2.6.7-ck4
with no other changes to remain in sync with the staircase scheduler:
http://kernel.kolivas.org


Version 7.7 proved to be very stable so this version introduces some
planned improvements. So far no issues have shown up in testing, and
performance appears better.


Changes:
- - Yield logic made robust. Tasks that yield go after everything else,
but once scheduled are seen as their normal priority - lots of
applications use yield and this makes them behave a lot better.
- - Uninterruptible sleep has no effect on burst during interactive mode -
this improves the responsiveness under I/O load
- - The 'non-interactive' and 'compute' mode is now much stricter about
cpu distribution
- - Code cleanups


Patch not attached for brevity of email size.
7 files changed, 283 insertions(+), 610 deletions(-)
Signed-off-by: Con Kolivas [email blocked]

Comments, questions, patches and testing welcome,
Con

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFA4rTQZUg7+tp6mRURAi+tAJ9ZvacG1YlZPqLZP2qkwx1L3lTGGgCgkvkE
ekatU5O6OGH7r7Y8ID42SUE=
=HVc4
-----END PGP SIGNATURE-----

Related Links:

Performance measurements?

tholti
on
July 1, 2004 - 5:50am

Hi Con,
how do evaluate performance? Are there any testcases like contest or similar? In particular how do you test the non-interactive and computational scheduler performance and the I/O-throughput (IIRC there were some I/O problems with the first ck's for the 2.4.x series).
Thx for your work!
Cheers,
Thorsten

Choice of measurements

Con Kolivas
on
July 1, 2004 - 6:03am

There are two different performance metrics I was interested in; cpu throughput and scalability as measured by the regular benchmarks, and the good old fashioned try it on your desktop and see how fast/smooth it is.

The throughput benchmarks show equivalent or better than mainline even in the fully interactive mode, and some (like hackbench) show marked improvements in latency and potential for scalability. The gains in compute mode are substantial for cpu intensive tasks. I have been repeatedly testing them using the osdl resources and all the benchmarks are publicly available http://www.osdl.org .

As for the interactive feel, the design of the staircase scheduler is not actually "tweakable" like that of pretty much all other schedulers, and I've been looking for bugs in the way the scheduler works and fixing them along the way. These changes I have had a swag of people testing for me and found significant gains going from 7.7 to 7.8.

The idea behind this scheduler rewrite is that the interactive feel of staircase should be very good without tweaks, and thus is more predictable and works good in virtually all settings rather than have to be "tuned" as different test cases come up (like in 2.6 and other schedulers). Another aim of the rewrite was to make the actual design much simpler than the current one, so it behaves far less like that of a "state machine" and is predictable, and leaner. Finally the configurable options for strict cpu distribution in server usage and cache preservation in computational usage were things that people wanted out of their scheduler design, and the design of staircase lends itself easily to these modifications at runtime even.

Re: Choice of measurements

tholti
on
July 1, 2004 - 6:51am

Sounds very promising! After searching the OSDL site I was able to discover some of your testruns. Of course I can't make much of the results as I don't know any comparative numbers. If your scheduler gives so much improvements over the default one may be you should emphasize these advantages a little bit more on lkml (e.g. by posting some of the numbers).

BTW what are the differences to Nick Piggin's scheduler work (OR or XOR relationship)?

Posting results

Con Kolivas
on
July 1, 2004 - 6:57am

Well I did post some on the very first announcement. The truth is that 2.6 should not have a complete scheduler rewrite so this is purely for developmental trees. Nick's scheduler work is orthogonal to this; ie completely unrelated and there is no meaningful way of adding one to the other.

Nick Explanations

Anonymous
on
July 1, 2004 - 8:47am

Maybe Nick can give us some more detailed info on his work, as you have done for your staircase scheduler.

Nick,
what does your scheduler address, and how it does that?

Thanks for you both!!!

I am a big fan of you, Con!!! And I am using your scheduler, (actually -ck3), will test -ck4 soon.

...

Nick
on
July 1, 2004 - 9:05am

My scheduler basically rips lots of stuff out, like all the interactivity estimator stuff. It adds a few things where needed too, but they are generally simple and straightforward.

I haven't looked too closely at the other schedulers flying about these days, but my scheduler retains both priority queues (ie. active, expired arrays). Some interesting problems could arise when you use a single array, and I am somewhat interested to see how the SPA schedulers handle them :)

Anyway, I'm not too sure how widely tested my scheduler has been, but I have had a pretty good ratio of positive feedback. My scheduler has been around since about the start of 2.6, before Con's interactivity work went in. Unfortunately it was never a serious contender to be merged due to a fairly serious regression in the "reaim" test at OSDL, which I never really had the time to work through and fix... until now. Hence my renewed interest in it.

MM in your code

Anonymous
on
July 1, 2004 - 2:09pm

Is there any chance that the mm code to be merged apart from your scheduler?

Thanks!!!

MM changes

Nick
on
July 3, 2004 - 9:48pm

Yes, the memory management changes could be merged without the scheduler or vice versa as they don't rely on each other at all. That said, the mm changes would need a lot more testing and review before they could even be considered for Andrew's -mm patchset.

AS scheduler

Anonymous
on
July 1, 2004 - 2:11pm

Hi, Nick!

How do you compare CFQ with AS?

Who is better?

Who is to be the default in the long term?

Thanks for your response!

[OT] *sigh*

Mr_Z
on
July 1, 2004 - 3:08pm

Who else feels like there should be a big glowing link on any article that talks about "schedulers" that:

  • Carefully defines what the different types of schedulers are
  • Which type is being discussed in a given article

Between I/O, CPU, instruction, etc... a surprising amount of computing comes down to scheduling!

Offtopic but right

Anonymous
on
July 1, 2004 - 3:46pm

Yes, my previuos topic was offtopic, but I didnt messed up CPU and IO schedulers. I was referring to IO, for sure!

How can I rate messages too?

Some time ago I was rating them, but now it is not possible.

Can I be notified when someone responds my messages?

Sliding hopelessly further off-topic.

Mr_Z
on
July 1, 2004 - 4:31pm

There are enough people that get them confused that it's hard to tell sometimes. Perhaps next time, call it out. For instance, say "I know this is slightly off topic, but as long as we're talking about schedulers...."

IO schedulers

Nick
on
July 3, 2004 - 9:48pm

OK, going off topic a bit... I think AS is probably better than CFQ even for the desktop, however they both have different features, and there will definitely be places where CFQ is better than AS.

In the long term I think AS will probably remain the default, however I'd like to implement IO priorities for it too.

Thanks...

Anonymous
on
July 4, 2004 - 5:14am

That is what I was willing to hear.

Did you have plans to implement I/O priorities to 2.6
or it is a feature for 2.7?

please implement io

Anonymous (not verified)
on
February 25, 2007 - 6:23pm

please implement io priorities with write io control. That's what sucks most under Linux. Every time the write buffers get flushed to disk, the whole system hangs -- even with CFQ. At least let me limit the write bandwidth for background processes.

4K, nvidia, cfq, etc.

Anonymous
on
July 1, 2004 - 10:03pm

I have not setup any performance monitoring other then the feel of my system for the AS to cfq scheduler. There is a noticable difference. Now that Nvidia has their act together with the support of 4k stack this is just that much better. The E100 napi is awsome also if get anywhere from right around 400Kbs download on comcast here before it was around 330Kbs.
Keep up the good work.
Walt....

A little confused..

Anonymous
on
July 2, 2004 - 1:03am

Does Con's scheduler replace the 2.6 schedules and is it better? I hear that its performance is better, but than what is the downside? Does Nick's scheduler do the same as Con's?

Will either of these possibly become part of mainline in 2.8 or 3.0?

No idea

Anonymous
on
July 2, 2004 - 8:17am

Right now, there are a number of CPU schedulers, just like I/O schedulers, duking it out for the title of "best." But, just like the Zen Koan informs us, I suspect each of them is best at some scenario.

In all likelihood, I suspect something like this "selection" patch will get pulled along into 2.7, and people will play around with the different schedulers to figure out what ones work better in more scenarios than others. In the end, I think Linux would be best off with 2 or 3 different CPU and I/O schedulers and an easy way to select among them. That would allow the end user to pick interactive response over throughput, for instance.

Over time, I expect that at most 2 or 3 CPU schedulers and 2 or 3 I/O schedulers to ultimately be maintained well enough to stay in a mainline kernel. More than that, and your testing base becomes too fragmented. If nothing else, there will be a couple popular out-of-tree schedulers if mainline just sticks to 1. I imagine you'll end up with "desktop distros" using one combination of settings, "workstation distros" using a slightly different set, and "server distros" using still another. (And who knows, maybe "embedded distros" might need still different settings, but something tells me those kernels are heavily patched anyway.)

a question

Anonymous
on
July 3, 2004 - 8:50pm

Can anybody tell me how to properly apply this patch, or point to a place with a nice howto?

I've tried patch -p0 < [ck's patch] and it asked me for file location, after altering the patchfile a bit to point to my 2.6.7 (vanila) source it looks like it patched. Once I compiled and put the bzImage in the right place, it booted fine. But I have no indication that it's working nothing in /proc to suggest it either.

I'm not new to compiling kernels, but I have no patching experience.

I know it's something I'm forgetting, can anybody help?

an answer

Con Kolivas
on
July 3, 2004 - 9:01pm

If you're using the full -ck4 patch you can use

uname -a

and it will give you the name of your kernel; it should come up as

linux-2.6.7-ck4

If you're just using the staircase patch, the only indication that it has applied is if you have the extra controls (files)

/proc/sys/kernel/interactive

and

/proc/sys/kernel/compute

These files do not exist without the staircase scheduler.

patch-ck4 to patch-bk16ed linux??

Anonymous
on
July 4, 2004 - 5:52pm

can i patch it with no problem?

i am using linux-2.6.7-bk16.

--
Thanks!
http://befree.sf.net ( no it's not my homepage ;) )

bk snapshots

Con Kolivas
on
July 4, 2004 - 6:03pm

It will probably not apply. However I constantly keep a development version, and occasionally will diff against -bk patches (currently there is one against bk17) and put them up on my web server:
http://ck.kolivas.org/patches/2.6/2.6.7/
The bitkeeper patches I take down when I post a new bk snapshot.

Also...

Anonymous
on
July 4, 2004 - 5:08am

you have to enter your kernel source directory to
apply the patch.
So, for example, go to the
cd /usr/src/linux-2.6.7
cp patch-2.6.7-ck4.bz2 .
bunzip2 patch-2.6.7-ck4.bz2
patch -p1 < patch-2.6.7-ck4

I think you have mistaken -p1 with -p0

bzcat /path/to/patch-2.6.7-ck

Ano Nymous
on
July 4, 2004 - 3:06pm

bzcat /path/to/patch-2.6.7-ck4.bz2 | patch -p1

is easier/faster. Also you can use the --dry-run option with patch to mess around without causing any damage. If that went ok then run it again without the --dry-run option.

bzip2 -dc patch-2.6.7-x.bz2 |

Anonymous
on
July 4, 2004 - 3:12pm

bzip2 -dc patch-2.6.7-x.bz2 | patch -p1

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.