Linux: Comparing Scalability Between 2.4 And 2.6

Submitted by Jeremy
on December 5, 2003 - 11:53pm

Craig Thomas posted some interesting performance benchmark results that use Rusty Russell's hackbench to compare the 2.4 scheduler with the 2.6 O(1) scheduler. As expected, 2.6 came out ahead. Craig explained, "the results obtained seem to show that the 2.6 scheduler is more efficient and allows for greater scalability on larger systems."

Craig's email contains tables comparing the 2.4.18 kernel with the 2.6.0-test9 kernel. For a more visual presentation, see his online write-up, where he explains "[In 2.6] not only are processes scheduled more efficiently, but the scheduler has been redesigned to be more scalable when the number of processes in a machine is increased."


From: Craig Thomas [email blocked]
To:  linux-kernel
Subject: Process scheduler improvements in 2.6.0-test9
Date: 05 Dec 2003 10:55:22 -0800

OSDL has been running performance tests with hackbench to measure the
improvement of the scheduler, compared with Linux 2.4.18.  We ran the
test on our Scalable Test Platform on different system sizes.  The
results obtained seem to show that the 2.6 scheduler is more
efficient and allows for greater scalability on larger systems.
See http://marc.theaimsgroup.com/?l=linux-kernel&m=100805466304516&w=2
for a description of hackbench.

The set of data below shows an average time of five hackbench runs
for each set of groups.  Linux 2.6.0-test9 clearly shows significant
improvement in the completion times.

Test set 1: Performance of hackbench

(times are in seconds, lower number is better)

number of groups     50     100     150     200
--------------------------------------------------
1 CPU
   2.4.18          15.52   37.63   74.34   110.62
   2.6.0-test9      9.91   17.86   27.55    39.77
--------------------------------------------------
2 CPUs
   2.4.18          10.50   30.42   64.26   112.46
   2.6.0-test9      7.44   13.45   19.68    26.68
--------------------------------------------------
4 CPUs
   2.4.18           7.07   22.75   54.10   101.45
   2.6.0-test9      5.16    9.25   13.64    18.65
--------------------------------------------------
8 CPUs
   2.4.18           7.02   24.63   61.48   114.93
   2.6.0-test9      4.08    7.15   10.31    13.84
--------------------------------------------------
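To make the comparison in Test set 1 concrete, the short Python sketch below divides each 2.4.18 time by the matching 2.6.0-test9 time; a ratio above 1 means 2.6 finished faster. The numbers are copied from the table above; the dictionary layout and the `speedups` helper are only an illustrative choice, not part of OSDL's tooling.

```python
# Average hackbench completion times (seconds) from Test set 1,
# keyed by CPU count; each list covers 50, 100, 150 and 200 groups.
GROUPS = [50, 100, 150, 200]
TIMES_24 = {
    1: [15.52, 37.63, 74.34, 110.62],
    2: [10.50, 30.42, 64.26, 112.46],
    4: [7.07, 22.75, 54.10, 101.45],
    8: [7.02, 24.63, 61.48, 114.93],
}
TIMES_26 = {
    1: [9.91, 17.86, 27.55, 39.77],
    2: [7.44, 13.45, 19.68, 26.68],
    4: [5.16, 9.25, 13.64, 18.65],
    8: [4.08, 7.15, 10.31, 13.84],
}


def speedups(old, new):
    """Return old/new time ratios per CPU count (>1 means 'new' is faster)."""
    return {
        cpus: [round(o / n, 2) for o, n in zip(old[cpus], new[cpus])]
        for cpus in old
    }


if __name__ == "__main__":
    for cpus, ratios in speedups(TIMES_24, TIMES_26).items():
        pairs = ", ".join(f"{g} groups: {r}x" for g, r in zip(GROUPS, ratios))
        print(f"{cpus} CPU(s): {pairs}")
```

At 200 groups the ratio grows from about 2.78 on one CPU to about 8.3 on eight CPUs, which is the widening gap the tables show as both load and CPU count increase.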

The set of data below shows how many groups can be run before the
system failed with some resource limitation having been exceeded.
The kernel was not tuned, so this tests a default configuration.

Test set 2: Max Groups Before Out of Resource
(maximum number of groups that completed a successful run;  larger
numbers are better)

------------------------
1 CPU
   2.4.18           200
   2.6.0-test9      200
------------------------
2 CPUs
   2.4.18           225
   2.6.0-test9      350
------------------------
4 CPUs
   2.4.18           225
   2.6.0-test9      525
------------------------
8 CPUs
   2.4.18           225
   2.6.0-test9      425
------------------------

We have been running hackbench up through 2.6.0-test11 and
we see no significant differences between test-11 and test9, so these
results should be valid for test-11 as well.

A write-up of these results (complete with graphical plots) is posted
at  http://developer.osdl.org/craiger/hackbench/index.html

-- 
Craig Thomas
craiger@osdl.org


Stunning

Anonymous
on
December 6, 2003 - 2:18am

I expected improvements, but this is stunning. How are other areas of the kernel doing? I expect I/O will be much improved, but what about throughput? Networking performance? NFS? VM? Is it safe to say that, at the very least, we've caught up to BSD's historical strengths and likely surpassed them in every area? I'd assume so. :-)

check this site: in some area

Anonymous
on
December 6, 2003 - 10:18am

Check this site: in some areas BSD still scales a little better, but 2.6 is at least as good as or better than BSD in most areas. It also benchmarks 2.4, and *that* really doesn't scale as well as BSD currently does. I'd call it a tie. :)

FreeBSD

on
December 6, 2003 - 11:47am

From those results I would conclude that Linux 2.6 does much better than FreeBSD or any other free BSD. I don't think it is a good indicator of actual network performance, but it seems like it's a very good indicator of the performance of the syscalls that it measures.

It's important to note that we're talking about two types of scalability here: CPU scalability (how parallel the kernel is) and algorithmic scalability (O(1) vs. O(n) algorithms, etc.). All the tests I have seen show that Linux 2.6 is without peer in both these areas.

WRT CPU scalability, Linux 2.

Anonymous
on
December 6, 2003 - 5:55pm

WRT CPU scalability, Linux 2.4 is also well ahead of the BSDs, including FreeBSD 5.x.

Question

Anonymous
on
December 7, 2003 - 1:58am

Have there been any benchmarks done to showcase the CPU scalability of 2.4 vs. 2.6? I've seen a number of benchmarks measuring how long it takes to do a number of tasks, or showing the asymptotic behavior of certain syscalls, but I haven't noticed many that show how additional CPUs affect a workload.

Scalability

Anonymous
on
December 8, 2003 - 3:43pm

You are correct. The benchmarks were only testing scalability based on the number of processes running for different numbers of CPUs. It did not test other types of scalability like the number of clients. Is testing against hundreds of processes really realistic?

I/O not so great, particularly as-iosched

on
December 8, 2003 - 5:20am

Nick, Jens and Andrew Morton know about this, but in case you are interested, here are OSDL's current tiobench numbers comparing 2.6 and 2.4.

Nick has asked OSDL to tune some magic numbers; maybe that will do the trick. So far, as-iosched does not perform as well as 2.4. deadline-iosched is great, but it's not the default for 2.6.

Random reads/writes are the typical workloads experienced by most users.

http://developer.osdl.org/judith/tiobench/test11/

The relativeness of scalability

Anonymous
on
December 7, 2003 - 4:59pm

If you go from 1-way to 4-way, you get only double the performance. That is what the tests show.

So there is still room for shocking "20-80% improvement" announcements on lkml.

Simon
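The "double performance" figure can be checked against Test set 1. The sketch below divides the 2.6.0-test9 single-CPU time by the multi-CPU time at each group count, using the values copied from the article's table; perfectly linear scaling would give a ratio equal to the CPU count. The `cpu_scaling` helper is hypothetical and exists only for this illustration.

```python
# 2.6.0-test9 hackbench times (seconds) from Test set 1, keyed by CPU count;
# each list covers 50, 100, 150 and 200 groups.
GROUPS = [50, 100, 150, 200]
TIMES_26 = {
    1: [9.91, 17.86, 27.55, 39.77],
    2: [7.44, 13.45, 19.68, 26.68],
    4: [5.16, 9.25, 13.64, 18.65],
    8: [4.08, 7.15, 10.31, 13.84],
}


def cpu_scaling(times, base=1):
    """Speedup of each CPU count relative to the `base` CPU count."""
    return {
        cpus: [round(b / t, 2) for b, t in zip(times[base], row)]
        for cpus, row in times.items()
    }


if __name__ == "__main__":
    for cpus, row in cpu_scaling(TIMES_26).items():
        # Linear scaling would print `cpus` in every column.
        print(f"{cpus} CPU(s):", dict(zip(GROUPS, row)))
```

For 4 CPUs the ratio is about 1.92 at 50 groups and 2.13 at 200 groups, consistent with the "double performance" remark; 8 CPUs reach roughly 2.87 at 200 groups.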

Re: The relativeness of scalability

on
December 7, 2003 - 8:09pm

But how much of this is due to hardware limitations? There are built-in scalability issues with SMP hardware itself, such as bus sharing/locking and cache flushing, and the result is non-linear scaling regardless of operating system. I'm not saying this is all hardware limitations, but I wonder how much of it is. NUMA, clusters and hyperthreading are examples of attempts to solve these problems; then it's up to the OS to make good use of these architectures.
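One standard way to reason about this kind of sub-linear scaling is Amdahl's law, which nobody in the thread invokes and which is offered here only as an illustration. Assume a fraction p of the work parallelizes perfectly across N CPUs and the rest is effectively serial, whether because of bus and cache contention or kernel locking. The best achievable speedup, and the p implied by an observed speedup S, are:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
\qquad\Longrightarrow\qquad
p = \frac{1 - 1/S}{1 - 1/N}

% 2.6.0-test9, 200 groups: 39.77 s on 1 CPU vs. 18.65 s on 4 CPUs
S = \frac{39.77}{18.65} \approx 2.13,
\qquad
p \approx \frac{1 - 1/2.13}{1 - 1/4} \approx 0.71
```

So the 4-CPU run behaves as if roughly 29% of the work were serialized. The law by itself cannot say how much of that serial fraction comes from the hardware and how much from the software.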

I think it's test specific

Anonymous
on
December 8, 2003 - 3:34am

There is always going to be overhead from managing more CPUs. The tests could have been chosen so the overhead didn't show up as much. These tests are good for development where you want to measure the overhead but they don't represent typical usage.

The real improvement that these graphs show is the effect of the O(1) scheduler.
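The difference comes down to how the next task is chosen. Roughly, the 2.4 scheduler computes a "goodness" value for every runnable task on each decision, so the cost grows as O(n) with the number of runnable tasks. The 2.6 O(1) scheduler keeps one queue per priority level plus a bitmap of non-empty levels, so picking the next task is a find-first-set-bit lookup whose cost does not depend on the task count. The Python below is a toy model of only those two selection strategies: `pick_linear` and `PrioArray` are hypothetical names, and the model ignores timeslices, the active/expired array swap, per-CPU runqueues and everything else in the real kernel code.

```python
from collections import deque

NUM_PRIOS = 140  # 2.6 uses 140 priority levels (0 = highest); the toy reuses that count


def pick_linear(tasks):
    """O(n) selection in the spirit of 2.4: examine every runnable task.

    `tasks` is a list of (prio, name) pairs; a lower prio wins and ties go
    to the earliest task in the list.
    """
    best = None
    for prio, name in tasks:
        if best is None or prio < best[0]:
            best = (prio, name)
    return best[1] if best else None


class PrioArray:
    """Toy O(1) priority array: one FIFO per level plus a bitmap of non-empty levels."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0  # bit i is set <=> queues[i] is non-empty

    def enqueue(self, prio, name):
        self.queues[prio].append(name)
        self.bitmap |= 1 << prio

    def pick_next(self):
        """Dequeue the first task at the highest-priority non-empty level."""
        if self.bitmap == 0:
            return None
        # Lowest set bit = highest priority. The kernel does this with a
        # find-first-bit over a fixed-size bitmap, so the cost does not
        # grow with the number of runnable tasks.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        name = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return name
```

Both return the same task for the same input. The difference is that `pick_linear` has to walk the whole task list on every call, while `pick_next` only consults a fixed 140-bit bitmap. That is consistent with the 2.6 times in Test set 1 growing far more slowly than the 2.4 times as the number of groups rises.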
