Craig Thomas posted some interesting performance benchmark results using Rusty Russell's [interview] hackbench to compare the 2.4 scheduler to the 2.6 O(1) scheduler. As would be expected, Craig explained, "the results obtained seem to show that the 2.6 scheduler is more efficient and allows for greater scalability on larger systems."
Craig's email contains tables comparing the 2.4.18 kernel with the 2.6.0-test9 kernel. For those interested in a more visual presentation, follow this link, where he explains "[In 2.6] not only are processes scheduled more efficiently, but the scheduler has been redesigned to be more scalable when the number of processes in a machine are increased."
From: Craig Thomas [email blocked]
To: linux-kernel
Subject: Process scheduler improvements in 2.6.0-test9
Date: 05 Dec 2003 10:55:22 -0800

OSDL has been running performance tests with hackbench to measure the improvement of the scheduler, compared with Linux 2.4.18. We ran the test on our Scalable Test Platform on different system sizes. The results obtained seem to show that the 2.6 scheduler is more efficient and allows for greater scalability on larger systems.

See http://marc.theaimsgroup.com/?l=linux-kernel&m=100805466304516&w=2 for a description of hackbench.

The set of data below shows an average time of five hackbench runs for each set of groups. Linux 2.6.0-test9 clearly shows significant improvement in the completion times.

Test set 1: Performance of hackbench
(times are in seconds, lower number is better)

number of groups     50      100      150      200
--------------------------------------------------
1 CPU
  2.4.18          15.52    37.63    74.34   110.62
  2.6.0-test9      9.91    17.86    27.55    39.77
--------------------------------------------------
2 CPUs
  2.4.18          10.50    30.42    64.26   112.46
  2.6.0-test9      7.44    13.45    19.68    26.68
--------------------------------------------------
4 CPUs
  2.4.18           7.07    22.75    54.10   101.45
  2.6.0-test9      5.16     9.25    13.64    18.65
--------------------------------------------------
8 CPUs
  2.4.18           7.02    24.63    61.48   114.93
  2.6.0-test9      4.08     7.15    10.31    13.84
--------------------------------------------------

The set of data below shows how many groups can be run before the system failed with some resource limitation having been exceeded. The kernel was not tuned, so this tests a default configuration.

Test set 2: Max Groups Before Out of Resource
(maximum number of groups that completed a successful run; larger numbers are better)

------------------------
1 CPU
  2.4.18            200
  2.6.0-test9       200
------------------------
2 CPUs
  2.4.18            225
  2.6.0-test9       350
------------------------
4 CPUs
  2.4.18            225
  2.6.0-test9       525
------------------------
8 CPUs
  2.4.18            225
  2.6.0-test9       425
------------------------

We have been running hackbench results up through 2.6.0-test11 and we see no significant differences between test11 and test9, so these results should be valid for test11 as well.

A write-up of these results (complete with graphical plots) is posted at http://developer.osdl.org/craiger/hackbench/index.html

--
Craig Thomas
craiger@osdl.org
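To put the first table in relative terms, the short sketch below divides each 2.4.18 time by the matching 2.6.0-test9 time. The figures are copied from Craig's email; the dictionary layout and the function name `speedups` are illustrative choices for this example and are not part of OSDL's tooling.

```python
# Hackbench completion times in seconds, copied from test set 1 of the email.
GROUPS = (50, 100, 150, 200)
TIMES = {
    1: {"2.4.18": (15.52, 37.63, 74.34, 110.62),
        "2.6.0-test9": (9.91, 17.86, 27.55, 39.77)},
    2: {"2.4.18": (10.50, 30.42, 64.26, 112.46),
        "2.6.0-test9": (7.44, 13.45, 19.68, 26.68)},
    4: {"2.4.18": (7.07, 22.75, 54.10, 101.45),
        "2.6.0-test9": (5.16, 9.25, 13.64, 18.65)},
    8: {"2.4.18": (7.02, 24.63, 61.48, 114.93),
        "2.6.0-test9": (4.08, 7.15, 10.31, 13.84)},
}


def speedups(cpus):
    """Return (2.4.18 time / 2.6.0-test9 time) for each group count."""
    old = TIMES[cpus]["2.4.18"]
    new = TIMES[cpus]["2.6.0-test9"]
    return [o / n for o, n in zip(old, new)]


if __name__ == "__main__":
    for cpus in sorted(TIMES):
        cells = "  ".join(
            f"{g:>3} groups: {s:4.2f}x" for g, s in zip(GROUPS, speedups(cpus))
        )
        print(f"{cpus} CPU(s)  {cells}")
```

Read this way, the gap widens with both load and CPU count: from roughly 1.6x at 50 groups on one CPU to roughly 8.3x at 200 groups on eight CPUs.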
Stunning
I expected improvements, but this is stunning. How are other areas of the kernel? I/O, I expect, will be much improved; what about throughput? Networking performance? NFS? VM? Is it safe to say that, at the very least, we've caught up to BSD's historical strengths and likely surpassed them in every area? I'd assume so. :-)
Check this site: in some areas BSD still scales a little better, but 2.6 is at least as good as or better than BSD in most areas. It also benchmarks 2.4, and *that* really doesn't scale as well as BSD currently does. I'd call it a tie. :)
FreeBSD
From those results I would conclude that Linux 2.6 does much better than FreeBSD or any other free BSD. I don't think it is a good indicator of actual network performance, but it seems like it's a very good indicator of the performance of the syscalls that it measures.
It's important to note we're talking about two types of scalability here: CPU scalability (how parallel the kernel is) and algorithmic scalability (O(1) versus O(n) algorithms, etc.). All the tests I have seen show that Linux 2.6 is without peer in both these areas.
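One rough way to look at algorithmic scalability in isolation is to hold the CPU count fixed and ask how the cost per hackbench group changes as the load grows. The sketch below does that with the 1-CPU row of Craig's table. It uses the hackbench figures from the email, not the site linked in the comment above, and the names `ONE_CPU` and `seconds_per_group` are made up for this illustration. A flat per-group cost is consistent with an O(1) scheduler, but it does not prove one.

```python
# 1-CPU hackbench times in seconds, copied from test set 1 of the email.
GROUPS = (50, 100, 150, 200)
ONE_CPU = {
    "2.4.18": (15.52, 37.63, 74.34, 110.62),
    "2.6.0-test9": (9.91, 17.86, 27.55, 39.77),
}


def seconds_per_group(kernel):
    """Return completion time divided by group count for each load level."""
    return [t / g for t, g in zip(ONE_CPU[kernel], GROUPS)]


if __name__ == "__main__":
    for kernel in ONE_CPU:
        cells = "  ".join(f"{c:.3f}" for c in seconds_per_group(kernel))
        print(f"{kernel:<12} {cells}")
```

On these numbers the 2.4.18 cost per group rises from about 0.31 s at 50 groups to about 0.55 s at 200 groups, while 2.6.0-test9 stays between roughly 0.18 s and 0.20 s across the whole range.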
WRT CPU scalability, Linux 2.4 is also well ahead of the BSDs, including FreeBSD 5.x.
Question
Have there been any benchmarks done to showcase the CPU scalability between 2.4 and 2.6? I've seen a number of benchmarks measuring how long it takes to do a number of tasks, or that show the asymptotic behavior of certain syscalls, but I haven't noticed many that showcase how additional CPUs affect a workload.
Scalability
You are correct. The benchmarks only tested scalability based on the number of processes running for different numbers of CPUs. They did not test other types of scalability, like the number of clients. Is testing against hundreds of processes really realistic?
I/O not so great, particularly as-iosched
Nick, Jens and Andrew Morton know about this, but in case you are interested, these are OSDL's current tiobench numbers comparing 2.6 and 2.4.
Nick has asked OSDL to tune some magic numbers. Maybe they will do the trick. So far, as-iosched does not perform as well as 2.4. deadline-iosched is great, but that's not the default for 2.6.
Random reads/writes are the typical workloads experienced by most users.
http://developer.osdl.org/judith/tiobench/test11/
The relativeness of scalability
If you go from 1-way to 4-way, you get ... double performance only. That is what the tests show.
So there is still room for shocking 20-80% improvement lkml announcements.
Simon
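The "double performance only" figure can be checked against Craig's table by fixing the load at 200 groups and dividing the 1-CPU time by the time on each larger machine. The sketch below does that; the numbers come from the email, while `TIMES_200` and `cpu_speedup` are names invented for this example.

```python
# Hackbench times in seconds at 200 groups, copied from test set 1 of the email.
CPUS = (1, 2, 4, 8)
TIMES_200 = {
    "2.4.18": (110.62, 112.46, 101.45, 114.93),
    "2.6.0-test9": (39.77, 26.68, 18.65, 13.84),
}


def cpu_speedup(kernel):
    """Map CPU count to (1-CPU time / n-CPU time) for the given kernel."""
    base = TIMES_200[kernel][0]
    return {n: base / t for n, t in zip(CPUS, TIMES_200[kernel])}


if __name__ == "__main__":
    for kernel in TIMES_200:
        cells = "  ".join(f"{n}-way: {s:4.2f}x" for n, s in cpu_speedup(kernel).items())
        print(f"{kernel:<12} {cells}")
```

At this load 2.6.0-test9 comes out at roughly 1.5x, 2.1x and 2.9x on 2, 4 and 8 CPUs, which matches the comment above, while 2.4.18 stays close to 1.0x no matter how many CPUs are added.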
Re: The relativeness of scalability
But how much of this is down to hardware limitations? SMP hardware itself has inbuilt scalability issues, such as bus sharing and locking, and cache flushing. The result is non-linear scaling regardless of operating system.
I'm not saying this is all hardware limitations, but I wonder how much of it is hardware.
NUMA, clusters and hyperthreading are examples of attempts to solve these problems; then it's over to the OS to make good use of these architectures.
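One common back-of-the-envelope model for this kind of non-linear scaling is Amdahl's law, which treats a workload as a serial part plus a perfectly parallel part. It is brought in here purely as an illustration and is not something the commenter or Craig's write-up uses. The sketch below inverts the formula to estimate the parallel fraction implied by the 2.6.0-test9 result at 200 groups (39.77 s on 1 CPU, 18.65 s on 4 CPUs, both from the email). The model cannot say whether the serial share comes from the hardware or from the kernel.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: speedup on `cpus` processors for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cpus)


def fitted_parallel_fraction(observed_speedup, cpus):
    """Invert Amdahl's law: 1/S = (1 - p) + p/n  =>  p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cpus)


if __name__ == "__main__":
    # Times in seconds for 2.6.0-test9 at 200 groups, from the email.
    observed = 39.77 / 18.65
    p = fitted_parallel_fraction(observed, cpus=4)
    print(f"observed 4-way speedup: {observed:.2f}x")
    print(f"implied parallel fraction under Amdahl's law: {p:.2f}")
    print(f"model's 8-way prediction: {amdahl_speedup(p, 8):.2f}x")
```

Comparing the model's 8-way prediction with the measured 8-CPU time in the table is a quick sanity check on how well such a simple model fits this benchmark.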
I think it's test specific
There is always going to be overhead from managing more CPUs. The tests could have been chosen so the overhead didn't show up as much. These tests are good for development where you want to measure the overhead but they don't represent typical usage.
The real improvement that these graphs show is the effects of the O(1) scheduler.
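For readers unfamiliar with the idea behind the O(1) scheduler, here is a heavily simplified sketch of its core data structure: one queue per priority level plus a bitmap of non-empty levels, kept in an "active" and an "expired" array that are swapped when the active one drains. This is an illustration in Python, not kernel code. The class and method names are invented, and the sketch assumes every task uses up its whole timeslice each time it is picked, which the real scheduler does not assume.

```python
from collections import deque

NUM_PRIORITIES = 140  # the 2.6 scheduler has 140 priority levels; 0 is highest


class PriorityArray:
    """One FIFO queue per priority level plus a bitmap of non-empty levels."""

    def __init__(self):
        self.bitmap = 0
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def dequeue_highest(self):
        """Pop a task from the highest-priority non-empty queue, or return None."""
        if self.bitmap == 0:
            return None
        # Index of the lowest set bit; the work depends on the number of
        # priority levels, not on how many tasks are queued.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[prio]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)
        return task, prio


class RunQueue:
    """Active/expired pair; the arrays swap roles when active runs dry."""

    def __init__(self):
        self.active = PriorityArray()
        self.expired = PriorityArray()

    def add(self, task, prio):
        self.active.enqueue(task, prio)

    def pick_next(self):
        if self.active.bitmap == 0:
            # Swapping two references costs the same however many tasks exist.
            self.active, self.expired = self.expired, self.active
        picked = self.active.dequeue_highest()
        if picked is None:
            return None
        task, prio = picked
        # Simplification: treat the task as having used its full timeslice.
        self.expired.enqueue(task, prio)
        return task


if __name__ == "__main__":
    rq = RunQueue()
    rq.add("low", 120)
    rq.add("high", 100)
    rq.add("high2", 100)
    print([rq.pick_next() for _ in range(4)])
```

The point of the design is that neither picking the next task nor starting a new round requires walking the full list of runnable tasks, which is where an O(n) scheduler spends more and more time as hackbench adds groups.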