Published on KernelTrap (http://kerneltrap.org)

Interview: Con Kolivas

By Jeremy
Created Oct 16 2002 - 09:40

Con Kolivas, a practicing doctor in Australia, has written a benchmarking tool called ConTest which has proven to be tremendously useful to kernel developers, having been designed to compare the performance of different versions of the Linux kernel. He was kind enough to speak with us, explaining how he got started on this project, what makes his benchmark unique, and how to interpret the resulting output. Comparing the 2.5 development kernel to the 2.4 stable kernel, Con says, "a good 2.5 kernel (and that's not all of them) feels faster than 2.4 in most ways and this bodes well for 2.6." The interesting results from his frequent benchmarks back up this statement.

Con also describes his high performance patchset for the 2.4 stable kernel, currently at version 2.4.19-ck9. This patchset adds a number of performance boosting patches ideal for a desktop environment, such as the O(1) scheduler, kernel preemption, low latency and compressed caching. Read on for the full interview...


JA: Please share a little about yourself and your background...

Con Kolivas: I'm 32 years old, live and grew up in Melbourne, Australia, am very happily married and have a 9 month old son. I'm a little embarrassed people get me confused for a kernel hacker, as my real profession is very remote from IT. I'm a doctor; a specialist in anaesthesia.

JA: How and when did you get started with Linux?

Con Kolivas: I grew up with computers (geek) but did not work with or study them in any formal manner. While studying medicine and then specialising I went a long time without computers at all (the post-Amiga days). When I finally got back to computers in about '97 I was incredibly frustrated with the Microsoft based machines, which were all I had to work with, after being so happy with the performance and flexibility of much lower spec Amiga machines. A friend introduced me to Linux in 1997 but being too far removed from computers at the time I found it difficult to get started with it. In 1999 I decided to try again and got quite addicted (bordering on the obsessive) in a very short time. Six months later I gave up on other OSs as I noticed Linux had a momentum that would make it unstoppable, even if it definitely wasn't (and still isn't) the best tool for all tasks. I've used numerous distributions in the past but Mandrake gets me up and running with more things working with less fuss so I tend to stick with it.

JA: When did you first start reading kernel code?

Con Kolivas: With 2.4.18, when I started trying to merge O(1), preempt, low latency and compressed cache. After applying each patch I had to sort out the problems with each merge, and found that when I looked at the code it made a lot of sense to me and I could sort out the problems - mind you, I can't program in C at all. Look at the code for long enough and you start understanding what it is doing.

JA: You've recently been doing quite a lot of work on a benchmarking tool called Contest. How is this tool different than other benchmarks?

Con Kolivas: Long story to explain this. When I started merging interesting kernel patches for 2.4.18 that were known for improving system response, people initially just gave me small amounts of positive feedback. When I posted that I had merged the patches for 2.4.19, for some reason it attracted a lot more feedback. This time I had people repeatedly asking me if I had benchmarked these patches; could I substantiate my claims that they made the system more responsive? I used the excellent resources of the Open Source Development Lab (http://www.osdl.org) to benchmark my kernels and got the results I expected - virtually unchanged from a vanilla kernel.

At about the same time Rik van Riel had been repeatedly defending his -rmap patches on lkml and the #kernelnewbies channel: although benchmarks didn't show any improvement in performance, users had found that the patches made a difference. Many lkml threads followed about how benchmarking one thing after the other was not a real measure of system responsiveness. None of the standard benchmarks available at the time would tell you that. I was quoted as saying "a good anecdote is worth a thousand benchmarks". We all know that if you start a CPU intensive process in the background, Linux won't bat an eyelid, with no noticeable slowdown in system response. Do a big file write or untar a file, then try to do anything else, and be prepared to go make yourself a coffee while waiting.

Rik encouraged me and others on IRC to "do something" about this. Repeatedly on lkml Linus was quoted as saying "if we can't measure it it doesn't exist" and Rik said "If we don't measure it our method of development will ensure it won't exist". Even though my C programming skills are, shall we say, bordering on the /dev/zero, I had been thinking about this very thing and knew I could do it with a simple script.

Contest (pun and name courtesy of Rik van Riel) takes an easily reproducible thing to do - compile a kernel - which represents a whole swag of things a user may do and notice a slowdown on the machine (heavy CPU use, file IO, etc.) and times it in different settings. It is run on as fresh a machine as possible in single user mode to eliminate the influence of other activity on the results. Then it flushes the memory and all the swap so the benchmark is always starting up "cold". Then it times a kernel compilation by itself, and in the presence of a number of different loads: a heavy context switching load (process_load), a heavy file write (IO_load), a file read (read_load), memory grabbing (mem_load), extracting a tar (xtar_load), creating a tar (ctar_load) and a huge directory listing (list_load). The idea is that by running each load for the duration of the kernel compile it will increase the signal to noise ratio of the test and pick up slowdowns that we may only momentarily notice when trying to do things on our machines. This was quite a departure from the "throughput" approach to benchmarking, and appears to more realistically represent what happens in the real world.
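
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not contest's actual code: the function name time_under_load, the stand-in task fake_compile and the stand-in load fake_io_load are all hypothetical, and a background thread is a far gentler load than the real ones contest uses. The sketch times one task by itself and again while a load repeats in the background, and counts how many times the load completed.

```python
import tempfile
import threading
import time


def time_under_load(task, load=None):
    """Time task() once, optionally while load() repeats in a background thread.

    Returns (elapsed_seconds, completed_load_iterations).
    """
    stop = threading.Event()
    completed = [0]  # mutable counter shared with the worker thread

    def run_load():
        # Repeat the load until the timed task has finished.
        while not stop.is_set():
            load()
            completed[0] += 1

    worker = None
    if load is not None:
        worker = threading.Thread(target=run_load, daemon=True)
        worker.start()

    start = time.perf_counter()
    try:
        task()
    finally:
        elapsed = time.perf_counter() - start
        stop.set()
        if worker is not None:
            worker.join()
    return elapsed, completed[0]


def fake_compile():
    """Hypothetical stand-in for the kernel compile: some CPU-bound work."""
    sum(i * i for i in range(200_000))


def fake_io_load():
    """Hypothetical stand-in for a file-write load: write 64 KiB to a temp file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 65536)


if __name__ == "__main__":
    print("noload :", time_under_load(fake_compile))
    print("io_load:", time_under_load(fake_compile, fake_io_load))
```

Running the same task once per load, on the same hardware and toolchain, leaves the kernel as the only variable, which is the point of the design.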

JA: For what sorts of benchmarking is Contest best suited?

Con Kolivas: Contest is a very specific tool for kernel comparisons. Because the tools (C compiler etc.) and hardware do not change between benchmarks it can only note a difference between kernels. As I was watching kernel development for 2.5 head toward the heavy iron, I and many others were concerned that the desktop was taking a back seat and that it would suffer and be worse by 2.6. A few threads on lkml went so far as saying this [change or that] would benefit NumaQ machines and be only a small detriment to ordinary machines. Contest is good at picking up changes that would cause real slowdowns on ordinary machines under stress. In fact, contest picks up more on these machines than on big machines, which tend to have hardware that compensates in one way or another. As it turns out, though, these slowdowns that affect smaller machines, if corrected, benefit machines across the board.

JA: For what sorts of benchmarking is Contest not well suited?

Con Kolivas: For just about everything else. Unfortunately, although it's an easy to use script for any user, the results from a user's point of view are not useful - kernel development was the intended audience.

Comparisons between hardware - even with minor changes - are meaningless. It's almost impossible to pick up what has caused the difference. A faster hard disk, for example, can speed up the benchmark by taking some of the load off the machine OR it can slow it down by being so busy writing that the CPU chokes and doesn't get a chance to do anything else.

The traditional benchmark measures of throughput, iterations, data processed etc. are not measured to any great extent. The only throughput measured is that of the background load. Let's say you start writing a file in the background - you want the machine to remain responsive, but you also want the file to be written as fast as possible. Contest gives a measure of the responsiveness in the kernel compilation time, and a measure of the file write as the number of "loads" in the result. Ideally time will be low first and loads high second but the balance can swing, and contest will demonstrate this.
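
One simple way to read a pair of such numbers, sketched here in Python, is to sort results by compile time first and use the load count only as a tie-breaker. The Result type, the rank function and the kernel labels are hypothetical names made up for this illustration, the figures are placeholders rather than real measurements, and a strict ordering like this is a simplification that hides the swing in balance described above.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    kernel: str             # label for the kernel under test (hypothetical)
    compile_seconds: float  # time taken by the kernel compile under load
    loads: int              # how many times the background load completed


def rank(results: List[Result]) -> List[Result]:
    """Best first: lowest compile time, then highest load count."""
    return sorted(results, key=lambda r: (r.compile_seconds, -r.loads))


if __name__ == "__main__":
    # Placeholder figures only, not real contest output.
    sample = [
        Result("kernel-A", 100.0, 10),
        Result("kernel-B", 90.0, 5),
        Result("kernel-C", 90.0, 8),
    ]
    for r in rank(sample):
        print(r.kernel, r.compile_seconds, r.loads)
```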

JA: Have you received much feedback on your benchmarking tool and the results you regularly post to the Linux kernel mailing list?

Con Kolivas: I've received a lot of feedback. The most reassuring thing is that I've been getting feedback from the people who actually have the ability to act on the information themselves. This has prompted me to spend an unbelievable amount of time developing contest, as I didn't really have the skills to create it in the first place - just the idea - and their feedback has helped develop it. Andrew Morton has given me probably the best comments. Many of his ideas for loads have been incorporated into contest and he has the ability and knowledge of the workings to best interpret the data. Some of the recent feedback has suggested that people _with_ programming skills (unlike me) are developing other benchmarks based on the ideas I used in contest. This is a good thing. Although I am clear on most of the limitations of contest, I have to learn the skills to work around them. While it's good fun, I don't want biased or inaccurate results swaying kernel development the wrong way. I suspect it won't be long before someone realizes I'm a fraud and displaces me.

JA: What are some benchmarks that are beginning to use ideas originated with contest, and what aspects are being used?

Con Kolivas: Bill Davidsen, whose sig reads "Doing interesting things with little computers since 1979", is developing a responsiveness benchmark, which he is calling "resp1 trivial response time test", and has just started posting his results to lkml. In his words "The benchmark runs a noload case, then forks a load process (or processes) in the background, pauses long enough to simulate user interaction (and get swapped out if the system is memory stressed), then reports the time it took to complete, including the ratio of loaded to noload time." The concept being used is that of trying to perform tasks in the presence of different loads, much like contest does. It seems to be a very interesting benchmark and if he develops it I think contest will already have a successor.
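
The "ratio of loaded to noload time" mentioned in that description is plain division. A minimal Python sketch follows; the function name load_ratio is hypothetical and this is not code from resp1 itself.

```python
def load_ratio(loaded_seconds: float, noload_seconds: float) -> float:
    """Return how many times longer the task took under load than without it."""
    if noload_seconds <= 0:
        raise ValueError("noload time must be positive")
    return loaded_seconds / noload_seconds


if __name__ == "__main__":
    # Placeholder timings: a task taking 100 s unloaded and 150 s under load.
    print(load_ratio(150.0, 100.0))
```

A ratio close to 1 means the load barely slowed the task; larger values mean the task suffered more.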

JA: Based on your benchmarking tests, what observations can you make about the 2.5 development kernel, especially compared to the 2.4 stable kernel?

Con Kolivas: As most of the contest results show changes in scheduling, IO and VM, I can't really comment on any other area. It surprised me when I first started looking at the VM code just how restrained some of the 2.4 stuff was and how many changes have gone into 2.5. Despite this, there are heaps of great ideas for the VM that just won't make it into 2.6, and may not even be ready for 2.8 because of how stepwise the development needs to be. The results I've obtained show that sometimes the most minor changes can have enormous implications for one part of contest or the other. There is no doubt that file writing is a _lot_ faster in 2.5 and is so fast it can all but kill the machine. Secondly the kernel has become very swap happy. This has upset a few people and AKPM is heavily working on a vm_swappiness feature that still needs quite a lot of tuning to not choke the machine in either direction. However a good 2.5 kernel (and that's not all of them) feels faster than 2.4 in most ways and this bodes well for 2.6. What I have noticed with contest is that great code untested in one way or another is not necessarily a speedup unless it's tuned. Hopefully contest and any successor benchmarks can help that.

JA: What are some recent changes in 2.5 that have had significant impact on kernel performance?

Con Kolivas: When rmap was being integrated into 2.5 the results were less than great. At the same time Andrew Morton was busy putting together his mm patch series, which showed substantial improvements with contest (and other benchmarks), and his patches have gradually been incorporated into 2.5. His vm patches (and those of the people contributing to them) have made enormous measurable gains. The introduction of the deadline scheduler did a lot to prevent kernel choking. His vm swappiness feature, which still needs some tuning, can be either very good or very bad for different contest results depending on whether it is set low or high - he is modifying and tuning it to tame this and the results are already improving dramatically. I came in to 2.5 too late to know exactly where the gains prior to this occurred, but almost across the board 2.5 outperforms 2.4 in contest.

JA: What do you mean when you say that file writing is so fast in 2.5 it can "all but kill the machine"?

  • ConTest Home Page - (http://contest.kolivas.net)
  • -ck Patchset Home Page - (http://kernel.kolivas.net)
  • KernelTrap Story On ConTest - (http://kerneltrap.org/node.php?id=411)
  • KernelTrap Story On -ck9 - (http://kerneltrap.org/node.php?id=462)
  • KernelTrap Story On -ck7 - (http://kerneltrap.org/node.php?id=403)

  • Source URL:
    http://kerneltrap.org/node/465