Linux: Interbench, An Interactivity Benchmark

Submitted by Jeremy on July 12, 2005 - 8:04am.
Linux news

Con Kolivas [interview], a doctor specializing in anaesthesia, has released a new benchmark application called Interbench, designed to benchmark "interactivity". Con's earlier benchmark, Contest, was designed to measure responsiveness. In his release announcement, he begins, "there has been a lot of talk about what makes up a nice feeling desktop under linux. It comes down to two different but intimately related parameters which are not well defined." He goes on to explain that the two parameters are "responsiveness" and "interactivity". The former he defines as, "the rate at which your workloads can proceed under different load conditions." The latter he defines as, "the scheduling latency and jitter present in tasks where the user would notice a palpable deterioration under different load conditions." He continues, "responsiveness would allow you to continue using your machine without too much interruption to your work, whereas interactivity would allow you to play audio or video without any dropouts, or drag a gui window across the screen and have it render smoothly across the screen without jerks."

The new benchmarking tool emulates various cpu scheduling behaviors of interactive tasks, measuring latency and jitter. Con goes on to describe how the new tool can be used, noting, "in response to criticism of difficulty in setting up my previous benchmark, contest, I've made this as simple as possible." He also provides some sample benchmark results, comparing the mainline 2.6.13-rc1 kernel to 2.6.12-rc6 patched with his own -ck patchset. In his conclusion, he notes, "this was quite some time in the making... I realise there's so much more that could be done trying to simulate the interactive tasks and the loads, but this is a start, it's quite standardised and the results are reproducible."


From: Con Kolivas [email blocked]
To: linux kernel mailing list [email blocked]
Subject: [ANNOUNCE] Interbench v0.20 - Interactivity benchmark
Date:	Tue, 12 Jul 2005 21:10:41 +1000

	Interbench - The Linux Interactivity Benchmark v0.20

http://interbench.kolivas.org

direct download link:
http://ck.kolivas.org/apps/interbench/interbench-0.20.tar.bz2

	Introduction

This benchmark application is designed to benchmark interactivity in Linux.

	Interactivity, what is it?

There has been a lot of talk about what makes up a nice feeling desktop under
linux. It comes down to two different but intimately related parameters which
are not well defined. We often use the terms responsiveness and interactivity
in the same sentence, but I'd like to separate the two. As there is no formal
definition I prefer to define them as such:

Responsiveness: The rate at which your workloads can proceed under different
load conditions.

Interactivity: The scheduling latency and jitter present in tasks where the 
user would notice a palpable deterioration under different load conditions.

Responsiveness would allow you to continue using your machine without too much
interruption to your work, whereas interactivity would allow you to play audio
or video without any dropouts, or drag a gui window across the screen and have
it render smoothly without jerks.

Contest was a benchmark originally written by me to test system 
responsiveness, and interbench is a benchmark I wrote as a sequel to contest 
to test interactivity.

It is designed to measure the effect of changes in Linux kernel design or
system configuration, such as cpu scheduler, I/O scheduler and filesystem
changes and their options. With careful benchmarking, different hardware can
be compared.


	What does it do?

It is designed to emulate the cpu scheduling behaviour of interactive tasks and
measure their scheduling latency and jitter. It does this with the tasks on
their own and then in the presence of various background loads. Both the
benchmarked tasks and the loads have configurable nice levels, and the
benchmarked tasks can optionally be run as real time tasks.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on 
the machine currently being used for the benchmark. It saves this to a file 
and then uses this for all subsequent runs to keep the emulation of cpu usage 
constant.
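As a rough illustration of that calibration step, here is a minimal Python sketch, not interbench's own C code. It times a fixed number of iterations of a trivial busy loop, keeps the fastest of several runs as "loops per millisecond", and caches the figure in a file so later runs reuse it. The function names, the sample sizes and the cache file are all hypothetical.

```python
import os
import time


def burn_loops(n: int) -> int:
    """Busy loop of n iterations; the result is returned so the work is not skipped."""
    acc = 0
    for i in range(n):
        acc ^= i
    return acc


def calibrate_loops_per_ms(sample_loops: int = 200_000, repeats: int = 5) -> float:
    """Estimate how many burn_loops() iterations fit in one millisecond.

    The fastest of several runs is kept, assuming it was the one least
    disturbed by other activity on the machine.
    """
    best_ms = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        burn_loops(sample_loops)
        best_ms = min(best_ms, (time.perf_counter() - start) * 1000.0)
    return sample_loops / best_ms


def burn_ms(ms: float, loops_per_ms: float) -> None:
    """Use roughly `ms` milliseconds of cpu with the calibrated loop count."""
    burn_loops(int(ms * loops_per_ms))


def load_or_calibrate(cache_path: str, force: bool = False) -> float:
    """Reuse a saved loops-per-ms figure, or measure it and save it.

    `force` re-measures even when a saved value already exists.
    """
    if not force and os.path.exists(cache_path):
        with open(cache_path) as f:
            return float(f.read())
    loops_per_ms = calibrate_loops_per_ms()
    with open(cache_path, "w") as f:
        f.write(repr(loops_per_ms))
    return loops_per_ms
```

Keeping the saved figure fixed is what lets the same percentage of cpu be reproduced from one kernel to the next, and it is also why cpu frequency scaling has to be off when the figure is first measured.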

It runs a real time, high priority timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures the latency in
the time taken to schedule them. As there is no accurate timer driven
scheduling in Linux, the timing thread sleeps as accurately as the Linux
kernel supports, and latency is taken as the time from the end of this sleep
until the simulated task gets scheduled.
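The wake-and-measure scheme can be sketched in Python with a timing thread and a pair of events. This is only an illustration under stated assumptions: the Python threads here run at normal priority rather than real time, and interpreter overhead ends up in the numbers. The function name and parameters are hypothetical.

```python
import threading
import time


def measure_wakeup_latency(iterations: int = 20, interval_s: float = 0.01) -> list[float]:
    """Return one latency sample (ms) per wakeup of a simulated task."""
    wake = threading.Event()      # timing thread -> simulated task
    recorded = threading.Event()  # simulated task -> timing thread
    woken_at: list[float] = []
    latencies: list[float] = []

    def simulated_task() -> None:
        for _ in range(iterations):
            wake.wait()
            wake.clear()
            # Latency: from the end of the timing thread's sleep until this task runs.
            latencies.append((time.monotonic() - woken_at[-1]) * 1000.0)
            recorded.set()

    task = threading.Thread(target=simulated_task)
    task.start()
    for _ in range(iterations):
        time.sleep(interval_s)             # only as accurate as the OS timer allows
        woken_at.append(time.monotonic())
        wake.set()
        recorded.wait()                    # let the task record before the next round
        recorded.clear()
    task.join()
    return latencies
```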


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 
to 100%. This simulates an idle gui where a window is grabbed and then 
dragged across the screen.

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and then
requires 5% cpu. This behaviour ignores any caching that would normally be
done by well designed audio applications, but 50ms has been seen as the
interval a popular linux audio player uses to write to audio cards. It also
ignores any of the effects of different audio drivers and audio cards. Audio
can also be run as a real time SCHED_FIFO task.

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. 
Like the audio simulator, it ignores caching, drivers and video cards. As with
audio, video can be run SCHED_FIFO.
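The per-period cpu budget follows from the two figures given for each task: audio needs 50ms × 5% = 2.5ms of cpu every 50ms, and video needs 40% of a 1000/60 ≈ 16.7ms period, about 6.7ms. A small Python sketch encoding those numbers is below; the class and constant names are hypothetical, and X is left out because its period is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    name: str
    period_ms: float     # how often the task asks for the cpu
    cpu_fraction: float  # share of each period it then spends running

    @property
    def burn_ms(self) -> float:
        """cpu time needed in each period."""
        return self.period_ms * self.cpu_fraction

    def deadlines_ms(self, count: int) -> list[float]:
        """Start times of the first `count` periods, measured from 0."""
        return [i * self.period_ms for i in range(count)]


AUDIO = PeriodicTask("Audio", period_ms=50.0, cpu_fraction=0.05)
VIDEO = PeriodicTask("Video", period_ms=1000.0 / 60.0, cpu_fraction=0.40)
```

For instance, `for task in (AUDIO, VIDEO): print(task.name, round(task.burn_ms, 2))` prints `Audio 2.5` and `Video 6.67`.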


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of threads fully cpu bound (4 by default).

Write:
Repeated streaming writes to disk of a file the size of physical ram.

Read:
Repeatedly reading a file from disk the size of physical ram (to avoid any
caching effects).

Compile:
Simulating a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.

Memload:
Simulating heavy memory and swap pressure by repeatedly accessing 110% of
available ram and moving it around and freeing it.
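Following the Memload description, one pass of such a load could look like the Python sketch below. It is a hypothetical illustration, not the irman-derived code interbench actually uses: it sizes the buffer from the Linux sysconf page counts, writes one byte in every page so each page is really faulted in, moves half of the buffer over the other half, and then frees it. The function names are invented for this example.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def memload_bytes(fraction: float = 1.10) -> int:
    """Target size: `fraction` of the currently available physical ram."""
    return int(os.sysconf("SC_AVPHYS_PAGES") * PAGE_SIZE * fraction)


def memload_pass(size: int) -> int:
    """Allocate, touch, move and free `size` bytes; return the pages touched."""
    buf = bytearray(size)                 # zero-filled allocation
    for offset in range(0, size, PAGE_SIZE):
        buf[offset] = 0xFF                # dirty one byte in every page
    half = size // 2
    view = memoryview(buf)
    view[:half] = view[half:2 * half]     # move the upper half over the lower half
    view.release()                        # release the export before freeing
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
    del buf                               # drop the only reference so it can be freed
    return pages
```

A continuous load would call `memload_pass(memload_bytes())` in a loop; at 110% of available ram that deliberately pushes the machine into swap.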


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the
latency.
3. The maximum latency seen during the test period.
4. Percentage of desired cpu.
5. Percentage of deadlines met.
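The five figures can be computed from per-wakeup records roughly as in this Python sketch. The record layout and function name are hypothetical, and it assumes a population standard deviation over the recorded latencies; interbench's exact bookkeeping may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    mean_latency_ms: float    # 1. average scheduling latency
    sd_latency_ms: float      # 2. jitter, as the standard deviation of latency
    max_latency_ms: float     # 3. worst latency seen
    pct_desired_cpu: float    # 4. cpu actually received vs. cpu wanted
    pct_deadlines_met: float  # 5. share of deadlines met


def summarize(latencies_ms: list[float], deadlines: int, deadlines_met: int,
              cpu_received_ms: float, cpu_desired_ms: float) -> Summary:
    """Reduce one benchmark run to the five reported figures.

    Expects at least one latency sample and non-zero totals.
    """
    return Summary(
        mean_latency_ms=statistics.fmean(latencies_ms),
        sd_latency_ms=statistics.pstdev(latencies_ms),
        max_latency_ms=max(latencies_ms),
        pct_desired_cpu=100.0 * cpu_received_ms / cpu_desired_ms,
        pct_deadlines_met=100.0 * deadlines_met / deadlines,
    )
```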

This data is output to console and saved to a file which is stamped with the
kernel name and date. Use fixed font for clarity:

	Sample:
--- Benchmarking X in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.495 +/- 0.495         45		 100	         96
Video	   11.7 +/- 11.7        1815		89.6	       62.7
Burn	   27.9 +/- 28.1        3335		78.5	         44
Write	   4.02 +/- 4.03         372		  97	       78.7
Read	   1.09 +/- 1.09         158		99.7	         88
Compile	   28.8 +/- 28.8        3351		78.2	       43.7
Memload	   2.81 +/- 2.81         187		98.7	         85

What can be seen here is that the X simulator never met all of its so called
deadlines during this test run, although all of the desired cpu was achieved
under no load. In X terms this means that every bit of window movement was
drawn while moving the window, but some redraws were delayed, and there was
still enough time to catch up before the next deadline. In the 'Burn' row we
can see that only 44% of the deadlines were met, and only 78.5% of the
desired cpu was achieved. This means that some deadlines were so late (the
percentage of deadlines met was low) that some redraws were dropped entirely
to catch up. In X terms this would translate into jerky movement, in audio it
would be a skip, and in video it would be a dropped frame. Note that despite
the massive maximum latency of >3 seconds, the average latency is still less
than 30ms. This is because these sorts of applications usually drop redraws
in order to catch up.
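The catch-up behaviour can be illustrated with a little arithmetic. If a periodic task wakes `latency_ms` late, every whole period that elapsed in the meantime is a deadline it can no longer meet. A task that drops those frames instead of rendering them late is back on schedule afterwards, and the whole stall shows up as a single latency sample. The Python sketch below (hypothetical names) counts the frames skipped that way.

```python
import math
import statistics


def frames_dropped(latency_ms: float, period_ms: float) -> int:
    """Deadlines that passed while the task waited, i.e. frames skipped to catch up."""
    return math.floor(latency_ms / period_ms)


def mean_latency(latencies_ms: list[float]) -> float:
    """Dropped frames never run, so they add no samples to this average."""
    return statistics.fmean(latencies_ms)
```

For example, a 3000ms stall costs a 50ms-period task `frames_dropped(3000, 50)` = 60 periods, yet 99 wakeups at 1ms plus that one stall average only (99 × 1 + 3000) / 100 = 30.99ms.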


	What is relevant in the data?

The results are considerably more pessimistic than what happens in real world
use because they ignore the reality of buffering, but this allows us to pick
up subtle differences more readily. In terms of what would be noticed by the
end user, missed deadlines would make noticeable clicks in audio, subtle
visible frame time delays in video, and loss of "smooth" movement in X.
Falling short of the desired cpu would be much more noticeable, with audio
skips, missed video frames or jerks in window movement under X. The magnitude
of these would be best represented by the maximum latency. When the deadlines
are actually met, the average latency represents how "smooth" it would look.
The average human's limit of perception for jitter is on the order of 7ms.
Trained audio observers might notice much less.


	How to use it?

In response to criticism of difficulty in setting up my previous benchmark,
contest, I've made this as simple as possible.

	Short version:
make
./interbench

	Longer version:
Build with 'make'. It is a single executable once built so if you desire to
install it simply copy the interbench binary wherever you like.

To get good reproducible data from it you should boot into runlevel one so
that nothing else is running on the machine. All power saving (cpu throttling,
cpu frequency modifications) must be disabled on the first run to get an
accurate measurement for cpu usage. You may enable them later if you are
benchmarking their effect on interactivity on that machine. Root is almost
mandatory for this benchmark, or real time privileges at the very least. You
need free disk space in the directory it is run from on the order of twice
your physical ram for the disk loads. A default run in v0.20 takes about 15
minutes to complete, longer if your disk is slow.
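Two of those prerequisites can be checked at the start of a run, as in this Python sketch for Linux (function names hypothetical): whether there is free disk space on the order of twice physical ram, and whether the process may switch itself to SCHED_FIFO. Without root or a real time allowance, Linux refuses the scheduler change with EPERM, which Python raises as PermissionError.

```python
import os
import shutil


def physical_ram_bytes() -> int:
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def enough_disk_for_loads(directory: str = ".") -> bool:
    """The disk loads need free space on the order of 2x physical ram."""
    return shutil.disk_usage(directory).free >= 2 * physical_ram_bytes()


def try_realtime(priority: int = 1) -> bool:
    """Switch the caller to SCHED_FIFO; return False if not permitted."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False   # the existing policy is left unchanged
    return True
```

Note that `try_realtime()` really does change the policy when it succeeds, so it belongs at the start of a benchmark run rather than in a casual check.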

Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>] [-b] [-c]
[-h] [-n] [-r]
 -l     Use <int> loops per sec (default: use saved benchmark)
 -L     Use cpu load of <int> with burn load (default: 4)
 -t     Seconds to run each benchmark (default: 30)
 -B     Nice the benchmarked thread to <int> (default: 0)
 -N     Nice the load thread to <int> (default: 0)
 -b     Benchmark loops_per_ms even if it is already known
 -c     Output to console only (default: use console and logfile)
 -r     Perform real time scheduling benchmarks (default: non-rt)
 -h     Show this help
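For anyone scripting around the tool, the documented options map onto a parser like this Python argparse sketch. It is a hypothetical mirror of the help text above, not interbench's own parsing code: the destination names are invented, and -n (listed in the synopsis without a description) and the hidden -u are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="interbench", add_help=False)
    p.add_argument("-l", type=int, dest="loops", default=None,
                   help="loops per sec (default: use saved benchmark)")
    p.add_argument("-L", type=int, dest="burn_load", default=4,
                   help="cpu load used by the burn load")
    p.add_argument("-t", type=int, dest="duration", default=30,
                   help="seconds to run each benchmark")
    p.add_argument("-B", type=int, dest="bench_nice", default=0,
                   help="nice level of the benchmarked thread")
    p.add_argument("-N", type=int, dest="load_nice", default=0,
                   help="nice level of the load thread")
    p.add_argument("-b", action="store_true", dest="rebenchmark",
                   help="benchmark loops_per_ms even if it is already known")
    p.add_argument("-c", action="store_true", dest="console_only",
                   help="output to console only")
    p.add_argument("-r", action="store_true", dest="realtime",
                   help="perform real time scheduling benchmarks")
    p.add_argument("-h", action="help", help="show this help")
    return p
```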

There is one hidden option, -u, which is not supported by default; it
emulates a uniprocessor when run on an smp machine. The support for cpu
affinity is not built in by default because there are multiple versions of
the sched_setaffinity call in glibc that not only accept different variable
types but also take different numbers of arguments across architectures. For
x86 support you can change the '#if 0' in interbench.c to '#if 1' to enable
the affinity support to be built in. For those very keen, the function on
x86_64 does not take the sizeof argument.
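The uniprocessor emulation boils down to restricting cpu affinity to a single cpu. Python's os.sched_setaffinity wraps the Linux call with one fixed signature (a pid and a set of cpu numbers), so the glibc variations described above do not show up at this level. This sketch, with a hypothetical function name, pins the caller to the lowest-numbered cpu it is currently allowed on; threads created afterwards inherit that mask.

```python
import os


def emulate_uniprocessor() -> int:
    """Restrict the caller to one cpu from its current affinity mask; return that cpu."""
    cpu = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})
    return cpu
```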


So how does -ck perform? As much as I'd like to say it was a walkover, I have
to admit you need to squint hard to be convinced that -ck is better overall.
Mainline and -ck each perform better under different loads:

The SCHED_NORMAL nice 0 runs are below, performed on a Pentium M 1.7GHz:

Benchmarking kernel 2.6.13-rc1 with datestamp 200507121411

--- Benchmarking Audio in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.003 +/- 0          0.005		 100	        100
Video	   1.02 +/- 0.487       1.68		 100	        100
X	   1.32 +/- 2.22          10		 100	        100
Burn	  0.518 +/- 306004        52		 100	         99
Write	  0.031 +/- 0.209       2.58		 100	        100
Read	  0.006 +/- 0.00173     0.01		 100	        100
Compile	   4.59 +/- 5.74         426		96.5	         94
Memload	  0.021 +/- 0.0697     0.659		 100	        100

--- Benchmarking Video in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.003 +/- 0          0.005		 100	        100
X	   3.27 +/- 3.2         41.3		88.8	       77.7
Burn	  0.003 +/- 0.001      0.005		 100	        100
Write	  0.151 +/- 0.67          50		99.5	         99
Read	  0.004 +/- 0.00173    0.037		 100	        100
Compile	  0.025 +/- 0.248       4.81		 100	        100
Memload	  0.018 +/- 0.0572     0.715		 100	        100

--- Benchmarking X in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.009 +/- 0.0966         1		 100	         99
Video	   4.46 +/- 4.43         572		91.9	         66
Burn	   1.58 +/- 1.58         156		 100	         98
Write	  0.002 +/- 0.0237         4		 100	         98
Read	  0.008 +/- 0.0797        15		 100	         96
Compile	  0.009 +/- 0.0896         2		 100	         99
Memload	  0.108 +/- 0.13          10		 100	         98


Benchmarking kernel 2.6.12-rc6-ck1 with datestamp 200507121345

--- Benchmarking Audio in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.003 +/- 0          0.005		 100	        100
Video	  0.003 +/- 0          0.004		 100	        100
X	   2.53 +/- 3.01          11		 100	        100
Burn	  0.294 +/- 1.47          11		 100	        100
Write	  0.025 +/- 0.116       1.02		 100	        100
Read	  0.007 +/- 0.001       0.01		 100	        100
Compile	  0.393 +/- 1.68          11		 100	        100
Memload	  0.095 +/- 0.545          6		 100	        100

--- Benchmarking Video in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.003 +/- 0.00245    0.052		 100	        100
X	   3.57 +/- 3.21        22.7		95.7	       91.3
Burn	  0.837 +/- 2.49          50		97.7	       95.5
Write	  0.094 +/- 0.596       16.7		 100	       99.8
Read	  0.005 +/- 0.00872    0.169		 100	        100
Compile	  0.543 +/- 1.91        33.3		98.8	       97.7
Memload	   0.21 +/- 0.836       16.7		99.7	       99.3

--- Benchmarking X in the presence of loads ---
	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.009 +/- 0.0964         1		 100	         99
Video	   2.31 +/- 2.27         754		90.9	         65
Burn	  0.129 +/- 0.151         12		 100	         98
Write	  0.069 +/- 0.112          6		 100	         98
Read	  0.009 +/- 0.0896         1		 100	         99
Compile	  0.039 +/- 0.102          3		 100	         98
Memload	  0.004 +/- 0.0408         1		 100	         99


The full logs are available here (including niced runs and real time runs):
http://ck.kolivas.org/apps/interbench/2.6.13-rc1.log
http://ck.kolivas.org/apps/interbench/2.6.12-rc6-ck1.log

Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,  
Nicholas Miell and John Levon. Aggelos Economopoulos for contest code, and
Bob Matthews for irman (mem_load) code.

This was quite some time in the making... I realise there's so much more that 
could be done trying to simulate the interactive tasks and the loads, but 
this is a start, it's quite standardised and the results are reproducible. 
Adding more code to simulate loads and threads to benchmark is quite easy; if
someone wishes to suggest or code up something, I'm all ears. Of course
bugfixes, comments and suggestions are most welcome.

Cheers,
Con Kolivas




Con is great.

July 12, 2005 - 10:58am
Federico (not verified)

As always, Con's contribution is great and interesting.

I wonder how a doctor could be so good at kernel development! :-)

Patience

July 12, 2005 - 11:41am
PayShunts (not verified)

Patients.

"the browser is the OS" ;-)

July 12, 2005 - 1:01pm

What Con did is great.
It's funny, though, that most of the interactivity issues I'm seeing now are related to the browser (Firefox) - when opening up a Slashdot page with lots of comments in a new tab, the browser is frozen for several seconds. Yeah, the OS has great interactivity, too bad the browser itself doesn't. ;-)

Try Firefox 1.0+: Deer Park Alpha

July 12, 2005 - 3:33pm
Anonymous5 (not verified)

There has been a massive improvement in speed with Firefox 1.0+ especially on the Linux platform.

As for your interactivity problem with Firefox under Linux, I have the same problem under Windows. Firefox doesn't handle huge content too well.

It's fixed in the new Firefox.

its not really fixed, when op

July 12, 2005 - 6:45pm
Anonymous

It's not really fixed. Opening lots of tabs will cause Firefox to screech to a slowness unseen in any app. It's a problem that really hasn't been addressed: Firefox loads pages even if they can't be seen, the end result being that it spends more time loading than doing what the user wants in the current tab.

Wrong. It's being addressed a

July 13, 2005 - 7:12am
AnonymousC (not verified)

Wrong. It's being addressed and apparently fixed in the 1.1 beta or whatwasit.

Not wrong

July 18, 2005 - 12:04pm
Anonymoose (not verified)

Unless you have some figures to back that up, I don't believe you. I tried the Deer Park build, and it may be a bit better than the old Firefox, but it is far from how Konqueror and Opera behave.

FFox

July 14, 2005 - 1:01pm
Anonymouse! (not verified)

"It's being addressed in the new version"
Sure... I still get this behaviour with the latest and greatest Firefox.
Possibly starting a new process for each "tab" could solve this? That way the task would be left to the kernel, which apparently does it well already.
(Or do new "tab"s start a new thread already?)

IE does this

July 14, 2005 - 9:42pm

Not sure how you'd do this with tabs but in IE there is a setting to run each window in a separate process, which helps a lot if something goes crazy and causes a crash.

Distro version?

July 13, 2005 - 4:44am
BenRoe (not verified)

Are you using the version of Firefox which came with your distro? I found the Ubuntu and Fedora core versions of Firefox were much slower than vanilla Firefox 1.0.4 downloaded from mozilla.org: I think it's because of the GNOME integration the distros add.

The ubuntu version would freeze almost completely for ages when opening more than 15 tabs at once on my system, while the mozilla version worked fine.

Nice to see that the standard

July 14, 2005 - 2:55am
AnonymousC (not verified)

Nice to see that the standard kernel is already almost as good as what you could get with the -ck patches. (In fact I wonder just how repeatable these numbers are. Chances are that the differences in most of the tests are due to random variation.)

Good work, kudos to the kernel team and especially Con.

Firefox

July 21, 2005 - 2:05pm
Slacker (not verified)

On the Firefox subject, it's pretty slow on a low end machine anyway (Celeron 566MHz/320MB RAM). Epiphany and Galeon are much more responsive. Firefox is overrated, probably because of its "cool" name.

There's still a problem in X, too

July 21, 2005 - 4:39pm
Henrik Clausen (not verified)

> there has been a lot of talk about what makes up a nice feeling desktop under linux.

One smallish, but very noticeable thing is that if you click something _while_ moving the mouse, the actual click recorded will be some pixels off (depending on the speed of the mouse and the speed of the hardware). Some people at X.org found that the click event appears to be sent after the _next_ move event, which would explain the behaviour. But they don't seem too interested in actually fixing it.

https://bugs.freedesktop.org/show_bug.cgi?id=1752

Mac OS X and Win XP both have this detail right. Makes clicking tiny stuff like Window edges much more reliable, and your desktop work becomes more fluent.

I hope we can get this silly bug fixed, it's been there for years.

This drives me absolutely bon

July 23, 2005 - 10:44pm

This drives me absolutely bonkers, BTW. I go to click the corner of one window to resize it and I end up raising something underneath it, or fire off a selection rectangle on the desktop. Or, I click on a tool in GIMP, but because the release event happens outside the tool widget, it doesn't actually select the tool.

Grrr...grrr....grrr.....

Memory pressure test

July 23, 2005 - 2:32am

I don't think the memory pressure test is necessarily being performed correctly. I have found that under 2.6, once the kernel has used about 50% of the available swap, it degenerates into a swap madhouse that comatoses the entire system. With 1.5GB of swap the system runs fine until 800MB or so is used. This is a ceiling -- swap usage won't go above this number, the system just goes out into the weeds. It is nice enough to leave 36MB of buffers in RAM though so it doesn't have to swap them back in or something... </rant>

I assume Con is actually walking the RAM after allocating it ...
