Bob Beck is an OpenBSD [1] developer from Edmonton, Canada. He is one of around 60 OpenBSD developers currently working in an undisclosed hotel somewhere in downtown Calgary at the 2005 OpenBSD hackathon [story [2]]. Bob was involved in setting up the infrastructure [story [3]], and was responsible for the annual barbecue at the house of OpenBSD creator Theo de Raadt [interview [4]] [story [5]]. After these two days of effort, which helped make the hackathon possible, he finally sat down to work on spamd and catch up on email. One of the emails in his inbox caught his attention, leading to a day's effort about which he notes, "some Days end up far far far from where they start."
In the following article, Bob provides a first-person account of tracking down what began simply as a RAID performance issue, but ultimately turned out to be a problem with the idle loop. Once fixed, it resulted in an impressive performance boost. Bob noted, "the idle loop is where the kernel spins when there is no work to do in userland, because of this, it's also where we catch and service many of our interrupts from drivers that may queue work to the device and then tsleep waiting for an interrupt from the card saying the work is done." Bob went on to explain that prior to today's fix, interrupts were handled appropriately when there was userland work happening, but not when userland was idle and the kernel was simply waiting for device input/output. Read on for Bob's full account of the day, leading up to the discovery of the problem and the implementation of the fix, including performance numbers.
Bob Beck writes:
A day at the OpenBSD Hackathon.. Some Days end up far far far from
where they start.
Well, it's Monday at the OpenBSD hackathon. I'm tired - I've spent
two days putting together the world here, and doing a barbecue for everyone
with Theo. Now it's finally *my time*. I can sit down and work on what I want
to rip apart. I fire up a couple of emacs windows and start redoing the spamd
internals, and catch up on my mail. Hmm, interesting stuff is here from Marco.
Marco Pereboom and I have been chasing RAID performance issues for a
couple of months. Horrible read performance, and it's down to a few machines.
Some are fast, and some really suck. Marco, Art Grabowski, and Toby
Weingartner have discovered a condition in locore.s in the idle loop where we
could lose interrupts! Hmm. I get in and take a look with them. Aha, a race!
OK, Mickey Shalayeff helps us with that one, and Toby and Dale Rahn help us
dive into the locore.s assembler. Yeah, I'm getting into this, but I have a
feeling I'm not in userland anymore, Toto.. but the hell with it, this is big,
I really want it fixed, and it's fun...
Ok, so we fix the race, we're up from 14 MB/sec to 50 MB/sec.
Better... but there are still issues... Now what? Hmm... the APM code appears
to affect us in the idle loop! If we don't call it as often, performance goes
up.. Marco and Toby start experimenting with calling APM only every N times,
with the idea to put in a knob. I hate knobs. I say "Call it only if apmd is
active!" Great! New diff time, it has good performance, we're up to 100 MB/sec.
But wait! Theo steps in. No, we can't rely on userland being there because some
crappy laptops insist on APM being around to control thermal issues. We can't
use those semantics... (Grumble, groan..) Back to the drawing board, and Toby
wisely decides to read the APM spec... When the light went on it nearly blinded
us, the noise level rises in the corner, and everyone starts getting involved.
We're calling the APM idle functions in the idle loop, but the APM BIOS
can do a "hlt". If we then receive an interrupt we resume, but our
idle loop then does *another* hlt instruction after the APM call, instead
of continuing. (I will remember the look on Theo's face as I was
telling him this for a long time...)
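The double halt described above can be illustrated with a small toy model. This is only a sketch, not the locore.s code: the names `idle_sim`, `sim_hlt`, `sim_apm_idle`, and `interrupts_per_idle_pass` are hypothetical, and the model assumes each hlt parks the CPU until exactly one more interrupt arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: every hlt parks the CPU until one more
 * interrupt arrives, so we simply count how many interrupts are needed. */
struct idle_sim {
    int wakeups_needed;
};

/* Stands in for the x86 hlt instruction. */
static void sim_hlt(struct idle_sim *sim)
{
    assert(sim != NULL);
    sim->wakeups_needed++;
}

/* Hypothetical stand-in for the APM BIOS idle call: a BIOS that follows
 * the spec may halt internally, a "busted" one returns straight away. */
static void sim_apm_idle(struct idle_sim *sim, bool bios_halts)
{
    if (bios_halts)
        sim_hlt(sim);
}

/* One pass through the pre-fix idle loop as the account describes it:
 * the APM call, then an unconditional hlt. Returns the number of
 * interrupts that must arrive before the loop looks for work again. */
static int interrupts_per_idle_pass(bool bios_halts)
{
    struct idle_sim sim = { 0 };

    sim_apm_idle(&sim, bios_halts);
    sim_hlt(&sim);
    return sim.wakeups_needed;
}
```

In this model a halting BIOS costs two interrupts per pass and a non-halting one costs only one, which is one plausible reading of why a driver sleeping on a single completion interrupt could end up waiting for some unrelated later interrupt before the loop noticed its work.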
Now that we know what's wrong, let's fix it right. Find the right
semantics. The room gets loud now. deraadt, art, tholo, marco, beck, weingart,
niklas, dale, nate, and a bunch of others start throwing ideas off of each
other. "make a thread".. "no too heavyweight".. "do it in softclock".. "no you
can't hlt there, and APM might"... "make a knob"... "no we hate knobs"... "do
it on ticks"... "Hey wait"... (Art types something horrible on my screen)..
"Use this".. Yes! that's it. Check the profiling counters and only call the
BIOS idling function when we show the CPU_IDLE counters are increasing (as
opposed to every time we enter the idle loop, which may be when we are in a
tsleep driver). Yep, this is the right semantics for us...
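The idea of gating the BIOS call on the idle counters might look roughly like the following. This is a sketch under stated assumptions, not the committed diff: `idle_gate` and `should_call_bios_idle` are hypothetical names, and the counter is assumed to be a monotonically increasing count of clock ticks charged to the idle state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping: the idle-tick count seen on the previous check. */
struct idle_gate {
    uint64_t last_idle_ticks;
};

/* Returns true only when the idle-tick counter has advanced since the
 * previous call, i.e. the machine has been idle long enough for a clock
 * tick to be charged to the idle state. Short visits to the idle loop,
 * such as a driver sleeping briefly for a completion interrupt, leave
 * the counter unchanged and so skip the BIOS idle call. */
static bool should_call_bios_idle(struct idle_gate *gate, uint64_t idle_ticks_now)
{
    bool advanced;

    assert(gate != NULL);
    advanced = idle_ticks_now != gate->last_idle_ticks;
    gate->last_idle_ticks = idle_ticks_now;
    return advanced;
}
```

An idle loop using this gate would call the BIOS idle function only when `should_call_bios_idle` returns true. That keeps the thermal and battery behaviour on genuinely idle machines without needing a tunable knob.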
Now, ami gets 105 MB/sec.
My laptop goes from 12 MB/sec to 25 MB/sec...
Nate's laptop goes from 12 MB/sec to 40 MB/sec...
"Holy crap. only machines with busted APM were ever fast!"
"this affects userland crypto too.."
"this affects Ethernet drivers"
"Hey, this makes battery life better too".
"This affects EVERYTHING on i386"
We go for beer, wait for the test reports to come in, and sure enough
this is good. Come back and commit... Wow, we've spent a whole day on the IDLE
LOOP! But man was it worth it.
Well, I've still not started my userland stuff.. but well, strange
stuff happens at a hackathon. The kind of stuff where you get enough people
in a room to chase stuff, and really get it right. Sometimes it's kind of fun
to get sucked off into dimensions you had no idea you'd be in.
Translations:
- Russian [6] (OSRC.info)