Interviews: Dave Jones

Submitted by Jeremy
on December 27, 2001 - 11:55am

Dave Jones currently lives in London, employed by SuSE as a Linux kernel hacker. In the six months since he graduated from the University of Glamorgan, he has become involved in an impressive range of kernel-related projects, including Powertweak, x86info, OProfile and the Kernel Janitors Project. Additionally, he maintains a -dj patch for the 2.5 development kernel, which helps keep it in sync with the stable 2.4 kernel and offers increased stability.


Jeremy Andrews:
Please share a little about yourself and your background...

Dave Jones:
I'm 27 and grew up in the valleys of South Wales. I had an interest in
computers from a young age, and misspent most of my youth learning the
intricacies of things like 68k assembly language.

After leaving school, I spent the better part of 10 years in various
jobs, from mind-numbing data entry positions to games developer, before
becoming disillusioned enough to decide to go and get my degree.

I spent a few years doing that, and then last summer, I got my BSc in
computer studies from the University of Glamorgan. After graduation, I
relocated to London. Moving here was partly due to having friends here,
and partly for work reasons. Whilst I work mostly from home, being within
an hour's travelling distance of the office is useful at times.

JA:
When did you get started with Linux?

Dave Jones:
Before going to university I worked for a short time for (what is now) a
failed Amiga games startup. During this time, I discovered various GNU
tools, and had read about Linux in a few places, but never really 'found'
it until I went to university. The idea behind going to university was to
figure out what I was going to do with myself in four years' time.
I never did find out, but I found Linux during my first year, and became
fascinated enough with it that before I knew it, it was eating up quite
a sizable chunk of my spare time. Unix-related classes became easier as
I'd begun teaching myself ahead of the class, which meant I ended up with
more spare time than some.. which went back into learning more about Linux.

By the time I got to my second year, I was getting to the stage where
it was apparent to me that Linux would feature in my future somehow, but
I had no idea the following year I'd be getting job offers from various
distributions & hardware manufacturers/vendors. It really all sort of
came from nowhere, and all at once. It was just after the initial hype of
various IPOs had died down, and things were starting to become 'normal' again.

I spent the last year at university juggling studies and working part time
for SuSE. In retrospect, I probably could have got a slightly better grade
had I not done so, but the opportunity was something I really couldn't turn
down, and I don't regret my decision one bit. It was tough with both competing
for time at some points though.

JA:
What do you do for SuSE?

Dave Jones:
I'm involved in quite a lot of projects to various extents. SuSE pretty much
give me a free rein to work on them, but primarily I'm employed as a
kernel hacker. Some of the projects I've become involved in over the past year
have been Athlon support for John Levon's oprofile (and lots of subsequent
profiling work), a few releases of Powertweak, quite a bit of work at SuSE
on the AMD Sledgehammer project, the x86 side of Russell King's cpufreq code,
and getting the x86info tool to a useful state, plus lots of smaller projects like
getting the pci.ids repository back on its feet again. In addition, I've done
some work on internal SuSE projects. It's been a busy six months since
graduation. 8-)

JA:
Do you then help decide what goes into the SuSE distribution kernels?

Dave Jones:
I have some involvement: forwarding patches I've tested, resyncing
updated drivers with the SuSE tree, etc. Sometimes a driver author may
decide the time isn't right to merge updates, or sometimes Linus
makes this decision.. This can lead to some drivers getting quite old,
and shipping drivers for up-to-date hardware is quite an involved task,
for example when a driver supports multiple versions of the hardware.

Blindly updating isn't possible, so we try to test updates on as many
configurations as we can before we get them into the SuSE tree.

Other than this, most of the decisions are made in Germany, where
all the main development takes place. Whilst it's possible for a
lot of work to be done by mail, I probably miss out on lots of the
"in the corridor" discussions that go on there.

JA:
We spoke with John Levon
earlier about oprofile. You mention above that you work with this tool,
offering the Athlon support. Are you using this tool at SuSE?

Dave Jones:
Yes, it was partly the motivation for making it work on Athlon.
Like myself, many developers at SuSE have Athlon workstations,
and before I added the support, the tool was useless to us unless
we went to go find a Pentium II box to test on instead.

We're using it on a whole bunch of stuff: the kernel folks are profiling
all different areas, the YaST guys are also using it, and I expect
some other projects are too.

JA:
What is involved in providing "Athlon support"?

Dave Jones:
In retrospect it was actually quite trivial, counting the number of changes, but
it took us quite a while to figure out what was going on. The performance counter
registers on the Athlon have exactly the same bit layout, but have been moved
to different addresses. So the first hurdle was to hack some defines in to replace
the hardcoded register addresses.

Nothing is ever simple however, and this still didn't work. It took a few weeks
of experiments to find out why the APIC wasn't generating an interrupt on an
event, and what to do about it. At the time, the NMI code in Alan's tree was
slightly more advanced, so I started using that to develop with.

After some quick and dirty hacks, we got it generating interrupts, and from
then on things fell into place. Some of the NMI setup code was taken from the
NMI watchdog setup in Alan's tree, and merged into oprofile. It still didn't
work after this on a Linus kernel. That took another week or so of bug hunting.

The other big change needed was that oprofile was initially hardcoded
to use the two counters that the Intel P6 architecture provides.
The Athlon has four counters, so we had to change lots of assumptions in
the code. Not difficult, but another of those boring tasks that needed doing.
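The two changes described above (register addresses pulled out into defines, and a counter count no longer fixed at two) amount to describing each CPU family with a small table. Below is a minimal Python sketch of that idea; it is not oprofile's code, and the class and function names are hypothetical. The MSR numbers are the documented P6 (PerfEvtSel0 at 0x186, PerfCtr0 at 0xC1) and K7 (PerfEvtSel0 at 0xC0010000, PerfCtr0 at 0xC0010004) bases, and the sketch only computes values rather than writing real MSRs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PerfCounterModel:
    """Hypothetical per-CPU description replacing hardcoded addresses and counts."""
    name: str
    evntsel_base: int  # MSR number of event-select register 0
    counter_base: int  # MSR number of counter register 0
    num_counters: int

# Documented MSR base numbers; the table layout itself is an illustrative assumption.
P6 = PerfCounterModel("Intel P6", 0x186, 0xC1, 2)
ATHLON = PerfCounterModel("AMD Athlon", 0xC0010000, 0xC0010004, 4)

# Control bits sit at the same positions in both families' event-select registers.
USR, OS, INT, EN = 1 << 16, 1 << 17, 1 << 20, 1 << 22

def evntsel_value(event, unit_mask=0, user=True, kernel=True):
    """Event code in bits 0-7, unit mask in bits 8-15, APIC interrupt on, enabled."""
    value = (event & 0xFF) | ((unit_mask & 0xFF) << 8) | INT | EN
    if user:
        value |= USR
    if kernel:
        value |= OS
    return value

def counter_msrs(model):
    """(event-select MSR, counter MSR) for every counter, instead of assuming two."""
    return [(model.evntsel_base + i, model.counter_base + i)
            for i in range(model.num_counters)]

for m in (P6, ATHLON):
    print(m.name, [(hex(s), hex(c)) for s, c in counter_msrs(m)])
```

Real driver code also has to respect per-family quirks that a table like this hides; on P6, for instance, only PerfEvtSel0's enable bit takes effect, and it governs both counters.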

Adding support for the Athlon has paved the way for future implementations.
The Pentium 4 perfctr implementation for example is different again, but now that
we've got rid of all the assumptions, it shouldn't be too hard a port
for someone with the right hardware to play with.

JA:
I was just looking at Powertweak, a pretty interesting tool! Can you explain
what it does, and where the idea came from?

Dave Jones:
Something I'm commonly asked, so I made an overview page on the project
website just for this question.

A quick summary would be to say it's a collection of smaller tools that
function as plugins. These plugins are incredibly small, and easy to write.
The advantage of writing a plugin instead of a standalone tuning tool is it
takes away the boring GUI design etc, and just allows you to write the actual
tuning code. The code tells the core that "This is a boolean tweak", etc.,
and whatever user interface is being used (currently GTK or ncurses/tvision)
will generate the correct look & feel. GUI feedback is then redirected back
to the plugin.
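A minimal Python sketch of that split, assuming a hypothetical `Tweak` record and text frontend (Powertweak's real plugin interface is written in C and is not reproduced here): the plugin declares only the kind of tweak and how to read and apply it, while the frontend picks the widget and routes the user's choice back.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tweak:
    """What a plugin registers with the core: a description, no GUI code."""
    name: str
    kind: str                     # "boolean" or "range" in this sketch
    read: Callable[[], int]       # returns the current value
    apply: Callable[[int], None]  # writes a new value
    minimum: int = 0
    maximum: int = 1

def render_text(tweak):
    """One possible frontend: choose the widget from the declared kind."""
    if tweak.kind == "boolean":
        return f"[{'x' if tweak.read() else ' '}] {tweak.name}"
    if tweak.kind == "range":
        return f"{tweak.name}: {tweak.read()} ({tweak.minimum}-{tweak.maximum})"
    raise ValueError(f"unknown tweak kind {tweak.kind!r}")

def on_user_change(tweak, value):
    """Feedback path: validate the user's choice, then hand it back to the plugin."""
    if not tweak.minimum <= value <= tweak.maximum:
        raise ValueError(f"{tweak.name}: {value} out of range")
    tweak.apply(value)

# A hypothetical plugin whose "hardware" is just a dict.
state = {"fast_writes": 0}
fast_writes = Tweak("Fast writes", "boolean",
                    read=lambda: state["fast_writes"],
                    apply=lambda v: state.update(fast_writes=v))
print(render_text(fast_writes))  # [ ] Fast writes
on_user_change(fast_writes, 1)
print(render_text(fast_writes))  # [x] Fast writes
```

A GTK or ncurses frontend would replace `render_text` without the plugin changing at all, which is the point of the design.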

We've got plugins there to tune PCI registers (which is where the project
grew from, when I moved "Tune PCI bridges" from the 2.3 kernel to userspace),
sysctl plugins, a disk elevator plugin, a CPU MSR plugin, and a handful of
others.

JA:
Does Powertweak still have much in common with the original Windows version?

Dave Jones:
Initially there was very little in common. The only thing connecting
the two was the PCI tuning. We shared some info with each other on
various chipset features etc. Then there was a long period of silence between
myself and the author of the Windows version. Since we rewrote everything from scratch
(over a year ago now), some parts of the Linux version have become more advanced
than the Windows version. One example is that the PCI plugin now supports chipset
definitions in XML files. Adding support for a new chipset just means writing
a new one, and no code has to be touched at all. The Windows version will
apparently support the same format we've designed, which means not only will they
be able to use our XML files, but we'll also get to use any new ones they create.
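To illustrate why data-driven chipset definitions mean no code changes, here is a Python sketch that loads a chipset description from XML and sets the bit it names in a copy of PCI configuration space. The element names, attribute names, IDs and register/bit values are all hypothetical; this is not the actual format shared by the Linux and Windows versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical chipset definition; supporting a new chipset means adding a file like this.
CHIPSET_XML = """
<chipset vendor="0x1234" device="0x5678" name="Example host bridge">
  <tweak name="Example write buffer" register="0x50" bit="6"/>
</chipset>
"""

def load_tweaks(xml_text):
    """Return ((vendor, device), [(tweak name, register, bit), ...])."""
    root = ET.fromstring(xml_text.strip())
    ids = (int(root.get("vendor"), 16), int(root.get("device"), 16))
    tweaks = [(t.get("name"), int(t.get("register"), 16), int(t.get("bit")))
              for t in root.findall("tweak")]
    return ids, tweaks

def set_bit(config_space, register, bit, enable):
    """Return a copy of a 256-byte config space with one bit set or cleared."""
    updated = bytearray(config_space)
    if enable:
        updated[register] |= 1 << bit
    else:
        updated[register] &= ~(1 << bit) & 0xFF
    return bytes(updated)

ids, tweaks = load_tweaks(CHIPSET_XML)
config = bytes(256)  # pretend config space, all zeros
for name, register, bit in tweaks:
    config = set_bit(config, register, bit, enable=True)
    print(f"{name}: register {register:#04x} is now {config[register]:#04x}")
```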

JA:
Powertweak is currently at version 0.99.4. What has to happen before it's
ready for a v1.0 release?

Dave Jones:
The config file parser needs a rewrite. It's the one corner that really
stands out as an ugly area of the code. The CPU plugin is currently
a little monolithic.. it supports a bunch of features to twiddle bits in
model specific registers, which is a good thing, but it would be better
implemented in a similar way to how we've done the PCI plugin.
(We have "core" plugins and "dependent" plugins.) It would clean up some
more messy code. Profile support is a really neat feature we have where you
can load "roles" such as "This computer is a laptop" or "This is a firewall",
etc., and it'll change various values to known good presets for that
chosen configuration. It needs a bit more work, and we could use some extra
prewritten profiles.

JA:
If someone who's not too familiar with their system was to play around somewhat
randomly with the many settings in Powertweak, can they do anything to badly
damage their system?

Dave Jones:
There are ways of damaging quite a lot of modern hardware from software, and
it's surprisingly easy to do so. There are CPUs that have bits to disable
thermal throttling, bits to fiddle core voltages, multipliers, bus speeds etc.
In the prewritten XMLs that we ship with the package, we've left out any of
these potentially dangerous tweaks, as there's no valid reason for turning them
off. We've also classified the different plugins in order of "safety", and
ship separate RPMs/Debs for an "extras" package containing the hardware-related
plugins. The default ones you get without installing extras just
allow manipulation of kernel features like sysctls & ioctls.

JA:
Where is a good place someone new to your tool might start tuning for a
noticeable increase in system performance?

Dave Jones:
Another common question. There's no quick and dirty "Make my system fast" button.
A misconception people have is that by loading up the GUI and turning everything
on or up to its maximum their system will be faster, stronger, better..

The real answer would be to find out what the specific problem you are experiencing
is, and then hopefully, you'll find one of the plugins provides the functionality to
tune that feature. Use the IO elevator plugin to test for disk I/O improvements for
example. The elevator plugin is a good example of why the "everything on 12" approach
is a bad thing: although increasing the size of the buffer to be sorted may end up
with a more linear path, the latency involved in the sort may kill any
potential win. So some things are a balancing act to be decided by the user's
preference.
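The trade-off can be seen in a toy model (an assumption-laden sketch, not the 2.4 kernel's elevator): requests arrive one per slot at random sectors, and the elevator sorts each window of `batch` requests before servicing them. Head movement stands in for seek cost, and how far a request slips behind its arrival position stands in for latency; the time spent waiting for a window to fill is not modelled.

```python
import random

def elevator(requests, batch):
    """Toy elevator: collect `batch` requests, sort them by sector, service in order.

    Returns (total head movement, worst reordering delay), where the delay is how
    many positions later than its arrival a request ended up being serviced.
    """
    head, seek, order = 0, 0, []
    for start in range(0, len(requests), batch):
        window = range(start, min(start + batch, len(requests)))
        for i in sorted(window, key=lambda i: requests[i]):
            seek += abs(requests[i] - head)
            head = requests[i]
            order.append(i)
    worst_delay = max(served - arrived for served, arrived in enumerate(order))
    return seek, worst_delay

rng = random.Random(42)
reqs = [rng.randrange(100_000) for _ in range(1024)]
for batch in (1, 8, 64, 1024):
    seek, delay = elevator(reqs, batch)
    print(f"batch {batch:5}: head movement {seek:>10}, worst delay {delay:4}")
```

Larger windows cut total head movement but let individual requests slip further back, which is the balancing act described above.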

Some other things are more obvious.. A badly programmed BIOS may forget to enable
performance-related bits in CPU registers, perhaps, or maybe in PCI bridge registers.
Sometimes (although this is becoming rarer these days), BIOS writers play it safe, and
leave features in their "off" state if there's a known hardware compatibility problem,
even if the system doesn't contain that hardware. So some of the more experimental
hardware-based tweaks call for a trial-and-error approach, although Powertweak can
give warnings if certain tweaks shouldn't be applied on the current system, as it
generally has a better view of the hardware than what the BIOS has been programmed
to look at.

JA:
You mention doing quite a bit of work at SuSE on the AMD 64-bit processor
(Sledgehammer) port. What has your involvement been with this project?

Dave Jones:
Quite varied. As you may be aware the initial codebase was the arch/i386 directory
copied to arch/x86-64. Over time, the x86-64 specific bits were added.
I've done considerable cleaning, removing the non-x86-64 leftovers that were
also copied across. On x86-64 for example we'll have no need for supporting
K6 MTRRs, or f00f bug (hopefully 8-), so a lot of this stuff hit the bit bucket.

As well as cleaning up the setup code, MTRR code etc, I've also got involved
with some profiling/x86-64 specific optimisation work that we probably won't
see the benefit from until we've actually got silicon to play with. The simulator
can only simulate so much, and performance testing there doesn't make much sense.

I'm also working on a few other projects that should be equally useful
for any ix86, such as machine check exception decoders and the like.

JA:
You also mentioned working on the x86 side of Russell King's cpufreq code.
We spoke with Russell King in an
earlier interview, but we didn't talk
about cpufreq. What is it?

Dave Jones:
Quite a few CPUs these days allow changing of the voltage/multiplier/bus speed
through software. Russell and Erik Mouw did a bunch of work on the ARM CPUs
that support this feature, and started writing a generic framework for this
type of technology so that they wouldn't have to duplicate code that, for example,
recalculates loops_per_sec in every speed scaling implementation.
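The shared code mentioned here is easy to picture: a delay-loop count calibrated at one clock speed can be rescaled proportionally when the speed changes, rather than each driver recalibrating. A minimal Python sketch, assuming linear scaling; the function name and the calibration value are hypothetical.

```python
def rescale_loops(loops, old_khz, new_khz):
    """Rescale a calibrated delay-loop count after a clock-speed change.

    Assumes the count is proportional to clock speed. Multiplies before dividing
    so integer division loses as little precision as possible.
    """
    return loops * new_khz // old_khz

# Hypothetical calibration value at the Cyrix III's top speed of 866 MHz.
loops_at_866 = 4_330_000
print(rescale_loops(loops_at_866, 866_000, 300_000))  # 1500000
```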

In parallel to this, Arjan van de Ven and I were experimenting with PowerNow!
on the AMD K6-2+. With the help of someone who actually had a chip we could test
with, we managed to get a reverse engineered implementation of multiplier scaling
into Powertweak. (This code also appeared as a standalone tool called k6mult)

When we learned of Russell's work, we started porting to the kernel interface he
had developed, and asked for a few extra bits to be added along the way.
It stayed that way for a while, until I got hold of some VIA Cyrix CPUs last
summer, and I wrote an implementation for the Cyrix III. It's quite a neat
chip that will run from 866MHz right down to 300MHz.

Speed changing is managed by manipulating a sysctl; voltage changing is done
by the driver, as it picks the best one for a given speed. In time, we'll have
userspace tools to define policy for how this sysctl gets manipulated,
providing options such as "go slow when not under load", "go slow when battery
is running low", etc..
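Those policy tools did not exist yet, so the following is only a hypothetical Python sketch of the kind of rule described: the thresholds and function name are invented for illustration, and writing the chosen speed to the cpufreq sysctl is left out because its path isn't given here.

```python
def choose_khz(available_khz, load, on_battery=False, battery_pct=100):
    """Hypothetical userspace policy returning the speed to request.

    load is a 0.0-1.0 utilisation figure; the thresholds are illustrative only.
    """
    slowest, fastest = min(available_khz), max(available_khz)
    if on_battery and battery_pct < 20:
        return slowest  # "go slow when battery is running low"
    if load < 0.25:
        return slowest  # "go slow when not under load"
    return fastest

speeds = [300_000, 866_000]  # the Cyrix III range mentioned above, in kHz
print(choose_khz(speeds, load=0.9))                                   # 866000
print(choose_khz(speeds, load=0.05))                                  # 300000
print(choose_khz(speeds, load=0.9, on_battery=True, battery_pct=10))  # 300000
```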

This technology is becoming more commonplace, especially in laptops and other
mobile devices. The cpufreq code currently supports just the two implementations
mentioned above, but we've also got Athlon PowerNow! support to come at some point
in the not too distant future. Intel SpeedStep is something of a problem, as there's
no real specification on how it works other than "Talk to ACPI".

The difference between cpufreq & ACPI however, is that cpufreq does the scaling
implementation natively, and takes up around 4KB of kernel memory. ACPI requires
the full AML interpreter to be loaded, which takes up a *lot* more kernel memory.
We also have the added advantage of not having to rely on BIOS writers shipping
AML that actually works.

This code is usable right now, it's in the armlinux CVS, and only requires
a few small cleanups before it's ready to be pushed to Linus for 2.5.

JA:
Another x86 tool you mentioned was x86info. The only information about
this tool that I could find on its SourceForge page said, "This tool
dumps information of low-level CPU configuration." Can you talk more
about what it does, and why someone would use it?

Dave Jones:
Sure. It can dump the contents of every register it knows about for a given
CPU in both a hexdump format, and a human-parsable translated equivalent.
In short, it's a verbose "cat /proc/cpuinfo": cache type information,
TLB type decoding, model-specific register parsing, and a whole bunch more.
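As a taste of the "hexdump plus translation" output, here is a Python sketch that decodes two well-known CPUID fields: the 12-character vendor string from leaf 0 (stored in EBX, EDX, ECX, in that order) and the basic family/model/stepping fields from leaf 1 EAX. Python cannot execute CPUID itself, so the register values are hand-typed samples, and the function names are hypothetical rather than x86info's.

```python
import struct

def vendor_string(ebx, edx, ecx):
    """CPUID leaf 0 returns the 12-byte vendor ID in EBX, EDX, ECX, in that order."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def family_model_stepping(eax):
    """Basic fields of CPUID leaf 1 EAX (extended family/model bits ignored)."""
    return (eax >> 8) & 0xF, (eax >> 4) & 0xF, eax & 0xF

def report(leaf0, leaf1_eax):
    """Hexdump-style raw registers followed by a human-readable translation."""
    eax, ebx, ecx, edx = leaf0
    print(f"leaf 0: eax={eax:08x} ebx={ebx:08x} ecx={ecx:08x} edx={edx:08x}")
    print(f"leaf 1: eax={leaf1_eax:08x}")
    family, model, stepping = family_model_stepping(leaf1_eax)
    print(f"Vendor: {vendor_string(ebx, edx, ecx)}")
    print(f"Family: {family}  Model: {model}  Stepping: {stepping}")

# Sample register values typed in by hand (not read from a real CPU).
report((0x00000001, 0x68747541, 0x444D4163, 0x69746E65), 0x00000642)
```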

JA:
All of this in the past six months since graduation!! How do you find
time to work on this many projects?

Dave Jones:
Round robin scheduling 8)

It's one big juggling game. After spending a few weeks doing nothing
but kernel hacking, it's nice to take a break for a few days and go fix up some
userspace problem, and come back to kernel hacking with a fresh outlook.
And having fingers in lots of pies means you always have an alternative to hack
on when you get stuck with a problem.

With each passing week I seem to find myself getting involved in something new.
I'm a sucker for talking myself into things.

JA:
How did you get started with the Kernel Janitor Project?
What does the group do?

Dave Jones:
The janitor project is something that initially began when Arnaldo Carvalho de Melo
started posting his TODO list of small items that needed fixing.
It became apparent quite quickly that there was interest there from people who
wanted to help out, but weren't exactly sure how to go about fixing some problems.
I helped Arnaldo organise some of the boring stuff like setting up the CVS/mailing
lists and other such things.. since then, it's pretty much running itself.

JA:
How many people are involved?

Dave Jones:
Based upon when I last counted the users subscribed to the lists,
there are probably around 150-200 people subscribed currently, although
not all of these are active. People tend to have 'bursts' of activity.
They'll pop up every 6 months with a dozen patches, and then hide again.

JA:
Your name shows up a couple of times in the current 2.4.17-pre changelogs. In
-pre3 your name appears with "Enable ppro errata workaround", and again in -pre8
with "Remove mcheck_init() call from processor dependant code and put it in
unified codepath". Many people have complained about the changelogs being too
cryptic for the non-technical. Can you explain what you did here?

Dave Jones:
Both of these fixes are in an area of the kernel that H. Peter Anvin and I
look after. On booting, we check the CPU vendor & model, and work around
any known bugs the chip may have. In the example above, the code to work around
a problem was present, but wasn't activated. A simple addition to the Config.in
files for the relevant processor was all that was needed. Sometimes workarounds
for CPU bugs are just a few simple lines: "Flip bit X of this model specific register",
for example. Other times, they are somewhat more involved, especially if the
workaround causes problems for other members of the same CPU family. It involves
getting enough sample data from the possibly affected CPUs (where tools like
x86info come in useful), and then cross-referencing dumps with datasheets.
Sometimes it's tedious work for just a dozen lines of code.
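The "check the vendor and model, then flip a bit" pattern can be sketched as a table lookup. This is a hypothetical Python illustration, not the kernel's setup code: the vendor name, model, MSR number and bit are placeholders for values that would come from errata documentation, and a dict stands in for real MSR reads and writes.

```python
# Hypothetical errata table keyed by exact (vendor, family, model), so a fix
# for one model is never applied to its siblings in the same family.
ERRATA = {
    ("ExampleVendor", 6, 1): [(0x1234, 3, "hypothetical erratum #1")],
}

def apply_workarounds(cpu, msrs):
    """Set the listed bit in each listed MSR; msrs is a dict standing in for rdmsr/wrmsr."""
    applied = []
    for msr, bit, description in ERRATA.get(cpu, []):
        msrs[msr] = msrs.get(msr, 0) | (1 << bit)
        applied.append(description)
    return applied

msrs = {}
print(apply_workarounds(("ExampleVendor", 6, 1), msrs), msrs)
print(apply_workarounds(("ExampleVendor", 6, 2), msrs))
```

Keying on the exact model, not the whole family, is one simple way to keep a workaround from reaching sibling CPUs where it would cause problems.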

Other related fixes of this nature are the enabling of certain CPU features.
For example when I got a VIA Cyrix III to play with, I found a bunch of
things that needed doing, such as adding recognition to the MTRR code,
and working around another CPU bug where it reported the wrong L2 cache size.

This sort of thing happens pretty much every time a vendor makes a new CPU.

The second change you mention was a simple case where we were doing a
check to see "What Vendor made the CPU I'm running on" in the machine check
exception handler setup, and then every vendor that supported machine check
architecture had a call to mcheck_init(). By making calls to mcheck_init()
unconditional, the per-vendor setup paths got a little cleaner.

JA:
I read that you are queuing up fixes that have gone into
2.4, ready to be passed off to Linus in 2.5 when the block layer settles down.
What does this involve?

Dave Jones:
I was keeping bits of the last 2.4.13-ac patch, and some bits of the .17 pre-patches
synced to 2.5 for my own experiments anyway. Initially I was just taking the bits
that affected hardware I had at my disposal and/or interested me, and ignored
the rest. At around 2.5.1pre10 time, Dave Miller suggested it would be a good
idea if someone kept the trees in sync ready for merging. 2.4.17rc1 hasn't really
deviated that much from 2.5 yet. The rejects were trivial, the harder part was
figuring out what was worth merging, and what bits to drop because there were
better versions in 2.5 (such as devfs and some drivers).

For the most part it's boring work. Fixing rejects by hand can take a while
when there's lots of them. As I mentioned, we're not that far apart in the
two trees really. My current diff is just over 1.6MB, but this also contains
things that have gone into 2.4 which are not bugfixes.
Things like new/updated drivers, support for features like CPU hyperthreading etc..
Hopefully it won't grow too large before we get a chance to resync.

Whilst the focus is on making sure the various 2.4 fixes stay in sync,
I'm also including small things to keep things usable for anyone
interested in experimenting with 2.5. Merging some things in advance also
cuts down on merge time when Linus then releases a new kernel (If he's
picked up the same set of fixes).

The plan is to wait until Linus is done with the block layer, find out
if he's ready to resync up as far as 2.4.17 (which should be final by then),
and then find out how he wants to do it. Right now he's not interested, and
wants to focus on getting bio into a solid solution before anything else goes
in. Then, after a sizable (all?) chunk is resynced, Linus can go focus on
another subsystem, and we'll probably start the process again.

A few people have drawn parallels to Alan's patches, but for now at least, I've no
intention of merging things like new architectures / filesystems / other large features.
Tracking Marcelo & Linus, and picking up 'obvious' fixes from the kernel list
already takes up quite a bit of time. Doing something on the scale of what Alan was
doing would probably mean me giving up work on some other project.

There's been some talk recently about how difficult it is to get Linus to
accept patches these days. Whilst I've no objection to anyone sending me
updates/fixes etc for inclusion that Linus is silently dropping, I've no
intention of pushing anything like that to Linus when it comes to the resync.
2.5 is the development tree after all, not mine.

Once the first merge is over and we've got most of the pending bits out of the
way, I'll reconsider what direction my tree takes.

JA:
What main tools do you use when developing?

Dave Jones:
Nothing amazing software-wise, just vim, patch, diff, cvs..
I've a few perl scripts to spot common errors in code (things I wrote for
the janitor project), interdiff is at least partially useful..
When the resync starts I imagine Linus will want small chunks of patches,
so 'diffsplit' will probably be heavily used in the months ahead 8)
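The diffsplit tool isn't described further here, so the following is only a hypothetical Python stand-in for the job of breaking a large patch into per-file pieces. It assumes each file's section starts with a `diff` header line, as output from `diff -urN` does, and drops anything before the first header.

```python
def split_patch(text):
    """Split a multi-file unified diff into {target path: patch text}."""
    chunks, current = {}, None
    for line in text.splitlines(keepends=True):
        if line.startswith("diff "):
            current = line.split()[-1]  # last argument is the new file's path
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {name: "".join(lines) for name, lines in chunks.items()}

PATCH = """\
diff -urN a/foo.c b/foo.c
--- a/foo.c
+++ b/foo.c
@@ -1 +1 @@
-old
+new
diff -urN a/bar.h b/bar.h
--- a/bar.h
+++ b/bar.h
@@ -1 +1 @@
-x
+y
"""

for name, piece in split_patch(PATCH).items():
    print(f"{name}: {len(piece.splitlines())} lines")
```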

Hardware.. Most of my work is done on an Athlon box with all sorts of
toys, RAID, g550, dual-head monitors, and lots of RAM. In addition to
this, I've *lots* of other test boxes that see various uses, ranging from
stress-testing & profiling, to finding bugs in particular x86 implementations.

Working on things like the CPU errata fixes means it's useful to have a box with
any particular CPU to hand. The big advantage is that when someone reports
"New kernel doesn't boot for me on my Cyrix M2" I can go dig out a box with one,
and start looking at the problem almost instantly. The big disadvantage is the
storage space they all take up. 8-)

In addition to the x86 stuff, there are some old SPARCs around too, but they've
not been used in nearly a year, through lack of time/interest.
Whilst the really fast machines are great for compiles etc, I find it just
as interesting to hack support for some random feature of some ancient hardware.

Whilst the Linux kernel is growing enormously, support for older hardware is
unlikely to be removed for as long as there's a single user ready to complain
that it no longer works. One of the things I've a personal interest in is making
sure our SMP startup code continues to support older boxes like the dual P5
I was fortunate enough to inherit.

Not just CPUs either, one of the items on my TODO list currently is to fix up
the driver for an ancient EISA network card. I just need to find a time when
I've nothing more urgent that needs doing. 8-)

JA:
Are the perl scripts you mentioned available online for others to use?

Dave Jones:
Yup, http://www.kerneljanitor.org/kernel-janitor/scripts/
Any comments on bad perl style are entirely my fault.
These grew out of an idle Sunday evening when I decided to learn some perl.
It's OK for a quick hack, but could be done a thousand times better by someone
who knows their way around the language. But as those with the knowledge hadn't
started anything, I decided to up the ante a little.

JA:
Do you use other operating systems besides Linux?

Dave Jones:
I don't use any other OS any more, but I take a look at what the various BSD guys are
up to every so often. It's always useful in case they come up with something
interesting, such as the recent postings
you covered regarding the 'fsx' tool.
Porting that to Linux was trivial, and it turned up a bunch of bugs in several
of our filesystems.

JA:
What are some of the Linux filesystems fsx has been tested against? What were the
results? I remember reading on the lkml that ext2 was the only one to not turn
up any bugs.

Dave Jones:
We found a bunch of NFS bugs, at least one Reiserfs bug,
and I think someone else tested Samba and found that had problems also.
Ext2 is taking the beating and not batting an eyelid.
Ext3 also handled the abuse admirably.

These are just the fs's I had ready to test on. There are a whole bunch
of others that still need testing. I imagine it's very likely the Cerberus folks will
implement this into their harness. It's certainly an extremely useful tool.
The LTP folks also look like they'll be including it in
their tests now that they've got confirmation on the license.

There are also a bunch of ways to try and coerce the tool into breaking
things more quickly, like scripts that call sync(1) every second, running
multiple fsx instances, etc. Testing fs's over loopback may be something
else that could show up something. We've not had decent coverage-testing
tools before, other than LTP & the Cerberus test suite. The difference here is that
this tool doesn't require two-day runs to show a problem.
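fsx's core idea is to apply random operations to a file while keeping an in-memory copy of what the file should contain, and to compare the two continually. The Python sketch below follows that idea with random writes and truncates only; it is a simplified illustration, not a port of fsx, which also exercises mmap-based reads and writes among other things. The function name and parameters are hypothetical.

```python
import os
import random
import tempfile

def fsx_sketch(path, ops=300, max_len=64 * 1024, seed=1):
    """Random writes and truncates on one file, checked against an in-memory copy."""
    rng = random.Random(seed)
    model = bytearray()  # what the file *should* contain
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for n in range(ops):
            if rng.random() < 0.7:
                off = rng.randrange(max_len)
                data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 4096)))
                os.pwrite(fd, data, off)
                if off > len(model):
                    model.extend(bytes(off - len(model)))  # a hole must read back as zeros
                model[off:off + len(data)] = data
                op = f"write {len(data)} bytes at {off}"
            else:
                new_len = rng.randrange(max_len)
                os.ftruncate(fd, new_len)
                if new_len < len(model):
                    del model[new_len:]
                else:
                    model.extend(bytes(new_len - len(model)))
                op = f"truncate to {new_len}"
            # After every operation, size and full contents must match the model.
            if os.fstat(fd).st_size != len(model):
                raise AssertionError(f"op {n} ({op}): size mismatch")
            if os.pread(fd, len(model), 0) != bytes(model):
                raise AssertionError(f"op {n} ({op}): data mismatch")
    finally:
        os.close(fd)
    return ops

with tempfile.TemporaryDirectory() as tmp:
    print(fsx_sketch(os.path.join(tmp, "fsx.test")), "operations passed")
```

Pointing `path` at a file on the filesystem under test (NFS, ReiserFS, a loopback mount, and so on), and running several instances with different seeds, is the same style of abuse described above.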

The guy at Apple who wrote fsx seemed happy enough with merging the patch I did to
make things work under Linux. I've some other ideas for improving the tool as well
with some additional options. If these turn out to be worthwhile experiments,
I'll also make sure these find their way back to the guys at Apple.
Who knows, maybe my changes will show up more bugs for the FreeBSD folks too.

JA:
As anyone following Linux news is aware, Marcelo Tosatti recently took over the
maintenance of the 2.4.x stable kernel tree. Did you know Marcelo prior to his
recommendation by Alan Cox?

Dave Jones:
Yes. Besides knowing him through postings on linux-kernel, at one point, I
almost ended up working with him. A year and a half ago, when I was in the very
fortunate position of being able to choose an employer, Conectiva were one of
the vendors that were interested.
There was no real deciding factor why I chose to work for SuSE over Conectiva,
other than "It just happened".

JA:
Have you met Linus? Alan? Other kernel hackers?

Dave Jones:
Not met Linus yet, Alan several times at various events, many many others
at conferences such as Linux-Kongress, OLS, and various expos..
And, obviously, I've met Andrea Arcangeli, Andi Kleen, and all the other
SuSE kernel guys I work with several times.

JA:
What do you enjoy doing in your non-Linux time?

Dave Jones:
Sleep? 8-) Seriously, I'm a big music fan. I try to play bass and lead guitar
(neither of which I'm particularly good at), but it's fun all the same.
I try to see as much live music as I can find time for, which was another
deciding factor for me moving to London. There's nearly always something
interesting to go see/listen to.

Aside from that, I'm not unlike a few thousand other Linux hackers.
I enjoy going out for beer & food with friends etc, although it's sometimes
difficult to 'switch off' when your closest friends are as interested in Linux
as yourself, and at some point in every night out the conversation will
turn to something Linux related. At least it gets me away from a CRT once
in a while 8)

JA:
Live music is always fun! Who have you enjoyed recently?

Dave Jones:
Most recently was "In the nursery", and "VNV Nation".
Both of which were better live than I anticipated, which is always a
good thing to see. Especially from synth based bands like VNV.
"In the nursery" was possibly the first time I've been in awe of
percussion oriented music. Highly recommended.

JA:
What tips and inspiration can you offer aspiring kernel hackers?

Dave Jones:
Persevere. Don't be dissuaded by flames, no matter how severe.
When starting out in a few instances I had some pretty brain dead ideas.
I don't think I'm alone here either. Quite a few kernel hackers remember
ancient emails they'd rather everyone else forgets. The point is to progress
from this. Learn from mistakes you make, and once you have done so,
pass on what you've learned to other newcomers.

JA:
Is there anything else you'd like to add?

Dave Jones:
I'd like to thank the vendors that have been helpful to me over the past
few years. In particular VIA and AMD. If all vendors were as responsive as
these folks, hardware support under Linux would be an even prettier picture.
Some vendors are learning, some slower than others, but they're learning.

JA:
Thank you very much for your time! I'm off to tune my system with Powertweak...
:^)



About the interviewer: Jeremy Andrews was born and raised in Southeast Alaska. Currently he lives and works in South Florida. He maintains KernelTrap as a hobby in his spare time.