Jeremy Jackson asked "which kernel debugger is 'best'?" on the Linux Kernel Mailing List. The responses from Andrew Morton and Keith Owens offer some interesting insight into the differences between the two main Linux kernel debuggers: kdb and kgdb. In what has long been a sore point for many kernel developers, Linus Torvalds has refused to include a kernel debugger in the main kernel tree.
According to Keith Owens, the kdb maintainer, "kdb and kgdb are aimed at different debugging environments. kgdb requires a second machine containing the kernel compiled with -g, kdb lets you debug directly on the machine that failed, with or without compiling with -g. Almost all the differences flow from that design decision".
From the kdb home page, "This debugger is part of the linux kernel and provides a means of examining kernel memory and data structures while the system is operational. Additional commands may be easily added to format and display essential system data structures given an identifier or address of the data structure. Current command set allows complete control of kernel operations including single-stepping a processor, stopping upon execution of a specific instruction, stopping upon access (or modification) of a specific virtual memory location, stopping upon access to a register in the input-output address space, stack tracebacks for the current active task as well as for all other tasks (by process id), instruction disassembly, et. al."
From the introduction on the kgdb home page (kgdb is maintained by Amit Kale), "kgdb is a source level debugger for linux kernel. It is used along with gdb to debug linux kernel. Kernel developers can debug a kernel similar to application programs with use of kgdb. It makes it possible to place breakpoints in kernel code, step through the code and observe variables."
What are people using? neither kdb or kgdb appear to support
2.5.7 (kdb does 2.5.5)... or do real men debug with printk()?
-
From: Andrew Morton
Subject: Re: [QUESTION] which kernel debugger is "best"?
Date: Fri, 29 Mar 2002 19:18:39 -0800
Jeremy Jackson wrote:
>
> What are people using?
kgdb. Tried kdb and (sorry, Keith), it's not in the same
league. Not by miles.
> neither kdb or kgdb appear to support
> 2.5.7 (kdb does 2.5.5)...
General answer to this is to go for a foray in
http://www.zip.com.au/~akpm/linux/patches/
Which turns up
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.7/kgdb.patch
> or do real men debug with printk()?
I have done it both ways, extensively, for long periods.
The printk method is comically inefficient. The amount
of transparency which kgdb gives to kernel internals is
extraordinary.
-
From: Keith Owens
Subject: Re: [QUESTION] which kernel debugger is "best"?
Date: Sat, 30 Mar 2002 14:46:02 +1100
On Fri, 29 Mar 2002 19:18:39 -0800,
Andrew Morton wrote:
>kgdb. Tried kdb and (sorry, Keith), it's not in the same
>league. Not by miles.
kdb and kgdb are aimed at different debugging environments. kgdb
requires a second machine containing the kernel compiled with -g, kdb
lets you debug directly on the machine that failed, with or without
compiling with -g. Almost all the differences flow from that design
decision.
Another important niggle to me is that kgdb requires the kernel to be
compiled with frame pointers, because that is all that gdb understands.
On ix86 the extra register pressure from dedicating ebp to frame
pointers can cause Heisenbugs. kdb works with and without frame
pointers.
Can kgdb handle the special hard wired calls that do not add frame
pointers, such as __down_failed? I doubt that gdb knows how to handle
those.
I am not knocking kgdb, it has its place. I see a spectrum of
debugging tools from UML through kgdb to kdb, each tool is aimed at a
different debugging environment. Pick the right tool.
-
From: Andrew Morton
Subject: Re: [QUESTION] which kernel debugger is "best"?
Date: Fri, 29 Mar 2002 20:24:05 -0800
Keith Owens wrote:
> >kgdb. Tried kdb and (sorry, Keith), it's not in the same
> >league. Not by miles.
>
>
> ..
> Pick the right tool.
I guess the distinction here is that I use kgdb for "development",
not for "debugging".
Displaying data structures, values of variables. Seeing what
state all tasks in the system are in, where they're sleeping,
where they're spending CPU, etc.
When adding ad-hoc instrumentation to the kernel, you don't
need to bother printing it out - just increment the counters
and go in and take a look when desired.
And yes, kgdb mucks up call chains across down() because of the
lack of a frame pointer - backtraces don't display who called
down() - it loses the innermost frame. That's irritating,
but not enough to have motivated me to soil my hands with
x86 assembly yet.
I haven't had any problems with -fno-omit-frame-pointer at
any time.
I *have* had problems with -fno-inline. I'd very much like
to be able to turn that on, but the presence of `extern inline'
functions causes a link failure with `-fno-inline'. I'd suggest
that this is a gcc shortcoming. I actually had a poke yesterday
at teaching gcc to convert extern inline to static inline if
flag_no_inline, but it didn't work out.
kgdb is damned inconvenient. You have to set up a cross-build
machine, serial cable and generally get organised to use it.
In reality, this would take an hour or so but it is some friction.
I would like to see kdb shipped in the mainline kernel, so that
we can get better diagnostic reports from users/testers.
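Andrew Morton's point about ad-hoc instrumentation (increment counters, then inspect them from the debugger instead of printing) can be illustrated with a minimal userspace sketch in C. It is not taken from the thread or from any kernel source: `dbg_event`, `dbg_counters`, `dbg_count()` and `handle_request()` are hypothetical names invented for illustration, and real kernel code would also have to consider atomicity or per-CPU counters, which this sketch ignores.

```c
#include <assert.h>

/* Hypothetical event identifiers; none of these names come from the kernel. */
enum dbg_event {
    DBG_FAST_PATH,
    DBG_SLOW_PATH,
    DBG_EVENT_COUNT
};

/* Global, so that a debugger can read it by name while the program is stopped. */
volatile unsigned long dbg_counters[DBG_EVENT_COUNT];

/* Bump one counter; nothing is formatted or printed on the instrumented path. */
static inline void dbg_count(enum dbg_event ev)
{
    assert(ev < DBG_EVENT_COUNT);
    dbg_counters[ev]++;
}

/* Stand-in for instrumented code: even sizes take the fast path (returns 0),
 * odd sizes take the slow path (returns 1). */
int handle_request(int size)
{
    if (size % 2 == 0) {
        dbg_count(DBG_FAST_PATH);
        return 0;
    }
    dbg_count(DBG_SLOW_PATH);
    return 1;
}
```

With the program stopped under gdb (or, by analogy, a kernel stopped under kgdb), a command such as `print dbg_counters` would show the totals collected so far. The instrumented code only increments; looking at the values is deferred until someone wants them.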
-
On Fri, 29 Mar 2002 18:38:50 -0800,
"Jeremy Jackson" wrote:
>What are people using? neither kdb or kgdb appear to support
>2.5.7 (kdb does 2.5.5)... or do real men debug with printk()?
I just uploaded kdb patches for 2.5.7 to
ftp://oss.sgi.com/projects/kdb/download/v2.1. They compile but have not
been booted, I don't have much time to work on 2.5 kernels. I have no
idea if it will work with a preemptible kernel or not.
So why isn't one in the kernel?
Linus's justification of the whole BitKeeper license argument was "the best tool for the job, regardless of some ideology." So why isn't there a kernel debugger in the kernel, rather than "you should know what your code is doing"? Has Linus's stance on this shown any sign of changing?
Re: So why isn't one in the kernel?
That doesn't mean he has to include it in his tree.
I mean he didn't include anything BK-specific either ;)
--
I used to have a sig until the great Kahuna of FOOness
told me to dump it and use /dev/urandom instead.
I'm wondering
What is Linus' argument for NOT including a debugger in the kernel?
I mean, it would have to be a REALLY good argument, because a debugger would be a handy tool, now that the kernel has grown so big and complex.
Why Linus doesn't want a debugger
When Keith Owens was asked (in KernelTrap's interview with him) why Linus didn't want a debugger, he mentioned this link.
Problems can only exist when patches are accepted!
All of Linus' arguments against kernel debuggers would seem to indicate
the problem is with people using built-in debuggers to submit bug fixes.
The problem isn't with the person writing the patch (perhaps with the
aid of a kernel debugger). The problem is with the code maintainer
if they apply the patch without knowing what the patch does.
It is the responsibility of the code maintainer to accept or deny patches.
To assume that the user community cannot possibly provide useful
information (in the form of a patch), and therefore to keep a debugger
out of the stock kernel, is equivalent to a police detective
ignoring eyewitnesses entirely in favor of forensic evidence.
Sure, eyewitnesses can be mistaken, but their testimony is evidence.
The insight given by a patch is valuable.
What's wrong with having a patch supplied as part of a problem description?
Including a kernel debugger in the stock kernel can only increase the
number of patches for problems from the user community.
Many users would be able to produce "some" patch if a stock kernel debugger
were built-in. That creates more evidence for their problems.
What's wrong with that?
reasons
The real question is what are the reasons to include a debugger? Who does it help to include a debugger in the kernel?
It might help developers. But if they want a debugger it's trivial for them to install it themselves.
Also some people would argue that installing a debugger makes developers grow sloppy. If they don't have a debugger then they have to look at the code to figure out what's going on. This makes them ask questions about why the code does what it does. With a debugger you might just find out where the error occurs and comment it out or add a check etc. But then you've fixed the symptom and ignored the disease.
A second possible reason to have a debugger is that it can lead to better bug reports. On the other hand, if companies think this is important, then it's trivial for them to distribute a debugger in their kernel. I think it would be interesting to see how many distributions do install kdb by default...
Another reason to install a debugger is that it forces kernel developers to maintain it, which makes life simpler for people who maintain the debugger. Right now Keith doesn't have time to keep it up to date with 2.5 but if it was included in the kernel then someone else would have to take care of that.
A fourth reason that people would like a debugger in the stock kernel is that it would serve as "advertising" for debuggers. Many people feel that kernel debuggers are underused, and if the stock kernel had a debugger, probably more people would use it. But that really raises the question of whether we want the kernel to be used for advertising stuff like that. If people want a debugger it's trivial for them to install it themselves.
I guess the real issue is that Linus doesn't want to advertise something that he doesn't believe in. The people who know how to use a debugger also know enough to be able to install it themselves. While including a debugger would make life simpler for the people who currently maintain it, someone still has to do the work.
Good points
> Also some people would argue that installing a debugger makes
>developers grow sloppy. If they don't have a debugger then they have to
>look at the code to figure out what's going on. This makes them ask
Wrong. Even with a debugger, you still have to look at the code.
>questions about why the code does what it does. With a debugger you
Again, you still have to look at the code to figure out what it does. The debugger just *helps* you to know what the code is really doing.
perhaps...
Back in the day there was something called "clean room" software development. Basically that meant that you never compiled your code until you were completely finished. This meant that one thing you couldn't do was compile the program, find where the error was, compile, fix and repeat. To me it makes a certain amount of sense that code written that way would be more logical and readable. Using printks also makes the compile, fix and repeat style of software development rather difficult.
Hack and fix style development leads to bad code. This morning I was looking at some code that I had written earlier. I had accidentally reset a variable in the wrong place and someone else had fixed it. But instead of moving the code he added another reset right after my code. This didn't fix the problem when he tested it, so he next added a third reset before my reset and this fixed the problem.
There were a couple of problems with this. First of all, he left all three resets in the code even though it would have been cleaner with just one. Secondly, he put resets in the function where he first noticed the problem, instead of in the logical place: before the function was called.
Of course, not everyone is going to make these kinds of mistakes. I can't even dream about being in the same league as Andrew Morton for example. His recent work with splitting apart and documenting the AA VM impressed me no end. Obviously Mr Morton doesn't just hack and fix...
Bad tool does not imply bad code
I for one am completely unswayed by Linus's argument.
Bad code can be written with or without the use of a kernel debugger.
Good code can be written with or without the use of a kernel debugger.
The quality of the code is a reflection of who wrote it, not the tools
he/she used. Taken to its extreme, Linus's argument implies that
compilers are worthless as well. After all, if we all had to code
directly in assembly language, we surely would think about the code we
were going to write for a *long time* before we wrote it, and that
most likely would lead to better, tighter, faster code. But you
don't hear Linus advocating asm-only kernel coding, do you?