In a recent lkml thread, the idea of dumping an image of the kernel's memory to swap when the kernel hits a bug was discussed. Linus Torvalds argued that such a feature isn't useful to an operating system like Linux, which can run on such a diverse assortment of computers: "yes, in a controlled environment, dumping the whole memory image to disk may be the right thing to do. BUT: in a controlled environment, you'll never get the kind of usage that Linux gets. Why do you think Linux (and Windows, for that matter) took away a lot of the market from traditional UNIX?" He went on to explain that there are systems where swap is not larger than core, so collecting a crash dump would not be possible; that Linux instead tries to acknowledge bugs without crashing; and that quite often the bug is actually in the drivers: "writing to disk when the biggest problem is a driver to begin with is INSANE." Comparing Linux to Solaris he added, "so the fact is, Solaris is crap, and to a large degree Solaris is crap exactly _because_ it assumes that it runs in a 'controlled environment'."
Alan Cox went on to point out that there are also privacy issues: "there is an additional factor - dumps contain data which variously is - copyright third parties, protected by privacy laws, just personally private, security sensitive (eg browser history) and so on. The only reasons you can get dumps back in the hands of vendors is because there are strong formal agreements controlling where they go and what is done with them." He went on to note that dump utilities are also not user friendly: "diskdump (and even more so netdump) are useful in the hands of a developer crashing their own box just like kgdb, but not in the the normal and rational end user response of 'its broken, hit reset'". Linus heartily agreed, and suggested that anyone willing to use kernel dumps would be better off debugging through a firewire connection: "if you've ever picked through a kernel dump after-the-fact, I just bet you could have done equally well with firewire, and it would have had _zero_ impact on your kernel image. Now, contrast that with kdump, and ask yourself: which one do you think is worth concentrating effort on?"
From: Chris Newport [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 12:53:29 +0100

Ingo Molnar wrote:

>A BUG_ON() has a (much) lower likelyhood of being reported back - for
>most users it is a "X just hung hard, there was nothing in the syslog, i
>had to switch back to the older kernel" experience, and they do not have
>a serial console to hook up (newer hardware often doesnt even have a
>serial port). With the WARN_ON()s we have a _chance_ that despite the
>seriousness of the bug, the message makes it to the syslog, until the
>system comes to a screeching halt due to side-effects of the bug.
>
>in that sense i am part of the problem: i was adding WARN_ON()s that
>werent true 'warnings' but 'bugs'. So i'd very much like to fix that
>problem, but i'd also like to solve the (very serious and existing)
>problem of BUG_ON()s making it less likely to get bugs reported back.

There is a fundamental problem in getting a decent log to debug a crashed kernel.

Maybe we should take a hint from Solaris.

If the kernel crashes Solaris dumps core to swap and sets a flag. At the next boot this image is copied to /var/adm/crashdump where it is preserved for future debugging. Obviously swap needs to be larger than core, but this is usually the case.

On Sun machines this is fairly easy because the dump can be performed by the OBP, on other architectures it may be more difficult to still have enough working kernel to achieve the dump after a kernel panic.

Just a thought .......
From: Linus Torvalds [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 09:45:28 -0700 (PDT)

On Fri, 25 May 2007, Chris Newport wrote:
>
> Maybe we should take a hint from Solaris.

No. Solaris is shit. They make their decisions based on "we control the hardware" kind of setup.

> If the kernel crashes Solaris dumps core to swap and sets a flag.
> At the next boot this image is copied to /var/adm/crashdump where
> it is preserved for future debugging. Obviously swap needs to be
> larger than core, but this is usually the case.

(a) it's not necessarily the case at all on many systems

(b) _most_ crashes that are real BUG()'s (rather than WARN_ON()'s) leave the system in such a fragile state that trying to write to disk is the _last_ thing you should do.

Linux does the right thing: it tries to not make bugs fatal. Generally, you should see an oops, and things continue. Or a WARN_ON(), and things continue. But you should avoid the "the machine is now dead" cases.

(c) have you looked at the size of drivers lately? I'd argue that *most* bugs by far happen in something driver-related, and most of our source code is likely drivers. Writing to disk when the biggest problem is a driver to begin with is INSANE.

So the fact is, Solaris is crap, and to a large degree Solaris is crap exactly _because_ it assumes that it runs in a "controlled environment".

Yes, in a controlled environment, dumping the whole memory image to disk may be the right thing to do. BUT: in a controlled environment, you'll never get the kind of usage that Linux gets. Why do you think Linux (and Windows, for that matter) took away a lot of the market from traditional UNIX?

Answer: the traditional UNIX hardware/control model doesn't _work_. People want more flexibility, both on a hardware side and on a usage side. And once you have the flexibility, the "dump everything to disk" is simply not an option any more.

Disk dumps etc are options at things like wall street. But look at the bug reports, and ask yourself how many of them happen at Wall Street, and how many of them would even be _relevant_ to somebody there?

So forget about it. The whole model is totally broken. We need to make bug-reports short and sweet, enough so that random people can copy-and-paste them into an email or take a digital photo. Anything else IS TOTALLY INSANE AND USELESS!

Linus
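The precondition argued over above, that swap must be larger than RAM for a dump-to-swap scheme to work at all, is easy to check on a given Linux machine. The following is only a rough sketch: it assumes the MemTotal and SwapTotal fields of /proc/meminfo (reported in kB), and the function names are invented for this illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_can_hold_ram(fields):
    """True if total swap is strictly larger than total RAM."""
    return fields.get("SwapTotal", 0) > fields.get("MemTotal", 0)


def report(path="/proc/meminfo"):
    """Print whether a full memory image would fit into swap on this box."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print("MemTotal: %d kB, SwapTotal: %d kB"
          % (info.get("MemTotal", 0), info.get("SwapTotal", 0)))
    print("swap larger than RAM:", swap_can_hold_ram(info))
```

Calling report() on a typical desktop will often print False, which is point (a) in the reply above.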
From: Alan Cox [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 18:03:04 +0100

> Disk dumps etc are options at things like wall street. But look at the bug
> reports, and ask yourself how many of them happen at Wall Street, and how
> many of them would even be _relevant_ to somebody there?

There is an additional factor - dumps contain data which variously is - copyright third parties, protected by privacy laws, just personally private, security sensitive (eg browser history) and so on. The only reasons you can get dumps back in the hands of vendors is because there are strong formal agreements controlling where they go and what is done with them.

Diskdump (and even more so netdump) are useful in the hands of a developer crashing their own box just like kgdb, but not in the the normal and rational end user response of "its broken, hit reset"

Alan
From: Linus Torvalds [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 10:19:52 -0700 (PDT)

On Fri, 25 May 2007, Alan Cox wrote:
>
> There is an additional factor - dumps contain data which variously is -
> copyright third parties, protected by privacy laws, just personally
> private, security sensitive (eg browser history) and so on.

Yes.

I'm sure we've had one or two crashdumps over the years that have actually clarified a bug.

But I seriously doubt it is more than a handful.

> Diskdump (and even more so netdump) are useful in the hands of a
> developer crashing their own box just like kgdb, but not in the the
> normal and rational end user response of "its broken, hit reset"

Amen, brother.

Even for developers, I suspect a _lot_ of people end up doing "ok, let's bisect this" or some other method to narrow it down to a specific case, and then staring at the source code once they get to that point.

At least I hope so. Even in user space, you should generally use gdb to get a traceback and perhaps variable information, and then go look at the source code.

Yes, dumps can (in theory) be useful for one-off issues, but I doubt many people have ever been able to get anything much more out of them than from a kernel "oops" message.

For developers, I can heartily recommend the firewire-based remote debug facilities that the PowerPC people use. I've used it once or twice, and it is fairly simple and much better than a full dump (and it works even when the CPU is totally locked up, which is the best reason for using it).

But 99% of the time, the problem doesn't happen on a developer machine, and even if it does, 90% of the time you really just want the traceback and register info that you get out of an oops.

Linus
From: Andrew Morton [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 10:37:14 -0700

On Fri, 25 May 2007 10:19:52 -0700 (PDT) Linus Torvalds [email blocked] wrote:

> On Fri, 25 May 2007, Alan Cox wrote:
> >
> > There is an additional factor - dumps contain data which variously is -
> > copyright third parties, protected by privacy laws, just personally
> > private, security sensitive (eg browser history) and so on.
>
> Yes.

We're uninterested in pagecache and user memory and they should be omitted from the image (making it enormously smaller too).

That leaves security keys and perhaps filenames, and these could probably be addressed.

> I'm sure we've had one or two crashdumps over the years that have actually
> clarified a bug.
>
> But I seriously doubt it is more than a handful.

We've had a few more than that, but all the ones I recall actually came from the kdump developers who were hitting other bugs and who just happened to know how to drive the thing.

> > Diskdump (and even more so netdump) are useful in the hands of a
> > developer crashing their own box just like kgdb, but not in the the
> > normal and rational end user response of "its broken, hit reset"
>
> Amen, brother.
>
> Even for developers, I suspect a _lot_ of people end up doing "ok, let's
> bisect this" or some other method to narrow it down to a specific case,
> and then staring at the source code once they get to that point.
>
> At least I hope so. Even in user space, you should generally use gdb to
> get a traceback and perhaps variable information, and then go look at the
> source code.
>
> Yes, dumps can (in theory) be useful for one-off issues, but I doubt many
> people have ever been able to get anything much more out of them than from
> a kernel "oops" message.
>
> For developers, I can heartily recommend the firewire-based remote debug
> facilities that the PowerPC people use. I've used it once or twice, and it
> is fairly simple and much better than a full dump (and it works even when
> the CPU is totally locked up, which is the best reason for using it).
>
> But 99% of the time, the problem doesn't happen on a developer machine,
> and even if it does, 90% of the time you really just want the traceback
> and register info that you get out of an oops.

Often we don't even get that: "I was in X and it didn't hit the logs".

You can learn a hell of a lot by really carefully picking through kernel memory with gdb.
From: Linus Torvalds [email blocked]
Subject: Re: [2/3] 2.6.22-rc2: known regressions v2
Date: Fri, 25 May 2007 10:50:38 -0700 (PDT)

On Fri, 25 May 2007, Andrew Morton wrote:
> > >
> > > There is an additional factor - dumps contain data which variously is -
> > > copyright third parties, protected by privacy laws, just personally
> > > private, security sensitive (eg browser history) and so on.
> >
> > Yes.
>
> We're uninterested in pagecache and user memory and they should be omitted
> from the image (making it enormously smaller too).

The people who would use crash-dumps (big sensitive firms) don't trust you.

And they'd be right not to trust you. You end up having a _lot_ of sensitive data even if you avoid user memory and page cache. The network buffers, the dentries, and just stale data that hasn't been overwritten.

So if you end up having secure data on that machine, you should *never* send a dump to somebody you don't trust. For the financial companies (which are practically the only ones that would use dumps) there can even be legal reasons why they cannot do that!

> That leaves security keys and perhaps filenames, and these could probably
> be addressed.

It leaves almost every single kernel allocation, and no, it cannot be addressed. How are you going to clear out the network packets that you have in memory? They're just kmalloc'ed.

> > I'm sure we've had one or two crashdumps over the years that have actually
> > clarified a bug.
> >
> > But I seriously doubt it is more than a handful.
>
> We've had a few more than that, but all the ones I recall actually came
> from the kdump developers who were hitting other bugs and who just happened
> to know how to drive the thing.

Right, I don't dispute that some _developers_ might use dumping. I dispute that any user would practically ever use it.

And even for developers, I suspect it's _so_ far down the list of things you do, that it's practically zero.

> > But 99% of the time, the problem doesn't happen on a developer machine,
> > and even if it does, 90% of the time you really just want the traceback
> > and register info that you get out of an oops.
>
> Often we don't even get that: "I was in X and it didn't hit the logs".

Yes.

> You can learn a hell of a lot by really carefully picking through kernel
> memory with gdb.

.. but you can learn equally much with other methods that do *not* involve the pain and suffering that is a kernel dump.

Setting up netconsole or the firewire tools is much easier. The firewire thing in particular is nice, because it doesn't actually rely on the target having to even know about it (other than enabling the "remote DMA access" thing once on bootup).

If you've ever picked through a kernel dump after-the-fact, I just bet you could have done equally well with firewire, and it would have had _zero_ impact on your kernel image. Now, contrast that with kdump, and ask yourself: which one do you think is worth concentrating effort on?

- kdump: lots of code and maintenance effort, doesn't work if the CPU locks up, requires a lot of learning to go through the dump.

- firewire: zero code, no maintenance effort, works even if the CPU locks up. Still does require the same learning to go through the end result.

Which one wins? I know which one I'll push.

Linus
What are the cool kids using?
I recently had to debug a problem with a homemade driver when upgrading from CentOS 4 to CentOS 5. I noticed that the netdump facility had gone away and was replaced with kdump, so I used it. Linus is right, though, in that I really only wanted the textual oops information. A backtrace is enough, and code inspection from there.
So...if not through kdump, how am I supposed to get that backtrace? Most of the oops scrolls off-screen quickly, and then I believe the system totally halts, so you can't scroll back. Serial console? This FireWire method (is that ppc-only? I'm using x86_64)?
ahh, netconsole
On a second read-through, I saw the reference to netconsole. There seems to be documentation in Documentation/networking/netconsole.txt. It looks easy to set up, and good enough for me.
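Netconsole sends kernel log messages as UDP packets, so the receiving side can be very small. The sketch below is only an illustration and not a tool from the thread: the function names are invented, and port 6666 is an assumption that has to match whatever target port is configured on the machine being debugged (see Documentation/networking/netconsole.txt for the actual parameters).

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket that netconsole packets can be sent to.

    The port is an assumption; use the target port set on the sender.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams, stopping early on timeout."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are plain text; tolerate stray bytes.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


def log_forever(path="netconsole.log", port=6666):
    """Append every received message to a file until interrupted."""
    sock = open_listener(port=port)
    with open(path, "a") as log:
        while True:
            for msg in receive_messages(sock, 1, timeout=60.0):
                log.write(msg)
                log.flush()
```

Running log_forever() on a second machine should be enough to keep the oops text that would otherwise scroll off the screen, provided the network driver on the crashing box still works.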
Firescope
I used Firescope the other day to get the dmesg of a crashed machine. Both the crashed machine and the one used for retrieving the dmesg were x86. Get the tarball from ftp://ftp.firstfloor.org/pub/ak/firescope/ and read the QUICKSTART inside.
Crashed OS writing to stable storage....
Linus is exactly right: this is only useful in a perfect world, and there, there are no kernel panics.
I just think it's a horrible idea for a crashed kernel, which is by nature unstable, to write to permanent storage. It's like begging someone to dd if=/dev/random of=/dev/sda.
You miss one important fact
You miss one important fact - this 'horrible idea' works. Pretty much all operating systems support it, many have it enabled by default, and it works without damaging disk contents.
Oh, so Linus thinks not only
Oh, so Linus thinks not only that debuggers are evil, but that all debugging facilities other than printk are evil as well?
Linus prefers debugging
Linus prefers debugging solutions that will actually work when needed and have no "bad" side effects. The firewire debugging solution will work even if the CPU has hung! And that support is only a few hundred lines of kernel code. Simple and elegant with easily defined side effects, the way more code should be written.
Writing to disk is an unwise and unnecessary feature for a kernel debugger when there are many safer ways to debug. Just because other OSes can do it does not make the idea sound.
True, debugging via firewire
True, debugging via firewire is a very good thing. However, it has one _big_ problem: it requires a way to reproduce the problem, which is not a good thing if the problem occurs randomly. Crashdumps solve that.
Now, as for "just because others do that" - I didn't say crashdumps are ok because other systems do that. I said crashdumps are ok because in other systems they work correctly, are very useful and don't cause any problems which, according to Linus' theory, they should. That means that this theory is just void.
Now, in which way can you
Now, what information can you dump in a crashdump that is not accessible in memory?
The Firewire solution can recover all available memory, and thus everything a crashdump can, including stack traces and registers. There is absolutely no advantage to the crashdump.
That they mostly work correctly on other systems... There's no doubt that they can cost you real data by writing to a filesystem that could easily have been partially corrupted by the crash, and/or by writing when the system is severely unstable (Maybe even from HDD controller defects).
'Live' debugging via
'Live' debugging via Firewire and crashdumps serve different purposes. Firewire is useless if you cannot reproduce the bug or don't have another machine configured for this purpose.
As for damaging data - no offence, but which part of "experience shows that it does not happen" do you not understand? If theory says something else, then the theory is wrong.
The problem with Linus' theory is that he assumes that a significant part of the kernel has to work for a dump to be effective, and that is not true. Take a look at the implementation in FreeBSD, for example - you have a dumpsys() routine, which reads memory and calls a special routine in the disk device driver - ATA, SCSI HBA driver, whatever is registered for dumps - to write it to disk beginning at a specific sector. For this to work you don't need a working filesystem or a working GEOM, and you may not even need interrupts and DMA, if the routine works in polling mode.
Of course, if you want to implement dumping via network or something then this gets complicated - but that's probably the _least_ important consequence of Linux breaking with the KISS principle.
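The shape of the polled dump path described in the comment above can be shown with a small simulation. This is a sketch of the idea only and not FreeBSD's code: PolledDisk, dump_write and dump_memory are hypothetical names, and the "disk" is just a byte array standing in for a reserved dump area.

```python
SECTOR_SIZE = 512


class PolledDisk:
    """Hypothetical stand-in for a disk driver's dump entry point."""

    def __init__(self, num_sectors):
        self.sectors = bytearray(num_sectors * SECTOR_SIZE)

    def dump_write(self, sector, data):
        """Synchronous raw write: no filesystem, no interrupts, no queue."""
        if len(data) % SECTOR_SIZE:
            raise ValueError("data must be a whole number of sectors")
        start = sector * SECTOR_SIZE
        if start + len(data) > len(self.sectors):
            raise ValueError("write past end of dump area")
        self.sectors[start:start + len(data)] = data


def dump_memory(memory, disk, start_sector, chunk_sectors=8):
    """Copy `memory` to `disk` in fixed-size chunks from `start_sector`.

    Returns the number of sectors written.
    """
    chunk = chunk_sectors * SECTOR_SIZE
    sector = start_sector
    for offset in range(0, len(memory), chunk):
        block = bytes(memory[offset:offset + chunk])
        # Pad the final block so every write is sector-aligned.
        block += b"\0" * (-len(block) % SECTOR_SIZE)
        disk.dump_write(sector, block)
        sector += len(block) // SECTOR_SIZE
    return sector - start_sector
```

The point of the structure is that the dump loop depends on a single raw write routine and a starting sector, which is the commenter's argument for why very little of the kernel needs to be healthy.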
"Solaris is shit"
I *really* don't like it when people make these kinds of generalized statements, and when Linus Torvalds himself makes one, it casts Linux as a hole in bad light.
Maybe it's only because he has learnt English from teenage IRC chat?
The Hole of Linux!
Please don't cast Linux into a hole, we need it!
But seriously, his attitude is not a problem for the Linux community. He's always been like this and Linux just keeps getting stronger. Go write a blog entry about why Solaris isn't shit - it might make you feel better and I bet Linus won't mind.
-sam
Debugging 101
The first truism in debugging is that you need the information needed to identify the problem and nothing more. Anything in addition to that is simply going to confuse the issue. It just isn't going to be the same minimal information in each case.
The second truism in debugging is that where the program fails is NOT necessarily where the bug has occurred. Non-fatal bugs and cumulative bugs are by far the most common of all bug types, and an oops won't help you there.
The existing methods of getting kernel state information are probably adequate for fatal bugs. For non-fatal and cumulative bugs, it would seem better to add more optional sanity checks and other state validation. The more often bugs can be detected and fixed before they become a problem, the less often users will ever need to worry about crashes and therefore the less often anyone will need to concern themselves over what Linux does or does not provide for such cases.
How early at boot stage can we do crash dump?
Hi
I have a question regarding crash dumps. It is possible to do a disk dump or net dump after the kernel has booted successfully. My question is: is it possible to move the crash dump/disk dump to a very early stage of booting, loading only the disk driver from the initrd image (assuming the disk driver is bug-free!)? In other words, how early in the boot process can we collect a crash dump after a panic? What kernel subsystems must be initialized before trying to do a crash dump?
Thanks.