Linux: Preserving Oops Data Through Resets

Submitted by Jeremy
on April 10, 2006 - 7:11am

James Courtier-Dutton asked the Linux kernel mailing list whether the kernel ring buffer, the log typically viewed with the dmesg command, could be recovered after a reset. He proposed writing the ring buffer data redundantly to memory in the hope that not all RAM is erased at boot time, allowing the buffer to be reconstructed. Referring to the usual method of collecting oops data over a serial connection, James explained, "the main advantage of something like this would be for newer motherboards that are around now that don't have a serial port." An existing solution to this problem is to use kexec to boot a special lightweight kernel after a crash and collect a kernel crash dump.

The general consensus in reply to James' query was that data written to RAM before a reset will not be available afterward, though exactly how much of the RAM is overwritten was debated. Kexec author Eric Biederman explained, "clearing the memory can be done at full memory bandwidth which can happen in seconds. On systems with ECC you need initialize all of the check bits so some kind of write to memory needs to happen." He then went on to note, "in practice a reset does not clear the memory and only a few bits tend to get flipped." Andi Kleen offered an alternative solution: "define a generic interface that allows drivers to register memory storage handlers. Add a entry into the oops die and panic notifiers that saves the kernel log into these backends." As an example, Andi suggested that video drivers could make available a small portion of video card RAM, which could be used to preserve crash data across reboots.


From: James Courtier-Dutton [email blocked]
To: linux list [email blocked]
Subject: Black box flight recorder for Linux
Date:	Sat, 08 Apr 2006 12:12:56 +0100

Hi,

I have had an idea for a black box flight recorder type feature for 
Linux. Before I try to implement it, I just wish to ask here if anyone 
has already tried it, and whether the idea works or not.

Description for feature:
Stamp the dmesg output on RAM somewhere, so that after a reset (reset 
button pressed, not power off), the RAM can be read and details of 
oopses etc. can be read.

Now, the question I have is, if I write values to RAM, do any of those 
values survive a reset? If any did survive, one could use them to store 
oops output in. I am currently only interested in Intel CPU and AMD CPU 
based motherboards. If only some values survived, one could use some 
sort of redundant encoding so the good values could be recovered.

The main advantage of something like this would be for newer 
motherboards that are around now that don't have a serial port.

If no one has tried this, I will spend some time testing.

James


From: Andi Kleen [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: 08 Apr 2006 15:41:02 +0200

James Courtier-Dutton [email blocked] writes:
>
> Now, the question I have is, if I write values to RAM, do any of those
> values survive a reset?

They don't generally.

Some people used to write the oopses into video memory, but that
is not portable.

-Andi

From: Matti Aarnio <matti.aarnio@zmailer.org>
Subject: Re: Black box flight recorder for Linux
Date: Sat, 8 Apr 2006 20:30:38 +0300

On Sat, Apr 08, 2006 at 12:12:56PM +0100, James Courtier-Dutton wrote:
> Hi,
>
> I have had an idea for a black box flight recorder type feature for
> Linux. Before I try to implement it, I just wish to ask here if anyone
> has already tried it, and whether the idea works or not.
>
> Description for feature:
> Stamp the dmesg output on RAM somewhere, so that after a reset (reset
> button pressed, not power off), the RAM can be read and details of
> oopses etc. can be read.

The idea of dmesg buffer comes to Linux from SunOS 4.x series on hardware, where system boot code explicitely left aside memory space which was not _cleared_ during boot (it was parity-regenerated, though). The command to display that ring-buffer content was (no surprise there?) "dmesg".

I do wish so many things from PC hardware, but it has stayed so b***y inferior to real computers forever. Lattest AMD CPUs have nice features making them almost as good as IBM S/370 from early 1970es, but still BIOSes are rather primitive things keeping things back. ( IOMMUs are things that have been invented since, and are definitely a good thing. Otherwise it has been faster and more capacitious processing and memory at cheaper system cost... )

Like others have noted, display card memory spaces have been used for this kind of "survives over reset" uses -- I do also know some embedded boot codes that created similar ring buffers for similar reasons. They don't generally survive over power-cycling, of course.

> The main advantage of something like this would be for newer
> motherboards that are around now that don't have a serial port.
>
> If no one has tried this, I will spend some time testing.
>
> James

/Matti Aarnio

From: Robert Hancock [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: Sat, 08 Apr 2006 08:05:41 -0600

Andi Kleen wrote:
> James Courtier-Dutton [email blocked] writes:
>> Now, the question I have is, if I write values to RAM, do any of those
>> values survive a reset?
>
> They don't generally.
>
> Some people used to write the oopses into video memory, but that
> is not portable.

I wouldn't think most BIOSes these days would bother to clear system RAM on a reboot. Certainly Microsoft was encouraging vendors not to do this because it slowed down system boot time.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [email blocked]
Home Page: http://www.roberthancock.com/

From: Andi Kleen [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: Sat, 8 Apr 2006 09:17:39 +0200

On Saturday 08 April 2006 16:05, Robert Hancock wrote:
> Andi Kleen wrote:
> > James Courtier-Dutton [email blocked] writes:
> >> Now, the question I have is, if I write values to RAM, do any of those
> >> values survive a reset?
> >
> > They don't generally.
> >
> > Some people used to write the oopses into video memory, but that
> > is not portable.
>
> I wouldn't think most BIOSes these days would bother to clear system RAM
> on a reboot. Certainly Microsoft was encouraging vendors not to do this
> because it slowed down system boot time.to

Reset button is like a cold boot and it generally ends up with cleared RAM.

-Andi

From: James Courtier-Dutton [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: Sat, 08 Apr 2006 17:28:39 +0100

Andi Kleen wrote:
> On Saturday 08 April 2006 16:05, Robert Hancock wrote:
>> Andi Kleen wrote:
>>> James Courtier-Dutton [email blocked] writes:
>>>> Now, the question I have is, if I write values to RAM, do any of those
>>>> values survive a reset?
>>> They don't generally.
>>>
>>> Some people used to write the oopses into video memory, but that
>>> is not portable.
>> I wouldn't think most BIOSes these days would bother to clear system RAM
>> on a reboot. Certainly Microsoft was encouraging vendors not to do this
>> because it slowed down system boot time.to
>
> Reset button is like a cold boot and it generally ends up with cleared
> RAM.
>
> -Andi

Thank you. That saved me 30mins hacking. :-)

From: Andi Kleen [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: Sun, 9 Apr 2006 17:04:47 +0200

On Saturday 08 April 2006 18:28, James Courtier-Dutton wrote:
> Andi Kleen wrote:
> > On Saturday 08 April 2006 16:05, Robert Hancock wrote:
> >> Andi Kleen wrote:
> >>> James Courtier-Dutton [email blocked] writes:
> >>>> Now, the question I have is, if I write values to RAM, do any of those
> >>>> values survive a reset?
> >>> They don't generally.
> >>>
> >>> Some people used to write the oopses into video memory, but that
> >>> is not portable.
> >> I wouldn't think most BIOSes these days would bother to clear system RAM
> >> on a reboot. Certainly Microsoft was encouraging vendors not to do this
> >> because it slowed down system boot time.to
> >
> > Reset button is like a cold boot and it generally ends up with cleared
> > RAM.
> >
> > -Andi
>
> Thank you. That saved me 30mins hacking. :-)

Sorry for having discouraged you.

Actually there is a rare special case - triple fault - where you might be ok if the BIOS correctly supports the ACPI "bootflag" standard, but triple faults are relatively rare. They happen when the kernel screws up so badly that the CPU cannot even run exception handlers anymore. But I suspect it's too special for this.

First if you're not aware of this - the "official" way right now to solve this problem is kexec + kdump + a preloaded crash kernel. But in practice it still has many problems because a lot of drivers cannot reinitialize the hardware properly. And of course it will users need to load the crash kernel in advance and lose about 64MB of RAM.

My personal solution to the problem is firescope, but it also has its drawbacks (needs ohci1394 loaded first, requires a firewire cable)

What I would do for this if you want to hack.- define a generic interface that allows drivers to register memory storage handlers. Add a entry into the oops die and panic notifiers that saves the kernel log into these backends. Then write some Documentation file for it and add a proof of comcept e.g. to the Nvidia/ATI frame buffer drivers. Then driver writers could expose this functionality if their hardware supports it or if someone has an embedded platform that guarantees it they could also use it.

For Nvidia/ATI it might be tricky to get the X server to keep its hands off the memory, but I assume most graphic cards these days have more memory than the X server uses at least without 3d (?). If you're unlucky it will fill up everything with mozilla pixmaps over time though. In the worst case you would need to define a new interface between X server and kernel to tell the X server to leave some memory alone.

The generic driver could also do the high level work, like adding proper checksums and magic values to make sure the data is sane after reboot. You would also need another driver that allows the boot process to read that data.

Hope this helps,

-Andi
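
[Editor's illustration, not part of the thread.] The interface Andi describes, with drivers registering storage backends and a generic layer adding magic values and checksums, can be sketched in user space. This is only a rough Python model under stated assumptions: the names MemoryBackend, register_backend, save_log and recover_log, the "OOPS" magic value, and the header layout are all hypothetical, and a real implementation would be kernel C hooked into the oops and panic paths.

```python
import struct
import zlib

MAGIC = b"OOPS"                    # hypothetical magic value marking a record
HEADER = struct.Struct("<4sII")    # magic, payload length, CRC32 of payload


class MemoryBackend:
    """Stand-in for a memory region that might survive a reset."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def write(self, data):
        self.buf[:len(data)] = data

    def read(self):
        return bytes(self.buf)


_backends = []


def register_backend(backend):
    """A driver would call this to offer its storage to the generic layer."""
    _backends.append(backend)


def save_log(log):
    """Write the log to every backend, keeping only the newest bytes that fit."""
    for backend in _backends:
        room = len(backend.buf) - HEADER.size
        tail = log[-room:] if room > 0 else b""
        backend.write(HEADER.pack(MAGIC, len(tail), zlib.crc32(tail)) + tail)


def recover_log(backend):
    """After reboot: return the saved log, or None if the record is not sane."""
    raw = backend.read()
    if len(raw) < HEADER.size:
        return None
    magic, length, crc = HEADER.unpack_from(raw)
    if magic != MAGIC or length > len(raw) - HEADER.size:
        return None
    payload = raw[HEADER.size:HEADER.size + length]
    return payload if zlib.crc32(payload) == crc else None
```

A region that was cleared by the firmware fails the magic check, and a region with flipped bits fails the CRC, so in both cases the reader reports that no usable log was found instead of returning garbage.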

From: [email blocked] (Eric W. Biederman)
Subject: Re: Black box flight recorder for Linux
Date: Sun, 09 Apr 2006 13:25:42 -0600

Andi Kleen [email blocked] writes:

> On Saturday 08 April 2006 18:28, James Courtier-Dutton wrote:
>> Andi Kleen wrote:
>> > On Saturday 08 April 2006 16:05, Robert Hancock wrote:
>> >> Andi Kleen wrote:
>> >>> James Courtier-Dutton [email blocked] writes:
>> >>>> Now, the question I have is, if I write values to RAM, do any of those
>> >>>> values survive a reset?
>> >>> They don't generally.
>> >>>
>> >>> Some people used to write the oopses into video memory, but that
>> >>> is not portable.
>> >> I wouldn't think most BIOSes these days would bother to clear system RAM
>> >> on a reboot. Certainly Microsoft was encouraging vendors not to do this
>> >> because it slowed down system boot time.to
>> >
>> > Reset button is like a cold boot and it generally ends up with cleared
>> > RAM.
>> >
>> > -Andi
>>
>> Thank you. That saved me 30mins hacking. :-)

Actually clearing the memory can be done at full memory bandwidth which can happen in seconds. On systems with ECC you need initialize all of the check bits so some kind of write to memory needs to happen.

In practice a reset does not clear the memory and only a few bits tend to get flipped. Unless you can get support from the BIOS/firmware developers it isn't a practical approach though.

> First if you're not aware of this - the "official" way right now
> to solve this problem is kexec + kdump + a preloaded crash kernel. But in
> practice it still has many problems because a lot of drivers cannot
> reinitialize the hardware properly. And of course it will users need
> to load the crash kernel in advance and lose about 64MB of RAM.

Does a kernel really need 64M? That number seems insanely large to me. 8M should be more than sufficient if someone actually tried to be small. If you are aiming for a dedicated hardware solution you don't need even need a kernel in there and the amount reserved could be reduced to less than a meg.

The size etc becomes a trade off between what is expedient and maintainable.

Eric
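
[Editor's illustration, not part of the thread.] If, as Eric says, only a few bits tend to flip, the simplest form of the "redundant encoding" James asked about is to keep three copies and take a bitwise majority vote. The sketch below assumes exactly three copies stored back to back, and the function names are made up for this example. A real scheme would likely place the copies far apart in memory, and it does nothing against a region that the firmware overwrites wholesale.

```python
def encode_triple(data: bytes) -> bytes:
    """Store three identical copies of the data, one after another."""
    return data * 3


def decode_triple(stored: bytes) -> bytes:
    """Recover the data by bitwise majority vote across the three copies."""
    if len(stored) % 3 != 0:
        raise ValueError("stored length must be a multiple of 3")
    n = len(stored) // 3
    a, b, c = stored[:n], stored[n:2 * n], stored[2 * n:]
    # A result bit is 1 exactly when at least two of the three copies have it set.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
```

Any bit position that is damaged in only one of the three copies is recovered. If the same bit flips in two copies, the vote silently returns the wrong value, which is one reason to pair replication with a checksum.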

From: [email blocked]
Subject: Re: Black box flight recorder for Linux
Date: 8 Apr 2006 18:45:33 -0400

> I wouldn't think most BIOSes these days would bother to clear system RAM
> on a reboot. Certainly Microsoft was encouraging vendors not to do this
> because it slowed down system boot time.

I don't think they explicitly clear it all, but they do write to it to test how much RAM is installed and don't bother to put back what they scribbled on.

Sufficient ECC techniques sould probably recover from the damage. For a first attempt, I'd take 4096-byte pages, not use the first and last 8 bytes at all, and divide the remaining 4080 bytes into 16 interleaved 255-byte ECC segments, each using a byte-wide Reed-Solomon code. (The fraction of that 255 devoted to ECC is up to you; n-bit-wide Reed-Solomon just requires that data + ECC <= (2^n - 1) bytes of n bits each.)

For extra hack value, you could detect at boot what parts of your log got corrupted and avoid using those parts when logging new data. (There are complications...)

It is possible to update RS ECC incrementally, or perhaps it would be better to store the tail of the log in some less efficient form (like multiple replication) and then pack it into ECC when full.

The other thing that might be a problem is that I don't know how long refresh stops during reset. Again, ECC can be your friend.

(And code for it already exists in lib/reed_solomon/)
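
[Editor's illustration, not part of the thread.] The page layout proposed above works out exactly: 4096 - 2 * 8 = 4080 = 16 * 255. The sketch below shows only the interleaving arithmetic and does not implement Reed-Solomon itself. It assumes byte i of codeword s sits at offset 8 + 16 * i + s, which is one natural reading of "interleaved", and the function names are invented for this example.

```python
from typing import List

PAGE_SIZE = 4096
GUARD = 8            # unused bytes at each end of the page
SEGMENTS = 16        # interleaved codewords per page
CODEWORD_LEN = 255   # 2**8 - 1, the maximum for a byte-wide code


def page_offset(segment: int, index: int) -> int:
    """Offset within the page of byte `index` of codeword `segment`."""
    if not (0 <= segment < SEGMENTS and 0 <= index < CODEWORD_LEN):
        raise ValueError("segment or index out of range")
    return GUARD + index * SEGMENTS + segment


def split_page(page: bytes) -> List[bytes]:
    """Split a page into its 16 interleaved 255-byte codewords."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a 4096-byte page")
    payload = page[GUARD:PAGE_SIZE - GUARD]
    return [payload[s::SEGMENTS] for s in range(SEGMENTS)]


def burst_damage(start: int, length: int) -> List[int]:
    """Count how many bytes of each codeword a contiguous burst would hit."""
    counts = [0] * SEGMENTS
    for off in range(start, start + length):
        if GUARD <= off < PAGE_SIZE - GUARD:
            counts[(off - GUARD) % SEGMENTS] += 1
    return counts
```

The benefit of interleaving is visible in burst_damage: a 64-byte run scribbled by a firmware memory test lands as 4 bad bytes in each of the 16 codewords instead of 64 bad bytes in one. A Reed-Solomon code with 2t check bytes can correct up to t bad bytes per codeword, so spreading the damage keeps each codeword within reach of a modest amount of ECC.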

Curious

Anonymous (not verified)
on
April 13, 2006 - 9:55pm

Erm, perhaps I'm misunderstanding (don't read things like this with massive headaches) but wouldn't kernel crash dumps do pretty much what this guy is looking for?

If so, it would seem to be a more direct and less complicated thing to do than some of the things envisioned by some of the folks in the emails...

Writing to disk can be dicey.

Mr_Z
on
April 14, 2006 - 10:15am

From what I recall, there's extreme hesitancy towards writing to disk, especially if the crash comes from anywhere near the block drivers. Hence the discussion of kexec'ing a "crash cleanup" kernel that presumably would have a known-stable driver layer.

On the topic of initializing memory, I noticed on some BIOSes, if I enabled ECC it would take forever to boot, presumably because it was giving RAM its first scrub prior to booting. (That's how my dual Athlon MP 2600 behaved.) My newer dual Opteron system doesn't seem to do that though.

Well most serious hackers hav

Anonymous (not verified)
on
April 16, 2006 - 10:32pm

Well most serious hackers have at least 2 computers at home..
How about a "net dump"? dump to the hard disk of a second machine.
I guess via samba or nfs depending on what OS the 2nd machine is running..

I think they have that already.

Mr_Z
on
April 17, 2006 - 4:37am

At least, RedHat does. Given it's a kernel modification, I would suppose it's GPL. But, it's not in the mainline kernel.

And there's a special diskdump patch floating around that avoids the existing block driver stack.

The "kexec into a new kernel" idea has been around awhile, and the oops-to-a-floppy idea has been around in Linux circles even longer (1999).

Use for floppy drives?

Anonymous (not verified)
on
April 14, 2006 - 2:41pm

This might be a use for the good old PC floppy drive.. just write raw data out to the floppy disk. Should be simple and atomic enough to help with recording. Print oops, wait 5s, then try to write to FD.

Why can't it be written to sw

Anonymous (not verified)
on
April 19, 2006 - 4:46am

Why can't it be written to swap?
Would survive a cold boot...

The driver layer isn't reliable.

Mr_Z
on
April 19, 2006 - 9:41am

Very often, the oops is in such a place that the kernel can't rely on the driver layer being reliable. Attempting to unwedge the disk drivers in order to write to the disk could end up simply trashing the disk instead. As I mentioned above, that's why you see all this talk of rebooting into mini-kernels, writing low-performance mini-driver layers, etc.

There's a very good explanation of the challenges and tradeoffs at the RedHat page on NetDump, both on why writing to disk is dicey, and why Linux so far does not do what traditional UNIXes do (dump everything to swap) when a fault occurs.
