Alright so that should have no hardware effect then.
Well, I think this is slightly on the wrong track. The original patch description
said we get as far as purgatory. Purgatory is the bit of C code from
/sbin/kexec that runs just before our second kernel. It sets up
arguments and verifies that the target kernel has a valid sha256sum. If
purgatory detects data corruption it spins, to prevent a corrupt
recovery kernel from doing something nasty.
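
To make the verify-then-spin behaviour concrete, here is a rough user-space
sketch of the shape of that check. It is not the purgatory source: the
struct segment layout and all function names are made up for illustration,
and a toy FNV-1a hash stands in for SHA-256 purely to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded region of the second kernel. */
struct segment {
    const uint8_t *start;
    size_t len;
};

/* Toy FNV-1a hash over all segments. ASSUMPTION: this stands in for
 * SHA-256 only to keep the sketch short; it is not what purgatory uses. */
uint64_t toy_digest(const struct segment *segs, size_t nsegs)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV-1a 64-bit offset basis */

    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs; i++) {
        for (size_t j = 0; j < segs[i].len; j++) {
            h ^= segs[i].start[j];
            h *= 1099511628211ULL;           /* FNV-1a 64-bit prime */
        }
    }
    return h;
}

/* Returns 0 if the image still matches the digest recorded at load time,
 * -1 if anything in it has changed. */
int verify_image(const struct segment *segs, size_t nsegs, uint64_t expected)
{
    return toy_digest(segs, nsegs) == expected ? 0 : -1;
}

/* Shape of the entry path: on a mismatch, spin forever rather than jump
 * into a possibly corrupt kernel; otherwise hand off. enter_kernel is a
 * hypothetical callback standing in for the real jump. */
void purgatory_sketch(const struct segment *segs, size_t nsegs,
                      uint64_t expected, void (*enter_kernel)(void))
{
    if (verify_image(segs, nsegs, expected) != 0)
        for (;;)
            ;                                /* corrupt image: stay here */
    enter_kernel();
}
```

The point is that a hang at this stage looks exactly like a checksum
failure would, which is why getting as far as purgatory says the primary
path did its job.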
It appears that the primary crash_kexec path is working fine.
The original description speculated that we had non-stopped cpus that
were telling the hardware to shut off.
I don't see what the hang is. However, the goal apparently is to make
the kexec on panic path more robust so that we can take crash dumps in
stranger cases.
We can get NMIs from the NMI watchdog, so it is possible this happens
on legitimate hardware. That means there is a chance this is
deterministic and that we can get enough information to debug and fix
the original error.
If part of the problem is getting to crash_kexec, my inclination is to
move the call to crash_kexec up as early as possible in die_nmi, as
we may simply be hanging in printk or something stupid like that.
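
A rough sketch of the ordering I have in mind, written as a user-space mock
so it can be compiled and run. The mock_* functions are stand-ins I made up
that only record their names, and the order of the later steps is
illustrative, not a copy of the real die_nmi body.

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLS 8

/* Records the order in which the mocked steps run. */
const char *call_log[MAX_CALLS];
int ncalls;

static void record(const char *name)
{
    assert(ncalls < MAX_CALLS);
    call_log[ncalls++] = name;
}

/* ASSUMPTION: user-space stand-ins, not the kernel functions themselves. */
void mock_crash_kexec(void)   { record("crash_kexec"); }
void mock_bust_spinlocks(void) { record("bust_spinlocks"); }
void mock_printk(void)        { record("printk"); }
void mock_notify_die(void)    { record("notify_die"); }

/* Proposed shape: try the crash kernel first, before anything that can
 * take a lock or otherwise hang. As far as I understand, the real
 * crash_kexec only returns when no crash kernel is loaded, so everything
 * after it is the fallback path. */
void die_nmi_sketch(void)
{
    mock_crash_kexec();
    mock_bust_spinlocks();
    mock_printk();
    mock_notify_die();
}
```

With this ordering a hang in printk or in a notifier can no longer keep us
from reaching the crash kernel.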
It is weird that only the 32bit die_nmi path calls bust_spinlocks.
I'm not really happy with the secondary cpus taking the whole notify_die
path, as that is more general purpose infrastructure that might go
bad. However, it doesn't appear broken, and it should not be critical
to the crash dump process.
Eric
--