On Tue, Nov 27, 2007 at 10:41:15AM -0800, Ben Woodard wrote:
Ben I tend to agree. I think re-enabling the APIC early in the boot process
provides a greater degree of reliability in that it more quickly restores the
system to a state where booting on a cpu other than cpu0 will be more likely to
work, but I have to say that overall it seems like booting a secondary kernel on
cpu0, when possible offers the highest degree of reliability.
Perhaps what we need is a 'both solution'. Re-enabling the apic to full smp
functionality early in the boot process is a good solution for the problems
which we are hypothesizing here, and would be a good thing to do in general, but
it doesn't preclude also attmpting to switch back to cpu0 during a crash.
Perhaps it would be worthwhile to:
1) Investigate the early enablement of the ioapic for x86[_64]
2) implement my prevoiusly proposed patch with the addition of a handshake
element, such that:
a) when the boot cpu gets the ipi from machine_crash_shutdown it flags
the fact that it is going to boot the kexec kernel with a global
variable
b) the crashing cpu loops waiting for either:
I) a timeout of 1 second
II) a reduction of the halt count to zero
III) the setting of the flag mentioned in (a)
c) the crashing cpu, if it sees that it is not the boot cpu AND
that the flag in (III) is set, will halt itself, otherwise it
will set the flag and boot the kexec image itself.
With this modification, we can try to relocate to cpu0, and if we fail, we fall
back to booting on the crashing processor.
I'll work up a patch that implements (2), unless there are strong objections. I
see no reason why we can't implment this 'both' solution.
Regards
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*nhorman@redhat.com
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/
-