PATCH/RFC: [kdump] fix APIC shutdown sequence

This patch fixes a problem that we have encountered with kdump under high I/O load on some machines. The machines showing the errors have an Intel ICH7 chipset with a 6702PXH PCI Express-to-PCI Bridge (8086:032c) containing an IO-APIC.

The bug symptom is that certain controllers connected to the 6702PXH bridge don't receive any IRQs in the kdump kernel. In the error case (about 20% of all cases), the IRR bit of the IO-APIC pin for that controller is always set after the kdump kernel starts, indicating an IRQ in progress. Once this situation has occurred, we haven't found any way to recover from it except a system reset.

The error is caused by IRQs arriving while the APIC subsystem is being deactivated in machine_crash_shutdown(). Apparently, the IO-APIC gets stuck if it sends an IRQ message to a local APIC and never receives an EOI for that message. This can happen in several ways:

1. Under SMP, the IRQ balancing code may have set the IO-APIC logical destination field to one of the "other" CPUs (i.e. not the crashing_cpu). If an IRQ arrives on the respective pin after that CPU has shut down its local APIC, but before the IO-APIC pin is masked, the IRQ message can't be delivered.
2. The crashing CPU itself disables its local APIC before the IO-APIC, leaving a short time window in which the IO-APIC can receive IRQs but not deliver them.
3. An IRQ is received and delivered to a local APIC, but no CPU ever executes the IRQ handler, so no EOI is sent.

After a lot of failed attempts, I have come up with the following patch, which fixes the problem. The patch first masks all IO-APIC pins to avoid a situation where the IO-APIC can receive, but not deliver, IRQs. Moreover, it enables interrupts for a short period before finally starting the kdump kernel, so that EOIs can be sent to the APICs as necessary.
Notes:

a) Simply calling disable_IO_APIC() early doesn't work, probably because that also clears the IRQ vector information, so that the IO-APIC can't associate arriving EOI messages with its pins.
b) We have tried patches that avoid re-enabling interrupts, but so far without success. Re-enabling IRQs while dumping is of course dangerous, and I'd rather find a way to avoid it.
c) There are indications that, besides the EOI, the PCI IRQ pin must also be deasserted for at least a short time. That usually requires that the driver's IRQ handler is called and tells the FW that the IRQ was received. Whether or not this is a requirement hasn't been finally clarified yet.
d) The problem is only seen with the IO-APIC in the 6702PXH PCI bridge, which is the system's secondary IO-APIC. On the system's main IO-APIC, we see other IRQs (timer etc.) arrive and never get an EOI, but we see no errors.

The patch below is against 2.6.23-rc1. The problem was originally analyzed, and the patch developed, against the Red Hat EL5 kernel (2.6.18-8.el5). I verified that the problem still occurs with 2.6.23-rc1 and that the patch below fixes it.

Regards
Martin

PS: The patch is attached in MIME format because my mail relay would otherwise mangle it into quoted-printable.

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DE6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel: ++49 5251 8 15113
Fax: ++49 5251 8 20409
Email: mailto:martin.wilck@fujitsu-siemens.com
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html
