When switching from the crashed kernel to the kdump kernel without going
through PCI reset, IRQs may not work if a different IRQ mode is used on
the kdump kernel. The original IRQ mode used in the crashed kernel may
still be enabled and the new IRQ mode may not work. For example, it
will fail when going from MSI-X mode to MSI mode.
We fix this by disabling MSI/MSI-X and enabling INTX in bnx2_init_board().
pci_save_state() is also moved to the end of bnx2_init_board() after
all config register fixups (including the new IRQ fixups) have been done.
Export pci_msi_off() from drivers/pci/pci.c for this purpose.
Update bnx2 version to 2.0.16.
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/bnx2.c | 17 ++++++++++++++---
drivers/pci/pci.c | 1 +
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 188e356..1b8ba14 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
#include "bnx2_fw.h"
#define DRV_MODULE_NAME "bnx2"
-#define DRV_MODULE_VERSION "2.0.15"
-#define DRV_MODULE_RELDATE "May 4, 2010"
+#define DRV_MODULE_VERSION "2.0.16"
+#define DRV_MODULE_RELDATE "May 28, 2010"
#define FW_MIPS_FILE_06 "bnx2/bnx2-mips-06-5.0.0.j6.fw"
#define FW_RV2P_FILE_06 "bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
#define FW_MIPS_FILE_09 "bnx2/bnx2-mips-09-5.0.0.j15.fw"
@@ -7877,7 +7877,6 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
}
pci_set_master(pdev);
- pci_save_state(pdev);
bp->pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
if (bp->pm_cap == 0) {
@@ -7953,6 +7952,16 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
bp->flags |= BNX2_FLAG_MSI_CAP;
}
+ /* When going from a crashed kernel to a kdump kernel without PCI
+ * reset, MSI/MSI-X may still be enabled. We need to disable
+ * MSI/MSI-X and enable INTX because the kdump driver may operate
+ * the device in a different IRQ mode.
+ */
+ if (bp->flags & ...From: "Michael Chan" <mchan@broadcom.com> I sincerely doubt that your's will be the only device which will ever run into this issue. Therefore handling it manually in each and every device driver, which is the trend you will be setting with this patch, doesn't make much sense. Any device which uses MSI in any way can run into this scenerio, wherein the device will be left with MSI enabled when we leave the crash kernel and jump into the kdump kernel. So this needs to be handled generically in the PCI layer or similar. I'm not applying this patch. --
Does the driver have to register a different Interrupt handler when switching from MSI(-X) to IRQ interrupts? (I'm thinking of IRQ interrupts might have DMA vs IRQ ordering dependency) If that's true, then this can't be handled in the generic PCI layer (as suggested by davem) unless the device driver could register multiple interrupt handlers even if only one is active at a time. thanks, --
From: Grant Grundler <grundler@parisc-linux.org> The generic PCI layer very well can turn off MSI on all devices when it starts up or a device is plugged in. That's all he is doing. Drivers essentially expect that the device comes up in INTX mode when the driver probes the device. All his change is doing is forcing that to be true, and there is no reason the generic PCI code can't do that. --
On Fri, 28 May 2010 20:24:22 -0700 This is probably a generic problem for many drivers. So why not get the kdump code to fix it for all. --
PCIe with AER actually does support per link root port reset (e.g. used for AER) I've been wondering for some time if kexec should not simply use that to reset all the devices, instead of addings hacks around this to all drivers. That would fix your problems too, right? The question is just if AER is widely enough supported for this. -Andi -- ak@linux.intel.com -- Speaking for myself only. --
Do you mean the slot_reset function in the pci_error_handlers? This If it is called in the context of the crashed kernel, it won't work. Some newer PCIe devices support Function Level Reset, and that would be ideal. But most existing devices including bnx2 devices don't have this feature. --
Well the fallback code in the PCIE root port driver that does the actual resets. It could be done on kexec, however of course you would rely on PCI root port data structures still being intact on a crash Root port reset should be fine for this case. Even if some innocent device on the same root port gets reset too that shouldn't matter. Only drawback for the NIC would be that you have to renegotiate links I think. Also there are systems without AER support. -Andi -- ak@linux.intel.com -- Speaking for myself only. --
The bnx2 driver like many other drivers has a slot_reset function in the pci_driver struct's err_handler. If the AER code calls this function, we would reset the chip and put it back to the same IRQ mode. Without calling this per driver reset function, I'm not sure if you can reset --
