The following patches address or work around the issues we identified in the
interrupt-remapping code flow while debugging hangs and spurious NMIs seen on
different OEM platforms during kexec/kdump with interrupt-remapping enabled
(and x2apic in some cases).

All the patches are small and self-contained, and are marked for stable
because they make kexec/kdump functional on these platforms.

Although some of these patches touch pci files, they are self-contained, so I
would appreciate it if all of them were routed to Linus' tree (for v2.6.37)
through the -tip tree.

thanks,
suresh
--
On platforms with the Intel 7500 chipset, there were some reports of system
hangs/NMIs during kexec/kdump with interrupt-remapping enabled.

During kdump, there is a window where the devices might still be using the old
kernel's interrupt information while the kdump kernel is coming up. This can
cause vt-d faults, as the interrupt configuration from the old kernel maps to
null IRTE entries in the new kernel, etc. (Without interrupt-remapping enabled
we still have the same issue, but in that case we only see a benign spurious
interrupt hit the new kernel.)

Based on platform config settings, these platforms seem to generate an NMI/SMI
when a vt-d fault happens, and there were reports that the resulting SMI
causes the system to hang.

Fix it by masking vt-d spec defined errors to platform error reporting logic.
VT-d spec related errors are already handled by the VT-d OS code, so there is
no need to report the same error through other channels.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 drivers/pci/quirks.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Index: tip/drivers/pci/quirks.c
===================================================================
--- tip.orig/drivers/pci/quirks.c
+++ tip/drivers/pci/quirks.c
@@ -2764,6 +2764,26 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RI
 DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
 #endif /*CONFIG_MMC_RICOH_MMC*/

+#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
+/*
+ * This is a quirk for masking vt-d spec defined errors to platform error
+ * handling logic. With out this, platforms seem to generate NMI/SMI (based
+ * on the RAS config settings of the platform) when a vt-d fault happens and
+ * there were reports that the resulting SMI causes system to hang.
+ *
+ * VT-d spec related errors are already handled by the VT-d OS code, so no
+ * need to report the same erorr through other channels.
+ */
+static ...
Acked-by: Chris Wright <chrisw@sous-sol.org> --
On Tue, 30 Nov 2010 22:22:26 -0800 Can we make these registers and bits a bit more self-documenting (i.e. #defines for both, maybe along with other useful bit definitions for this reg)? Also, "error" is misspelled as "erorr" above. :) -- Jesse Barnes, Intel Open Source Technology Center --
Thanks for the review. Appended the updated patch. I haven't used #defines for
the pci-ids, as the first one (IOH) is used by several chipsets and the second
one is not named yet.

---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: vt-d: quirk for masking vtd spec errors to platform error handling logic

On platforms with the Intel 7500 chipset, there were some reports of system
hangs/NMIs during kexec/kdump with interrupt-remapping enabled.

During kdump, there is a window where the devices might still be using the old
kernel's interrupt information while the kdump kernel is coming up. This can
cause vt-d faults, as the interrupt configuration from the old kernel maps to
null IRTE entries in the new kernel, etc. (Without interrupt-remapping enabled
we still have the same issue, but in that case we only see a benign spurious
interrupt hit the new kernel.)

Based on platform config settings, these platforms seem to generate an NMI/SMI
when a vt-d fault happens, and there were reports that the resulting SMI
causes the system to hang.

Fix it by masking vt-d spec defined errors to platform error reporting logic.
VT-d spec related errors are already handled by the VT-d OS code, so there is
no need to report the same error through other channels.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 drivers/pci/quirks.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Index: tip/drivers/pci/quirks.c
===================================================================
--- tip.orig/drivers/pci/quirks.c
+++ tip/drivers/pci/quirks.c
@@ -2764,6 +2764,29 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RI
 DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
 #endif /*CONFIG_MMC_RICOH_MMC*/

+#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
+#define VTUNCERRMSK_REG	0x1ac
+#define VTD_MSK_SPEC_ERRORS	(1 << 31)
+/*
+ * This is a quirk for masking vt-d spec defined ...
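For readers following along, the effect of the quirk can be illustrated with a
small user-space sketch. It only models the read-modify-write on the register
named by the two #defines in the updated patch; the fake_pci_dev type and the
fake_*_config_dword() helpers are hypothetical stand-ins invented for this
example and are not the kernel's PCI accessors. The mask is written with an
unsigned constant here to avoid shifting a 1 into the sign bit of an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Register offset and mask bit, using the names from the updated patch. */
#define VTUNCERRMSK_REG     0x1ac
#define VTD_MSK_SPEC_ERRORS (UINT32_C(1) << 31)

/* Hypothetical stand-in for a device's PCI config space (not a kernel type). */
struct fake_pci_dev {
	uint8_t config[0x200];
};

/* Hypothetical helper: read one dword from the simulated config space. */
static uint32_t fake_read_config_dword(const struct fake_pci_dev *dev, size_t off)
{
	uint32_t val;

	assert(off + sizeof(val) <= sizeof(dev->config));
	memcpy(&val, &dev->config[off], sizeof(val));
	return val;
}

/* Hypothetical helper: write one dword to the simulated config space. */
static void fake_write_config_dword(struct fake_pci_dev *dev, size_t off, uint32_t val)
{
	assert(off + sizeof(val) <= sizeof(dev->config));
	memcpy(&dev->config[off], &val, sizeof(val));
}

/* Set the mask bit while preserving every other bit in the register. */
static void mask_vtd_spec_errors(struct fake_pci_dev *dev)
{
	uint32_t word = fake_read_config_dword(dev, VTUNCERRMSK_REG);

	fake_write_config_dword(dev, VTUNCERRMSK_REG, word | VTD_MSK_SPEC_ERRORS);
}
```

The OR keeps whatever the firmware already programmed into the other mask
bits, and calling the function a second time leaves the register unchanged.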
On Mon, 06 Dec 2010 12:26:30 -0800 Is there a bug # that should be referenced in the commit log? Any tested-bys to add? Thanks, -- Jesse Barnes, Intel Open Source Technology Center --
There is no kernel.org bug #, but there are multiple bugs filed with different
OSVs, so I didn't mention a bug # in the changelog.

Please add:

Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Reported-and-tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

thanks,
suresh
--
I tested the patches on a system with a Tylersburg chipset. I used the patches against the 2.6.37-rc4 kernel and tested kdump. I still see the Vt-d errors but they no longer cause NMIs. It works as expected. - Max --
Commit-ID:  254e42006c893f45bca48f313536fcba12206418
Gitweb:     http://git.kernel.org/tip/254e42006c893f45bca48f313536fcba12206418
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Mon, 6 Dec 2010 12:26:30 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Mon, 13 Dec 2010 16:51:51 -0800

x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic

On platforms with the Intel 7500 chipset, there were some reports of system
hangs/NMIs during kexec/kdump with interrupt-remapping enabled.

During kdump, there is a window where the devices might still be using the old
kernel's interrupt information while the kdump kernel is coming up. This can
cause vt-d faults, as the interrupt configuration from the old kernel maps to
null IRTE entries in the new kernel, etc. (Without interrupt-remapping enabled
we still have the same issue, but in that case we only see a benign spurious
interrupt hit the new kernel.)

Based on platform config settings, these platforms seem to generate an NMI/SMI
when a vt-d fault happens, and there were reports that the resulting SMI
causes the system to hang.

Fix it by masking vt-d spec defined errors to platform error reporting logic.
VT-d spec related errors are already handled by the VT-d OS code, so there is
no need to report the same error through other channels.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1291667190.2675.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: stable@kernel.org [v2.6.32+]
Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Reported-and-tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 drivers/pci/quirks.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6f9350c..36191ed 100644
--- ...
Fault handling is enabled after interrupt-remapping is enabled (as the success
of interrupt-remapping can affect the apic mode and hence the fault handling
mode).

Hence there can potentially be some faults in the window between enabling
interrupt-remapping in the vt-d and enabling fault handling for the vt-d
units. Handle any previous faults after enabling the vt-d fault handling.

For the v2.6.38 cleanup, we need to check whether we can remove the
dmar_fault() in enable_intr_remapping() and enable fault handling along with
enabling intr-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 drivers/pci/dmar.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: tip/drivers/pci/dmar.c
===================================================================
--- tip.orig/drivers/pci/dmar.c
+++ tip/drivers/pci/dmar.c
@@ -1417,6 +1417,11 @@ int __init enable_drhd_fault_handling(vo
 			       (unsigned long long)drhd->reg_base_addr, ret);
 			return -1;
 		}
+
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(iommu->irq, iommu);
 	}

 	return 0;
--
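The window described above can be pictured with a toy model. This is only a
sketch under stated assumptions: struct fake_iommu and the fake_*() functions
are hypothetical names made up for illustration and do not reflect the real
struct intel_iommu or dmar_fault() internals. It shows faults being latched
while no handler is installed, and one explicit handler call right after
installation draining them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one remapping unit; not the kernel's struct intel_iommu. */
struct fake_iommu {
	unsigned int pending_faults;	/* latched in hardware, not yet processed */
	unsigned int handled_faults;	/* processed by the fault handler */
	bool handler_installed;
};

/* Process and clear everything that is currently latched. */
static void fake_handle_faults(struct fake_iommu *iommu)
{
	assert(iommu);
	iommu->handled_faults += iommu->pending_faults;
	iommu->pending_faults = 0;
}

/* Model a fault: it is always latched, but only handled if a handler exists. */
static void fake_raise_fault(struct fake_iommu *iommu)
{
	assert(iommu);
	iommu->pending_faults++;
	if (iommu->handler_installed)
		fake_handle_faults(iommu);
}

/* Install the handler, then drain faults that arrived before it existed. */
static void fake_enable_fault_handling(struct fake_iommu *iommu)
{
	assert(iommu);
	iommu->handler_installed = true;
	fake_handle_faults(iommu);	/* analogous to the call the patch adds */
}
```

Without the explicit call in fake_enable_fault_handling(), the faults raised
during the window would stay latched until some later fault happened to
trigger the handler.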
Acked-by: Chris Wright <chrisw@sous-sol.org> --
Commit-ID:  7f99d946e71e71d484b7543b49e990508e70d0c0
Gitweb:     http://git.kernel.org/tip/7f99d946e71e71d484b7543b49e990508e70d0c0
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Tue, 30 Nov 2010 22:22:29 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Mon, 13 Dec 2010 16:53:57 -0800

x86, vt-d: Handle previous faults after enabling fault handling

Fault handling is getting enabled after enabling the interrupt-remapping (as
the success of interrupt-remapping can affect the apic mode and hence the
fault handling mode).

Hence there can potentially be some faults between the window of enabling
interrupt-remapping in the vt-d and the fault-handling of the vt-d units.
Handle any previous faults after enabling the vt-d fault handling.

For v2.6.38 cleanup, need to check if we can remove the dmar_fault() in the
enable_intr_remapping() and see if we can enable fault handling along with
enabling intr-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20101201062244.630417138@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 drivers/pci/dmar.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 0157708..09933eb 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -1417,6 +1417,11 @@ int __init enable_drhd_fault_handling(void)
 			       (unsigned long long)drhd->reg_base_addr, ret);
 			return -1;
 		}
+
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(iommu->irq, iommu);
 	}

 	return 0;
--
From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Subject: x86: enable the intr-remap fault handling after local apic setup

Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. The current code enables the vt-d fault
handling before setup_local_APIC(), so the APIC LDR registers and the data
structures in memory may not be initialized yet. As a result, the vt-d fault
handling in logical xapic/x2apic modes was broken.

Fix this by enabling the vt-d fault handling in end_local_APIC_setup().

A cleaner fix of enabling fault handling while enabling intr-remapping will be
addressed for v2.6.38. [ Enabling intr-remapping determines the usage of
x2apic mode and the apic mode determines the fault-handling configuration. ]

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 arch/x86/kernel/apic/apic.c     |    8 ++++++++
 arch/x86/kernel/apic/probe_64.c |    7 -------
 2 files changed, 8 insertions(+), 7 deletions(-)

Index: tip/arch/x86/kernel/apic/apic.c
===================================================================
--- tip.orig/arch/x86/kernel/apic/apic.c
+++ tip/arch/x86/kernel/apic/apic.c
@@ -1384,6 +1384,14 @@ void __cpuinit end_local_APIC_setup(void
 #endif

 	apic_pm_activate();
+
+	/*
+	 * Now that local APIC setup is completed for BP, configure the fault
+	 * handling for interrupt remapping.
+	 */
+	if (!smp_processor_id() && intr_remapping_enabled)
+		enable_drhd_fault_handling();
+
 }

 #ifdef CONFIG_X86_X2APIC
Index: tip/arch/x86/kernel/apic/probe_64.c
===================================================================
--- tip.orig/arch/x86/kernel/apic/probe_64.c
+++ tip/arch/x86/kernel/apic/probe_64.c
@@ -79,13 +79,6 @@ void __init default_setup_apic_routing(v
 		/* need to update phys_pkg_id */
 		apic->phys_pkg_id = apicid_phys_pkg_id;
 	}
-
-	/*
-	 * Now that apic ...
Acked-by: Chris Wright <chrisw@sous-sol.org> --
Commit-ID:  7f7fbf45c6b748074546f7f16b9488ca71de99c1
Gitweb:     http://git.kernel.org/tip/7f7fbf45c6b748074546f7f16b9488ca71de99c1
Author:     Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
AuthorDate: Tue, 30 Nov 2010 22:22:28 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Mon, 13 Dec 2010 16:53:32 -0800

x86: Enable the intr-remap fault handling after local APIC setup

Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.

Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()

A cleaner fix of enabling fault handling while enabling intr-remapping will be
addressed for v2.6.38. [ Enabling intr-remapping determines the usage of
x2apic mode and the apic mode determines the fault-handling configuration. ]

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/apic/apic.c     |    8 ++++++++
 arch/x86/kernel/apic/probe_64.c |    7 -------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 3f838d5..7821813 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1389,6 +1389,14 @@ void __cpuinit end_local_APIC_setup(void)
 	setup_apic_nmi_watchdog(NULL);
 	apic_pm_activate();
+
+	/*
+	 * Now that local APIC setup is completed for BP, configure the fault
+	 * handling for interrupt remapping.
+	 */
+	if (!smp_processor_id() && ...
From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Subject: x86, vtd: fix the vt-d fault handling irq migration in the x2apic mode

In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this, irq migration
of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 arch/x86/kernel/apic/io_apic.c |    2 ++
 1 file changed, 2 insertions(+)

Index: tip/arch/x86/kernel/apic/io_apic.c
===================================================================
--- tip.orig/arch/x86/kernel/apic/io_apic.c
+++ tip/arch/x86/kernel/apic/io_apic.c
@@ -3367,6 +3367,8 @@ dmar_msi_set_affinity(struct irq_data *d
 	msg.data |= MSI_DATA_VECTOR(cfg->vector);
 	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+	if (x2apic_mode)
+		msg.address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(dest);

 	dmar_msi_write(irq, &msg);

--
Looks correct, I didn't have a chance to test this patch. Acked-by: Chris Wright <chrisw@sous-sol.org> --
Is it necessary to test x2apic_mode here? It looks like MSI_ADDR_EXT_DEST_ID()
gives you everything above the low 8 bits of the APIC ID. If those bits are
always zero except in x2apic_mode, we might not need the test.

Does the ia64 dmar_msi_set_affinity() need the same fix?

Why do we have both x2apic_enabled() and x2apic_mode? They seem sort of
redundant. (Not related to this patch, of course.)

Bjorn
--
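Bjorn's point about the x2apic_mode test can be checked with a small
user-space sketch. The SKETCH_* constants, struct sketch_msi_msg, and the
sketch_*() helpers are local stand-ins written for this example, assuming the
usual x86 MSI layout with the low 8 destination bits in address bits 12..19;
they are not copied from the kernel's MSI headers, and 0xfee00000 in the usage
note is likewise an assumed base address.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration; see the kernel's MSI headers for real values. */
#define SKETCH_DEST_ID_SHIFT 12
#define SKETCH_DEST_ID_MASK  UINT32_C(0x000ff000)

/* Minimal message with only the two address words this example needs. */
struct sketch_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
};

/* The low 8 bits of the destination go into address_lo bits 12..19. */
static uint32_t sketch_dest_id_lo(uint32_t dest)
{
	return (dest << SKETCH_DEST_ID_SHIFT) & SKETCH_DEST_ID_MASK;
}

/* Everything above the low 8 bits goes into address_hi. */
static uint32_t sketch_dest_id_ext(uint32_t dest)
{
	return dest & UINT32_C(0xffffff00);
}

/* Retarget the message; address_hi is written unconditionally. */
static void sketch_set_dest(struct sketch_msi_msg *msg, uint32_t dest)
{
	assert(msg);
	msg->address_lo &= ~SKETCH_DEST_ID_MASK;
	msg->address_lo |= sketch_dest_id_lo(dest);
	msg->address_hi = sketch_dest_id_ext(dest);
}
```

Starting from address_lo = 0xfee00000, a destination of 0x12 leaves address_hi
at 0, while 0x1234 puts 0x34 in the low word and 0x1200 in the high word. For
any destination that fits in 8 bits the unconditional write stores zero, which
is why the conditional looks unnecessary.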
The BIOS can hand over to the OS in x2apic mode in some cases.
x2apic_enabled() is used to check for that; it reads the MSR to check the
status. Some early portions of the kernel boot use it. Everything else should
be using x2apic_mode.

thanks,
suresh
---
From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Subject: x86, vtd: fix the vt-d fault handling irq migration in the x2apic mode

In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this, irq migration
of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
---
 arch/x86/kernel/apic/io_apic.c |    1 +
 1 file changed, 1 insertion(+)

Index: tip/arch/x86/kernel/apic/io_apic.c
===================================================================
--- tip.orig/arch/x86/kernel/apic/io_apic.c
+++ tip/arch/x86/kernel/apic/io_apic.c
@@ -3367,6 +3367,7 @@ dmar_msi_set_affinity(struct irq_data *d
 	msg.data |= MSI_DATA_VECTOR(cfg->vector);
 	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+	msg.address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(dest);

 	dmar_msi_write(irq, &msg);

--
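The distinction between x2apic_enabled() and x2apic_mode in the reply above
can be sketched as "decode the MSR" versus "test a cached flag". This is a
rough user-space illustration, not the kernel implementation: the sketch_*
names are hypothetical, the MSR value is passed in as a plain integer because
it cannot be read from user space, and the only hardware fact assumed is that
bit 10 of IA32_APIC_BASE is the x2APIC enable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of IA32_APIC_BASE enables x2APIC mode (local name, for illustration). */
#define SKETCH_X2APIC_ENABLE (UINT64_C(1) << 10)

/* Hypothetical cached state; stands in for a kernel-wide flag like x2apic_mode. */
struct sketch_apic_state {
	bool x2apic_mode;
};

/* Early-boot style check: decode the raw MSR value every time it is called. */
static bool sketch_x2apic_enabled(uint64_t apicbase_msr)
{
	return (apicbase_msr & SKETCH_X2APIC_ENABLE) != 0;
}

/* Record the result once so later code can test the flag without an MSR read. */
static void sketch_record_mode(struct sketch_apic_state *st, uint64_t apicbase_msr)
{
	assert(st);
	st->x2apic_mode = sketch_x2apic_enabled(apicbase_msr);
}
```

In this model only the early setup path needs the raw register value; later
callers read st->x2apic_mode, which matches the usage split described above.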
I applied this patch against 2.6.36 and confirmed irq migration of vt-d fault worked. Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Thanks, --- 印藤隆夫(INDOH Takao) E-Mail : indou.takao@jp.fujitsu.com --
Commit-ID:  086e8ced65d9bcc4a8e8f1cd39b09640f2883f90
Gitweb:     http://git.kernel.org/tip/086e8ced65d9bcc4a8e8f1cd39b09640f2883f90
Author:     Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
AuthorDate: Wed, 1 Dec 2010 09:40:32 -0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Mon, 13 Dec 2010 16:52:52 -0800

x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode

In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this irq migration
of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/apic/io_apic.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 226060e..fadcd74 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3412,6 +3412,7 @@ dmar_msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	msg.data |= MSI_DATA_VECTOR(cfg->vector);
 	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
 	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+	msg.address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(dest);

 	dmar_msi_write(irq, &msg);

--
