Series includes the following patches: x86/amd-iommu: enable iommu before attaching devices x86/amd-iommu: warn when issuing command to uninitialized cmd buffer Revert "x86: disable IOMMUs on kernel crash" x86/amd-iommu: use for_each_pci_dev First one is the primary bug fix. v2 - add disable_iommus on failure path - move removal of CMD_BUFFER_UNINITIALIZED to iommu_enable_command_buffer() - fix ";;" typo in patch 2 - add Revert "x86: disable IOMMUs on kernel crash" - add x86/amd-iommu: use for_each_pci_dev thanks, -chris --
This effectively reverts commit 61d047be99757fd9b0af900d7abce9a13a337488. Disabling the IOMMU can potetially allow DMA transactions to complete without being translated. Leave it enabled, and allow crash kernel to do the IOMMU reinitialization properly. Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> --- arch/x86/kernel/crash.c | 6 ------ 1 file changed, 6 deletions(-) --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -27,7 +27,6 @@ #include <asm/cpu.h> #include <asm/reboot.h> #include <asm/virtext.h> -#include <asm/x86_init.h> #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC) @@ -103,10 +102,5 @@ void native_machine_crash_shutdown(struc #ifdef CONFIG_HPET_TIMER hpet_disable(); #endif - -#ifdef CONFIG_X86_64 - x86_platform.iommu_shutdown(); -#endif - crash_save_cpu(regs, safe_smp_processor_id()); } --
Hmm, I think for this we need to change the gart code too and disable the gart before its initialization runs to not re-introduce issues fixed in commit bc2cea6a34fdb30f118ec75db39a46a191870607, no? Joerg --
That is a different code path with a different set of assumptions and restrictions. On a normal kexec of course we want to do an orderly shutdown. For the gart with a little luck we can just ignore it on kexec on panic. Unlike a virtualization capable iommu it doesn't prevent access to devices, when it is enabled. Worst case is that we have to start including iommu=off for gart systems. The best case is that we can figure out how to have the gart code reinitialize itself sanely, starting from some arbitrary point. machine_crash_shutdown is about doing those things that we can not do in any other way. Eric --
Thats another problem with this patch. It introduces a difference No no no. This is a maintenance nightmare for almost everybody. Where do you want to Document this special cases that 'if kernel uses gart then and only then boot the kexec kernel with iommu=off'. Always passing iommu=off to the kexec kernel doesn't work too for Yes, that is missing in this patch. But to keep changes small and don't bother with the gart code at all I suggest to remove the shutdown routine from the amd-iommu code only and not the whole shutdown call in the machine_crash_shutdown path. Joerg --
Of course there is. kexec on panic does no shutdown, and should not. That is fundamental. That is documented. This make them symmetric crap is a bad idea, because we can not reliably do it. 999 times out of 1000 we can actually handle everything we need to in the kdump kernel in the initialization path and when Now that I look at that commit again I am stunned. That commit already says it is doing things wrong. I agree. I would like to see something better, but the situation with the current set of patches is workable. Getting autodetection in there an automatically doing the right thing would be much better. Which will only encourage further abuse of that code path. The reliability has been continually decreasing lately and I believe this is one of those reasons. The hpet junk in there appears to be an even bigger reason. I have machines right now that can not reliably crash dump because someone tried papering over problems by putting code in the machine_crash_shutdown path which must have worked for their test cast but does not work for real world failures, and right now I am very grumpy about it all. I guess what I am saying is that I do not believe shutting down the gart in machine_crash_shutdown actually works. It is definitely not the right place to do that work. So I don't see why we should keep any of that code there. Eric --
I guess I should add the place where I have serious test results for with a gart is on 2.6.29. Where there isn't that iommu shutdown and I don't have problems with it. So my experience is that touching the gart in machine_crash_shutdown is unnecessary. Eric --
Another problem: This also breaks if the kdump kernel has no iommu-support. Joerg --
Not a problem. We require a lot of things of the kdump kernel, and it is immediately apparent in a basic sanity test. Eric --
Only if the sanity test is done on an iommu machine which I don't want to rely on. Joerg --
That makes no sense. The requirements on the kdump kernel has always been that it somehow figure out to recover a machine that is in a random hardware state. That requires drivers for the hardware, that is critical to the machines operation. The easy test for sysadmins is to do: echo > /proc/sysrq-trigger Anyone who thinks the result from one piece of hardware applies to another is deluded. We have been down the path of doing lots of things in the crashing kernel with lkcd, in practice it was worthless in the event of real world crashes. kexec on panic isn't perfect but it at least is an architecture that works often enough to be usable. It does require testing to make certain the basic code paths don't regress, but even so it is a lot easier to maintain and keep useful than any alternative I know of. Eric --
Also, in most cases (for example: distribution kernels), the kdump kernel nowadays is identical to the running kernel. So, if the running kernel has IOMMU support, the kdump kernel also has. Regards, Bernhard --
I have been rather pleasantly surprised at how well that works. Although I am still glad I insisted that using the same kernel version not be a hard requirement. It allowed us to specify a good solid interface. How to cope with GART in that situation without having to manually specify iommu=off in the configuration I find a more compelling question. I expect I will have a look at that in the coming months if someone doesn't get there before I do. Eric --
Yes, I know. But is that a requirement for kexec? If not we still potentially have this problem. It is a smaller problem than data-corruption caused by in-flight dma because most people^Wdistributions indeed use the same kernel for normal boot and kexec, thats true. Joerg --
For normal kexec no. That path is expected to do a clean hardware shutdown. For kexec on panic aka kdump the requirement is that your your crash kernel be able to initialize your hardware from any state it can be put in. Currently we have no kernels that properly recover and amd-iommu and There are exceptions made for practical reality. The fact that typically the kdump kernel is the same as the running kernel mean that we don't even need to consider one of those exceptions. Eric --
Ok, if you show me where this is documented for everybody then I am probably convinced :-) We should fixup the gart initialization anyway. Joerg --
Hi Joerg,
Going through the old mail thread, I think the commit you pointed to was
primarily introduced to solve kexec + GART issue and not necessarily kdump
issue.
In fact disabling IOMMU patch was introduced by you.
Author: Joerg Roedel <joerg.roedel@amd.com>
Date: Tue Jun 9 17:56:09 2009 +0200
x86: disable IOMMUs on kernel crash
If the IOMMUs are still enabled when the kexec kernel boots access to
the disk is not possible. This is bad for tools like kdump or anything
else which wants to use PCI devices.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
I am assuming you introduced this patch because you faced issues with
amd-iommu and not GART.
So basically GART should have been working with kdump even before you
introduced disabling iommu patch in kdump path.
Thanks
Vivek
--
Looking at following commit, we were still not shutting down GART and
fixing issues like second kernel accessing the GART aperture set by first
kernel.
commit aaf230424204864e2833dcc1da23e2cb0b9f39cd
Author: Yinghai Lu <Yinghai.Lu@Sun.COM>
Date: Wed Jan 30 13:33:09 2008 +0100
x86: disable the GART early, 64-bit
For K8 system: 4G RAM with memory hole remapping enabled, or more than
4G RAM installed.
So I guess it should be fine to not shutdown GART in crashing kernel and
then look at the fresh issues which crop up and figure out how to fix
those.
Vivek
--
not sure if it is related: for crashing kernel, it could do early_memtest to check if some device are still do dma operation. When I use kexec to start second kernel, if enable the early_memtest in second kernel, it will find some pages RAM are BAD, and it will mark them and not use them. memtest=1 should be good enough. Fresh restart will not report there is any BAD ram in the same system. YH --
Devices doing DMA in general are not a problem in the kdump kernel because we are using an area of memory that has been reserved since the beginning of time and no DMA's should be targeting it. The challenge I assume you are not talking kdump here. On-going DMA in the case of kexec indicates some device driver isn't shutting itself down when it's shutdown method is called. Odds are it is a network controller that doesn't stop DMA when it is brought down or it is, possibly a really weird disk driver. If you are seeing this with the kdump kernel this may indeed indicate an IOMMU reinitialization problem. Eric --
The kdump kernel has to support the hardware it's running on. Film at 11. -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation --
Replace open coded version with for_each_pci_dev
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
arch/x86/kernel/amd_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -2187,7 +2187,7 @@ static void prealloc_protection_domains(
struct dma_ops_domain *dma_dom;
u16 devid;
- while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+ for_each_pci_dev(dev) {
/* Do we handle this device? */
if (!check_device(&dev->dev))
--
To catch future potential issues we can add a warning whenever we issue
a command before the command buffer is fully initialized.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
Or not...this is just if you think it's useful ;-)
arch/x86/include/asm/amd_iommu_types.h | 1 +
arch/x86/kernel/amd_iommu.c | 1 +
arch/x86/kernel/amd_iommu_init.c | 5 +++--
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/amd_iommu_types.h
+++ b/arch/x86/include/asm/amd_iommu_types.h
@@ -140,6 +140,7 @@
/* constants to configure the command buffer */
#define CMD_BUFFER_SIZE 8192
+#define CMD_BUFFER_UNINITIALIZED 1
#define CMD_BUFFER_ENTRIES 512
#define MMIO_CMD_SIZE_SHIFT 56
#define MMIO_CMD_SIZE_512 (0x9ULL << MMIO_CMD_SIZE_SHIFT)
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -392,6 +392,7 @@ static int __iommu_queue_command(struct
u32 tail, head;
u8 *target;
+ WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
target = iommu->cmd_buf + tail;
memcpy_toio(target, cmd, sizeof(*cmd));
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -436,7 +436,7 @@ static u8 * __init alloc_command_buffer(
if (cmd_buf == NULL)
return NULL;
- iommu->cmd_buf_size = CMD_BUFFER_SIZE;
+ iommu->cmd_buf_size = CMD_BUFFER_SIZE | CMD_BUFFER_UNINITIALIZED;
return cmd_buf;
}
@@ -472,12 +472,13 @@ static void iommu_enable_command_buffer(
&entry, sizeof(entry));
amd_iommu_reset_cmd_buffer(iommu);
+ iommu->cmd_buf_size &= ~(CMD_BUFFER_UNINITIALIZED);
}
static void __init free_command_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->cmd_buf,
- get_order(iommu->cmd_buf_size));
+ get_order(iommu->cmd_buf_size & ~(CMD_BUFFER_UNINITIALIZED)));
}
/* allocates the memory where the IOMMU will log its events to */
--
Hit another kdump problem as reported by Neil Horman. When initializaing the IOMMU, we attach devices to their domains before the IOMMU is fully (re)initialized. Attaching a device will issue some important invalidations. In the context of the newly kexec'd kdump kernel, the IOMMU may have stale cached data from the original kernel. Because we do the attach too early, the invalidation commands are placed in the new command buffer before the IOMMU is updated w/ that buffer. This leaves the stale entries in the kdump context and can renders device unusable. Simply enable the IOMMU before we do the attach. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> --- arch/x86/kernel/amd_iommu_init.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/amd_iommu_init.c +++ b/arch/x86/kernel/amd_iommu_init.c @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) if (ret) goto free; + enable_iommus(); + if (iommu_pass_through) ret = amd_iommu_init_passthrough(); else @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) amd_iommu_init_notifier(); - enable_iommus(); - if (iommu_pass_through) goto out; @@ -1315,6 +1315,7 @@ out: return ret; free: + disable_iommus(); amd_iommu_uninit_devices(); --
