kdump kernel fails to boot with calgary iommu and aacraid driver on a
x366 box. The ongoing dma's of aacraid from the first kernel continue
to exist until the driver is loaded in the kdump kernel. Calgary is
initialized prior to aacraid and creation of new tce tables causes wrong
dma's to occur.Here we try to grab the tce tables of the first kernel in kdump kernel
and use them. While in the kdump kernel we do not allocate new tce
tables but rather read the base address register contents of calgary
iommu and use the tables that the registers point to. With these changes
the kdump kernel and hence aacraid now boots normally.Another point that came when talking with Vivek was to reserve part of
the tce table space in first kernel for use in the kdump kernel.Signed-off-by: Chandru S <chandru@in.ibm.com>
------ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c.orig
2007-10-09 23:39:22.000000000 +0530
+++ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c 2007-10-10
01:25:53.000000000 +0530
@@ -35,6 +35,7 @@
#include <linux/pci_ids.h>
#include <linux/pci.h>
#include <linux/delay.h>
+#include <linux/bootmem.h>
#include <asm/iommu.h>
#include <asm/calgary.h>
#include <asm/tce.h>
@@ -165,6 +166,7 @@ static void calgary_dump_error_regs(stru
static void calioc2_handle_quirks(struct iommu_table *tbl, struct
pci_dev *dev);
static void calioc2_tce_cache_blast(struct iommu_table *tbl);
static void calioc2_dump_error_regs(struct iommu_table *tbl);
+static int calgary_bus_has_devices(int , unsigned short ) __init;static struct cal_chipset_ops calgary_chip_ops = {
.handle_quirks = calgary_handle_quirks,
@@ -819,7 +821,23 @@ static int __init calgary_setup_tar(strutbl = pci_iommu(dev->bus);
tbl->it_base = (unsigned
long)bus_info[dev->bus->number].tce_space;
- tce_free(tbl, 0, tbl->it_size);
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()){
+ u...
Yes, I think reserving some entries for second kernel will be important
to make sure second kernel can perform DMA while saving the kernel core.
We don't want to clear some entries in second kernel as some ongoing DMA
might be using it and we will run into issues.In general, run checkpatch.pl on the patch.
is_kdump_kernel() can go in kdump specific files and exported here.
This function can be used by some other drivers too.I think use "elfcorehdr" instead of "reset_devices" to detect the kdump
kernel. reset_devices can in general also be used for booting on a kernel
on a fauly bios which does not does a good job of resetting the devices.Thanks
Vivek
-
Hi Chandru,
Thanks for the patch. Comments on the patch below, but first a general
question for my education: the main problem here that aacraid
continues DMA'ing when it shouldn't. Why can't we shut it down
cleanly? Even without the presence of an IOMMU, it seems dangerous to
let an adapter continue DMA'ing to and from memory when the kernel is
an inconsistent state.The patch below looks reasonable *if* that is the least worst way of
doing it - let's see if we can come up with something cleaner that
doesn't rely in the new kernel on data (which may or may not be
corrupted...) from the old kernel.Cheers,
MuliPlease no #ifdefs in here. Do it like this:
if (is_kdump_kernel())
something()
else
something_else()Where is_kdump_kernel() is defined to 0 if CONFIG_CRASH_DUMP is not
set.Also, please move the part where we scan the table into a suitably
named function, e.g., calgary_init_bitmap_from_tce_table().This should be encapsulated in a function. Something like:
if (is_kdump_kernel())
What is the value of saved_max_pfn if !kdump? Can we just use it
unconditionally? If not, I prefer this asmax_pfn = is_kdump_kernel() ? saved_max_pfn : end_pfn;
Please define is_kdump_kernel() even if CONFIG_CRASH_DUMP is set, that
will let us avoid the ifdef in C files.Cheers,
Muli
--
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
Hi Muli,
After the kernel crash, we can't rely on the crashed kernel's data
structures or methods any more. We can't call the device shutdown methods of
all the device drivers as we might be invoking a driver which actually might
have caused the crash. Hence we don't perform any device shutdown
in the case of kdump. Instead after the crash, we try to take the shortest
route to second kernel (Execute minimum code in crashed kernel).Whatever special handling is required to bring up the second kernel on
a potentially unknown state hardware, is taken in second kernel.We will not be too concerned about ongoing DMA's as long as there is no
corruption of tce tables. That would mean DMA is happening in first
kernel's memory buffer and second kernel is not impacted. But if TCE
tables themselves are corrupted, then it can potentially interfere withI think the issue here is that some DMA was going on when first kernel
crashed. After the crash, second kernel booted and created new TCE tables
and switched to it. This resulted in ongoing DMA failure and hardware raised
an alarm.In this case, probably it would make sense to re-use the TCE tables of
previous kernel (until and unless we have a way to tell hardware not
to flag a DMA error if TCE mapping is changed while DMA is going on ?)I think, we also need to reserve some TCE table entries (in first kernel),
which can be used by second kernel for saving kernel core file to disk. There
might be a case where first kernel has used up all TCE entries and second
kernel can't allocate more. I think ppc64 has taken the approach of freeing
some entries in second kernel but that will have the problem as you might
be clearing an entry which is being used by ongoing DMA.Thanks
Vivek
-
That makes sense, but does it really make more sense to set-up proper
IOMMU translation tables for DMA's which have been triggered from the
first kernel than to either quiesce the device ASAP (this means before
enabling IOMMU translation...) or to trap such DMA's and continue,I worry about two cases: the first is TCE table corruption, the second
is DMA's to areas which were fine in the first kernel but are wrongWe don't have a way to do that, but we should be able to trap the DMA,
figure out that we're in a kdump kernel, and then just ignore it andThat's a good point - how do PPC (or other architectures with an
isolation-capable IOMMU) handle this whole issue?Cheers,
Muli
--
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
Trapping DMA and ignoring will make sense. I don't know how it can be done.
What does ignoring DMA mean here? Acutual DMA does not take place but device
thinks DMA is complete?Trapping DMA and ignoring in kdump kernel (if possible), seems to be better
than letting DMA happen. This will reduce chances of corruption of second
kernel.Can you please give little more details in terms of how to trap DMA. I think
Right now ppc64 just uses the TCE tables from first kernel and allows the
older DMA's to finish. Then they free up some TCE table entries (2K) to
make space for DMA entries from second kernel.Mohan, please corret me, if I am wrong.
Thanks
Vivek
-
Hi Muli,
I have tried to work with CCR ,CSR, CSMR, CSAR, CFGRW, GBRERRM
registers but have been unable to make calgary generate an exception
upon error condition. In alloc_tce_table() , I am setting WRITE_ONLY
access permission bit to tce entries but it isn't helping. Would you
kindly let me know how we can make calgary to generate an exception in
error conditions ?.Thanks,
Chandru--
Hello Andrew,
Could you pls consider the attached patch for inclusion in mainline.
Thanks,
ChandruSigned-off-by: Chandru S <chandru@in.ibm.com>
------ arch/x86/kernel/pci-calgary_64.c.orig 2008-03-10
15:54:08.000000000 +0530
+++ arch/x86/kernel/pci-calgary_64.c 2008-03-10 16:34:20.000000000 +0530
@@ -36,6 +36,7 @@
#include <linux/delay.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
+#include <linux/crash_dump.h>
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/tce.h>
@@ -166,6 +167,8 @@ static void calgary_dump_error_regs(stru
static void calioc2_handle_quirks(struct iommu_table *tbl, struct
pci_dev *dev);
static void calioc2_tce_cache_blast(struct iommu_table *tbl);
static void calioc2_dump_error_regs(struct iommu_table *tbl);
+static void calgary_init_bitmap_from_tce_table(struct iommu_table *tbl);
+static void grab_tce_space_from_tar(void);static struct cal_chipset_ops calgary_chip_ops = {
.handle_quirks = calgary_handle_quirks,
@@ -828,7 +831,11 @@ static int __init calgary_setup_tar(strutbl = pci_iommu(dev->bus);
tbl->it_base = (unsigned long)bus_info[dev->bus->number].tce_space;
- tce_free(tbl, 0, tbl->it_size);
+
+ if (is_kdump_kernel())
+ calgary_init_bitmap_from_tce_table(tbl);
+ else
+ tce_free(tbl, 0, tbl->it_size);if (is_calgary(dev->device))
tbl->chip_ops = &calgary_chip_ops;
@@ -1186,6 +1193,9 @@ static int __init calgary_locate_bbars(v
}
}+ if (is_kdump_kernel())
+ grab_tce_space_from_tar();
+
return 0;error:
@@ -1276,6 +1286,24 @@ static inline int __init determine_tce_t
return ret;
}+/*
+ * calgary_init_bitmap_from_tce_table():
+ * Funtion for kdump case. In the second/kdump kernel initialize
+ * the bitmap based on the tce table entries ob...
On Mon, Mar 10, 2008 at 06:50:29PM +0530, Chandru wrote:
Hi Chandru,
- How do we make sure that previous kernel's TCE tables are not
overwritten
by new kernel (In case previous kernel allocated TCE tables in first
640 KB?)- How do we make sure that when new kernel tries to setup an entry in
TCE table, then it does not try to clear up an existing entry which is
still in use?Did you try the Muli suggestion of ignoring DMA error in exception
handler? What happens if I setup new table and try to switch to new
table? Some sort of error will occur. Can't we modify the handler and
ignore it for kdump case and move on?Thanks
Vivek
--
Yikes, not really :-)
You're basically saying "if we're in a kdump kernel, let's ignore all
DMA errors and hope for the best". This is not really acceptable. What
we could do is limit the scope of ignorance - only ignore errors from
the point the kdump kernel starts booting until we have reinitialized
the device and then go back to handling DMA errors correctly.Cheers,
Muli
--
Agree. Ignoring all the DMA errors in kdump kernel does not sound very
good.On a side note, typically how much time does it take to for DMA operations
to finish. Can we wait for a random amount of time in second kernel,
hoping all the DMA operations are complete, and then setup a new table?This can be atleast a stop gap solution till we come up with a good
solution?Thanks
Vivek
--
The problem as I understand it is that when we boot into the kdump
kernel we have left-over devices from the old kernel which are active
and initiating DMA's. This is fine as long as Calgayr is also active
(with the old kernel's mappings). Once we initialize Calgary (in the
kdump) kernel with the new mappings, however, it starts trapping all
those DMA's. We can't wait for them to stop, because that will only
happen when the device is re-initialized (or otherwise quiesced).Cheers,
Muli
--
I can sort-of work out what the patch does from the above quote, but it
would be nice to have a proper changelog description from yourself ratherThe patch was badly wordwrapped. Please fix your email client for the next
version.--
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
box. The ongoing dma's of aacraid from the first kernel continue to exist
until the driver is loaded in the kdump kernel. Calgary is initialized prior
to aacraid and creation of new tce tables causes wrong dma's to occur. Here
we try to get the tce tables of the first kernel in kdump kernel and use
them. While in the kdump kernel we do not allocate new tce tables but instead
read the base address register contents of calgary iommu and use the tables
that the registers point to. With these changes the kdump kernel and hence
aacraid now boots normally.Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
---arch/x86/kernel/pci-calgary_64.c | 91 ++++++++++++++++++++++++++---
include/linux/crash_dump.h | 8 ++
Sorry for the much delayed response on this mail. The patch was tested with
linux-2.6.26-rc6. Pls consider it for inclusion.
Thanks,diff -Naurp linux-2.6.26-rc6-orig/arch/x86/kernel/pci-calgary_64.c
linux-2.6.26-rc6/arch/x86/kernel/pci-calgary_64.c
--- linux-2.6.26-rc6-orig/arch/x86/kernel/pci-calgary_64.c 2008-06-21
13:59:08.000000000 +0530
+++ linux-2.6.26-rc6/arch/x86/kernel/pci-calgary_64.c 2008-06-21
13:59:45.000000000 +0530
@@ -36,6 +36,7 @@
#include <linux/delay.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
+#include <linux/crash_dump.h>
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/tce.h>
@@ -167,6 +168,8 @@ static void calgary_dump_error_regs(stru
static void calioc2_handle_quirks(struct iommu_table *tbl, struct pci_dev
*dev);
static void calioc2_tce_cache_blast(struct iommu_table *tbl);
static void calioc2_dump_error_regs(struct iommu_table *tbl);
+static void calgary_init_bitmap_from_tce_table(struct iommu_table *tbl);
+static void get_tce_space_from_tar(void);static struct cal_chipset_ops calgary_chip_ops = {
.handle_quirks = calgary_handl...
Hi Chandru,
I am not a big fan of this patch, but I guess it would do as a
stop-gap measure. However it is not appropriate for 2.6.26 and should
go in at the beginning of the 2.6.27 merge window. Also, minorI think the code is correct, but the indentation is misleading. Please
unneeded whitespace.
Cheers,
Muli
--
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
box. The ongoing dma's of aacraid from the first kernel continue to exist
until the driver is loaded in the kdump kernel. Calgary is initialized prior
to aacraid and creation of new tce tables causes wrong dma's to occur. Here
we try to get the tce tables of the first kernel in kdump kernel and use
them. While in the kdump kernel we do not allocate new tce tables but instead
read the base addres register contents of calgary iommu and use the tables
that the registers point to. With these changes the kdump kernel and hence
aacraid now boots normally.Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
---patch taken on top of linux-2.6.26 stable. Comments from Muli Ben-Yehuda taken
into consideration. Pls apply it as a stop-gap patch until we can come up
with a more stable patch for this issue. Thanks,arch/x86/kernel/pci-calgary_64.c | 87 ++++++++++++++++++++++++++---
include/linux/crash_dump.h | 8 ++
2 files changed, 88 insertions(+), 7 deletions(-)diff -Narup linux-2.6.26-orig/arch/x86/kernel/pci-calgary_64.c
linux-2.6.26/arch/x86/kernel/pci-calgary_64.c
--- linux-2.6.26-orig/arch/x86/kernel/pci-calgary_64.c 2008-07-15
13:36:00.000000000 +0530
+++ linux-2.6.26/arch/x86/kernel/pci-calgary_64.c 2008-07-15
13:45:36.000000000 +0530
@@ -36,6 +36,7 @@
#include <linux/delay.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
+#include <linux/crash_dump.h>
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/tce.h>
@@ -167,6 +168,8 @@ static void calgary_dump_error_regs(stru
static void calioc2_handle_quirks(struct iommu_table *tbl, struct pci_dev
*dev);
static void calioc2_tce_cache_blast(struct iommu_table *tbl);
static void calioc2_dump_error_regs(struct iommu_table *tbl);
+static void calgary_init_bitmap_from_tce_table(struct iommu_table *tbl);
+static void get_tce_space_from_tar(void...
On Tue, 15 Jul 2008 14:15:27 +0530
Is this needed in 2.6.26?
If so, the patch whcih I applied will need a little work because I had
to touch up some rejects against already-queued changes. I can fix
that issue by applying this _ahead_ of those changes, but I'm just not
able to judge whether this is needed from the information which wasYou don't _have_ to put new includes right at the end of the list!
Everyone does this, and it just maximises the probability of mergeA couple of little coding-style errors there. checkpatch missed them.
--
No, it's 2.6.27 material.
Cheers,
Muli
--
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cheers,
Muli
--
ACK (as a JAFO, though). I feel this patch was a long time coming!
Thanks!Sincerely -- Mark Salyzyn
--
