Hi Andrew,
The following kernel panic was raised when I tried to boot an IA-64
machine with the 2.6.23-rc3-mm1 kernel.
===============================================
[ 15.198125] target0:0:8: Ending Domain Validation
[ 15.203117] target0:0:8: asynchronous
[ 15.207377] scsi 0:0:8:0: Attached scsi generic sg1 type 3
[ 16.971340] GSI 41 (level, low) -> CPU 4 (0x1000) vector 72
[ 16.977053] ACPI: PCI Interrupt 0000:01:03.1[B] -> GSI 41 (level, low) -> IRQ 72
[ 16.984767] mptbase: Initiating ioc1 bringup
[ 17.465344] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[ 15.180382] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1, MaxQ=222, IRQ=72
[ 20.376426] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[ 20.382298] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level, low) -> IRQ 73
[ 20.390209] mptbase: Initiating ioc2 bringup
[ 20.873561] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 20.880366] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.886858] Kernel panic - not syncing: DMA: Memory would be corrupted
Thanks & Regards,
Kamalesh Babulal.
Gad, never seen that before. Andi, Tony: help? -
Either the driver is leaking or more likely something went wrong with the swiotlb initialization. Earlier boot messages might tell -Andi -
Attached the boot log and config file. Thanks & Regards, Kamalesh Babulal.
> Attached the boot log and config file.
Kamalesh,
I don't see anything obvious in the boot_log. I used your "dotconfig" file to build a 2.6.23-rc3-mm1 kernel and booted it on my test system ... it worked just fine (except that for some reason the network did not come up :-( )
I tried to compare my boot log with yours ... you have some different devices (e.g. your ethernet links are Tigon3 while mine are e1000) ... so the memory leak in a driver theory is a possibility.
Try adding a dump_stack() call to lib/swiotlb.c where the "DMA: Out of SW-IOMMU space ..." message is printed. That would tell us who is making the call that fails (they might be an innocent bystander after someone else has used all the space ... but this is unlikely).
-Tony -
stack dump after the message "DMA: Out of SW-IOMMU space ...."
[ 20.865382] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.871946]
[ 20.871947] Call Trace:
[ 20.876287] [<a0000001000144e0>] show_stack+0x80/0xa0
[ 20.876289] sp=e00000014322f8f0 bsp=e000000143229170
[ 20.889731] [<a000000100014530>] dump_stack+0x30/0x60
[ 20.889733] sp=e00000014322fac0 bsp=e000000143229158
[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
[ 20.974311] sp=e00000014322fdc0 bsp=e000000143228eb0
[ 20.988021] [<a0000001005667b0>] mptspi_probe+0x30/0x720
[ 20.988023] sp=e00000014322fdd0 bsp=e000000143228e60
[ 21.001743] [<a0000001003bb4b0>] pci_device_probe+0x1f0/0x2c0
[ 21.001745] sp=e00000014322fdd0 bsp=e000000143228e18
[ 21.015911] [<a00000010048f9e0>] driver_probe_device+0x180/0x400
[ 21.015913] sp=e00000014322fdd0 bsp=e000000143228dc8
[ 21.030305] [<a00000010048ff20>] __driver_attach+0xc0/0x160
[ 21.030307] sp=e00000014322fdd0 bsp=e000000143228d90
[ 21.044268] [<a00000010048dd90>] bus_for_each_dev+0xb0/0x120
[ 21.044269] sp=e00000014322fdd0 bsp=e000000143228d58
[ 21.058316] [<a00000010048f640>] driver_attach+0x40/0x60
[ 21.058318] sp=e00000014322fdf0 bsp=e000000143228d38
[ 21.072024] [<a00000010048e600>] bus_add_driver+0x120/0x400
[ 21.072026] sp=e00000014322fdf0 bsp=e000000143228cf8
[ 21.085989] [<a0000001004903c0>] driver_register+0xc0/0x180
[ 21.085990] sp=e00000014322fdf0 ...
[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
Hmmm! So you were in the mpt/fusion driver when you ran out
of SWIOTLB space. That's an area where we both have the same
hardware ... and since it booted for me, it means that the
driver isn't totally broken.
I'm totally ignorant of what goes on inside this driver though.
You have more "ioc's" than I do. I only see messages from mpt
bringing up ioc0 & ioc1. Your boot_log also has ioc2 (which is
where you crash). Here's the sdiff(1) output comparing the MPT
part of your boot log with my successful boot of the same kernel
and config (your log is the one on the left). Maybe some MPT/Fusion
expert can spot something important in this bit?
-Tony
Fusion MPT base driver 3.04.05 Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Corporation Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.05 Fusion MPT SPI Host driver 3.04.05
GSI 40 (level, low) -> CPU 3 (0x0600) vector 71 | GSI 28 (level, low) -> CPU 0 (0xc018) vector 48
ACPI: PCI Interrupt 0000:01:03.0[A] -> GSI 40 (level, low) -> | ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 28 (level, low) ->
mptbase: Initiating ioc0 bringup mptbase: Initiating ioc0 bringup
ioc0: LSI53C1030 C0: Capabilities={Initiator} | ioc0: ...
The more ioc's you have, the more space you will use. jeremy -
> The more ioc's you have, the more space you will use.
Default SW IOTLB allocation is 64MB ... how much should we see used per ioc?
Kamalesh: You could try increasing the amount of sw iotlb space available by booting with a swiotlb=131072 argument (argument value is the number of 2K slabs to allocate ... 131072 would give you four times as much space as the default allocation).
-Tony -
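For reference, the slab arithmetic behind the swiotlb= suggestion can be checked with a small userspace sketch. It assumes the 2K slab size quoted in the message above; the function names are made up for illustration and are not kernel interfaces.

```c
/* Assumption: one swiotlb slab is 2K, as stated in the thread. */
#define SLAB_BYTES 2048UL

/* Bytes of bounce-buffer space for a given swiotlb= slab count. */
unsigned long swiotlb_bytes(unsigned long nslabs)
{
    return nslabs * SLAB_BYTES;
}

/* Slab count that corresponds to a pool size given in megabytes. */
unsigned long swiotlb_slabs_for_mb(unsigned long mb)
{
    return (mb * 1024UL * 1024UL) / SLAB_BYTES;
}
```

Under these assumptions the default 64MB pool corresponds to 32768 slabs, and swiotlb=131072 corresponds to 256MB, which is four times the default.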
Hmm. Must be something else going on then. It should be less than 1MB
per ioc plus whatever is used for streaming I/O.
| mptbase: Initiating ioc2 bringup | GSI 16 (level, low) -> CPU 2 (0xc418) vector 50
| ioc2: LSI53C1030 C0: Capabilities={Initiator} | ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) ->
| DMA: Out of SW-IOMMU space for 263200 bytes at device ? | uhci_hcd 0000:00:1d.0: UHCI Host Controller
| Kernel panic - not syncing: DMA: Memory would be corrupted | uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus n
-
I traced the pci_alloc_consistent calls from PrimeIocFifos on my system. There are two calls for each ioc. The first is for 266368 bytes, the second for 16320 bytes. I wonder why Kamalesh's system wants the slightly different amount (263200 bytes) from what my system asks for? It also looks to be a little unfriendly to swiotlb to ask for more than 256K at a time (see IO_TLB_SEGSIZE) in swiotlb.c -Tony -
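The "more than 256K at a time" remark can be made concrete with a rough userspace sketch. It assumes 2K slabs and a segment of 128 slabs (the IO_TLB_SEGSIZE value quoted elsewhere in this thread), which gives the 256K limit; the helper names are hypothetical and not taken from lib/swiotlb.c.

```c
#define SLAB_BYTES    2048UL  /* assumed slab size (2K, per the thread) */
#define SEGMENT_SLABS 128UL   /* assumed IO_TLB_SEGSIZE value */

/* Number of 2K slabs needed to hold `bytes`, rounding up. */
unsigned long slabs_needed(unsigned long bytes)
{
    return (bytes + SLAB_BYTES - 1) / SLAB_BYTES;
}

/* Nonzero if the request fits inside one contiguous segment of slabs. */
int fits_in_one_segment(unsigned long bytes)
{
    return slabs_needed(bytes) <= SEGMENT_SLABS;
}
```

With these numbers the 266368-byte and 263200-byte requests need 131 and 129 slabs respectively, so neither fits in a single 128-slab segment, while the 16320-byte request needs only 8.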
I believe those would vary a bit based on the exact firmware rev and perhaps nvram settings. Also driver settings, but those are presumably the same. jeremy -
Actually, you can see that you have a different chip rev level and different firmware revs, so that's probably why the requested sizes are a little different. Compare /proc/mpt/ioc0/info if you're curious. There's probably a small difference. jeremy -
I tried that value and, just in case, swiotlb=262144. An IA-64 machine I
have here fails with the same message anyway, i.e.
[ 19.834906] mptbase: Initiating ioc1 bringup
[ 20.317152] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[ 15.474303] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1, MaxQ=222, IRQ=72
[ 20.669730] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[ 20.675602] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level, low) -> IRQ 73
[ 20.683508] mptbase: Initiating ioc2 bringup
[ 21.166796] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 21.180539] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 21.187018] Kernel panic - not syncing: DMA: Memory would be corrupted
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
I saw the same trouble on my box, and I chased down what was wrong.
Here is today's progress.
The __get_free_pages() call in swiotlb_alloc_coherent() fails in rc3-mm1
(see the following patch).
But it doesn't fail on rc2-mm2, and the kernel boots.
Hmmm....
(2.6.23-rc3-mm1)
---
swiotlb_alloc_coherent flags=21 order=3 ret=0000000000000000
DMA: Out of SW-IOMMU space for 266368 bytes at device ?
Kernel panic - not syncing: DMA: Memory would be corrupted
---
(2.6.23-rc2-mm2)
---
swiotlb_alloc_coherent flags=21 order=3 ret=e000000020080000
:
(boot up continue...)
---
lib/swiotlb.c | 2 ++
1 file changed, 2 insertions(+)
Index: current/lib/swiotlb.c
===================================================================
--- current.orig/lib/swiotlb.c 2007-08-23 22:27:01.000000000 +0900
+++ current/lib/swiotlb.c 2007-08-23 22:29:49.000000000 +0900
@@ -455,6 +455,8 @@ swiotlb_alloc_coherent(struct device *hw
flags |= GFP_DMA;
ret = (void *)__get_free_pages(flags, order);
+
+ printk("%s flags=%0x order=%d ret=%p\n",__func__, flags, order, ret);
if (ret && address_needs_mapping(hwdev, virt_to_bus(ret))) {
/*
* The allocated memory isn't reachable by the device.
--
Yasunori Goto
-
That looks to be part of the problem here ... failing an order=3 allocation during boot on a system that just a few lines earlier in the boot log reported "Memory: 37474000k/37680640k available" looks bad ... but perhaps having *more* memory is part of your problem. You may have run low on GFP_DMA memory because some allocation scaled by memory size has chewed up a lot of your memory. To check this, try booting with a "mem=4G" parameter and see if that helps you.
But it is also bad that the swiotlb() code failed to handle this. Can you check whether the problem is related to the size of the allocation being just over 256K (a magic number for swiotlb since IO_TLB_SEGSIZE is 128 times a slab size of 2k)? Try changing lib/swiotlb.c to set IO_TLB_SEGSIZE to 256 instead.
-Tony -
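As a side note on the "order=3" figure: a page-allocator order is the power-of-two number of pages needed to cover a request. The sketch below is a plain userspace approximation of that calculation, not the kernel's get_order(). The 64K page size used in the note afterwards is an assumption, since the thread does not state the configured page size.

```c
/*
 * Smallest order such that (page_size << order) >= size.
 * page_size must be a power of two and size must be nonzero.
 */
unsigned int order_for_size(unsigned long size, unsigned long page_size)
{
    unsigned int order = 0;
    unsigned long span = page_size;

    while (span < size) {
        span <<= 1;   /* double the covered span */
        order++;
    }
    return order;
}
```

If the page size is 64K, a 266368-byte request comes out as order 3 (512K of contiguous memory), which would match the debug output; with 16K pages the same request would be order 5.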
On Thu, 23 Aug 2007 10:22:26 -0700 Others are reporting machines which fail in the memory allocator much earlier, and which claim to have four CPUs and 16 nodes. So something is very wonky in the rc3-mm1 page allocator. I guess suspicion has to be directed at the memoryless-nodes patches, but until that's cleared up I don't think there's much to be gained from chasing this iommu problem, now that you've worked out that it's a bogus memory allocation failure (thanks). -
I found find_next_best_node() was wrong.
I confirmed boot up by the following patch.
Mel-san, Kamalesh-san, could you try this?
Bye.
---
Fix decision of memoryless node in find_next_best_node().
This can be cause of SW-IOMMU's allocation failure.
This patch is for 2.6.23-rc3-mm1.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: current/mm/page_alloc.c
===================================================================
--- current.orig/mm/page_alloc.c 2007-08-24 16:03:17.000000000 +0900
+++ current/mm/page_alloc.c 2007-08-24 16:04:06.000000000 +0900
@@ -2136,7 +2136,7 @@ static int find_next_best_node(int node,
* Note: N_HIGH_MEMORY state not guaranteed to be
* populated yet.
*/
- if (pgdat->node_present_pages)
+ if (!pgdat->node_present_pages)
continue;
/* Don't want a node to appear more than once */
--
Yasunori Goto
-
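The effect of the inverted test can be illustrated with a simplified userspace sketch. The struct and function here are hypothetical stand-ins; the real find_next_best_node() also weighs node distance and load, which this ignores. The `buggy` flag selects the original condition (skip nodes that have pages) or the corrected one (skip nodes that have none).

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-node data kept in pg_data_t. */
struct node_info {
    unsigned long present_pages;   /* 0 means the node has no memory */
};

/*
 * Return the first node accepted as a fallback candidate, or -1 if
 * every node is skipped.
 */
int pick_fallback_node(const struct node_info *nodes, size_t count, int buggy)
{
    for (size_t n = 0; n < count; n++) {
        int has_memory = nodes[n].present_pages != 0;

        /* buggy: skip populated nodes; fixed: skip memoryless nodes */
        if (buggy ? has_memory : !has_memory)
            continue;
        return (int)n;
    }
    return -1;
}
```

In this simplified model the buggy condition selects only memoryless nodes as fallbacks, so an allocation that falls back to them has nothing to allocate from. That would be consistent with the bogus allocation failure described above, though the sketch does not prove it.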
This boots the IA-64 machine successfully and gets rid of that "DMA corrupts memory" message. As a bonus, it fixes up the memoryless nodes (the bug where Total pages == 0 and there is a BUG in page_alloc.c) by building zonelists properly. The machine still fails to boot with the more familiar net/core/skbuff.c:95 but that is a separate problem.
Well spotted Yasunori-san.
Andrew, this fixes a real problem and should be considered a fix to memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch unless
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
I reworked that patch and posted the update on 16aug which does not have this problem: http://marc.info/?l=linux-mm&m=118729871101418&w=4 This should replace memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch in -mm. Lee -
Could you post a diff to rc3-mm1 of that patch? -
Sure. Here it is.
This looks nicer to me than explicitly skipping unpopulated nodes in find_next_best_node()--as I tried to do, but botched it :-(. I didn't notice that because I'd moved on to v2 before testing with any significant load. Even when I was running with v1 with botched zonelists, I apparently had sufficient memory on each node that I never had to fallback.
I also didn't notice that Andrew had added v1 instead of v2 to the mm tree. Will pay more attention in the future, I promise.
Lee
---------------------------
PATCH Diffs between "Fix generic usage of node_online_map" V1 & V2
Against 2.6.23-rc3-mm1
V1 -> V2:
+ moved population of N_HIGH_MEMORY node state mask to free_area_init_nodes(), as this is called before we build zonelists. So, we can use this mask in find_next_best_node. Still need to keep the duplicate code in early_calculate_totalpages() for zone movable setup.
mm/page_alloc.c:find_next_best_node()
visit only nodes with memory [N_HIGH_MEMORY mask] looking for next best node for fallback zonelists.
mm/page_alloc.c:find_zone_movable_pfns_for_nodes()
spread kernelcore over nodes with memory.
This required calling early_calculate_totalpages() unconditionally, and populating N_HIGH_MEMORY node state therein from nodes in the early_node_map[]. This duplicates the code in free_area_init_nodes(), but I don't want to depend on this copy if ZONE_MOVABLE might go away, taking early_calculate_totalpages() with it.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/page_alloc.c | 48 ++++++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 28 deletions(-)
Index: Linux/mm/page_alloc.c
===================================================================
--- Linux.orig/mm/page_alloc.c 2007-08-24 13:20:28.000000000 -0400
+++ Linux/mm/page_alloc.c 2007-08-24 13:25:20.000000000 -0400
@@ -2127,18 +2127,10 @@ static int find_next_best_node(int node,
return node;
}
...
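The V2 idea of visiting only nodes with memory can be pictured with a small userspace sketch. Here a plain bitmask is a hypothetical stand-in for the N_HIGH_MEMORY node state mask, and simple round-robin order replaces the kernel's distance-based ordering, so this is only an illustration of the mask-driven walk, not the patch itself.

```c
#include <stddef.h>

/*
 * Fill `out` with the ids of nodes whose bit is set in `mem_mask`,
 * starting at `local` and wrapping around. Returns the number written.
 * nr_nodes must be nonzero and no larger than the bit width of
 * unsigned long; `out` must have room for nr_nodes entries.
 */
size_t build_fallback_list(unsigned long mem_mask, unsigned int nr_nodes,
                           unsigned int local, unsigned int *out)
{
    size_t written = 0;

    for (unsigned int i = 0; i < nr_nodes; i++) {
        unsigned int node = (local + i) % nr_nodes;

        /* Only nodes marked as having memory become fallbacks. */
        if (mem_mask & (1UL << node))
            out[written++] = node;
    }
    return written;
}
```

Because memoryless nodes never have their bit set, they cannot end up in the fallback list, and no per-node present-pages test is needed inside the loop.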
Ahh. Yes. I remember some of that. Acked-by: Christoph Lameter <clameter@sgi.com> -
Right. Let's make sure to cc Lee on future discussions of the memoryless node patchset. Acked-by: Christoph Lameter <clameter@sgi.com> -
This patch resolves the kernel panic problem. - Kamalesh Babulal. -
FYI: This patch also allows the alloc-instantiate-race testcase in libhugetlbfs to pass again :) -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -
boot log after passing boot parameter swiotlb=131072
[ 0.000000] 0: 32768 -> 131072
[ 0.000000] 0: 262144 -> 1703936
[ 0.000000] 1: 4194304 -> 4980736
[ 0.000000] Built 2 zonelists in Zone order, mobility grouping on. Total pages: 1563903
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: BOOT_IMAGE=scsi0:/EFI/debian/boot/vmlinuz-autobench root=/dev/sda2 console=tty0 console=ttyS0,115200n8 ro autobench_args: root=/dev/sda2 ABAT:1187857488 profile=2 swiotlb=131072
[ 0.000000] kernel profiling enabled (shift: 2)
<<snip>>
[ 20.408360] mptbase: Initiating ioc2 bringup
[ 20.892659] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 20.902432] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.908992]
[ 20.908993] Call Trace:
[ 20.913324] [<a0000001000144e0>] show_stack+0x80/0xa0
[ 20.913327] sp=e00000014322f8f0 bsp=e000000143229170
[ 20.926764] [<a000000100014530>] dump_stack+0x30/0x60
[ 20.926766] sp=e00000014322fac0 bsp=e000000143229158
[ 20.940225] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.940227] sp=e00000014322fac0 bsp=e000000143229120
[ 20.953915] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.953916] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.968223] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.968226] sp=e00000014322fac0 bsp=e000000143229090
[ 20.982919] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.982922] sp=e00000014322fac0 bsp=e000000143229010
[ 20.996801] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.996803] sp=e00000014322faf0 bsp=e000000143228f30
[ 21.011292] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
[ 21.011293] ...