Hi Andrew,
The following kernel panic was raised when I tried to boot an IA-64
machine with the 2.6.23-rc3-mm1 kernel.
===============================================
[ 15.198125] target0:0:8: Ending Domain Validation
[ 15.203117] target0:0:8: asynchronous
[ 15.207377] scsi 0:0:8:0: Attached scsi generic sg1 type 3
[ 16.971340] GSI 41 (level, low) -> CPU 4 (0x1000) vector 72
[ 16.977053] ACPI: PCI Interrupt 0000:01:03.1[B] -> GSI 41 (level,
low) -> IRQ 72
[ 16.984767] mptbase: Initiating ioc1 bringup
[ 17.465344] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[ 15.180382] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1,
MaxQ=222, IRQ=72
[ 20.376426] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[ 20.382298] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level,
low) -> IRQ 73
[ 20.390209] mptbase: Initiating ioc2 bringup
[ 20.873561] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 20.880366] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.886858] Kernel panic - not syncing: DMA: Memory would be corrupted

Thanks & Regards,
Kamalesh Babulal.
Gad, never seen that before. Andi, Tony: help?
-
Either the driver is leaking or more likely something went
wrong with the swiotlb initialization. Earlier boot messages
might tell.

-Andi
-
Attached the boot log and config file.
Thanks & Regards,
Kamalesh Babulal.
> Attached the boot log and config file.
Kamalesh,
I don't see anything obvious in the boot_log.
I used your "dotconfig" file to build a 2.6.23-rc3-mm1 kernel
and booted it on my test system ... it worked just fine
(except that for some reason the network did not come up :-( )

I tried to compare my boot log with yours ... you have some
different devices (e.g. your ethernet links are Tigon3 while
mine are e1000) ... so the memory leak in a driver theory
is a possibility.

Try adding a dump_stack() call to lib/swiotlb.c where the
"DMA: Out of SW-IOMMU space ..." message is printed. That
would tell us who is making the call that fails (they
might be an innocent bystander after someone else has
used all the space ... but this is unlikely).

-Tony
-
stack dump after the message "DMA: Out of SW-IOMMU space ...."
[ 20.865382] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.871946]
[ 20.871947] Call Trace:
[ 20.876287] [<a0000001000144e0>] show_stack+0x80/0xa0
[ 20.876289] sp=e00000014322f8f0 bsp=e000000143229170
[ 20.889731] [<a000000100014530>] dump_stack+0x30/0x60
[ 20.889733] sp=e00000014322fac0 bsp=e000000143229158
[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
[ 20.974311] sp=e00000014322fdc0 bsp=e000000143228eb0
[ 20.988021] [<a0000001005667b0>] mptspi_probe+0x30/0x720
[ 20.988023] sp=e00000014322fdd0 bsp=e000000143228e60
[ 21.001743] [<a0000001003bb4b0>] pci_device_probe+0x1f0/0x2c0
[ 21.001745] sp=e00000014322fdd0 bsp=e000000143228e18
[ 21.015911] [<a00000010048f9e0>] driver_probe_device+0x180/0x400
[ 21.015913] sp=e00000014322fdd0 bsp=e000000143228dc8
[ 21.030305] [<a00000010048ff20>] __driver_attach+0xc0/0x160
[ 21.030307] sp=e00000014322fdd0 bsp=e000000143228d90
[ 21.044268] [<a00000010048dd90>] bus_for_each_dev+0xb0/0x120
[ 21.044269] sp=e00000014322fdd0 bsp=e000000143228d58
[ 21.058316] [<a00000010048f640>] driver_attach+0x40/0x60
[ 21.058318] sp=e00000014322fdf0 bsp=e000000143228d38
[ 21.072024] [<a00000010048e600>] bus_add_driver+0x120/0x400
[ 21.072026] sp=e00000014322fdf0 bsp=e000000143228cf8
[ 21.085989] [<a00000...
[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0

Hmmm! So you were in the mpt/fusion driver when you ran out
of SWIOTLB space. That's an area where we both have the same
hardware ... and since it booted for me, it means that the
driver isn't totally broken.

I'm totally ignorant of what goes on inside this driver though.
You have more "ioc's" than I do. I only see messages from mpt
bringing up ioc0 & ioc1. Your boot_log also has ioc2 (which is
where you crash). Here's the sdiff(1) output comparing the MPT
part of your boot log with my successful boot of the same kernel
and config (your log is the one on the left). Maybe some MPT/Fusion
expert can spot something important in this bit?

-Tony
Fusion MPT base driver 3.04.05 Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Corporation Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.05 Fusion MPT SPI Host driver 3.04.05
GSI 40 (level, low) -> CPU 3 (0x0600) vector 71 | GSI 28 (level, low) -> CPU 0 (0xc018) vector 48
ACPI: PCI Interrupt 0000:01:03.0[A] -> GSI 40 (level, low) -> | ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 28 (level, low) ->
mptbase: Initiating ioc0 bringup mptbase: Initiating ioc0 bringup
ioc0: LSI53C1030...
The more ioc's you have, the more space you will use.
jeremy
-
> The more ioc's you have, the more space you will use.
Default SW IOTLB allocation is 64MB ... how much should we see
used per ioc?

Kamalesh: You could try increasing the amount of sw iotlb space
available by booting with a swiotlb=131072 argument (argument
value is the number of 2K slabs to allocate ... 131072 would
give you four times as much space as the default allocation).

-Tony
-
boot log after passing boot parameter swiotlb=131072
[ 0.000000] 0: 32768 -> 131072
[ 0.000000] 0: 262144 -> 1703936
[ 0.000000] 1: 4194304 -> 4980736
[ 0.000000] Built 2 zonelists in Zone order, mobility grouping on.
Total pages: 1563903
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line:
BOOT_IMAGE=scsi0:/EFI/debian/boot/vmlinuz-autobench root=/dev/sda2
console=tty0 console=ttyS0,115200n8 ro autobench_args: root=/dev/sda2
ABAT:1187857488 profile=2 swiotlb=131072
[ 0.000000] kernel profiling enabled (shift: 2)
<<snip>>
[ 20.408360] mptbase: Initiating ioc2 bringup
[ 20.892659] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 20.902432] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.908992]
[ 20.908993] Call Trace:
[ 20.913324] [<a0000001000144e0>] show_stack+0x80/0xa0
[ 20.913327] sp=e00000014322f8f0
bsp=e000000143229170
[ 20.926764] [<a000000100014530>] dump_stack+0x30/0x60
[ 20.926766] sp=e00000014322fac0
bsp=e000000143229158
[ 20.940225] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.940227] sp=e00000014322fac0
bsp=e000000143229120
[ 20.953915] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.953916] sp=e00000014322fac0
bsp=e0000001432290d8
[ 20.968223] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.968226] sp=e00000014322fac0
bsp=e000000143229090
[ 20.982919] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.982922] sp=e00000014322fac0
bsp=e000000143229010
[ 20.996801] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.996803] sp=e00000014322faf0
bsp=e000000143228f30
[ 21.011292] [<a00000010055c8f0>] mpt_attach+0x830/0x...
I tried that value and, just in case, swiotlb=262144. An IA-64 machine I
have here fails with the same message anyway, i.e.

[ 19.834906] mptbase: Initiating ioc1 bringup
[ 20.317152] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[ 15.474303] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1, MaxQ=222, IRQ=72
[ 20.669730] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[ 20.675602] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level, low) -> IRQ 73
[ 20.683508] mptbase: Initiating ioc2 bringup
[ 21.166796] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[ 21.180539] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 21.187018] Kernel panic - not syncing: DMA: Memory would be corrupted

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
I saw the same trouble on my box, and I chased what was wrong.
Here is today's progress.

__get_free_pages() in swiotlb_alloc_coherent() fails in rc3-mm1.
(See the following patch.)
But it doesn't fail on rc2-mm2, and the kernel can boot up.

Hmmm....
(2.6.23-rc3-mm1)
---
swiotlb_alloc_coherent flags=21 order=3 ret=0000000000000000
DMA: Out of SW-IOMMU space for 266368 bytes at device ?
Kernel panic - not syncing: DMA: Memory would be corrupted
---

(2.6.23-rc2-mm2)
---
swiotlb_alloc_coherent flags=21 order=3 ret=e000000020080000
:
(boot up continue...)

---
lib/swiotlb.c | 2 ++
1 file changed, 2 insertions(+)

Index: current/lib/swiotlb.c
===================================================================
--- current.orig/lib/swiotlb.c 2007-08-23 22:27:01.000000000 +0900
+++ current/lib/swiotlb.c 2007-08-23 22:29:49.000000000 +0900
@@ -455,6 +455,8 @@ swiotlb_alloc_coherent(struct device *hw
 	flags |= GFP_DMA;

 	ret = (void *)__get_free_pages(flags, order);
+
+	printk("%s flags=%0x order=%d ret=%p\n",__func__, flags, order, ret);
 	if (ret && address_needs_mapping(hwdev, virt_to_bus(ret))) {
 		/*
 		 * The allocated memory isn't reachable by the device.

--
Yasunori Goto
-
That looks to be part of the problem here ... failing an order=3
allocation during boot on a system that just a few lines earlier
in the boot log reported "Memory: 37474000k/37680640k available"
looks bad ... but perhaps having *more* memory is part of your problem.
You may have run low on GFP_DMA memory because some allocation
scaled by memory size has chewed up a lot of your memory. To check
this try booting with a "mem=4G" parameter and see if that helps
you.

But it is also bad that the swiotlb() code failed to handle this.
Can you check whether the problem is related to the size of the
allocation being just over 256K (a magic number for swiotlb since
IO_TLB_SEGSIZE is 128 times a slab size of 2k)? Try changing
lib/swiotlb.c to set IO_TLB_SEGSIZE to 256 instead.

-Tony
-
On Thu, 23 Aug 2007 10:22:26 -0700
Others are reporting machines which fail in the memory allocator much
earlier, and which claim to have four CPUs and 16 nodes. So something is
very wonky in the rc3-mm1 page allocator.

I guess suspicion has to be directed at the memoryless-nodes patches, but
until that's cleared up I don't think there's much to be gained from
chasing this iommu problem, now that you've worked out that it's a bogus
memory allocation failure (thanks).
-
I found find_next_best_node() was wrong.
I confirmed boot up by the following patch.
Mel-san, Kamalesh-san, could you try this?

Bye.
---

Fix decision of memoryless node in find_next_best_node().
This can be cause of SW-IOMMU's allocation failure.

This patch is for 2.6.23-rc3-mm1.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: current/mm/page_alloc.c
===================================================================
--- current.orig/mm/page_alloc.c 2007-08-24 16:03:17.000000000 +0900
+++ current/mm/page_alloc.c 2007-08-24 16:04:06.000000000 +0900
@@ -2136,7 +2136,7 @@ static int find_next_best_node(int node,
 		 * Note: N_HIGH_MEMORY state not guaranteed to be
 		 * populated yet.
 		 */
-		if (pgdat->node_present_pages)
+		if (!pgdat->node_present_pages)
 			continue;

 		/* Don't want a node to appear more than once */
--
Yasunori Goto
-
FYI: This patch also allows the alloc-instantiate-race testcase in
libhugetlbfs to pass again :)

--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
-
This patch resolves the kernel panic problem.
-
Kamalesh Babulal.
-
This boots the IA-64 successfully and gets rid of that DMA corrupts
memory message. As a bonus, it fixes up the memoryless nodes (the bug
where Total pages == 0 and there is a BUG in page_alloc.c) by building
zonelists properly. The machine still fails to boot with the more familiar
net/core/skbuff.c:95 but that is a separate problem.

Well spotted Yasunori-san.

Andrew, this fixes a real problem and should be considered a fix to
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch unless

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
Right. Let's make sure to cc Lee on future discussions of the memoryless
node patchset.

Acked-by: Christoph Lameter <clameter@sgi.com>
-
I reworked that patch and posted the update on 16aug which does not have
this problem:

http://marc.info/?l=linux-mm&m=118729871101418&w=4

This should replace
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch
in -mm.

Lee
-
Could you post a diff to rc3-mm1 of that patch?
-
Sure. Here it is. This looks nicer to me than explicitly skipping
unpopulated nodes in find_next_best_node()--as I tried to do, but
botched it :-(. I didn't notice that because I'd moved on to v2 before
testing with any significant load. Even when I was running with v1 with
botched zonelists, I apparently had sufficient memory on each node that
I never had to fall back.

I also didn't notice that Andrew had added v1 instead of v2 to the mm
tree. Will pay more attention in the future, I promise.

Lee
---------------------------
PATCH Diffs between "Fix generic usage of node_online_map" V1 & V2

Against 2.6.23-rc3-mm1
V1 -> V2:
+ moved population of N_HIGH_MEMORY node state mask to
free_area_init_nodes(), as this is called before we
build zonelists. So, we can use this mask in
find_next_best_node. Still need to keep the duplicate
code in early_calculate_totalpages() for zone movable
setup.

mm/page_alloc.c:find_next_best_node()
	visit only nodes with memory [N_HIGH_MEMORY mask]
	looking for next best node for fallback zonelists.

mm/page_alloc.c:find_zone_movable_pfns_for_nodes()
	spread kernelcore over nodes with memory.
	This required calling early_calculate_totalpages()
	unconditionally, and populating N_HIGH_MEMORY node
	state therein from nodes in the early_node_map[].
	This duplicates the code in free_area_init_nodes(), but
	I don't want to depend on this copy if ZONE_MOVABLE
	might go away, taking early_calculate_totalpages()
	with it.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

mm/page_alloc.c | 48 ++++++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 28 deletions(-)

Index: Linux/mm/page_alloc.c
===================================================================
--- Linux.orig/mm/page_alloc.c 2007-08-24 13:20:28.000000000 -0400
+++ Linux/mm/page_alloc.c 2007-08-24 13:25:20.000000000 -0400
@@ -2127,18 +2127,10 @@ static int find_next_best_node(int node,
return node;
...
Ahh. Yes. I remember some of that.
Acked-by: Christoph Lameter <clameter@sgi.com>
-
Hmm. Must be something else going on then. It should be less than 1MB
per ioc plus whatever is used for streaming I/O.

| mptbase: Initiating ioc2 bringup | GSI 16 (level, low) -> CPU 2 (0xc418) vector 50
| ioc2: LSI53C1030 C0: Capabilities={Initiator} | ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) ->
| DMA: Out of SW-IOMMU space for 263200 bytes at device ? | uhci_hcd 0000:00:1d.0: UHCI Host Controller
| Kernel panic - not syncing: DMA: Memory would be corrupted | uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus n
-
I traced the pci_alloc_consistent calls from PrimeIocFifos on my
system. There are two calls for each ioc. The first is for
266368 bytes, the second for 16320 bytes.

I wonder why Kamalesh's system wants the slightly different
amount (263200 bytes) from what my system asks for?

It also looks to be a little unfriendly to swiotlb to ask for
more than 256K at a time (see IO_TLB_SEGSIZE in swiotlb.c).

-Tony
-
I believe those would vary a bit based on the exact firmware
rev and perhaps nvram settings. Also driver settings, but
those are presumably the same.

jeremy
-
Actually, you can see that you have a different chip rev level and different
firmware revs, so that's probably why the requested sizes are a little
different.

Compare /proc/mpt/ioc0/info if you're curious. There's probably a small
difference.

jeremy
-
