RE: [BUG] 2.6.23-rc3-mm1 Kernel panic - not syncing: DMA: Memory would be corrupted

From: Kamalesh Babulal
Date: Wednesday, August 22, 2007 - 7:32 am

Hi Andrew,

The following kernel panic was raised when I tried to boot an IA-64
machine with the 2.6.23-rc3-mm1 kernel.
===============================================
[   15.198125]  target0:0:8: Ending Domain Validation
[   15.203117]  target0:0:8: asynchronous
[   15.207377] scsi 0:0:8:0: Attached scsi generic sg1 type 3
[   16.971340] GSI 41 (level, low) -> CPU 4 (0x1000) vector 72
[   16.977053] ACPI: PCI Interrupt 0000:01:03.1[B] -> GSI 41 (level, low) -> IRQ 72
[   16.984767] mptbase: Initiating ioc1 bringup
[   17.465344] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[   15.180382] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1, MaxQ=222, IRQ=72
[   20.376426] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[   20.382298] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level, low) -> IRQ 73
[   20.390209] mptbase: Initiating ioc2 bringup
[   20.873561] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[   20.880366] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[   20.886858] Kernel panic - not syncing: DMA: Memory would be corrupted

Thanks & Regards,
Kamalesh Babulal.


From: Andrew Morton
Date: Wednesday, August 22, 2007 - 9:19 am

Gad, never seen that before.  Andi, Tony: help?
-

From: Andi Kleen
Date: Wednesday, August 22, 2007 - 10:25 am

Either the driver is leaking or, more likely, something went
wrong with the swiotlb initialization. Earlier boot messages
might tell.

-Andi
-

From: Kamalesh Babulal
Date: Wednesday, August 22, 2007 - 11:31 am

Attached the boot log and config file.

Thanks & Regards,
Kamalesh Babulal.

From: Luck, Tony
Date: Wednesday, August 22, 2007 - 2:04 pm

> Attached the boot log and config file.

Kamalesh,

I don't see anything obvious in the boot_log.

I used your "dotconfig" file to build a 2.6.23-rc3-mm1 kernel
and booted it on my test system ... it worked just fine
(except that for some reason the network did not come up :-( )

I tried to compare my boot log with yours ... you have some
different devices (E.g. your ethernet links are Tigon3 while
mine are e1000) ... so the memory leak in a driver theory
is a possibility.

Try adding a dump_stack() call to lib/swiotlb.c where the
"DMA: Out of SW-IOMMU space ..." message is printed.  That
would tell us who is making the call that fails (they
might be an innocent bystander after someone else has
used all the space ... but this is unlikely).

-Tony
-

From: Kamalesh Babulal
Date: Wednesday, August 22, 2007 - 3:24 pm

stack dump after the message "DMA: Out of SW-IOMMU space ...."


[ 20.865382] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[ 20.871946]
[ 20.871947] Call Trace:
[ 20.876287] [<a0000001000144e0>] show_stack+0x80/0xa0
[ 20.876289] sp=e00000014322f8f0 bsp=e000000143229170
[ 20.889731] [<a000000100014530>] dump_stack+0x30/0x60
[ 20.889733] sp=e00000014322fac0 bsp=e000000143229158
[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
[ 20.974311] sp=e00000014322fdc0 bsp=e000000143228eb0
[ 20.988021] [<a0000001005667b0>] mptspi_probe+0x30/0x720
[ 20.988023] sp=e00000014322fdd0 bsp=e000000143228e60
[ 21.001743] [<a0000001003bb4b0>] pci_device_probe+0x1f0/0x2c0
[ 21.001745] sp=e00000014322fdd0 bsp=e000000143228e18
[ 21.015911] [<a00000010048f9e0>] driver_probe_device+0x180/0x400
[ 21.015913] sp=e00000014322fdd0 bsp=e000000143228dc8
[ 21.030305] [<a00000010048ff20>] __driver_attach+0xc0/0x160
[ 21.030307] sp=e00000014322fdd0 bsp=e000000143228d90
[ 21.044268] [<a00000010048dd90>] bus_for_each_dev+0xb0/0x120
[ 21.044269] sp=e00000014322fdd0 bsp=e000000143228d58
[ 21.058316] [<a00000010048f640>] driver_attach+0x40/0x60
[ 21.058318] sp=e00000014322fdf0 bsp=e000000143228d38
[ 21.072024] [<a00000010048e600>] bus_add_driver+0x120/0x400
[ 21.072026] sp=e00000014322fdf0 bsp=e000000143228cf8
[ 21.085989] [<a0000001004903c0>] driver_register+0xc0/0x180
[ 21.085990] sp=e00000014322fdf0 ...

From: Luck, Tony
Date: Wednesday, August 22, 2007 - 3:56 pm

[ 20.903201] [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[ 20.903202] sp=e00000014322fac0 bsp=e000000143229120
[ 20.916902] [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[ 20.916904] sp=e00000014322fac0 bsp=e0000001432290d8
[ 20.931215] [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[ 20.931217] sp=e00000014322fac0 bsp=e000000143229090
[ 20.945923] [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[ 20.945925] sp=e00000014322fac0 bsp=e000000143229010
[ 20.959812] [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[ 20.959814] sp=e00000014322faf0 bsp=e000000143228f30
[ 20.974310] [<a00000010055c8f0>] mpt_attach+0x830/0x20e0


Hmmm!  So you were in the mpt/fusion driver when you ran out
of SWIOTLB space.  That's an area where we both have the same
hardware ... and since it booted for me, it means that the
driver isn't totally broken.

I'm totally ignorant of what goes on inside this driver though.
You have more "ioc's" than I do.  I only see messages from mpt
bringing up ioc0 & ioc1.  Your boot_log also has ioc2 (which is
where you crash).  Here's the sdiff(1) output comparing the MPT
part of your boot log with my successful boot of the same kernel
and config (your log is the one on the left).  Maybe some MPT/Fusion
expert can spot something important in this bit?

-Tony

Fusion MPT base driver 3.04.05                                  Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Corporation                         Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.05                              Fusion MPT SPI Host driver 3.04.05
GSI 40 (level, low) -> CPU 3 (0x0600) vector 71               | GSI 28 (level, low) -> CPU 0 (0xc018) vector 48
ACPI: PCI Interrupt 0000:01:03.0[A] -> GSI 40 (level, low) -> | ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 28 (level, low) ->
mptbase: Initiating ioc0 bringup                                mptbase: Initiating ioc0 bringup
ioc0: LSI53C1030 C0: Capabilities={Initiator}                 | ioc0: ...

From: Jeremy Higdon
Date: Wednesday, August 22, 2007 - 4:11 pm

The more ioc's you have, the more space you will use.

jeremy
-

From: Luck, Tony
Date: Wednesday, August 22, 2007 - 4:27 pm

> The more ioc's you have, the more space you will use.

Default SW IOTLB allocation is 64MB ... how much should we see
used per ioc?

Kamalesh: You could try increasing the amount of sw iotlb space
available by booting with a swiotlb=131072 argument (argument
value is the number of 2K slabs to allocate ... 131072 would
give you four times as much space as the default allocation).

-Tony
-
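
The arithmetic behind the swiotlb= suggestion above can be sketched as follows. This is a minimal illustration, not kernel code: the 2K slab size and the 64MB default come from Tony's message, while the helper names swiotlb_bytes() and swiotlb_slabs_for_mb() are made up for this example.

```c
#include <stdint.h>

/* Slab size as quoted in the thread: each swiotlb slab is 2K. */
#define SLAB_BYTES 2048ULL

/* Hypothetical helper: bytes of bounce-buffer space for a swiotlb= slab count. */
static uint64_t swiotlb_bytes(uint64_t nslabs)
{
	return nslabs * SLAB_BYTES;
}

/* Hypothetical helper: slab count that corresponds to a pool size in megabytes. */
static uint64_t swiotlb_slabs_for_mb(uint64_t mb)
{
	return (mb << 20) / SLAB_BYTES;
}
```

Under these assumptions the 64MB default corresponds to 32768 slabs, swiotlb=131072 to 256MB (four times the default), and swiotlb=262144 to 512MB.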

From: Jeremy Higdon
Date: Wednesday, August 22, 2007 - 4:54 pm

Hmm.  Must be something else going on then.  It should be less than 1MB
per ioc plus whatever is used for streaming I/O.

| mptbase: Initiating ioc2 bringup                              | GSI 16 (level, low) -> CPU 2 (0xc418) vector 50
| ioc2: LSI53C1030 C0: Capabilities={Initiator}                 | ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) ->
| DMA: Out of SW-IOMMU space for 263200 bytes at device ?       | uhci_hcd 0000:00:1d.0: UHCI Host Controller
| Kernel panic - not syncing: DMA: Memory would be corrupted    | uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus n
-

From: Luck, Tony
Date: Wednesday, August 22, 2007 - 5:05 pm

I traced the pci_alloc_consistent calls from PrimeIocFifos on my
system.  There are two calls for each ioc.  The first is for
266368 bytes, the second for 16320 bytes.

I wonder why Kamalesh's system wants the slightly different
amount (263200 bytes) from what my system asks for?

It also looks to be a little unfriendly to swiotlb to ask for
more than 256K at a time (see IO_TLB_SEGSIZE in swiotlb.c).

-Tony
-
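
To make the 256K remark concrete, here is a rough sketch of the slab arithmetic. It assumes, as the message above suggests, that one request should fit inside a single segment of IO_TLB_SEGSIZE slabs. The value 128 for IO_TLB_SEGSIZE and the 2K slab size are the figures quoted elsewhere in this thread; slabs_needed(), fits_in_segment() and per_ioc_bytes() are names invented for the example.

```c
#include <stddef.h>

#define SLAB_BYTES     2048u  /* slab size quoted in the thread */
#define IO_TLB_SEGSIZE 128u   /* slabs per segment, as quoted in the thread */

/* Hypothetical helper: slabs needed to hold a request, rounded up. */
static size_t slabs_needed(size_t bytes)
{
	return (bytes + SLAB_BYTES - 1) / SLAB_BYTES;
}

/* Hypothetical helper: nonzero if the request fits in one 256K segment. */
static int fits_in_segment(size_t bytes)
{
	return slabs_needed(bytes) <= IO_TLB_SEGSIZE;
}

/* Hypothetical helper: total of the two PrimeIocFifos allocations for one ioc. */
static size_t per_ioc_bytes(size_t first, size_t second)
{
	return first + second;
}
```

With these numbers a 263200-byte request needs 129 slabs and a 266368-byte request needs 131, so both are just over the 128-slab segment, while the 16320-byte request needs only 8. The two allocations together come to 282688 bytes per ioc, well under the "less than 1MB per ioc" estimate.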

From: Jeremy Higdon
Date: Wednesday, August 22, 2007 - 6:09 pm

I believe those would vary a bit based on the exact firmware
rev and perhaps nvram settings.  Also driver settings, but
those are presumably the same.

jeremy
-

From: Jeremy Higdon
Date: Wednesday, August 22, 2007 - 6:16 pm

Actually, you can see that you have a different chip rev level and different
firmware revs, so that's probably why the requested sizes are a little
different.

Compare /proc/mpt/ioc0/info if you're curious.  There's probably a small
difference.

jeremy
-

From: Mel Gorman
Date: Thursday, August 23, 2007 - 2:15 am

I tried that value and, just in case, swiotlb=262144. An IA-64 machine I
have here fails with the same message anyway, i.e.

[   19.834906] mptbase: Initiating ioc1 bringup
[   20.317152] ioc1: LSI53C1030 C0: Capabilities={Initiator}
[   15.474303] scsi1 : ioc1: LSI53C1030 C0, FwRev=01032821h, Ports=1, MaxQ=222, IRQ=72
[   20.669730] GSI 142 (level, low) -> CPU 5 (0x1200) vector 73
[   20.675602] ACPI: PCI Interrupt 0000:41:03.0[A] -> GSI 142 (level, low) -> IRQ 73
[   20.683508] mptbase: Initiating ioc2 bringup
[   21.166796] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[   21.180539] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[   21.187018] Kernel panic - not syncing: DMA: Memory would be corrupted

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-

From: Yasunori Goto
Date: Thursday, August 23, 2007 - 6:27 am

I saw the same trouble on my box, and I chased down what was wrong.
Here is my progress for today.

The __get_free_pages() call in swiotlb_alloc_coherent() fails in rc3-mm1.
(See the following patch.)
But it doesn't fail on rc2-mm2, and the kernel can boot up.

Hmmm....


(2.6.23-rc3-mm1)
---
swiotlb_alloc_coherent flags=21 order=3 ret=0000000000000000
DMA: Out of SW-IOMMU space for 266368 bytes at device ?
Kernel panic - not syncing: DMA: Memory would be corrupted
---




(2.6.23-rc2-mm2)
---
swiotlb_alloc_coherent flags=21 order=3 ret=e000000020080000
           :
       (boot up continue...)

---
 lib/swiotlb.c |    2 ++
 1 file changed, 2 insertions(+)

Index: current/lib/swiotlb.c
===================================================================
--- current.orig/lib/swiotlb.c	2007-08-23 22:27:01.000000000 +0900
+++ current/lib/swiotlb.c	2007-08-23 22:29:49.000000000 +0900
@@ -455,6 +455,8 @@ swiotlb_alloc_coherent(struct device *hw
 	flags |= GFP_DMA;
 
 	ret = (void *)__get_free_pages(flags, order);
+
+	printk("%s flags=%0x order=%d ret=%p\n",__func__, flags, order, ret);
 	if (ret && address_needs_mapping(hwdev, virt_to_bus(ret))) {
 		/*
 		 * The allocated memory isn't reachable by the device.


-- 
Yasunori Goto 


-
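
As a side note on the order=3 value in the debug output above, the following sketch shows how an allocation order relates to a byte count. The page size is not stated anywhere in this thread; 64KB is an assumption chosen here only because it reproduces order=3 for the 266368-byte request. order_for() is an illustrative helper, not the kernel's get_order().

```c
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that (1 << order) pages
 * of page_bytes each can hold a request of the given size.
 */
static int order_for(size_t bytes, size_t page_bytes)
{
	size_t pages = (bytes + page_bytes - 1) / page_bytes;
	int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With 64KB pages, 266368 bytes rounds up to 5 pages and therefore order 3 (8 pages). With 16KB pages the same request would be order 5.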

From: Luck, Tony
Date: Thursday, August 23, 2007 - 10:22 am

That looks to be part of the problem here ... failing an order=3
allocation during boot on a system that just a few lines earlier
in the boot log reported "Memory: 37474000k/37680640k available"
looks bad ... but perhaps having *more* memory is part of your problem.
You may have run low on GFP_DMA memory because some allocation
scaled by memory size has chewed up a lot of your memory.  To check
this try booting with a "mem=4G" parameter and see if that helps
you.

But it is also bad that the swiotlb code failed to handle this.
Can you check whether the problem is related to the size of the
allocation being just over 256K (a magic number for swiotlb since
IO_TLB_SEGSIZE is 128 times a slab size of 2k)?  Try changing
lib/swiotlb.c to set IO_TLB_SEGSIZE to 256 instead.

-Tony
-

From: Andrew Morton
Date: Thursday, August 23, 2007 - 2:21 pm

On Thu, 23 Aug 2007 10:22:26 -0700

Others are reporting machines which fail in the memory allocator much
earlier, and which claim to have four CPUs and 16 nodes.  So something is
very wonky in the rc3-mm1 page allocator.

I guess suspicion has to be directed at the memoryless-nodes patches, but
until that's cleared up I don't think there's much to be gained from
chasing this iommu problem, now that you've worked out that it's a bogus
memory allocation failure (thanks).


-


I found find_next_best_node() was wrong.
I confirmed boot up by the following patch.
Mel-san, Kamalesh-san, could you try this?

Bye.
---

Fix the memoryless-node check in find_next_best_node().
This can be the cause of the SW-IOMMU allocation failure.

This patch is for 2.6.23-rc3-mm1.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>

---
 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: current/mm/page_alloc.c
===================================================================
--- current.orig/mm/page_alloc.c	2007-08-24 16:03:17.000000000 +0900
+++ current/mm/page_alloc.c	2007-08-24 16:04:06.000000000 +0900
@@ -2136,7 +2136,7 @@ static int find_next_best_node(int node,
 		 * Note:  N_HIGH_MEMORY state not guaranteed to be
 		 *        populated yet.
 		 */
-		if (pgdat->node_present_pages)
+		if (!pgdat->node_present_pages)
 			continue;
 
 		/* Don't want a node to appear more than once */

-- 
Yasunori Goto 


-
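
The effect of the inverted test in the patch above can be illustrated with a small user-space simulation. This is only a sketch: struct node_info and count_candidates() are hypothetical stand-ins, and the real find_next_best_node() does considerably more (distance and load weighting) than is shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-node data the kernel keeps. */
struct node_info {
	unsigned long present_pages;
};

/*
 * Count the nodes that survive the "continue" test, and how many of
 * those actually have memory. buggy != 0 mimics the rc3-mm1 test
 * (skip nodes that HAVE pages); buggy == 0 mimics the fixed test
 * (skip nodes that have NO pages).
 */
static size_t count_candidates(const struct node_info *nodes, size_t n,
			       int buggy, size_t *with_memory)
{
	size_t kept = 0;

	*with_memory = 0;
	for (size_t i = 0; i < n; i++) {
		int has_memory = nodes[i].present_pages != 0;
		int skip = buggy ? has_memory : !has_memory;

		if (skip)
			continue;
		kept++;
		if (has_memory)
			(*with_memory)++;
	}
	return kept;
}
```

For a machine with two populated and two memoryless nodes, the buggy test keeps only the two memoryless nodes as fallback candidates, while the fixed test keeps only the two populated ones.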


This boots the IA-64 successfully and gets rid of that "DMA: Memory would
be corrupted" message. As a bonus, it fixes up the memoryless nodes (the bug
where Total pages == 0 and there is a BUG in page_alloc.c) by building
zonelists properly. The machine still fails to boot with the more familiar
net/core/skbuff.c:95 but that is a separate problem.

Well spotted Yasunori-san.

Andrew, this fixes a real problem and should be considered a fix to
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch unless


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-


I reworked that patch and posted the update on 16aug which does not have
this problem:

http://marc.info/?l=linux-mm&m=118729871101418&w=4

This should replace
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch
in -mm.

Lee


-


Could you post a diff to rc3-mm1 of that patch?

-


Sure.  Here it is.  This looks nicer to me than explicitly skipping
unpopulated nodes in find_next_best_node()--as I tried to do, but
botched it :-(.  I didn't notice that because I'd moved on to v2 before
testing with any significant load.  Even when I was running with v1 with
botched zonelists, I apparently had sufficient memory on each node that
I never had to fall back.

I also didn't notice that Andrew had added v1 instead of v2 to the mm
tree.  Will pay more attention in the future, I promise.

Lee
---------------------------
PATCH Diffs between "Fix generic usage of node_online_map" V1 & V2

Against 2.6.23-rc3-mm1

V1 -> V2:
+ moved population of N_HIGH_MEMORY node state mask to
  free_area_init_nodes(), as this is called before we
  build zonelists.  So, we can use this mask in 
  find_next_best_node.  Still need to keep the duplicate
  code in early_calculate_totalpages() for zone movable
  setup.

mm/page_alloc.c:find_next_best_node()

	visit only nodes with memory [N_HIGH_MEMORY mask]
	looking for next best node for fallback zonelists.

mm/page_alloc.c:find_zone_movable_pfns_for_nodes()

	spread kernelcore over nodes with memory.

	This required calling early_calculate_totalpages()
	unconditionally, and populating N_HIGH_MEMORY node
	state therein from nodes in the early_node_map[].
	This duplicates the code in free_area_init_nodes(), but
	I don't want to depend on this copy if ZONE_MOVABLE 
	might go away, taking early_calculate_totalpages()
	with it.

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/page_alloc.c |   48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

Index: Linux/mm/page_alloc.c
===================================================================
--- Linux.orig/mm/page_alloc.c	2007-08-24 13:20:28.000000000 -0400
+++ Linux/mm/page_alloc.c	2007-08-24 13:25:20.000000000 -0400
@@ -2127,18 +2127,10 @@ static int find_next_best_node(int node,
 		return node;
 	}
 ...
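
The V2 approach described in the changelog above (populate a mask of nodes with memory first, then visit only those nodes) might look roughly like this in user space. Everything here is illustrative: nodemask64, mask_nodes_with_memory() and next_node_with_memory() are invented for the example and are not the kernel's nodemask API or the N_HIGH_MEMORY implementation.

```c
#include <stdint.h>

#define MAX_NODES 64

/* Hypothetical stand-in for a node state mask: one bit per node id. */
typedef uint64_t nodemask64;

/* Build the mask up front from per-node page counts. */
static nodemask64 mask_nodes_with_memory(const unsigned long *present_pages,
					 int nr_nodes)
{
	nodemask64 mask = 0;

	for (int nid = 0; nid < nr_nodes && nid < MAX_NODES; nid++)
		if (present_pages[nid])
			mask |= (nodemask64)1 << nid;
	return mask;
}

/* Return the first node id >= start that is set in the mask, or -1. */
static int next_node_with_memory(nodemask64 mask, int start)
{
	for (int nid = start; nid >= 0 && nid < MAX_NODES; nid++)
		if (mask & ((nodemask64)1 << nid))
			return nid;
	return -1;
}
```

A caller that walks the mask this way never sees a memoryless node, so no per-node "has pages?" test (and no chance to invert it) is needed inside the loop.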

Ahh. Yes. I remember some of that.

Acked-by: Christoph Lameter <clameter@sgi.com>

-


Right. Let's make sure to cc Lee on future discussions of the memoryless
node patchset.

Acked-by: Christoph Lameter <clameter@sgi.com>
-


This patch resolves the kernel panic problem.

-
Kamalesh Babulal.
-


FYI: This patch also allows the alloc-instantiate-race testcase in
libhugetlbfs to pass again :)

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

-

From: Kamalesh Babulal
Date: Thursday, August 23, 2007 - 2:22 am

boot log after passing boot parameter swiotlb=131072

[    0.000000]     0:    32768 ->   131072
[    0.000000]     0:   262144 ->  1703936
[    0.000000]     1:  4194304 ->  4980736
[    0.000000] Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 1563903
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=scsi0:/EFI/debian/boot/vmlinuz-autobench root=/dev/sda2 console=tty0 console=ttyS0,115200n8 ro autobench_args: root=/dev/sda2 ABAT:1187857488 profile=2 swiotlb=131072
[    0.000000] kernel profiling enabled (shift: 2)
<<snip>>
[   20.408360] mptbase: Initiating ioc2 bringup
[   20.892659] ioc2: LSI53C1030 C0: Capabilities={Initiator}
[   20.902432] DMA: Out of SW-IOMMU space for 263200 bytes at device ?
[   20.908992]
[   20.908993] Call Trace:
[   20.913324]  [<a0000001000144e0>] show_stack+0x80/0xa0
[   20.913327]                                 sp=e00000014322f8f0 bsp=e000000143229170
[   20.926764]  [<a000000100014530>] dump_stack+0x30/0x60
[   20.926766]                                 sp=e00000014322fac0 bsp=e000000143229158
[   20.940225]  [<a0000001003aaa50>] swiotlb_full+0x50/0x120
[   20.940227]                                 sp=e00000014322fac0 bsp=e000000143229120
[   20.953915]  [<a0000001003aac40>] swiotlb_map_single+0x120/0x1c0
[   20.953916]                                 sp=e00000014322fac0 bsp=e0000001432290d8
[   20.968223]  [<a0000001003ab630>] swiotlb_alloc_coherent+0x150/0x240
[   20.968226]                                 sp=e00000014322fac0 bsp=e000000143229090
[   20.982919]  [<a000000100550860>] PrimeIocFifos+0x4c0/0xb20
[   20.982922]                                 sp=e00000014322fac0 bsp=e000000143229010
[   20.996801]  [<a000000100556a80>] mpt_do_ioc_recovery+0xd60/0x28e0
[   20.996803]                                 sp=e00000014322faf0 bsp=e000000143228f30
[   21.011292]  [<a00000010055c8f0>] mpt_attach+0x830/0x20e0
[   21.011293]                                 ...