Just got into quite a bad situation on a production server here. The machine locked up hard several times in a row (each time requiring a hard reboot). So I finally enabled the watchdog subsystem, which helped. Now I see the following (over netconsole):

DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:08:07.0
------------[ cut here ]------------
kernel BUG at drivers/scsi/aic7xxx/aic79xx_osm.c:1490!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: xfs netconsole nfsd lockd nfs_acl sunrpc exportfs autofs4 iTCO_wdt iTCO_vendor_support raid10 raid0 sr_mod cdrom ata_piix libata tg3 mptspi mptscsih mptbase ext3 jbd mbcache raid1 md_mod sd_mod aic79xx scsi_transport_spi scsi_mod
Pid: 2176, comm: gzip Not tainted 2.6.24-x86-64 #2.6.24.2
RIP: 0010:[<ffffffff8805053a>]  [<ffffffff8805053a>] :aic79xx:ahd_linux_queue+0x58a/0x590
RSP: 0000:ffffffff80511d40  EFLAGS: 00010082
RAX: 00000000fffffff4 RBX: ffff81018c331600 RCX: 00000000fffffff4
RDX: ffff8100063660e0 RSI: 0000000000000002 RDI: ffffffff804a2150
RBP: ffff8101a9029e40 R08: 0000000000000044 R09: 0000000000000000
R10: 00000000fffffff4 R11: ffffffff80222d80 R12: ffff8101aff8d418
R13: ffff8101aeea7000 R14: ffff8101aef50000 R15: ffff8101aeea78b4
FS: 0000000000000000(0000) GS:ffffffff804b7000(0063) knlGS:00000000f7de56b0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000008065000 CR3: 00000001adbb8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gzip (pid: 2176, threadinfo ffff8101a9270000, task ffff8101a91b2000)
Stack: ffff8101aff8d000 0000000000000083 0000000000000220 ffffffff80245435
ffff81014ec656c0 0000000000000293 ffff8101aff8d000 ffff81018c331600
ffff8101aef48800 ffff81018c331600 ffff8101aff8d048 ffffffff8800100c
Call Trace:
<IRQ> [<ffffffff80245435>] __mod_timer+0xb5/0xd0
[<ffffffff8800100c>] :scsi_mod:scsi_dispatch_cmd+0x17c/0x2e0
[<ffffffff88007db5>] :scsi_mod:scsi_request_fn+0x225/0x3d0
...
Forgot the most important information:

# uname -a
Linux tbus90.msk.rgs-podm.ru 2.6.24-x86-64 #2.6.24.2 SMP Mon Feb 18 16:04:41 MSK 2008 x86_64 GNU/Linux

It's mostly vanilla 2.6.24.2, with a few irrelevant patches such as unionfs.
--
On Sun, 09 Mar 2008 14:23:13 +0300

It seems that you ran out of swiommu space (and aic79xx can't handle that, though it should). This happened because either:

a) you produced more I/O than the swiommu can handle, or
b) swiommu space leaks due to bugs.

If you hit this problem because of a), the following boot option might help:

swiotlb=65536

Did the same machine run well with older kernels? If so, 2.6.24 probably has new bugs that lead to a swiommu space leak.
--
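For a sense of scale, here is a small arithmetic sketch of what swiotlb=65536 asks for. It assumes, as in lib/swiotlb.c of this era, that the bounce pool is carved into 2 KiB slabs (IO_TLB_SHIFT = 11) and that the boot option counts slabs rather than bytes. The macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 2 KiB swiotlb slabs, i.e. IO_TLB_SHIFT == 11. */
#define SKETCH_IO_TLB_SHIFT 11

/* Bytes of bounce-buffer space for a swiotlb=<nslabs> boot option. */
static uint64_t swiotlb_pool_bytes(uint64_t nslabs)
{
	return nslabs << SKETCH_IO_TLB_SHIFT;
}

/* Slabs needed to bounce one mapping of `len` bytes (rounded up). */
static uint64_t swiotlb_slabs_for(uint64_t len)
{
	const uint64_t slab = UINT64_C(1) << SKETCH_IO_TLB_SHIFT;

	return (len + slab - 1) / slab;
}
```

Under those assumptions, swiotlb=65536 gives a 128 MiB pool, and the 65536-byte mapping that failed in the oops needs 32 slabs. A bigger pool only helps with case a); a genuine leak would eventually exhaust a pool of any size.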
Well, this makes little sense, right? I mean, if normal filesystem I/O produces more I/O requests than the machine can handle, it means the kernel is broken. It shouldn't let the queue grow without bounds. The hardware is quite capable: a 14-drive raid10 array works...

If it's b), that would have to be quite a huge leak, as it happens almost immediately.

Just tried this option. Gzip has been working for 15 minutes already; previously the system hung within the first minute, usually the first...

It's difficult to say whether it was OK with older kernels. I'll try anyway. The thing is that this very workload is new for this machine. Once upon a time it hung in a very similar way, but we had no time to debug the issue and just ignored it, hoping for the best.

By the way, is there something to look at for swiommu space leaks, like slabinfo for example?

Thanks!
/mjt
--
Actually, it's worse than this. The aic79xx is a fully 64-bit capable
PCI card, so it shouldn't be using the IOMMU at all. However, it has three
DMA modes (64-bit, 39-bit and 32-bit), and their resource cost increases
with the number of address bits. It uses special APIs to size the mask
according to the installed memory; in aic79xx_osm_pci.c:
if (sizeof(dma_addr_t) > 4) {
	const u64 required_mask = dma_get_required_mask(dev);
	if (required_mask > DMA_39BIT_MASK &&
	    dma_set_mask(dev, DMA_64BIT_MASK) == 0)
		ahd->flags |= AHD_64BIT_ADDRESSING;
	else if (required_mask > DMA_32BIT_MASK &&
		 dma_set_mask(dev, DMA_39BIT_MASK) == 0)
		ahd->flags |= AHD_39BIT_ADDRESSING;
	else
		dma_set_mask(dev, DMA_32BIT_MASK);
} else {
	dma_set_mask(dev, DMA_32BIT_MASK);
}
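The branch above depends only on required_mask. Here is a userspace sketch of the same decision, assuming dma_set_mask() always succeeds (the real code falls through to a narrower mask when it fails). pick_addr_mode, enum addr_mode and the MASK_* macros are hypothetical stand-ins; the masks carry the same values as the kernel's DMA_32BIT_MASK and DMA_39BIT_MASK.

```c
#include <stdint.h>

/* Same values as the kernel's DMA_32BIT_MASK / DMA_39BIT_MASK. */
#define MASK_32 UINT64_C(0x00000000ffffffff)
#define MASK_39 UINT64_C(0x0000007fffffffff)

enum addr_mode { ADDR_32BIT = 32, ADDR_39BIT = 39, ADDR_64BIT = 64 };

/* Mirrors the if/else chain, assuming every dma_set_mask() succeeds. */
static enum addr_mode pick_addr_mode(uint64_t required_mask)
{
	if (required_mask > MASK_39)
		return ADDR_64BIT;
	if (required_mask > MASK_32)
		return ADDR_39BIT;
	return ADDR_32BIT;
}
```

So a required mask of 0xffffffff selects 32-bit addressing and 0x1ffffffff selects 39-bit addressing; whatever dma_get_required_mask() returns decides the mode.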
Could you first tell me how much memory you have, and second,
instrument this code with the patch below so we can work out what
it's doing?
Thanks,
James
---
diff --git a/drivers/scsi/aic7xxx/aic79xx_osm_pci.c b/drivers/scsi/aic7xxx/aic79xx_osm_pci.c
index dfaaae5..d6e46ce 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm_pci.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm_pci.c
@@ -194,14 +194,21 @@ ahd_linux_pci_dev_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (sizeof(dma_addr_t) > 4) {
 		const u64 required_mask = dma_get_required_mask(dev);
+		printk("DEBUG: RETURNED REQUIRED MASK %llx\n",
+		       (unsigned long long)required_mask);
+
 		if (required_mask > DMA_39BIT_MASK &&
-		    dma_set_mask(dev, DMA_64BIT_MASK) == 0)
+		    dma_set_mask(dev, DMA_64BIT_MASK) == 0) {
+			printk("DEBUG: SET 64 BIT ADDRESSING\n");
 			ahd->flags |= AHD_64BIT_ADDRESSING;
-		else if (required_mask > DMA_32BIT_MASK &&
-			 dma_set_mask(dev, DMA_39BIT_MASK) == 0)
+		} else if (required_mask > DMA_32BIT_MASK &&
+			   dma_set_mask(dev, DMA_39BIT_MASK) == 0) {
+			printk("DEBUG: SET 39 BIT ADDRESSING\n");
 			ahd->flags |= AHD_39BIT_ADDRESSING;
-		else
+		} else ...

Actually, the amount alone is insufficient; I need to know where it is. In dmesg you should see something like this on bootup:

BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000097800 (usable)
BIOS-e820: 0000000000097800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fee0000 (usable)
BIOS-e820: 000000003fee0000 - 000000003fee9000 (ACPI data)
BIOS-e820: 000000003fee9000 - 000000003ff00000 (ACPI NVS)
BIOS-e820: 000000003ff00000 - 0000000040000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)

If you could attach your version, that would be great.

Thanks,
James
--
The memory map is below (6 GB total). As for the patch, the kernel is being compiled right now.

Linux version 2.6.24-x86-64 (mjt@paltus.tls.msk.ru) (gcc version 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)) #2.6.24.2 SMP Mon Feb 18 16:04:41 MSK 2008
Command line: auto BOOT_IMAGE=linux-test ro root=100 swiotlb=65536 panic=30 elevator=deadline
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cfffca00 (usable)
BIOS-e820: 00000000cfffca00 - 00000000d0000000 (ACPI data)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000001b0000000 (usable)

/mjt
--
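A quick check of that map, as a sketch: summing the usable e820 ranges and finding the top of RAM shows why a 32-bit DMA mask cannot reach all of it. The struct and function names are illustrative only, and the real kernel derives its mask from max_pfn rather than reading the e820 table this way.

```c
#include <stddef.h>
#include <stdint.h>

/* One usable e820 entry; `end` is exclusive, as printed above. */
struct e820_usable {
	uint64_t start;
	uint64_t end;
};

/* The three usable ranges from the map above. */
static const struct e820_usable usable_map[] = {
	{ UINT64_C(0x0000000000000000), UINT64_C(0x000000000009dc00) },
	{ UINT64_C(0x0000000000100000), UINT64_C(0x00000000cfffca00) },
	{ UINT64_C(0x0000000100000000), UINT64_C(0x00000001b0000000) },
};
#define USABLE_COUNT (sizeof(usable_map) / sizeof(usable_map[0]))

/* Total bytes of usable RAM. */
static uint64_t total_usable(const struct e820_usable *r, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += r[i].end - r[i].start;
	return sum;
}

/* Highest (exclusive) end address of any usable range. */
static uint64_t top_of_ram(const struct e820_usable *r, size_t n)
{
	uint64_t top = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].end > top)
			top = r[i].end;
	return top;
}

/* Smallest all-ones mask covering every address below `top`. */
static uint64_t covering_mask(uint64_t top)
{
	uint64_t mask = 0;

	if (top == 0)
		return 0;
	while (mask < top - 1)
		mask = (mask << 1) | 1;
	return mask;
}
```

The usable ranges add up to 6442034688 bytes (just under 6 GiB), RAM tops out at 0x1b0000000, and the smallest mask covering it is 0x1ffffffff, i.e. 33 address bits. Roughly 2.75 GiB of RAM sits above 4 GiB and is unreachable with a 32-bit mask.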
And here's the result (without swiotlb=65536):

DEBUG: RETURNED REQUIRED MASK ffffffff
DEBUG: SET 32 BIT ADDRESSING

(which doesn't look like a good thing, given that this machine has 6 GB of memory...)

It just crashed again with the same message, a few seconds after I
--
That's the root cause then. There's a bug in the generic implementation of dma_get_required_mask(), a fix for which is below, if you could try it (still with the debugging patches to make sure it's working).

James
---
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index efaf282..911ec60 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -648,7 +648,7 @@ u64 dma_get_required_mask(struct device *dev)
 		high_totalram += high_totalram - 1;
 		mask = (((u64)high_totalram) << 32) + 0xffffffff;
 	}
-	return mask & *dev->dma_mask;
+	return mask;
 }
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 #endif
--
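To see why that one-line change matters, here is a userspace sketch of the generic computation. Only the high-memory branch and the return statement appear in the diff above; the rest (the low-memory branch and the fls()-based rounding) is reconstructed from memory of the 2.6.24 source and should be treated as an assumption. The dev_dma_mask parameter stands in for *dev->dma_mask, which the PCI core initialises to 32 bits and which the driver has not yet raised when it asks for the required mask, so the old clamp can never report more than 32 bits. All names here are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86-64 */

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int sketch_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Sketch of the generic dma_get_required_mask() logic. Assumes
 * max_pfn > 1. apply_old_clamp != 0 reproduces the pre-fix
 * "mask & *dev->dma_mask"; 0 reproduces the fixed "return mask".
 */
static uint64_t sketch_required_mask(uint64_t max_pfn, uint64_t dev_dma_mask,
				     int apply_old_clamp)
{
	/* Truncation to 32 bits mirrors the u32 locals in the kernel. */
	uint32_t low_totalram = (uint32_t)((max_pfn - 1) << SKETCH_PAGE_SHIFT);
	uint32_t high_totalram =
		(uint32_t)((max_pfn - 1) >> (32 - SKETCH_PAGE_SHIFT));
	uint64_t mask;

	if (!high_totalram) {
		/* All RAM below 4 GiB: round up to an all-ones mask. */
		low_totalram = UINT32_C(1) << (sketch_fls(low_totalram) - 1);
		low_totalram += low_totalram - 1;
		mask = low_totalram;
	} else {
		/* RAM above 4 GiB: round the upper 32 bits the same way. */
		high_totalram = UINT32_C(1) << (sketch_fls(high_totalram) - 1);
		high_totalram += high_totalram - 1;
		mask = ((uint64_t)high_totalram << 32) + 0xffffffffu;
	}
	return apply_old_clamp ? (mask & dev_dma_mask) : mask;
}
```

With max_pfn = 0x1b0000 (top of RAM at 0x1b0000000 with 4 KiB pages) and a 32-bit device mask, the clamped version returns 0xffffffff, matching the debug output above, while the fixed version returns 0x1ffffffff. The driver then picks 39-bit addressing instead of bouncing everything above 4 GiB through the swiotlb pool.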
James Bottomley wrote:

With the 2 patches applied:

DEBUG: RETURNED REQUIRED MASK 1ffffffff
DEBUG: SET 39 BIT ADDRESSING

I'm running the tests now, but for some reason I think it will be OK... ;)

Thanks!
/mjt
--
