Hi,
I'm running (uname -a):
Linux porpoise 2.6.24.3 #11 Mon Mar 17 17:24:30 CDT 2008 ppc64
PPC970FX, altivec supported RackMac3,1 GNU/Linux
on an Apple Xserve G5. With the following fibre channel card (lspci
-vvv, it has two fibre connections):
0001:06:03.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
Channel Adapter (rev 81)
Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 53
Region 0: I/O ports at 0400 [size=256]
Region 1: Memory at 90030000 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at 90020000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at 90200000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [68] PCI-X non-bridge device
Command: DPERE- ERO- RBC=2048 OST=8
Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple
DMMRBC=2048 DMOST=8 DMCRS=64 RSCEM- 266MHz- 533MHz-
0001:06:03.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
Channel Adapter (rev 81)
Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 53
Region 0: I/O ports at <unassigned> [disabled]
Region 1: Memory at 90010000 (64-bit, non-prefetchable)
[disabled] [size=64K]
Region 3: ...

And that's a totally, wildly different part of the kernel. And it's a
code-path which millions of machines run all the time.

Did any earlier kernel work OK? If so, which? 2.6.23?

Thanks.
--
Fixing up Eric's email address. It's no longer "lsil.com".

Does the exception happen every time?
--
Yes, and I've now tested older kernels:
Linux porpoise 2.6.20 #4 Wed Mar 19 15:32:32 CDT 2008 ppc64 PPC970FX,
altivec supported RackMac3,1 GNU/Linux
...and put another very similar LSI FC card into the box that is known
to be working (the card in the previous email was also known to be
working, but just to be sure...):
0001:06:03.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
Channel Adapter (rev 81)
Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 53
Region 0: I/O ports at 0400 [disabled] [size=256]
Region 1: Memory at 90030000 (64-bit, non-prefetchable)
[disabled] [size=64K]
Region 3: Memory at 90020000 (64-bit, non-prefetchable)
[disabled] [size=64K]
Expansion ROM at 90200000 [disabled] [size=1M]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [68] PCI-X non-bridge device
Command: DPERE- ERO- RBC=2048 OST=8
Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple
DMMRBC=2048 DMOST=8 DMCRS=64 RSCEM- 266MHz- 533MHz-
0001:06:03.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
Channel Adapter (rev 81)
Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 53
Region 0: I/O ports at <unassigned> ...

Well, I just got the kernel to boot all the way with both 2.6.24.3 and
2.6.20, accidentally, by changing the FC host-side settings on all the
target Infortrend EonStor RAID boxes from loop to point-to-point.

All the targets and initiators are connected to a QLogic 5600 FC switch,
and zones have been set up such that the initiators can see only the
targets. This active zoning has not changed during testing.

At the time I was actually testing the EonStor boxes outside the FC
switch by connecting them directly to another Apple Xserve running OS X,
which uses the same LSI FC HBA that I reported in my first post about
this issue. After confirming that each target (LUN) could be seen by the
LSI FC HBA in the OS X Xserve, I reconnected the targets to the switch.
The Linux Xserve had been endlessly rebooting itself after each kernel
panic, so to my surprise I found that it had completed booting! All
targets and LUNs can be seen under "cat /proc/scsi/scsi".

To confirm that the loop host-side setting was causing the problem, I
set one of the EonStor boxes back to loop and reset it. Immediately
after the EonStor came up, it caused a kernel panic on the Linux Xserve
(which was already booted into the OS).
This was the output on the console:

porpoise login: Unrecoverable FP Unavailable Exception 800 at d000000000073da0
Oops: Unrecoverable FP Unavailable Exception, sig: 6 [#1]
Modules linked in: mptfc mptscsih mptbase
NIP: D000000000073DA0 LR: D0000000000675BC CTR: D000000000073DA0
REGS: c00000000076b4e0 TRAP: 0800 Not tainted (2.6.20)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 48004048 XER: 00000000
TASK = c000000000671420[0] 'swapper' THREAD: c000000000768000
GPR00: D000000000073DA0 C00000000076B760 D0000000000803D0 C00000000FA46000
GPR04: C000000001B81310 0000000000000053 C00000000076B833 9000000000049032
GPR08: C000000001B82800 D0000000000782F8 0000000000000000 0000000000000000
GPR12: D00000000006A2F0 C000000000671C80 000000000023FB28 C00000000058C6D8
GPR16: C000000000664DB0 ...
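
For anyone reading the oops above: the MSR value can be decoded by hand to see which machine-state bits were set when the exception hit. The following is only a rough sketch. The bit table is a partial one written from my understanding of the 64-bit PowerPC MSR layout (bit 0 being the least significant bit), and both it and the helper name decode_msr are assumptions for illustration, so please check them against the kernel's own register definitions before relying on them.

```python
# Partial table of 64-bit PowerPC MSR bits (bit 0 = least significant).
# Assumption: written from memory of the architecture; verify before use.
MSR_BITS = {
    63: "SF",   # 64-bit mode
    60: "HV",   # hypervisor state
    25: "VEC",  # AltiVec available
    18: "POW",  # power management enable
    15: "EE",   # external interrupts enabled
    14: "PR",   # problem (user) state
    13: "FP",   # floating point available
    12: "ME",   # machine check enable
    11: "FE0",
    10: "SE",
    9: "BE",
    8: "FE1",
    5: "IR",    # instruction relocation
    4: "DR",    # data relocation
    2: "PMM",
    1: "RI",    # recoverable interrupt
    0: "LE",    # little-endian mode
}


def decode_msr(value):
    """Return the names of the known MSR bits set in value, highest bit first."""
    return [
        name
        for bit, name in sorted(MSR_BITS.items(), reverse=True)
        if (value >> bit) & 1
    ]


msr = 0x9000000000009032  # MSR value copied from the oops above
flags = decode_msr(msr)
print(flags)
print("FP enabled:", "FP" in flags)
```

With this table the FP bit comes out clear, which at least matches the "FP Unavailable" trap being taken in kernel context. It says nothing about why floating-point code was reached in the first place.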
Hi all,

After going through lots of hardware (HBAs, motherboards, PSUs) it was
determined that there was a problem with the PCI-X riser in this
particular system. Sorry about the false alarm but this one was
difficult to track down.

Thanks,
Sabuj
--
