Hi,

I have an ALiveNF5-eSATA2+, an AM2 board with a dual-core AMD64 CPU. The problem exists with the latest BIOS 1.90 and its predecessor 1.80.

This board with the nForce 520 chipset (MCP65) has three possible settings for the SATA controller:
non-raid
raid
ahci

My default is non-raid.

With 2.6.24-based kernels I get this on cold boot:
[ 49.176044] Driver 'sd' needs updating - please use bus_type methods
[ 49.176117] ahci 0000:00:0a.0: version 3.0
[ 49.176291] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 49.176335] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 50.176871] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 50.176922] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 50.176962] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 50.177163] scsi0 : ahci
[ 50.177259] scsi1 : ahci
[ 50.177325] scsi2 : ahci
[ 50.177392] scsi3 : ahci
[ 50.177480] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 50.177528] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 50.177575] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 50.177623] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 50.808824] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 80.743754] ata1.00: qc timeout (cmd 0xec)
[ 80.743792] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 80.743831] ata1: failed to recover some devices, retrying in 5 secs
[ 86.368194] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 86.368803] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 86.368846] ata1.00: 312581808 sectors, multi 16: LBA48
[ 86.369491] ata1.00: configured for UDMA/133
[ 87.000153] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 87.020703] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ ...
Is this consistent? Do you always get the IDENTIFY timeout on cold boots?

This one definitely looks like a misrouted IRQ. irqpoll should help.

This failure is the same for 2.6.23 and 24, right?

--
tejun
irqpoll doesn't make any difference on cold boot or reboot (just adding irqpoll to the kernel boot line, right?).

The attached patch does not change anything (except this:
ahci 0000:00:0a.0: controller can't do PMP, turning off CAP_PMP
as you can see in the dmesg parts below).

Here are several dmesg parts:

dmesg part of cold boot with irqpoll but no patch:
[ 28.256323] Driver 'sd' needs updating - please use bus_type methods
[ 28.256396] ahci 0000:00:0a.0: version 3.0
[ 28.256570] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 28.256614] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 29.257150] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 29.257202] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 29.257241] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 29.257442] scsi0 : ahci
[ 29.257538] scsi1 : ahci
[ 29.257604] scsi2 : ahci
[ 29.257671] scsi3 : ahci
[ 29.257760] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 29.257808] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 29.257855] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 29.257903] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 29.889103] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 59.824033] ata1.00: qc timeout (cmd 0xec)
[ 59.824072] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 59.824111] ata1: failed to recover some devices, retrying in 5 secs
[ 65.448472] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 65.449133] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 65.449176] ata1.00: 312581808 sectors, multi 16: LBA48
[ 65.449814] ata1.00: configured for UDMA/133
[ 66.080432] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 66.083950] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 66.083989] ...
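
The boot delay in the logs above can be read off the timestamps: the time between "SATA link up" and the following "qc timeout (cmd 0xec)" on the same port. As a rough sketch (the function name `identify_stalls` and the sample are illustrative only, not part of any kernel tooling), a few lines of Python can pull that number out of a saved dmesg:

```python
import re

# Matches kernel log lines such as "[ 29.889103] ata1: SATA link up ..."
# and "[ 59.824033] ata1.00: qc timeout (cmd 0xec)".
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?: (.*)")


def identify_stalls(dmesg_text):
    """Return (port, seconds) for each IDENTIFY timeout found in the log.

    The stall is measured from the most recent "SATA link up" message
    on the same port to the "qc timeout (cmd 0xec)" message.
    """
    link_up = {}
    stalls = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        stamp, port, msg = float(m.group(1)), m.group(2), m.group(3)
        if msg.startswith("SATA link up"):
            link_up[port] = stamp
        elif msg.startswith("qc timeout (cmd 0xec)") and port in link_up:
            stalls.append((port, round(stamp - link_up[port], 3)))
    return stalls


# Four lines taken from the cold-boot log above.
SAMPLE = """\
[ 29.889103] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 59.824033] ata1.00: qc timeout (cmd 0xec)
[ 59.824072] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 65.448472] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""

for port, secs in identify_stalls(SAMPLE):
    print(f"{port}: IDENTIFY stalled for about {secs} s")
```

On the sample this reports a stall of roughly 30 seconds on ata1, which matches the gap visible in the log.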
Hi,
I tried three more kernels.
2.6.25-rc5
2.6.23.17
and
2.6.23.11
and I get the same result with all three of them, which is puzzling for me,
because 2.6.23.11 was once OK. First came the timeouts, and only then the
hardware changes (PATA hard disk replaced with SATA hard disk).
this is with 2.6.25-rc5 on a coldboot:
[ 1.360551] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 1.360551] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 1.360551] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 1.360553] scsi0 : ahci
[ 1.360567] scsi1 : ahci
[ 1.360578] scsi2 : ahci
[ 1.360591] scsi3 : ahci
[ 1.360597] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 1.360597] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 1.360597] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 1.360597] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 1.794389] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 31.794187] ata1.00: qc timeout (cmd 0xec)
[ 31.794187] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 31.794187] ata1: failed to recover some devices, retrying in 5 secs
[ 37.431961] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 37.432460] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 37.432460] ata1.00: 312581808 sectors, multi 16: LBA48
[ 37.432990] ata1.00: configured for UDMA/133
[ 38.064932] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 38.085423] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 38.085423] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 38.087332] ata2.00: configured for UDMA/133
[ 38.404590] ata3: SATA link down (SStatus 0 SControl 300)
[ 38.724459] ata4: SATA link down (SStatus 0 SControl 300)
[ 38.923769] scsi 0:0:0:0: Direct-Access ...

Hi,

I tried some more stuff, replaced the cables, played with BIOS settings. No change.

Then I updated to 2.6.24.3 - and no hangs or 'softreset' failures anymore.

[ 38.151334] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 38.151386] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 38.151425] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 38.151626] scsi0 : ahci
[ 38.151722] scsi1 : ahci
[ 38.151788] scsi2 : ahci
[ 38.151853] scsi3 : ahci
[ 38.151942] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 38.151990] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 38.152037] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 38.152085] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 38.783287] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 38.783915] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 38.783957] ata1.00: 312581808 sectors, multi 16: LBA48
[ 38.784590] ata1.00: configured for UDMA/133
[ 39.415249] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 39.435823] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 39.435862] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 39.437862] ata2.00: configured for UDMA/133
[ 39.754512] ata3: SATA link down (SStatus 0 SControl 300)
[ 40.073818] ata4: SATA link down (SStatus 0 SControl 300)
[ 40.073907] scsi 0:0:0:0: Direct-Access ATA WDC WD1600JS-00M 10.0 PQ: 0 ANSI: 5
[ 40.074017] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 40.074061] sd 0:0:0:0: [sda] Write Protect is off
[ 40.074099] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 40.074107] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 40.074178] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors ...
Hello,

I don't see any libata changes which can cause such a difference. Weird.

Does pci=nomsi help?

--
tejun
--
It was, for a little bit more than 24h. I booted and rebooted several times to make sure - and everything was fine, but after a good night and on the Xth boot, the hang occurred again - and since then it is there. Reliably, on every boot :(
(and the softreset failed message on reboots).

Of course, I booted and rebooted several times. And it stays.

Maybe it is the hardware. But I replaced the cables already and SMART says the disk is OK.

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%             8124  -
# 2  Short offline       Completed without error        00%             8067  -
# 3  Short offline       Completed without error        00%             3402  -
# 4  Extended offline    Completed without error        00%             3374  -

oh yes! It does.

I changed the 'Sata operation mode' setting from 'non raid' to AHCI, booted with that option:

[ 35.026629] Driver 'sd' needs updating - please use bus_type methods
[ 35.026702] ahci 0000:00:0a.0: version 3.0
[ 35.026877] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 35.026922] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 36.029726] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 36.029777] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 36.029817] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 36.030019] scsi0 : ahci
[ 36.030114] scsi1 : ahci
[ 36.030180] scsi2 : ahci
[ 36.030245] scsi3 : ahci
[ 36.030333] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 23
[ 36.030381] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 23
[ 36.030428] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port ...
Yeah, that sounds about right. Hmm... Can you post the result of "lspci -nn"? -- tejun
patching file drivers/ata/ahci.c
Hunk #1 succeeded at 397 (offset -5 lines).
Hunk #2 succeeded at 1819 (offset -45 lines).

Built and booted, same results as always.

With the non-raid setting, IDENTIFY error on cold boot and softreset error on reboot (with patch, without pci=nomsi):

cold boot:
[ 35.397857] ahci 0000:00:0a.0: controller can't do PMP, turning off CAP_PMP
[ 36.398054] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 36.398102] ahci 0000:00:0a.0: flags: 64bit sntf led clo pio
[ 36.398142] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 36.398343] scsi0 : ahci
[ 36.398439] scsi1 : ahci
[ 36.398505] scsi2 : ahci
[ 36.398570] scsi3 : ahci
[ 36.398659] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 36.398707] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 36.398754] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 36.398801] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 36.870357] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 66.805287] ata1.00: qc timeout (cmd 0xec)
[ 66.805326] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 66.805365] ata1: failed to recover some devices, retrying in 5 secs
[ 72.270071] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 72.270674] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 72.270716] ata1.00: 312581808 sectors, multi 16: LBA48
[ 72.271355] ata1.00: configured for UDMA/133
[ 72.742378] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 72.762946] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 72.762986] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 72.764976] ata2.00: configured for UDMA/133
[ 73.074987] ata3: SATA link down (SStatus 0 SControl 300)
[ 73.387641] ata4: SATA link down (SStatus 0 SControl 300)
[ 73.387729] scsi 0:0:0:0: Direct-Access ATA WDC ...
It's okay. It's because I was lazy and generated the patch against ...

Oh... I see. I made a mistake. In ahci.c, what I intended was making ahci_softreset NULL, not ahci_hardreset. Can you please change that and test again?

--
tejun
--
Hi,

I tried your patch (on freshly unpacked sources).

non-raid setting in BIOS, cold boot:
[ 37.123980] ahci 0000:00:0a.0: version 3.0
[ 37.124155] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 37.124199] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 37.124533] ahci 0000:00:0a.0: controller can't do PMP, turning off CAP_PMP
[ 38.124731] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl IDE mode
[ 38.124780] ahci 0000:00:0a.0: flags: 64bit sntf led clo pio
[ 38.124819] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 38.125020] scsi0 : ahci
[ 38.125116] scsi1 : ahci
[ 38.125182] scsi2 : ahci
[ 38.125248] scsi3 : ahci
[ 38.125337] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 315
[ 38.125384] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 315
[ 38.125432] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 315
[ 38.125479] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 315
[ 38.597035] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 38.597732] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 38.597775] ata1.00: 312581808 sectors, multi 16: LBA48
[ 38.598405] ata1.00: configured for UDMA/133
[ 39.069342] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 39.084225] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 39.084264] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 39.086268] ata2.00: configured for UDMA/133
[ 39.405277] ata3: SATA link down (SStatus 0 SControl 300)
[ 39.724583] ata4: SATA link down (SStatus 0 SControl 300)
[ 39.724672] scsi 0:0:0:0: Direct-Access ATA WDC WD1600JS-00M 10.0 PQ: 0 ANSI: 5
[ 39.724782] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 39.724826] sd 0:0:0:0: [sda] Write Protect is off
[ 39.724864] sd ...
So, just to confirm. With the updated patch, you don't see any problem, right? -- tejun --
Correct. With the updated patch I don't see problems in 'non-raid' mode. AHCI mode still has problems without pci=nomsi. But that is an entirely different problem, right?

Glück Auf,
Volker
--
Yeap, can you please post the result of "lspci -nn"? -- tejun --
with AHCI + pci=nomsi, 2.6.24.3:

lspci -nn
00:00.0 RAM memory [0500]: nVidia Corporation MCP65 Memory Controller [10de:0444] (rev a1)
00:01.0 ISA bridge [0601]: nVidia Corporation MCP65 LPC Bridge [10de:0441] (rev a2)
00:01.1 SMBus [0c05]: nVidia Corporation MCP65 SMBus [10de:0446] (rev a1)
00:01.2 RAM memory [0500]: nVidia Corporation MCP65 Memory Controller [10de:0445] (rev a1)
00:02.0 USB Controller [0c03]: nVidia Corporation MCP65 USB Controller [10de:0454] (rev a1)
00:02.1 USB Controller [0c03]: nVidia Corporation MCP65 USB Controller [10de:0455] (rev a1)
00:08.0 PCI bridge [0604]: nVidia Corporation MCP65 PCI bridge [10de:0449] (rev a1)
00:09.0 IDE interface [0101]: nVidia Corporation MCP65 IDE [10de:0448] (rev a1)
00:0a.0 SATA controller [0106]: nVidia Corporation MCP65 AHCI Controller [10de:044d] (rev a1)
00:0b.0 PCI bridge [0604]: nVidia Corporation Unknown device [10de:045b] (rev a1)
00:0c.0 PCI bridge [0604]: nVidia Corporation MCP65 PCI Express bridge [10de:045a] (rev a1)
00:0d.0 PCI bridge [0604]: nVidia Corporation MCP65 PCI Express bridge [10de:0458] (rev a1)
00:0e.0 PCI bridge [0604]: nVidia Corporation MCP65 PCI Express bridge [10de:0459] (rev a1)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)
02:08.0 SCSI storage controller [0100]: Adaptec AHA-2944UW / AIC-7884U [9004:8478] (rev 01)
02:09.0 Multimedia audio controller [0401]: Creative Labs ...
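
The point of asking for "lspci -nn" is the numeric vendor:device pair of the SATA controller, which is what a driver quirk would key on. As an illustrative sketch only (the helper `find_by_class` and the regular expression are assumptions made for this example, not part of pciutils), the IDs can be pulled from saved output like this:

```python
import re

# One "lspci -nn" line looks like:
#   00:0a.0 SATA controller [0106]: nVidia ... [10de:044d] (rev a1)
# The first bracket is the PCI class code, the last one is vendor:device.
LSPCI_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls_name>.+?) \[(?P<cls>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def find_by_class(lspci_text, class_code):
    """Return (slot, vendor, device) for every device with the given class."""
    found = []
    for line in lspci_text.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m and m.group("cls") == class_code:
            found.append((m.group("slot"), m.group("vendor"), m.group("device")))
    return found


# Three lines copied from the listing above.
SAMPLE = """\
00:09.0 IDE interface [0101]: nVidia Corporation MCP65 IDE [10de:0448] (rev a1)
00:0a.0 SATA controller [0106]: nVidia Corporation MCP65 AHCI Controller [10de:044d] (rev a1)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
"""

# Class 0106 is "SATA controller" in the listing above.
print(find_by_class(SAMPLE, "0106"))
```

For the sample lines this yields the controller at 00:0a.0 with vendor 10de and device 044d.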
Volker Armin Hemmann wrote: >On Mittwoch, 19. M
patch -p1 < /home/energyman/mcp65-ahci-debug.patch
patching file drivers/ata/libata-core.c
Hunk #1 succeeded at 7136 (offset 2 lines).
patching file drivers/ata/libata-eh.c
Hunk #1 succeeded at 2083 (offset -107 lines).
patching file drivers/pci/quirks.c
Hunk #1 succeeded at 1733 with fuzz 1 (offset -122 lines).
saved config, removed linux-2.6.24.3, unpacked 2.6.24.3 from tarball, reiser4
and your patch, make oldconfig, make menuconfig, make all modules_install
install.
I think so:
[*] PCI support
[*] Support mmconfig PCI config space access
[ ] Support for DMA Remapping Devices (EXPERIMENTAL)
[*] PCI Express support
[*] Root Port Advanced Error Reporting support
[*] Message Signaled Interrupts (MSI and MSI-X)
[*] Enable deprecated pci_find_* API
[*] Interrupts on hypertransport devices
but to make sure,
non-raid, coldboot, pci=nomsi:
cat /proc/interrupts
CPU0 CPU1
0: 57 1 IO-APIC-edge timer
1: 2 3300 IO-APIC-edge i8042
8: 0 1 IO-APIC-edge rtc
9: 0 1 IO-APIC-fasteoi acpi
12: 0 3 IO-APIC-edge i8042
17: 17 43662 IO-APIC-fasteoi nvidia
18: 0 0 IO-APIC-fasteoi EMU10K1
19: 12 4188 IO-APIC-fasteoi eth0
22: 26 13832 IO-APIC-fasteoi ehci_hcd:usb1
23: 24 30517 IO-APIC-fasteoi ahci
NMI: 0 0 Non-maskable interrupts
LOC: 86361 120679 Local timer interrupts
RES: 25603 ...

Hello, Volker.
Thanks for confirming.
Peer, Kuan. Volker is reporting detection problems on MCP65 AHCI.
The following is what we've discovered so far.

1. When the controller is put into non-raid mode in BIOS

 * If softreset is used, either the softreset itself or the IDENTIFY
   following it times out once. On retry, it works fine. It
   doesn't matter whether the SRST is issued by itself or as
   follow-up-srst after hardreset. Using only hardreset works fine.

 * The controller doesn't indicate MSI capability and MSI isn't used
   by default.

2. When the controller is put into ahci mode in BIOS

 * SRST works fine.

 * The controller indicates MSI capability but MSI doesn't work
   properly, resulting in IRQ delivery failure. Adding the
   intx_disable_bug quirk doesn't help.

I've performed a similar test on MCP67 and everything worked fine on it.
Both problems (SRST and MSI) can be worked around, but I need more
information to work around them.

 * Which chips are affected? Are there proper fixes?

 * For the MSI problem, is it a system-wide problem or local to the ahci
   controller?
Thanks.
--
tejun
--
Hi,
I rebooted with non-raid set, without pci=nomsi
this is cat /proc/interrupts:
CPU0 CPU1
0: 57 1 IO-APIC-edge timer
1: 0 81 IO-APIC-edge i8042
8: 0 1 IO-APIC-edge rtc
9: 0 1 IO-APIC-fasteoi acpi
12: 0 3 IO-APIC-edge i8042
17: 2 2868 IO-APIC-fasteoi nvidia
18: 0 0 IO-APIC-fasteoi EMU10K1
22: 4 745 IO-APIC-fasteoi ehci_hcd:usb1
314: 0 158 PCI-MSI-edge eth0
315: 7 12769 PCI-MSI-edge ahci
NMI: 0 0 Non-maskable interrupts
LOC: 12308 14343 Local timer interrupts
RES: 2994 1456 Rescheduling interrupts
CAL: 559 59 function call interrupts
TLB: 323 245 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
As you can see, only the SATA controller and the network use MSI.
And networking works - or I wouldn't be able to send you this mail.
lspci -vv:
00:00.0 RAM memory: nVidia Corporation MCP65 Memory Controller (rev a1)
Subsystem: ASRock Incorporation Unknown device 0444
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: [44] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=0 UnitCnt=15 MastHost- DefDir- DUL-
Link Control 0: CFlE+ CST- CFE- <LkFail- Init+ EOC- TXO-
<CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut-
LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn-
Link Control 1: ...

Hello, Volker.

I guess the ahci controller works too. This is getting confusing, so ...

Is systemrescuecd using the same kernel? Otherwise, it will only add more to the confusion.

--
tejun
--
that is correct, if I set it to 'ahci' in bios, no interrupts are delivered.
From systemrescuecd, ahci in bios, no nomsi:
CPU0 CPU1
0: 147 1 IO-APIC-edge timer
1: 0 607 IO-APIC-edge i8042
8: 0 61 IO-APIC-edge rtc
9: 0 1 IO-APIC-fasteoi acpi
12: 0 4 IO-APIC-edge i8042
14: 0 0 IO-APIC-edge libata
15: 0 0 IO-APIC-edge libata
18: 0 16 IO-APIC-fasteoi aic7xxx
21: 0 0 IO-APIC-fasteoi ohci_hcd:usb2
22: 2 8428 IO-APIC-fasteoi ehci_hcd:usb1
1274: 0 0 PCI-MSI-edge ahci
1275: 0 222 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 14777 16785 Local timer interrupts
RES: 7250 6100 Rescheduling interrupts
CAL: 41 52 function call interrupts
TLB: 1866 1876 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 6
With my 'own' kernel, AHCI has interrupt 315. I will try and make a more
monolithic kernel so I might be able to boot from a usb-flash device.
.
.
.
Almost. A heavily patched 2.6.24.2. That is why I didn't send a dmesg, just
cat /proc/interrupts. I can send you lspci -vv and dmesg from systemrescuecd
if you want.
Glueck Auf,
Volker
--
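
The symptom in the table above (an MSI vector registered for ahci whose per-CPU counters all stay at zero) can be spotted mechanically. A minimal sketch, assuming the usual /proc/interrupts layout of a CPU header row followed by "IRQ: counts... chip owner" rows; the function name `silent_msi_irqs` is made up for this example:

```python
def silent_msi_irqs(interrupts_text):
    """Return (irq, owner) for MSI vectors that have never fired.

    Counters are cumulative since boot, so a zero row only hints at a
    delivery problem; an idle device would look the same.
    """
    lines = interrupts_text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    silent = []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        irq = fields[0].rstrip(":")
        # Skip summary rows such as NMI:, LOC: and ERR:.
        if not irq.isdigit() or len(fields) < ncpu + 3:
            continue
        counts = [int(c) for c in fields[1:1 + ncpu]]
        chip = fields[1 + ncpu]
        owner = " ".join(fields[2 + ncpu:])
        if chip.startswith("PCI-MSI") and sum(counts) == 0:
            silent.append((int(irq), owner))
    return silent


# Rows copied from the systemrescuecd output above.
SAMPLE = """\
CPU0 CPU1
22: 2 8428 IO-APIC-fasteoi ehci_hcd:usb1
1274: 0 0 PCI-MSI-edge ahci
1275: 0 222 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
ERR: 6
"""

print(silent_msi_irqs(SAMPLE))
```

Here only the ahci vector is flagged; eth0 also uses MSI but has received interrupts.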
Aiee.... Can you please try 2.6.25-rc8 with both configurations? And w/o systemrescuecd? Just seeing whether things mount properly or not is good enough. There isn't much risk of losing data. -- tejun --
2.6.25-rc8 is a 'no change'. AHCI with pci=nomsi boots; AHCI without it hangs.

With AHCI I also get this:
[ 0.363828] ------------[ cut here ]------------
[ 0.363828] WARNING: at drivers/ata/ahci.c:645 ahci_init_one+0x190/0xa3a()
[ 0.363828] Modules linked in:
[ 0.363828] Pid: 1, comm: swapper Not tainted 2.6.25-rc8 #2
[ 0.363828]
[ 0.363828] Call Trace:
[ 0.363828] [<ffffffff8022c5a5>] warn_on_slowpath+0x51/0x63
[ 0.363832] [<ffffffff80220061>] __ioremap+0x8/0x197
[ 0.363872] [<ffffffff803921f4>] pci_conf1_read+0xb2/0xbd
[ 0.363911] [<ffffffff802f3316>] pcim_iomap_release+0x0/0x2c
[ 0.363950] [<ffffffff802f3316>] pcim_iomap_release+0x0/0x2c
[ 0.363989] [<ffffffff803471a9>] devres_find+0x4b/0x65
[ 0.364028] [<ffffffff8036e4cd>] ahci_init_one+0x190/0xa3a
[ 0.364067] [<ffffffff802af837>] sysfs_addrm_finish+0x1d/0x209
[ 0.364107] [<ffffffff80284014>] ifind+0x34/0x8d
[ 0.364145] [<ffffffff802af54e>] sysfs_find_dirent+0x1b/0x2f
[ 0.364184] [<ffffffff802eb75d>] ida_get_new_above+0xf0/0x180
[ 0.364223] [<ffffffff802af837>] sysfs_addrm_finish+0x1d/0x209
[ 0.364263] [<ffffffff802b0378>] sysfs_create_link+0xb6/0x102
[ 0.364303] [<ffffffff802f8664>] pci_device_probe+0x4c/0x72
[ 0.364342] [<ffffffff80344c48>] driver_probe_device+0xb5/0x132
[ 0.364381] [<ffffffff80344ddb>] __driver_attach+0x6f/0xaf
[ 0.364419] [<ffffffff80344d6c>] __driver_attach+0x0/0xaf
[ 0.364458] [<ffffffff80344d6c>] __driver_attach+0x0/0xaf
[ 0.364498] [<ffffffff8034400a>] bus_for_each_dev+0x44/0x6f
[ 0.364538] [<ffffffff802a9b57>] proc_match+0x23/0x2d
[ 0.364576] [<ffffffff80344876>] bus_add_driver+0xae/0x1f4
[ 0.364614] [<ffffffff8034500a>] driver_register+0x59/0xce
[ 0.364654] [<ffffffff802f88a5>] __pci_register_driver+0x4a/0x7d
[ 0.364701] [<ffffffff804e56b8>] kernel_init+0x14f/0x2b9
[ 0.364740] [<ffffffff8020be98>] child_rip+0xa/0x12
[ 0.364779] [<ffffffff804e5569>] kernel_init+0x0/0x2b9
[ ...
Heh.. that's AHCI_EN not getting set. Peer, any ideas? The driver sets AHCI_EN in the HOST_CTL register but on reading back it's still clear.

Thanks.

--
tejun
--
Possibly the hardware is not idle? I wonder if the BIOS is doing anything with the IDE interface. _Probably_ not, but that is certainly a situation where switching to AHCI can be a problem for certain AHCI chips. Jeff --
Volker,

Could you add a delay between setting AHCI_EN and reading it back?

BRs
Peer Chen

-----Original Message-----
From: Jeff Garzik [mailto:jeff@garzik.org]
Sent: Saturday, April 12, 2008 9:43 AM
To: Tejun Heo
Cc: Volker Armin Hemmann; linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Peer Chen; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------
--
if you tell me how to? Glück Auf Volker --
Volker,

I mean add an msleep(x) call between writel(HOST_AHCI_EN, mmio + HOST_CTL) and readl(mmio + HOST_CTL), because we once hit a bug where, after writing data to the AHCI BAR5, a delay was needed before the correct value could be read back. But forget it, it looks like it's not the same case here.

Which mode did you set (ahci or non-raid) when you encountered the issue that the AHCI_EN bit failed to set? I found we have a workaround for AHCI_EN bit setting in our BIOS programming guide for AHCI mode.

BRs
Peer Chen

-----Original Message-----
From: Volker Armin Hemmann [mailto:volker.armin.hemmann@tu-clausthal.de]
Sent: Monday, April 14, 2008 2:42 PM
To: Peer Chen
Cc: Jeff Garzik; Tejun Heo; linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3
--
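
The write, wait, read-back idea Peer describes can be illustrated outside the kernel. What follows is a user-space model only, not driver code: `FakeMmio` is a hypothetical stand-in for the controller's register window, and the retry count and delay are arbitrary values chosen for the example. Only the HOST_CTL offset (0x04) and the AHCI enable bit (bit 31) correspond to the AHCI register layout.

```python
import time

HOST_CTL = 0x04            # AHCI global host control register offset
HOST_AHCI_EN = 1 << 31     # AHCI enable bit in HOST_CTL


class FakeMmio:
    """Hypothetical register window for illustration.

    The AHCI_EN bit only sticks after `writes_needed` writes, imitating
    a controller that ignores the first attempts.
    """

    def __init__(self, writes_needed=1):
        self.regs = {HOST_CTL: 0}
        self.writes_needed = writes_needed

    def writel(self, value, offset):
        if offset == HOST_CTL and value & HOST_AHCI_EN:
            self.writes_needed -= 1
            if self.writes_needed > 0:
                value &= ~HOST_AHCI_EN   # the bit is silently dropped
        self.regs[offset] = value & 0xFFFFFFFF

    def readl(self, offset):
        return self.regs[offset]


def enable_ahci(mmio, tries=5, delay_s=0.01):
    """Set AHCI_EN, waiting before each read-back; True if the bit stuck."""
    for _ in range(tries):
        tmp = mmio.readl(HOST_CTL)
        if tmp & HOST_AHCI_EN:
            return True
        mmio.writel(tmp | HOST_AHCI_EN, HOST_CTL)
        time.sleep(delay_s)              # plays the role of msleep(x)
    return bool(mmio.readl(HOST_CTL) & HOST_AHCI_EN)


print(enable_ahci(FakeMmio(writes_needed=3)))    # bit sticks on the third write
print(enable_ahci(FakeMmio(writes_needed=10)))   # bit never sticks within 5 tries
```

Whether a delay or a repeated write would help on this particular board is not established in the thread; Peer himself notes it is probably a different case.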
Hi,

you mean this?
[ 0.363828] ------------[ cut here ]------------
[ 0.363828] WARNING: at drivers/ata/ahci.c:645 ahci_init_one+0x190/0xa3a()
[ 0.363828] Modules linked in:
[ 0.363828] Pid: 1, comm: swapper Not tainted 2.6.25-rc8 #2
[ 0.363828]
[ 0.363828] Call Trace:
[ 0.363828] [<ffffffff8022c5a5>] warn_on_slowpath+0x51/0x63
[ 0.363832] [<ffffffff80220061>] __ioremap+0x8/0x197
[ 0.363872] [<ffffffff803921f4>] pci_conf1_read+0xb2/0xbd
[ 0.363911] [<ffffffff802f3316>] pcim_iomap_release+0x0/0x2c
[ 0.363950] [<ffffffff802f3316>] pcim_iomap_release+0x0/0x2c
[ 0.363989] [<ffffffff803471a9>] devres_find+0x4b/0x65
[ 0.364028] [<ffffffff8036e4cd>] ahci_init_one+0x190/0xa3a
[ 0.364067] [<ffffffff802af837>] sysfs_addrm_finish+0x1d/0x209
[ 0.364107] [<ffffffff80284014>] ifind+0x34/0x8d
[ 0.364145] [<ffffffff802af54e>] sysfs_find_dirent+0x1b/0x2f
[ 0.364184] [<ffffffff802eb75d>] ida_get_new_above+0xf0/0x180
[ 0.364223] [<ffffffff802af837>] sysfs_addrm_finish+0x1d/0x209
[ 0.364263] [<ffffffff802b0378>] sysfs_create_link+0xb6/0x102
[ 0.364303] [<ffffffff802f8664>] pci_device_probe+0x4c/0x72
[ 0.364342] [<ffffffff80344c48>] driver_probe_device+0xb5/0x132
[ 0.364381] [<ffffffff80344ddb>] __driver_attach+0x6f/0xaf
[ 0.364419] [<ffffffff80344d6c>] __driver_attach+0x0/0xaf
[ 0.364458] [<ffffffff80344d6c>] __driver_attach+0x0/0xaf
[ 0.364498] [<ffffffff8034400a>] bus_for_each_dev+0x44/0x6f
[ 0.364538] [<ffffffff802a9b57>] proc_match+0x23/0x2d
[ 0.364576] [<ffffffff80344876>] bus_add_driver+0xae/0x1f4
[ 0.364614] [<ffffffff8034500a>] driver_register+0x59/0xce
[ 0.364654] [<ffffffff802f88a5>] __pci_register_driver+0x4a/0x7d
[ 0.364701] [<ffffffff804e56b8>] kernel_init+0x14f/0x2b9
[ 0.364740] [<ffffffff8020be98>] child_rip+0xa/0x12
[ 0.364779] [<ffffffff804e5569>] kernel_init+0x0/0x2b9
[ 0.364817] [<ffffffff8020be8e>] child_rip+0x0/0x12
[ 0.364854]
[ ...
Yes. Could you dump the I/O port data from 0x2f00 to 0x2fff and send it to me? It records the base address of the BIOS trap.

Also, could you update to the latest BIOS and try this again?

BRs
Peer Chen

-----Original Message-----
From: Volker Armin Hemmann [mailto:volker.armin.hemmann@tu-clausthal.de]
Sent: Tuesday, April 15, 2008 5:13 AM
To: Peer Chen
Cc: Jeff Garzik; Tejun Heo; linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3
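
One generic way to read a range of x86 I/O ports on Linux is the /dev/port character device, where the file offset is the port number. The sketch below is an assumption about how such a dump could be produced, not the tool Peer has in mind; the helper name `dump_ports` is made up. It reads byte by byte as root, and reading arbitrary ports can have side effects on some hardware, so treat it with care.

```python
def dump_ports(port_file, start=0x2F00, end=0x2FFF):
    """Return a hex dump of I/O ports start..end (inclusive).

    `port_file` is any binary file object whose offsets map to port
    numbers, for example open("/dev/port", "rb", buffering=0).
    """
    port_file.seek(start)
    data = port_file.read(end - start + 1)
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{start + off:04x}: {hexpart}")
    return "\n".join(lines)


# Usage on a real machine (needs root):
#
#     with open("/dev/port", "rb", buffering=0) as f:
#         print(dump_ports(f))
```

Because the function only needs a seekable binary file object, it can be tried out on an ordinary file or an in-memory buffer before pointing it at real hardware.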
Hi,
how do I do that?
Also, I can't see anything resembling these two addresses(?) in /proc/ioports:
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
0290-029f : pnp 00:0a
0295-0296 : w83627ehf
0295-0296 : w83627ehf
0376-0376 : 0000:00:09.0
03c0-03df : vga+
03f6-03f6 : 0000:00:09.0
04d0-04d1 : pnp 00:05
0800-080f : pnp 00:05
0cf8-0cff : PCI conf1
0e00-0e07 : 0000:00:0a.0
0e00-0e07 : ahci
0e20-0e23 : 0000:00:0a.0
0e20-0e23 : ahci
0e40-0e47 : 0000:00:0a.0
0e40-0e47 : ahci
0e60-0e63 : 0000:00:0a.0
0e60-0e63 : ahci
2000-207f : pnp 00:05
2000-2003 : ACPI PM1a_EVT_BLK
2004-2005 : ACPI PM1a_CNT_BLK
2008-200b : ACPI PM_TMR
2010-2015 : ACPI CPU throttle
2020-2027 : ACPI GPE0_BLK
2080-20ff : pnp 00:05
2400-247f : pnp 00:05
2480-24ff : pnp 00:05
24a0-24af : ACPI GPE1_BLK
2800-287f : pnp 00:05
2880-28ff : pnp 00:05
2c00-2c7f : pnp 00:05
2c80-2cff : pnp 00:05
2d00-2d3f : 0000:00:01.1
2d00-2d3f : nForce2_smbus
2e00-2e3f : 0000:00:01.1
2e00-2e3f : nForce2_smbus
a800-a80f : 0000:00:0a.0
a800-a80f : ahci
bc00-bc3f : 0000:00:01.1
c000-cfff : PCI Bus #01
c800-c8ff : 0000:01:00.0
c800-c8ff : r8169
d000-dfff : PCI Bus #02
d800-d8ff : 0000:02:08.0
dc00-dc3f : 0000:02:09.0
dc00-dc3f : EMU10K1
e000-efff : PCI Bus #04
ec00-ec7f : 0000:04:00.0
there was a new bios from today - and nothing changed:
with 2.6.25-rc8:
[ 0.360518] ------------[ cut here ]------------
[ 0.360518] WARNING: at drivers/ata/ahci.c:645 ahci_init_one+0x190/0xa3a()
[ 0.360518] Modules linked in:
[ 0.360518] Pid: 1, comm: swapper Not tainted 2.6.25-rc8 #2
[ 0.360518]
[ 0.360518] Call Trace:
[ 0.360518] [<ffffffff8022c5a5>] warn_on_slowpath+0x51/0x63
[ 0.360522] [<ffffffff80220061>] __ioremap+0x8/0x197
[ 0.360561] ...
Try to use this: http://sourceforge.net/project/showfiles.php?group_id=220929
BRs
Peer Chen
-----Original Message-----
From: Volker Armin Hemmann [mailto:volker.armin.hemmann@tu-clausthal.de]
Sent: Wednesday, April 16, 2008 12:20 AM
To: Peer Chen
Cc: Jeff Garzik; Tejun Heo; linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3
Hi,
with 2.6.25+reiser4:
[ 0.367181] ------------[ cut here ]------------
[ 0.367181] WARNING: at drivers/ata/ahci.c:645 ahci_init_one+0x190/0xa3a()
[ 0.367181] Modules linked in:
[ 0.367181] Pid: 1, comm: swapper Not tainted 2.6.25 #1
[ 0.367181]
[ 0.367181] Call Trace:
[ 0.367181] [<ffffffff8022c451>] warn_on_slowpath+0x51/0x63
[ 0.367181] [<ffffffff80220061>] __ioremap+0x148/0x197
[ 0.367185] [<ffffffff803cf680>] pci_conf1_read+0xb2/0xbd
[ 0.367225] [<ffffffff8032aa0e>] pcim_iomap_release+0x0/0x2c
[ 0.367264] [<ffffffff8032aa0e>] pcim_iomap_release+0x0/0x2c
[ 0.367304] [<ffffffff803845e9>] devres_find+0x4b/0x65
[ 0.367343] [<ffffffff803ab959>] ahci_init_one+0x190/0xa3a
[ 0.367382] [<ffffffff802af71f>] sysfs_addrm_finish+0x1d/0x209
[ 0.367422] [<ffffffff80283f34>] ifind+0x34/0x8d
[ 0.367461] [<ffffffff802af436>] sysfs_find_dirent+0x1b/0x2f
[ 0.367501] [<ffffffff80322e1d>] ida_get_new_above+0xf0/0x180
[ 0.367540] [<ffffffff802af71f>] sysfs_addrm_finish+0x1d/0x209
[ 0.367580] [<ffffffff802b0260>] sysfs_create_link+0xb6/0x102
[ 0.367620] [<ffffffff80335b44>] pci_device_probe+0x4c/0x72
[ 0.367659] [<ffffffff80382088>] driver_probe_device+0xb5/0x132
[ 0.367698] [<ffffffff8038221b>] __driver_attach+0x6f/0xaf
[ 0.367737] [<ffffffff803821ac>] __driver_attach+0x0/0xaf
[ 0.367776] [<ffffffff803821ac>] __driver_attach+0x0/0xaf
[ 0.367816] [<ffffffff8038144a>] bus_for_each_dev+0x44/0x6f
[ 0.367856] [<ffffffff802a9a3b>] proc_match+0x23/0x2d
[ 0.367895] [<ffffffff80381cb6>] bus_add_driver+0xae/0x1f4
[ 0.367933] [<ffffffff8038244a>] driver_register+0x59/0xce
[ 0.367973] [<ffffffff80335d85>] __pci_register_driver+0x4a/0x7d
[ 0.368019] [<ffffffff805346b8>] kernel_init+0x14f/0x2b9
[ 0.368059] [<ffffffff8020bd58>] child_rip+0xa/0x12
[ 0.368097] [<ffffffff80534569>] kernel_init+0x0/0x2b9
[ 0.368136] [<ffffffff8020bd4e>] child_rip+0x0/0x12
[ 0.368173] [ ...
It looks like the BIOS trap has been applied, so GHC.AE should be set correctly. Volker, could you try setting AE twice in the driver, or use mmapper to set AE manually, to check whether AE gets set correctly?
BRs
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------
--
Hmm, I am totally out of my depth here. I don't even know what you are talking about. If you explain to me in nice, small words, or even better with examples, what I should try, I will happily do so.
Glück Auf,
Volker
--
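For readers following along, "setting AE twice" means writing the AHCI Enable bit (bit 31 of the GHC register, at offset 0x04 of the AHCI memory BAR) and writing it again if a read-back shows it did not stick. The sketch below only simulates that idea in Python; `FakeController` is a hypothetical stand-in for the MMIO region, and none of this is the actual driver patch.

```python
HOST_CTL = 0x04          # GHC register offset in the AHCI memory BAR
HOST_AHCI_EN = 1 << 31   # GHC.AE, the "AHCI enable" bit


class FakeController:
    """Hypothetical stand-in for MMIO; drops the first few GHC writes."""

    def __init__(self, ignored_writes):
        self.ghc = 0
        self.ignored_writes = ignored_writes
        self.write_count = 0

    def read(self, offset):
        assert offset == HOST_CTL
        return self.ghc

    def write(self, offset, value):
        assert offset == HOST_CTL
        self.write_count += 1
        if self.write_count > self.ignored_writes:
            self.ghc = value & 0xFFFFFFFF


def enable_ahci(controller, max_attempts=2):
    """Set GHC.AE, rewriting it until a read-back confirms it is set.

    Returns True if AE is set afterwards, False if every attempt failed.
    """
    value = controller.read(HOST_CTL)
    if value & HOST_AHCI_EN:
        return True                      # already enabled, nothing to do
    for _ in range(max_attempts):
        controller.write(HOST_CTL, value | HOST_AHCI_EN)
        value = controller.read(HOST_CTL)    # read back to verify
        if value & HOST_AHCI_EN:
            return True
    return False
```

With a simulated controller that ignores the first write, a single attempt fails and the second one succeeds, which is the behaviour the test is meant to detect.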
OK, with the patch the call trace is gone. It looks like this now (2.6.25+reiser4+your patch):
[ 0.367177] Driver 'sd' needs updating - please use bus_type methods
[ 0.367189] ahci 0000:00:0a.0: version 3.0
[ 0.367189] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 0.367189] ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 0.470521] AHCI_EN failed i=0
[ 1.572680] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 1.572680] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[ 1.572680] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[ 1.572682] scsi0 : ahci
[ 1.572692] scsi1 : ahci
[ 1.572701] scsi2 : ahci
[ 1.572711] scsi3 : ahci
[ 1.572717] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 23
[ 1.572717] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 23
[ 1.572717] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 23
[ 1.572717] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 23
[ 2.010056] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.010552] ata1.00: ATA-7: WDC WD1600JS-00MHB1, 10.02E01, max UDMA/133
[ 2.010552] ata1.00: 312581808 sectors, multi 16: LBA48
[ 2.011132] ata1.00: configured for UDMA/133
[ 2.643013] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.663511] ata2.00: ATA-8: SAMSUNG HD501LJ, CR100-12, max UDMA7
[ 2.663511] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 2.665420] ata2.00: configured for UDMA/133
[ 2.982665] ata3: SATA link down (SStatus 0 SControl 300)
[ 3.302534] ata4: SATA link down (SStatus 0 SControl 300)
[ 3.499383] scsi 0:0:0:0: Direct-Access ATA WDC WD1600JS-00M 10.0
[ 3.499400] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
[ 3.499400] sd 0:0:0:0: [sda] Write Protect is off
[ 3.499400] sd 0:0:0:0: [sda] Mode Sense: ...
Hi,
Since 2.6.26-rc4 is out, and I have updated the BIOS and replaced one hard disk, I think it's time for a status report ;)
The BIOS is now version 2.10 from 05/12/2008
(http://www.asrock.com/mb/download.asp?Model=ALiveNF5-eSATA2%2b&s=AM2)
-AHCI+nomsi: boots fine. Works fine. But I still cannot enable NCQ.
-AHCI+msi: the usual. No hard disks found, waiting for the timeouts, then kernel panic because of missing root.
-NON-RAID/IDE+msi: works fine. No delays. No NCQ (not surprising, since the mobo's handbook says that AHCI mode is required).
-NON-RAID/IDE+nomsi: see above.
Attached are the kernel config used, lspci outputs and dmesgs. lspci was done straight after boot without the nvidia module loaded. The kernel is patched with reiser4.
One thing that confuses me is this:
echo 31 > /sys/block/sda/device/queue_depth
echo: write error: invalid argument
but:
[14315325.624510] ahci 0000:00:0a.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[14315325.624559] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio
[14315325.624599] PCI: Setting latency timer of device 0000:00:0a.0 to 64
[14315325.626674] scsi0 : ahci
[14315325.626674] scsi1 : ahci
[14315325.626674] scsi2 : ahci
[14315325.626674] scsi3 : ahci
[14315325.626674] ata1: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc100 irq 23
[14315325.626674] ata2: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc180 irq 23
[14315325.626674] ata3: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc200 irq 23
[14315325.626674] ata4: SATA max UDMA/133 abar m8192@0xf9dfc000 port 0xf9dfc280 irq 23
[14315325.943951] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[14315325.950222] ata1.00: ATA-7: SAMSUNG HD502IJ, 1AA01109, max UDMA7
[14315325.950222] ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[14315325.953942] ata1.00: configured for UDMA/133
[14315326.273949] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[14315326.286509] ata2.00: ...
Hello,
So, sans NCQ, the only remaining issue is MSI, right? Peer Chen, please let me know which controllers are affected by this MSI problem and where.
The controller is not reporting NCQ capability in its cap register (the flags: line contains ncq if it does), so NCQ is not enabled.
Thanks.
--
tejun
--
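The check described here can be done mechanically on the dmesg lines already posted in this thread. The sketch below is illustrative only: the helper names are made up, and reading "depth 0/32" as host depth 0 and device depth 32 is my understanding of the libata message rather than something stated in the thread.

```python
import re


def controller_flags(dmesg_line):
    """Return the capability names from an ahci 'flags:' dmesg line."""
    _, separator, tail = dmesg_line.partition("flags:")
    return tail.split() if separator else []


def ncq_depth(dmesg_line):
    """Return (host_depth, device_depth) from 'NCQ (depth H/D)', else None."""
    match = re.search(r"NCQ \(depth (\d+)/(\d+)\)", dmesg_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


flags_line = "[ 1.572680] ahci 0000:00:0a.0: flags: 64bit sntf led clo pmp pio"
disk_line = ("[ 2.663511] ata2.00: 976773168 sectors, "
             "multi 16: LBA48 NCQ (depth 0/32)")

# The controller does not advertise ncq, although the disk supports it.
print("ncq" in controller_flags(flags_line))
print(ncq_depth(disk_line))
```

A host depth of 0 next to a device depth of 32 would fit the rejected write to queue_depth: the disk offers NCQ, but the controller side does not.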
Ah, I thought all AHCI chipsets supported NCQ (and the board's handbook lists it as a feature). Thanks for the explanation.
Glück Auf,
Volker
--
It could be that the controller has ahci support but it just forgets to set the corresponding cap bit. Peer Chen, Kuan Luo, can you guys please comment on the 64bit and ncq problems? Thanks. -- tejun --
I think it does support the NCQ function. Our Windows driver doesn't check that capability bit, so it can issue FP-DMA commands whether or not that bit is set.
For the MSI issue, I suggest simply disabling MSI for the MCP65 AHCI controller; it is most likely a hardware bug. More specifically, you can disable MSI if the MCP65 AHCI controller's revision ID is 0xa1 or 0xa2.
What is the 64bit problem?
BRs
--
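The suggested rule is small enough to write down as a predicate. This is only a sketch of the decision described above, with hypothetical names; it is not how the kernel driver expresses its quirks.

```python
# Revisions named in this thread as having broken MSI on MCP65 AHCI.
MCP65_MSI_BROKEN_REVISIONS = frozenset({0xA1, 0xA2})


def should_disable_msi(is_mcp65_ahci, revision):
    """Return True if MSI should be turned off for this controller."""
    return is_mcp65_ahci and revision in MCP65_MSI_BROKEN_REVISIONS
```

Under this rule a revision 0xa1 controller falls back to legacy interrupts, while a revision 0xa3 part keeps MSI.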
Hmmmm.. What was it? Searching... Hmmm... Does anyone know what I was talking about? :-) I'll ask again when I recall.
Thanks.
--
tejun
--
Volker, can you please test the attached patch? Thanks. -- tejun
I patched 2.6.26-rc5 with it. It booted fine without pci=nomsi and NCQ got
turned on. If I see problems with NCQ, I will report back.
cat /proc/interrupts
           CPU0       CPU1
  0:         37          1   IO-APIC-edge      timer
  1:          0       1535   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  8:          0          1   IO-APIC-edge      rtc
  9:          0          1   IO-APIC-fasteoi   acpi
 12:          0          3   IO-APIC-edge      i8042
 17:          5       1896   IO-APIC-fasteoi   nvidia
 18:          0          0   IO-APIC-fasteoi   EMU10K1
 22:          4       2322   IO-APIC-fasteoi   ehci_hcd:usb1
 23:         17      18904   IO-APIC-fasteoi   ahci
315:          2        305   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:      40225      31947   Local timer interrupts
RES:       9609       7702   Rescheduling interrupts
CAL:       9872       3693   function call interrupts
TLB:       1179       1245   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
SPU:          0          0   Spurious interrupts
ERR:          1
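To confirm from this output which interrupt type each driver ended up with, the numbered lines can be parsed as below. This is a rough sketch with an invented function name; it assumes each handler name is a single token, which holds for the listing above but not for shared IRQs with several comma-separated names.

```python
def interrupt_types(proc_interrupts_text):
    """Map handler name -> interrupt type for numbered IRQ lines."""
    types = {}
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Numbered lines look like: "23: 17 18904 IO-APIC-fasteoi ahci".
        # Lines with fewer fields (no handler) and NMI/LOC/... are skipped.
        if len(fields) < 5 or not fields[0].rstrip(":").isdigit():
            continue
        types[fields[-1]] = fields[-2]
    return types


sample = """\
CPU0 CPU1
7: 1 0 IO-APIC-edge
23: 17 18904 IO-APIC-fasteoi ahci
315: 2 305 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
"""
print(interrupt_types(sample))
```

In the listing above, ahci sits on IO-APIC IRQ 23 while eth0 uses PCI-MSI, so with the patch the AHCI controller is no longer using MSI even without pci=nomsi.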
[ 0.000000] Linux version 2.6.26-rc5r4mcp65patch (root@energy) (gcc version
4.2.4 (Gentoo 4.2.4 p1.0)) #3 SMP Mon Jun 9 16:33:24 CEST 2008
[ 0.000000] Command line: root=/dev/sda3 nmi_watchdog=0
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000cffb0000 (usable)
[ 0.000000] BIOS-e820: 00000000cffb0000 - 00000000cffc0000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000cffc0000 - 00000000cfff0000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000cfff0000 - 00000000d0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - ...
I didn't find this issue on my MCP65 board, but my chip revision is 0xa3 and yours is 0xa1. I'll try to find out whether there is any useful information about the differences between the two chip revisions for this issue.
BRs
Peer Chen
-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com]
Sent: Thursday, April 03, 2008 9:48 AM
To: Volker Armin Hemmann
Cc: linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Peer Chen; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3
Hello, Volker.
Thanks for confirming.
Peer, Kuan. Volker is reporting detection problems on MCP65 AHCI.
The following is what we've discovered so far.
1. When the controller is put into non-raid mode in BIOS
* If softreset is used, either the softreset itself or IDENTIFY
following it times out once. On retrial, it works fine. It
doesn't matter whether the SRST is issued by itself or as
follow-up-srst after hardreset. Using only hardreset works fine.
* The controller doesn't indicate MSI capability and MSI isn't used
by default.
2. When the controller is put into ahci mode in BIOS
* SRST works fine.
* The controller indicates MSI capability but MSI doesn't work
properly resulting in IRQ delivery failure. Adding
intx_disable_bug quirk doesn't help.
I've performed similar test on MCP67 and everything worked fine on it.
Both problems (SRST and MSI) can be worked around but I need more
information to work around those.
* Which chips are affected? Are there proper fixes?
* For the MSI problem, is it system wide problem or local to the ahci
controller?
Thanks.
--
tejun
I checked our chipset errata and didn't find any useful information about the SRST and MSI issues of the AHCI controller. It looks like it's related to the BIOS, because different mode settings in the BIOS result in different behavior. Could you tell me the BIOS version and also send me the 'lspci -xxx' dumps for AHCI and non-raid mode?
One question: what setting options for the AHCI controller are there in your BIOS? My board has IDE/RAID/AHCI modes, but no 'non-raid' mode. The non-raid mode confuses me.
BRs
Peer Chen
-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com]
Sent: Friday, April 11, 2008 2:06 PM
To: Peer Chen
Cc: Volker Armin Hemmann; linux-kernel@vger.kernel.org; linux-ide@vger.kernel.org; Kuan Luo
Subject: Re: 2.6.24.X: SATA/AHCI related boot delay. - not with 2.6.24.3
Any progress?
--
tejun
--
In the handbook it says that non-raid is for no-hotplug, no-NCQ configurations, ahci for hotplug and NCQ configurations, and raid for using the nvidia raid functions. So there are three settings: non-raid, ahci and raid.
The BIOS version is P1.90 and the BIOS is definitely quirky: disabling IDE because I don't have any IDE devices makes the BIOS hang while posting ...
Three lspci -xxx dumps follow: AHCI+pci=nomsi, non-raid+pci=nomsi and non-raid:
cat /lspci_ahci_2.6.25rc8_xxx
00:00.0 RAM memory: nVidia Corporation MCP65 Memory Controller (rev a1)
00: de 10 44 04 06 00 b0 00 a1 00 00 05 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 44 04
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00
40: 49 18 44 04 08 dc e0 01 22 00 11 11 d0 00 00 00
50: 23 05 7f 00 03 00 00 00 00 00 03 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 06 15 00 00
70: 44 44 44 00 d0 09 00 00 11 00 00 00 11 11 88 00
80: 12 88 88 00 fa 00 64 0d 03 00 00 00 7f 00 00 00
90: 70 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 80 00 0f 0f 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 08 00 01 a8
e0: 00 00 e0 fe 00 00 00 00 00 00 80 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00:01.0 ISA bridge: nVidia Corporation MCP65 LPC Bridge (rev a2)
00: de 10 41 04 0f 00 a0 20 a2 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 41 04
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 00 00
40: 49 18 41 04 00 00 d0 fe fa 3e ff 00 fa 3e ff 00
50: fa 3e ff 00 00 5a 62 02 00 00 00 05 33 00 2c 01
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 ff ff c5 80 00 00 00 00 44 19 60 00 c0 00
80: 09 20 00 10 00 00 00 00 f0 00 00 01 ff 00 00 00
90: ff 7f 00 00 00 00 00 00 21 97 0a 68 ed bc 00 00
a0: 00 00 00 80 00 00 00 00 ...
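Since the revision ID matters for this bug, it may help to see where it sits in an lspci -xxx dump. In PCI configuration space the vendor ID is at offset 0x00, the device ID at 0x02 (both little-endian) and the revision ID at 0x08, so all three are in the "00:" row. A small sketch, with a made-up helper name:

```python
def parse_config_header(row):
    """Decode (vendor_id, device_id, revision) from the '00:' row of lspci -xxx."""
    label, _, hex_part = row.partition(":")
    if label.strip() != "00":
        raise ValueError("expected the row that starts at offset 00")
    data = bytes.fromhex(hex_part)       # whitespace between bytes is allowed
    if len(data) < 9:
        raise ValueError("row too short to contain the revision ID")
    vendor_id = int.from_bytes(data[0:2], "little")
    device_id = int.from_bytes(data[2:4], "little")
    revision = data[8]
    return vendor_id, device_id, revision


memory_controller_row = "00: de 10 44 04 06 00 b0 00 a1 00 00 05 00 00 00 00"
vendor_id, device_id, revision = parse_config_header(memory_controller_row)
print(f"{vendor_id:04x}:{device_id:04x} rev {revision:02x}")
```

For the two rows quoted above this yields revision a1 for the memory controller and a2 for the LPC bridge, matching the "(rev a1)" and "(rev a2)" that lspci prints in the device lines.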
Hi,
I tried 2.6.25-rc7 and there are some changes.
Bad news first. With AHCI in the BIOS I get this:
[ 0.363838] ------------[ cut here ]------------
[ 0.363838] WARNING: at drivers/ata/ahci.c:645 ahci_init_one+0x190/0xa3a()
[ 0.363838] Modules linked in:
[ 0.363838] Pid: 1, comm: swapper Not tainted 2.6.25-rc7 #3
[ 0.363838]
[ 0.363838] Call Trace:
[ 0.363838] [<ffffffff8022c545>] warn_on_slowpath+0x51/0x63
[ 0.363838] [<ffffffff80220061>] __ioremap+0x8/0x197
[ 0.363838] [<ffffffff8038e218>] pci_conf1_read+0xb2/0xbd
[ 0.363841] [<ffffffff802f3076>] pcim_iomap_release+0x0/0x2c
[ 0.363880] [<ffffffff802f3076>] pcim_iomap_release+0x0/0x2c
[ 0.363920] [<ffffffff80346ca1>] devres_find+0x4b/0x65
[ 0.363959] [<ffffffff8036a405>] ahci_init_one+0x190/0xa3a
[ 0.363998] [<ffffffff802af91f>] sysfs_addrm_finish+0x1d/0x209
[ 0.364038] [<ffffffff80283f98>] ifind+0x34/0x8d
[ 0.364076] [<ffffffff802af636>] sysfs_find_dirent+0x1b/0x2f
[ 0.364115] [<ffffffff802eb4c5>] ida_get_new_above+0xf0/0x180
[ 0.364154] [<ffffffff802af91f>] sysfs_addrm_finish+0x1d/0x209
[ 0.364194] [<ffffffff802b0458>] sysfs_create_link+0xb6/0x102
[ 0.364234] [<ffffffff802f83c4>] pci_device_probe+0x4c/0x72
[ 0.364273] [<ffffffff80344874>] driver_probe_device+0xb5/0x132
[ 0.364312] [<ffffffff80344a07>] __driver_attach+0x6f/0xaf
[ 0.364350] [<ffffffff80344998>] __driver_attach+0x0/0xaf
[ 0.364389] [<ffffffff80344998>] __driver_attach+0x0/0xaf
[ 0.364429] [<ffffffff80343c82>] bus_for_each_dev+0x44/0x6f
[ 0.364468] [<ffffffff803444a0>] bus_add_driver+0xae/0x1f6
[ 0.364507] [<ffffffff80344c36>] driver_register+0x59/0xce
[ 0.364546] [<ffffffff802f8605>] __pci_register_driver+0x4a/0x7d
[ 0.364587] [<ffffffff804dd6b8>] kernel_init+0x14f/0x2b9
[ 0.364626] [<ffffffff8020be98>] child_rip+0xa/0x12
[ 0.364670] [<ffffffff804dd569>] kernel_init+0x0/0x2b9
[ 0.364709] [<ffffffff8020be8e>] child_rip+0x0/0x12
[ ...
