Re: sata_sil24 broken since 2.6.23-rc4-mm1

To: Matt Mackall <mpm@...>
Cc: Tejun Heo <htejun@...>, Jeff Garzik <jeff@...>, <linux-kernel@...>, <akpm@...>
Date: Friday, October 5, 2007 - 2:06 am

On 10/4/07, Matt Mackall <mpm@selenic.com> wrote:

Yes, I thought about that too. But I never seemed to need more than
two tries to make it fail.
So I would only suspect the last good step of being wrongly marked good.
That would then point to the first of your maps2 patches, the one that
moves the pagewalker code.
Would you think that this is a plausible cause?


What backups? :-)

Yes, I also thought about hardware trouble, but the bisect result
seemed too consistent.
Also, it's not always the same drive that fails, but it is always
one of the sil drives.

I now have activated ATA_DEBUG to see if the good and the bad boots differ.
It looks the same until the RAID5 starts.

Good boot:
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.160000] ata_sg_setup: 1 sg elements mapped
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.160000] ata_sg_setup: 1 sg elements mapped
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.320000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1 dhfis 0x1 dmafis 0x1 sactive 0x0
[   40.320000] nv_swncq_sdbfis: over
[   40.320000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.320000] ata_exec_command: ata3: cmd 0xEA
[   40.390000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[   40.390000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[   40.420000] md: considering sdb1 ...
[   40.440000] md:  adding sdb1 ...
[   40.440000] md:  adding sda1 ...
[   40.450000] md: created md0
[   40.460000] md: bind<sda1>
[   40.470000] md: bind<sdb1>
[   40.480000] md: running: <sdb1><sda1>
[   40.500000] raid1: raid set md0 active with 2 out of 2 mirrors

Bad boot:
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.060000] ata_sg_setup: 1 sg elements mapped
[   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.060000] ata_sg_setup: 1 sg elements mapped
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.200000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1 dhfis 0x1 dmafis 0x1 sactive 0x0
[   40.200000] nv_swncq_sdbfis: over
[   40.200000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.200000] ata_exec_command: ata3: cmd 0xEA
[   40.270000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[   40.270000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[   70.060000] ata_scsi_timed_out: ENTER
[   70.060000] ata_scsi_timed_out: EXIT, ret=0
[   70.080000] ata_scsi_error: ENTER
[   70.080000] ata_port_flush_task: ENTER
[   70.100000] ata1: ata_port_flush_task: EXIT
[   70.110000] __ata_port_freeze: ata1 port frozen
[   70.220000] __ata_port_freeze: ata1 port frozen
[   70.230000] ata_eh_link_autopsy: ENTER
[   70.240000] ata_eh_link_autopsy: EXIT
[   70.250000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   70.270000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
[   70.270000]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
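
For anyone who wants to compare two such boots mechanically: the timestamps differ between boots, so a plain diff is noisy. A minimal Python sketch that strips the printk timestamps and reports the first line where the two logs diverge could look like this. The function names are my own, and the sample strings are shortened excerpts of the two logs above, not the complete output.

```python
import re

# Matches the printk timestamp prefix, e.g. "[   40.160000] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log_text):
    """Return the non-empty log lines with the leading timestamp removed."""
    lines = []
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line.strip())
        if line:
            lines.append(line)
    return lines


def first_divergence(good_text, bad_text):
    """Return (index, good_line, bad_line) for the first differing line.

    A side that has run out of lines is reported as None.
    Returns None if both logs are identical after stripping timestamps.
    """
    good = strip_timestamps(good_text)
    bad = strip_timestamps(bad_text)
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    if len(good) != len(bad):
        i = min(len(good), len(bad))
        return (i,
                good[i] if i < len(good) else None,
                bad[i] if i < len(bad) else None)
    return None


# Shortened excerpts from the good and the bad boot.
GOOD = """
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.320000] nv_swncq_sdbfis: over
"""
BAD = """
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.200000] nv_swncq_sdbfis: over
"""

print(first_divergence(GOOD, BAD))
```

On these excerpts the first difference is the second 0x35 command for (1:0,0,0), which shows up in the good boot but not in the bad one.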

After [   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
the drive sda falls off the earth and can't be recovered by the error
handler, neither by soft- nor by hard-resetting the port.
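
To check that the timed-out ATA command really is this SCSI write, the two hex dumps can be decoded. The following Python sketch rests on a few assumptions: opcode 0x2a is WRITE(10) and 0x35 is SYNCHRONIZE CACHE(10), and the libata error line prints cmd as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is my reading of the format. For the NCQ command 0x61 (WRITE FPDMA QUEUED) the sector count is carried in the feature fields. The helper names are made up for this example.

```python
SCSI_OPCODES = {0x2A: "WRITE(10)", 0x35: "SYNCHRONIZE CACHE(10)"}
NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_cdb10(hex_bytes):
    """Decode the bytes ata_scsi_dump_cdb printed for a 10-byte CDB.

    Only the first 9 bytes are printed; the control byte is not needed here.
    """
    b = [int(x, 16) for x in hex_bytes.split()]
    return {
        "opcode": SCSI_OPCODES.get(b[0], hex(b[0])),
        "lba": int.from_bytes(bytes(b[2:6]), "big"),     # bytes 2..5
        "blocks": int.from_bytes(bytes(b[7:9]), "big"),  # bytes 7..8
    }


def decode_ata_cmd(cmd):
    """Decode the 'cmd' field of a libata error line (assumed layout, see above)."""
    first, low, high, device = cmd.split("/")
    command = int(first, 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":"))
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    if command in NCQ_COMMANDS:
        # NCQ: the sector count lives in the feature registers.
        sectors = hob_feature << 8 | feature
    else:
        sectors = hob_nsect << 8 | nsect
    return {
        "command": NCQ_COMMANDS.get(command, hex(command)),
        "lba": lba,
        "sectors": sectors,
        "device": int(device, 16),
    }


cdb = decode_cdb10("2a 00 25 42 d6 09 00 00 08")
ata = decode_ata_cmd("61/08:00:09:d6:42/00:00:25:00:00/40")
print(cdb)
print(ata)
print("same LBA:", cdb["lba"] == ata["lba"])
```

Both decode to LBA 0x2542d609 with a length of 8 sectors, and with 512-byte sectors that is the "data 4096 out" from the error line.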

So I will use the weekend to see if I can find out who issues this
command and add more debug to that place...

Torsten