On 10/4/07, Matt Mackall <mpm@selenic.com> wrote:

Yes, I thought about that too. But I never seemed to need more than two tries to make it fail, so I would only suspect the last good step of being a false positive. That would then point to the first of your maps2 patches, the one moving the pagewalker code. Do you think that is a plausible cause?

What backups? :-)

Yes, I also thought about hardware trouble, but the bisect result seemed too consistent. Also, it's not always the same drive that fails, only every time one of the sil-drives.

I have now activated ATA_DEBUG to see whether the good and the bad boots differ. It looks the same until the RAID5 starts.

Good boot:

[ 40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[ 40.160000] ata_sg_setup: 1 sg elements mapped
[ 40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[ 40.160000] ata_sg_setup: 1 sg elements mapped
[ 40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.320000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1 dhfis 0x1 dmafis 0x1 sactive 0x0
[ 40.320000] nv_swncq_sdbfis: over
[ 40.320000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.320000] ata_exec_command: ata3: cmd 0xEA
[ 40.390000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[ 40.390000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[ 40.420000] md: considering sdb1 ...
[ 40.440000] md: adding sdb1 ...
[ 40.440000] md: adding sda1 ...
[ 40.450000] md: created md0
[ 40.460000] md: bind<sda1>
[ 40.470000] md: bind<sdb1>
[ 40.480000] md: running: <sdb1><sda1>
[ 40.500000] raid1: raid set md0 active with 2 out of 2 mirrors

Bad boot:

[ 40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[ 40.060000] ata_sg_setup: 1 sg elements mapped
[ 40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[ 40.060000] ata_sg_setup: 1 sg elements mapped
[ 40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.200000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1 dhfis 0x1 dmafis 0x1 sactive 0x0
[ 40.200000] nv_swncq_sdbfis: over
[ 40.200000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[ 40.200000] ata_exec_command: ata3: cmd 0xEA
[ 40.270000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[ 40.270000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[ 70.060000] ata_scsi_timed_out: ENTER
[ 70.060000] ata_scsi_timed_out: EXIT, ret=0
[ 70.080000] ata_scsi_error: ENTER
[ 70.080000] ata_port_flush_task: ENTER
[ 70.100000] ata1: ata_port_flush_task: EXIT
[ 70.110000] __ata_port_freeze: ata1 port frozen
[ 70.220000] __ata_port_freeze: ata1 port frozen
[ 70.230000] ata_eh_link_autopsy: ENTER
[ 70.240000] ata_eh_link_autopsy: EXIT
[ 70.250000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 70.270000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
[ 70.270000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

After

[ 40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08

the drive sda falls off the earth and can't be recovered by the error handler, neither through soft- nor hard-resetting the port.

So I will use the weekend to see if I can find out who issues this command and add more debug output to that place...

Torsten
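
For reference, the CDB that precedes the timeout can be decoded by hand. Below is a minimal Python sketch of that decoding. It assumes the nine bytes in the log are the first nine bytes of a 10-byte SCSI CDB (the final control byte is not shown), and it only knows the two opcodes that appear in the logs above. The helper name decode_cdb is made up for illustration and is not part of libata.

```python
# Hypothetical helper: decode the CDB bytes as printed by ata_scsi_dump_cdb.
# Assumption: the log shows bytes 0..8 of a 10-byte CDB (control byte omitted).
OPCODES = {
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
}

def decode_cdb(hex_bytes):
    """Return (opcode name, starting LBA, block count) for a 10-byte-style CDB."""
    b = bytes.fromhex(hex_bytes)            # whitespace between bytes is ignored
    name = OPCODES.get(b[0], "unknown")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2..5: logical block address
    count = int.from_bytes(b[7:9], "big")   # bytes 7..8: number of blocks
    return name, lba, count

name, lba, count = decode_cdb("2a 00 25 42 d6 09 00 00 08")
print(name, hex(lba), count, "blocks =", count * 512, "bytes")
```

Read this way, the command is an 8-block write (4096 bytes with 512-byte sectors) to LBA 0x2542d609, which agrees with the "data 4096 out" and the LBA bytes in the timed-out ata1.00 command line.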
