Hello everyone, I turn to you with a new problem about scsi or libata. I have a removable disk of type RDX, connected by SATA to my linux machine. I access it via libata on /dev/sdb, like a classical disk. I write to it a certain time with some mounts and umounts. After a certain number of mounting and unmounting, or maybe a certain number of hours (not clearly identified), the device refuses to be mounted, saying "You must specify the filesystem type.". Looking to /var/log/messages I have the following logs : Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available. Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00 Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available. Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00 Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through These lines are logged again when I run cfdisk or parted or hdparm or anything on /dev/sdb. My system is : linux kernel 2.6.21.1 with some patches : - libata-start_stop_management (http://bugs.gentoo.org/attachment.cgi?id=118829) compiled with libata. Motherboard ICH6 family (id 2651) Does anyone know this problem or have an idea about what it is and ...
> Looking to /var/log/messages I have the following logs : I need the logs where it fails not afterwards - the ones showing the drive error and where the fs gives up. Without that it is hard to see It would be useful to know if a 2.6.25/2.6.26 kernel without other Insufficient information to guess right now. --
Sorry, you're right, I didn't see these logs : Sep 4 08:03:01 devsni1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Sep 4 08:03:01 devsni1 kernel: ata4.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x2a data 131072 out Sep 4 08:03:01 devsni1 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0) Sep 4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0) Sep 4 08:03:31 devsni1 kernel: ata4: soft resetting port Sep 4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807 Sep 4 08:03:32 devsni1 last message repeated 4 times Sep 4 08:06:14 devsni1 kernel: Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700080 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700336 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700592 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700848 ... and so on with always different sector numbers. What is strange is that before this, I have logs looking like : Sep 4 07:27:32 devsni1 kernel: kjournald starting. Commit interval 5 seconds Sep 4 07:27:32 devsni1 kernel: EXT3 FS on sdb, internal journal Sep 4 07:27:32 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 4 07:27:32 devsni1 kernel: kjournald starting. Commit interval 5 seconds Sep 4 07:27:32 devsni1 kernel: EXT3 FS on sdb, internal journal Sep 4 07:27:32 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 4 07:28:20 devsni1 kernel: ...
Your disk went offline and then refused to come back when the link was reset. The initial trigger appears to have been the drive, the fact it didn't come back could either be the drive or a controller problem. We've seen a few cases where devices or controllers fail to recover from one end being stuck expecting data. Mark Lord did some patches to try and drain data in this case but I don't remember if they were merged yet. Alan --
..
That would be this patch, currently not merged, not maintained,
and probably needs rework for some chipsets. But for the record:
I think this original patch still applies cleanly on at least 2.6.23-rc7.
Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
rather than just getting stuck there forever.
Signed-off-by: Mark Lord <mlord@pobox.com>
---
--- old/drivers/ata/libata-sff.c 2007-09-28 09:29:22.000000000 -0400
+++ linux/drivers/ata/libata-sff.c 2007-09-28 09:39:44.000000000 -0400
@@ -420,6 +420,28 @@
ap->ops->irq_on(ap);
}
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+ u8 stat = ata_chk_status(ap);
+ /*
+ * Try to clear stuck DRQ if necessary,
+ * by reading/discarding up to two sectors worth of data.
+ */
+ if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+ unsigned int i;
+ unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+ printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+ limit);
+ for (i = 0; i < limit ; ++i) {
+ ioread16(ap->ioaddr.data_addr);
+ if (!(ata_chk_status(ap) & ATA_DRQ))
+ break;
+ }
+ printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+ }
+}
+
/**
* ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
* @ap: port to handle error for
@@ -476,7 +498,7 @@
}
ata_altstatus(ap);
- ata_chk_status(ap);
+ ata_drain_fifo(ap, qc);
ap->ops->irq_clear(ap);
spin_unlock_irqrestore(ap->lock, flags);
--
Hi Mark and Alan, Thank you for your answers and for the patch Mark sent in his mail. I had found just before this patch : http://kerneltrap.org/mailarchive/linux-kernel/2007/9/27/324334 which seems to be different from the one you sent. What does cuase this difference, do these two patches apply on different kernel versions or do they correspond to different revisions of the patch, one being fixed compared to the other? Thank you Pascal --
Hi everyone, I posted this problem last week on this mailing list, I got an answer from Alan Cox requiring more informations. Then when I gave those informations, I didn't get any other answer. So I try another time to get help from some of you. Here is my problem : I have a Linux box with an RDX removable disk in SATA. A software uses regularly this RDX, mounts it, reads and/or writes to it and unmounts it. But after a certain time or a certain number of uses (not clearly identified), the device fails to respond, mount displaying something like : "There is no filesystem on this device" In /var/log/messages I have : Sep 4 08:03:01 devsni1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Sep 4 08:03:01 devsni1 kernel: ata4.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x2a data 131072 out Sep 4 08:03:01 devsni1 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0) Sep 4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0) Sep 4 08:03:31 devsni1 kernel: ata4: soft resetting port Sep 4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807 Sep 4 08:03:32 devsni1 last message repeated 4 times Sep 4 08:06:14 devsni1 kernel: Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700080 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700336 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700592 Sep 4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000 Sep 4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700848 ... and so on with always ...
