Hi Alan,
In 2.6.23 and previous, CD writing works fine on my system. I'm using
ata_piix on:
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA
IDE Controller (rev 01)
When I'm running CD writing utilities, I sometimes see this message in
the kernel logs:
ata2.00: 66 bytes trailing data
Things do work fine though.
With 2.6.24-rc1, I can't write CDs. Instead of the above message, I get:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 10 in
res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete
and the software acts oddly - for example Brasero tells me that the
blank CD I have inserted has a capacity of 17,179,869,184gb and it
doesn't let me write CDs as it says the medium is not writable.
git bisect lead me to commit 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 as
the culprit.
Author: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Tue Oct 2 13:53:04 2007 -0700
libata_scsi: Fix ATAPI transfer lengths
Some controller variants snoop the ATAPI length value for Packet
transfers to do state machine and FIFO management. Thus we want to
set it properly, even for cases where it is otherwise meaningless.
Any ideas?
Thanks!
Daniel
-
On Tue, 30 Oct 2007 15:14:39 +0000 By luck in part I suspect. That shouldn't be happening and indicates something is very wrong. With the fifo buffers set right that may well Not immediately - but if you've got wrong transfer lengths its a candidate for this. Ok lets start with the basics If you mount a CD and use it does it work If you use cdrecord does it work ? What vendor drive and does it seem to be a specific box/drive that triggers this ? Alan -
Device type : Removable CD-ROM Version : 5 Response Format: 2 Capabilities : Vendor_info : 'PHILIPS ' Identification : 'DVD+-RW SDVD8820' Revision : 'AD15' Device seems to be: Generic mmc2 DVD-R/DVD-RW. Using generic SCSI-3/mmc CD-R/CD-RW driver (mmc_cdr). Driver flags : MMC-3 SWABAUDIO BURNFREE Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R This is on my laptop (Dell Inspiron 640m). I won't have access to another system to test this there until next week. The nutty app I was using for burning is Brasero, a GNOME app which does some SG_IO directly with the drive. (I guess it has some bad error handling and doesn't realise when some I/O path has failed) I have narrowed down the exact command submission which causes those nasty messages in dmesg. The function which submits this is named "brasero_medium_get_page_2A_write_speed_desc". The test program that I've attached can now reproduce this error every time it is run. The output is: result 0 check sense data: 72 0b 47 00 00 00 00 0e 09 0c 00 00 00 02 00 00 00 0a 00 I get the same output and dmesg errors even when there is no media in the drive. I have had a quick look at the SCSI command being submitted there, and don't see anything wrong. What next? Daniel
Just as an extra data point... I've compiled the test program an ran it on my ICH7/Pentium D box running 2.6.24-rc1-192-gef49c32 (x86_64), but it does not give the error and nothing in dmesg. I just get: result 0 My SATA DVD-RW drive is: Host: scsi2 Channel: 00 Id: 00 Lun: 00 Vendor: Optiarc Model: DVD RW AD-7170S Rev: 1.00 Type: CD-ROM ANSI SCSI revision: 05 -
I would guess Brasero is issuing a command with the length of data wrongly set. In the old code that might well just produce errors of the "Umm wtf is this data left over for ?", with the new code the drive is likely to change state as it knows the transfer size and that will *correctly* cause an HSM error and what follows. Now the question is who gets the length wrong - Brasero or the ata translation code in libata Alan -
Brasero does exactly the same as my test app which I attached to my last mail. Is my test app wrong? Daniel -
On Tue, 30 Oct 2007 19:21:29 +0000 Would need to double check the SCSI specificatons to be sure but I think you are asking for less data than the drive wishes to provide. You aren't allowed to do that with ATA. Alan -
ide-cd handles this by throwing the excess away, which I think is the sane way to do this. -- Jens Axboe -
That's easy for the PIO case. But CD writing is normally DMA, which means you will get a DMA engine exception if the device wants to give you more data than the scatter/gather entries permit. Jeff -
Right, that's of course problematic... There has to be a way to recover that situation though, or you can't export any user command issue facility. -- Jens Axboe -
You cannot hope to handle all possible effects arising from an app providing an invalid sg header / cdb. Once you start talking "recovery" you are already screwed: we are talking about low-level hardware commands that are passed straight to the hardware. It is trivial to lock up hardware, brick hardware, and corrupt data at that level. If this is NOT a privileged app, we must update the command validation to ensure that invalid commands are not transported to the hardware. If this is a privileged app, our work is done. Fix the app. We gave root rope, and he took it. I even venture to say that "accept anything, clean up afterwards" is /impossible/ to implement, in addition to being dangerous. Jeff -
We allow only a subset of standard commands to be issued by unprivileged applications. We don't strictly enforce transfer sizes and its arguable on SCSI transports this doesn't matter one iota. On ATA it perhaps matters a bit more. Note btw - the reset sequence cased by excess data off these drives is something we jump up and down and trigger. For PIO draining the bits is fine, for DMA pretty much all controllers will clean up happily enough. Its also nothing like the problem you might think as most drives told "send me 150 bytes, oh and the transfer length is 40" will in fact send 40. Alan -
Is it "invalid" though? As a SCSI command, it seems perfectly valid to ask for less data than the total amount the drive could possibly send. It's only when we have to translate it to an ATA command in libata that it becomes invalid there, right? Are you saying that we should limit the SG_IO commands to only ones that are valid both as SCSI *and* as their ATA translations? Would that be for all SCSI devices, or just ones that libata backs? Daniel -
Woaw, back the truck up a bit :-) I'm talking about simple things - like asking for 8 bytes of sense data. Simple mistakes. You cannot possibly check for everything like that in a Certainly, that's not what I'm talking about. -- Jens Axboe -
Hello, Jeff. For sense data and mode pages, the standard-sanctioned way to know the transfer size is to issue command with short buffer size just enough to contain the fixed size header part, determine actual transfer size from it and issue the command again with the correct buffer size. This doesn't happen for READ/WRITE commands. Transfer sizes are pre-determined for those commands and WRITE's to optical devices often can't be retried w/o side effect. I've just went through the ATA spec and this basically means we can't use DMA for these variable-transfer-length commands. Some DMA engines have "throw away what's left over bit" in its command structure or SG entry but not all do and none of drivers we currently has such feature enabled. Hmmm.... reading ide-cd.c::cdrom_pc_intr(). OIC, ide-cd is dealing with this problem by draining PIO after BMDMA engine is done. This is possible for BMDMA engines as they simply step out when the SG entries are exhausted; then, the interrupt handler kicks in and drains the left-over using PIO. This just isn't possible with more modern DMA engines. This really makes me think libata should do these commands via PIO unless we're gonna enable leftover draining for each DMA engine implementation or blacklist the ones which can't drain individually. Then again, nobody really knows how well those features would work as probably none has actually used them. Thanks. -- tejun -
Eeek, please ignore this. I somehow completely forgot about Allocation Length fields in CDBs. Thanks. -- tejun -
Its not possible with old ones either - its a bug not a feature and is why there is special casing for stuff like promise where this hangs the chip. Alan -
It also fails for some cases because the controller snoops the length information in the packet command. Alan -
Ok, gave this a hard look. This is basically a behavior change with regards to how we program the bcount(low) and bcount(high) registers. Issues about FIFO draining and devices returning too-much data are ultimately tangential. Furthermore, this is an ATAPI PIO issue, as demonstrated by (a) Alan's patch did not change DMA lbam/lbah programming and (b) Daniel's report of the message "ata2.00: 66 bytes trailing data" which occurs in the PIO state machine. To survey the behaviors for ATAPI PIO: ide-cd: read/write commands, blimit = 32k others cmds, blimit = xfer_len old libata (pre-Alan): blimit = 8k new libata behavior: blimit = xfer_len thus Alan's patch was moving us CLOSER to ide-cd's behavior (if we ignore read/write commands for the moment, which are not at issue). and the end result is that the change from old-libata to new-libata behavior broke Daniel's case, which is a device known to ignore the SCSI command CDB's allocation length field. Additionally, let's add some background: libata was originally written for first-gen SATA controllers (incl. first-gen SATA bridges), most notably ata_piix, the controller Daniel is using. I chose blimit 8k because I felt that matches the maximum size of a SATA Data FIS. I felt -- this was a wild-added guess -- that setting blimit thusly would properly configure all the magic internal FIFOs and data buffers in silicon that handled these crazy ATAPI devices with random transfer lengths. I am not drawing any conclusions yet, but I'm thinking that blimit=8k may be a better choice for SATA ATAPI. Other comment: * as Tejun noted in IRC, we don't have a clear idea of what triggered the HSM violation. turning on debugging printk's would help, but unfortunately that means turning them on for everything, including hard drives. Jeff -
Only if your transfer is actually 8K or more. Alan -
We are both half-right. I reverted my version of that completely, applied your version verbatim, and pushed it upstream. The result: atapi_xlat _did_ start programming lbam/lbah for DMA (another behavior change), but request-sense DMA path was not changed to program lba[mh] No, that's precisely the problem cause for what Daniel is reporting. We are setting blimit=xfer_len(==10), when the device wants to return more than 10 bytes. When set to the old limit (8k), the problem didn't occur. Prior to this change, __atapi_pio_bytes() happily discarded trailing data, so the software already knows how to eat trailing data left to us by the ATAPI device. The printk (and lack of problem) indicated that this code was active and working fine. Jeff -
Lots of *other* problems occur instead. Daniel is reporting that if he makes a stupid request to a buggy drive he gets a reset and the system continues happily. Even that reset being a reset not a new command issue is actually us being excessively paranoid. Prior to these changes people were reporting things like "I cannot use my CD-ROM". So given we have ten years of experience of the method I put in taken from the old IDE layer versus a good way to make one buggy request work and break everything else - I *know* which I consider to be the right path. And PIO drains of DMA controllers do not work for most controllers, some will simply hang solid if you touch the data register as part of a DMA sequence (eg promise 2026x) Alan -
Sorry if I'm sprouting nonsense here -- I don't have the knowledge that you all do. However, just wanted to point out that I looked up the info about that mode page in the MMC specs. http://www.t10.org/ftp/t10/drafts/mmc4/mmc4r05a.pdf page 513. "E.3.3 MM Capabilities and Mechanical Status Page (Page Code 2Ah)" The length of this mode page varies from drive to drive, so there is no "one size" that you can supply to the SG_IO command (unless you want to use a stupidly large buffer) to retrieve all the data at once. Instead, as Tejun describes, you put a short read request in first, look at byte 1 of the page which tells you the length, and then read the whole lot. Again, ignore me if I'm not contributing anything useful, but I'm increasingly thinking that the SG_IO command block in question is perfectly valid, and doing a short read of the mode page in question (and probably others too) is in fact required before you can know its true size to do a full read anyway. Thanks, Daniel -
ATAPI effectively requires you supply a stupidly large buffer. In theory you set the transfer size in the lba registers and it all works. In practice some drives ignore this and there isn't a nice reliable way to clean up. Old IDE tries to pio the extra out but that won't work on all controllers Welcome to the wonderful world of IDE, where the spec sucks and the drives manage to do even worse things. We can try and clean up better in these cases at least for PIO transfers by trying to drain the data beyond this point, on the controllers that cope with this but really - fix the app to reflect reality: ATAPI is SCSI as spoken by yokels Alan -
The transfer size is specified in Allocation Length field inside CDB of commands which can transfer variable length data. I got confused about Not that I disagree to this point tho. :-) -- tejun -
Thanks for looking at this, the kernel now says:
<4>ata2.00: HSM violation: eh_analyze_tf: BUSY|DRQ
<3>ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
<3>ata2.00: cmd a0/00:00:00:0a:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data
10 in
<4> res 58/00:02:00:0a:00/00:00:00:00:00/a0 Emask 0x2 (HSM
violation)
<3>ata2.00: status: { DRDY DRQ }
<6>ata2: soft resetting link
<6>ata2.00: configured for UDMA/33
<6>ata2: EH complete
Daniel
-
[snip] I hope it does... that patch removes a call to sg_last() and in general cleans up the function a bit, which is nice. Jeff -
That fixes it, thanks! There is no more ugly error in dmesg, the test prog doesn't print any sense data, and brasero works OK too. However, these messages appear in the kernel log every time I run the test app (or when brasero does its thing): <4>ata2.00: 10 bytes trailing data <4>ata2.00: 10 bytes trailing data <4>ata2.00: 10 bytes trailing data <4>ata2.00: 10 bytes trailing data <4>ata2.00: 10 bytes trailing data <4>ata2.00: 10 bytes trailing data <4>ata2.00: 6 bytes trailing data Daniel -
Yeah, that's expected. What's going on here is that your drive sends full mode sense data (76bytes) regardless of allocation size in CDB but it does honor transfer chunk set in the PACKET TF, which is set to the same value as allocation size by Alan's patch. So, now the drive sends the 76 bytes in 8 chunks. The first chunk is transferred into the sg buffer and the following chunks are thrown away. Previously, transfer chunk was set to 8k, so the drive claims to transfer 76 bytes from the buegging, libata transfers leading 10 bytes got transferred into the user buffer and throws away what's remaining. The change caused problem because libata HSM always switches to HSM_ST_LAST (command sequence completion) after filling the command buffer completely. So, throwing away is activated iff the extra data to throw away is transfered together with the last chunk of useful data. With the chunk size reduced to allocation size, the initial chunk fills the data buffer completely and all the extra bytes are transfered in separate chunks. However, libata HSM expects command sequence to complete after the initial chunk but the drive asserts DRQ for the next chunk on the following interrupt, so HSM violation is triggered. The patch modifies HSM such that it keeps throwing away extra data as long as the drive asserts DRQ which is how IDE driver does it. However, there's still remaining issues. What does happen if you raise allocation length and buffersize of the test program to 16? ie. Change 0x0a in cmd[] to 0x10 and increase buffer[10] to buffer[16]. Thanks. -- tejun -
From past experience, some dead ATAPI devices stuck in DRQ=1. We should take care of such situation, otherwise the HSM might get into an infinite loop of waiting for the dead ATAPI device to say DRQ=0 and discarding endless "trailing data". Maybe we could set a limit here. If the ATAPI device keeps DRQ=1 and exceeds the limit, we consider it as HSM violation and have EH handle it. -- albert -
On a DMA transfer its basically out of our control (and a PIO drain will lock some controllers solid until power cycle), but on PIO we know we never set a chunk size over 64K, so if we exceed 64K its time to apply a larger hammer -
Do such controllers lock up on PIO draining after PIO transfers too? Draining is related to the amount of data the drive responds not to the chunk size. I agree 64k should be enough for most cases but I think there can be corner cases where this doesn't hold. Thanks. -- tejun -
On Mon, 05 Nov 2007 09:05:48 +0900 Promise PDC202xx will lock on a PIO drain of a DMA transfer or (if you Tapes perhaps yes ? -
Just to make sure I'm not misunderstanding it. So, PDC202xx will lock up if you transfer allocation size, reset and then try to drain, but it's okay to keep draining as part of continued PIO transfer, right? -- tejun -
The following hang mine: PIO transfer Reset state machine Read data register DMA transfer in progress Read data register And this does not PIO transfer Keep reading data register while checking DRQ -
Eek. The process hangs in D state for a good 60 seconds or so.
Caught a trace with sysrq.
<6>a.out D 00200202 0 7015 3217
<4> b801fc7c 00200082 b029d5cb 00200202 b04a4c00 b04a4c00 b046ee58
b0472040
<4> b24fa000 b24fa140 b17fc040 00000000 b264e000 b801fb80 b264e000
b01f266c
<4> b278f800 00200202 b0285eb7 7fffffff 7fffffff b801fc7c b801fc7c
b035327c
<4>Call Trace:
<4> [<b029d5cb>] ata_scsi_translate+0xd7/0x105
<4> [<b01f266c>] elv_next_request+0xe3/0xf2
<4> [<b0285eb7>] scsi_dispatch_cmd+0x172/0x1aa
<4> [<b035327c>] schedule_timeout+0x13/0x8d
<4> [<b0128830>] del_timer+0x48/0x4e
<4> [<b0352abc>] wait_for_common+0xb2/0x118
<4> [<b011bc10>] default_wake_function+0x0/0x8
<4> [<b01f50d6>] blk_execute_rq+0xa6/0xbc
<4> [<b01f412b>] blk_end_sync_rq+0x0/0x23
<4> [<b0181a14>] bio_add_pc_page+0x23/0x28
<4> [<b0182116>] bio_copy_user+0x116/0x221
<4> [<b0181e5b>] bio_phys_segments+0xe/0x14
<4> [<b01f4253>] blk_rq_bio_prep+0x28/0xa4
<4> [<b01f452a>] blk_rq_append_bio+0x11/0x3a
<4> [<b01f46a0>] blk_rq_map_user+0xfc/0x17f
<4> [<b01f8173>] sg_io+0x20e/0x2f0
<4> [<b01f8408>] scsi_cmd_ioctl+0x1b3/0x34b
When it unhung, this appeared in the logs
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/01:00:00:10:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 16 in
res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: port is slow to respond, please be patient (Status 0xd0)
ata2: device not ready (errno=-16), forcing hardreset
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete
And the app itself outputted:
result 0
check sense data:
72 0b 00 00 00 00 00 0e 09 0c 00 00 00 03 00 00 00 00 00
Daniel
-
Correct -- I do not see anything wrong with the SCSI command being passed to the device. It is quite normal to request an allocation length less than the amount of data to be returned -- often this is even a requirement, in order for the SCSI client to see how much data buffer is must allocate for the returned data. This technique is used all over SCSI, both in userland and in the kernel. Nothing wrong with the test app AFAICS. This behavior can be adequated summarized as a drive firmware bug, where the device will puke when both SCSI allocation length and ATA byte count limit are below the amount of data it wishes to return. At this point I think we're interested to see the output from Tejun's debugging patch, posted in this thread. Jeff -
hrm we may need to make that controller-specific, given that the code prior to 2db78dd302d26d242d3e8e5c4c5024b6c3ea93c2 was working for most... Jeff -
Removing it breaks some devices on any controller. I don't think this one is a simple compatibility issue - its a behaviour we have done for ten years on PIIX/ICH chipsets without problems in the old IDE. Also those logs of 66 bytes overrun are worrying - the app is either issuing incorrectly sized commands, the drive is providing incorrectly sized answers or something really evil is going on in the midlayer translations. And the ICH7 is in fact the chipset I did the original testing on ... Need to work out if this is some nutty gentoo app, a quirky device we need to dig into or something else. I've asked Daniel for some first test points to try and eliminate the obvious. Alan -
