Greetings,
I get a few million of these on boot-- the system never actually boots.
Works fine in 2.6.23-rc7.
[ 50.456012] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 50.462484] ata2.00: irq_stat 0x40000001
[ 50.466441] ata2.00: cmd e5/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
[ 50.466442] res 51/04:00:01:01:80/00:00:00:00:00/a0 Emask 0x1 (device error)
[ 50.481914] ata2.00: status: {DRDY ERR }
[ 50.485876] ata2.00: error: {ABRT }
[ 50.489533] ata2.00: configured for UDMA/133
[ 50.493839] ata2: EH complete
I've attached the entire dmesg and lspci.
Berck
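For readers following along, the "res" line in the log above can be decoded by hand. The snippet below is only a sketch, not libata's own code: it assumes the first two bytes of the "res" field are the ATA status and error registers, and the bit tables are a partial list taken from the ATA register definitions (the names decode_bits and decode_res are made up for illustration). The e5 in the "cmd" line is the CHECK POWER MODE opcode.
```python
# Partial bit tables for the ATA status and error registers.
# Assumption: only the bits relevant to this report are listed.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of all bits from `names` that are set in `value`."""
    return [name for mask, name in names.items() if value & mask]


def decode_res(res):
    """Split a 'res' taskfile string and decode its status and error bytes."""
    groups = res.split("/")
    status = int(groups[0], 16)               # first byte: status register
    error = int(groups[1].split(":")[0], 16)  # second byte: error register
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


status_flags, error_flags = decode_res("51/04:00:01:01:80/00:00:00:00:00/a0")
print("status:", status_flags)
print("error:", error_flags)
```
The decoded flags agree with the "status: {DRDY ERR }" and "error: {ABRT }" lines in the log: the device reports itself ready but aborts the command.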
Are you "git-friendly"? A few quick kernel compiles and reboots would help us narrow down the problem, given that it's a reproducible regression.
The first step would be to clone the "upstream" branch of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git and see if the problem is reproducible there. If yes, then you have narrowed down the problem to something my ATA devel tree has introduced into -mm.
Once the blame has been squarely fixed upon me :) you can use git-bisect to locate the precise change that broke your setup. Info at
http://kerneltrap.org/node/11753
or
http://www.kernel.org/pub/software/scm/git/docs/v1.3.3/howto/isolate-bugs-with-bisect.txt
or "man git-bisect"
Jeff -
Nope, you're off the hook. The libata tree works great, so it must be something else in -mm conflicting. -
Can you try 2.6.23-rc8 plus this patch: http://brick.kernel.dk/git-block.patch.bz2 and see if that works? -- Jens Axboe -
Whoops, sorry! I just lied. I'm a git newbie, and failed to actually get the "upstream" branch the first time, so rc8 is clean, but it fails when I actually pull the upstream branch. I'll git bisect and get back to you. Berck -
OK, you probably realize this, but you can forget about the git-block testing for now then. -- Jens Axboe -
Okay, here's the problem:
268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik <jeff@garzik.org>
Date: Fri Sep 21 07:09:36 2007 -0400
[libata] SCSI: simple TEST UNIT READY simulation
It's trivial to ping the device, and that's a much more sane behavior
than no-op.
df6d21f7ce56a4e796f8f856c1f647b0395ab4df M drivers
Berck
-
Thanks for debugging! Can you tell me something about this device?
[ 49.045635] ata2.00: ATA-6: Config Disk, RGL10364, max UDMA/133
[ 49.051677] ata2.00: 640 sectors, multi 1: LBA
[ 49.056321] ata2.00: configured for UDMA/133
It seems like it does not support the 'check power mode' command. Can you post a text file attachment containing the output of 'hdparm --Istdout'? Jeff -
No problem. The device in question is a Western Digital Raptor WD360GD 36.7GB 10,000 RPM Serial ATA150 Hard Drive. hdparm output attached. Berck -
Does the attached patch change behavior at all? You should be able to apply it on top of libata-dev.git#upstream or -mm. If there are still problems, an updated dmesg (w/ the attached patch) and output from enabling ATA_DEBUG (include/linux/libata.h) would be very helpful. Thanks! Jeff
Still broken. dmesg with ATA_DEBUG defined is attached.
Great, this will be useful output. It will probably be a couple days before my next patch. In the meantime, you can extract the bad commit to a patch
git-diff-tree -p 268fe6f9f15551be9abedd44a237392675d529d5 > /tmp/patch
and then revert it locally in your kernel tree
patch -sp1 -R < /tmp/patch
to temporarily work around this. I will definitely make sure this is either fixed or reverted before it goes upstream to Linus. Thanks, Jeff -
Would it also be possible for you to send along 'hdparm --Istdout' output for your config disk thingy, /dev/sdd ? Jeff -
One of these appears in my system as well (ASUS P5W-DH Deluxe mainboard). Here's the hdparm output: /dev/sdb: 0040 3fff c837 0010 0000 0000 003f 0000 0000 0000 3030 3030 3030 305f 5f5f 5f5f 5f5f 5f5f 5f30 5f41 0003 3e00 0004 5247 4c31 3033 3634 436f 6e66 6967 2020 4469 736b 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 8001 0000 2f00 4000 0200 0000 0007 3fff 0010 003f fc10 00fb 0101 0280 0000 0000 0407 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 0000 0201 0000 0000 0000 007e 001b 0068 5060 4000 0000 1000 4000 407f 0000 0000 0000 fffe 0000 c0fe 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0017 2040 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 baa5 Since about 2.6.17 or 2.6.18, it has been causing long delays while booting: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5) ata2: port is slow to respond, please be patient (Status 0x80) ata2: COMRESET failed (errno=-16) ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata2.00: ATA-6: Config Disk, RGL10364, max UDMA/133 ata2.00: 640 sectors, multi 1: LBA ata2.00: configured for UDMA/133 Bernd -
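As a cross-check, the hdparm dump above can be tied back to the "ATA-6: Config Disk, RGL10364" and "640 sectors" lines in dmesg. The following is a rough sketch, assuming the standard IDENTIFY DEVICE layout (firmware revision in words 23-26, model in words 27-46, LBA28 sector count in words 60-61, two ASCII characters per word with the high byte first). The word values are copied from the dump; the word indices are counted by hand from its start, and the helper names are invented for illustration.
```python
# Selected IDENTIFY DEVICE words copied from the dump above.
# Assumption: indices were counted by hand from the first word (word 0).
FIRMWARE_WORDS = [0x5247, 0x4C31, 0x3033, 0x3634]            # words 23-26
MODEL_WORDS = [0x436F, 0x6E66, 0x6967, 0x2020,
               0x4469, 0x736B] + [0x2020] * 14               # words 27-46
SECTOR_WORDS = [0x0280, 0x0000]                              # words 60-61


def ata_string(words):
    """Decode an ATA string: two ASCII chars per word, high byte first."""
    raw = b"".join(word.to_bytes(2, "big") for word in words)
    return raw.decode("ascii").strip()


def lba28_sectors(words):
    """Combine the low and high 16-bit words into the LBA28 sector count."""
    low, high = words
    return (high << 16) | low


print("firmware:", ata_string(FIRMWARE_WORDS))
print("model:", ata_string(MODEL_WORDS))
print("sectors:", lba28_sectors(SECTOR_WORDS))
```
The firmware string and the sector count come out as RGL10364 and 640, the same values libata prints for ata2.00, so this dump does describe the "Config Disk" device and not a real drive.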
And yup, same problem with the painful boot delays since 2.6.18. Tejun indicated that a fix would get merged with 2.6.23, but that didn't happen. Here's hoping something makes it into .24! Berck -
Yeah, it is the sil4726 virtual device, which is really crappy as an ATA device. About the fix, I thought PMP support would fix it, but the controller on the P5W-DH doesn't support PMP. It can only talk to the virtual device or the device attached to the first port, depending on how the PMP chip is configured. It seems we'll have to blacklist the mainboard and skip or use a modified reset sequence on the affected port, so that's why the fix was delayed. I'm currently on the road but I'll look into it when I get back (next week). Thanks. -- tejun -
Sure, just don't ask me what it is! (I've generally assumed that writing to it would be a bad idea.) Berck
FWIW I haven't had time to debug this, so I'm going to simply revert the patch, and make sure it does not make it into 2.6.24. Jeff -
