(Please cc me on replies)
I have three Samsung HDDs (/sys/block/sda/device/model says SAMSUNG
SP2504C) in a RAID configuration. My system frequently (2-3x/day)
experiences temporary lockups, which produce messages as below in my
dmesg/syslog. The system recovers, but the hang is annoying to say the
least.
All three drives are connected to sata_nv ports. Oddly, it almost
always happens on ata6 or ata7 (the second and third ports of the
4-port setup on my motherboard). There is an identical drive connected at
ata5, but I've only once or twice seen it hit that drive.
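For anyone who wants to check the per-port distribution on their own logs, here is a rough sketch of how the events could be tallied. It assumes that each lockup shows up as one "ataN.MM: exception Emask" line, as in the syslog snippet in this mail; the function name and the regex are illustrative only, not anything libata provides.

```python
import re
from collections import Counter

# Assumption: one error-handling episode corresponds to one line of the form
# "kernel: ata6.00: exception Emask ...", as seen in the syslog snippet.
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")


def count_exceptions(lines):
    """Return a Counter mapping port name (e.g. 'ata6') to exception count."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Small demo using two lines copied from the syslog snippet in this mail.
sample = [
    "Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 "
    "SErr 0x0 action 0x2 frozen",
    "Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port",
]
for port, n in sorted(count_exceptions(sample).items()):
    print(f"{port}: {n}")
```

To run it against a real log, pass an open file object instead of the sample list, for example count_exceptions(open(path)) where path is wherever your distribution writes kernel messages.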
Googling around lkml.org, I found a few threads investigating what look
like very similar problems. Some of them never seemed to reach a
solution, but one arrived at a fairly quick answer,
namely that the drive's NCQ implementation was horked:
http://lkml.org/lkml/2007/4/18/32
While I don't have older logs to verify exactly when this started, it
was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel
upgrade.
Any other info or tests I can provide/run to help?
Syslog snippet:
Jun 21 10:35:23 cheetah kernel: ata6: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jun 21 10:35:24 cheetah kernel: ata6: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA IDLE, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA LEGACY, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 21 10:35:24 cheetah kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Jun 21 10:35:24 cheetah kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port
Jun 21 10:35:24 cheetah kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 21 10:35:24 cheetah kernel: ata6.00: configured for ...