Re: Frequent SATA resets with sata_nv (fwd)

Previous thread: [PATCH] HFSPlus: simplify inode mode settting logic by Wyatt Banks on Saturday, June 23, 2007 - 4:29 pm. (2 messages)

Next thread: [RFC] fsblock by Nick Piggin on Saturday, June 23, 2007 - 6:45 pm. (44 messages)
From: Matthew "Cheetah" Gabeler-Lee
Date: Saturday, June 23, 2007 - 5:52 pm

(Please cc me on replies)

I have three Samsung HDDs (/sys/block/sda/device/model says SAMSUNG 
SP2504C) in a RAID configuration.  My system frequently (2-3x/day) 
experiences temporary lockups, which produce messages like those below 
in my dmesg/syslog.  The system recovers, but the hang is annoying, to 
say the least.

All three drives are connected to sata_nv ports.  Oddly, the lockup 
almost always happens on ata6 or ata7 (the second and third ports of the 
4-port setup on my motherboard).  There is an identical drive connected 
at ata5, but I've only once or twice seen it hit that drive.

Searching lkml.org, I found a few threads investigating what look like 
very similar problems.  Some never seemed to find a solution, but one 
reached a fairly quick answer, namely that the drive's NCQ 
implementation was broken: 
http://lkml.org/lkml/2007/4/18/32

While I don't have older logs to verify exactly when this started, it 
was fairly recent, perhaps around my 2.6.20.1 to 2.6.21.1 kernel 
upgrade.

Any other info or tests I can provide/run to help?

Syslog snippet:
Jun 21 10:35:23 cheetah kernel: ata6: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jun 21 10:35:24 cheetah kernel: ata6: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA IDLE, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6: timeout waiting for ADMA LEGACY, stat=0x400
Jun 21 10:35:24 cheetah kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 21 10:35:24 cheetah kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Jun 21 10:35:24 cheetah kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 21 10:35:24 cheetah kernel: ata6: soft resetting port
Jun 21 10:35:24 cheetah kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 21 10:35:24 cheetah kernel: ata6.00: configured for ...
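
A minimal sketch for tallying these events per port from a saved log, which may help confirm whether ata6 and ata7 really are hit far more often than ata5. The regular expression is based only on the "exception Emask" line shown in the snippet above; the function name count_exceptions and the log path in the usage note are assumptions, not part of the original report.

```python
import re
from collections import Counter

# Matches libata exception lines of the form seen in the snippet above, e.g.
#   "... kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
# Group 1 captures the port name ("ata6") without the device suffix (".00").
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")


def count_exceptions(lines):
    """Return a Counter mapping port name (e.g. 'ata6') to exception count."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage, assuming the log lives at /var/log/syslog (the location varies by distribution): open the file and pass it straight in, for example `count_exceptions(open("/var/log/syslog", errors="replace"))`, then print the resulting counts per port.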

From: Alistair John Strachan
Date: Sunday, June 24, 2007 - 10:09 am

(Sorry, accidentally dropped LKML)


Well, there have been general problems with the ADMA code on the CK804, 
but I think Robert fixed them (added to CC). I've certainly had no 
problems since 2.6.21.

However, assuming the drive's NCQ _is_ busted and needs to be blacklisted, you 
might find you can temporarily work around the problem by loading the sata_nv 
module with adma=0, or booting with sata_nv.adma=0. That's not to point the 
finger at ADMA support specifically, of course; it's simply that ADMA is what 
enables the NCQ features.
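
After applying the workaround, it may be worth confirming that the setting actually took effect. The sketch below reads the parameter back through sysfs. It assumes the standard /sys/module/<module>/parameters/<name> layout and that the driver exposes adma as readable there, which is an assumption rather than something stated in this thread. The function name read_adma_setting is hypothetical.

```python
from pathlib import Path

# Assumed location, following the usual sysfs layout for module parameters.
# Whether sata_nv exposes "adma" here depends on how the driver declares it.
DEFAULT_PARAM_PATH = "/sys/module/sata_nv/parameters/adma"


def read_adma_setting(param_path=DEFAULT_PARAM_PATH):
    """Return the parameter's text value, or None if the file is absent.

    None can mean the module is not loaded, or that the parameter is not
    exported through sysfs on this kernel.
    """
    path = Path(param_path)
    if not path.exists():
        return None
    return path.read_text().strip()
```

Boolean module parameters are normally reported as Y or N, so a value of N here would suggest adma=0 is in effect; None means the check could not be made this way.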


Yes, that kernel upgrade is probably around the time ADMA became the default.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.