Re: [PATCH] blacklist NCQ on Seagate Barracuda ST380817AS

Previous thread: Unexpected segfaults with 2.6.23-rc-8-git4 by Chris Holvenstot on Sunday, September 30, 2007 - 6:41 am. (2 messages)

Next thread: [PATCH] robust futex thread exit race by Martin Schwidefsky on Sunday, September 30, 2007 - 8:02 am. (10 messages)
From: Paolo Ornati
Date: Sunday, September 30, 2007 - 7:05 am

Hi, I think you forgot to blacklist this one  :)

--
Seagate Barracuda ST380817AS has trouble with NCQ. For example,
unpacking a tarball on an XFS filesystem gives this:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata1.00: cmd 61/40:00:29:a3:98/00:00:00:00:00/40 tag 0 cdb 0x0 data 32768 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

More info here:
http://lkml.org/lkml/2007/1/21/76
    
Blacklist it!
    
Signed-off-by: Paolo Ornati <ornati@fastwebnet.it>

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 772be09..be289d0 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3781,6 +3781,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	{ "Maxtor 7B250S0",	"BANC1B70",	ATA_HORKAGE_NONCQ, },
 	{ "Maxtor 7B300S0",	"BANC1B70",	ATA_HORKAGE_NONCQ },
 	{ "Maxtor 7V300F0",	"VA111630",	ATA_HORKAGE_NONCQ },
+	{ "ST380817AS",		"3.42",		ATA_HORKAGE_NONCQ },
 	{ "HITACHI HDS7250SASUN500G 0621KTAWSD", "K2AOAJ0AHITACHI",
 	 ATA_HORKAGE_NONCQ },
 	/* NCQ hard hangs device under heavier load, needs hard power cycle */


-- 
	Paolo Ornati
	Linux 2.6.23-rc8-ga64314e6-dirty on x86_64
-
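For readers unfamiliar with the table the patch touches, here is a minimal sketch of how a model/firmware blacklist of this shape could be consulted. It is not the kernel's lookup code: `struct blacklist_entry`, `lookup_horkage`, the `HORKAGE_NONCQ` value, and the convention that a NULL firmware string matches any revision are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value, chosen for illustration only. */
#define HORKAGE_NONCQ (1u << 2)

struct blacklist_entry {
    const char *model;     /* model string reported by the drive */
    const char *fw_rev;    /* firmware revision, or NULL to match any (assumed) */
    unsigned int horkage;  /* quirk flags applied on a match */
};

/* Two entries copied from the diff above, plus a terminator. */
static const struct blacklist_entry blacklist[] = {
    { "Maxtor 7B250S0", "BANC1B70", HORKAGE_NONCQ },
    { "ST380817AS",     "3.42",     HORKAGE_NONCQ },
    { NULL, NULL, 0 }      /* end of table */
};

/* Return the quirk flags for a device, or 0 if it is not listed. */
unsigned int lookup_horkage(const char *model, const char *fw_rev)
{
    const struct blacklist_entry *e;

    for (e = blacklist; e->model != NULL; e++) {
        if (strcmp(e->model, model) != 0)
            continue;
        /* A listed revision must match exactly; NULL matches anything. */
        if (e->fw_rev == NULL || strcmp(e->fw_rev, fw_rev) == 0)
            return e->horkage;
    }
    return 0;
}
```

With this sketch, `lookup_horkage("ST380817AS", "3.42")` returns `HORKAGE_NONCQ`, while the same model with a different firmware string returns 0, which is why the patch names the firmware revision explicitly.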

From: Tejun Heo
Date: Sunday, September 30, 2007 - 7:17 am

Hmmm... Was there a thread about this one?  Also, please cc
linux-ide@vger.kernel.org.

Thanks.

-- 
tejun
-

From: Paolo Ornati
Date: Sunday, September 30, 2007 - 7:42 am

On Sun, 30 Sep 2007 07:17:05 -0700

Yes, this was the thread:
http://lkml.org/lkml/2007/1/21/43

-- 
	Paolo Ornati
	Linux 2.6.23-rc8-ga64314e6-dirty on x86_64
-

From: Alan Cox
Date: Sunday, September 30, 2007 - 7:29 am

On Sun, 30 Sep 2007 16:05:48 +0200

What makes you sure that is an NCQ problem ?
-
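One hint that a queued command is involved can be read off the `cmd` line in the original report. The sketch below assumes the taskfile is printed as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; that layout is an assumption about the log format, not something stated in this thread, and `struct tf_fields`, `parse_tf`, `is_fpdma`, `fpdma_sectors` and `tf_lba48` are hypothetical names.

```c
#include <stdio.h>
#include <stdint.h>

/* Register values as printed on the "cmd" line (layout assumed, see above). */
struct tf_fields {
    unsigned int command;
    unsigned int feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse the hex fields that follow "cmd ". Returns 0 on success, -1 on error. */
int parse_tf(const char *s, struct tf_fields *tf)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &tf->command,
                   &tf->feature, &tf->nsect, &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect, &tf->hob_lbal,
                   &tf->hob_lbam, &tf->hob_lbah,
                   &tf->device);
    return n == 12 ? 0 : -1;
}

/* 0x60 is READ FPDMA QUEUED and 0x61 is WRITE FPDMA QUEUED (the NCQ opcodes). */
int is_fpdma(const struct tf_fields *tf)
{
    return tf->command == 0x60 || tf->command == 0x61;
}

/* For FPDMA commands the sector count travels in the feature registers. */
unsigned int fpdma_sectors(const struct tf_fields *tf)
{
    return (tf->hob_feature << 8) | tf->feature;
}

/* Assemble the 48-bit LBA from the six LBA registers. */
uint64_t tf_lba48(const struct tf_fields *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8)      |  (uint64_t)tf->lbal;
}
```

Fed the string `61/40:00:29:a3:98/00:00:00:00:00/40` from the report, this yields opcode 0x61 (a queued write), 64 sectors and LBA 0x98a329. 64 sectors of 512 bytes is 32768 bytes, which agrees with the "data 32768 out" on the same log line.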

From: Paolo Ornati
Date: Sunday, September 30, 2007 - 7:46 am

On Sun, 30 Sep 2007 15:29:08 +0100

It goes away with:
echo 1 > /sys/block/sda/device/queue_depth

I have this problem only with XFS, and even with XFS it goes away
when mounting with "nobarrier"...

-- 
	Paolo Ornati
	Linux 2.6.23-rc8-ga64314e6-dirty on x86_64
-
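The workaround above relies on the `queue_depth` sysfs attribute. As a rough sketch, the current value could be read back like this; `read_queue_depth` and `ncq_effectively_on` are hypothetical helpers, and the idea that a depth of 1 means no command queuing is the assumption behind the workaround rather than something this code verifies.

```c
#include <stdio.h>

/*
 * Read an integer queue depth from a sysfs-style file such as
 * /sys/block/sda/device/queue_depth. Returns -1 on any error.
 */
int read_queue_depth(const char *path)
{
    FILE *f = fopen(path, "r");
    int depth;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &depth) != 1)
        depth = -1;
    fclose(f);
    return depth;
}

/* Queuing only matters when more than one command can be outstanding. */
int ncq_effectively_on(int depth)
{
    return depth > 1;
}
```

Calling `read_queue_depth("/sys/block/sda/device/queue_depth")` after the `echo 1` above should return 1, so `ncq_effectively_on()` would report 0 for that drive.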

From: Jeff Garzik
Date: Sunday, September 30, 2007 - 8:05 am

This last is an interesting datapoint.

I wonder if libata has a generic problem with NCQ + FLUSH CACHE.

What happens if you enable the 'fua' module parameter?  (libata.fua on 
kernel command line, if built in)

	Jeff



-

From: Paolo Ornati
Date: Sunday, September 30, 2007 - 8:28 am

On Sun, 30 Sep 2007 11:05:17 -0400

it isn't supported here:
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Some more info:

	-------	lspci -----------
00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation HECI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation Ethernet Controller (rev 02)
00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation SATA Controller AHCI (rev 02)
00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
02:00.0 IDE interface: Marvell Technology Group Ltd. Unknown device 6101 (rev b1)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)


	---- smartctl  -i -d ata /dev/sda ------

smartctl version 5.37 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is ...
-

From: Jeff Garzik
Date: Sunday, September 30, 2007 - 8:43 am

Did you actually try my suggestion?

That message is normal, because libata defaults to FUA==off.

	Jeff



-

From: Paolo Ornati
Date: Sunday, September 30, 2007 - 8:52 am

On Sun, 30 Sep 2007 11:43:38 -0400

Yes ("libata.fua=1" is OK, I think, or should it just be "libata.fua"?):
...
[    0.000000] Kernel command line: root=/dev/sda6 ro vga=0x305 libata.fua=1
...
[  285.004166] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen

-- 
	Paolo Ornati
	Linux 2.6.23-rc8-ga64314e6 on x86_64
-
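To double-check that a parameter such as `libata.fua=1` really appears as its own token on the kernel command line, a small scanner over the command-line string (for example the contents of /proc/cmdline) is enough. This is only a sketch: `cmdline_get` is a hypothetical helper, and it says nothing about how the kernel itself interprets the parameter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look for "name" or "name=value" as a whole whitespace-separated token.
 * On a match, copy the value (empty if there is no '=') into val and
 * return 1. Return 0 if the parameter is absent.
 */
int cmdline_get(const char *cmdline, const char *name, char *val, size_t len)
{
    size_t nlen = strlen(name);
    const char *p = cmdline;

    if (nlen == 0)
        return 0;

    while (*p != '\0') {
        size_t tok;

        while (*p == ' ' || *p == '\n')
            p++;
        tok = strcspn(p, " \n");   /* length of the current token */

        if (tok >= nlen && strncmp(p, name, nlen) == 0 &&
            (tok == nlen || p[nlen] == '=')) {
            size_t vlen = (tok == nlen) ? 0 : tok - nlen - 1;
            const char *v = (tok == nlen) ? p + nlen : p + nlen + 1;

            if (len > 0) {
                if (vlen >= len)
                    vlen = len - 1;   /* truncate to fit the buffer */
                memcpy(val, v, vlen);
                val[vlen] = '\0';
            }
            return 1;
        }
        p += tok;
    }
    return 0;
}
```

For the command line quoted above, `cmdline_get(line, "libata.fua", buf, sizeof buf)` returns 1 with `buf` set to "1", while a lookup for plain "libata" returns 0 because it is not a complete token.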

From: Jeff Garzik
Date: Sunday, September 30, 2007 - 8:59 am

Is libata built into the kernel, or a module?

If it is a module, using the kernel command line won't work.

	Jeff



-

From: Paolo Ornati
Date: Sunday, September 30, 2007 - 9:03 am

On Sun, 30 Sep 2007 11:59:45 -0400

built-in, my kernel is pretty monolithic

-- 
	Paolo Ornati
	Linux 2.6.23-rc8-ga64314e6 on x86_64
-

From: Mark Lord
Date: Sunday, September 30, 2007 - 10:26 am

Yeah, that's pretty suspicious.  Prior to issuing a FLUSH_CACHE op,
one must first drain all outstanding NCQ commands (and not issue new ones).

I'm sure the code must *try* to do that, but perhaps there's a bug in there?
Or just another drive bug?

??
-
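The ordering rule described above (drain outstanding NCQ commands before a non-queued command such as FLUSH CACHE, and hold back new commands meanwhile) can be sketched as a small issue-gating check. This is an illustration of the rule only, not libata's code; `struct link_state` and `may_issue` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct link_state {
    uint32_t ncq_active;    /* one bit per outstanding NCQ tag */
    bool non_ncq_active;    /* a non-queued command (e.g. a flush) is in flight */
};

/* Decide whether a new command may be issued right now. */
bool may_issue(const struct link_state *ls, bool is_ncq)
{
    /* Nothing may be issued while a non-queued command is running. */
    if (ls->non_ncq_active)
        return false;

    /* NCQ commands may join other NCQ commands. */
    if (is_ncq)
        return true;

    /* A non-queued command must wait until every NCQ tag has completed. */
    return ls->ncq_active == 0;
}
```

With tag 0 still outstanding (`ncq_active == 0x1`, as in the `SAct 0x1` of the report), `may_issue(&ls, false)` is false, so under this rule a flush would be deferred until the queued write finishes.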

From: Tejun Heo
Date: Sunday, September 30, 2007 - 10:29 am

If there was such a bug, the aborted commands list should contain both
FPDMA commands and FLUSH commands.  I don't think command filtering
itself is broken.  Possibly another quirky firmware but it's strange
that this is the only Seagate drive showing this problem.

Thanks.

-- 
tejun
-

From: Mark Lord
Date: Sunday, September 30, 2007 - 10:43 am

Yeah, that's the strange bit.

Surely someone at SuSE must have a drive like that,
which they could set up with XFS and reproduce the same results?

??
-

From: Tejun Heo
Date: Sunday, September 30, 2007 - 10:51 am

I wish we had a detailed hardware catalog.  I'll give the internal
mailing list a shot.

Thanks.

-- 
tejun
-

From: Mark Lord
Date: Sunday, September 30, 2007 - 11:54 am

Or pick up a new one (only about $78 here) and expense it!

Cheers
-

From: Mark Lord
Date: Sunday, September 30, 2007 - 12:01 pm

Mmm.. $66 for "open box".  But the drive itself has been discontinued by Seagate,
and was once claimed to be the "World's first SATA desktop drive with NCQ".

Probably buggy firmware after all.

-ml
-

From: Tejun Heo
Date: Tuesday, October 2, 2007 - 2:19 am

Hello,


Couldn't find any in SUSE and I don't think I can find any vendor who

Yeah, "World's first" is a pretty good clue indicating "broken".
Blacklisting it seems like a good idea after all.

Thanks.

-- 
tejun
-

From: Paolo Ornati
Date: Tuesday, October 2, 2007 - 8:40 am

On Tue, 02 Oct 2007 18:19:14 +0900

OT: I cannot test anything NCQ related for a while because the Intel
Mobo departed yesterday, so I'm on a different board without NCQ
support  ;)

-- 
	Paolo Ornati
	Linux 2.6.23-rc8generic-ga64314e6 on x86_64
-
