Re: [BUG 2.6.21-rc3-git9] SATA NCQ failure with Samsung HD401LJ

Previous thread: [PATCH][RSDL-mm 6/6] sched: document rsdl cpu scheduler by Con Kolivas on Friday, March 16, 2007 - 6:55 am. (1 message)

Next thread: [PATCH][RSDL-mm 5/6] sched: implement rsdl cpu scheduler by Con Kolivas on Friday, March 16, 2007 - 6:55 am. (1 message)
From: Robert Hancock
Date: Friday, March 16, 2007 - 6:56 am

(linux-ide cc'ed)


This does indeed look like a drive-side issue to me (the controller is 
reporting CPBs with response flags 2, which as far as I can tell 
indicates it's still waiting for the drive to complete the request).
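
The reading of the response flags above can be sketched as a small decoder. This is a minimal sketch, assuming the bit values below mirror the NV_CPB_RESP_* constants in the sata_nv ADMA code (verify them against drivers/ata/sata_nv.c for your kernel); the helper names cpb_resp_has_error and cpb_resp_is_pending are hypothetical. A flags value of 2 sets neither the assumed "done" bit nor any of the assumed error bits, which is consistent with the command still being outstanding at the drive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed CPB response flag bits (believed to match sata_nv's ADMA
 * definitions; treat as assumptions and check your kernel source). */
enum {
	NV_CPB_RESP_DONE    = 1 << 0,
	NV_CPB_RESP_ATA_ERR = 1 << 3,
	NV_CPB_RESP_CMD_ERR = 1 << 4,
	NV_CPB_RESP_CPB_ERR = 1 << 7,
};

/* Hypothetical helper: true if any of the assumed error bits is set. */
static bool cpb_resp_has_error(uint8_t flags)
{
	return (flags & (NV_CPB_RESP_ATA_ERR | NV_CPB_RESP_CMD_ERR |
			 NV_CPB_RESP_CPB_ERR)) != 0;
}

/* Hypothetical helper: true if the controller has not reported completion
 * and no error bit is set, i.e. the command still looks outstanding. */
static bool cpb_resp_is_pending(uint8_t flags)
{
	return !(flags & NV_CPB_RESP_DONE) && !cpb_resp_has_error(flags);
}
```

With these assumptions, cpb_resp_is_pending(2) is true, while a value such as 0x09 (done plus ATA error) is reported as an error rather than as pending.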

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

-

From: Christian
Date: Friday, March 16, 2007 - 7:44 am

I have been using this hardware configuration (SATA II, NCQ) since the nvidia 
ADMA support made it into the -mm kernel (maybe around 2.6.19-mm? or even 
earlier). I've been seeing this problem frequently since I upgraded to 
2.6.21-rc3-mm1. I think something got broken recently...

-Christian
-

From: Tejun Heo
Date: Saturday, March 17, 2007 - 10:43 pm

Can you post the result of "hdparm -I /dev/sdX"?

-- 
tejun
-

From: Christian
Date: Sunday, March 18, 2007 - 1:31 pm

Output generated on 2.6.21-rc3-mm1 #3 SMP PREEMPT

user@ubuntu:~$ sudo hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       SAMSUNG HD401LJ
        Serial Number:      S0HVJ1FL900207
        Firmware Revision:  ZZ100-15
Standards:
        Used: ATA/ATAPI-7 T13 1532D revision 4a
        Supported: 7 6 5 4
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  781422768
        device size with M = 1024*1024:      381554 MBytes
        device size with M = 1000*1000:      400088 MBytes (400 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 254, current value: 0
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 udma7
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    ...
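
The geometry and capacity figures in the hdparm output are related by simple arithmetic, sketched below. This assumes 512-byte logical sectors; the helper names are hypothetical. The 28-bit "LBA user addressable sectors" line reads 268435455 because that is the 28-bit addressing ceiling (2^28 - 1); the drive's full size is only visible through the LBA48 count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte logical sectors. */
#define SECTOR_SIZE 512ULL

/* Hypothetical helper: CHS capacity = cylinders * heads * sectors/track. */
static uint64_t chs_sectors(uint64_t cyl, uint64_t heads, uint64_t spt)
{
	return cyl * heads * spt;
}

/* Hypothetical helper: size in "MBytes" for a given definition of M.
 * Integer division truncates; for these values that reproduces the
 * whole-number figures shown in the output above. */
static uint64_t size_mbytes(uint64_t sectors, uint64_t m)
{
	return sectors * SECTOR_SIZE / m;
}

/* Largest sector count reachable with 28-bit LBA addressing. */
static const uint64_t LBA28_MAX_SECTORS = (1ULL << 28) - 1;
```

For example, 781422768 sectors * 512 bytes = 400088457216 bytes, which is 400088 MBytes with M = 1000*1000 and 381554 MBytes with M = 1024*1024.
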
From: Tejun Heo
Date: Sunday, March 18, 2007 - 7:48 pm

That's a fairly recent drive.  Does the problem go away if you downgrade
the kernel?

-- 
tejun
-

From: Christian
Date: Monday, March 19, 2007 - 12:08 am

Yes, for me the problem was introduced recently. I have moved terabytes 
(sic!) of data around on my disks with older kernels and never got errors.

-Christian
-

From: Tejun Heo
Date: Monday, March 19, 2007 - 12:39 am

There is always the possibility of the disk going bad, so it would be
great if you could boot an older kernel and verify that the problem
doesn't occur there.

Thanks.

-- 
tejun
-

From: Christian
Date: Wednesday, March 21, 2007 - 7:34 am

I've tested multiple kernels (including the -mm series) in the range from
2.6.19.7 (before sata_nv ADMA support went in) up to 2.6.20-rc4.
Every NCQ-enabled kernel I've tested showed ATA errors in dmesg, so I came to 
the conclusion that my system was faulty. I ran memtest86+ for a long time, 
but no errors were found. After some fiddling with my hardware I discovered 
that the nForce chipset fan was inducing some kind of electromagnetic 
interference in the southbridge, which could clearly be heard as a 
low-frequency noise when I plugged my speakers into the onboard sound. After 
replacing the fan, my system is stable again. Now running 
2.6.21-rc3-mm2+rsdlv31 without errors. Really strange problem, eh? ;-)

-Christian
-

From: Tejun Heo
Date: Wednesday, March 21, 2007 - 9:35 am

Man, that's the strangest way to solve ATA command failures I've ever 
heard of.  Kudos to you for finding it out.  :-)

-- 
tejun
-

From: Jeff Garzik
Date: Monday, March 19, 2007 - 5:09 am

I may have missed the answer to this before, but:  does the problem go 
away if you disable preempt?

	Jeff



-

From: Max Kellermann
Date: Tuesday, March 20, 2007 - 2:41 am

On my system (same problem, original bug report), preemption is
disabled.

Max

-

From: Pablo Sebastian Greco
Date: Monday, March 19, 2007 - 5:21 am

Tejun: sdb and sdc are exactly the same drives as when I had my problem.
Christian: Can you verify whether this firmware upgrade helps?

http://www.samsung.com/Products/HardDiskDrive/support/faqs/faqs_20060414_0000246673.htm

Thanks.
Pablo.
-