Re: GA-MA790FX-DS5 SATA ahci NCQ errors on Jmicron 20360/20363 (JMB363) kernel 2.6.25-2 Debian/Lenny

Previous thread: Random crashes with 2.6.27-rc3 on PPC by Michael Buesch on Saturday, August 23, 2008 - 7:10 am. (9 messages)

Next thread: Re: HPET regression in 2.6.26 versus 2.6.25 -- found another user with the same regression by David Witbrodt on Saturday, August 23, 2008 - 8:42 am. (2 messages)
From: Sergey Spiridonov
Date: Saturday, August 23, 2008 - 7:32 am

Hi

I get kernel errors [1] and [2], followed by a SATA reset, under heavy
load on the hard drive connected to the GA-MA790FX-DS5 onboard JMicron
20360/20363 (JMB363) controller (lspci output is in [3]). A hard drive
connected to the other onboard controller (the AMD SB600 south bridge)
works without problems.

I have two 1TB Seagate hard disks, an ST31000340AS and an ST31000340NS.
I connected one to the JMicron JMB363 and the other to the SB600. After
some testing with several instances of bonnie++ I got kernel errors [1]
and [2]. I then swapped the connections: the drive that had been on the
JMB363 went to the SB600 and vice versa. The errors, timeouts and drive
resets always occurred on whichever drive was connected to the JMB363
(sdb in the log file). There are no errors if both drives are connected
to the SB600.

Here [4] is the complete dmesg output after the system has booted
(before I get any errors).

I have already replaced the power supply, memory, video card and DVD
drive with parts taken from a working PC, and I get the same problems
with these devices too. So the problem must be the motherboard, the
software or the CPU. The CPU seems to work OK.

It looks like the problem is the motherboard or the ahci ATA driver.
Does anybody have a clue about it? Is the JMB363 chip broken, or is the
Linux driver broken?

[1] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors.txt
[2] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-sata-errors2.txt
[3] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/lspci.txt
[4] http://hurd.homeunix.org/~sena/GA-MA790FX-DS5/dmesg-after-boot.txt

Here is the complete hardware description:
--------------------------------------------
Motherboard : GA-MA790FX-DS5(rev. 1.0)
BIOS Ver : F6 (was tested also with F5)
VGA Brand : Asus Model : EN8400GS HTP TD 256MB
CPU Brand : AMD Model : AM2 Athlon64 X2 4850E boxed Speed : 2500 MHz
Operating System : Debian GNU/Linux Lenny with kernel 2.6.25-2
Memory Brand : Kingston Type : DDRII
Memory Size : 1GB Speed : 800 MHz
Power Supply : 600W MS-Tech MP-600 ...
From: Jeff Garzik
Date: Saturday, August 23, 2008 - 3:38 pm

See http://ata.wiki.kernel.org/index.php/Libata_error_messages for an 
introduction.

In general, tons of ATA bus errors and SError register bits mean that 
problems are coming from the ATA bus, a.k.a. the SATA cable and its 
related connections.

So...  suspect bad cables, bad port connectors, cable interference, 
motherboard-caused interference or grounding problems, power supply 
problems.
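
To see which SError bits a given log line is reporting, the hex value can
be decoded by hand. The following is only a rough sketch: the bit
positions and short names are taken from the SERR_* constants in the
libata headers as best I recall them, so check them against your own
kernel tree, and the helper names (decode_serror, serror_from_line) are
made up for this example.

```python
import re

# Assumption: bit positions mirror the SERR_* constants in the Linux
# libata headers; verify against your kernel source before relying on it.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

# Matches the "SErr 0x..." field of a libata exception line.
SERR_FIELD = re.compile(r"SErr (0x[0-9a-fA-F]+)")


def decode_serror(value):
    """Return the names of all known SError bits set in value."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


def serror_from_line(line):
    """Extract the SErr value from one log line, or None if absent."""
    match = SERR_FIELD.search(line)
    return int(match.group(1), 16) if match else None
```

Feeding each dmesg line through serror_from_line() and then
decode_serror() gives a quick tally of which physical-layer complaints
(CRC, 10b/8b decode, handshake and so on) dominate on a given port.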

	Jeff



--


hmm, or something totally odd...

what happens if you do (after you have made a backup!):
"dd if=/dev/sdX of=/dev/null bs=1" (where X is your affected hdd)
Note:
The important bit is the small bs (blocksize) number.
(You can throw in an O_DIRECT flag to disable the caches, or,
if you have some "empty" partition space, you can "dd" into
it with a small blocksize too.)

my seagate & even a samsung hd103uj don't like that and will spew
out the same sort of problems you have just posted... (but they work fine
if I don't do nasty dd things!)

and unfortunately my md (raid1) seems to do lots of "small" reads & writes
when it starts to check/resync the whole 1TB array :-/.
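
For anyone who would rather script the small-blocksize read test than
type the dd line, here is a rough Python sketch of the same idea. The
function name and its defaults are made up for illustration. It only
reads, and it goes through the page cache: O_DIRECT is left out because
it generally needs sector-aligned buffers and sizes, which plain
os.read() does not guarantee.

```python
import os
import time


def small_block_read(path, block_size=512, max_bytes=1024 * 1024):
    """Read up to max_bytes from path in block_size chunks.

    Returns (bytes_read, elapsed_seconds). Read-only; nothing is written.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.monotonic()
    try:
        while total < max_bytes:
            # Never ask for more than the remaining byte budget.
            chunk = os.read(fd, min(block_size, max_bytes - total))
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, time.monotonic() - start
```

Run as root against the affected device (for example
small_block_read("/dev/sdX", block_size=1), with X as in the dd line)
while watching dmesg in another terminal.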

Regards,
	Chr
--

From: Sergey Spiridonov
Date: Saturday, August 23, 2008 - 5:27 pm

Hi


I did swap the power supply and I did swap the hard drives. The same
hard drive with the same cable works with the SB600 and produces errors
with the JMB363. So it looks like it is not a cable or hard drive
problem. Maybe the problem is the JMB363 port connector on the
motherboard. How can I check it?
-- 
Best regards, Sergey Spiridonov

--

From: Jeff Garzik
Date: Saturday, August 23, 2008 - 9:39 pm

Try a new motherboard of the same brand and model :/

In general, tons of ATA bus errors and SError complaints indicate some 
sort of problem at the physical layer/level.  It's always possible that 
software is to blame, but bug report patterns so far tend to point to 
hardware.

	Jeff


--


Hi!

I have a JMB363 myself and it has its share of problems.
I would say it is buggy hardware. (Why else would they
release a new Windows driver every week, if not to work around
bugs in HW? ;-)

My WD MyBook Studio Edition 500 GB external eSATA drive does not
work correctly on the JMB363 no matter what I try, under both
Linux and Windows. I think the best was 30 minutes of
(apparently) error-free operation under Windows.

If interested, I can supply logs, data etc.
(I have a bunch of drives to try).

Regards,
David


--

From: Tejun Heo
Date: Saturday, August 30, 2008 - 4:00 am

Well, FWIW, JMB ahci's are one of my favorites and usually very well

This one is being discussed both with JMB and WD.  It seems the bridge
chip used in the WD external drives is somehow incompatible with the
JMB ahci's.  Don't know whose fault it is or how it can be worked
around yet.  The issue is being tracked in the following bugzilla.

  http://bugzilla.kernel.org/show_bug.cgi?id=9913

-- 
tejun
--

From: xerces8
Date: Saturday, August 30, 2008 - 11:13 am

I know, I'm David Balažic (the last commenter on the bug, besides you) ;-)

Regards,
David


--

From: Tejun Heo
Date: Sunday, August 31, 2008 - 2:33 am

Somehow I've been confusing people a lot lately.  I asked my AMD contact
a few times about sata_nv problems somehow thinking AMD acquired NVidia
instead of ATI.  :-)

-- 
tejun
--
