Re: Freeze with Western Digital Caviar Green HDD

Previous thread: Hash error on /bsd (correction) by OpenBSD Geek on Thursday, December 9, 2010 - 11:53 am. (1 message)

Next thread: inexactitude bancaire by info - Caisse dEprgne on Thursday, December 9, 2010 - 4:28 pm. (1 message)
From: Aaron Suen
Date: Thursday, December 9, 2010 - 12:48 pm

It looks like the IntelliPark feature on a Western Digital Caviar Green
HDD can cause issues with OpenBSD, which can be fixed/mitigated by
disabling IntelliPark.

About 6 months ago, I built myself a new amd64 machine.  I decided to
optimize for low wattage--reducing power costs and waste heat,
increasing UPS runtime--and so I chose a single Western Digital Caviar
Green HDD.  Although these drives are intended/marketed for something
more like nearline storage, according to bonnie++, the drive performed
roughly as well as the 7200RPM PATA-100 2-drive mirror in my old
machine.

The machine I built, initially running 4.7/amd64, then 4.8/amd64 (both
unmodified -RELEASE) was never stable for more than a couple of days at
a time.  The machine would freeze hard, sometimes with the HDD light lit
solid, usually not.  I worked around a number of bugs, trying a patched
kernel with http://marc.info/?l=openbsd-misc&m=128897915014154&w=2, and
installing an fxp(4) so I could disable the onboard re(4).  I
wrote scripts to monitor hw.sensors, SMART, and various stats from
systat(1), and graph them using rrdtool.  What I noticed was that my
machine would generally crash right before an IO-intensive cronjob
started.
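Here is a minimal sketch of the kind of hw.sensors logging described above. It assumes that `sysctl hw.sensors` prints one line per sensor in the form `hw.sensors.<device>.<sensor>=<value> [unit] [(description)]`. The function name `parse_sensors` is hypothetical. It only parses captured text (it does not call sysctl), and the rrdtool graphing step is left out.

```python
from typing import Dict, Optional, Tuple


def parse_sensors(sysctl_output: str) -> Dict[str, Tuple[Optional[float], str]]:
    """Map each hw.sensors.* name to (numeric value or None, raw value text).

    Assumes lines shaped like "hw.sensors.<dev>.<sensor>=<value> [unit] [(desc)]".
    """
    prefix = "hw.sensors."
    readings: Dict[str, Tuple[Optional[float], str]] = {}
    for line in sysctl_output.splitlines():
        name, sep, rest = line.strip().partition("=")
        if not sep or not name.startswith(prefix):
            continue  # skip blank or unrelated lines
        rest = rest.strip()
        first = rest.split(" ", 1)[0]  # "" when the value is empty
        try:
            value: Optional[float] = float(first)
        except ValueError:
            value = None  # e.g. indicator sensors that report "On"/"Off"
        readings[name[len(prefix):]] = (value, rest)
    return readings
```

A cron job or loop could run `sysctl hw.sensors`, pass the text to this function, and hand the numeric values to whatever graphing tool is in use.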

I also noticed that SMART stat 193 (Load/Unload Cycle Count) was very
high, and climbing rapidly.  Doing some research on this stat, I found
out that WD Caviar Green drives have a feature called IntelliPark that
parks the HDD heads after 8 seconds of inactivity.  This is supposed to
make the HDD more efficient, but has been reported not to play well with
Linux, and WD provides a workaround: the WDIDLE3 utility, which would
allow me to change/disable the IntelliPark 8-second timeout.  I ran
WDIDLE3 on my WD Caviar Green HDD, setting the timeout to the maximum
allowed (300 seconds).  I have a monitoring process running that writes
to disk roughly every 60 seconds, so IntelliPark is effectively disabled
for me.  As of now, the system has been up a record 19.5 days ...
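The reasoning in the paragraph above can be sketched in two parts. The first part reads attribute 193 (Load_Cycle_Count) out of captured `smartctl -A` output and turns two readings into a daily rate. The second part is a deliberately simplified model of when IntelliPark parks the heads. All function names are hypothetical. The parser assumes the usual smartmontools attribute table, where each row starts with the numeric ID and ends with an integer RAW_VALUE. The model assumes one park per idle gap at least as long as the timeout, which is a back-of-the-envelope simplification and not WD's documented behaviour.

```python
import re
from typing import Iterable, Optional

# An attribute row: leading numeric ID, attribute name, ..., trailing integer RAW_VALUE.
_ROW = re.compile(r"^\s*(\d+)\s+\S+.*\s(\d+)\s*$")


def smart_raw_value(smartctl_output: str, attr_id: int = 193) -> Optional[int]:
    """Return RAW_VALUE for attr_id (193 = Load_Cycle_Count), or None if absent."""
    for line in smartctl_output.splitlines():
        m = _ROW.match(line)
        if m and int(m.group(1)) == attr_id:
            return int(m.group(2))
    return None


def cycles_per_day(count_then: int, count_now: int, elapsed_s: float) -> float:
    """Average increase of a SMART counter per day between two readings."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (count_now - count_then) * 86400.0 / elapsed_s


def expected_parks(idle_gaps_s: Iterable[float], timeout_s: float) -> int:
    """Count idle gaps long enough for the drive to park its heads.

    Simplified model: every gap between disk accesses lasting at least
    timeout_s causes exactly one park (one later load/unload cycle).
    """
    return sum(1 for gap in idle_gaps_s if gap >= timeout_s)
```

Under this model, a disk that is touched every 60 seconds and is otherwise idle parks once per gap (1440 times a day) with the stock 8-second timer, and never with the timer at 300 seconds. That is the sense in which IntelliPark becomes "effectively disabled". Real workloads have irregular gaps, so a meaningful estimate would need measured gap lengths.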
From: Joel Wiramu Pauling
Date: Thursday, December 9, 2010 - 1:34 pm

Hrm, do you have the model number of the drive?

I have some WD drives in a RAID 10 array (LVM2 + EXT4 + Linux) for my
media PC, and it would be useful to figure out if some of the issues I
have seen over the last year are related to that drive model.


From: Aaron Suen
Date: Thursday, December 9, 2010 - 1:48 pm

I have a WD10EADS-22M2B0.  Manufacture date printed on the drive is 17
MAR 2010, and I haven't attempted any firmware updates, if applicable.

There appear to be some drives out there that support a much wider range
of IntelliPark timeouts, and support TLER.  My idle timer only goes up
to 300s, or can be completely disabled, which I didn't do, since I heard
rumors that it silently turned off other power-saving features.  I also
tried enabling TLER, to no avail.  I guess I got the cheaper kind.
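Keeping the timer at 300 seconds instead of disabling it only helps if something reliably touches the disk more often than that. In the original report, a monitoring process writing roughly every 60 seconds filled that role. A dedicated keepalive could look like the following hypothetical sketch. It assumes that a synced write to a file on the affected drive is enough to reset the drive's idle timer. The function names and the stamp-file path are illustrative only.

```python
import os
import time
from typing import Optional


def touch_disk(path: str) -> None:
    """Overwrite path with the current time and fsync it so the write reaches the drive."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, f"{int(time.time())}\n".encode())
        os.fsync(fd)  # flush past the OS buffer cache to the device
    finally:
        os.close(fd)


def keepalive(path: str, interval_s: float, timeout_s: float,
              iterations: Optional[int] = None) -> int:
    """Write to path every interval_s seconds; loops forever if iterations is None."""
    if interval_s >= timeout_s:
        raise ValueError("interval_s must be shorter than the drive's idle timeout")
    done = 0
    while iterations is None or done < iterations:
        touch_disk(path)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)  # no sleep after the final write
    return done
```

For example, `keepalive("/var/tmp/keepalive.stamp", 60, 300)` would write once a minute forever, provided that (hypothetical) path lives on the Caviar Green drive. Overwriting the file rather than appending to it keeps the file from growing.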


From: roberth
Date: Thursday, December 9, 2010 - 2:50 pm

On Thu, 09 Dec 2010 14:48:02 -0500

Not an issue with OpenBSD in itself.
It's a general "bug" with the firmware. The issue also gets triggered
by the almighty Linux.
Even Windows hits that issue, oh wait, I already said, it's not an OS
problem, ...

If you have one of those disks, turn that "feature" off, get a "fixed"
firmware (hehe) or buy something else.

WD's track record is reaching Seagate levels.
Heck, even Hitachi has remedied itself from the Deathstar tech they took
over from IBM.

From: roberth
Date: Thursday, December 9, 2010 - 3:16 pm

On Thu, 9 Dec 2010 22:50:21 +0100

Just to be complete,
Samsung fixing their SMART bug with a firmware update that doesn't bump
the version number doesn't really make me want to recommend them that
much anymore either, atm.
(still, F3s either work or are "dead" on arrival) </rant>

From: Paolo Aglialoro
Date: Friday, December 10, 2010 - 3:25 pm

ok, what manufacturers are left??? :)) just toshiba???



From: roberth
Date: Friday, December 10, 2010 - 5:23 pm

On Fri, 10 Dec 2010 23:25:56 +0100

I am happy with Samsung, because in that area I am a cheapskate.
Hardware dies, deal with it; don't buy the new kid on the block and be
happy. :)
SATA disks got really crappy once they hit 2TB (or 1.5TB in Seagate's
case).

From: Kevin Chadwick
Date: Wednesday, December 15, 2010 - 10:00 am

On Sat, 11 Dec 2010 01:23:36 +0100

Hitachi have said that some issues were hit when they moved to 2TB, but
a new generation of their drives will solve these problems, starting
with a 3TB version. I've just bought a 2TB WD too; luckily it will only
be used for CCTV backup from time to time.

From: Michał Koc
Date: Wednesday, December 15, 2010 - 11:37 am

Can't say anything bad about WD Raid Edition drives.

Currently I've got over 100 of them without any problems.

Though I've found some of them reporting a small nonzero Raw Read Error
Rate, but only the 2TB model WDC WD2002FYPS.

I've had much worse experience with Seagate...

My policy is to replace them every 2 years, faulty or not.

regards
M.K.


From: Henning Brauer
Date: Friday, December 24, 2010 - 2:55 am

I have hundreds of disks in use, about half ide/sata and half scsi,
and the vast majority of them seagate. i lose about 2 disks a year.
this year it was an 18G SCA (quantum btw) and a 14G IDE - IBM.

for new machines i insist on seagate disks wherever possible, when i
order disks separately i order seagate or SSDs. i have not lost a
single seagate sata drive yet.

in short, i disagree with your seagate judgement.
but then i also skip the cheap ones.

oh, samsung: 9, not a single one in use, 2 suspect defective.
WD: just 6, only 3 in use, one of them defective.

and for comedy: there are just two vendors (according to the vendor
strings, there might be relabeling) that i only have a single disk
from. and they are alive. these babies:
  CONNER CFP2105E 2.14GB 1524
  DEC RZ26N (C) DEC 0466

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Nick Holland
Date: Friday, December 10, 2010 - 9:11 pm

I'm going to say "Anyone who says brand X is great and Y is crap" has
just exposed themselves as a newbie in the computer business. :)

I've seen every make of drive have some real stinkers, and also built
drives that don't seem to die.  Unfortunately, by the time you can say,
"This model is really good" or "this model is a disaster", it's too
late, the drive has been out of production for six months (or has had
its production processes changed, and the old results don't represent
the current production runs).

One of the worst drives in terms of quality and failure I ever saw was
the Seagate ST225.  One of the best was...uh...the Seagate ST225.  The
difference was at the beginning, the ST225 was a cutting edge drive, a
whopping 20M of storage in a half-height case, with a label on the drive
listing dozens of bad sectors.  By the end of its production run, the
bad sector tables on almost all ST225 drives were COMPLETELY empty, they
were 100% good out of the box, and would run long past their useful
life.  By this point, they were old tech and Just Worked.

(ok, the worst drives I ever had were "JTS".  One day, I was overly
frustrated at all the major drive makers, and saw these "JTS" brand
drives, and figured they either had a good idea or a bad one.  Turned
out to be bad beyond my imagination...  Fortunately, they seem to have
vanished from the world shortly after they arrived, but...  *shudder*)

I discovered (quite) a few years back that you could toast a Samsung
disk on demand using the Novell disk test utility.  Now, I can't seem to
get one to fail.

Right now, if you buy a 2TB disk, expect it to be unreliable.  Expect a
300G drive to last for quite some time (if you can find one). You still
have to have backups, you still have to have plan for what you do until
it is repaired (failure tolerance), and you have to have a plan for how
you will repair it (failure recovery).

If you are deploying a thousand machines, yeah, it would be really nice
to know that this ...
From: Jan Stary
Date: Saturday, December 11, 2010 - 12:33 am

From: Eric Furman
Date: Saturday, December 11, 2010 - 2:57 am

+10000
Nick Holland Rules!

From: Marco Peereboom
Date: Saturday, December 11, 2010 - 6:32 am

I'll have to disagree a bit here.  Manufacturers go through cycles, and
usually there is one that stands out for a given size/period.

Manufacturers almost never change the manufacturing process over time
for a particular drive.  They will update firmware as time goes by.  So
a good drive today is going to be equally good in a few years.

The thing I disagree with is the 2TB point: there are very good 2TB
drives out there.  The trick is to have enough of a brand (usually a few
hundred) to start to understand its personality.  If you have volume you
can pretty easily determine which manufacturer is good today and sucked
yesterday.  And when a new generation of drives comes out it starts all
over again.
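One way to see why a few hundred drives are needed is to put a confidence interval around an observed failure fraction. The sketch below uses the Wilson score interval, a standard binomial confidence interval. It is an illustration added here and not Marco's method. It also assumes that drives fail independently, which is probably optimistic for drives from one batch.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, drives: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the fraction of drives that failed."""
    if drives <= 0 or not 0 <= failures <= drives:
        raise ValueError("need drives > 0 and 0 <= failures <= drives")
    p = failures / drives
    z2 = z * z
    denom = 1 + z2 / drives
    centre = (p + z2 / (2 * drives)) / denom
    half = z * math.sqrt(p * (1 - p) / drives + z2 / (4 * drives * drives)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

Take Henning Brauer's count of one defective WD out of 6 as an example. The interval runs from roughly 3% to 56%, which says very little. The same 1-in-6 ratio over a hypothetical 300 drives narrows it to roughly 13% to 21%.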

Oh and be safe, make backups.

FWIW


From: David Vasek
Date: Sunday, December 12, 2010 - 12:07 pm

Some manufacturers have the advantage of providing good documentation for 
their drives, some others do not have any at all.

Regards,
David
