It looks like the IntelliPark feature on a Western Digital Caviar Green HDD can cause issues with OpenBSD, which can be fixed/mitigated by disabling IntelliPark.

About 6 months ago, I built myself a new amd64 machine. I decided to optimize for low wattage--reducing power costs and waste heat, increasing UPS runtime--and so I chose a single Western Digital Caviar Green HDD. Although these drives are intended/marketed for something more like nearline storage, according to bonnie++, the drive performed roughly as well as the 7200RPM PATA-100 2-drive mirror in my old machine.

The machine I built, initially running 4.7/amd64, then 4.8/amd64 (both unmodified -RELEASE), was never stable for more than a couple of days at a time. The machine would freeze hard, sometimes with the HDD light lit solid, usually not. I worked around a number of bugs, trying a patched kernel with http://marc.info/?l=openbsd-misc&m=128897915014154&w=2, and installing an fxp(4) so I could disable the onboard re(4).

I wrote scripts to monitor hw.sensors, SMART, and various stats from systat(1), and graph them using rrdtool. What I noticed was that my machine would generally crash right before an IO-intensive cronjob started. I also noticed that SMART stat 193 (Load/Unload Cycle Count) was very high, and climbing rapidly.

Doing some research on this stat, I found out that WD Caviar Green drives have a feature called IntelliPark that parks the HDD heads after 8 seconds of inactivity. This is supposed to make the HDD more efficient, but has been reported not to play well with Linux, and WD provides a workaround: the WDIDLE3 utility, which would allow me to change/disable the IntelliPark 8-second timeout.

I ran WDIDLE3 on my WD Caviar Green HDD, setting the timeout to the maximum allowed (300 seconds). I have a monitoring process running that writes to disk roughly every 60 seconds, so the idle timer never gets a chance to expire and IntelliPark is effectively disabled for me. As of now, the system has been up a record 19.5 days ...
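For anyone who wants to check their own drive, here is a minimal sketch of the kind of monitoring described above. It is not the original poster's script. It assumes the SMART report is in the column layout that smartmontools' `smartctl -A` prints; the sample text and its numbers are made up, and the helper names `raw_value` and `parks_per_day` are hypothetical. The second function is only a simplified model (evenly spaced disk accesses) of why an 8-second timer keeps parking the heads while a 300-second timer with a write every 60 seconds does not.

```python
# Hypothetical sample in the column layout printed by `smartctl -A`.
# The numbers are invented for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4380
193 Load_Cycle_Count        0x0032   134   134   000    Old_age   Always       -       200000
"""


def raw_value(report, attr_id):
    """Return the raw value of SMART attribute attr_id, or None if absent."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None


def parks_per_day(access_interval_s, idle_timeout_s):
    """Simplified model: with evenly spaced accesses, the heads park once
    per gap, and only when the gap is longer than the idle timeout."""
    if access_interval_s <= idle_timeout_s:
        return 0.0
    return 86400 / access_interval_s


if __name__ == "__main__":
    cycles = raw_value(SAMPLE, 193)
    hours = raw_value(SAMPLE, 9)
    print("load cycles per power-on hour:", round(cycles / hours, 1))
    print("parks/day, 8s timer, write every 60s:", parks_per_day(60, 8))
    print("parks/day, 300s timer, write every 60s:", parks_per_day(60, 300))
```

Sampling attribute 193 at intervals and graphing the difference between samples shows whether the count is still climbing after the timer has been changed.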
Hrm, do you have the model number of the drives? I have some WD drives in a raid 10 array (LVM2 + EXT4 + linux) for my media PC and it would be useful to figure out if some of the issues I have seen over the last year have been related to the use of these drives.
I have a WD10EADS-22M2B0. Manufacture date printed on the drive is 17 MAR 2010, and I haven't attempted any firmware updates, if applicable. There appear to be some drives out there that support a much wider range of IntelliPark timeouts, and support TLER. My idle timer only goes up to 300s, or can be completely disabled, which I didn't do, since I heard rumors that it silently turned off other power-saving features. I also tried enabling TLER, to no avail. I guess I got the cheaper kind.
On Thu, 09 Dec 2010 14:48:02 -0500 Not an issue with OpenBSD in itself. It's a general "bug" with the firmware. The issue also gets triggered by the almighty Linux. Even Windows hits that issue, oh wait, i already said, it's not an OS problem, ... If you have one of those disks, turn that "feature" off, get a "fixed" firmware (hehe) or buy something else. WD's track record is reaching Seagate levels. Heck, even Hitachi has remedied itself from the deathstar tech they took over from IBM.
On Thu, 9 Dec 2010 22:50:21 +0100 Just to be complete, Samsung fixing their SMART bug with a firmware that doesn't bump the version number, doesn't really make me want to recommend them anymore that much either, atm. (still F3's works or were "dead" on arrival) </rant>
ok, what manufacturers are left??? :)) just toshiba???
On Fri, 10 Dec 2010 23:25:56 +0100 i am happy with samsung, because in that area i am a cheapskate. hardware dies, deal with it, don't buy the new kid on the block and be happy. :) sata disks got really crappy since they hit 2TB. (or 1.5TB in Seagate's case.)
On Sat, 11 Dec 2010 01:23:36 +0100 Hitachi have said that some issues were hit when they moved to 2tbs but a new generation of their drives will solve these problems starting with a 3tb version. I've just bought a 2tb WD too, luckily it will only be used for cctv backup from time to time.
Can't say anything bad about WD Raid Edition drives. Currently I've got over 100 of them without any problems. Though I've found some of them generating a small number of Raw Read Error Rate, but only in the 2TB model WDC WD2002FYPS. I've got much worse experience with Seagate............ My policy is to replace them every 2 years, faulty or not. regards M.K.
I have hundreds of disks in use, about half ide/sata and half scsi, and the vast majority of them seagate. i lose about 2 disks a year. this year it was a 18G SCA (quantum btw) and a 14G IDE - IBM. for new machines i insist on seagate disks wherever possible, when i order disks separately i order seagate or SSDs. i have not lost a single seagate sata drive yet. in short, i disagree with your seagate judgement. but then i also skip the cheap ones. oh, samsung: 9, not a single one in use, 2 suspect defective. WD: just 6, only 3 in use, one of them defective. and for comedy: there are just two vendors (according to the vendor strings, there might be relabeling) i only have a single disk from. and they are alive. these babies: CONNER CFP2105E 2.14GB 1524 DEC RZ26N (C) DEC 0466 -- Henning Brauer, email@example.com, firstname.lastname@example.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
I'm going to say "Anyone who says brand X is great and Y is crap" has just exposed themselves as a newbie in the computer business. :) I've seen every make of drive have some real stinkers, and also build drives that don't seem to die. Unfortunately, by the time you can say, "This model is really good" or "this model is a disaster", it's too late, the drive has been out of production for six months (or has had its production processes changed, and the old results don't represent the current production runs). One of the worst drives in terms of quality and failure I ever saw was the Seagate ST225. One of the best was...uh...the Seagate ST225. The difference was at the beginning, the ST225 was a cutting edge drive, a whopping 20M of storage in a half-height case, with a label on the drive listing dozens of bad sectors. By the end of its production run, the bad sector tables on almost all ST225 drives were COMPLETELY empty, they were 100% good out of the box, and would run long past their useful life. By this point, they were old tech and Just Worked. (ok, the worst drives I ever had were "JTS". One day, I was overly frustrated at all the major drive makers, and saw these "JTS" brand drives, and figured they either had a good idea or a bad one. Turned out to be bad beyond my imagination... Fortunately, they seem to have vanished from the world shortly after they arrived, but... *shudder*) I discovered (quite) a few years back that you could toast a Samsung disk on demand using the Novell disk test utility. Now, I can't seem to get one to fail. Right now, if you buy a 2TB disk, expect it to be unreliable. Expect a 300G drive to last for quite some time (if you can find one). You still have to have backups, you still have to have plan for what you do until it is repaired (failure tolerance), and you have to have a plan for how you will repair it (failure recovery). If you are deploying a thousand machines, yeah, it would be really nice to know that this ...
+10000 Nick Holland Rules!
I'll have to disagree a bit here. Manufacturers go through cycles and usually there is one that stands out on a size/period. Manufacturers almost never change the manufacturing process over time for a particular drive. They will update firmware as time goes by. So a good drive today is going to be equally good in a few years. The thing I disagree with is the claim that 2TB drives are unreliable; there are very good 2TB drives out there. The trick is to have enough of a brand (usually a few hundred) to start to understand its personality. If you have volume you can pretty easily determine which manufacturer is good today and sucked yesterday. And when a new generation of drives comes out it starts all over again. Oh and be safe, make backups. FWIW
Some manufacturers have the advantage of providing good documentation for their drives, some others do not have any at all. Regards, David