David Lethe wrote: I bet $500 is well below minimum wage in the US for the number of hours it would take someone to do this.

And I would say that if you have > 100TB in a single raid5/6, that means you have at least 100 disks in that array, and most people get nervous at more than 8-16 disks in either a raid5 or raid6 array. The statistics of disks going bad work against you: the chance of a rebuild succeeding before another disk/block goes bad gets smaller and smaller as the number of disks increases. As you have noted, you are at the point where it becomes unlikely that the rebuild will ever complete, even with good disks in the array. Most people build a number of smaller raid5/raid6 arrays and then LVM them together to get around this issue. On top of that, the larger the number of disks, the greater the IO required to do a rebuild, so the slower the rebuild potentially is. And that is assuming that you don't have a bad batch of disks with an abnormally high failure rate.

I know of hardware disk arrays that handle the bad block issue by allocating (on initial array construction) a set of spare blocks on each disk. On finding a bad block on a disk, they rebuild just that block from the stripe/parity, relocate it to a spare block on the same disk, and somehow note that the block has been relocated. After some number of bad blocks on a given disk, they note that the disk has too many bad blocks and that you should "clone" it, then fail the original disk over to the clone once the clone is finished. This sort of thing would seem to be rather non-trivial. Still, if someone were to set up a clone of the bad disk and rebuild only the bad sector, this would probably cut down the amount of time/IO required to complete a rebuild, though it would still take several hours, and things would get more complicated if you had another failure during that process.

Roger
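
To put rough numbers on the point about rebuild odds shrinking as the disk count grows, here is a back-of-the-envelope sketch. It is only an illustration under stated assumptions: a single-parity (raid5-style) array where one disk has already failed, an unrecoverable read error rate of 1 per 1e14 bits read (a figure commonly quoted on consumer drive datasheets, not something measured here), 1TB disks, and independent errors. The function name and parameters are made up for this example.

```python
import math


def rebuild_success_probability(num_disks, disk_tb, ure_per_bit=1e-14):
    """Estimate the chance that a raid5-style rebuild reads every
    surviving disk in full without one unrecoverable read error.

    Assumptions (illustrative only): one disk has failed, every bit on
    the remaining disks must be read, and bit errors are independent
    with probability ure_per_bit each.
    """
    surviving_disks = num_disks - 1
    bits_to_read = surviving_disks * disk_tb * 1e12 * 8
    # (1 - p) ** n computed via log1p to stay accurate for tiny p.
    return math.exp(bits_to_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for disks in (4, 8, 16, 32, 100):
        p = rebuild_success_probability(disks, disk_tb=1)
        print(f"{disks:3d} x 1TB disks: P(clean rebuild) ~ {p:.4f}")
```

Under these assumptions an 8-disk array of 1TB disks comes out at a bit better than even odds of a clean rebuild, while a 100-disk array comes out well under 0.1%. Real arrays will differ (actual error rates, raid6's second parity, scrubbing), so treat this as a way to see the trend, not as a prediction.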
