Re: Mechanism to safely force repair of single md stripe w/o hurting data integrity of file system

To: <David@...>
Cc: Guy Watkins <linux-raid@...>, 'LinuxRaid' <linux-raid@...>, <linux-kernel@...>
Date: Saturday, May 17, 2008 - 7:16 pm

David Lethe wrote:

I bet $500 is well below minimum wage in the US for the number of hours it would 
take someone to do this.

And I would say that if you have > 100TB in a single raid5/6, you must have at
least 100 disks in that array. Most people get nervous at more than 8-16 disks
in a single raid5 or raid6 array: the more disks there are, the higher the odds
that one goes bad, and the smaller the chance that a rebuild succeeds before
another disk or block goes bad. As you have noted, you are at the point where
it becomes unlikely that the rebuild will ever complete, even with good disks
in the array. Most people build a number of smaller raid5/raid6 arrays and then
LVM them together to get around this issue. On top of that, the larger the
number of disks, the greater the IO required to do a rebuild, so the slower the
rebuild potentially is. And that is assuming that you don't have a bad batch of
disks with an abnormally high failure rate.
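
To put rough numbers on how the odds shrink with array width, here is a
back-of-the-envelope sketch in Python. It is only an illustration under assumed
figures: 1TB disks, an unrecoverable read error rate of 1 per 1e14 bits read
(a typical datasheet figure, not a measurement), independent errors, and a raid5
rebuild that has to read every surviving disk in full. The function name and
parameters are made up for this example.

```python
import math

def rebuild_success_probability(n_disks, disk_bytes, ure_per_bit=1e-14):
    """Chance that a raid5 rebuild reads all surviving disks with no read error.

    Assumes independent bit errors at the given rate (an assumption, not a
    measured value) and that every surviving disk is read end to end.
    """
    bits_to_read = (n_disks - 1) * disk_bytes * 8
    # (1 - p) ** bits, computed through log1p to stay numerically stable.
    return math.exp(bits_to_read * math.log1p(-ure_per_bit))

if __name__ == "__main__":
    one_tb = 10**12  # assumed disk size in bytes
    for n in (8, 16, 100):
        p = rebuild_success_probability(n, one_tb)
        print(f"{n:3d} disks: chance of a clean rebuild ~ {p:.4f}")
```

Under these assumptions the chance of a clean rebuild drops quickly as disks are
added, which is the argument for several small arrays under LVM rather than one
wide one.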

I know of hardware disk arrays that handle the bad block issue by allocating
(on initial array construction) a set of spare blocks on each disk. On finding
a bad block on a disk, they rebuild just that block from the rest of the
stripe/parity, write it to a spare block on the same disk, and somehow note
that the block has been relocated. After some number of bad blocks on a given
disk, they note that the disk has too many bad blocks and that you should
"clone" it, then fail the original disk over to the cloned disk once the clone
is finished. This sort of thing would seem to be rather non-trivial. Still, if
someone would set up a clone of the bad disk and rebuild only the bad sector,
it would probably cut down the amount of time/IO required to complete a
rebuild. It would still take several hours, though, and things would get more
complicated if you had another failure during that process.
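
A toy model of that relocate-and-rebuild idea, in Python, may make it more
concrete. This is not how md or any particular hardware array does it; the Disk
class, the spare count, and the repair_block helper are all hypothetical, and it
assumes simple raid5-style XOR parity where any one block in a stripe is the XOR
of the others.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

class Disk:
    """Hypothetical disk with spare blocks reserved after the data area."""

    def __init__(self, blocks, n_spares=2):
        self.blocks = list(blocks) + [None] * n_spares
        self.next_spare = len(blocks)  # index of the first unused spare
        self.remap = {}                # logical block -> spare block
        self.bad = set()               # physical blocks that fail on read

    def read(self, lba):
        phys = self.remap.get(lba, lba)
        if phys in self.bad:
            raise IOError(f"bad block {phys}")
        return self.blocks[phys]

    def relocate(self, lba, data):
        if self.next_spare >= len(self.blocks):
            # Out of spares: the point where you would clone and replace the disk.
            raise RuntimeError("too many bad blocks; clone and replace this disk")
        self.blocks[self.next_spare] = data
        self.remap[lba] = self.next_spare
        self.next_spare += 1

def repair_block(disks, bad_disk, lba):
    """Rebuild one block from the rest of its stripe and relocate it."""
    others = [d.read(lba) for i, d in enumerate(disks) if i != bad_disk]
    disks[bad_disk].relocate(lba, xor_blocks(others))

if __name__ == "__main__":
    d0, d1 = b"\x01\x02", b"\x10\x20"
    disks = [Disk([d0]), Disk([d1]), Disk([xor_blocks([d0, d1])])]
    disks[0].bad.add(0)            # simulate a bad block on disk 0
    repair_block(disks, 0, 0)
    print(disks[0].read(0) == d0)  # data is readable again via the spare
```

Only one stripe is read and one block is written per repair, which is where the
time/IO saving over a full rebuild would come from.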


                                            Roger
--

Messages in current thread:
Re: Mechanism to safely force repair of single md stripe w/o..., Roger Heflin, (Sat May 17, 7:16 pm)