Re: Suggestion needed for fixing RAID6

From: MRK
Date: Sunday, May 2, 2010 - 4:05 pm

On 05/01/2010 11:44 PM, Janos Haar wrote:

First of all: do you have a backup of your data? If not, I suggest you 
back up the important stuff before doing any experiments. You can do 
this with rsync, reassembling the array every time it goes down. I 
suggest putting the array in read-only mode (mdadm --readonly /dev/md3): 
this should prevent resyncs from starting automatically and, AFAIR, it 
even prevents drives from being dropped because of read errors (but you 
can't use it during resyncs or rebuilds). Resyncs are bad because, with 
the drives in their current state, they will eventually bring down your 
array. Don't use DM when doing this.
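
A minimal sketch of that backup pass, in case it helps. Only the mdadm 
--readonly call comes from the advice above; the run and backup_array 
helpers, the mount point and the destination are hypothetical, and it 
assumes a filesystem sitting directly on the array that is not 
currently mounted. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after reviewing every line for your system.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# backup_array MD_DEVICE MOUNTPOINT DESTINATION
# MOUNTPOINT and DESTINATION are hypothetical paths chosen by the caller.
backup_array() {
    md=$1; mnt=$2; dest=$3
    run mdadm --readonly "$md"      # from the message: keeps resyncs from starting
    run mount -o ro "$md" "$mnt"    # read-only mount, assuming a plain filesystem on the array
    run rsync -a "$mnt"/ "$dest"/   # rerun after each reassembly; unchanged files are skipped
}
```

For example, "backup_array /dev/md3 /mnt/md3 /backup/md3" prints the 
three commands; if the array goes down halfway, reassemble it and run 
the same thing again.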

Now, for the real thing: instead of experimenting with bitmaps, I 
suggest you check whether the normal MD resync works now. If it does, 
you can then do the normal rebuild.

*Please note: DM should not be needed!* I know that you have tried 
resyncing with DM COW under MD and that it doesn't work well in this 
case, but in fact DM should not be needed.

We pointed you to DM around Apr 23rd because at that time we thought 
that your drives were being dropped for uncorrectable read errors, but 
we had guessed wrong.
The general MD philosophy is that, as long as there is enough parity 
information, drives are not dropped just for a read error. Upon a read 
error, MD recomputes the content of the sector from the parity 
information and then attempts to rewrite the block in place. During 
this rewrite the drive performs a reallocation, moving the block to a 
hidden spare region. If the rewrite fails, it means the drive has run 
out of spare sectors; MD considers this a major failure, and only at 
that point is the drive dropped.
So we thought this was the reason in your case too, but we were wrong: 
in your case it was because of an MD bug, the one for which I submitted 
the patch.

So it should work now (without DM). And I think this is the safest thing 
you can try. Having a backup is always better though.

So start the resync without DM and see if it goes through to the end 
without dropping drives. You can use sync_min to skip ahead and cut the 
dead time.
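
Something along these lines could do the skipping and let you watch 
progress. It is only a sketch: MD_SYSFS, skip_ahead and sync_status are 
hypothetical names, the values are in sectors, and note that the kernel 
md documentation describes sync_min/sync_max as the range for 
"check"/"repair", so whether a plain resync honours them on your kernel 
is an assumption to verify.

```shell
# Hypothetical variable: sysfs directory of the array (md3 as in this thread).
MD_SYSFS=${MD_SYSFS:-/sys/block/md3/md}

# skip_ahead START_SECTOR
# Ask MD to begin the next pass at START_SECTOR instead of sector 0.
skip_ahead() {
    echo "$1" > "$MD_SYSFS/sync_min"
}

# sync_status
# Print the current action followed by the progress as reported by MD
# (sync_completed reads "done / total" in sectors, or "none" when idle).
sync_status() {
    printf '%s %s\n' "$(cat "$MD_SYSFS/sync_action")" \
                     "$(cat "$MD_SYSFS/sync_completed")"
}
```

Call skip_ahead with a sector a little below the first known bad spot 
before starting the pass, then poll sync_status now and then.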

For maximum safety you could first try resyncing only one chunk from 
the region of the damaged sectors, so as to provoke only a minimal 
number of rewrites. Set sync_min to the location of the errors, and 
sync_max to just one chunk above it. See what happens...
If it rewrites correctly and the drive is not dropped, run "check" 
again on the same region and see if "cat /sys/block/md3/md/mismatch_cnt" 
still returns zero (or the value it had before the rewrite). If it is 
zero (or in any case has not changed), it means the block really was 
rewritten with the correct value: recovery of one sector really works 
for raid6 in singly-degraded state. Then the procedure is safe, as far 
as I understand, and you can go ahead with the other chunks.
When all damaged sectors have been reallocated, there are no more read 
errors, and mismatch_cnt is still at zero, you can go ahead and replace 
the defective drive.
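
The one-chunk test could be scripted roughly as below. This is a sketch 
under assumptions: the function names and MD_SYSFS are hypothetical, 
chunk_size is read from sysfs in bytes and converted to 512-byte 
sectors, and the sector you pass in must already be in the units 
sync_min expects (converting an LBA from dmesg into that offset is not 
shown here).

```shell
# Hypothetical variable: sysfs directory of the array (md3 as in this thread).
MD_SYSFS=${MD_SYSFS:-/sys/block/md3/md}

# Chunk size in 512-byte sectors (sysfs reports chunk_size in bytes).
chunk_sectors() {
    echo $(( $(cat "$MD_SYSFS/chunk_size") / 512 ))
}

# align_down SECTOR: round SECTOR down to a chunk boundary.
align_down() {
    cs=$(chunk_sectors)
    echo $(( $1 / cs * cs ))
}

# check_one_chunk ERROR_SECTOR
# Restrict the window to the single chunk holding ERROR_SECTOR, then
# request a "check" pass over it.
check_one_chunk() {
    start=$(align_down "$1")
    end=$(( start + $(chunk_sectors) ))
    echo max      > "$MD_SYSFS/sync_max"   # so the new minimum is never above a stale maximum
    echo "$start" > "$MD_SYSFS/sync_min"
    echo "$end"   > "$MD_SYSFS/sync_max"
    echo check    > "$MD_SYSFS/sync_action"
}

# Mismatch counter to compare before and after the pass.
mismatches() {
    cat "$MD_SYSFS/mismatch_cnt"
}

# reset_window: stop the paused pass and restore the full range.
reset_window() {
    echo idle > "$MD_SYSFS/sync_action"
    echo 0    > "$MD_SYSFS/sync_min"
    echo max  > "$MD_SYSFS/sync_max"
}
```

Note down mismatches first, run check_one_chunk on one error location, 
wait until the pass pauses at sync_max, compare mismatches again, and 
call reset_window before moving on to the next chunk.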

There are a few things that could still make the resync fail if we are 
really unlucky, but dmesg should point us in the right direction in 
that case.
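
If it does fail, something like this would collect what we need to see. 
Again just a sketch: the function names and MD_SYSFS are hypothetical, 
it assumes each member's dev-*/state file in sysfs contains the word 
"faulty" once MD has failed it, and the grep pattern is only a guess at 
which kernel messages are relevant.

```shell
# Hypothetical variable: sysfs directory of the array (md3 as in this thread).
MD_SYSFS=${MD_SYSFS:-/sys/block/md3/md}

# faulty_members
# Print the name of every member whose state contains "faulty".
faulty_members() {
    for state in "$MD_SYSFS"/dev-*/state; do
        [ -r "$state" ] || continue
        if grep -q faulty "$state"; then
            basename "$(dirname "$state")" | sed 's/^dev-//'
        fi
    done
}

# recent_md_messages
# Last kernel log lines that mention md, raid or I/O errors.
recent_md_messages() {
    dmesg | grep -i -E 'md[0-9]+|raid|I/O error' | tail -n 40
}
```

Empty output from faulty_members after the pass means no member is 
marked failed; otherwise please post the recent_md_messages output.
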
Also remember that the patch still needs testing... so far it has not 
really been tested, because DM dropped the drive before MD could. We 
need to know whether raid6 now behaves like a raid6 or still behaves 
like a raid5...
Thank you
