----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, April 26, 2010 12:24 PM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/25/2010 12:00 PM, Janos Haar wrote:
>>
>> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
>> To: "Janos Haar" <janos.haar@netcenter.hu>
>> Cc: <linux-raid@vger.kernel.org>
>> Sent: Sunday, April 25, 2010 12:47 AM
>> Subject: Re: Suggestion needed for fixing RAID6
>>
>> Just a little note:
>>
>> The repair-sync action failed in a similar way too. :-(
>>
>>
>>> On 04/24/2010 09:36 PM, Janos Haar wrote:
>>>>
>>>> OK, I am doing it.
>>>>
>>>> I think I have found something interesting and unexpected:
>>>> after 99.9% (and another 1800 minutes) the array dropped the
>>>> dm-snapshot structure!
>>>>
>>>> ...[CUT]...
>>>>
>>>> raid5:md3: read error not correctable (sector 2923767944 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767952 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767960 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767968 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767976 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767984 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923767992 on dm-0).
>>>> raid5:md3: read error not correctable (sector 2923768000 on dm-0).
>>>>
>>>> ...[CUT]...
>>>>
>>
>
> Remember this exact error message: "read error not correctable"
>
>>
>>>
>>> This is strange because the write should have gone to the cow device.
>>> Are you sure you did everything correctly with DM? Could you post
>>> here how you created the dm-0 device?
>>
>> echo 0 $(blockdev --getsize /dev/sde4) \
>> snapshot /dev/sde4 /dev/loop3 p 8 | \
>> dmsetup create cow
>>
>
> Seems correct to me...
>
>> ]# losetup /dev/loop3
>> /dev/loop3: [0901]:55091517 (/snapshot.bin)
>>
> This line comes BEFORE the other one, right?
>
>> /snapshot.bin is a sparse file with an apparent (seeked) size of 2000G.
>> I have 3.6GB of free space in /, so running out of space is not an option. :-)
>>
>>
> [...]
>>
>>>
>>> Maybe we could ask the DM people why it's not working. Anyway
>>> there is one piece of good news: the read error apparently does
>>> travel through the DM stack.
>>
>> To me, this looks like md's bug, not dm's problem.
>> The "uncorrectable read error" means exactly that the drive can't
>> correct the damaged sector with ECC, so this is an unreadable sector
>> (pending in the SMART table).
>> A failed automatic read reallocation does not mean the sector cannot
>> be reallocated by rewriting it!
>> Most drives don't do read reallocation, only write reallocation.
>>
>> Those drives which do read reallocation do it because the sector
>> was hard to recover (maybe it needed more rotations, more
>> repositioning, too much time) and was moved automatically, BUT those
>> sectors ARE NOT reported to the PC as read errors (UNC), so they must
>> NOT appear in the log...
>>
>
> No, the error message really comes from MD. Can you read C code? Go into
> the kernel source and look at this file:
>
> linux_source_dir/drivers/md/raid5.c
>
> (raid5.c also handles raid6). Search for "read error not correctable".
>
> What you see there is the reason for the failure. Do you see the line "if
> (conf->mddev->degraded)" just above? I think your mistake was that you
> did the DM COW trick only on the last device, or anyway on one device
> only; you should have done it on all 3 devices which were failing.
>
> It did not work for you because at the moment you got the read error on
> the last disk, two disks were already dropped from the array, the array
> was doubly degraded, and it's not possible to correct a read error if
> the array is degraded because you don't have enough parity information
> to recover the data for that sector.
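
The redundancy arithmetic behind that explanation can be sketched in a few lines of shell. This is only an illustration of the counting argument, not the check that raid5.c performs; the function name stripe_recoverable and its arguments are made up for this example.

```shell
#!/bin/sh
# stripe_recoverable MAX_MISSING DROPPED_DRIVES UNREADABLE_BLOCKS
# Prints "yes" if a stripe could still be rebuilt from the remaining
# blocks, "no" otherwise. MAX_MISSING is 1 for RAID5 and 2 for RAID6.
# Hypothetical helper: it models the counting argument only.
stripe_recoverable() {
    max_missing=$1
    dropped=$2
    unreadable=$3
    # Every dropped drive and every unreadable block is one missing
    # block in the stripe; parity can rebuild at most MAX_MISSING.
    if [ $((dropped + unreadable)) -le "$max_missing" ]; then
        echo yes
    else
        echo no
    fi
}

# RAID6, one drive dropped, one bad sector on another member:
stripe_recoverable 2 1 1
# RAID6, two drives dropped, one bad sector on a third member:
stripe_recoverable 2 2 1
```

The second call is the situation described above: with two members already gone, a third unreadable block leaves nothing to reconstruct it from.
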
Oops, you are right!
It was my mistake.
Sorry, I will try it again, this time with dm-cow on 2 drives.
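
A minimal sketch of how the same snapshot table could be generated for more than one member, assuming each failing partition gets its own loop-backed COW file. The helper name snapshot_table, the sector count 1000 and the mapping names in the comments are placeholders; only /dev/sde4, /dev/loop3 and the "p 8" arguments come from the command quoted above.

```shell
#!/bin/sh
# snapshot_table ORIGIN COW_DEV SECTORS
# Prints a device-mapper "snapshot" table line in the same form as the
# command quoted earlier: persistent store ("p"), chunk size 8 sectors.
# Hypothetical helper; it only prints text and touches no devices.
snapshot_table() {
    printf '0 %s snapshot %s %s p 8\n' "$3" "$1" "$2"
}

# Demo with a made-up sector count, safe to run without root:
snapshot_table /dev/sde4 /dev/loop3 1000

# Real use (as root) would be one mapping per failing member, each with
# its own loop device, for example:
#   snapshot_table /dev/sde4 /dev/loop3 "$(blockdev --getsize /dev/sde4)" \
#       | dmsetup create cow1
# Names such as cow1 are placeholders.
```

Keeping the table generation separate from `dmsetup create` makes it easy to print and check every line before anything is mapped.
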
Thanks again.
Janos