Feature Request

From: Stefan *St0fF* Huebner
Date: Tuesday, February 9, 2010 - 1:43 am

Hi Everybody,

I would like to propose a few, probably hard-to-implement, features for mdraid.

Background:
Modern hard disk drives (I am only talking about ATA/SATA drives here;
SCSI devices are too expensive for me) do their own error correction.
Most of them also have a feature called ERC (Error Recovery Control),
which lets you set timeouts for read and write error recovery.  Desktop
drives are preset to run their error recovery to its fullest extent and
do not respond while this procedure is active.  RAID-edition/enterprise
disks are normally set to start error recovery but report a media error
after 7 seconds of unsuccessful recovery; that is where the ERC timeout
takes effect.

Now imagine any RAID with some kind of redundancy, reading/writing
data.  One of the disks finds out "I cannot correctly read/write the
requested sector", starts its error correction, hits the respective
ERC-timeout and reports back a media error or unrecoverable error.  Now
mdraid would drop the disk.

But the data of that sector can actually be recreated from the existing
redundancy.  Wouldn't it be smart if mdraid recreated the sector and
simply tried to write it again?  Only after a reasonable number of
failed retries should it drop the disk.
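
To illustrate what "recreated from the existing redundancy" means for a
single-parity layout such as RAID-5, here is a minimal Python sketch.  It
is not md's code; the function names are made up for illustration, and it
assumes plain XOR parity over equally sized blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from all surviving blocks
    (the remaining data blocks plus the parity block)."""
    return xor_blocks(surviving_blocks)


# Three data blocks of one stripe and their parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] returned a media error for this sector.
rebuilt = rebuild_missing([data[0], data[2], parity])
assert rebuilt == data[1]
```

The rebuilt block is what would be written back to the disk that
reported the error.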

Prerequisites:
- upon assembly/creation of the array:
  - mdraid needs to find out whether the member devices are backed by
(S)ATA block devices
  - if they are, the ERC timeouts for read and write operations need to
be set on each device, as this setting is volatile (it is reset to
factory defaults on a power-on reset).
  - if successful, a flag indicating that the feature is enabled shall
be set
- error handling needs to be updated with the "intelligence" described
above for devices that have the ERC feature set
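
The assembly-time steps above could be sketched roughly as follows.
This is only an illustration in Python: `Member`, `configure_members`
and the `set_erc` callback are hypothetical names, and the actual
transport check and ERC command are left to the caller.  The
`supports_erc` field is an added assumption, since a drive may be ATA
without offering the feature.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    is_ata: bool               # hypothetical: result of a transport check
    supports_erc: bool         # hypothetical: drive advertises ERC
    erc_enabled: bool = False  # the "feature enabled" flag


def configure_members(members, set_erc):
    """Try to enable ERC on every suitable member device.

    set_erc(name) is a caller-supplied function that issues the actual
    command and returns True on success.  Returns the names of the
    members on which the flag was set.
    """
    enabled = []
    for member in members:
        if member.is_ata and member.supports_erc and set_erc(member.name):
            member.erc_enabled = True
            enabled.append(member.name)
    return enabled
```

Since the setting is volatile, this would have to run on every
assembly, not only when the array is created.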

This is a request for comments (and of course this feature).

All the best,
Stefan Hübner
--

From: Michael Tokarev
Date: Tuesday, February 9, 2010 - 5:28 am

Stefan *St0fF* Huebner wrote:

This is exactly what the md layer is doing.  On a failed _read_ it tries
to reconstruct the data from the other disk drives and writes the
reconstructed data back to the drive where the read failed.  If the
_write_ fails, md will drop the disk.
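
As a rough illustration of that policy (a Python sketch, not md's actual
code; `handle_io_error` and `write_back` are hypothetical names, and
treating a failed write-back like any other failed write is an
assumption drawn from the description above):

```python
def handle_io_error(kind, write_back):
    """Decide what happens to a member disk after an I/O error.

    kind is "read" or "write".  write_back is a callable that writes the
    reconstructed data to the failing disk and returns True on success;
    the reconstruction itself is not shown here.
    """
    if kind == "read":
        # Data is rebuilt from the other disks, then written back.
        if write_back():
            return "repaired"
        # The write-back is a write, so its failure drops the disk.
        return "drop"
    if kind == "write":
        return "drop"
    raise ValueError(f"unknown error kind: {kind!r}")
```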

/mjt
--

From: Stefan Hübner
Date: Tuesday, February 9, 2010 - 7:19 am

Hi Mjt,

I hoped so; great that it is implemented like that.

Well, then all that's needed is the check at assembly/creation time:
- (is the drive an ATA drive) && (does it support SCT ERC)
-> and if it does, set some reasonable timeouts (like the 7s that
enterprise-class drives use for reading.  For writing I would suggest
14s, bearing in mind that reallocating too quickly makes the spare
sectors run out quickly).
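
For experimenting with such timeouts from userspace, smartmontools
offers `smartctl -l scterc,READTIME,WRITETIME`, which takes both values
in units of 100 ms.  The Python helper below only builds that command
line; `scterc_command` is a made-up name, and the 7s/14s defaults are
simply the values suggested above.

```python
def scterc_command(device, read_seconds=7.0, write_seconds=14.0):
    """Build a smartctl command line that sets the SCT ERC timeouts.

    smartctl expects the timeouts in deciseconds (units of 100 ms),
    so 7 s becomes 70 and 14 s becomes 140.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# /dev/sdX is a placeholder, not a real device name.
cmd = scterc_command("/dev/sdX")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`
as root.  Because the setting does not survive a power cycle, it has to
be reapplied after every power-on.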

The write-back (I guess this is done with a reasonable number of
retries) does not make sense if the drive is still in its error recovery
procedure and does not respond to any commands until it is done.

P.S.: I have already implemented the checks and setup, but in userspace
using SG_IO.


--
