Re: TLER / CCTL timeout handling

Previous thread: 3-drive RAID1 for my low use home server? by Mark Knecht on Tuesday, March 23, 2010 - 2:37 pm. (3 messages)

Next thread: 2.6.33.1: RAID multi-core processing experimental option is broken. by Justin Piszcz on Wednesday, March 24, 2010 - 4:50 am. (1 message)
From: Nebojsa Trpkovic
Date: Tuesday, March 23, 2010 - 3:10 pm

Hello.

I've found an interesting article about TLER / CCTL
(http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery)
on desktop-class drives:
http://forums.storagereview.com/index.php/topic/28333-tler-cctl/

So, the question is:

If I make my drive report back that it failed to read the requested
sector, how will that report be handled?

Will Linux software RAID be aware of that report and take some action
(rebuilding the affected stripe, or at least the whole array, and
reallocating bad sectors along the way)?


Thank you.
Nebojsa Trpkovic


--

From: Stefan /*St0fF*/ Hübner
Date: Wednesday, March 24, 2010 - 2:54 pm

Indeed.  Michael Tokarev answered me on 2/9/10:
"On failed _read_ it tries to
reconstruct data from other disk drives and writes the reconstructed
data back to the drive where read failed.  If the _write_ fails md will
drop the disk."

This means: if a read fails and the drive does not report back, the
following reconstructing write will fail, too.  The disk gets dropped
because it is most probably still doing its error recovery on the
earlier read request and is therefore not responding.
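
A toy Python model may help picture that sequence. It is only a sketch
under stated assumptions, not md's actual code: the Drive class,
raid1_read() and the 120-second retry time for a drive without ERC are
hypothetical; 30 seconds is the default Linux SCSI command timeout. For
simplicity a bad sector always fails to read, and the rewrite always
succeeds once the drive is responsive again.

```python
from dataclasses import dataclass, field

KERNEL_TIMEOUT_S = 30.0  # default Linux SCSI command timeout


@dataclass
class Drive:
    """Hypothetical drive model with a fixed internal retry time."""
    name: str
    bad_sectors: set = field(default_factory=set)
    recovery_s: float = 0.0   # time spent retrying a bad sector (ERC limit, or long without ERC)
    busy_until: float = 0.0   # drive ignores new commands until this time
    failed: bool = False

    def read(self, sector, now):
        """Return (ok, time when the kernel sees the result or gives up)."""
        if sector not in self.bad_sectors:
            return True, now
        self.busy_until = now + self.recovery_s
        return False, min(self.busy_until, now + KERNEL_TIMEOUT_S)

    def write(self, sector, now):
        """The write only succeeds if the drive starts it before the kernel times out."""
        start = max(now, self.busy_until)
        if start - now > KERNEL_TIMEOUT_S:
            return False, now + KERNEL_TIMEOUT_S
        self.bad_sectors.discard(sector)  # pending sector rewritten or reallocated
        return True, start


def raid1_read(mirrors, sector, now=0.0):
    """Sketch of the quoted policy: on a failed read, rebuild the data from
    another mirror and write it back; if that write fails, drop the disk."""
    primary, others = mirrors[0], mirrors[1:]
    ok, now = primary.read(sector, now)
    if ok:
        return "read ok"
    if not any(m.read(sector, now)[0] for m in others if not m.failed):
        return "unrecoverable"
    ok, now = primary.write(sector, now)
    if not ok:
        primary.failed = True
        return f"{primary.name} dropped"
    return f"{primary.name} repaired"
```

With a 120 s retry time, raid1_read([Drive("sda", {42}, 120.0), Drive("sdb")], 42)
returns "sda dropped"; with a 7 s ERC limit it returns "sda repaired".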

If you enable ERC read timeouts, the drive will report a media error (or
something similar) but still honour the write request.  If you give the
ERC write timeout a value that is neither too small nor too large (i.e.
it shouldn't time out the write operation from the kernel's point of
view), the drive will either fix the pending sector or reallocate it.
If the ERC write timeout is too small, the drive will reallocate sectors
very aggressively, which should not be the intention, as there are very
few spare sectors.
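
The constraints in that paragraph can be written as a small check. This
is a sketch under assumptions: check_erc() is a hypothetical helper;
values are in deciseconds because that is the unit smartctl's
"-l scterc" option uses; 30 s is the Linux default SCSI command timeout;
and the 65-decisecond floor reflects the smartctl manual's note that
nonzero values below 65 are probably not supported. No number is given
here for a write timeout that is "too small" in the reallocation sense,
so the sketch does not try to enforce one.

```python
KERNEL_TIMEOUT_S = 30.0  # Linux default; see /sys/block/<dev>/device/timeout
MIN_SUPPORTED_DS = 65    # smartctl manual: nonzero values below 65 probably unsupported


def check_erc(read_ds, write_ds, kernel_timeout_s=KERNEL_TIMEOUT_S):
    """Return a list of problems with SCT ERC values given in deciseconds."""
    problems = []
    for label, ds in (("read", read_ds), ("write", write_ds)):
        if ds == 0:
            # 0 disables ERC, so the drive may keep retrying past the kernel timeout
            problems.append(f"{label}: ERC disabled, drive may outlast the kernel timeout")
        elif ds < MIN_SUPPORTED_DS:
            problems.append(f"{label}: {ds} ds is probably not supported by the drive")
        elif ds / 10 >= kernel_timeout_s:
            # the drive must answer before the kernel gives up on the command
            problems.append(
                f"{label}: {ds / 10:.1f} s is not below the {kernel_timeout_s:.0f} s kernel timeout"
            )
    return problems
```

For example, check_erc(70, 70) returns an empty list, while
check_erc(70, 300) flags the write value as not below the kernel timeout.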

You're welcome, and all the best,
Stefan
--

From: Nebojsa Trpkovic
Date: Wednesday, March 24, 2010 - 5:48 pm

Thank you very much!

This is a great answer (it explains a lot) and great news (there's
nothing to worry about or hack around).

So now we (desktop drive users) just have to wait for smartmontools
5.40, or pull the source from SVN, and set some reasonable ERC read
timeouts.

Is a value of 7 seconds considered a reasonable ERC read timeout?
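
For reference, smartctl's scterc settings are given in tenths of a
second, so 7 seconds corresponds to 70. The sketch below only builds the
command line and does not run it; scterc_command() is a hypothetical
helper and /dev/sda is a placeholder device name. It does not settle
whether 7 s is reasonable; that still depends on the kernel timeout
Stefan mentions.

```python
import shlex


def scterc_command(device, read_s, write_s):
    """Build a smartctl invocation that sets SCT ERC (smartctl takes deciseconds)."""
    read_ds, write_ds = round(read_s * 10), round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


cmd = scterc_command("/dev/sda", 7, 7)  # /dev/sda is a placeholder
print(shlex.join(cmd))  # smartctl -l scterc,70,70 /dev/sda
```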

Nebojsa
--