Indeed. Michael Tokarev answered me on 2/9/10:
"On failed _read_ it tries to
reconstruct data from other disk drives and writes the reconstructed
data back to the drive where read failed. If the _write_ fails md will
drop the disk."
This means: if the read fails and the drive does not report back, the
following reconstructing write calls will fail, too. The disk gets
dropped because it is (most probably) still doing its error recovery on
the former read request and is therefore not responding.
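
To make the sequence above concrete, here is a toy model in Python. It is
only a sketch of the behaviour Michael describes, not md's actual code;
the function name and the event strings are made up for illustration.

```python
def handle_failed_read(drive_accepts_write):
    """Toy model of the quoted md behaviour after a failed read (not kernel code)."""
    # The read failed, so the data is rebuilt from the other drives.
    events = ["read failed", "reconstructed from other drives"]
    # md then writes the rebuilt data back to the drive whose read failed.
    if drive_accepts_write:
        events.append("write-back succeeded, disk stays in array")
    else:
        # A drive still busy with its own error recovery does not answer,
        # so the write fails and md drops the disk.
        events.append("write-back failed, disk dropped")
    return events


for accepts in (True, False):
    print(accepts, handle_failed_read(accepts))
```

The only input that matters in this model is whether the drive answers
the write-back in time, which is exactly what an ERC read timeout is
meant to influence.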
If you enable ERC read timeouts, the drive will report a media error (or
something similar) but honour the write request. If you give the ERC
write timeout a value that is neither too small nor too large (i.e.
it shouldn't time out the write operation from the kernel's point of
view), it will either fix the pending sector or reallocate it. If the ERC
write timeout value is too small, it'll very aggressively reallocate
sectors - which should not be the intention, as there are very few spare
You're welcome, and all the best,
Thank you very much!
This is a great answer (it explains a lot of things) and great news
(there's nothing to worry about or hack around).
So now we (desktop drive users) just have to wait for smartmontools
5.40, or pull the source from SVN, and set some reasonable ERC read
timeouts.
Is a value of 7 seconds considered a reasonable ERC read timeout?
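
For reference, a small sketch of what setting such a timeout could look
like once a smartctl with SCT ERC support (5.40 or later) is installed.
smartctl takes the values as "-l scterc,READTIME,WRITETIME" in units of
100 ms, so 7 seconds is written as 70. The helper name and the /dev/sdX
path are placeholders of mine, and the script only builds and prints the
command, it does not run it.

```python
def scterc_command(device, read_seconds, write_seconds):
    """Build (but do not run) a smartctl command that sets SCT ERC timeouts."""
    # smartctl expects the timeouts in units of 100 ms (deciseconds).
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# /dev/sdX is a placeholder; substitute the real device before use.
cmd = scterc_command("/dev/sdX", 7, 7)
print(" ".join(cmd))
```

Running "smartctl -l scterc /dev/sdX" without values should print the
current setting, which is a quick way to see whether a given drive
supports the feature at all.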