It is always interesting to try to explain to users that a clean fsck
run does not mean that anything they care about is actually safely on
disk. How fast fsck can run matters when you are trying to recover data
from a really hosed file system, but that is thankfully relatively rare
for most people.
Having been involved in many calls with customers after crashes, I can
say that what they really want to know is pretty routine: Do you have
all of the data I wrote? Can you prove that it is the same data that I
wrote? If not, what data is missing and needs to be restored?
We can help answer those questions with checksums or digital hashes
that validate the actual user data of files (the open questions are when
to compute them, where to store them, and whether the SCSI T10 DIF/DIX
stuff would be sufficient), by putting in place some background
scrubbers to detect corruption (which can happen even without an IO
error), etc.
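To make the checksum-plus-scrubber idea a bit more concrete, here is a
rough user-space sketch in Python. It is only an illustration under
some loud assumptions: the digests are kept in a plain in-memory
dictionary (where to really store them is exactly the open question
above), SHA-256 is an arbitrary choice of hash, and the names
file_digest, build_manifest and scrub are made up for this example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Walk root and record a digest for every regular file (hypothetical store)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only user data is hashed.
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest


def scrub(root, manifest):
    """Re-read every file in the manifest; return (missing, corrupted) lists."""
    missing, corrupted = [], []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            missing.append(rel)
        elif file_digest(path) != expected:
            corrupted.append(rel)
    return missing, corrupted
```

Two caveats with a sketch like this: a re-read may be satisfied from
the page cache rather than the disk, and a legitimate rewrite of a file
looks the same as corruption unless the manifest is updated at write
time.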
Being able to pinpoint what was impacted is actually enormously useful
- for example, being able to map a bad sector back to some meaningful
object like a user file, metadata (translation: run fsck), and so on.
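The bad-sector-to-object mapping could look something like the sketch
below. Everything in it is hypothetical: the extent table, the sector
numbers, the paths, and the names build_index and owner_of are invented
for illustration, and a real tool would have to get the extents from
the file system itself.

```python
import bisect

# Hypothetical extent table: (first_sector, sector_count, owner).
EXTENTS = [
    (0, 8, "metadata: superblock"),
    (8, 56, "metadata: inode table"),
    (64, 128, "file: /home/user/report.txt"),
    (256, 512, "file: /var/lib/db/data.bin"),
]


def build_index(extents):
    """Sort extents by first sector and keep the start list for bisecting."""
    ordered = sorted(extents)
    starts = [first for first, _count, _owner in ordered]
    return starts, ordered


def owner_of(sector, index):
    """Return the owner of the extent containing sector, or 'unallocated'."""
    starts, ordered = index
    # Find the last extent that starts at or before the sector.
    i = bisect.bisect_right(starts, sector) - 1
    if i >= 0:
        first, count, owner = ordered[i]
        if first <= sector < first + count:
            return owner
    return "unallocated"


if __name__ == "__main__":
    index = build_index(EXTENTS)
    for bad_sector in (3, 70, 200, 767):
        print(bad_sector, "->", owner_of(bad_sector, index))
```

A hit on a file tells you what to restore; a hit on metadata is the
"run fsck" case.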
Ric
--