I don't see why you think that. In general, fsck (for any fs) only
checks metadata. If silent corruption is hitting things that fsck can
fix, it is most likely also hitting things users care about, like the
data blocks inside their files. Fsck will not fix (or even notice) any
of that; that is where things like full data checksums can help.
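As a rough illustration of what full data checksums buy you, here is a
minimal Python sketch. The 4 KiB block size, the use of CRC32, and the
function names are assumptions made for the example, not what any
particular file system does. It records one checksum per block at write
time, then shows a single flipped bit that leaves the file length
untouched but is still caught, along with the block it landed in.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one CRC32 per fixed-size block of data."""
    return [
        zlib.crc32(data[off:off + block_size])
        for off in range(0, len(data), block_size)
    ]


def corrupted_blocks(data: bytes, expected: list[int],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks whose CRC32 no longer matches."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# A file's contents as written, with checksums recorded at write time.
original = bytes(range(256)) * 64           # 16 KiB, i.e. four 4 KiB blocks
recorded = block_checksums(original)

# Simulate silent corruption: one bit flips, the length stays the same.
damaged = bytearray(original)
damaged[5000] ^= 0x01
damaged = bytes(damaged)

print(len(damaged) == len(original))        # size-style metadata still agrees
print(corrupted_blocks(damaged, recorded))  # block 1 holds byte offset 5000
```

A metadata-only check has nothing to compare the block contents against,
which is why it cannot see this kind of damage.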
Also note (from first-hand experience) that unless you check and
validate your data, you can have corruption that never gets flagged as
an IO error, so data signing or scrubbing is a critical part of data
integrity.
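To make the scrubbing idea concrete, here is a small user-space sketch
in Python. The manifest format and the names file_digest,
build_manifest and scrub are hypothetical, invented for this example.
It records a SHA-256 digest per file, then later re-reads everything and
reports files whose contents no longer match, even though no read ever
returned an error.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Record a digest for every regular file under root."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest


def scrub(root: str, manifest: dict[str, str]) -> list[str]:
    """Re-read every file in the manifest; return those missing or changed."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or file_digest(path) != expected:
            bad.append(rel)
    return bad


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.dat", "b.dat"):
            with open(os.path.join(root, name), "wb") as f:
                f.write(name.encode() * 1000)
        manifest = build_manifest(root)
        # Simulate silent corruption: one byte changes in place, no IO error.
        with open(os.path.join(root, "b.dat"), "r+b") as f:
            f.seek(10)
            f.write(b"\x00")
        return scrub(root, manifest)


print(demo())
```

This is only a sketch: a real scrub has to make sure it is reading from
the media and not from a cache, and it needs some way to tell a
legitimate update from corruption.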
I think that we need to help people understand the full spectrum of data
concerns, starting with reasonable best practices that will help most
people suffer *less* (not no) data loss. We also need to make very sure
that they are not falsely assured that following any specific script
lets them skip backups, remote backups, etc :-)
Nothing in our code in any part of the kernel deals well with every
disaster or odd event.
I think that the example and the response are both off base. If your
head ever touches the platter, you won't be reading from a huge part of
your drive ever again (a drive usually has 2 heads per platter and 3-4
platters, so an impact would kill one head and a corresponding
percentage of your data).
No file system will recover that data, although you might be able to
scrape out some remaining useful bits and bytes.
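For a sense of scale, the head-crash arithmetic can be worked through in
a few lines of Python. This assumes, as a simplification, that data is
spread evenly across all surfaces and that each surface has exactly one
head; the function name is made up for the example.

```python
def fraction_lost(platters: int, heads_per_platter: int = 2,
                  dead_heads: int = 1) -> float:
    """Fraction of the drive's surfaces that become unreadable."""
    total_heads = platters * heads_per_platter
    return dead_heads / total_heads


for platters in (3, 4):
    print(f"{platters} platters: {fraction_lost(platters):.1%} of data lost")
```

Under those assumptions, losing one head costs roughly an eighth to a
sixth of the drive.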
More common causes of silent corruption would be bad DRAM in things like
the drive write cache, hot spots (that cause adjacent track data
errors), etc. Note that in this last case your most recently written
data is fine; it is the data you wrote months or years ago that is toast!
It is hard for anyone to see the real data without looking in detail at
large numbers of parts. Back at EMC, we looked at failures for lots of
parts so we got a clear grasp on trends. I do agree that flash/SSD
parts are still very young so we will have interesting and unexpected
failure modes to learn to deal with....
Nothing is perfect. It is still a trade-off between storage utilization
(how much storage we give users for, say, five 2TB drives), performance,
and cost (throw away any disks over 2 years old?).
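The utilization side of that trade-off is easy to put numbers on. The
Python sketch below takes the five 2TB drives from the example and
computes usable space for a few idealized layouts. The layout names and
the choice of layouts are assumptions for illustration, and it ignores
metadata overhead, spares, and everything on the performance and cost
side.

```python
def usable_tb(drives: int, size_tb: float, layout: str) -> float:
    """Usable capacity in TB for a few simple, idealized layouts."""
    if layout == "stripe":          # no redundancy
        return drives * size_tb
    if layout == "single_parity":   # one drive's worth of parity
        return (drives - 1) * size_tb
    if layout == "double_parity":   # two drives' worth of parity
        return (drives - 2) * size_tb
    if layout == "mirror":          # every drive holds the same data
        return size_tb
    raise ValueError(f"unknown layout: {layout}")


DRIVES, SIZE_TB = 5, 2.0
raw = DRIVES * SIZE_TB
for layout in ("stripe", "single_parity", "double_parity", "mirror"):
    usable = usable_tb(DRIVES, SIZE_TB, layout)
    print(f"{layout:14s} {usable:4.1f} TB usable ({usable / raw:.0%} of raw)")
```

Every step toward more redundancy hands users less of the raw 10TB,
which is the utilization cost being weighed here.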
ext3 is used on lots of RAID arrays without any issue.
I think that you really need to step back and look harder at real
failures - not just your personal experience - but a larger set of real
world failures. Many papers have been published recently about that (the
Google paper, the Bianca paper from FAST, NetApp, etc).
Regards,
Ric