Re: [patch] ext2/3: document conditions when reliable operation is possible

From: Ric Wheeler
Date: Monday, August 24, 2009 - 1:24 pm

Pavel Machek wrote:

I don't see why you think that. In general, fsck (for any fs) only 
checks metadata. If you have silent data corruption that corrupts things 
that are fixable by fsck, you most likely also have silent corruption 
hitting things users care about, like the data blocks inside their files. 
Fsck will not fix (or even notice) any of that; that is where things like 
full data checksums can help.
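
To make the checksum point concrete, here is a minimal user-space sketch 
(not anything ext2/3 or fsck does): record a SHA-256 digest per file, then 
re-read everything later and compare. The function names and the manifest 
layout are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root, manifest):
    """Re-read every file in the manifest; return paths missing or changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            bad.append(rel)
    return bad
```

A check like this flags any change in file contents, including legitimate 
edits, so it only makes sense for data that is supposed to be static. Also 
keep in mind that a re-read shortly after a write may be served from the 
page cache rather than from the media.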

Also note (from first-hand experience) that unless you check and validate 
your data, you can have data corruption that is never flagged as an I/O 
error, so data signing or scrubbing is a critical part of data integrity.

I think that we need to help people understand the full spectrum of data 
concerns, starting with reasonable best practices that will help most 
people suffer *less* (not no) data loss. And we should make very sure 
that they are not falsely assured that, by following any specific script, 
they can skip backups, remote backups, etc :-)

Nothing in our code in any part of the kernel deals well with every 
disaster or odd event.


I think that the example and the response are both off base. If your 
head ever touches the platter, you won't be reading from a huge part of 
your drive ever again. (Usually you have 2 heads per platter and 3-4 
platters, so an impact that kills one head takes a corresponding share 
of your data with it, roughly one-sixth to one-eighth.)

No file system will recover that data although you might be able to 
scrape out some remaining useful bits and bytes.

More common causes of silent corruption would be bad DRAM in things like 
the drive write cache, hot spots (which cause adjacent-track data 
errors), etc.  Note that in this last case, your most recently written 
data is fine; it is the data you wrote months or years ago that is toast!

It is hard for anyone to see the real data without looking in detail at 
large numbers of parts. Back at EMC, we looked at failures for lots of 
parts, so we got a clear grasp on trends.  I do agree that flash/SSD 
parts are still very young, so we will have interesting and unexpected 
failure modes to learn to deal with...

Nothing is perfect. It is still a trade-off between storage utilization 
(how much storage we give users for, say, five 2TB drives), performance, 
and cost (throw away any disks over 2 years old?).
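
As a rough illustration of the utilization side of that trade-off, the 
sketch below computes usable capacity for five 2TB drives under a few 
textbook RAID layouts. The layouts and the helper name are my own 
assumptions for the example; it ignores metadata overhead, hot spares, 
and vendor-specific schemes.

```python
def usable_tb(n_drives, drive_tb, layout):
    """Usable capacity in TB for a simple, idealized RAID layout."""
    if layout == "raid0":      # striping, no redundancy
        return n_drives * drive_tb
    if layout == "raid1":      # every drive mirrors the same data
        return drive_tb
    if layout == "raid5":      # one drive's worth of parity
        if n_drives < 3:
            raise ValueError("raid5 needs at least 3 drives")
        return (n_drives - 1) * drive_tb
    if layout == "raid6":      # two drives' worth of parity
        if n_drives < 4:
            raise ValueError("raid6 needs at least 4 drives")
        return (n_drives - 2) * drive_tb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("raid0", "raid5", "raid6", "raid1"):
        print(layout, usable_tb(5, 2, layout), "TB usable of 10 TB raw")
```

With five 2TB drives this gives 10, 8, 6, and 2 TB respectively: each 
step of extra redundancy costs the user visible capacity.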

ext3 is used on lots of RAID arrays without any issue.

I think that you really need to step back and look harder at real 
failures - not just your personal experience - but a larger set of real 
world failures. Many papers have been published recently about that (the 
Google paper, the Bianca paper from FAST, NetApp, etc.).

Regards,

Ric


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
