wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage

From: Ric Wheeler
Date: Thursday, September 3, 2009 - 7:15 am

On 09/03/2009 09:59 AM, Krzysztof Halasa wrote:

The whole thread above is about software MD using commodity drives (S-ATA or 
SAS) without battery backed write cache.

We have that (and I have it personally) and do test it.

You must disable the write cache on these commodity drives *if* the MD RAID 
level does not support barriers properly.

This will greatly reduce errors after a power loss (both in degraded state and 
non-degraded state), but it will not eliminate data loss entirely. You simply 
cannot do that with any storage device!

Note that even without MD RAID, the file system issues IOs in file system block 
size (normally 4096 bytes), and most commodity storage devices use a 512-byte 
sector size, which means that each block write has to update eight 512-byte sectors.
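As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name and the assumption that block 0 starts at sector 0 (no partition offset) are mine, purely for illustration:

```python
FS_BLOCK_SIZE = 4096  # typical file system block size, in bytes
SECTOR_SIZE = 512     # sector size of most commodity drives, in bytes

def sectors_for_block(block_no, fs_block_size=FS_BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Return the logical sector numbers backing one file system block.

    Hypothetical helper: assumes block 0 starts at sector 0.
    """
    per_block = fs_block_size // sector_size
    first = block_no * per_block
    return list(range(first, first + per_block))

print(len(sectors_for_block(0)))  # number of sectors one block write must update
print(sectors_for_block(3))       # the sectors behind file system block 3
```

Each of those sectors is a separate unit as far as the drive is concerned, so nothing here makes the eight updates happen together.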

Drives can (and do) have multiple platters and surfaces and it is perfectly 
normal to have contiguous logical ranges of sectors map to non-contiguous 
sectors physically. Imagine a 4KB write stripe that straddles two adjacent 
tracks on one platter (requiring a seek) or mapped across two surfaces 
(requiring a head switch). Also, a remapped sector can require more or less a 
full surface seek from wherever you are to the remapped sector area of the drive.

These are all cases where, after a power loss, even a local (non-MD) device can 
be left with a partial update of that 4KB range of sectors. Note that unlike 
RAID/MD, local storage has no parity on the server to detect this partial write.
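To make the parity point concrete, here is a small Python sketch of how XOR parity can expose a stripe where a data chunk was rewritten but the parity was not. It is a toy model with made-up chunk sizes and helper names, not how MD itself is implemented:

```python
from functools import reduce

def xor_parity(chunks):
    """XOR equal-length byte strings together, RAID-5 style (toy model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def stripe_is_consistent(data_chunks, parity):
    """True if the stored parity matches the parity recomputed from the data."""
    return xor_parity(data_chunks) == parity

# A stripe of three data chunks plus parity, all written completely.
old_chunks = [bytes([i]) * 16 for i in (1, 2, 3)]
parity = xor_parity(old_chunks)
print(stripe_is_consistent(old_chunks, parity))

# Power is lost after the second chunk is rewritten but before parity is updated.
torn_chunks = [old_chunks[0], b"\xff" * 16, old_chunks[2]]
print(stripe_is_consistent(torn_chunks, parity))
```

The mismatch only says that the stripe is inconsistent, not which chunk is stale, and a single local drive has nothing comparable to check against.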

This is why new file systems like btrfs and zfs do checksumming of data and 
metadata. This won't prevent partial updates during a write, but can at least 
detect them and try to do some kind of recovery.
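A minimal Python sketch of that detection idea follows. It simulates a 4KB write where only some of the eight sectors reach the platter, then compares a checksum stored separately against the block as read back. SHA-256 is just a convenient stand-in here, and the helper names are hypothetical; real file systems choose their own checksum and on-disk layout:

```python
import hashlib

SECTOR = 512
BLOCK = 4096

def checksum(block):
    """Stand-in checksum for illustration only."""
    return hashlib.sha256(block).digest()

def torn_write(old_block, new_block, sectors_written):
    """Return the on-disk block if only the first `sectors_written` sectors landed."""
    cut = sectors_written * SECTOR
    return new_block[:cut] + old_block[cut:]

old_block = b"A" * BLOCK
new_block = b"B" * BLOCK
stored_checksum = checksum(new_block)  # assumed to be kept with the metadata

on_disk = torn_write(old_block, new_block, sectors_written=5)
print(checksum(on_disk) == stored_checksum)    # torn block: checksum mismatch
print(checksum(new_block) == stored_checksum)  # complete block: checksum matches
```

Detection is all this buys you. Recovery still needs a good copy from somewhere else, such as a mirror or an older version of the block.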

In other words, this is not just an MD issue, it is entirely possible even with 
non-MD devices.

Also, when you enable the write cache (MD or not) you are buffering multiple 
MBs of data that can go away on power loss. That exposure is far greater (10x) 
than the one the partial RAID rewrite case worries about.

ric