On Monday 24 August 2009 16:11:56 Greg Freemyer wrote:
Right now, people think that a degraded raid 5 is equivalent to raid 0. As
this thread demonstrates, in the power failure case it's _worse_, due to write
granularity being larger than the filesystem sector size. (Just like flash.)
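A toy illustration of why a torn write hurts more on a degraded raid 5 than on raid 0. It assumes a 3-disk raid 5 in which one stripe holds two data chunks (D0, D1) and one XOR parity chunk. The chunk contents and helper names are made up for illustration and do not come from md. Once disk 1 is gone, D1 exists only as D0 xor P. A rewrite of D0 therefore has to update both D0 and P, and if power fails between those two writes, the reconstructed D1 changes even though nobody wrote to it.

```python
from functools import reduce


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together (raid 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Toy 3-disk raid 5 stripe: two data chunks plus one parity chunk.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_bytes(d0, d1)


def reconstruct_d1(d0_on_disk: bytes, parity_on_disk: bytes) -> bytes:
    """Disk 1 has failed, so D1 can only be rebuilt from D0 and P."""
    return xor_bytes(d0_on_disk, parity_on_disk)


assert reconstruct_d1(d0, parity) == d1

# The filesystem rewrites D0. In degraded mode this means writing D0 and P.
new_d0 = b"CCCC"
new_parity = xor_bytes(new_d0, d1)  # uses the reconstructed D1

# If both writes land, D1 still reconstructs correctly.
assert reconstruct_d1(new_d0, new_parity) == d1

# Power fails after the new D0 reaches its disk but before the new P does.
torn_d1 = reconstruct_d1(new_d0, parity)
print(torn_d1 == d1)  # -> False: D1 was never written, yet it now reads back wrong
```

In raid 0 the same interrupted write could only damage D0, the chunk actually being written. Here it also takes out D1, which is the chunk-sized write granularity problem described above.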
Knowing that, some people might choose to suspend writes to their raid until
it's finished recovery. Perhaps they'll set up a system where a degraded raid
5 gets remounted read only until recovery completes, and then writes go to a
new blank hot spare disk using all that volume snapshotting or unionfs stuff
people have been working on. (The big boys already have hot spare disks
standing by on a lot of these systems, ready to power up and go without human
intervention. Needing two for actual reliability isn't that big a deal.)
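A minimal sketch of the read-only-plus-overlay idea. It assumes a block device can be modeled as a dict from block number to contents. `DegradedOverlay` and its methods are hypothetical. A real setup would use device-mapper snapshots or a union filesystem rather than anything like this. While the array is degraded, new writes go only to the spare and reads prefer the spare. Once recovery completes, the spare's contents are folded back into the array.

```python
from typing import Dict, Optional


class DegradedOverlay:
    """Hypothetical: keep a degraded raid 5 read-only and send new writes to a spare."""

    def __init__(self, base: Dict[int, bytes]) -> None:
        self.base = base                     # the degraded array: not written while degraded
        self.overlay: Dict[int, bytes] = {}  # the blank hot spare: receives every new write

    def read(self, block: int) -> Optional[bytes]:
        # Newer data on the spare wins; otherwise fall through to the array.
        return self.overlay.get(block, self.base.get(block))

    def write(self, block: int, data: bytes) -> None:
        # The degraded array is never touched here, so a power failure
        # during this write cannot tear one of its stripes.
        self.overlay[block] = data

    def merge(self) -> None:
        """Once recovery has finished, fold the spare's writes back into the array."""
        self.base.update(self.overlay)
        self.overlay.clear()


array = {0: b"old", 1: b"keep"}
dev = DegradedOverlay(array)
dev.write(0, b"new")
print(array[0], dev.read(0), dev.read(1))  # -> b'old' b'new' b'keep'
dev.merge()
print(array[0])  # -> b'new'
```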
Or maybe the raid guys might want to tweak the recovery logic so it's not
entirely linear, but instead prioritizes dirty pages over clean ones. So if
somebody dirties a page halfway through a degraded raid 5, skip ahead to
recover that chunk to the new disk first (yes, leaving holes; it's not that
hard to track), and _then_ let the write go through.
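The dirty-first rebuild order could look roughly like the following toy model. It assumes the array is tracked per stripe with a simple recovered/not-recovered bitmap, which serves as the "holes". `DirtyFirstRebuild` and its callbacks are hypothetical and say nothing about how Linux md actually schedules a resync. A write to a stripe that has not been recovered yet causes that stripe to be rebuilt onto the spare immediately, and only then is the write applied. The linear pass later skips anything already recovered out of order.

```python
from typing import Callable, Dict, List


class DirtyFirstRebuild:
    """Toy raid 5 rebuild that recovers a stripe out of order when a write hits it."""

    def __init__(self, nstripes: int, reconstruct: Callable[[int], bytes]) -> None:
        self.nstripes = nstripes
        self.reconstruct = reconstruct          # rebuilds the missing chunk of a stripe
        self.recovered = [False] * nstripes     # the "holes" bitmap
        self.spare: Dict[int, bytes] = {}       # what has been written to the new disk
        self.cursor = 0                         # position of the linear rebuild
        self.order: List[int] = []              # order in which stripes were recovered

    def _recover(self, stripe: int) -> None:
        if not self.recovered[stripe]:
            self.spare[stripe] = self.reconstruct(stripe)
            self.recovered[stripe] = True
            self.order.append(stripe)

    def step(self) -> bool:
        """Recover the next unrecovered stripe in linear order.
        Returns False once every stripe is redundant again."""
        while self.cursor < self.nstripes and self.recovered[self.cursor]:
            self.cursor += 1  # skip holes already filled out of order
        if self.cursor == self.nstripes:
            return False
        self._recover(self.cursor)
        return True

    def write(self, stripe: int, apply_write: Callable[[int], None]) -> None:
        """Recover the target stripe first, then let the write go through."""
        self._recover(stripe)
        apply_write(stripe)


rebuild = DirtyFirstRebuild(8, lambda s: bytes([s]))
rebuild.step()
rebuild.step()                   # linear rebuild has done stripes 0 and 1
writes: List[int] = []
rebuild.write(5, writes.append)  # a write to stripe 5 jumps the queue
while rebuild.step():
    pass
print(rebuild.order)  # -> [0, 1, 5, 2, 3, 4, 6, 7]
```

The bitmap is what makes skipping ahead cheap. The linear pass only has to test one flag per stripe to step over holes that were filled early.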
But unless people know the issue exists, they won't even start thinking about
ways to address it.
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds