Re: [patch] ext2/3: document conditions when reliable operation is possible

From: Rob Landley
Date: Wednesday, August 26, 2009 - 10:19 pm

On Tuesday 25 August 2009 21:58:49 Theodore Tso wrote:

Or a panic, or a hang, or the drive failed because the system is overheating 
because the air conditioner suddenly died and the server room is now an oven.  
(Yup, worked at that company too.)


I'm a bit concerned by the argument that we don't need to document serious 
pitfalls because every Linux system has an administrator so competent that 
they already know stuff that didn't even come up until the second or third day 
it was discussed on lkml.

"You're documenting it wrong" != "you shouldn't document it".


I worked at a company that retested their UPSes a year after installing them 
and found that _none_ of them supplied more than 15 seconds charge, and when 
they dismantled them the batteries had physically bloated inside their little 
plastic cases.  (Same company as the dead air conditioner; possibly 
overheating was involved, but the little _lights_ said everything was ok.)

That was by no means the first UPS I'd seen die; the suckers have a higher 
failure rate than hard drives in my experience.  This is a device where the 
batteries get constantly charged and almost never tested, because if it _does_ 
fail the test you just rebooted your production server.  So a lot of smaller 
companies think they have a working one but actually don't.


Here's hoping they shut the system down properly to install the new drive in 
the raid then, eh?  Not accidentally pull the plug before it's finished running 
the ~7 minutes of shutdown scripts in the last Red Hat Enterprise I messed 
with...

Does this situation apply during the rebuild?  I.e. once a hot spare has been 
supplied, is the copy to the new drive linear, or will it write dirty pages to 
the new drive out of order, even before the reconstruction's gotten that far, 
_and_ do so in an order that doesn't open this race window where the data 
can't be reconstructed?

If "degraded array" just means "don't have a replacement disk yet", then it 
sounds like what Pavel wants to document is "don't write to a degraded array 
at all, because power failures can cost you data due to write granularity 
being larger than filesystem block size".  (Which still comes as news to some 
of us, and you need a way to remount the degraded array read-only until 
the sysadmin can fix it.)
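
To make the failure mode concrete, here is a minimal sketch of a torn write on 
a degraded array.  It assumes a toy 3-disk RAID5 stripe (two data chunks plus 
XOR parity) with 4-byte chunks; the helper xor_blocks and the names d0, d1 and 
parity are made up for illustration and are not taken from the md driver.  The 
point is that rewriting one chunk also has to rewrite parity, and if power 
dies between those two writes, the chunk that lived on the _failed_ disk can 
no longer be reconstructed, even though nobody wrote to it.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a toy 3-disk RAID5: two data chunks and their parity.
d0 = b"AAAA"                   # belongs to file A, lives on disk 0
d1 = b"BBBB"                   # belongs to file B, lives on disk 1
parity = xor_blocks(d0, d1)    # lives on disk 2

# Disk 1 dies.  The array is degraded: d1 now exists only as d0 XOR parity.
assert xor_blocks(d0, parity) == d1

# The filesystem rewrites file A only.  The array has to update both d0 and
# parity (read-modify-write: new parity = old parity XOR old d0 XOR new d0).
new_d0 = b"aaaa"
new_parity = xor_blocks(parity, d0, new_d0)

# Power fails after the new d0 reached disk 0 but before the new parity
# reached disk 2, so the stripe is left half-updated.
on_disk_d0, on_disk_parity = new_d0, parity

# After reboot, file B is reconstructed from what is actually on disk.
recovered_d1 = xor_blocks(on_disk_d0, on_disk_parity)
print("file B expected:", d1, "got:", recovered_d1)
```

File B comes back wrong although the filesystem only ever touched file A.  Had 
both writes completed, xor_blocks(new_d0, new_parity) would still yield the 
original d1.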

But if "degraded array" means "hasn't finished rebuilding the new disk yet", 
that could easily be several hours' window and not writing to it is less of an 
option.
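
For what it's worth, the two meanings can be told apart mechanically.  The 
following is a rough sketch, assuming the usual /proc/mdstat layout in which a 
"_" inside the "[UU_]" member map marks a missing device and a "recovery =" 
progress line appears while a spare is being rebuilt.  The sample text, the 
device names and the function classify_arrays are invented for illustration; 
check the format on your own kernel before relying on it.

```python
import re

# Hypothetical sample modeled on typical /proc/mdstat output.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2929890816 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [==>..................]  recovery = 12.6% (123456/976630272) finish=127.5min speed=33440K/sec

md1 : active raid5 sdf1[1] sde1[0]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

md2 : active raid1 sdi1[1] sdh1[0]
      976630336 blocks [2/2] [UU]

unused devices: <none>
"""


def classify_arrays(mdstat_text: str) -> dict:
    """Map each md device to 'healthy', 'degraded, no rebuild running',
    or 'degraded, rebuilding'."""
    states = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            states[current] = "healthy"
            continue
        if current is None:
            continue
        # A '_' in the member map (e.g. [UU_]) means a device is missing.
        if re.search(r"\[[U_]*_[U_]*\]", line):
            states[current] = "degraded, no rebuild running"
        # A recovery progress line means a replacement is being filled in.
        if re.search(r"\brecovery\s*=", line) and states[current] != "healthy":
            states[current] = "degraded, rebuilding"
    return states


if __name__ == "__main__":
    for name, state in classify_arrays(SAMPLE_MDSTAT).items():
        print(name, "->", state)
```

On a real machine you would pass it the contents of /proc/mdstat instead of 
the sample string.  What to do with the answer (remount read-only, or just 
alert someone) is the policy question above.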

(I realize a competent system administrator would obviously already know this, 
but I don't.)


Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds