On Thu, Dec 23, 2010 at 12:47:34PM -0500, Jeff Moyer wrote:
Suppose you have a hard-to-reach server somewhere. Suppose that you
find out that the <whatever> card could perform 15% better if you put
it in a different slot. Would you go and dig the server out to fix
this if you know the performance now will be adequate for the next few
years? Isn't it acceptable to keep things like this until the next
scheduled (or unscheduled) maintenance?
In reality I have two servers with 8T of RAID storage each. Shuffling
all the important data around on these while trying to get exactly
optimal performance out of the storage systems is very
time-consuming. Also, each "move the data out of the way, reconfigure
the RAID, move the data back" cycle incurs a risk of losing or
corrupting the data.
I prefer concentrating on the most important part. In this case we
have a 30-fold performance problem. If it is made up of a 15-fold one
and a 2-fold one, then I'll settle for looking into and hopefully fixing
the 15-fold one, and I'll discard the 2-fold one for the time being: it
is not important enough to look into. The machine happens to have a
30-fold performance margin. It can keep up with what it has to do with
the 30-fold slower disks. However, work comes in batches, so the queue
grows significantly during a higher-workload period.
It is a production system. Whether my friend is willing to run a
prerelease kernel there remains to be seen.
On the other hand, if this were a MAJOR performance bottleneck it
wouldn't be on the "list of things to fix in December 2010"; it
would've been fixed years ago.
Jeff, can you tell me where in that blktrace output I can see the
system noticing "we need to read block XXX from the disk", then that
request getting queued, next it being submitted to the hardware, and
eventually the hardware reporting back: "I got block XXX from the
media, here it is"? Can you point these events out in the logfile for
me (for any single transaction that belongs together)?
It would be useful to see the XXX numbers (for things like block
device optimizers) and the timestamps (for us to debug this problem
today.) I strongly suspect that both are logged, right?
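To make concrete what kind of matching I mean, here is a minimal
sketch in Python. It assumes the log is text in the default blkparse
line layout (device, CPU, sequence number, timestamp, PID, action,
RWBS flags, then "sector + nblocks") and that the actions Q, D and C
stand for queued, issued to the driver and completed. The function
names and the sample lines are made up for illustration; they are not
taken from the actual trace.

```python
import re
from collections import defaultdict

# Assumed default blkparse layout:
# dev cpu seq timestamp pid action rwbs sector + nblocks [process]
LINE_RE = re.compile(
    r"^\s*(?P<dev>\d+,\d+)\s+(?P<cpu>\d+)\s+(?P<seq>\d+)\s+"
    r"(?P<ts>\d+\.\d+)\s+(?P<pid>\d+)\s+(?P<action>[A-Z]+)\s+"
    r"(?P<rwbs>\S+)\s+(?P<sector>\d+)\s+\+\s+(?P<nblocks>\d+)"
)


def trace_requests(lines):
    """Collect Q, D and C timestamps, keyed by (device, start sector)."""
    events = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("action") not in ("Q", "D", "C"):
            continue  # skip other actions and lines in a different layout
        key = (m.group("dev"), int(m.group("sector")))
        # Keep only the first timestamp seen for each stage of this sector.
        events[key].setdefault(m.group("action"), float(m.group("ts")))
    return dict(events)


def latencies(events):
    """Return {key: (queue_to_issue, issue_to_complete)} in seconds."""
    out = {}
    for key, stages in events.items():
        if all(stage in stages for stage in "QDC"):
            out[key] = (stages["D"] - stages["Q"], stages["C"] - stages["D"])
    return out


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "  8,0    1        1     0.000000000  1234  Q   R 223490 + 8 [cat]",
    "  8,0    1        2     0.000010000  1234  G   R 223490 + 8 [cat]",
    "  8,0    1        3     0.000020000  1234  D   R 223490 + 8 [cat]",
    "  8,0    1        4     0.005020000     0  C   R 223490 + 8 [0]",
]

for key, (wait, service) in latencies(trace_requests(sample)).items():
    print(key, f"queued->issued {wait:.6f}s", f"issued->completed {service:.6f}s")
```

Keying on the start sector is a simplification: a request that was
merged into another one is probably issued and completed under the
start sector of the merged request, so it would not pair up here.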
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ