Incorrect device mapper stats

Submitted by markseger on March 16, 2009 - 10:58am

I've just noticed that collectl is reporting inconsistent data for dm disks and the disks they're built on. The bottom line is that /proc/diskstats appears to be wrong, and I say this for two reasons: I know collectl is correctly reporting the data, and I also confirmed that iostat reports the same numbers as collectl.
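
For anyone who wants to check the raw counters directly, here is a minimal Python sketch of how a tool like collectl or iostat turns /proc/diskstats into per-second rates: take two snapshots, subtract, and divide by the interval. It assumes the classic 11-counter layout (newer kernels append more fields, which it ignores). The function and field names are my own, not collectl's.

```python
# Sketch only: turn two /proc/diskstats snapshots into per-second write rates.
# Assumes the classic layout: major, minor, name, then 11 counters.
# Extra fields appended by newer kernels are ignored.

FIELDS = (
    "reads", "reads_merged", "sectors_read", "ms_reading",
    "writes", "writes_merged", "sectors_written", "ms_writing",
    "in_flight", "ms_io", "weighted_ms_io",
)


def parse_diskstats(text):
    """Return {device: {field: value}} for every line with all 11 counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 + len(FIELDS):
            continue  # skip short lines, e.g. 4-counter partition lines on early 2.6 kernels
        values = (int(v) for v in parts[3:3 + len(FIELDS)])
        stats[parts[2]] = dict(zip(FIELDS, values))
    return stats


def read_diskstats(path="/proc/diskstats"):
    """Read and parse one snapshot from the live file (Linux only)."""
    with open(path) as f:
        return parse_diskstats(f.read())


def write_rates(before, after, interval_s):
    """Per-second write KBytes, merges and IOs for devices present in both snapshots."""
    rates = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue
        rates[dev] = {
            # diskstats counts 512-byte sectors, so sectors / 2 = KBytes
            "write_kb": (new["sectors_written"] - old["sectors_written"]) / 2 / interval_s,
            "write_merged": (new["writes_merged"] - old["writes_merged"]) / interval_s,
            "write_ios": (new["writes"] - old["writes"]) / interval_s,
        }
    return rates
```

On a live system you would call read_diskstats(), sleep for a second, call it again, and pass both snapshots to write_rates() along with the measured interval.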

Consider the following data snapshot, taken while writing a large file to /tmp:

### RECORD    5 >>> poker <<< (1237225871.002) (Mon Mar 16 13:51:11 2009) ###
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0    3204   7171   10  320     320     2    31      8    7
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0             0      0    0    0   28952      0 7238    4       4   267     4      0    7
dm-1             0      0    0    0       0      0    0    0       0     0     0      0    0
hda              0      0    0    0       0      0    0    0       0     0     0      0    0

Here's another sample; this time the KBytes are closer:

### RECORD   10 >>> poker <<< (1237225998.002) (Mon Mar 16 13:53:18 2009) ###
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0   38912   8000   77  505     505   141  1864     12  100
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0             0      0    0    0   32256      0 8064    4       4 17853  2251      0  100
dm-1             0      0    0    0       0      0    0    0       0     0     0      0    0
hda              0      0    0    0       0      0    0    0       0     0     0      0    0

This time the KBytes and Util values sort of agree, though I'd expect the dm KBytes to be >= the device KBytes.

-mark

queues

strcmp on March 16, 2009 - 2:06pm

The values are KB/sec rates, not absolute values. The difference could be data sitting in the I/O queues waiting to be written. If you write to dm-0, the data is counted and passed directly to the I/O queue, where it has to wait until the disk is ready, which can take a long time if your load is seeky. There is also feedback: if the disk queue grows too big, the writers are throttled. You could sample the values less often to smooth out the buffering oscillations.
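
To illustrate the point about the sampling interval: because the kernel counters are cumulative, the rate over a longer interval is just the average of the shorter-interval rates, so bursts caused by queueing and throttling flatten out. This is a small Python sketch; the counter values are made up for illustration, not taken from the records above.

```python
def rates(cumulative, window):
    """Average per-second rates over consecutive, non-overlapping windows.

    `cumulative` holds counter readings taken one second apart; `window` is
    the sampling interval in seconds.
    """
    return [
        (cumulative[i + window] - cumulative[i]) / window
        for i in range(0, len(cumulative) - window, window)
    ]


# Hypothetical cumulative KB-written counter for a bursty writer, read once a second.
kb_written = [0, 0, 40000, 40000, 40000, 50000, 50000, 90000, 90000, 90000, 100000]

print(rates(kb_written, 1))  # bursty: swings between 0 and 40000 KB/s
print(rates(kb_written, 5))  # smoothed: 10000 KB/s in each 5 s window
```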


markseger on March 19, 2009 - 6:31am

The thing is, the disk performance rates as seen by the application are all consistent and steady. If you look at the data more closely, the I/Os reported for the dm disks are approximately the sum of the merges and actual I/Os on the underlying disk. Is that what's happening? But aren't the I/Os the number of actual reads/writes, whereas the merges are counting pages?
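
A quick way to check that observation is to do the arithmetic on the write columns of the two records in the original post. This Python sketch only compares the posted numbers. It shows they agree to within about 1%, but says nothing about why.

```python
# Is dm-0's write IOs/sec roughly sda's write Merged/sec + IOs/sec?
# Numbers are copied from the two collectl records in the original post.
records = {
    # record: (sda write Merged/s, sda write IOs/s, dm-0 write IOs/s)
    5: (7171, 10, 7238),
    10: (8000, 77, 8064),
}

for rec, (sda_merged, sda_ios, dm_ios) in records.items():
    combined = sda_merged + sda_ios
    diff_pct = 100.0 * (dm_ios - combined) / dm_ios
    print(f"record {rec}: sda merged+IOs = {combined}, dm-0 IOs = {dm_ios}, diff = {diff_pct:+.1f}%")
```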

I also see really small I/O sizes for the dm disks, on the order of 4K, which doesn't sound right at all since I'm actually using a load generator doing 1MB writes. I understand they're actually broken into something smaller, but 4K?
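
The Size column is just KBytes/sec divided by IOs/sec, so it can be recomputed from the posted records. In this Python sketch, dm-0 comes out at exactly 4 KB (one x86 page) in both records, while sda rounds to the 320 and 505 KB shown in collectl's Size column.

```python
# Average write request size = write KBytes/sec divided by write IOs/sec.
# Numbers are copied from the two collectl records in the original post.
samples = [
    # (record, device, write KBytes/s, write IOs/s)
    (5, "sda", 3204, 10),
    (5, "dm-0", 28952, 7238),
    (10, "sda", 38912, 77),
    (10, "dm-0", 32256, 8064),
]

for rec, dev, kb, ios in samples:
    print(f"record {rec:2} {dev:5} avg write size = {kb / ios:.1f} KB")
```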

And finally, the queue depth: can close to 20K requests be sitting in a dm queue? That one doesn't feel right either.
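
For the queue-depth question, here is a hedged Python sketch of how iostat derives its average queue length (avgqu-sz), as I understand it: the delta of the 11th /proc/diskstats counter ("weighted ms doing I/O"), divided by the interval in milliseconds. I'm assuming collectl's QLen is computed the same way. The second function is a Little's law cross-check (queue length = arrival rate x time in system) on dm-0 in record 10. It only shows that ~18K queued requests is arithmetically consistent with the other columns in that record. It says nothing about whether the counters themselves are right, and record 5 does not fit as neatly.

```python
def avg_queue_len(weighted_ms_before, weighted_ms_after, interval_s):
    """Average requests queued or in flight, from the cumulative weighted-ms counter."""
    return (weighted_ms_after - weighted_ms_before) / (interval_s * 1000.0)


def littles_law_queue(ios_per_s, wait_ms):
    """Little's law: mean queue length = arrival rate x mean time in system."""
    return ios_per_s * wait_ms / 1000.0


# dm-0 in record 10: 8064 write IOs/s with an average Wait of 2251 ms.
estimate = littles_law_queue(8064, 2251)
print(f"Little's law estimate: {estimate:.0f} (collectl QLen: 17853)")
```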

-mark
