dd read from /dev/mapper device performance

Submitted by skimike
on February 20, 2007 - 9:51am

Hello,

I have 2 WD 150GB Raptor drives in RAID 0 on an NVIDIA 680i motherboard using NVIDIA's fake RAID.

dmraid -ay correctly detects and activates the devices, placing the corresponding entries under /dev/mapper/nvidia_dbggicbd for the logical drive and /dev/mapper/nvidia_dbggicbd1 for the first partition.

dmraid -s nvidia_dbggicbd reveals:

*** Active Set
name   : nvidia_dbggicbd
size   : 586093056
stride : 256
type   : stripe
status : ok
subsets: 0
devs   : 2
spares : 0

Running time dd if=/dev/mapper/nvidia_dbggicbd of=/dev/null bs=1024k count=10000 produces the following results:

10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 59.3748 seconds, 177 MB/s

real    0m59.392s
user    0m0.008s
sys     0m12.529s

mpstat -P ALL 1 reveals:

10:59:14 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:59:15 AM  all    0.00    0.00   10.05   33.17    2.51    4.02    0.00   50.25   3940.59
10:59:15 AM    0    0.00    0.00   19.80   66.34    5.94    7.92    0.00    0.00   3821.78
10:59:15 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.01     18.81

10:59:15 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:59:16 AM  all    0.00    0.00    9.95   32.84    2.49    4.98    0.00   49.75   4007.07
10:59:16 AM    0    0.00    0.00   19.19   66.67    4.04   10.10    0.00    0.00   3902.02
10:59:16 AM    1    0.00    0.00    0.00    0.00    1.01    0.00    0.00  101.01      3.03

10:59:16 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:59:17 AM  all    0.00    0.00   10.05   32.66    2.51    5.03    0.00   49.75   3978.79
10:59:17 AM    0    0.00    0.00   21.21   64.65    5.05    9.09    0.00    0.00   3851.52
10:59:17 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  101.01     11.11

10:59:17 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:59:18 AM  all    0.00    0.00   10.45   32.84    2.99    3.48    0.00   50.25   3908.91
10:59:18 AM    0    0.00    0.00   20.79   65.35    5.94    7.92    0.00    0.00   3807.92
10:59:18 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.01      0.99

with the "dd" process consuming 20% CPU according to "top".

Performance drops drastically once I start using the logical partition instead of the entire logical drive, however:

time dd if=/dev/mapper/nvidia_dbggicbd1 of=/dev/null bs=1024k count=10000 results in:

10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 93.6217 seconds, 112 MB/s

real    1m33.637s
user    0m0.014s
sys     1m29.637s

mpstat -P ALL 1 reveals:

11:05:42 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:05:43 AM  all    0.00    0.00   26.13    0.00    3.02   21.11    0.00   49.75   7714.00
11:05:43 AM    0    0.00    0.00   52.00    0.00    6.00   42.00    0.00    0.00   7605.00
11:05:43 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00      3.00

11:05:43 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:05:44 AM  all    0.00    0.00   26.00    0.00    3.50   20.50    0.00   50.00   7738.00
11:05:44 AM    0    0.00    0.00   52.00    0.00    7.00   41.00    0.00    0.00   7614.00
11:05:44 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     21.00

11:05:44 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:05:45 AM  all    0.00    0.00   26.00    0.00    4.00   20.00    0.00   50.00   7763.64
11:05:45 AM    0    0.00    0.00   52.53    0.00    8.08   39.39    0.00    0.00   7658.59
11:05:45 AM    1    0.00    0.00    1.01    0.00    0.00    0.00    0.00  101.01      3.03

11:05:45 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
11:05:46 AM  all    0.50    0.00   25.25    0.50    2.48   21.78    0.00   49.50   7741.00
11:05:46 AM    0    0.00    0.00   50.00    0.00    6.00   44.00    0.00    0.00   7614.00
11:05:46 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     12.00

with the "dd" process consuming 95% CPU according to "top".
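
As a rough cross-check of the two runs above, the throughput and the share of wall-clock time spent in the kernel can be recomputed from the raw dd and time figures. This is only an illustrative sketch: the names (runs, throughput_mb_s, sys_fraction) are made up for this example, and it assumes dd's "MB" means 10^6 bytes.

```python
# Figures copied from the dd and time output quoted above.
BYTES_COPIED = 10485760000

runs = {
    # elapsed: seconds reported by dd; real/sys: seconds reported by time
    "whole device": {"elapsed": 59.3748, "real": 59.392, "sys": 12.529},
    "partition": {"elapsed": 93.6217, "real": 93.637, "sys": 89.637},
}


def throughput_mb_s(nbytes, seconds):
    """Throughput in decimal megabytes per second (assumed dd convention)."""
    return nbytes / seconds / 1e6


def sys_fraction(sys_seconds, real_seconds):
    """Fraction of wall-clock time spent in kernel (sys) time."""
    return sys_seconds / real_seconds


summary = {}
for name, run in runs.items():
    rate = throughput_mb_s(BYTES_COPIED, run["elapsed"])
    frac = sys_fraction(run["sys"], run["real"])
    summary[name] = (rate, frac)
    print(f"{name}: {rate:.0f} MB/s, {frac:.0%} of wall time in the kernel")
```

The kernel-time share works out to roughly a fifth of the run for the whole device and nearly all of it for the partition, which lines up with the 20% and 95% figures from top and with CPU 0 having no idle time in the second mpstat capture.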

fdisk -l /dev/mapper/nvidia_dbggicbd shows:


Disk /dev/mapper/nvidia_dbggicbd: 300.0 GB, 300079644672 bytes
255 heads, 63 sectors/track, 36482 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                      Device Boot      Start         End      Blocks   Id  System
/dev/mapper/nvidia_dbggicbd1               1       36482   293041633+  83  Linux

The above performance/behavior difference does not occur when performing similar tests from actual physical disks, for example, /dev/sda and /dev/sda1.

So I guess my question is: why is there such a drastic difference in performance between using the entire logical device and using a partition on that device? In my dense understanding, the only difference should be the starting point: sector 0 for the logical device versus sector 63 for the logical partition. Writes to both devices are similarly impacted, though the numbers are artificially higher due to having 4GB of RAM installed. I'm obviously missing something stupidly simple. Somebody please clue me in?

Not enough information

farnz on February 20, 2007 - 10:51am

At a guess, the kernel's doing something silly when dealing with the partition instead of the entire device. Can you install dmsetup, and post the output of dmsetup -r table and dmsetup -r status?

dmsetup info

skimike on February 20, 2007 - 11:20am

Here are the results of the commands that you requested:

[root@miya ~]# dmsetup -r table
nvidia_dbggicbd1: 0 586083267 linear 253:0 63
nvidia_dbggicbd: 0 586093056 striped 2 256 8:0 0 8:16 0
[root@miya ~]# dmsetup -r status
nvidia_dbggicbd1: 0 586083267 linear
nvidia_dbggicbd: 0 586093056 striped

Interesting that the device reads "striped" while the partition reads "linear". I can't say whether that is normal behavior, as this is the first time I've used dmraid rather than good old md. Either way, I've mounted and read my WinXP NTFS partition from it before and have built ext2 filesystems on it, so I'm going to guess that it is normal behavior and that the linear device is actually sitting on top of the striped device. "iostat -xk 1" supports that assumption when using both the logical device and the partition.
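
The table output above can also be checked against the fdisk listing from the first post. The following is a minimal sketch, assuming 512-byte sectors and the usual "name: start length target args..." layout of dmsetup table lines; parse_table is a hypothetical helper written for this example, not part of any dm tooling.

```python
SECTOR_BYTES = 512  # assumed sector size, as used in the fdisk output

# The two lines printed by "dmsetup -r table" above.
TABLE = """\
nvidia_dbggicbd1: 0 586083267 linear 253:0 63
nvidia_dbggicbd: 0 586093056 striped 2 256 8:0 0 8:16 0
"""


def parse_table(text):
    """Parse 'name: start length target args...' lines into a dict."""
    targets = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        start, length, target, *args = rest.split()
        targets[name] = {
            "start": int(start),
            "length": int(length),
            "target": target,
            "args": args,
        }
    return targets


targets = parse_table(TABLE)
raid = targets["nvidia_dbggicbd"]
part = targets["nvidia_dbggicbd1"]

# Size of the striped device in bytes, to compare with fdisk.
raid_bytes = raid["length"] * SECTOR_BYTES

# Chunk size: "striped 2 256" means 2 stripes with 256-sector chunks.
chunk_kib = int(raid["args"][1]) * SECTOR_BYTES // 1024

# The linear target starts 63 sectors into the striped device.
part_offset = int(part["args"][1])
part_end = part_offset + part["length"]

print(raid_bytes, chunk_kib, part_offset, part_end)
```

Under those assumptions the striped device comes to 300079644672 bytes, matching fdisk, the chunk size is 128 KiB, and the partition ends at sector 586083330, which equals the 36482 cylinders of 16065 sectors that fdisk reports. So the linear target is simply the partition laid over the stripe set at a 63-sector offset.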

Probably a kernel bug in the dm layer

farnz on February 20, 2007 - 11:35am

Looks like the kernel's being a bit slow in the dm layer (and I meant dmsetup -r ls when I said dmsetup -r status - sorry, although there's enough here to spot what's going on).

As you've correctly surmised, nvidia_dbggicbd is being built as a striped device (RAID-0) from two physical disks. nvidia_dbggicbd1 is then being built as a linear mapping against that striped device. When you run dd against the partition, it issues I/O requests against nvidia_dbggicbd1. dm-linear reissues these I/O requests against nvidia_dbggicbd, but with a 63-sector offset. dm-stripe then splits each request across the two disks according to the stripe pattern.

The extra CPU load you see when accessing the partition is therefore occurring when dm-linear reworks the request and passes it on to dm-stripe. The code for this is in drivers/md/dm-*.
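
To make the two remapping steps concrete, here is a small sketch of the address arithmetic only, using the values from the dmsetup table (2 stripes, 256-sector chunks, 63-sector linear offset). It is not the kernel code and does not model how the dm layer actually builds or splits requests; the function names are invented for this illustration, and the 2048-sector request size is just bs=1024k expressed in 512-byte sectors.

```python
CHUNK_SECTORS = 256    # from "striped 2 256"
STRIPES = 2            # from "striped 2 256"
PARTITION_OFFSET = 63  # from "linear 253:0 63"


def linear_map(sector, offset=PARTITION_OFFSET):
    """Partition sector -> sector on the striped device."""
    return sector + offset


def stripe_map(sector, stripes=STRIPES, chunk=CHUNK_SECTORS):
    """Striped-device sector -> (disk index, sector on that disk)."""
    chunk_index, within_chunk = divmod(sector, chunk)
    disk = chunk_index % stripes
    disk_sector = (chunk_index // stripes) * chunk + within_chunk
    return disk, disk_sector


def split_request(start, length, chunk=CHUNK_SECTORS):
    """Cut [start, start + length) at chunk boundaries.

    Returns a list of (disk, disk_sector, sectors) pieces.
    """
    pieces = []
    end = start + length
    while start < end:
        next_boundary = (start // chunk + 1) * chunk
        piece_end = min(next_boundary, end)
        disk, disk_sector = stripe_map(start)
        pieces.append((disk, disk_sector, piece_end - start))
        start = piece_end
    return pieces


REQUEST_SECTORS = 2048  # 1 MiB in 512-byte sectors

whole = split_request(0, REQUEST_SECTORS)
part = split_request(linear_map(0), REQUEST_SECTORS)

print(len(whole), whole[0], whole[-1])
print(len(part), part[0], part[-1])
```

In this model a 1 MiB read from the start of the whole device falls on 8 full chunks, while the same read from the start of the partition becomes 9 pieces: a 193-sector head, 7 full chunks, and a 63-sector tail. That shows the offset leaves every request misaligned with the stripe chunks, but it is only the arithmetic; it does not by itself account for the size of the CPU cost measured above.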

Thanks!

skimike on February 20, 2007 - 12:16pm

Thank you for the excellent information! It looks like there's no compelling reason to attempt to use dmraid over md unless compatibility for NTFS partitions installed on native/fake software RAID is required. I was scratching my head over this for a while, running and rerunning benchmarks. Makes perfect sense about the reissuance of I/O requests given the read from the logical device was drive-performance limited whereas the partition was bottlenecking on CPU. Thanks for helping me put 2 and 2 together!

Avoidance

George Harkin (not verified) on August 15, 2007 - 10:22pm

Can you post some comments on generalizing the technique to avoid this bottleneck?

How should we arrange the logical partitions?
Will all LVM devices suffer from this bottleneck?
Does this apply to md raid also?
