io performance

Submitted by raziebe
on January 29, 2005 - 3:03pm

I have been trying to optimize my RAID throughput. I am trying to read hundreds of files simultaneously from a 6-disk RAID 5 array (3ware). I found some very odd things:

1. XFS degrades linear read performance. Just creating and mounting this filesystem degrades a linear read from the RAID.

2. I have written a simple C program that reads hundreds of files synchronously (file after file) from an ext2 filesystem. I got terrible performance. The minute I switched to direct IO the performance improved greatly. I have no idea why. Could it be that the page cache radix tree degrades the performance? I must add that when I read these files I am using 10 MB buffers, and the cache is of no use because I do not read the same block twice.

I would be happy to hear any ideas for this odd behaviour.

As to 1. I'd like to see some

Anonymous (not verified)
on January 30, 2005 - 9:00am

As to 1. I'd like to see some numbers.

As to 2. The files themselves might not be cached, but perhaps the directory entries were. Having them cached means less seeking around, and having direct IO on means that the file cache won't grow, so all the dentries stay in RAM... Perhaps?

version?

strcmp
on January 30, 2005 - 9:17am

Which kernel version do you have?

I remember xfs having low raid5 performance when it was new (perhaps an artifact of its 2.4 interface), because they had to lower some transfer block size or so. Maybe it had to be done globally? Remember xfs expects an advanced block layer only introduced in 2.6. But I don't know what I'm talking about, sorry...

some numbers

raziebe
on January 30, 2005 - 2:24pm

First, thank you for your interest.
Here is the full configuration of my machines:

Motherboard: Supermicro
3.2 GHz CPU, 4 GB RAM.

6 SATA Maxtor disks (250 GB each).
3ware 9500S, with no firmware upgrade.
RAID 5, 256K stripe size.
Total RAID size is 1.2 TB.

kernel : 2.6.10

1.
Linear read: ~250 MBps.
Running mkfs.xfs on a partition and mounting it dropped the linear read performance to ~130 MBps.
Doing the same thing with ext3 resulted in 190 MBps.
ext2 had no effect. Could it be the journalling?
I don't know about reiserfs; I will tell you in a few days.

2.
I need to be more precise. I created 250 files, each 100 MB in size. I wrote a simple C program that reads 10 MB from file 1, 10 MB from file 2, and so on, until 100 MB have been read from every file. Here are the figures:
250 files, 100 MB each. A total of 25000 MB was read in ~127 seconds, i.e. 197 MBps.
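For readers who want to reproduce the access pattern, here is a minimal sketch of such a round-robin reader. It is not the original program, which was never posted. The function name `read_round_robin`, its parameters, and the 4096-byte buffer alignment are all assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read nfiles files round-robin, one chunk per file per pass, until all
 * files reach EOF. extra_flags is OR-ed into the open(2) flags; 0 means
 * ordinary buffered reads. Returns total bytes read, or -1 on error.
 * Hypothetical helper: the name and signature are illustrative only. */
long long read_round_robin(const char *const paths[], int nfiles,
                           size_t chunk, int extra_flags)
{
    assert(nfiles > 0 && chunk > 0);

    long long total = -1;
    int opened = 0, active = 0;
    void *buf = NULL;
    int *fds = malloc((size_t)nfiles * sizeof *fds);

    if (fds == NULL)
        return -1;
    /* Assumed alignment: 4096 bytes also suits typical direct IO rules. */
    if (posix_memalign(&buf, 4096, chunk) != 0) {
        free(fds);
        return -1;
    }
    for (; opened < nfiles; opened++) {
        fds[opened] = open(paths[opened], O_RDONLY | extra_flags);
        if (fds[opened] < 0)
            goto out;
    }

    total = 0;
    do {
        active = 0;
        for (int i = 0; i < nfiles; i++) {
            if (fds[i] < 0)
                continue;               /* this file already hit EOF */
            ssize_t n = read(fds[i], buf, chunk);
            if (n < 0) {
                total = -1;
                goto out;
            }
            if (n == 0) {               /* EOF: retire the descriptor */
                close(fds[i]);
                fds[i] = -1;
                continue;
            }
            total += n;
            active++;
        }
    } while (active > 0);

out:
    for (int i = 0; i < opened; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    free(buf);
    free(fds);
    return total;
}
```

Called with a 10 MB chunk over the 250 paths, and timed with a wall clock, this gives the kind of figure quoted above: 25000 MB / 127 s is about 197 MBps. Passing `O_DIRECT` as `extra_flags` (Linux, with `_GNU_SOURCE` defined) would switch the same loop to direct IO.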

All files reside in the same directory, so the directory name lookup cache (DNLC) is not a factor.
I read in Robert Love's book that the page cache had improved greatly in 2.6,
so I am really lost here.

Anyhow, if you are still reading this, do you know whether, with direct IO, the DMA transfer goes straight into the user buffer?
Also, a friend of mine asked me how it can be that high-memory buffers are referenced in interrupt context or BH context. I am using 4 GB of RAM; how many buffer copies are made before the user buffer is filled?

Raz Ben Jehuda

Try increasing readahead

cantinflas (not verified)
on January 31, 2005 - 4:35pm

The default value for readahead is 128K (at least it was in 2.6.8.1). This will tend to limit your read request size to 128K. With RAID devices this limits your performance because you're not reading a full stripe set at a time.

Try the following:

blockdev --setra 8192 /dev/sda1

Then run your test. You may need to experiment with the setra value. Setting it too large can lead to readahead thrashing in my experience.

If I read your config right, I would shoot for at least 1.5MB reads (6*256K = 1536K).

P.S. The setra value is in terms of 512-byte blocks, so 8192 gives a 4MB max readahead.
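The arithmetic in this post can be checked with a small sketch. The helper names below are made up for illustration, and treating "256K stripe size" as the per-disk chunk size is an assumption. Note also that in RAID 5 one chunk per stripe holds parity, so a 6-disk array carries 5 data chunks per stripe (1280K) rather than 6 (1536K).

```c
#include <assert.h>
#include <stddef.h>

/* blockdev --setra counts in 512-byte sectors. */
unsigned long long setra_to_bytes(unsigned long sectors)
{
    return (unsigned long long)sectors * 512ULL;
}

/* Bytes in one full stripe: every member disk contributes one chunk, minus
 * parity_chunks chunks of parity (1 for RAID 5, 0 to count all disks).
 * Hypothetical helper; chunk_kib is assumed to be the per-disk chunk size. */
unsigned long long full_stripe_bytes(unsigned disks, unsigned chunk_kib,
                                     unsigned parity_chunks)
{
    assert(parity_chunks < disks);
    return (unsigned long long)(disks - parity_chunks) * chunk_kib * 1024ULL;
}

/* Smallest setra value (in sectors) whose readahead covers n full stripes. */
unsigned long setra_for_stripes(unsigned long long stripe_bytes, unsigned n)
{
    return (unsigned long)((stripe_bytes * n + 511ULL) / 512ULL);
}
```

With these, `setra_to_bytes(8192)` is 4194304 bytes (4 MB), `setra_to_bytes(256)` is 131072 bytes (the 128K default), and `full_stripe_bytes(6, 256, 1)` is 1310720 bytes, which corresponds to a setra value of 2560 sectors per stripe.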

setra

raziebe
on February 1, 2005 - 12:48pm

Sorry for not mentioning it.
In my tests I tuned the read-ahead to 8192, which gave the best results for random reads over a filesystem. The default for 3ware is 256.

Raz Ben Jehuda

My 2 cents about directIO

cantinflas
on January 31, 2005 - 5:38pm

If you know the I/O access pattern that will be used, it's actually not that hard to outperform the pagecache, especially in the large, random, and large+random I/O scenarios. The main reason is that you can give your app information that the readahead algorithm doesn't have and can't guess. (For example, you may know that you're going to read 10MB chunks in a somewhat random fashion from a very large file.) The second reason is that there is actually less processing overhead.

AFAIK, read data should get placed directly into the user buffer as long as your HW supports DMA. Same goes for writes (in the other direction).
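As a rough illustration of what a direct IO read looks like from user space on Linux, here is a minimal sketch. The function name `read_file_direct`, the `DIO_ALIGN` value of 4096, and the fallback to buffered reads when the filesystem rejects `O_DIRECT` are all assumptions for the example. The real alignment requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: reads stay buffered */
#endif

#define DIO_ALIGN 4096          /* assumed alignment; device-specific in reality */

/* Read a whole file in chunk-sized requests, bypassing the page cache when
 * the filesystem accepts O_DIRECT. Sets *used_direct to 1 or 0 accordingly.
 * Returns bytes read, or -1 on error. chunk must be a multiple of DIO_ALIGN.
 * Hypothetical helper: the name and signature are illustrative only. */
long long read_file_direct(const char *path, size_t chunk, int *used_direct)
{
    void *buf = NULL;
    long long total = 0;
    ssize_t n;

    if (chunk == 0 || chunk % DIO_ALIGN != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    *used_direct = (fd >= 0 && O_DIRECT != 0);
    if (fd < 0 && errno == EINVAL)      /* e.g. a filesystem without direct IO */
        fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Direct IO needs the user buffer itself to be suitably aligned. */
    if (posix_memalign(&buf, DIO_ALIGN, chunk) != 0) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, chunk)) > 0)
        total += n;

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

The aligned buffer is the main practical difference from a buffered read: with `O_DIRECT` the buffer address, the request size, and the file offset generally all have to be multiples of the underlying block size.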

The one thing to watch out for is that directIO may or may not bypass the standard locking mechanisms (I'm not sure). It's a feature really only intended for people who know what they are doing and are prepared to work without a safety net.

thanks

raziebe
on February 1, 2005 - 1:07pm

Well, I am happy to see that I am not alone in my opinion about the processing overhead.
I have been comparing the performance of kernel space vs. user space when both run the same algorithm. I must say that system calls in Linux are very light, and among the few things that would convince me to write code in kernel space are the ability to control the precision of scheduling (especially in RTAI) and, when possible, the ability to avoid buffer management and double copying.
Thank you.
Long live the penguin
Raz Ben Jehuda
