Re: Maximizing File/Network I/O

Previous thread: softraid rebuild by nixlists on Monday, January 4, 2010 - 7:44 pm. (3 messages)

Next thread: pf: state reuse by nixlists on Monday, January 4, 2010 - 8:13 pm. (1 message)
From: nixlists
Date: Monday, January 4, 2010 - 8:05 pm

Hi.

I have two machines, one running 4.6, the other running a recent
snapshot of -current. tcpbench reports a maximum throughput of 275 Mbit/s -
that's around 34 MB/s - between them over a gig-E link. What should one
expect with an el-cheapo gig-E switch, an 'em' Intel NIC and an msk
NIC? Is that reasonable or too slow?

The 4.6 machine has a softraid mirror and can read off it at around 55
MB/s as shown by 'dd', and the -current machine has an eSATA enclosure
mounted async for the purpose of quickly backing up to it, that I can
write to at around 45 MB/s as shown by 'dd'. However, copying to it over
the network through NFS, I can only get around 15 MB/s. Where is the
bottleneck? How can I fix it?
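
To put the figures above on one scale, here is a minimal sketch that converts the tcpbench result to MB/s and compares the NFS rate with the slowest raw stage. It assumes "MB" means 10**6 bytes and that the dd and tcpbench numbers are representative; the variable names are made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0

# Figures quoted in the post; units are assumed to be decimal (10**6).
network = mbit_to_mbyte(275)   # tcpbench result between the two machines
disk_read = 55.0               # softraid mirror read rate via dd, MB/s
disk_write = 45.0              # eSATA enclosure write rate via dd, MB/s
nfs = 15.0                     # observed NFS copy rate, MB/s

# The slowest individual stage bounds what an ideal copy could reach.
slowest_stage = min(network, disk_read, disk_write)

print(f"network: {network:.1f} MB/s")
print(f"slowest raw stage: {slowest_stage:.1f} MB/s")
print(f"NFS reaches {nfs / slowest_stage:.0%} of that")
```

With these numbers the network is the slowest raw stage, yet NFS gets less than half of it, so the gap is probably not explained by raw link or disk speed alone.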

Copying with rsync over ssh is even slower due to rsync and ssh eating
quite a bit of CPU - but that's to be expected.

Thanks a bunch.

From: Aaron Mason
Date: Monday, January 4, 2010 - 10:40 pm

It would be best put this way - if you go for the lowest bidder, in
most cases you get what you pay for.  Your results aren't too bad
considering what's in use.

With top notch stuff (we're talking HP Procurve/Cisco Catalyst and
Intel PRO/1000+ cards here) plus tuning for Jumbo frames, you can get
to the 95MB/sec range.
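
For reference, a rough sketch of the wire-level ceiling for TCP payload on gigabit Ethernet at a standard and a jumbo MTU. It assumes IPv4 and TCP headers without options and standard Ethernet framing; it says nothing about what a given NIC, switch or CPU can actually sustain.

```python
# Per-frame overhead on the wire for standard Ethernet, in bytes.
PREAMBLE_AND_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
IPV4_HEADER = 20   # assumes no IP options
TCP_HEADER = 20    # assumes no TCP options (timestamps would cost 12 more)

def tcp_goodput_mbit(mtu, link_mbit=1000.0):
    """Upper bound on TCP payload throughput for a given MTU, in Mbit/s."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    on_wire = mtu + PREAMBLE_AND_SFD + ETH_HEADER + FCS + INTERFRAME_GAP
    return link_mbit * payload / on_wire

for mtu in (1500, 9000):
    mbit = tcp_goodput_mbit(mtu)
    print(f"MTU {mtu}: {mbit:.1f} Mbit/s, {mbit / 8:.1f} MB/s")
```

Under these assumptions the ceiling is about 119 MB/s at MTU 1500 and about 124 MB/s at MTU 9000, so jumbo frames raise the theoretical limit only slightly. Any larger gain would have to come from handling fewer packets per second.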

And it's not just CPU usage that slows rsync over ssh - the transfer
rate is only counting the data that gets pushed through - it doesn't
cover the encrypted data, which is something like 30-40% bigger than
the original.

--
Aaron Mason - Programmer, open source addict
I've taken my software vows - for beta or for worse

From: nixlists
Date: Monday, January 4, 2010 - 11:02 pm

On Tue, Jan 5, 2010 at 12:40 AM, Aaron Mason <simplersolution@gmail.com>

Thanks. Where could I find more info on tuning jumbo frames? Both
cards support it...

Update: after upgrading the other machine to -current, tcpbench
performs around 420 Mbit/s now :D

One of the machines is using pf...

From: Bret S. Lambert
Date: Monday, January 4, 2010 - 11:45 pm

Start with mount_nfs options, specifically -r and -w; I assume that

From: nixlists
Date: Tuesday, January 5, 2010 - 1:04 am

Setting -r and -w to 16384, and jumbo frames to 9000, yields just a
couple of MB/s more. Far from the 10 MB/s more that the network can do ;(
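
As a rough illustration of how the -r/-w size and the MTU interact, the sketch below counts the IP fragments needed to carry one read or write of a given size. It assumes NFS over UDP on IPv4 without IP options and ignores the RPC/NFS headers, none of which the thread confirms; the function name is made up for this example.

```python
def ip_fragments(udp_payload_bytes, mtu, ip_header=20, udp_header=8):
    """Number of IPv4 fragments needed for one UDP datagram at a given MTU."""
    datagram = udp_payload_bytes + udp_header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return -(-datagram // per_fragment)  # ceiling division

for size in (4096, 16384):
    for mtu in (1500, 9000):
        print(f"-r/-w {size}, MTU {mtu}: {ip_fragments(size, mtu)} fragment(s)")
```

If the mount uses TCP instead, the segmentation works differently and this count does not apply.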

From: Tomas Bodzar
Date: Tuesday, January 5, 2010 - 1:13 am

There is much more to do. You can find some ideas e.g. here:
http://www.openbsd.org/papers/tuning-openbsd.ps . It's a good idea to
follow the output of systat, vmstat and top for some time to find
bottlenecks.




-- 
http://www.openbsd.org/lyrics.html

From: Iñigo Ortiz de Urbina
Date: Tuesday, January 5, 2010 - 3:12 am

I recall a message on misc (which I am not able to find in the archives)
in which someone posted the results of his research on optimizing and
improving OpenBSD's overall performance (fs, network, etc).

Among the links in his comprehensive compilation was
tuning-openbsd.ps.

I remember a reply from a developer stating that some of those tuning
measures are not needed anymore, as OpenBSD has grown quite a bit since that
time. Which are the recommended -always working- directions, then, to tune a
system for its particular needs?

My point is we all have to be careful and not follow guides or try values on
sysctls blindly (although experimenting is welcome and healthy), as we can
do more harm than good. Still, some environments will need
adjustment to push much more traffic than GENERIC can, and this is a really

From: Henning Brauer
Date: Tuesday, January 5, 2010 - 12:32 pm

I'm one of the two authors of this paper.

there isn't really all that much needed these days, defaults are good.
some very specific situations benefit from some specific things, but

heh :)

I really like the 275 -> 420MBit/s change for 4.6 -> current with pf.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: nixlists
Date: Wednesday, January 6, 2010 - 5:25 pm

Disabling pf gives a couple of MB/s more.

From: Henning Brauer
Date: Friday, January 8, 2010 - 8:13 pm

really. what a surprise.


From: nixlists
Date: Sunday, January 10, 2010 - 1:35 pm

Anything wrong with http://everything2.com/title/stating+the+obvious   ?

But I guess, there's nothing wrong with making fun of it, either...

From: Uwe Werler
Date: Friday, January 8, 2010 - 3:29 pm

Oh cool! Is this explained a little bit deeper anywhere? Sounds VERY
interesting.

From: Henning Brauer
Date: Friday, January 8, 2010 - 8:15 pm

well, you know, i have been working on pf and general network stack
performance for years. others have improved performance in subsystems
used. i almost always bench my changes. i cannot point my finger to
one change between 4.6 and -current that is the cause for this
improvement, there were a few - and i keep forgetting what made 4.6
and what was after.


From: nixlists
Date: Wednesday, January 13, 2010 - 4:49 pm

Update: both machines run -current again this time. I think my initial
tcpbench results were poor because of running cbq queuing on 4.6. The
server has an em NIC, the client has msk. Jumbo frames are set to 9000
on both, but don't make much difference. This is with a $20 D-Link
switch.

tcpbench results:

pf disabled on both machines: 883 Mb/s

pf enabled on tcpbench server only - simple ruleset like the documentation
example: 619 Mb/s

pf enabled on both machines - the tcpbench client box has the standard
-current default install pf.conf: 585 Mb/s

pf enabled on just the tcpbench server: with cbq queuing enabled on
the internal interface as follows (for tcpbench only, not for real
network use) - no other queues defined on $int_if:

  altq on $int_if cbq bandwidth 1Gb queue { std_in, ssh_im_in, dns_in  }
  queue std_in    bandwidth 999.9Mb cbq(default,borrow)

401 Mb/s

Why is that? cbq code overhead? The machine doesn't have enough CPU?
Or am I missing something? Admittedly it's an old P4.
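
To make the comparison easier to read, here is a small sketch that expresses each result above as a drop relative to the pf-disabled baseline. The numbers are the ones reported in this message; the labels and function name are made up for this example.

```python
def percent_drop(baseline, measured):
    """Throughput lost relative to the baseline, in percent."""
    return (baseline - measured) / baseline * 100

baseline = 883  # Mb/s, pf disabled on both machines
results = {
    "pf on server only": 619,
    "pf on both machines": 585,
    "pf on server with cbq": 401,
}

for label, mbps in results.items():
    drop = percent_drop(baseline, mbps)
    print(f"{label}: {mbps} Mb/s ({drop:.1f}% below baseline)")
```

On these figures, pf alone costs roughly 30 to 34 percent, and adding cbq brings the loss to roughly 55 percent.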

After a while, during benching, even if pf is disabled on both
machines the throughput drops to 587 Mbit/s. The only way to bring it
back up to 883 Mb/s is to reboot the tcpbench client. Anyone know why?

Thanks!

From: Henning Brauer
Date: Wednesday, January 13, 2010 - 6:39 pm

that seems.... weird. CPU throttling down because it overheated,
perhaps? could be some resource issue in OpenBSD as well, but i've
never seen such behaviour


From: nixlists
Date: Wednesday, January 13, 2010 - 7:11 pm

On Wed, Jan 13, 2010 at 8:39 PM, Henning Brauer <lists-openbsd@bsws.de>

Why?

  cpu0: Intel(R) Pentium(R) 4 CPU 2.53GHz ("GenuineIntel" 686-class) 2.52 GHz

Isn't 2.52GHz fast enough for gigabit links? I know that's like half
that in P3 cycles, but still... What's the issue?

Thank you.

From: Henning Brauer
Date: Wednesday, January 13, 2010 - 9:43 pm

cache


From: Tomas Bodzar
Date: Wednesday, January 13, 2010 - 11:40 pm

very OT :

Is there some tool for inspecting the CPU cache like this one:
http://docs.sun.com/app/docs/doc/819-2240/cpustat-1m?l=en&a=view ? I
found memconfig(8) in the man pages, but if I understand it correctly
then it's just for setting.

On Thu, Jan 14, 2010 at 5:43 AM, Henning Brauer <lists-openbsd@bsws.de>

From: nixlists
Date: Thursday, January 14, 2010 - 12:33 am

On Wed, Jan 13, 2010 at 11:43 PM, Henning Brauer <lists-openbsd@bsws.de>

What about it? Please elaborate.

Thanks!

From: Henning Brauer
Date: Wednesday, February 3, 2010 - 8:04 am

it's very different in P4 and sucks


From: Tomas Bodzar
Date: Wednesday, January 13, 2010 - 11:32 pm

What does 'systat vmstat' show during your tests, plus other "windows" like
mbufs and similar? What does 'vmstat -m' show, and so on? It will say much
more about the actual situation of the whole system than tcpbench.


From: Claudio Jeker
Date: Tuesday, January 5, 2010 - 2:39 am

Jumbo frames on em(4) will not gain you that much since the driver is
already very efficient. msk(4) is a bit of a different story: one problem for
high speed low delay links is the interrupt mitigation in those cards.
msk(4) delays packets a lot more than em(4). Plus you should run -current
on msk(4) systems (a few things got fixed at f2k9).


From: Marco Peereboom
Date: Tuesday, January 5, 2010 - 5:21 am

Rpc unfortunately is slow.


From: Jean-Francois
Date: Thursday, January 14, 2010 - 10:53 am

For some reason, when I mount NFS drives with -r=4096 and -w=4096 I reach
the best transfer rates.

From: James Peltier
Date: Sunday, January 17, 2010 - 6:15 pm

This is possibly because the OS is able to match the request to a single memory page for your architecture. Other architectures offer larger page sizes.

Not saying that's the case, but a possibility.
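
One way to check the page-size idea on a given machine is to look at the page size and see how many pages each -r/-w value would span. This is only a sketch: it assumes page-aligned buffers, says nothing about how the NFS code actually allocates memory, and the helper name is made up for this example.

```python
import mmap

def pages_needed(request_bytes, page_size=mmap.PAGESIZE):
    """Number of memory pages a buffer of request_bytes spans (ceiling division)."""
    return -(-request_bytes // page_size)

print(f"page size on this machine: {mmap.PAGESIZE} bytes")
for size in (4096, 16384):
    print(f"-r/-w of {size}: {pages_needed(size)} page(s)")
```

On a machine with 4096-byte pages, a 4096-byte request fits in exactly one page, while 16384 bytes spans four.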



