Hi. I have two machines, one running 4.6, the other running a recent snapshot of -current. tcpbench reports a maximum throughput of 275 Mbit/s (around 34 MB/s) between them over a gig-E link. What should one expect with an el-cheapo gig-E switch, an 'em' Intel NIC and an msk NIC? Is that reasonable or too slow? The 4.6 machine has a softraid mirror and can read off it at around 55 MB/s as shown by 'dd', and the -current machine has an eSATA enclosure mounted async for the purpose of quickly backing up to it, which I can write to at around 45 MB/s as shown by 'dd'. However, copying over the network to it through NFS I can only get around 15 MB/s. Where is the bottleneck? How to fix it? Copying with rsync over ssh is even slower due to rsync and ssh eating quite a bit of CPU, but that's to be expected. Thanks a bunch.
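For reference, the unit conversion and the comparison of the three measured rates can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, the figures are the ones quoted in the post, and decimal megabytes are assumed.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbit_per_s / 8.0


def slowest_stage(stages: dict) -> tuple:
    """Return (name, rate) of the stage with the lowest rate in MB/s."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Rates quoted in the post, all expressed in MB/s.
stages = {
    "softraid read (dd)": 55.0,
    "network (tcpbench, 275 Mbit/s)": mbit_to_mbyte(275),
    "eSATA write (dd)": 45.0,
}

name, rate = slowest_stage(stages)
nfs_rate = 15.0  # observed NFS copy rate from the post

print(f"slowest raw stage: {name} at {rate:.1f} MB/s")
print(f"NFS reaches {nfs_rate / rate:.0%} of that")
```

If the raw figures hold, the NFS copy runs at well under half the rate of the slowest individual stage, which suggests the gap comes from something other than the disks or the raw link.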
It would be best put this way - if you go for the lowest bidder, in most cases you get what you pay for. Your results aren't too bad considering what's in use. With top notch stuff (we're talking HP Procurve/Cisco Catalyst and Intel PRO/1000+ cards here) plus tuning for Jumbo frames, you can get to the 95MB/sec range. And it's not just CPU usage that slows rsync over ssh - the transfer rate is only counting the data that gets pushed through - it doesn't cover the encrypted data, which is something like 30-40% bigger than the original. -- Aaron Mason - Programmer, open source addict I've taken my software vows - for beta or for worse
On Tue, Jan 5, 2010 at 12:40 AM, Aaron Mason <simplersolution@gmail.com> wrote: Thanks. Where could I find more info on tuning jumbo frames? Both cards support it... Update: after upgrading the other machine to -current, tcpbench performs around 420 Mbit/s now :D One of the machines is using pf...
Start with mount_nfs options, specifically -r and -w; I assume that
Setting -r and -w to 16384, and jumbo frames to 9000 yields just a couple of MB/s more. Far from 10 MB/s more the network can do ;(
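As a rough sanity check on what jumbo frames can buy on paper, here is a small sketch of the theoretical TCP payload rate on a 1 Gbit/s link for both MTUs. The assumptions are mine, not from the thread: standard Ethernet framing overhead of 38 bytes per frame (preamble, header, FCS, inter-frame gap) and IPv4 plus TCP headers without options (40 bytes). Real traffic usually carries TCP options, so treat the results as an upper bound, not a prediction.

```python
# Per-frame bytes on the wire that are not part of the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header (20) + TCP header (20), no options (assumption).
IP_TCP_HEADERS = 20 + 20
LINE_RATE_MBIT = 1000  # gigabit Ethernet


def tcp_goodput_mbyte(mtu: int) -> float:
    """Theoretical TCP payload rate in MB/s for a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    return LINE_RATE_MBIT / 8 * payload / wire_bytes


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput_mbyte(mtu):.1f} MB/s")
```

Under these assumptions the difference between the two MTUs is only a few MB/s at line rate, so a small gain from jumbo frames alone is not surprising.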
There is much more to do. You can find some ideas e.g. here: http://www.openbsd.org/papers/tuning-openbsd.ps . It's a good idea to watch the output of systat, vmstat and top for some time to find bottlenecks. -- http://www.openbsd.org/lyrics.html
I recall a message on misc (which I am not able to find in the archives) from someone posting the results of his research on optimizing and improving OpenBSD's overall performance (fs, network, etc). Among the links in his comprehensive compilation was tuning-openbsd.ps. I remember one reply from a developer stating that some of those tuning measures are not needed anymore, as OpenBSD has grown quite a bit since that time. Which are the recommended, always-working directions, then, to tune a system for its particular needs? My point is that we all have to be careful and not follow guides or try sysctl values blindly (although experimenting is welcome and healthy), as we can do more harm than good. Still, some environments will need adjustment to push much more traffic than GENERIC can, and this is a really
I'm one of the two authors of this paper. there isn't really all that much needed these days, defaults are good. some very specific situations benefit from some specific things, but heh :) I really like the 275 -> 420MBit/s change for 4.6 -> current with pf. -- Henning Brauer, hb@bsws.de, henning@openbsd.org BS Web Services, http://bsws.de Full-Service ISP - Secure Hosting, Mail and DNS Services Dedicated Servers, Rootservers, Application Hosting
really. what a surprise. -- Henning Brauer
Anything wrong with http://everything2.com/title/stating+the+obvious ? But I guess, there's nothing wrong with making fun of it, either...
well, you know, i have been working on pf and general network stack performance for years. others have improved performance in subsystems used. i almost always bench my changes. i cannot point my finger to one change between 4.6 and -current that is the cause for this improvement, there were a few - and i keep forgetting what made 4.6 and what was after. -- Henning Brauer
Update: both machines run -current again this time. I think my initial
tcpbench results were poor because of running cbq queuing on 4.6. The
server has an em NIC, the client has msk. Jumbo frames are set to 9000
on both, but don't make much difference. This is with a $20 D-link
switch.
tcpbench results:
pf disabled on both machines: 883 Mb/s
pf enabled on tcpbench server only - simple ruleset like the documentation
example: 619 Mb/s
pf enabled on both machines - the tcpbench client box has the standard
-current default install pf.conf: 585 Mb/s
pf enabled on just the tcpbench server: with cbq queuing enabled on
the internal interface as follows (for tcpbench only, not for real
network use) - no other queues defined on $int_if:
altq on $int_if cbq bandwidth 1Gb queue { std_in, ssh_im_in, dns_in }
queue std_in bandwidth 999.9Mb cbq(default,borrow)
401 Mb/s
Why is that? cbq code overhead? The machine doesn't have enough CPU?
Or am I missing something? Admittedly it's an old P4.
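To put the numbers above side by side, here is a small Python sketch that expresses each tcpbench result as a drop relative to the pf-disabled baseline. The figures are copied from this post; the function and dictionary names are just for illustration.

```python
def drop_percent(baseline: float, measured: float) -> float:
    """Throughput lost relative to the baseline, in percent."""
    return (baseline - measured) / baseline * 100


baseline = 883  # Mb/s, pf disabled on both machines
results = {
    "pf on server only": 619,
    "pf on both machines": 585,
    "pf on server with cbq": 401,
}

for label, mbit in results.items():
    print(f"{label}: {mbit} Mb/s ({drop_percent(baseline, mbit):.1f}% below baseline)")

# Extra cost of cbq on top of plain pf on the server.
cbq_vs_pf = drop_percent(results["pf on server only"], results["pf on server with cbq"])
print(f"cbq vs. plain pf on server: {cbq_vs_pf:.1f}% lower")
```

On these figures, enabling cbq costs roughly as much again as enabling pf in the first place.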
After a while, during benching, even if pf is disabled on both
machines the throughput drops to 587 Mbit/s. The only way to bring it
back up to 883 Mb/s is to reboot the tcpbench client. Anyone know why?
Thanks!
that seems.... weird. CPU throttling down because it overheated, perhaps? could be some resource issue in OpenBSD as well, but i've never seen such behaviour -- Henning Brauer
On Wed, Jan 13, 2010 at 8:39 PM, Henning Brauer <lists-openbsd@bsws.de> wrote: Why? cpu0: Intel(R) Pentium(R) 4 CPU 2.53GHz ("GenuineIntel" 686-class) 2.52 GHz Isn't 2.52GHz fast enough for gigabit links? I know that's like half that in P3 cycles, but still... What's the issue? Thank you.
cache -- Henning Brauer
Very OT: Is there some tool for inspecting the CPU cache like this one: http://docs.sun.com/app/docs/doc/819-2240/cpustat-1m?l=en&a=view ? I found memconfig(8) in the man pages, but if I understand it correctly, it's only for setting things. On Thu, Jan 14, 2010 at 5:43 AM, Henning Brauer <lists-openbsd@bsws.de>
On Wed, Jan 13, 2010 at 11:43 PM, Henning Brauer <lists-openbsd@bsws.de> wrote: What about it? Please elaborate. Thanks!
it's very different in P4 and sucks -- Henning Brauer
What does 'systat vmstat' show during your tests, along with the other "windows" like mbufs and similar? What does 'vmstat -m' show, and so on? That will say much more about the actual state of the whole system than tcpbench.
Jumbo frames on em(4) will not gain you that much since the driver is already very efficient. msk(4) is a bit of a different story: one problem for high-speed, low-delay links is the interrupt mitigation in those cards. msk(4) delays packets a lot more than em(4). Plus you should run -current on msk(4) systems (a few things got fixed at f2k9).
RPC unfortunately is slow.
For some reason, when I mount NFS drives with -r=4096 and -w=4096 I reach the best transfer rates.
This is possibly because the OS is able to match the request to a single memory page for your architecture. Other architectures offer larger page sizes.
Not saying that's the case, but a possibility.
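One quick way to check the page-size idea on a given machine is to compare the NFS read/write sizes against the page size the OS reports. A minimal Python sketch, assuming Python is available on the box; the helper name is made up for illustration, and it says nothing about how the NFS code actually handles requests.

```python
import mmap


def pages_per_request(request_bytes: int, page_size: int = mmap.PAGESIZE) -> float:
    """How many memory pages one NFS read/write request of this size spans."""
    return request_bytes / page_size


print(f"page size on this machine: {mmap.PAGESIZE} bytes")
for size in (4096, 16384):
    print(f"-r/-w {size}: {pages_per_request(size):g} page(s) per request")
```

With a 4096-byte page, a 4096-byte request maps to exactly one page and a 16384-byte request spans four.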
