Hi,

I have some odd packet loss on an OpenBSD-based router (running -current as of the beginning of September). The router has 6 physical interfaces (all em, Intel 82575EB); 4 of them carry traffic (about 10-20 Mbps). We did some tuning (mostly with information from https://calomel.org/network_performance.html) and could improve the performance. Currently we use the following sysctl tweaks:

sysctl kern.maxclusters=122880
sysctl net.inet.ip.ifq.maxlen=1536
sysctl net.inet.tcp.recvspace=262144
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.udp.recvspace=262144
sysctl net.inet.udp.sendspace=262144

But we still see about 1300 Ierrs per minute.

A simple ping also shows that something is strange: while the majority of packets have an RTT of 1 ms or less, about every tenth packet shows an RTT of more than 250 ms.

I could really use a hint on what to try next. (Autoneg was disabled on all interfaces for testing; it has now been enabled again.)

Thank you for your input,
Andri Keller

The switches on the other end are both Cisco 2960Gs, each with an LACP trunk to two interfaces on the OpenBSD box:

em0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546c%em0 prefixlen 64 scopeid 0x1
em1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546d%em1 prefixlen 64 scopeid 0x2
em2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6e
        priority: 0
        trunk: trunkdev trunk0
        media: Ethernet 1000baseT ...
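The "every tenth packet" observation can be quantified from the ping output. Below is a minimal Python sketch. It assumes ping(8) prints each reply with a "time=<ms> ms" field. The function name rtt_outliers, the 250 ms threshold and the sample lines (using the documentation address 192.0.2.1) are illustrative and were not taken from this router.

```python
import re

# Matches the round-trip time field printed by ping(8), e.g. "time=0.512 ms".
RTT_RE = re.compile(r"time=([0-9.]+) ms")


def rtt_outliers(lines, threshold_ms=250.0):
    """Return (number of replies, number of replies slower than threshold_ms)."""
    rtts = []
    for line in lines:
        match = RTT_RE.search(line)
        if match:
            rtts.append(float(match.group(1)))
    slow = [rtt for rtt in rtts if rtt > threshold_ms]
    return len(rtts), len(slow)


if __name__ == "__main__":
    # Hypothetical sample: nine fast replies and one slow one.
    sample = [
        f"64 bytes from 192.0.2.1: icmp_seq={seq} ttl=255 time=0.4 ms"
        for seq in range(9)
    ]
    sample.append("64 bytes from 192.0.2.1: icmp_seq=9 ttl=255 time=262.1 ms")
    total, slow = rtt_outliers(sample)
    print(f"{slow}/{total} replies above 250 ms ({100.0 * slow / total:.1f}%)")
```

With real data, the sample list would be replaced by the lines of a longer run, for example ping -c 100 against the next hop.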
I see you are using LACP as your trunk protocol. You might want to check that all the LACP settings are correct, and that no links are being dropped for some reason, since that could cause the errors. Additionally, have you tried with only one link of each LACP pair active? Does the problem stop then?

---
James A. Peltier james_a_peltier@yahoo.ca
Just tried that. There is not much I can configure for LACP, and on the switch I see no errors. I've now pulled one cable so that only one interface in the trunk is active. The problem persists.

The Ierrs are on the interfaces (mostly em2). By the way, there are no ifq.drops.

It seems to me that some buffers are filling up, because now, with low traffic, there are only a few errors (about 150 in 5 minutes).

Are there any other knobs I could try to tune?

Regards
Andri
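The two error rates reported so far can be compared directly. This is a minimal sketch that assumes the Ierrs counter (for example the Ierrs column of netstat -i) is sampled twice. errors_per_minute is a hypothetical helper. Only the figures "150 in 5 minutes" and "about 1300 per minute" come from the thread.

```python
def errors_per_minute(before, after, interval_s):
    """Rate of change of a monotonically increasing counter, per minute."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if after < before:
        raise ValueError("counter went backwards (reset or wrap?)")
    return (after - before) * 60.0 / interval_s


if __name__ == "__main__":
    busy = 1300.0  # rate reported under normal load
    quiet = errors_per_minute(before=0, after=150, interval_s=300)  # 150 in 5 min
    print(f"quiet: {quiet:.0f} Ierrs/min, busy: {busy:.0f} Ierrs/min, "
          f"ratio {busy / quiet:.1f}x")
```

Roughly 30 errors per minute at low traffic versus about 1300 under normal load fits the impression that the errors scale with load. On its own, though, this does not show which component is responsible.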
I would be tempted to say: back out all your changes and return to a stock configuration, except for the net.inet.ip.ifq.maxlen parameter. I posted in early August that I was able to push nearly full gigabit speeds with a Dell R200 with 4GB of RAM on a pretty stock configuration. Eventually I had to bump maxlen and the state table, but that's about it.

I don't see these problems on a mid-August snapshot. I haven't had a chance to try the latest ones yet, though.

---
James A. Peltier james_a_peltier@yahoo.ca
grr, that page again.

"As a very general rule, using the on-board network card is going to be much slower than an add in PCI card"

"A gigabit network controller built on board using the CPU will slow the entire system down. More than likely the system will not even be able to sustain 100MB speeds while also pegging the CPU at 100%."

Increasing this from the defaults can be useful if you see drops in net.inet.ip.ifq.drops. I'm surprised if you have to go that high for ...

Missing dmesg. But try disabling sensor devices or i2c controllers (boot -c, disable <somedevice>, quit).
As we didn't find any other advice out there, we thought it might be ...

Yes, this might be a bit too much:
[root@rt01-rc: root]# netstat -m
9665 mbufs in use:
9642 mbufs allocated to data
14 mbufs allocated to packet headers
9 mbufs allocated to socket names and addresses
83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)
0/8/122880 mbuf 4096 byte clusters in use (current/peak/max)
0/8/122880 mbuf 8192 byte clusters in use (current/peak/max)
0/8/122880 mbuf 9216 byte clusters in use (current/peak/max)
0/8/122880 mbuf 12288 byte clusters in use (current/peak/max)
0/8/122880 mbuf 16384 byte clusters in use (current/peak/max)
0/8/122880 mbuf 65536 byte clusters in use (current/peak/max)
7288 Kbytes allocated to network (35% in use)
0 requests for memory denied
0 requests for memory delayed
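The current/peak/max columns put a number on "a bit too much". On this box the 2048-byte cluster pool peaked at 1970 out of the 122880 allowed by kern.maxclusters. The minimal sketch below pulls those columns out of netstat -m. The line format is assumed from the output above, and cluster_usage is a hypothetical helper name.

```python
import re

# One line of `netstat -m` cluster output, e.g.
# "83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)"
CLUSTER_RE = re.compile(r"^(\d+)/(\d+)/(\d+) mbuf (\d+) byte clusters in use")


def cluster_usage(lines):
    """Map cluster size (bytes) -> (current, peak, max, peak as % of max)."""
    usage = {}
    for line in lines:
        match = CLUSTER_RE.match(line.strip())
        if match:
            cur, peak, limit, size = (int(g) for g in match.groups())
            usage[size] = (cur, peak, limit, 100.0 * peak / limit)
    return usage


if __name__ == "__main__":
    sample = [
        "9665 mbufs in use:",
        "83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)",
        "0/8/122880 mbuf 4096 byte clusters in use (current/peak/max)",
    ]
    for size, (cur, peak, limit, pct) in sorted(cluster_usage(sample).items()):
        print(f"{size:>6} B clusters: peak {peak} of {limit} ({pct:.2f}% of max)")
```

A peak of well under 2% of the maximum suggests the raised kern.maxclusters limit was never approached on this machine.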
Yeah, we had a lot of ifq drops at first, and after setting this value they are gone... I read in multiple "tuning tutorials" to set this to ...

Not from the machine above, but from a machine with exactly the same hardware:
OpenBSD 4.8 (GENERIC.MP) #3: Wed Aug 11 19:24:59 CEST 2010
root@scaramanga.rbnetwork.biz:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version "1.3a" date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) ...

I would try wbng first. Failing that, lm. I doubt you would need to disable ichiic, but that would be the next step if there's no improvement. You can make permanent changes to an on-disk ...

Please follow up and let us know how it goes.
Hi Stuart,

well, disabling wbng seems to be the solution. After one day of normal traffic levels we do not see any Ierrs anymore. Thank you, Stuart, for the helpful advice.

Can somebody explain how this driver (which is for reading voltage levels, fan speeds etc., if I did not misinterpret the man page) causes this strange behavior? I'm just curious.

Thank you all,
Regards
Andre
Great, thanks for the feedback. If any code ties up the kernel for too long, it can't handle other tasks in a timely fashion.
Unfortunately, I am still experiencing livelocks on the em interfaces of my Dell R200 server in bridging mode. I'm going to have to schedule an upgrade to the latest snapshot first to see if that clears up any issues, but barring that I'm not sure where to look. Perhaps I'll also try the UP kernel.

---
James A. Peltier james_a_peltier@yahoo.ca
The "livelock" counter means a timeout wasn't processed in time, indicating that the system was too busy to run userland (see m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c, and the video from AsiaBSDCon starting about 15 minutes into http://www.youtube.com/watch?v=fv-AQJqUzRI).

When this happens, NICs whose drivers use the MCLGETI mechanism halve the size of their receive rings. Packets are then dropped earlier, which limits system load more effectively than letting them proceed up the network stack.

So for some reason or other the timeout wasn't processed quickly enough, and the system responds in this way to limit the overload. The challenge is to identify what causes the system to become non-responsive (it could be in the network stack or could be for other reasons) and work out ways to alleviate that.
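The mechanism described above can be illustrated with a toy model. This Python sketch is not the kernel code in sys/kern/uipc_mbuf.c. The class RxRing, the helper tick_was_late, the slack value and the grow-by-one rule are all assumptions made for illustration. Only two things come from the thread: rings are halved when a timeout runs late, and the LWM/HWM/CWM naming used by systat mbuf.

```python
from dataclasses import dataclass


@dataclass
class RxRing:
    """Toy model of an MCLGETI-style receive ring (not the kernel code).

    lwm/hwm/cwm mirror the LWM/HWM/CWM columns of `systat mbuf`.
    """
    lwm: int = 4
    hwm: int = 256
    cwm: int = 256
    livelocks: int = 0

    def on_livelock(self):
        # Per the thread: when the system is too busy, the ring is halved so
        # packets are dropped early at the NIC instead of deeper in the stack.
        self.livelocks += 1
        self.cwm = max(self.lwm, self.cwm // 2)

    def on_ring_exhausted(self):
        # ASSUMPTION: grow back by one slot when the ring runs out of
        # descriptors; the real growth policy may differ.
        self.cwm = min(self.hwm, self.cwm + 1)


def tick_was_late(expected_s, actual_s, slack_s=0.0):
    """Hypothetical check: did a periodic timeout fire later than expected?"""
    return actual_s - expected_s > slack_s


if __name__ == "__main__":
    ring = RxRing()
    # Hypothetical schedule of timeout firings (expected, actual) in seconds.
    for expected, actual in [(0.01, 0.01), (0.02, 0.09), (0.03, 0.25)]:
        if tick_was_late(expected, actual, slack_s=0.01):
            ring.on_livelock()
    print(ring.livelocks, ring.cwm)  # two late ticks: 256 -> 128 -> 64
```

The floor at lwm means repeated livelocks can shrink the ring only as far as the low water mark. This matches the LWM value of 4 seen in the systat mbuf tables in this thread.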
Thanks for the notes. Below are snapshots of vmstat -i and systat vmstat, which do show "high" interrupt levels (6-12k). I put quotes around high because I'm not really sure whether that is high.

That said, is there any benefit to adding the blocknonip option to the bridge devices?

I also note that, according to m_cldrop(), the "halving" is done on all interfaces. This seems odd: if you had a device with multiple cards, all traffic would be affected because of one of them. Am I correct in this?
# vmstat -i
interrupt total rate
irq0/clock 819075628 199
irq0/ipi 20855029 5
irq112/em0 12478765512 3047
irq113/em1 13607027530 3322
irq113/bge1 12635532 3
irq97/uhci1 1949 0
irq96/ehci0 22 0
irq98/pciide0 5204039 1
irq145/com0 339 0
Total 26943565580 6578
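To see how much of the interrupt load comes from the em interfaces, the vmstat -i rows can be summed per device. This is a minimal sketch. The parsing assumes the three-column layout shown above, the sample reuses a subset of those rows, and interrupt_rates is a hypothetical helper. Note that vmstat -i reports the rate as an average since boot, while systat vmstat shows current rates, so the two views can differ.

```python
def interrupt_rates(lines):
    """Parse `vmstat -i` rows into {source: rate per second}."""
    rates = {}
    for line in lines:
        fields = line.split()
        # Expect "<irqN/device> <total> <rate>"; skip header and Total rows.
        if len(fields) == 3 and "/" in fields[0] and fields[2].isdigit():
            rates[fields[0]] = int(fields[2])
    return rates


if __name__ == "__main__":
    sample = """\
interrupt total rate
irq0/clock 819075628 199
irq112/em0 12478765512 3047
irq113/em1 13607027530 3322
irq113/bge1 12635532 3
Total 26943565580 6578""".splitlines()
    rates = interrupt_rates(sample)
    em = sum(rate for name, rate in rates.items()
             if name.split("/")[1].startswith("em"))
    print(f"em* interrupts: {em}/s of {sum(rates.values())}/s listed")
```

In the full output above, em0 and em1 account for 6369 of the 6578 interrupts per second reported in the Total row.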
and
#systat vmstat
1 users Load 0.64 0.67 0.66 Wed Sep 22 16:56:35 2010
memory totals (in KB) PAGING SWAPPING Interrupts
real virtual free in out in out 11067 total
Active 15388 15388 2918228 ops 200 clock
All 383480 383480 6585880 pages 48 ipi
5586 em0
Proc:r d s w Csw Trp Sys Int Sof Flt 1 forks 5212 em1
7 101 561 1525 9438 105 595 fkppw 21 bge1
fksvm uhci1
18.8%Int 1.3%Sys 1.9%Usr 0.0%Nic 77.9%Idle pwait ehci0
| | | | | | | | | | | relck pciide0
|||||||||=> ...

...and this, by itself, isn't necessarily a problem. You just see the rx ring autosizing figuring out the right size.

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
Ok, here's another advice that you might wanna follow since you don't find another: to make your system run faster, donate all your belongings to OpenBSD, then dance naked around the computer and eat nothing but rice all day. After a few days, throw the computer into the ocean. It'll be very fast (to sink).

-- Henning Brauer, hb@bsws.de, henning@openbsd.org
Holy shit. That is indeed horribly wrong. In many cases it is the exact opposite, as has been said a gazillion times.

-- Henning Brauer, hb@bsws.de, henning@openbsd.org
IFACE     LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
System               256   9893   805
                     2k    287    985
lo0
em0       3765       2k    113    4    256  113
em1       43         2k    12     4    256  4
em2       9311       2k    135    4    256  135
em3       670        2k    12     4    256  4
em4       43         2k    6      4    256  6
Seriously, please try disabling at least wbng. I think there is no point looking at other things until you have tried that.
Livelocks are seen on my em interfaces as well. I also have livelocks on my far less busy bge1 management interface. See below:
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 256 116 84
2k 92 504
lo0
em0 29363 2k 37 4 256 37
em1 10174 2k 37 4 256 37
bge0
bge1 4 2k 17 17 512 17
enc0
vlan300
bridge0
pflog0
pflow0
---
James A. Peltier james_a_peltier@yahoo.ca
I should mention that these counts might date from before some recent tuning. However, for the purpose of following this thread, I will keep an eye on them to be sure.
I am in bridging mode, and I too am seeing a slow increase in livelocks on my em interfaces. Traffic has been quite low over the past week or so, so it certainly shouldn't be an issue. The only modification I have made so far is bumping net.inet.ip.ifq.maxlen to 2048. If you want any other info, please let me know.
# systat -b mbuf
1 users Load 0.13 0.09 0.08 Tue Sep 21 20:22:30 2010
IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM
System 256 98 84
2k 74 504
lo0
em0 29891 2k 29 4 256 29
em1 10381 2k 28 4 256 28
bge0
bge1 4 2k 17 17 512 17
enc0
vlan300
bridge0
pflog0
pflow0
# netstat -m
100 mbufs in use:
95 mbufs allocated to data
1 mbuf allocated to packet headers
4 mbufs allocated to socket names and addresses
74/1008/6144 mbuf 2048 byte clusters in use (current/peak/max)
0/8/6144 mbuf 4096 byte clusters in use (current/peak/max)
0/8/6144 mbuf 8192 byte clusters in use (current/peak/max)
0/8/6144 mbuf 9216 byte clusters in use (current/peak/max)
0/8/6144 mbuf 12288 byte clusters in use (current/peak/max)
0/8/6144 mbuf 16384 byte clusters in use (current/peak/max)
0/8/6144 mbuf 65536 byte clusters in use (current/peak/max)
2544 Kbytes allocated to network (6% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
#
---
James A. Peltier james_a_peltier@yahoo.ca
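The question is whether the livelock counters keep growing, so two systat mbuf snapshots can be diffed per interface. This is a minimal sketch that assumes the seven-column rows shown above. Rows with fewer columns (System, idle interfaces) are skipped. The helper names are hypothetical, and the sample values are the em0, em1 and bge1 rows from the two snapshots posted in this thread.

```python
def livelock_counts(lines):
    """Parse `systat mbuf` rows into {iface: LIVELOCKS} (7-column rows only)."""
    counts = {}
    for line in lines:
        fields = line.split()
        # IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM; the header row is skipped
        # because its LIVELOCKS field is not a number.
        if len(fields) == 7 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def livelock_delta(before, after):
    """Per-interface increase between two snapshots."""
    return {iface: after[iface] - before.get(iface, 0) for iface in after}


if __name__ == "__main__":
    earlier = livelock_counts([
        "IFACE LIVELOCKS SIZE ALIVE LWM HWM CWM",
        "em0 29363 2k 37 4 256 37",
        "em1 10174 2k 37 4 256 37",
        "bge1 4 2k 17 17 512 17",
    ])
    later = livelock_counts([
        "em0 29891 2k 29 4 256 29",
        "em1 10381 2k 28 4 256 28",
        "bge1 4 2k 17 17 512 17",
    ])
    print(livelock_delta(earlier, later))
```

For those two snapshots, this gives an increase of 528 on em0, 207 on em1 and none on bge1.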
If you use bridge(4), net.inet.ip.ifq.maxlen will not change anything, since that queue is only used for incoming IP traffic. bridge(4) steals the packets beforehand and has its own ifq.
