Re: em(4) ierrs

From: Andre Keller
Subject: em(4) ierrs
Date: Monday, September 20, 2010 - 10:15 am

Hi


I have some odd packet loss on an OpenBSD-based router (running -current
as of the beginning of September).

The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
have traffic (about 10-20 Mbps).


We did some tuning (mostly with information from
https://calomel.org/network_performance.html) and could improve the
performance:

Currently we use the following sysctl tweaks:
sysctl kern.maxclusters=122880
sysctl net.inet.ip.ifq.maxlen=1536
sysctl net.inet.tcp.recvspace=262144
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.udp.recvspace=262144
sysctl net.inet.udp.sendspace=262144


But we still see about 1300 Ierrs per minute...

When we run a simple ping, we can see that something is strange. While
the majority of packets have an RTT of 1 ms or less, about every tenth
packet shows an RTT of >250 ms...


I could really use a hint of what to try next (autoneg has been disabled
on all interfaces for testing, now it has been enabled again...)



Thank you for your input


Andre Keller




The switches on the other end of the device are both Cisco 2960G with
LACP to two interfaces on the OpenBSD box:

em0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546c%em0 prefixlen 64 scopeid 0x1
em1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546d%em1 prefixlen 64 scopeid 0x2
em2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6e
        priority: 0
        trunk: trunkdev trunk0
        media: Ethernet 1000baseT ...
From: James Peltier
Date: Monday, September 20, 2010 - 10:54 am

I see you are using LACP as your trunk protocol.  You might want to check that 
all the LACP settings are correct or that there aren't any links being dropped 
for some reason that might cause the errors to occur.  Additionally, have you 
tried with only one link in the LACP pairs being active?  Does it stop then?

 ---
James A. Peltier     james_a_peltier@yahoo.ca

From: Andre Keller
Date: Monday, September 20, 2010 - 3:51 pm

Just tried that. There is not much I can configure for LACP. On the
switch I see no errors.

I've now pulled one cable so that only one interface in the trunk is
active. The problem still exists: Ierrs on the interfaces (mostly
em2) (btw. there are no ifq.drops).
It seems to me that some buffers are filling up, since now, with
low traffic, there are only a few errors (about 150 in 5 minutes).

Are there any other knobs I could try to tune?


Regards, Andre

From: James Peltier
Date: Tuesday, September 21, 2010 - 12:06 am

I would be tempted to say, back out all your changes and return to a stock 
configuration, except for the net.inet.ip.ifq.maxlen parameter.

I posted in early August that I was able to push nearly full gigabit speeds with 
a Dell R200 w/4GB of RAM with a pretty stock configuration.  Eventually I had to 
bump maxlen and the state table but that's about it.  I don't see these problems 
on a mid-August snapshot.  I haven't had a chance to try the latest ones yet 
though.


 ---
James A. Peltier     james_a_peltier@yahoo.ca

From: Stuart Henderson
Date: Monday, September 20, 2010 - 3:43 pm

grr, that page again.

"As a very general rule, using the on-board network card is going
to be much slower than an add in PCI card"

"A gigabit network controller built on board using the CPU will
slow the entire system down. More than likely the system will not
even be able to sustain 100MB speeds while also pegging the CPU at
100%."



increasing this from the defaults can be useful if you see drops in
net.inet.ip.ifq.drops, I'm surprised if you have to go that high for


missing dmesg. but try disabling sensor devices or i2c controllers
(boot -c, disable <somedevice>, quit).

From: Andre Keller
Date: Monday, September 20, 2010 - 4:07 pm

As we didn't find any other advice out there we thought it might be

yes, this might be a bit too much:
[root@rt01-rc: root]# netstat -m
9665 mbufs in use:
        9642 mbufs allocated to data
        14 mbufs allocated to packet headers
        9 mbufs allocated to socket names and addresses
83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)
0/8/122880 mbuf 4096 byte clusters in use (current/peak/max)
0/8/122880 mbuf 8192 byte clusters in use (current/peak/max)
0/8/122880 mbuf 9216 byte clusters in use (current/peak/max)
0/8/122880 mbuf 12288 byte clusters in use (current/peak/max)
0/8/122880 mbuf 16384 byte clusters in use (current/peak/max)
0/8/122880 mbuf 65536 byte clusters in use (current/peak/max)
7288 Kbytes allocated to network (35% in use)
0 requests for memory denied
0 requests for memory delayed

yeah, we had a lot of ifq drops at first, and after setting this value they
are gone... I read in multiple "tuning tutorials" setting this to


Not from the machine above but from a machine with exactly the same hardware...

OpenBSD 4.8 (GENERIC.MP) #3: Wed Aug 11 19:24:59 CEST 2010
    root@scaramanga.rbnetwork.biz:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version "1.3a" date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ
SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5)
USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5)
USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) ...
From: Stuart Henderson
Date: Monday, September 20, 2010 - 4:28 pm

I would try wbng first. Failing that, lm. I doubt you would
need to disable ichiic but that would be the next step if there's
no improvement. You can make permanent changes to an on-disk

Please follow-up and let us know how it goes.

From: Andre Keller
Date: Wednesday, September 22, 2010 - 8:38 am

Hi Stuart


well, disabling wbng seems to be the solution. After one day of normal
traffic levels we no longer see any Ierrs...

Thank you, Stuart, for the helpful advice.


Can somebody explain how this driver (which is for reading voltage
levels, fan speeds etc., if I did not misinterpret the manpage) is
causing this strange behavior? I'm just curious...


Thank you all


Regards Andre

From: Stuart Henderson
Date: Wednesday, September 22, 2010 - 8:44 am

Great, thanks for the feedback.

If any code ties up the kernel for too long, it can't handle
other tasks in a timely fashion. 

From: James Peltier
Date: Wednesday, September 22, 2010 - 10:04 am

I, unfortunately, am still experiencing livelocks on my em interfaces on my Dell 
R200 server in bridging mode.  I'm going to have to schedule an upgrade to the 
latest snapshot first to see if that clears up any issues, but barring that I'm 
not sure where to look.  Perhaps I'll also try the UP kernel.

---
James A. Peltier     james_a_peltier@yahoo.ca

From: Stuart Henderson
Date: Wednesday, September 22, 2010 - 12:31 pm

the "livelock" counter means a timeout wasn't reached in time,
indicating the system being too busy to run userland.
(see m_cltick(), m_cldrop() etc in sys/kern/uipc_mbuf.c,
and the video from asiabsdcon starting about 15 minutes into
http://www.youtube.com/watch?v=fv-AQJqUzRI).

when this happens, nics with drivers using the MCLGETI mechanism
halve the size of their receive rings, so that packets drop
earlier, more effectively limiting system load than if they
were allowed to proceed up the network stack.

so for some reason or other the timeout wasn't processed
quickly enough and the system responds in this way to limit
the overload. so the challenge is to identify what causes
the system to become non-responsive (could be in the network
stack or could be for other reasons) and work out ways
to alleviate that..
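
The shrink-on-livelock behaviour described above can be sketched as a toy
model. This is not the kernel code from sys/kern/uipc_mbuf.c: the class
name RxRing, its method names, the starting value, and the grow-by-one
rule are assumptions made purely for illustration. Only the halving on a
livelock and the low/high/current watermarks (LWM 4 and HWM 256 for em in
the systat output posted in this thread) come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class RxRing:
    """Toy model of one interface's receive-ring watermarks (not kernel code)."""
    lwm: int = 4        # low watermark: the ring never shrinks below this
    hwm: int = 256      # high watermark: the ring never grows above this
    cwm: int = 256      # current watermark; starting at hwm is arbitrary here
    livelocks: int = 0  # how many late timeouts were observed

    def on_livelock(self) -> None:
        """The timeout ran late: halve the ring so excess packets drop early."""
        self.livelocks += 1
        self.cwm = max(self.lwm, self.cwm // 2)

    def on_ring_drained(self) -> None:
        """Assumed growth rule: creep back up one slot when the ring ran dry."""
        self.cwm = min(self.hwm, self.cwm + 1)


ring = RxRing()
for _ in range(3):      # three consecutive late timeouts
    ring.on_livelock()
print(ring.cwm)         # 256 -> 128 -> 64 -> 32

for _ in range(10):     # load eases, the ring regrows slowly
    ring.on_ring_drained()
print(ring.cwm)         # 32 + 10 = 42
```

In this sketch each livelock costs half the ring while recovery is one
slot at a time, so even occasional livelocks can hold CWM well below HWM.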

From: James Peltier
Date: Wednesday, September 22, 2010 - 3:36 pm

Watching now. :)

From: James Peltier
Date: Wednesday, September 22, 2010 - 5:06 pm

Thanks for the notes.  Below are snapshots of vmstat -i and systat vmstat which 
do show "high" interrupt levels (6-12k).  I put quotes around high because I'm 
not really sure if that is high.

That said, is there any benefit to the use of blocknonip clause being added to 
the bridge devices?

I also note that, according to m_cldrop(), the "halving" is done on all 
interfaces.  This seems odd in that, if you had a device with multiple cards, 
all traffic would be affected at the expense of one.  Am I correct in this?


# vmstat -i
interrupt                       total     rate
irq0/clock                  819075628      199
irq0/ipi                     20855029        5
irq112/em0                12478765512     3047
irq113/em1                13607027530     3322
irq113/bge1                  12635532        3
irq97/uhci1                      1949        0
irq96/ehci0                        22        0
irq98/pciide0                 5204039        1
irq145/com0                       339        0
Total                     26943565580     6578


and

# systat vmstat

   1 users    Load 0.64 0.67 0.66                      Wed Sep 22 16:56:35 2010

            memory totals (in KB)            PAGING   SWAPPING     Interrupts
           real   virtual     free           in  out   in  out    11067 total
Active    15388     15388  2918228   ops                            200 clock
All      383480    383480  6585880   pages                           48 ipi
                                                                   5586 em0
Proc:r  d  s  w    Csw   Trp   Sys   Int   Sof  Flt     1 forks    5212 em1
           7       101   561  1525  9438   105  595       fkppw      21 bge1
                                                          fksvm         uhci1
  18.8%Int   1.3%Sys   1.9%Usr   0.0%Nic  77.9%Idle       pwait         ehci0
|    |    |    |    |    |    |    |    |    |    |       relck         pciide0
|||||||||=>                                           ...
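
A rough way to read the vmstat -i output above, assuming its rate column
is the total divided by uptime: 6578 would then be an average since boot
rather than the current load, which would fit systat vmstat showing a
higher instantaneous figure. The Python sketch below uses a hypothetical
helper, parse_vmstat_i, with the table pasted in as a string, and derives
the implied uptime and each source's share of all interrupts.

```python
VMSTAT_I = """\
irq0/clock                  819075628      199
irq0/ipi                     20855029        5
irq112/em0                12478765512     3047
irq113/em1                13607027530     3322
irq113/bge1                  12635532        3
irq97/uhci1                      1949        0
irq96/ehci0                        22        0
irq98/pciide0                 5204039        1
irq145/com0                       339        0
Total                     26943565580     6578
"""


def parse_vmstat_i(text):
    """Return ({source: (total, rate)}, (grand_total, grand_rate))."""
    rows, grand = {}, None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip anything that is not a data row
        name, total, rate = fields[0], int(fields[1]), int(fields[2])
        if name == "Total":
            grand = (total, rate)
        else:
            rows[name] = (total, rate)
    return rows, grand


rows, (grand_total, grand_rate) = parse_vmstat_i(VMSTAT_I)

# The rate is printed as a whole number, so this is only an estimate.
uptime_s = grand_total / grand_rate
print(f"implied uptime: {uptime_s / 86400:.1f} days")

# Largest interrupt sources first.
for name, (total, rate) in sorted(rows.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:16} {100 * total / grand_total:5.1f}% of all interrupts")
```

With these numbers nearly all interrupts come from em0 and em1, so the
per-second figure is essentially a measure of traffic on those two ports.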
From: Henning Brauer
Date: Wednesday, September 22, 2010 - 5:42 pm

and this, by itself, isn't necessarily a problem. you just see the rx
ring autosizing figuring out the right size.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Henning Brauer
Date: Monday, September 20, 2010 - 6:07 pm

ok, here's another piece of advice that you might wanna follow since you
don't find another:
to make your system run faster, donate all your belongings to openbsd,
then dance naked around the computer and eat nothing but rice all day.
after a few days throw the computer into the ocean. it'll be very fast
(to sink).

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Henning Brauer
Date: Monday, September 20, 2010 - 6:04 pm

holy shit.
that is indeed horribly wrong. in many cases it is the exact opposite

as said a gazillion times.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting

From: Joerg Goltermann
Date: Tuesday, September 21, 2010 - 12:21 am

which packet rate do you expect on the interfaces? Do you see
livelocks (systat -b mbuf)?

  - Joerg

From: Andre Keller
Date: Tuesday, September 21, 2010 - 2:13 am

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                        256  9893         805
                               2k   287         985
lo0
em0                    3765    2k   113     4   256   113
em1                      43    2k    12     4   256     4
em2                    9311    2k   135     4   256   135
em3                     670    2k    12     4   256     4
em4                      43    2k     6     4   256     6
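
To rank interfaces by livelock count from a systat mbuf snapshot,
something like the following Python sketch would do. The helper name
livelocks_by_iface is hypothetical, and the parsing assumes the
seven-column layout of the table above, with the numbers from this
message pasted in as a string.

```python
SYSTAT_MBUF = """\
IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                        256  9893         805
                               2k   287         985
lo0
em0                    3765    2k   113     4   256   113
em1                      43    2k    12     4   256     4
em2                    9311    2k   135     4   256   135
em3                     670    2k    12     4   256     4
em4                      43    2k     6     4   256     6
"""


def livelocks_by_iface(text):
    """Map interface name -> LIVELOCKS for rows that carry all seven columns."""
    counts = {}
    for line in text.splitlines()[1:]:    # skip the header row
        fields = line.split()
        # Pool rows ("System", "2k") and idle interfaces have fewer fields.
        if len(fields) == 7 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


counts = livelocks_by_iface(SYSTAT_MBUF)
for iface, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{iface:6} {n:6d}")
```

em2 comes out on top here, which matches the Ierrs showing up mostly on
em2.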

From: Stuart Henderson
Date: Tuesday, September 21, 2010 - 6:54 am

seriously,  please try disabling at least wbng, i think there is no
point looking at other things until you have tried that.

From: James Peltier
Date: Tuesday, September 21, 2010 - 9:46 am

livelocks are seen on my em interfaces as well.  I also have livelocks on my far 
less busy bge1 management interface.  See below

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                        256   116          84
                               2k    92         504
lo0
em0                   29363    2k    37     4   256    37
em1                   10174    2k    37     4   256    37
bge0
bge1                      4    2k    17    17   512    17
enc0
vlan300
bridge0
pflog0
pflow0


 ---
James A. Peltier     james_a_peltier@yahoo.ca

From: James Peltier
Date: Tuesday, September 21, 2010 - 9:51 am

I should mention that these might have been made prior to some recent tuning.  
However, for the purpose of following this thread I will keep an eye on it to be 
sure.

From: James Peltier
Date: Tuesday, September 21, 2010 - 8:31 pm

I am in bridging mode and I, too, am seeing a slow increase in livelocks 
on my em interfaces.  Traffic has been quite low over the past week or so, so 
it certainly shouldn't be an issue.  The only modification I have made thus far 
is bumping net.inet.ip.ifq.maxlen to 2048.  If you want any other info 
please let me know.


# systat -b mbuf
   1 users    Load 0.13 0.09 0.08                      Tue Sep 21 20:22:30 2010

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                        256    98          84
                               2k    74         504
lo0
em0                   29891    2k    29     4   256    29
em1                   10381    2k    28     4   256    28
bge0
bge1                      4    2k    17    17   512    17
enc0
vlan300
bridge0
pflog0
pflow0


# netstat -m
100 mbufs in use:
        95 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
74/1008/6144 mbuf 2048 byte clusters in use (current/peak/max)
0/8/6144 mbuf 4096 byte clusters in use (current/peak/max)
0/8/6144 mbuf 8192 byte clusters in use (current/peak/max)
0/8/6144 mbuf 9216 byte clusters in use (current/peak/max)
0/8/6144 mbuf 12288 byte clusters in use (current/peak/max)
0/8/6144 mbuf 16384 byte clusters in use (current/peak/max)
0/8/6144 mbuf 65536 byte clusters in use (current/peak/max)
2544 Kbytes allocated to network (6% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
#

 ---
James A. Peltier     james_a_peltier@yahoo.ca

From: Claudio Jeker
Date: Tuesday, September 21, 2010 - 9:30 pm

If you use bridge(4), net.inet.ip.ifq.maxlen will not change anything since
that queue is only used for incoming IP traffic. bridge(4) steals
the packets beforehand and has its own ifq.


From: patrick keshishian
Date: Tuesday, September 21, 2010 - 10:02 pm

On Tue, Sep 21, 2010 at 8:31 PM, James Peltier <james_a_peltier@yahoo.ca>

