Re: NAPI, rx_no_buffer_count, e1000, r8169 and other actors

Previous thread: [PATCH] raw: Restore /proc/net/raw correct behavior by Eric Dumazet on Sunday, June 15, 2008 - 3:22 am. (2 messages)

Next thread: r8169 rx descriptors by Denys Fedoryshchenko on Sunday, June 15, 2008 - 2:26 pm. (4 messages)
From: Denys Fedoryshchenko
Date: Sunday, June 15, 2008 - 1:24 pm

Hi

Since I am using PC routers for my network and have reached significant traffic
levels (significant for me, at least), I have started noticing minor problems.
So all of this is about networking performance in my case.

For example:
Sun server, AMD based (two CPUs, AMD Opteron(tm) Processor 248).
e1000 connected over PCI-X:
[    4.919249] e1000: 0000:01:01.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:14:4f:20:89:f4

All traffic is processed over eth0 (5 VLANs), with a 1-second average of around
110-200 Mbps. The host also runs conntrack (max 1000000 entries; when packet
loss happens there are around 256k entries) and around 1300 routes (FIB_TRIE).
What worries me is this: OK, I gain time by increasing the RX descriptors from
256 to 4096, but how much time do I gain? If it "cracks" at 100 Mbps RX, does
interpolating from the descriptor increase from 256 to 4096 (4 times) mean that
I cannot process more than 400 Mbps RX?
The CPU is not that busy, after all... Maybe there is a parameter I can change
to force the NAPI poll to run more often?
I tried nice, changing the realtime priority to FIFO, and switching to a
preemptible kernel... no luck, except with increasing the descriptors.
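The descriptor question can be framed with a simple fluid model. This is an assumption for illustration, not how e1000 or NAPI is actually implemented: packets arrive at one average rate and the poll loop removes them at another. Under that model a bigger ring only delays the first drop when the poll loop cannot keep up; it does not raise the rate that can be sustained. The function name `ms_until_overflow` is hypothetical.

```c
/* Hypothetical fluid model (not driver code): packets arrive at arrival_pps
 * and the poll loop removes them at service_pps on average.
 * Returns milliseconds until a ring of ring_size descriptors is full, or
 * -1.0 if servicing keeps up, in which case sustained load never fills the
 * ring and only bursts or poll stalls can overflow it. */
static double ms_until_overflow(int ring_size, double arrival_pps,
                                double service_pps)
{
    double net_fill_pps = arrival_pps - service_pps;

    if (net_fill_pps <= 0.0)
        return -1.0;
    /* Ring fills linearly at the net rate: time = size / net rate. */
    return 1000.0 * ring_size / net_fill_pps;
}
```

For example, with a hypothetical 50 kpps offered and 45 kpps serviced, 256 descriptors fill in 51.2 ms and 4096 descriptors in 819.2 ms: sixteen times longer, but both eventually overflow.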

Router-Dora ~ # mpstat -P ALL 1
Linux 2.6.26-rc6-git2-build-0029 (Router-Dora)  06/15/08

22:51:02     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
22:51:03     all    1.00    0.00    0.00    0.00    2.50   29.00    0.00   67.50  12927.00
22:51:03       0    2.00    0.00    0.00    0.00    4.00   59.00    0.00   35.00  11935.00
22:51:03       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    993.00
22:51:03       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
 1544     1 root     S     5824  0.2  0.0 /usr/sbin/snmpd -c /config/snmpd.conf
 1530     1 squid    S     2880  0.1  0.0 /usr/sbin/ripd -d
 1524     1 squid    S     2740  0.1  0.0 /usr/sbin/zebra -d
    1     0 root     S     2384  0.1  0.0 /bin/sh /init
 1576  1115 root     S     ...
From: Francois Romieu
Date: Sunday, June 15, 2008 - 1:57 pm

Denys Fedoryshchenko <denys@visp.net.lb> :

400 rx + 400 tx, or 200 rx + 200 tx?

Can you send the ethtool -S (+ ifconfig) output of the 8169 if it misses
packets, as well as the dmesg lines that relate to the r8169 driver?

-- 
Ueimor
--

From: Denys Fedoryshchenko
Date: Sunday, June 15, 2008 - 2:32 pm

On this host there is 275 Mbps TX right now and 152 Mbps RX.
After 3 minutes of uptime:
eth0      Link encap:Ethernet  HWaddr 00:18:F8:0B:46:A6
          inet addr:192.168.20.10  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9510755 errors:0 dropped:400 overruns:0 frame:0
          TX packets:9601889 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000
          RX bytes:3768549053 (3.5 GiB)  TX bytes:2251698126 (2.0 GiB)
          Interrupt:21 Base address:0x4000

MegaRouter-KARAM ~ # ethtool -S eth0
NIC statistics:
     tx_packets: 10336831
     rx_packets: 10191781
     tx_errors: 0
     rx_errors: 0
     rx_missed: 436
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 10183249
     broadcast: 971
     multicast: 7561
     tx_aborted: 0
     tx_underrun: 0

MegaRouter-KARAM ~ # mpstat -P ALL 1
Linux 2.6.26-rc6-git2-build-0029 (MegaRouter-KARAM)     06/16/08

00:32:08     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
00:32:09     all    0.50    0.00    1.49    0.00    1.49   27.23    0.00   69.31  76659.41
00:32:09       0    1.01    0.00    0.00    0.00    0.00   43.43    0.00   55.56  61549.50
00:32:09       1    0.00    0.00    1.98    0.00    2.97   10.89    0.00   84.16  15102.97
00:32:09       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

--

From: Denys Fedoryshchenko
Date: Sunday, June 15, 2008 - 2:32 pm

Very sorry, I forgot the dmesg output:
[    3.070955] r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
[    3.070972] ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 21 (level, low) -> IRQ 21
[    3.071582] eth0: RTL8110s at 0xf8894000, 00:18:f8:0b:46:a6, XID 04000000 IRQ 21

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.

--

From: Ben Hutchings
Date: Sunday, June 15, 2008 - 4:46 pm

[Empty message]
From: Stephen Hemminger
Date: Sunday, June 15, 2008 - 7:59 pm

On Mon, 16 Jun 2008 00:46:22 +0100

You are CPU limited because of the overhead of firewalling. When this happens

No. If the receive side is CPU limited, you just end up eating more memory.

How are you measuring CPU? You need to do something like measure the available
cycles left for applications. Don't believe top or other measures that may


Routing and firewalling should scale well. The deadlock is probably going

The bigger issue is available memory bandwidth. Different processors
and buses have different overheads. PCI is much worse than PCI Express,
and CPUs with integrated memory controllers do much better than CPUs
with a separate memory controller (like Core 2).

--

From: Denys Fedoryshchenko
Date: Sunday, June 15, 2008 - 9:05 pm

I tried increasing net.core.netdev_max_backlog; it doesn't help and doesn't
change anything at all.
But it looks like this: if I have 200 Mbps RX with an average packet size of
500 bytes, that is a 50 Kpps rate. The RX ring is 256 descriptors, and every
1 ms brings 50 packets. So if the poll is delayed by more than about 5 ms, I
miss packets. The same happens if it doesn't process all pending packets in
one softirq cycle.
Probably I understand something (or everything) wrong.
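The arithmetic above can be checked with a small sketch. It assumes evenly spaced arrivals (modeled as one batch per millisecond) and a poller that runs at a fixed interval and removes at most a fixed number of packets per run, a rough stand-in for a NAPI poll budget rather than the actual kernel code. `simulate_drops` and the other names are hypothetical.

```c
/* Convert a bit rate and an average packet size into packets per second. */
static double packets_per_sec(double bits_per_sec, double avg_pkt_bytes)
{
    return bits_per_sec / (8.0 * avg_pkt_bytes);
}

/* Longest gap between polls (ms) before a ring of ring_size fills,
 * assuming arrivals are evenly spread. */
static double max_poll_gap_ms(int ring_size, double pps)
{
    return 1000.0 * ring_size / pps;
}

/* Hypothetical model: pkts_per_ms arrive each millisecond into a ring of
 * ring_size descriptors; a poller runs every interval_ms and removes at most
 * budget packets per run. Returns packets dropped over total_ms. */
static long simulate_drops(int ring_size, int pkts_per_ms,
                           int interval_ms, int budget, int total_ms)
{
    long in_ring = 0, dropped = 0;

    for (int t = 1; t <= total_ms; t++) {
        /* One millisecond of arrivals; anything beyond the ring is lost. */
        in_ring += pkts_per_ms;
        if (in_ring > ring_size) {
            dropped += in_ring - ring_size;
            in_ring = ring_size;
        }
        /* Poll at the end of each interval, draining up to budget packets. */
        if (t % interval_ms == 0)
            in_ring -= (in_ring < budget) ? in_ring : budget;
    }
    return dropped;
}
```

With 256 descriptors and 50 packets per ms, `max_poll_gap_ms` gives about 5.12 ms. In the simulation a poll every 5 ms loses nothing, a poll every 6 ms loses 44 packets per cycle, and polling every 1 ms with a budget of only 40 packets per run also starts losing packets once the ring fills: the two failure modes described above.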

But firewalling should not be a big deal, since I am not using anything "heavy"
like L7 filtering. Still, I will try to optimize the rules, as I once did with
a u32 hash, so that most packets do not traverse a "long chain". And, there is around 29
That's a very good idea.
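As a rough illustration of why hashing helps with a "long chain" (a generic sketch, not the data structures used by tc u32 or iptables), the hypothetical code below compares a linear rule chain with rules pre-bucketed by the last octet of the destination address. The struct names, `NBUCKETS`, and `BUCKET_CAP` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS   256  /* one bucket per value of the last address octet */
#define BUCKET_CAP 16   /* hypothetical per-bucket capacity */

/* Hypothetical rule: match one IPv4 destination (host byte order). */
struct rule { uint32_t dst; int action; };

struct hashed_rules {
    struct rule bucket[NBUCKETS][BUCKET_CAP];
    size_t count[NBUCKETS];
};

/* Add a rule to the bucket selected by the last octet; -1 if it is full. */
static int hashed_add(struct hashed_rules *h, struct rule r)
{
    size_t b = r.dst & 0xff;

    if (h->count[b] == BUCKET_CAP)
        return -1;
    h->bucket[b][h->count[b]++] = r;
    return 0;
}

/* Linear chain: every packet is compared against the rules in order. */
static int linear_lookup(const struct rule *rules, size_t n, uint32_t dst,
                         size_t *cmps)
{
    for (size_t i = 0; i < n; i++) {
        (*cmps)++;
        if (rules[i].dst == dst)
            return rules[i].action;
    }
    return -1; /* no match: fall through to the default policy */
}

/* Hashed: only the rules in one bucket are compared. */
static int hashed_lookup(const struct hashed_rules *h, uint32_t dst,
                         size_t *cmps)
{
    size_t b = dst & 0xff;

    for (size_t i = 0; i < h->count[b]; i++) {
        (*cmps)++;
        if (h->bucket[b][i].dst == dst)
            return h->bucket[b][i].action;
    }
    return -1;
}
```

With 256 host rules covering 10.0.0.0 to 10.0.0.255, a packet to 10.0.0.255 costs 256 comparisons in the linear chain but a single comparison in the hashed layout.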
e1000 / AMD - cache size      : 1024 KB

Presumably mpstat gives correct results? I never use top, except to find an
obvious CPU-hogging userspace app.

Router-Dora ~ # mpstat 1
Linux 2.6.26-rc6-git2-build-0029 (Router-Dora)  06/16/08

06:31:19     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
06:31:20     all    0.00    0.00    0.00    0.00    1.51    8.04    0.00   90.45  13570.30
06:31:21     all    0.00    0.00    0.00    0.00    2.49    9.95    0.00   87.56  13986.00
06:31:22     all    0.00    0.00    0.50    0.00    2.49    9.45    0.00

I tried changing the TX queue length. If I make it too small, it just drops
packets _silently_: the drops are not shown in netstat -s, nor in the ifconfig stats.
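A bounded queue has to refuse packets once it is full, and the loss is only visible where a drop counter is kept and exported. The minimal tail-drop FIFO below illustrates that general point; it is a hypothetical sketch, not the kernel's qdisc code, and `QCAP` and the function names are assumptions.

```c
#include <stddef.h>

#define QCAP 4 /* hypothetical queue length, analogous to a tiny txqueuelen */

/* Bounded FIFO with tail drop. `dropped` is the only record that an enqueue
 * was refused; if nothing reads or exports it, the loss is silent. */
struct taildrop_q {
    int buf[QCAP];
    size_t head, len;
    unsigned long dropped;
};

static int q_enqueue(struct taildrop_q *q, int pkt)
{
    if (q->len == QCAP) {
        q->dropped++; /* queue full: the new packet is discarded */
        return -1;
    }
    q->buf[(q->head + q->len) % QCAP] = pkt;
    q->len++;
    return 0;
}

static int q_dequeue(struct taildrop_q *q, int *pkt)
{
    if (q->len == 0)
        return -1;
    *pkt = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 0;
}
```

Enqueuing six packets into this four-slot queue keeps the first four and counts two drops; a shorter queue simply reaches that point sooner under bursty load.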

I think in this case the r8169 sits behind a PCI-to-PCI Express bridge; the
other card is PCI Express, and there is nothing else on PCI other than an IDE
controller that is not used at all. Yes, it is bad, but it still must give
133 Mbyte/s (1064 Mbit/s). Yes, I know
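The bus figure can be turned into a rough utilization estimate. The sketch assumes that both the TX and RX frames of the r8169 cross the same shared PCI bus, and it ignores descriptor fetches, arbitration and burst overhead, so real utilization is higher than it suggests. `pci_payload_utilization` is a hypothetical name.

```c
/* Fraction of a shared PCI bus consumed by frame data when both TX and RX
 * of one NIC cross it. Overheads are ignored (an assumption), so this
 * underestimates real bus load. */
static double pci_payload_utilization(double tx_mbit, double rx_mbit,
                                      double bus_mbyte_per_sec)
{
    double bus_mbit = bus_mbyte_per_sec * 8.0; /* 133 MB/s -> 1064 Mbit/s */

    return (tx_mbit + rx_mbit) / bus_mbit;
}
```

With the figures reported for this host (275 Mbps TX, 152 Mbps RX) and 133 Mbyte/s of bus bandwidth, frame data alone uses about 40% of the theoretical peak.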
Well, the Realtek 8169 doesn't support changing the ring size, and doesn't
support changing coalescing parameters (ethtool -C). By the way, e1000 also
doesn't support -C, but e1000e
Yes, but in my case the Core 2 handles heavier jobs much better, probably
because of its larger cache or some voodoo magic.

The biggest issue is that in this country it is not possible to find PCI-Express
network ...