Hi
Since I am using PC routers for my network and the traffic has reached significant
numbers (significant for me), I have started noticing minor problems. Hence all this
talk about networking performance in my case.

For example:
Sun server, AMD based (two CPUs, AMD Opteron(tm) Processor 248).
e1000 connected over PCI-X ([ 4.919249] e1000: 0000:01:01.0: e1000_probe:
(PCI-X:100MHz:64-bit) 00:14:4f:20:89:f4)

All traffic is processed over eth0 with 5 VLANs; the 1-second average is around
110-200 Mbps of traffic. The host is also running conntrack (max 1000000 entries;
around 256k entries when packet loss happens). Around 1300 routes (FIB_TRIE) are loaded.

What worries me: OK, I win time by increasing the RX descriptors from 256 to
4096, but how much time do I win? If it "cracks" at 100 Mbps RX, does that mean that,
by interpolating the descriptor increase from 256 to 4096 (4 times), I cannot
process more than 400 Mbps RX?

The CPU is not so busy after all... maybe there is a way to change some
parameter to force NAPI to poll the interface more often?
I tried nice, changing the realtime priority to FIFO, changing the kernel to
preemptible... no luck, except for increasing the descriptors.
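The extrapolation asked about above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes that drops happen purely because the RX ring fills up between two NAPI polls and that the worst-case poll delay does not change, so the sustainable rate grows linearly with the ring size. The function name is made up for illustration. Note that 4096 / 256 is 16, not 4.

```python
def scaled_rx_limit_mbps(limit_mbps: float, old_ring: int, new_ring: int) -> float:
    """Naive linear extrapolation of the RX rate at which drops start.

    Assumption (not measured): the poll delay is unchanged and the ring
    size is the only bottleneck, so the limit scales with the ring size.
    """
    if old_ring <= 0 or new_ring <= 0:
        raise ValueError("ring sizes must be positive")
    return limit_mbps * new_ring / old_ring


# Figures taken from the message: drops start around 100 Mbps RX with
# 256 descriptors, and the ring is then enlarged to 4096 descriptors.
ratio = 4096 // 256
estimate = scaled_rx_limit_mbps(100, 256, 4096)

print(f"ring size ratio: {ratio}x")
print(f"naive RX limit:  {estimate:.0f} Mbps")
```

Under these assumptions the estimate lands above gigabit line rate, which suggests that with 4096 descriptors some other limit (CPU, bus or memory bandwidth) would probably be reached before the ring itself.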
Router-Dora ~ # mpstat -P ALL 1
Linux 2.6.26-rc6-git2-build-0029 (Router-Dora) 06/15/08
22:51:02  CPU  %user  %nice   %sys  %iowait   %irq  %soft  %steal   %idle    intr/s
22:51:03  all   1.00   0.00   0.00     0.00   2.50  29.00    0.00   67.50  12927.00
22:51:03    0   2.00   0.00   0.00     0.00   4.00  59.00    0.00   35.00  11935.00
22:51:03    1   0.00   0.00   0.00     0.00   0.00   0.00    0.00  100.00    993.00
22:51:03    2   0.00   0.00   0.00     0.00   0.00   0.00    0.00    0.00      0.00
PID PPID USER STAT VSZ %MEM %CPU COMMAND
1544 1 root S 5824 0.2 0.0 /usr/sbin/snmpd -c /config/snmpd.conf
1530 1 squid S 2880 0.1 0.0 /usr/sbin/ripd -d
1524 1 squid S 2740 0.1 0.0 /usr/sbin/zebra -d
1 0 root S 2384 0.1 0.0 /bin/sh /init
1576 1115 root S ...

Denys Fedoryshchenko <denys@visp.net.lb> :
400 rx + 400 tx or 200 rx + 200 tx ?

Can you send an ethtool -S (+ ifconfig) of the 8169 if it misses packets
as well as the lines of dmesg which relate to the r8169 driver ?

--
Ueimor
--
On this host: 275 Mbps TX right now, 152 Mbps RX.
After 3 minutes of uptime:
eth0 Link encap:Ethernet HWaddr 00:18:F8:0B:46:A6
inet addr:192.168.20.10 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:9510755 errors:0 dropped:400 overruns:0 frame:0
TX packets:9601889 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10000
RX bytes:3768549053 (3.5 GiB) TX bytes:2251698126 (2.0 GiB)
Interrupt:21 Base address:0x4000
MegaRouter-KARAM ~ # ethtool -S eth0
NIC statistics:
tx_packets: 10336831
rx_packets: 10191781
tx_errors: 0
rx_errors: 0
rx_missed: 436
align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 10183249
broadcast: 971
multicast: 7561
tx_aborted: 0
tx_underrun: 0
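To put the rx_missed counter above in proportion, the counters can be parsed and turned into a loss ratio. This is a minimal sketch, assuming the `name: value` layout shown above. The helper names are hypothetical, and treating rx_missed as packets that are not already included in rx_packets is an assumption about how the driver counts.

```python
# Subset of the "ethtool -S eth0" output quoted above.
ETHTOOL_OUTPUT = """\
NIC statistics:
tx_packets: 10336831
rx_packets: 10191781
tx_errors: 0
rx_errors: 0
rx_missed: 436
"""


def parse_ethtool_stats(text: str) -> dict:
    """Turn 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything non-numeric.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_missed_ratio(stats: dict) -> float:
    """Fraction of arriving packets that the NIC reported as missed.

    Assumption: missed packets are counted separately from rx_packets.
    """
    missed = stats["rx_missed"]
    total = stats["rx_packets"] + missed
    return missed / total if total else 0.0


stats = parse_ethtool_stats(ETHTOOL_OUTPUT)
print(f"rx_missed ratio: {rx_missed_ratio(stats):.4%}")
```

The ratio is small in absolute terms, but for a router even a loss of this size is visible to TCP flows, which is why the counter is worth tracking over time rather than as a single snapshot.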
MegaRouter-KARAM ~ # mpstat -P ALL 1
Linux 2.6.26-rc6-git2-build-0029 (MegaRouter-KARAM) 06/16/08
00:32:08  CPU  %user  %nice   %sys  %iowait   %irq  %soft  %steal   %idle    intr/s
00:32:09  all   0.50   0.00   1.49     0.00   1.49  27.23    0.00   69.31  76659.41
00:32:09    0   1.01   0.00   0.00     0.00   0.00  43.43    0.00   55.56  61549.50
00:32:09    1   0.00   0.00   1.98     0.00   2.97  10.89    0.00   84.16  15102.97
00:32:09    2   0.00   0.00   0.00     0.00   0.00   0.00    0.00
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--
Very sorry, forgot dmesg
[ 3.070955] r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
[ 3.070972] ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 21 (level, low) -> IRQ 21
[ 3.071582] eth0: RTL8110s at 0xf8894000, 00:18:f8:0b:46:a6, XID 04000000 IRQ 21
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--
On Mon, 16 Jun 2008 00:46:22 +0100

You are CPU limited because of the overhead of firewalling. When this happens ...

No, if the receive side is CPU limited, you just end up eating more memory.

How are you measuring CPU? You need to do something like measure the available
cycles left for applications. Don't believe top or other measures that may ...

Routing and firewalling should scale well. The deadlock is probably going ...

The bigger issue is available memory bandwidth. Different processors and busses
have different overheads. PCI is much worse than PCI-express, and CPUs with
integrated memory controllers do much better than CPUs with a separate memory
controller (like Core 2).
--
I tried to increase net.core.netdev_max_backlog; it doesn't help, and it doesn't
change anything at all.

But it looks like this: if I have 200 Mbps RX with an average packet of 500 bytes,
I have a 50 Kpps rate. The RX ring is 256 descriptors, and every 1 ms 50 packets
arrive. If the poll is more than about 5 ms late, I miss packets. The same happens
if it doesn't complete all packets in one softirq cycle.
Probably I understand something (or everything) wrong.

But firewalling should not be a big deal, since I am not using anything "heavy"
like L7 filtering. But I will try to optimize the rules, like I did once with the
u32 hash... so most packets will not pass a "long chain". And, there is around 29 ...

That's a very good idea.

e1000 / AMD - cache size : 1024 KB

Probably mpstat gives correct results? I never use top, other than to find a clear
CPU hog among userspace apps.

Router-Dora ~ # mpstat 1
Linux 2.6.26-rc6-git2-build-0029 (Router-Dora) 06/16/08
06:31:19  CPU  %user  %nice   %sys  %iowait   %irq  %soft  %steal   %idle    intr/s
06:31:20  all   0.00   0.00   0.00     0.00   1.51   8.04    0.00   90.45  13570.30
06:31:21  all   0.00   0.00   0.00     0.00   2.49   9.95    0.00   87.56  13986.00
06:31:22  all   0.00   0.00   0.50     0.00   2.49   9.45    0.00

I tried to change the tx queue length. If I make it too small, it will just drop
packets _silently_. They will not be shown in netstat -s, nor in the ifconfig stats.

I think in this case the r8169 is routed over a PCI-PCIExpress bridge, the other
card is PCIExpress, and there is nothing else on PCI (other than an IDE controller,
which is not used at all). Yes, it is bad, but it still must be 133 Mbyte/s
(1064 Mbit/s). Yes I know ...

Well, the Realtek 8169 doesn't support changing the ring size, and doesn't support
changing coalesce parameters. By the way, e1000 also doesn't support -C, but e1000e ...

Yes, but in my case Core 2 does the heavier job much better, probably because of a
larger cache or some voodoo magic. The biggest issue: in this country it is not
possible to find PCI-Express network ...
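The ring-buffer arithmetic in the message above (200 Mbps, 500-byte packets, 256 descriptors) and the PCI figure can be sketched as follows. This is a back-of-the-envelope model, not a description of what the driver actually does: it assumes a constant packet size, a steady arrival rate, and a ring that is empty right after each poll. The function names are made up for illustration. The 133 Mbyte/s value is the theoretical peak of a 32-bit, 33 MHz PCI bus.

```python
def packets_per_second(rate_mbps: float, avg_packet_bytes: int) -> float:
    """Packet rate for a given bit rate and average packet size."""
    return rate_mbps * 1_000_000 / (avg_packet_bytes * 8)


def ring_budget_ms(descriptors: int, pps: float) -> float:
    """Time until an initially empty RX ring is full if nobody polls it."""
    return descriptors / pps * 1000.0


def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Convert a bus throughput from Mbyte/s to Mbit/s."""
    return mbyte_per_s * 8


pps = packets_per_second(200, 500)
print(f"packet rate: {pps:.0f} pps")

# Compare the default ring with the enlarged one mentioned in the thread.
for ring in (256, 4096):
    print(f"{ring:5d} descriptors: ring full after {ring_budget_ms(ring, pps):.2f} ms")

print(f"PCI peak: {mbyte_to_mbit(133):.0f} Mbit/s")
```

With these inputs a 256-entry ring tolerates a poll delay of roughly 5 ms, while 4096 entries stretch that to roughly 82 ms, which matches the observation that enlarging the ring hides the problem without removing its cause.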
