Oleksandr Samoylyk wrote:
quoted text > Dear community,
>
> I'm running a busy pptp-server which serves about 1500 users on about 80
> Mbit/s (> 15000 packets/sec) through it. I use poptop, traffic shaper
> and basic firewall rules.
>
> As for hardware it's a server on SuperMicro Server on X7DB8+ serverboard
> server with 8 GB RAM on it running Ubuntu Server Edition.
>
> As for problem, sometimes I get the following erors in logs:
> Mar 3 15:21:14 vpn kernel: [ 0.000000] BUG: soft lockup - CPU#7
> stuck for 11s! [events/7:34]
> Mar 3 15:21:14 vpn kernel: [ 0.000000] CPU 7:
> Mar 3 15:21:14 vpn kernel: [ 0.000000] Modules linked in: af_packet
> xt_tcpmss iptable_mangle act_police sch_ingress cls_u32 sch_sfq sch_htb
> e1000 xt_multiport xt_TCPMSS xt_state xt_limit xt_tcpudp
> nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_irc nf_conntrack_irc
> nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat nf_conntrack_ipv4
> nf_conntrack e1000e iptable_filter ip_tables x_tables tcp_cubic pptp
> pppox ppp_generic slhc parport_pc lp parport loop iTCO_wdt
> iTCO_vendor_support pcspkr shpchp i5000_edac pci_hotplug evdev edac_core
> ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix pata_acpi aic94xx
> libsas ata_generic libata scsi_transport_sas scsi_mod fuse
> Mar 3 15:21:14 vpn kernel: [ 0.000000] Pid: 34, comm: events/7 Not
> tainted 2.6.24-11-server #1
> Mar 3 15:21:14 vpn kernel: [ 0.000000] RIP:
> 0010:[libata:_spin_unlock_irqrestore+0x8/0x26222]
> [libata:_spin_unlock_irqrestore+0x8/0x26222]
> _spin_unlock_irqrestore+0x8/0x10
> Mar 3 15:21:14 vpn kernel: [ 0.000000] RSP: 0018:ffff810225a779a0
> EFLAGS: 00000246
> Mar 3 15:21:14 vpn kernel: [ 0.000000] RAX: 0000000000000000 RBX:
> ffffc20001a1ab80 RCX: 0000000000000000
> Mar 3 15:21:14 vpn kernel: [ 0.000000] RDX: 0000000000000014 RSI:
> 0000000000000246 RDI: ffff8102249d5aa8
> Mar 3 15:21:14 vpn kernel: [ 0.000000] RBP: ffff810225a77920 R08:
> 0000000000001000 R09: 0000000002000000
> Mar 3 15:21:14 vpn kernel: [ 0.000000] R10: 0000000000000000 R11:
> ffff81021ac01300 R12: ffffffff8020cfb6
> Mar 3 15:21:14 vpn kernel: [ 0.000000] R13: 0000000000000000 R14:
> ffff8102249d5a80 R15: 0000000000000010
> Mar 3 15:21:14 vpn kernel: [ 0.000000] FS: 0000000000000000(0000)
> GS:ffff81022590cd00(0000) knlGS:0000000000000000
> Mar 3 15:21:14 vpn kernel: [ 0.000000] CS: 0010 DS: 0018 ES: 0018
> CR0: 000000008005003b
> Mar 3 15:21:14 vpn kernel: [ 0.000000] CR2: 00007fa2778f7315 CR3:
> 0000000000201000 CR4: 00000000000006e0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Mar 3 15:21:14 vpn kernel: [ 0.000000] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> Mar 3 15:21:14 vpn kernel: [ 0.000000] Call Trace:
> Mar 3 15:21:14 vpn kernel: [ 0.000000] <IRQ> [<ffffffff8825ab1c>]
> :e1000:e1000_xmit_frame+0x8fc/0xcb0
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [e1000:____versions+0x56f3/0x7340]
> :nf_conntrack_ipv4:ipv4_confirm+0x33/0x60
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [dev_hard_start_xmit+0x22c/0x2a0] dev_hard_start_xmit+0x22c/0x2a0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [__qdisc_run+0x74/0x1f0]
> __qdisc_run+0x74/0x1f0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [dev_queue_xmit+0x1e8/0x300]
> dev_queue_xmit+0x1e8/0x300
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [pptp:pptp_xmit+0x567/0x7e0]
> :pptp:pptp_xmit+0x567/0x7e0
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [ppp_generic:ppp_push+0x45c/0x620] :ppp_generic:ppp_push+0x45c/0x620
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [<ffffffff8827b235>]
> :sch_htb:htb_safe_rb_erase+0x15/0x50
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [<ffffffff8827b635>]
> :sch_htb:htb_deactivate_prios+0x1a5/0x250
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [ppp_generic:ppp_xmit_process+0x469/0x640]
> :ppp_generic:ppp_xmit_process+0x469/0x640
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [ppp_generic:ppp_start_xmit+0x151/0x1e0]
> :ppp_generic:ppp_start_xmit+0x151/0x1e0
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [dev_hard_start_xmit+0x22c/0x2a0] dev_hard_start_xmit+0x22c/0x2a0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [__qdisc_run+0x74/0x1f0]
> __qdisc_run+0x74/0x1f0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [dev_queue_xmit+0x1e8/0x300]
> dev_queue_xmit+0x1e8/0x300
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [ip_rcv_finish+0x114/0x3b0]
> ip_rcv_finish+0x114/0x3b0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [ip_rcv+0x212/0x300]
> ip_rcv+0x212/0x300
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [e1000:netif_receive_skb+0x3ac/0xd80] netif_receive_skb+0x3ac/0x490
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [<ffffffff8825c7a4>]
> :e1000:e1000_clean_rx_irq+0x1c4/0x590
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [<ffffffff88259609>]
> :e1000:e1000_clean+0x69/0x250
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [net_rx_action+0x128/0x230]
> net_rx_action+0x128/0x230
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [__do_softirq+0x75/0xe0]
> __do_softirq+0x75/0xe0
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [call_softirq+0x1c/0x30]
> call_softirq+0x1c/0x30
> Mar 3 15:21:14 vpn kernel: [ 0.000000] <EOI>
> [pptp:_ack_timeout_work+0x0/0x20] :pptp:_ack_timeout_work+0x0/0x20
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [do_softirq+0x35/0x90]
> do_softirq+0x35/0x90
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
> [local_bh_enable_ip+0x54/0x60] local_bh_enable_ip+0x54/0x60
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [run_workqueue+0xcc/0x170]
> run_workqueue+0xcc/0x170
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [worker_thread+0x0/0x110]
> worker_thread+0x0/0x110
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [worker_thread+0x0/0x110]
> worker_thread+0x0/0x110
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [worker_thread+0xa3/0x110]
> worker_thread+0xa3/0x110
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [<ffffffff802543d0>]
> autoremove_wake_function+0x0/0x30
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [worker_thread+0x0/0x110]
> worker_thread+0x0/0x110
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [worker_thread+0x0/0x110]
> worker_thread+0x0/0x110
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [kthread+0x4b/0x80]
> kthread+0x4b/0x80
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [child_rip+0xa/0x12]
> child_rip+0xa/0x12
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [kthread+0x0/0x80]
> kthread+0x0/0x80
> Mar 3 15:21:14 vpn kernel: [ 0.000000] [child_rip+0x0/0x12]
> child_rip+0x0/0x12
> Mar 3 15:21:14 vpn kernel: [ 0.000000]
>
> One of my CPUs goes 100% with ksoftirqd.
sysreq-t?
quoted text > Latency increases signifcantly during this this (more than >100ms).
>
>
> ethtool -G eth0 tx 4096 rx 4096
> ethtool -G eth1 tx 4096 rx 4096
unless you're not seeing rx_no_buffer_count increase in the `ethtool -S ethx`
increase this is unnecesary
quoted text > echo "8192" > /proc/sys/vm/min_free_kbytes
ditto
quoted text > I attached IRQ to different CPUs by:
> /bin/echo "0f" > /proc/irq/2297/smp_affinity
> /bin/echo "f0" > /proc/irq/2296/smp_affinity
>
> root@vpn:~# cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
> CPU6 CPU7
> 0: 72 0 0 0 0 0
> 0 0 XT-PIC-XT timer
> 1: 2 0 0 0 0 0
> 0 0 XT-PIC-XT i8042
> 2: 0 0 0 0 0 0
> 0 0 XT-PIC-XT cascade
> 8: 1 0 0 0 0 0
> 0 0 XT-PIC-XT rtc
> 11: 9117 0 0 0 0 0
> 0 0 XT-PIC-XT aic94xx
> 14: 84 0 0 0 0 0
> 0 0 XT-PIC-XT libata
> 15: 0 0 0 0 0 0
> 0 0 XT-PIC-XT libata
> 2296: 76853 76976 76639 76692 248073 247957
> 248079 248084 PCI-MSI-edge eth1
> 2297: 266795 266672 267009 266956 93920 94035
> 93914 93908 PCI-MSI-edge eth0
your cpu is round-robinning network interrupts which is bad, turn the in-kernel
irqbalance feature OFF and use the userspace irqbalance tool instead
quoted text > ifconfig eth0 txqueuelen 1000
> ifconfig eth1 txqueuelen 1000
>
> I wanted to switch to e1000e driver instead of e1000 as I've in kernel:
>
> CONFIG_E1000=m
> CONFIG_E1000E=m
> # CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
> CONFIG_E1000_NAPI=y
>
> I'm trying to load it by:
> modprobe -r e1000; modprobe e1000e; /etc/init.d/networking restart
>
> but NICs aren't detected :(
>
> Why so?
because that device is not going to be supported by e1000e until 2.6.25. you're
using 2.6.24 so you need to use e1000 just one kernel release longer.
quoted text > I have (lspci -vvv output):
>
> 06:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> Subsystem: Super Micro Computer Inc Unknown device 0000
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+ Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 32 bytes
> Interrupt: pin A routed to IRQ 2297
> Region 0: Memory at d8300000 (32-bit, non-prefetchable) [size=128K]
> Region 2: I/O ports at 3000 [size=32]
> Capabilities: [c8] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> Address: 00000000fee0f00c Data: 41b9
> Capabilities: [e0] Express Endpoint IRQ 0
> Device: Supported: MaxPayload 256 bytes, PhantFunc 0,
> ExtTag-
> Device: Latency L0s <512ns, L1 <64us
> Device: AtnBtn- AtnInd- PwrInd-
> Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
> Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
> Link: Supported Speed 2.5Gb/s, Width x4, ASPM unknown,
> Port 0
> Link: Latency L0s <128ns, L1 <64us
> Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
> Link: Speed 2.5Gb/s, Width x4
>
> 06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit
> Ethernet Controller (Copper) (rev 01)
> Subsystem: Super Micro Computer Inc Unknown device 0000
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+ Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 32 bytes
> Interrupt: pin B routed to IRQ 2296
> Region 0: Memory at d8320000 (32-bit, non-prefetchable) [size=128K]
> Region 2: I/O ports at 3020 [size=32]
> Capabilities: [c8] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> Address: 00000000feef000c Data: 41c1
> Capabilities: [e0] Express Endpoint IRQ 0
> Device: Supported: MaxPayload 256 bytes, PhantFunc 0,
> ExtTag-
> Device: Latency L0s <512ns, L1 <64us
> Device: AtnBtn- AtnInd- PwrInd-
> Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
> Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
> Link: Supported Speed 2.5Gb/s, Width x4, ASPM unknown,
> Port 0
> Link: Latency L0s <128ns, L1 <64us
> Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
> Link: Speed 2.5Gb/s, Width x4
>
> So I still have to use e1000 with InterruptThrottleRate=0,0
> RxIntDelay=0,0 RxAbsIntDelay=0,0 TxIntDelay=0,0 TxAbsIntDelay=0,0:
just use the defaults for e1000 please, unless those defaults don't work - and
then it's a bug.
I don't think e1000 is the issue here, I've definately not seen any reports of
e1000+pptp giving issues. perhaps one of the pptp/pppd developers can look at the
stack trace.
Auke
--
unsubscribe notice To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
majordomo@vger.kernel.org
More majordomo info at
http://vger.kernel.org/majordomo-info.html
Please read the FAQ at
http://www.tux.org/lkml/
Messages in current thread:
Re: e1000 latency problem or what ? , Kok, Auke , (Mon Mar 3, 5:48 pm)