ethernet got weird after 2.6.29

Submitted by Anonymous
on June 19, 2009 - 2:42pm

Hello,

Sorry, I'm not a low-level programmer so I'm kinda lost on what's happening...

I updated the kernel of a Linux router to 2.6.29 today and got this dump a few minutes after the server booted.

Jun 19 14:20:34 firewall64 ------------[ cut here ]------------
Jun 19 14:20:34 firewall64 WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x122/0x1ca()
Jun 19 14:20:34 firewall64 Hardware name:
Jun 19 14:20:34 firewall64 NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Jun 19 14:20:34 firewall64 Modules linked in: nf_nat_ftp nf_conntrack_ftp xt_limit xt_multiport xt_pkttype xt_state xt_DSCP xt_TCPMSS xt_tcpmss xt_tcpudp cls_route cls_u32 cls_fw sch_sfq sch_htb ipt_LOG ipt_MASQUERADE ipt_REDIRECT ipt_REJECT iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables ipv6 usbhid ata_piix ehci_hcd 8139cp uhci_hcd r8169 8139too 3c59x libata usbcore mii thermal processor button
Jun 19 14:20:34 firewall64 Pid: 0, comm: swapper Not tainted 2.6.29-gentoo-r5 #2
Jun 19 14:20:34 firewall64 Call Trace:
Jun 19 14:20:34 firewall64 [] warn_slowpath+0xd3/0x10f
Jun 19 14:20:34 firewall64 [] ? dev_queue_xmit+0x37a/0x46d
Jun 19 14:20:34 firewall64 [] ? ip_finish_output+0x212/0x254
Jun 19 14:20:34 firewall64 [] ? ip_output+0x9b/0xa0
Jun 19 14:20:34 firewall64 [] ? ip_forward_finish+0x3d/0x41
Jun 19 14:20:34 firewall64 [] ? ip_forward+0x2d4/0x343
Jun 19 14:20:34 firewall64 [] ? ip_rcv_finish+0x2e1/0x2fb
Jun 19 14:20:34 firewall64 [] ? ip_rcv+0x263/0x299
Jun 19 14:20:34 firewall64 [] ? _spin_unlock_bh+0xf/0x11
Jun 19 14:20:34 firewall64 [] ? nf_nat_cleanup_conntrack+0x61/0x65 [nf_nat]
Jun 19 14:20:34 firewall64 [] ? death_by_timeout+0x0/0xc3 [nf_conntrack]
Jun 19 14:20:34 firewall64 [] dev_watchdog+0x122/0x1ca
Jun 19 14:20:34 firewall64 [] ? destroy_conntrack+0xfa/0x101 [nf_conntrack]
Jun 19 14:20:34 firewall64 [] ? dev_watchdog+0x0/0x1ca
Jun 19 14:20:34 firewall64 [] run_timer_softirq+0x157/0x1c6
Jun 19 14:20:34 firewall64 [] ? ktime_get_ts+0x49/0x4e
Jun 19 14:20:34 firewall64 [] ? clockevents_program_event+0x77/0x80
Jun 19 14:20:34 firewall64 [] __do_softirq+0x83/0x123
Jun 19 14:20:34 firewall64 [] call_softirq+0x1c/0x28
Jun 19 14:20:34 firewall64 [] do_softirq+0x34/0x76
Jun 19 14:20:34 firewall64 [] irq_exit+0x3f/0x79
Jun 19 14:20:34 firewall64 [] smp_apic_timer_interrupt+0x93/0xac
Jun 19 14:20:34 firewall64 [] apic_timer_interrupt+0x13/0x20
Jun 19 14:20:34 firewall64 [] ? dequeue_task+0xad/0xb8
Jun 19 14:20:34 firewall64 [] ? mwait_idle+0x6e/0x73
Jun 19 14:20:34 firewall64 [] ? enter_idle+0x22/0x24
Jun 19 14:20:34 firewall64 [] ? cpu_idle+0x52/0x93
Jun 19 14:20:34 firewall64 [] ? rest_init+0x66/0x68
Jun 19 14:20:34 firewall64 ---[ end trace 3629ea280831ca2c ]---

After this dump, I periodically get these entries in /var/log/messages:
Jun 19 15:20:27 firewall64 r8169: eth0: link up
Jun 19 15:22:15 firewall64 r8169: eth0: link up
Jun 19 15:24:15 firewall64 r8169: eth0: link up
Jun 19 15:25:39 firewall64 r8169: eth0: link up
Jun 19 15:32:33 firewall64 r8169: eth0: link up
Jun 19 15:38:57 firewall64 r8169: eth0: link up

Unfortunately, each of these also comes with packet loss:
15:38:45 - 64 bytes from intelig (201.12.218.61): icmp_seq=1 ttl=64 time=17.1 ms
15:38:48 - Timeout
15:38:50 - Timeout
15:38:52 - Timeout
15:38:54 - Timeout
15:38:56 - Timeout
15:38:57 - 64 bytes from intelig (201.12.218.61): icmp_seq=1 ttl=64 time=19.3 ms

Having all these problems isn't really good for a router that has more than 2k customers behind it... Could anyone give me a hint about what might have happened so I can try to fix it? The server was working flawlessly before this update.

Thanks in advance

Mark

Just to add a little bit

Anonymous (not verified)
on
June 19, 2009 - 3:25pm

Just to add a little bit more of information:
My /proc/interrupts looks like this on 2.6.29:
           CPU0       CPU1
 16:    3003508          0   IO-APIC-fasteoi   uhci_hcd:usb4, eth1
 17:          0          0   IO-APIC-fasteoi   eth2
 25:    1665093    1517188   PCI-MSI-edge      eth0

But on 2.6.27 it looks like this (I just went back to 2.6.27 and it's working as intended):
 16:     138118    2195831   IO-APIC-fasteoi   uhci_hcd:usb4, eth0, eth1
 17:          0          0   IO-APIC-fasteoi   eth2

As you can see, on 2.6.27 eth0 and eth1 share IRQ 16, so I can't give each NIC its own CPU affinity, which is the main thing I need to achieve.
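
For reference, when a NIC has its own interrupt line (as eth0 does on IRQ 25 in the 2.6.29 listing), affinity is normally set by writing a hex CPU bitmask to /proc/irq/<n>/smp_affinity. A minimal sketch of that, assuming a POSIX shell with awk; irq_for_iface and cpu_mask are hypothetical helper names made up for this example, not existing tools:

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not standard tools).

# Print the IRQ number whose /proc/interrupts line lists the given interface.
# Reads /proc/interrupts-style text on stdin.
irq_for_iface() {
    awk -v ifc="$1" '{
        for (i = 2; i <= NF; i++) {
            name = $i
            gsub(/,/, "", name)      # devices sharing an IRQ are comma-separated
            if (name == ifc) {
                irq = $1
                sub(/:$/, "", irq)   # "25:" -> "25"
                print irq
                exit
            }
        }
    }'
}

# Print the hex bitmask that selects a single CPU (CPU0 -> 1, CPU1 -> 2, ...).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Example (as root), pinning eth0's interrupt to CPU1:
#   irq=$(irq_for_iface eth0 < /proc/interrupts)
#   cpu_mask 1 > "/proc/irq/$irq/smp_affinity"
```

Note that the mask applies to the whole IRQ line, so on a shared line (like IRQ 16 on 2.6.27) every device on it moves together. If a balancing daemon such as irqbalance is running, it may overwrite a manually set mask.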

Two conflicting modules loaded

on
June 19, 2009 - 11:43pm

8139cp (which supports the "C" chipset) and 8139too (the generic module) are both loaded, and I assume there are not multiple 8139 NICs of different types in your machine. In general, for a given Realtek 8139 chipset one of the modules works and the other does not. Both are loaded because both claim to support your NIC's PCI ID, and one of them is not being truthful.

Perhaps try something like:

rmmod 8139too
rmmod 8139cp
modprobe 8139cp
/etc/init.d/network restart

to first figure out which of the two modules actually supports your hardware. You can also do "lspci -vvv" to try to figure out if you have a "C" chipset or not.
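
Before unloading anything, it may help to check which driver is actually bound to the interface in question. A minimal sketch, assuming a kernel that exposes the /sys/class/net/<iface>/device/driver symlink and GNU readlink; driver_for_iface is a hypothetical helper name, and SYSFS_ROOT is a variable introduced here only so the function can be tried against a fake directory tree:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver bound to a network interface.
# SYSFS_ROOT is an assumption of this sketch and defaults to the real /sys.
driver_for_iface() {
    link="${SYSFS_ROOT:-/sys}/class/net/$1/device/driver"
    [ -L "$link" ] || return 1           # no driver symlink for this interface
    basename "$(readlink -f "$link")"    # the symlink target's last component is the driver name
}

# Example:
#   driver_for_iface eth0
```

The watchdog line in the original dump ("eth0 (r8169)") already names the driver for eth0, so this is mainly useful for eth1 and eth2.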

he uses a 1 Gbit card, not the 100 Mbit one

mangoo (not verified)
on
June 22, 2009 - 4:30am

He uses a 1 Gbit card (r8169 module), not a 100 Mbit one (8139* modules), so your instructions are incorrect.
