After upgrading from 2.6.29.1 to 2.6.30.4, I'm getting these warnings in dmesg. Let me know if you need more info. Other than that, the device seems to be stable.

------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Hardware name: Altos G510
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs tun bitrev crc32 bonding lm75 adm9240 adm1026 hwmon_vid hwmon i2c_piix4 i2c_core dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod e1000 tg3 libphy
Pid: 2222, comm: openvpn Not tainted 2.6.30.4-1 #4
Call Trace:
 [<4011f018>] ? warn_slowpath_common+0x5e/0x8a
 [<4011f04e>] ? warn_slowpath_null+0xa/0xc
 [<402b5d64>] ? inet_sock_destruct+0x122/0x13a
 [<402796a4>] ? sk_free+0x10/0xa7
 [<402b5964>] ? inet_release+0x3f/0x44
 [<4027751b>] ? sock_release+0x11/0x52
 [<40277575>] ? sock_close+0x19/0x1c
 [<40164026>] ? __fput+0xa6/0x149
 [<40161989>] ? filp_close+0x4e/0x54
 [<401619f5>] ? sys_close+0x66/0x9c
 [<401027c8>] ? sysenter_do_call+0x12/0x26
---[ end trace bdfe445acbab5307 ]---
------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Hardware name: Altos G510
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs tun bitrev crc32 bonding lm75 adm9240 adm1026 hwmon_vid hwmon i2c_piix4 i2c_core dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod e1000 tg3 libphy
Pid: 2222, comm: openvpn Tainted: G W 2.6.30.4-1 #4
Call Trace:
 [<4011f018>] ? warn_slowpath_common+0x5e/0x8a
 [<4011f04e>] ? warn_slowpath_null+0xa/0xc
 [<402b5d64>] ? inet_sock_destruct+0x122/0x13a
 [<402796a4>] ? sk_free+0x10/0xa7
 [<402b5964>] ? inet_release+0x3f/0x44
 [<4027751b>] ? sock_release+0x11/0x52
 [<40277575>] ? sock_close+0x19/0x1c
 [<40164026>] ? __fput+0xa6/0x149
 [<40161989>] ? filp_close+0x4e/0x54
 [<401619f5>] ? sys_close+0x66/0x9c
 [<401027c8>] ? sysenter_do_call+0x12/0x26
---[ end trace bdfe445acbab5308 ]---
------------[ cut here ...
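Since the same warning repeats many times in the log, it can help to collapse the traces into counts per warning site and process before comparing reports. The following is only a rough sketch: the function name `summarize_warnings`, the regular expressions, and the shortened `SAMPLE` text are illustrative assumptions, not part of any kernel tooling, and the patterns only cover the 2.6.30-style lines quoted in this report.

```python
import re
from collections import Counter

# Patterns match the 2.6.30-style lines quoted in this thread (an assumption;
# other kernel versions format these lines differently).
WARN_RE = re.compile(r"WARNING: at (\S+:\d+) (\S+)\(\)")
COMM_RE = re.compile(r"Pid: \d+, comm: (\S+)")


def summarize_warnings(log_text):
    """Count WARN traces by (source location, function+offset, process name)."""
    counts = Counter()
    # Each trace starts after a "[ cut here ]" marker; skip text before the first.
    for block in log_text.split("[ cut here ]")[1:]:
        warn = WARN_RE.search(block)
        if warn is None:
            continue  # truncated or unrelated block
        comm = COMM_RE.search(block)
        key = (warn.group(1), warn.group(2), comm.group(1) if comm else "?")
        counts[key] += 1
    return counts


# Shortened copy of the two traces from the report, for illustration only.
SAMPLE = """------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Pid: 2222, comm: openvpn Not tainted 2.6.30.4-1 #4
---[ end trace bdfe445acbab5307 ]---
------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Pid: 2222, comm: openvpn Tainted: G W 2.6.30.4-1 #4
---[ end trace bdfe445acbab5308 ]---
"""

if __name__ == "__main__":
    for key, n in summarize_warnings(SAMPLE).items():
        print(n, *key)
```

Feeding it the full output of `dmesg` instead of `SAMPLE` would show whether every trace comes from the same site and the same process, or whether several sockets are affected.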
Thanks for the report, Tomasz.

There's a good chance e51a67a9c8a2ea5c563f8c2ba6613fe2100ffe67 from the current mainline will fix this problem.

Dave, Eric's fix might be a candidate for -stable. The symptom is usually a WARN, but the impact is significant.

-- John
--
From: John Dykstra <john.dykstra1@gmail.com>

Hmmm, I'll double-check. I thought I had submitted this one. Thanks for the heads up.
--
Hmm, I don't see how this patch could solve Tomasz's case, since commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) was not part of 2.6.30.4, AFAIK.

This is the WARN_ON(sk->sk_forward_alloc) that triggers. Sounds like a truesize mismatch rather than a sk_refcount one?
--
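To make the truesize-mismatch hypothesis concrete, here is a toy user-space model of the accounting, not kernel code. It assumes memory is reserved in fixed-size quanta, that each buffer is charged and later uncharged by its `truesize`, and that whole quanta are handed back before the socket is destroyed. The names `Sock`, `Skb`, `charge`, `uncharge`, `reclaim`, `residual_after` and the 4096-byte `QUANTUM` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

QUANTUM = 4096  # assumed accounting quantum (one page); illustrative only


@dataclass
class Skb:
    truesize: int  # bytes this buffer is accounted for


@dataclass
class Sock:
    forward_alloc: int = 0  # stands in for sk->sk_forward_alloc
    rmem_alloc: int = 0     # stands in for the receive-memory counter


def charge(sk, skb):
    """Reserve whole quanta as needed, then charge the buffer to the socket."""
    while sk.forward_alloc < skb.truesize:
        sk.forward_alloc += QUANTUM
    sk.forward_alloc -= skb.truesize
    sk.rmem_alloc += skb.truesize


def uncharge(sk, skb):
    """Give back whatever truesize the buffer reports at free time."""
    sk.rmem_alloc -= skb.truesize
    sk.forward_alloc += skb.truesize


def reclaim(sk):
    """Return whole quanta to the pool; only a sub-quantum remainder stays."""
    sk.forward_alloc %= QUANTUM


def residual_after(charged_size, freed_size):
    """Charge one buffer, optionally change its truesize, free it, reclaim."""
    sk, skb = Sock(), Skb(truesize=charged_size)
    charge(sk, skb)
    skb.truesize = freed_size  # models truesize changing while queued
    uncharge(sk, skb)
    reclaim(sk)
    return sk.forward_alloc  # the destructor warns if this is non-zero


if __name__ == "__main__":
    print("balanced:", residual_after(1000, 1000))
    print("mismatch:", residual_after(1000, 1256))
```

In this model a balanced charge and uncharge always leaves a multiple of the quantum, which reclaim reduces to zero. If the buffer's size changes between the two steps, the difference survives reclaim and a check equivalent to WARN_ON(sk->sk_forward_alloc) would fire at destruct time, without any reference-count problem being involved.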
BTW, I've seen the same issue in 2.6.28 and 2.6.29 while doing a bunch of NFS-over-UDP testing. I've seen the issue reported in 2.6.27 as well, but it went ignored.

It's not easy to reproduce, as it seems to require quite a bit of traffic over multiple interfaces. I've been looking at this for a while and haven't caught the bugger.

Here is the stack trace from 2.6.28:

May 13 16:17:38 dl380g6-2 kernel: [ 4473.086015] ------------[ cut here ]------------
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086017] WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x15d/0x182()
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086019] Modules linked in: sctp libcrc32c sg edd nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic crypto_null af_key loop serio_raw psmouse hpilo shpchp pci_hotplug container button evdev ext3 jbd mbcache ses enclosure sd_mod crc_t10dif usbhid hid ehci_hcd uhci_hcd mptsas mptscsih mptbase scsi_transport_sas bnx2 zlib_inflate cciss scsi_mod thermal processor fan thermal_sys [last unloaded: ipmi_msghandler]
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086053] Pid: 4570, comm: nfsd Not tainted 2.6.28-clim-9-amd64 #1
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086055] Call Trace:
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086062] [<ffffffff8024307f>] warn_on_slowpath+0x58/0x7d
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086066] [<ffffffff804b5ada>] ? _spin_unlock_irq+0x1c/0x35
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086069] [<ffffffff8024813f>] ? local_bh_disable+0xe/0x10
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086072] [<ffffffff804b58af>] ? _spin_lock_bh+0x23/0x29
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086074] [<ffffffff8024826a>] ? local_bh_enable+0x88/0xa1
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086076] [<ffffffff8024813f>] ? local_bh_disable+0xe/0x10
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086078] ...
I've been unable to reproduce it so far. Has bonding always been present in the cases you've seen, or are multiple independent interfaces sufficient? In the case you reported initially, openvpn was using UDP, but the peer was dead, so there presumably wasn't much traffic from that app. Was there lots of NFS-over-UDP traffic also going on? Where was the independent report on 2.6.27? -- John --
The openvpn case wasn't mine. I didn't use any vpn traffic. Just 2 systems back-to-back with NFS traffic between them.

http://article.gmane.org/gmane.linux.nfs/22887
http://kerneltrap.org/mailarchive/linux-netdev/2008/11/26/4244994
--
There was quite a bit of NFS, but over TCP. The other type of traffic was iSCSI (made with tgt as a target). -- Tomasz Chmielewski http://wpkg.org --
Is this an openvpn server or client? Does the warning fire on each authentication, or while the tunnel is up? Are you tunneling over UDP or TCP? Your kernel config, network configuration details and openvpn configuration might be useful. Thanks! -- John --
Hmm, neither?
openvpn --remote 192.168.111.164 --dev tun --ifconfig 192.168.3.52 192.168.3.51 --verb 1 --comp-lzo \
--resolv-retry 999999 --ping-restart 120 --ping 6 --port 5123 --fragment 1400 --mssfix 1400 \
It was using UDP.
I killed it (openvpn) now, as I don't need it anymore.
bonding, round-robin mode, miimon 100.
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30.4
# Sat Aug 1 20:12:19 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General ...