Re: WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()

Previous thread: [Patch net-next]atl11:set MAX_TX_OFFLOAD_THRESH to 6k from 9k and fix spelling error by jie.yang on Sunday, August 2, 2009 - 10:43 pm. (2 messages)

Next thread: reg IFF_RUNNING behaviour by durgam@it.iitb.ac.in phani on Monday, August 3, 2009 - 2:26 am. (1 message)
From: Tomasz Chmielewski
Date: Monday, August 3, 2009 - 1:30 am

After upgrading from 2.6.29.1 to 2.6.30.4, I'm getting these warnings in dmesg.

Let me know if you need more info.

Other than that, the device seems to be working stably.

------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Hardware name: Altos G510
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs tun bitrev crc32 bonding lm75 adm9240 adm1026 hwmon_vid hwmon i2c_piix4 i2c_core dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod e1000 tg3 libphy
Pid: 2222, comm: openvpn Not tainted 2.6.30.4-1 #4
Call Trace:
 [<4011f018>] ? warn_slowpath_common+0x5e/0x8a
 [<4011f04e>] ? warn_slowpath_null+0xa/0xc
 [<402b5d64>] ? inet_sock_destruct+0x122/0x13a
 [<402796a4>] ? sk_free+0x10/0xa7
 [<402b5964>] ? inet_release+0x3f/0x44
 [<4027751b>] ? sock_release+0x11/0x52
 [<40277575>] ? sock_close+0x19/0x1c
 [<40164026>] ? __fput+0xa6/0x149
 [<40161989>] ? filp_close+0x4e/0x54
 [<401619f5>] ? sys_close+0x66/0x9c
 [<401027c8>] ? sysenter_do_call+0x12/0x26
---[ end trace bdfe445acbab5307 ]---
------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
Hardware name: Altos G510
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs tun bitrev crc32 bonding lm75 adm9240 adm1026 hwmon_vid hwmon i2c_piix4 i2c_core dm_crypt dm_mirror dm_region_hash dm_log dm_snapshot dm_mod e1000 tg3 libphy
Pid: 2222, comm: openvpn Tainted: G        W  2.6.30.4-1 #4
Call Trace:
 [<4011f018>] ? warn_slowpath_common+0x5e/0x8a
 [<4011f04e>] ? warn_slowpath_null+0xa/0xc
 [<402b5d64>] ? inet_sock_destruct+0x122/0x13a
 [<402796a4>] ? sk_free+0x10/0xa7
 [<402b5964>] ? inet_release+0x3f/0x44
 [<4027751b>] ? sock_release+0x11/0x52
 [<40277575>] ? sock_close+0x19/0x1c
 [<40164026>] ? __fput+0xa6/0x149
 [<40161989>] ? filp_close+0x4e/0x54
 [<401619f5>] ? sys_close+0x66/0x9c
 [<401027c8>] ? sysenter_do_call+0x12/0x26
---[ end trace bdfe445acbab5308 ]---
------------[ cut here ...

From: John Dykstra
Date: Monday, August 3, 2009 - 5:38 pm

Thanks for the report, Tomasz.  

There's a good chance e51a67a9c8a2ea5c563f8c2ba6613fe2100ffe67 from the
current mainline will fix this problem.

Dave, Eric's fix might be a candidate for -stable.  The symptom is
usually a WARN, but the impact is significant.

  --  John

--

From: David Miller
Date: Monday, August 3, 2009 - 9:20 pm

From: John Dykstra <john.dykstra1@gmail.com>

Hmmm, I'll double-check.  I thought I had submitted this one.

Thanks for the heads up.
--

From: Eric Dumazet
Date: Monday, August 3, 2009 - 11:18 pm

Hmm, I don't see how this patch could solve Tomasz's case, since
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
was not part of 2.6.30.4, AFAIK.

This is the WARN_ON(sk->sk_forward_alloc) that triggers...

Sounds like a truesize mismatch rather than an sk_refcnt one?
--

From: Vlad Yasevich
Date: Wednesday, August 12, 2009 - 1:00 pm

BTW, I've seen the same issue in 2.6.28 and 2.6.29 while doing a bunch
of NFS-over-UDP testing.  I've seen the issue reported against 2.6.27 as
well, but it was ignored.  It's not easy to reproduce, as it seems to
require quite a bit of traffic over multiple interfaces.

I've been looking at this for a while and haven't caught the bugger.

Here is the stack trace from 2.6.28:

May 13 16:17:38 dl380g6-2 kernel: [ 4473.086015] ------------[ cut here ]------------
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086017] WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x15d/0x182()
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086019] Modules linked in: sctp libcrc32c sg edd nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha256_generic sha1_generic crypto_null af_key loop serio_raw psmouse hpilo shpchp pci_hotplug container button evdev ext3 jbd mbcache ses enclosure sd_mod crc_t10dif usbhid hid ehci_hcd uhci_hcd mptsas mptscsih mptbase scsi_transport_sas bnx2 zlib_inflate cciss scsi_mod thermal processor fan thermal_sys [last unloaded: ipmi_msghandler]
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086053] Pid: 4570, comm: nfsd Not tainted 2.6.28-clim-9-amd64 #1
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086055] Call Trace:
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086062]  [<ffffffff8024307f>] warn_on_slowpath+0x58/0x7d
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086066]  [<ffffffff804b5ada>] ? _spin_unlock_irq+0x1c/0x35
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086069]  [<ffffffff8024813f>] ? local_bh_disable+0xe/0x10
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086072]  [<ffffffff804b58af>] ? _spin_lock_bh+0x23/0x29
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086074]  [<ffffffff8024826a>] ? local_bh_enable+0x88/0xa1
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086076]  [<ffffffff8024813f>] ? local_bh_disable+0xe/0x10
May 13 16:17:38 dl380g6-2 kernel: [ 4473.086078]  ...

From: John Dykstra
Date: Thursday, August 13, 2009 - 8:21 am

I've been unable to reproduce it so far.  Has bonding always been
present in the cases you've seen, or are multiple independent interfaces
sufficient?

In the case you reported initially, openvpn was using UDP, but the peer
was dead, so there presumably wasn't much traffic from that app.  Was
there lots of NFS-over-UDP traffic also going on?

Where was the independent report on 2.6.27?

  --  John

--

From: Vlad Yasevich
Date: Thursday, August 13, 2009 - 10:04 am

The openvpn case wasn't mine.  I didn't have any VPN traffic, just two
systems connected back-to-back with NFS traffic between them.


http://article.gmane.org/gmane.linux.nfs/22887
http://kerneltrap.org/mailarchive/linux-netdev/2008/11/26/4244994


--

From: Tomasz Chmielewski
Date: Thursday, August 13, 2009 - 11:16 am

There was quite a bit of NFS, but over TCP.

The other type of traffic was iSCSI (with tgt as the target).


-- 
Tomasz Chmielewski
http://wpkg.org
--

From: John Dykstra
Date: Tuesday, August 4, 2009 - 1:04 pm

Is this an openvpn server or client?  

Does the warning fire on each authentication, or while the tunnel is
up?  

Are you tunneling over UDP or TCP?

Your kernel config, network configuration details and openvpn
configuration might be useful.

Thanks!

  -- John

--

From: Tomasz Chmielewski
Date: Tuesday, August 4, 2009 - 1:35 pm

Hmm, neither?

openvpn --remote 192.168.111.164 --dev tun --ifconfig 192.168.3.52 192.168.3.51 --verb 1 --comp-lzo \
        --resolv-retry 999999 --ping-restart 120 --ping 6 --port 5123 --fragment 1400 --mssfix 1400 \


It was using UDP.
I've killed it (openvpn) now, as I don't need it anymore.



bonding, round-robin mode, miimon 100.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30.4
# Sat Aug  1 20:12:19 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General ...