Patrick McManus wrote:

I believe Ingo's problems occur on long-lived sockets (where many bytes were exchanged between the peers), so I don't think DEFER_ACCEPT is the culprit.

I suggest enabling CONFIG_TIMER_STATS and checking the timers, because /proc/net/tcp can display apparently large timer values when the timer has already elapsed (jiffies > icsk->icsk_timeout): jiffies_to_clock_t(timer_expires - jiffies) then overflows while doing a multiply and a divide.

On a 64-bit server running linux-2.6.24-rc2, I can see *strange* timer values in /proc/net/tcp too, but no stuck TCP sessions. On 64 bits, these strange values contain 1AD7F:

    grep 1AD7F /proc/net/tcp | obfuscate_IP_and_ports
    2017: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000001 0 0 0 2 ffff81067e7520c0
    2019: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000003 0 0 0 2 ffff8106c580bcc0
    2029: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000002 0 0 0 2 ffff81067313fe40
    2032: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000003 0 0 0 2 ffff8106c716c340
    2039: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000002 0 0 0 2 ffff8107d45b3f40
    2041: local_peer remote_peer 03 00000000:00000000 01:1AD7F29AB37 00000000 0 0 0 2 ffff810718e221c0
    6610: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA 00000000 0 0 136594789 1 ffff8107183fb940 94 10 16 2 -1
    9925: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA 00000000 0 0 144451161 1 ffff8107051a9840 351 10 0 2 -1

On TCP_SYN_RECV (03) sockets, the timer can apparently have elapsed by many ticks, while on TCP_ESTABLISHED (01) sockets I get jiffies_to_clock_t(-1) -> 1AD7F29ABCA because of the way get_tcp4_sock() is coded (jiffies can change while this function is running). Note the 00: prefix, which means that no timer is pending in my case.
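To make the overflow concrete, here is a minimal userspace sketch of the multiply-and-divide path. It is not the kernel code itself: the helper name emulate_jiffies_to_clock_t is hypothetical, and the constants (HZ=1000, USER_HZ=100, TICK_NSEC=999848) are assumptions about the configuration, not something stated above.

```c
#include <stdint.h>

/* Assumed configuration (not stated in the mail): HZ=1000, USER_HZ=100. */
#define NSEC_PER_SEC 1000000000ULL
#define USER_HZ      100ULL
/* Assumed nanoseconds per tick for HZ=1000 (slightly under 1 ms). */
#define TICK_NSEC    999848ULL

/*
 * Hypothetical helper: mimics a 64-bit multiply followed by a divide.
 * A negative delta (timer already elapsed) is reinterpreted as a huge
 * unsigned value, the product wraps modulo 2^64, and the quotient is
 * the large number that ends up printed in hexadecimal.
 */
uint64_t emulate_jiffies_to_clock_t(int64_t delta)
{
	uint64_t tmp = (uint64_t)delta * TICK_NSEC;	/* wraps for delta < 0 */

	return tmp / (NSEC_PER_SEC / USER_HZ);
}
```

Under these assumptions, a delta of -1 gives 0x1AD7F29ABCA, the value seen on the TCP_ESTABLISHED sockets, and a delta of about -165 jiffies gives 0x1AD7F29ABBA, the value seen on most of the TCP_SYN_RECV sockets.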
Running the command again one second later gives completely different results (other sockets are displayed).

Maybe on 2.6.26-rc3+ we miss some timer correctness, or we expose a latent NET bug.

    void sk_reset_timer(struct sock *sk, struct timer_list *timer,
                        unsigned long expires)
    {
            if (!mod_timer(timer, expires))
                    sock_hold(sk);
    }

Note that arming a timer also increases the socket refcount, which could explain why Ingo has sockets that are apparently not owned by a process but are still referenced (by one timer or several; I see refcnt=5 on the following snapshots).

Just my initial thoughts; sorry, I currently cannot spend much time diagnosing the problem.
