Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

To: <linux-kernel@...>, <netdev@...>
Cc: Ingo Molnar <mingo@...>, David S. Miller <davem@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Ilpo Järvinen <ilpo.jarvinen@...>
Date: Saturday, May 31, 2008 - 10:25 am

2008/5/28 Peter Zijlstra <peterz@infradead.org>:

Me too, though in a completely different scenario; my hung connections
are not related to distcc at all. The output from /proc/net/tcp that Ingo
posted a few days ago is somewhat different from mine, but I believe
this is the same problem, or at least a related one. Just as Ingo
experienced, netstat -p shows only '-' as PID/program for the hung
connections, while for other connections it shows the expected results.

I recently bought a new PC and have started copying stuff from my old PC
to the new one. During this I have experienced this hang several times.
I started out copying with tar on both ends over an ssh pipe, but to rule
out possible ssh problems I have also tried tar over a ttcp connection,
which fails as well. There is no obvious pattern to when this happens;
I have experienced failures after transferring 1.15GB, 51.4GB and 23.6GB.

Here is the output from netstat -n -o filtered for port 22 and slightly
edited. All the lines started with Proto == tcp and Recv-Q == 0.

Send-Q Local Addr Foreign Addr  State       Timer
     0 old_pc:22  new_pc:52667  ESTABLISHED keepalive (3513.93/0/0)
     0 old_pc:22  new_pc:43825  ESTABLISHED keepalive (5467.38/0/0)
  2896 old_pc:22  new_pc:58601  ESTABLISHED on (21020884.65/0/0)
  4344 old_pc:22  new_pc:54105  ESTABLISHED on (21017016.33/0/0)
  2896 old_pc:22  new_pc:34149  ESTABLISHED on (20986889.24/0/0)

The first two connections are ongoing, working, interactive ssh
connections. The other three connections died days ago on my new PC.

One thing that caught my eye was the very high timer values on the
stalled connections. Checking the netstat source reveals that the value
printed is "(double) time_len / HZ" and that time_len is extracted from
/proc/net/tcp. While my CONFIG_HZ is 1000, I assume netstat has picked up
HZ as 100 from /usr/include/asm/param.h. The raw values would then be
around 2.1 * 10^9, which really seems to suggest some integer overflow,
since 2^31 = 2147483648.
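
To make the arithmetic concrete, here is a small Python sketch (not taken
from netstat) that undoes the assumed division by HZ = 100 for the three
stalled connections and compares the result with 2^31. The names raw_ticks,
NETSTAT_HZ and INT32_LIMIT are made up for this illustration, and the
divisor of 100 is only the assumption stated above, not something verified.

```python
# Assumption: netstat printed time_len / HZ with HZ = 100.
NETSTAT_HZ = 100
INT32_LIMIT = 2**31  # 2147483648

# Timer values in seconds, as printed by netstat for the stalled connections.
printed_seconds = [21020884.65, 21017016.33, 20986889.24]


def raw_ticks(seconds, hz=NETSTAT_HZ):
    """Recover the integer time_len that netstat divided by hz."""
    return round(seconds * hz)


for s in printed_seconds:
    ticks = raw_ticks(s)
    # Show the raw value, its distance from 2^31, and the ratio to 2^31.
    print(ticks, INT32_LIMIT - ticks, round(ticks / INT32_LIMIT, 4))
```

All three raw values land within a few percent below 2^31, which is why a
signed 32-bit wrap looks like a plausible suspect.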

Looking into get_tcp4_sock in net/ipv4/tcp_ipv4.c, I see that timer_expires
is initialized with icsk->icsk_timeout for the troublesome cases. But
this is as far as I am able to trace it, so I have no idea how
icsk->icsk_timeout gets such high values.
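
For anyone who wants to look at the raw value rather than netstat's scaled
one, the following Python sketch pulls the timer column ("tr tm->when")
out of a /proc/net/tcp line. The sample line is hypothetical: the addresses
are invented and the hex values were constructed by hand to correspond to
the 2896-byte Send-Q connection above, they are not copied from my machine.
The function name parse_timer_field is likewise made up.

```python
def parse_timer_field(proc_line):
    """Return (timer_active, time_len) from one /proc/net/tcp data line.

    After whitespace splitting, the sixth field holds "tr:tm->when",
    both parts in hexadecimal.
    """
    fields = proc_line.split()
    timer_active_hex, time_len_hex = fields[5].split(":")
    return int(timer_active_hex, 16), int(time_len_hex, 16)


# Hypothetical line: sl, local, remote, state, tx:rx queue, tr:tm->when, ...
sample = ("3: 0A00000A:0016 0B00000A:E4E9 01 "
          "00000B50:00000000 01:7D4B5311 00000000 0 0 0")

timer_active, time_len = parse_timer_field(sample)
print(timer_active, time_len, time_len / 100)
```

Reading the field directly like this takes netstat's choice of HZ out of
the picture when comparing values between machines.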

My old PC is currently still running with these stalled connections
present so let me know if there is something I should try to investigate
further. I can post output from /proc/net/tcp and my .config if you want
to have a look. My old PC is 32 bit/Celeron single core, kernel 2.6.24,
while my new one is 64 bit/Q9300 quad core, kernel 2.6.25.3. The ethernet
cards are the following:

02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056
PCI-E Gigabit Ethernet Controller (rev 12)

BR Håkon Løvdal