2008/5/28 Peter Zijlstra <peterz@infradead.org>:

Me too, however with a completely different scenario; my hung connections are not related to distcc at all. The output from /proc/net/tcp that Ingo posted a few days ago is somewhat different from mine, but I believe this is the same problem or at least a related one. Just as Ingo experienced, netstat -p only shows PID/program as '-' for the hung connections, while for other connections it shows the expected results.

I have recently bought a new PC and have started copying stuff from my old PC to the new one. During this I have experienced this hang several times. I started copying by using tar on both ends over an ssh pipe, but in order to eliminate possible ssh problems I have also tried tar over a ttcp connection, which also fails. There is no obvious pattern to when this happens; I have experienced failures after transferring 1.15GB, 51.4GB and 23.6GB.

Here is the output from netstat -n -o, filtered for port 22 and slightly edited. All the lines started with Proto == tcp and Recv-Q == 0.

Send-Q  Local Addr  Foreign Addr  State        Timer
0       old_pc:22   new_pc:52667  ESTABLISHED  keepalive (3513.93/0/0)
0       old_pc:22   new_pc:43825  ESTABLISHED  keepalive (5467.38/0/0)
2896    old_pc:22   new_pc:58601  ESTABLISHED  on (21020884.65/0/0)
4344    old_pc:22   new_pc:54105  ESTABLISHED  on (21017016.33/0/0)
2896    old_pc:22   new_pc:34149  ESTABLISHED  on (20986889.24/0/0)

The first two connections are ongoing, working, interactive ssh connections. The other three connections died days ago on my new PC.

One thing that caught my eye was these very high timer values. Checking the netstat source reveals that the value printed is "(double) time_len / HZ" and that time_len is extracted from /proc/net/tcp. While my CONFIG_HZ is 1000, I assume netstat has picked up HZ as 100 from /usr/include/asm/param.h. Multiplying the printed values by 100 gives a time_len of roughly 2.1 * 10^9, which really seems to imply that there is some integer overflow, since 2^31 = 2147483648.
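As a rough check of that arithmetic, here is a small Python sketch that undoes netstat's division and compares the result with 2^31. It assumes netstat divided by HZ = 100, which is only my guess from above; the names ASSUMED_HZ and raw_ticks are made up for this illustration and do not come from netstat or the kernel.

```python
# Timer values as printed by netstat (in seconds), copied from the
# three hung connections in the output above.
displayed_seconds = [21020884.65, 21017016.33, 20986889.24]

ASSUMED_HZ = 100   # assumption: the HZ value netstat was built with
LIMIT = 2 ** 31    # 2147483648, first value that no longer fits a signed 32-bit int


def raw_ticks(seconds, hz=ASSUMED_HZ):
    """Undo netstat's "(double) time_len / HZ" to recover time_len."""
    return round(seconds * hz)


for s in displayed_seconds:
    ticks = raw_ticks(s)
    # Show how far below 2^31 each recovered value sits.
    print(f"{s:.2f} s -> time_len {ticks}, {LIMIT - ticks} below 2^31")
```

All three recovered values land just under 2^31, which is what makes me suspect a signed 32-bit wrap somewhere. It does not show where such a wrap would happen.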
Looking into get_tcp4_sock in net/ipv4/tcp_ipv4.c, I see that timer_expires is initialized with icsk->icsk_timeout for the troublesome cases. But this is where my ability to trace it further ends, so I have no idea how icsk->icsk_timeout gets such high values.

My old PC is currently still running with these stalled connections present, so let me know if there is something I should try to investigate further. I can post output from /proc/net/tcp and my .config if you want to have a look.

My old PC is 32 bit/Celeron single core, kernel 2.6.24, while my new one is 64 bit/Q9300 quad core, kernel 2.6.25.3. The ethernet cards are the following:

02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

BR Håkon Løvdal
