* Eric Dumazet <dada1@cosmosbay.com> wrote:

I turned off localhost distcc two days ago and there has not been a single hung socket since then, so we now know for sure that without localhost distcc connections, -tip's QA will not produce any hung sockets in about 1000 random-kernel-build+boot iterations.

i've added those reverts this morning and added back the localhost distcc rules - we'll see whether the hung sockets are back.

i'm wondering whether your suspicion on broken TCP timers is consistent with the symptoms i've seen: the hung sockets clearly produced periodic packet activity every 180 seconds, for up to 8 hours, without ever changing their receive or send queue. So at least a part of the TCP timer mechanism for that specific stuck socket was working fine.

is there no sysctl or other debug mechanism to somehow get its full TCP state and the reasons why it is stuck? I'm wondering how you debug broken TCP state machines without enabling testers to dump all state and pass it to developers.

I have a clearly reproducible testcase and i'd like to help out, but the whole effort is stalled on 'not enough information', it appears. Doing random reverts might help in truly helpless situations where a bug has no debuggable state - but this situation seems really routine to me: it's very difficult to trigger the bug, but once it triggers, the bug scenario is stable and analyzable.

I'd be glad to test any instrumentation patch that makes similar scenarios more analyzable.

	Ingo
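As a rough illustration of the kind of state dump asked for above, here is a minimal sketch that decodes one row of /proc/net/tcp (state, send/receive queue sizes and the pending timer). It is not part of the original mail: the helper names and the sample row are made up for illustration, the address decoding assumes a little-endian host and IPv4, and /proc/net/tcp exposes only a small part of the socket state, so it may well not be enough to explain a stuck connection.

```python
import socket
import struct

# State codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}

# Meaning of the "tr" (timer_active) half of the "tr:tm->when" column.
TIMERS = {
    0: "none", 1: "retransmit", 2: "keepalive/other",
    3: "time_wait", 4: "zero-window probe",
}


def decode_addr(hex_addr):
    """Decode 'HHHHHHHH:PPPP' into (dotted_ip, port).

    Assumption: IPv4 and a little-endian host (e.g. x86).
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_line(line):
    """Parse one data row of /proc/net/tcp into a dict (hypothetical helper)."""
    f = line.split()
    tx_queue, rx_queue = f[4].split(":")
    timer, when = f[5].split(":")
    return {
        "local": decode_addr(f[1]),
        "remote": decode_addr(f[2]),
        "state": TCP_STATES.get(int(f[3], 16), "UNKNOWN"),
        "tx_queue": int(tx_queue, 16),
        "rx_queue": int(rx_queue, 16),
        "timer": TIMERS.get(int(timer, 16), "unknown"),
        "timer_expires_jiffies": int(when, 16),
        "retransmits": int(f[6], 16),
    }


def dump(path="/proc/net/tcp"):
    """Print every socket in the table; the first line is a header."""
    with open(path) as fh:
        next(fh)
        for row in fh:
            print(parse_line(row))


# Made-up sample row: a localhost connection to port 3632 (0x0E30).
SAMPLE = ("   0: 0100007F:0E30 0100007F:8A3C 01 00000000:00000000 "
          "02:000AFC80 00000000  1000        0 12345")
```

Calling dump() on the affected box while a socket is hung, a few times some minutes apart, would show whether the timer column keeps being re-armed while the queue sizes stay unchanged.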
| Ingo Molnar | [bug] stuck localhost TCP connections, v2.6.26-rc3+ |
