* Ingo Molnar <mingo@elte.hu> wrote:the permanently hug distcc kernel build bug triggered again, twice. First time it happened yesterday, i left it running overnight and it never recovered after a 14+ hours of wait. It shows a similar pattern, 'ESTABLISHED' state on both sides, but the client-side is stuck and the server (running latest kernel) is seemingly clueless about that fact: client: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 375450 10.0.1.16:39201 10.0.1.19:3632 ESTABLISHED server: Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 10.0.1.19:3632 10.0.1.16:39201 ESTABLISHED i waited ~30 minutes in this second case. the client (running 2.6.24) does periodic 120 seconds retransmits: 07:40:48.255452 IP dione.39201 > phoenix.distcc: . 1608:2144(536) ack 1 win 584007:40:48.255547 IP phoenix.distcc > dione.39201: . ack 2144 win 65535 07:40:48.255564 IP dione.39201 > phoenix.distcc: . 67143:67679(536) ack 1 win 5840 07:40:48.255648 IP phoenix.distcc > dione.39201: . ack 2144 win 65535 07:42:48.255440 IP dione.39201 > phoenix.distcc: . 2144:2680(536) ack 1 win 5840 07:42:48.255559 IP phoenix.distcc > dione.39201: . ack 2680 win 65535 07:42:48.255570 IP dione.39201 > phoenix.distcc: . 67679:68215(536) ack 1 win 5840 07:42:48.255659 IP phoenix.distcc > dione.39201: . ack 2680 win 65535 07:44:48.255436 IP dione.39201 > phoenix.distcc: . 2680:3216(536) ack 1 win 584007:44:48.255570 IP phoenix.distcc > dione.39201: . ack 3216 win 65535 07:44:48.255585 IP dione.39201 > phoenix.distcc: . 68215:68751(536) ack 1 win 5840 07:44:48.255669 IP phoenix.distcc > dione.39201: . ack 3216 win 65535 the server (running the latest kernel) responds: 07:40:47.551098 IP dione.39201 > phoenix.distcc: . 1072:1608(536) ack 1 win 584007:40:47.551141 IP phoenix.distcc > dione.39201: . ack 1608 win 65535 07:40:47.551204 IP dione.39201 > phoenix.distcc: . 66607:67143(536) ack 1 win 5840 07:40:47.551213 IP phoenix.distcc > dione.39201: . ack 1608 win 65535 07:42:47.570994 IP dione.39201 > phoenix.distcc: . 1608:2144(536) ack 1 win 584007:42:47.571027 IP phoenix.distcc > dione.39201: . ack 2144 win 65535 07:42:47.571117 IP dione.39201 > phoenix.distcc: . 67143:67679(536) ack 1 win 5840 07:42:47.571127 IP phoenix.distcc > dione.39201: . ack 2144 win 65535 07:44:47.590901 IP dione.39201 > phoenix.distcc: . 2144:2680(536) ack 1 win 584007:44:47.590960 IP phoenix.distcc > dione.39201: . ack 2680 win 65535 07:44:47.591042 IP dione.39201 > phoenix.distcc: . 67679:68215(536) ack 1 win 5840 07:44:47.591054 IP phoenix.distcc > dione.39201: . ack 2680 win 65535 full client socket state: dione:~> grep $(printf "%X\n" 39201) /proc/net/tcp 44: 1001000A:9921 1301000A:0E30 01 0005ABF2:00000000 01:00002B8A 00000000 500 0 63130083 2 ffff81000c762d00 120000 0 0 28 101 [ a few minutes later ] 44: 1001000A:9921 1301000A:0E30 01 0005A392:00000000 01:00002BF0 00000000 500 0 63130083 2 ffff81000c762d00 120000 0 0 32 101 [ i.e. the tx queue did increase by 2144 bytes - 4x 536 bytes ] full server socket state: phoenix:~> grep $(printf "%X\n" 39201) /proc/net/tcp 6: 1301000A:0E30 1001000A:9921 01 00000000:00000000 00:00000000 00000000 99 0 728382 1 ffff88042d8db280 300 4 30 2 -1 [ a few minutes later ] 6: 1301000A:0E30 1001000A:9921 01 00000000:00000000 00:00000000 00000000 99 0 728382 1 ffff88042d8db280 300 4 30 2 -1 [ i.e. no change - no pending packets ] I've started a longer capture session as well - it seems the TCP stack is slowly cycling through retransmissions of 536-byte packets, with 375450 bytes pending? At 120 seconds a pop that would be about 23 hours to make any progress on - but i'm not sure i interpreted that right. It all looks very weird. The timestamps of the two boxes are synced up to within about 1 second: earth4:~> for N in dione phoenix; do ssh $N date; done Thu Jul 24 07:44:02 CEST 2008 Thu Jul 24 07:44:02 CEST 2008 ( but the two boxes are responding to each other fine, so ordering of events is not a question here. ) Any other state you'd like to see before i continue with -tip testing? Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 002/196] Chinese: rephrase English introduction in HOWTO |
| Christoph Lameter | [00/41] Large Blocksize Support V7 (adds memmap support) |
| Chuck Ebbert | Re: Linux 2.6.21 |
git: | |
| Gerrit Renker | [PATCH 03/37] dccp: List management for new feature negotiation |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Hugh Dickins | Re: [bug?] tg3: Failed to load firmware "tigon/tg3_tso.bin" |
| David Miller | [GIT]: Networking |
