in an overnight -tip testruns that is based on recent -git i got two
stuck TCP connections:Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 174592 10.0.1.14:58015 10.0.1.14:3632 ESTABLISHED
tcp 72134 0 10.0.1.14:3632 10.0.1.14:58015 ESTABLISHEDon a previously reliable machine. That connection has been stuck for 9
hours so it does not time out, etc. - and the distcc run that goes over
that connection is stuck as well.kernel config is attached.
in terms of debugging there's not much i can do i'm afraid. It's not
possible to get a tcpdump of this incident, given the extreme amount of
load these testboxes handle. This problem started sometime around rc3
and it occured on two boxes (on a laptop and on a desktop), both are SMP
Core2Duo based systems. I never saw this problem before on thousands of
similar bootups, so i'm 99.9% sure the bug is either new or became
easier to trigger.It's not possible to bisect it as it needs up to 12 hours of heavy
workload to trigger. The incident happened about 5 times since the first
incident a couple of days ago - 4 times on one box and once on another
box. The first failing head i became aware of was 78b58e549a3098. (-tip
has other changes beyond -git but changes nothing in networking.)One clue (which might or might not matter) is that distcc is one of the
very few applications that makes use of sendfile().Ingo
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.26-rc3
# Sat May 24 04:28:24 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_DEFCONFIG_LIST="arch/x86/configs/x86_64_defconfig"
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUP...
update: in the past 5 days of -tip testing i've gathered about 10
randconfig kernel configs that all produced such failures.Since the bug itself is very elusive (it takes up to 50 boot +
kernel-rebuild-via-distccc iterations to trigger) bisection was still
not an option - but with 10 configs statistical analysis of the configs
is now possible.I made a histogram of all kernel options present in those configs, and
one networking related kernel option stood out:5 CONFIG_TCP_CONG_ADVANCED=y
6 CONFIG_INET_TCP_DIAG=y
6 CONFIG_TCP_MD5SIG=y
9 CONFIG_TCP_CONG_CUBIC=ythe likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig
runs is 75%. The likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in
10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say
it with a 95% confidence that the presence of this option correlates to
the hung sockets.i have started testing this theory now, via the patch below, which turns
off TCP_CONG_CUBIC. It will take about 50 bootups on the affected
testsystems to confirm. (it will take a couple of hours today as not all
testsystems show these hung socket symptoms)distributions enable TCP_CONG_CUBIC by default:
$ grep CUBIC /boot/config-2.6.24.7-92.fc8
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_CUBIC=ywhich would explain why Arjan and Peter triggered similar hangs as well.
Ingo
---------------------->
Subject: qa: no TCP_CONG_CUBIC
From: Ingo Molnar <mingo@elte.hu>
Date: Thu May 29 09:45:51 CEST 2008---
net/ipv4/Kconfig | 4 ++++
1 file changed, 4 insertions(+)Index: tip/net/ipv4/Kconfig
===================================================================
--- tip.orig/net/ipv4/Kconfig
+++ tip/net/ipv4/Kconfig
@@ -454,6 +454,8 @@ config TCP_CONG_BIC
config TCP_CONG_CUBIC
tristate "CUBIC TCP"
default y
+ depends on BROKEN_BOOT_ALLOWED
+ select BROKEN_BOOT
---help---
This is version 2.0 of BIC-TCP which uses a cubic growth functi...
...I tried yesterday some accept (& read some) & close/exit type
stressing but I couldn't get it to show up (though I'll try longerDo I understand you correctly... it doesn't explain the tenth case out
Main problem with this explanation is that congestion control modules are
only in use when TCP is in ESTABLISHED and transmitting normally, while it
has nothing to do how we enter or leave ESTABLISHED.But if it's really that the process who owned the connection already went
away, I think we should end up into tcp_close() which changes the state
from established (and send RST too if there's still data to be received,
which would be picked up by the other end and that end would no longer
keep established either). ...A failure to send the reset would show up in
LINUX_MIB_TCPABORTFAILED. Because both ends remain in established, it kind
of excludes the possibility that something would have accidently allowed
the Recv-Q end to return back to established too (due to some bug).To me there are mainly two weird things:
1) Why we see orphaning with data in the first place (I think distcc would
be interested to read everything, unless some worker crashed in early...
Though some timeout in distcc could explain it as well but I don't know
too well how distcc does everything)...
2) Why the connection is still in ESTABLISHED when it was orphaned and
it has some data to receive... If it had some unread data, should be in
CLOSE or in FIN_WAIT1 otherwise.--
i.
--
after about 50 bootups i got a hung test again:
titan:~/tip> netstat -nt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address
State
tcp 0 0 10.0.1.14:22 10.0.1.16:58062 ESTABLISHED
tcp 0 0 10.0.1.14:22 10.0.1.16:60109 ESTABLISHED
tcp 0 86368 10.0.1.14:43914 10.0.1.16:3632 ESTABLISHEDand this time with CUBIC_TCP disabled - so that was a red herring.
Ingo
--
ah, in retrospect i realized that this test had one flaw: some of the
systems i the build cluster already ran a newer kernel and hence were
targets for this bug.so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and rebooted
the cluster boxes into 2.6.25, and the hung sockets are now gone. (about
150 successful iterations)i did another change as well: i removed the localhost distcc component.
I'll reinstate that now to make sure it's really related to
TCP_CONG_CUBIC and not to localhost networking.Ingo
--
ok, once i added back the localhost distcc component and the hung kernel
build + stuck TCP socket bug happened again overnight:Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED
tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHEDso it seems distcc over localhost was the aspect that made it fail.
_Perhaps_ what matters is to have the new post-rc3 TCP code on _both_
sides of the connection. But that is just a theory - it could be timing,
etc.Ingo
--
Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
some post 2.6.25 changes into it)?--
i.
--
On Sat, 31 May 2008, Ilpo J
what is the easiest way to figure out whether my version of distcc
enables TCP_DEFER_ACCEPT?Ingo
--
> * Ilpo J
seems to be used:
22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
22003 listen(4, 10) = 0
22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0i'll queue up your reverts for testing in -tip.
Ingo
--
update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely
fixed the hangs!Here is the testing i did:
first i ran about 500+ successful iterations on the affected testboxes
with your revert patch applied, on multiple systems. Then today, without
changing anything else on one of the testsystems i reverted your revert
on that single system. After about an hour of testing, in 20 iterations
i got a hang again over localhost:titan:~> netstat -nt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 174592 10.0.1.14:34710 10.0.1.14:3632 ESTABLISHED
tcp 72145 0 10.0.1.14:3632 10.0.1.14:34710 ESTABLISHEDso i hereby conclude that your revert works :) I've repeated the commit
below that resolves this nasty regression.phew ...
Ingo
---------------->
commit dad98991c137c089d64a3057dddc3a12bd6866b0
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date: Sat May 31 15:18:56 2008 +0300tcp: revert DEFER_ACCEPT modifications
On Sat, 31 May 2008, Ilpo Järvinen wrote:
> On Sat, 31 May 2008, Ingo Molnar wrote:
>
> > ok, once i added back the localhost distcc component and the hung kernel
> > build + stuck TCP socket bug happened again overnight:
> >
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address Foreign Address State
> > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED
> > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED
> >
> > so it seems distcc over localhost was the aspect that made it fail.
> >
> > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_
> > sides of the connection. ...
...It wasn't exactly out-of-tree, Evgeniy fixed a problem that was found
in "TCP_DEFER_ACCEPT updates - process as established", perhaps it just
wasn't in your testing tree yet.$ git-log -n 1 --pretty=full 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 |
grep "Commit:"Are you sure this is enough to conclude the results? Seems quite small
number to me to rule out luck. Especially considering that it was some
amount of time in the tree already until you noticed it for the first
time.Anyway, nice that it seems to be helping. It was almost the only
possibility on TCP side, I don't think there were any other state machine
related changes. So it wasn't just "random revert" in that sense like you
were implying :-), I just didn't have any theory how it would cause the
problem... ...I even first disregarded DA that because of timeline
in-exactness and because I wrongly assumed that distcc probably won't use
it anyway, but then I checked later on and found out that it was present
at least in the source I had lying around.Anyway, it might be that the revert was a bit overkill, I'm not fully sure
if 539fae and e4c7884 need to be reverted to fix it since main changes
are in ec3c098. I just didn't want to take chances at first and put them...I couldn't immediately find anything obviously wrong with those changes
but the patch below might be worth of a try (without the revert of
course). If it ever spits out that WARN_ON for you, we were playing with
fire too much and it's better to return on the safe side there...--
i.[PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on
If header prediction is turned on under some circumstances,
DA can deadlock though I have great trouble in figuring out
how it could ever happen while ending up into that else
branch (but I've been wrong before as well :-)).Signed-off-by: Ilpo J
out of the -tip tree :-) The -tip tree has 75+ topic branches at the
moment, but TCP topics are not in its scope - so any TCP change is "out
of tree" for the -tip tree.People got confused in the past when they saw similar test patches show
up in sched.git and x86.git before, so we wanted to make it very clear
in -tip (with is the successor of sched.git, x86.git and a couple of
other git trees) that these are commits we dont want to push anywhere.Commits in tip/out-of-tree dont get propagated into the tip/auto-*-next
topic branches that linux-next and -mm picks up, they are purely a
courtesy to help the testing/fixing of bugs in subsystems that are
maintained in other git trees.See attached below the current shortlog of the tip/out-of-tree topic
branch - it contains changes all around the tree for various things that
we triggered in -tip and are not yet upstream or are in flight somewherea full day of testing on a testsystem with 500 random kernel builds and
bootups (the kernel build done on the testsystem utilizing distcc and
make -j100, so it's rather heavy and parallel TCP traffic per iteration)
with no hang, compared to the same system with your reverts not applied
that hung after an hour with 20-30 iterations.And that count increased to 1000 successful test iterations since
yesterday.So i think yes, it seems rather conclusive, given the circumstances ;-)
These random kernel boots found many 'impossible to trigger' bugs and
races in the past. The reason for its race finding capability is the
timing randomness of the resulting random kernel image: the delays
caused by random combination of debugging facilities, build variants,
kernel subsystem variants we have. This -tip qa method - as a
side-effect of its coverage testing - simulates timing variantions that
are otherwise only observable via hardware variations.I.e. this is not the same kernel booted up a 1000 times - that would be
a very narrow test. This is 1000 _different_ kern...
From: Ingo Molnar <mingo@elte.hu>
Ilpo posted another patch which fixes a locking bug in the
code, please test with that patch. I include it below so
that you know exactly which one I am referring to.The quicker you test this, the faster I can merge it to
Linus and get the bug fixed for good.[PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
It seems that replacement of DA code also moved parts outside
of appropriate locking. The Ingo's problem seems to come from
the fact that two flows could now race in
(inet_csk_)reqsk_queue_add corrupting the queue. ...This can
leave dangling socks around which won't resolve themselves
without stimuli from outside (e.g., external RST would help
I think).Then some details I'm not too sure of:
I guess we want to put listen_sk->sk_state checking under the
lock as well. I've not evaluated if ->sk_data_ready too
requires locking but assumed it does.I'm by no means familiar with all locking variants, requirements,
etc.Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_input.c | 23 +++++++++++++----------
1 files changed, 13 insertions(+), 10 deletions(-)diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9454f0..d21d2b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);if (tp->defer_tcp_accept.request) {
+ struct sock *listen_sk = tp->defer_tcp_accept.listen_sk;
int queued_data = tp->rcv_nxt - tp->copied_seq;
int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ?
tcp_hdr((struct sk_buff *)
@@ -4570,8 +4571,9 @@ static int tcp_defer_accept_check(struct sock *sk)
if (queued_data && hasfin)
queued_data--;- if (queued_data &&
- tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
+ bh_lock_sock(listen_sk);
+
+ if (queued_data && listen_sk->sk_state == ...
> > * Ilpo J
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
I don't know, and since I'm getting back onto an airplane again tomorrow
morning I'm the last person who is likely do be able to do an audit
of these cases for you :-)
--
On Wed, 4 Jun 2008, Ilpo J
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Keepalive is very long, it might still "seem" like a deadlock for
someone without much patience :-)
--
On Tue, 3 Jun 2008, David Miller wrote:
> From: "Ilpo_J
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
I took a close look at this, it seems this patch adds
an ABBA deadlock. But I might be wrong.Normally the locking order is:
listen_sk --> some_child_sk
And this can be seen by the code paths that flow down
to tcp_child_process() in net/ipv4/tcp_minisocks.c (via
tcp_v4_do_rcv() for example).However here in this patch we will lock in the:
child_sk --> listen_sk
order.
Unless... these defer accepted sockets live outside of the
normal socket collection found by tcp_v{4,6}_hnd_req(). If
that is the case, that ought to make this locking order OK
but I fear that lockdep will likely complain because it has
no way to see this distinction.If we cannot find a simple and easy way to deal with this locking
problem, I am reverting the defer-accept changes entirely. It's not
the end of the world if this feature has to be deferred to 2.6.27
and this bug has been known for several weeks already.--
On Tue, 10 Jun 2008, David Miller wrote:
> From: "Ilpo_J
based on this, plus the comments from vitaliy it is clear that moving
the DA socket to established is pretty tricky. I agree I should address
these things in the >= 27 time frame.imho we don't need to revert the first two patches:
e4c78840284f3f51b1896cf3936d60a6033c4d2c and
539fae89bebd16ebeafd57a87169bc56eb530d76 .. they have nothing to do with
rearranging states or the locking and they do fix the more notable DA
behavior issues.-Patrick
--
i unapplied the reverts then applied your fix, and got this hang after
1.5 hours of testing:titan:~> netstat -nt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 65032 10.0.1.14:39454 10.0.1.19:3632 ESTABLISHED10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with
16 CPUs.Ingo
--
> * Ilpo J
On Thu, 5 Jun 2008, Ilpo J
Sorry, I don't see anything - it seems to boil down to the same code in
the DA and non-DA case as far as I can tell, but after a while all the
twisty passages seem to look alike.If Ingo confirms that the recv end was running the locking patch code,
it would be interesting to just confirm the sysreq+t looks the same as
before - it is possible the patch turned the race into a non-obvious
deadlock.I'm sure your smaller revert will make the problem go away just as the
larger one did, fwiw.The other odd thing is that Ingo did a lot of experimentation and was
only making this happen on localhost before (though I agree there is
nothing inherent about that lock and localhost) - isn't it odd that the
first trigger of it now is between two hosts? What do you make of that?--
On Thu, 5 Jun 2008, Patrick McManus wrote:
> On Fri, 2008-06-06 at 00:13 +0300, Ilpo J
maybe - I've noted that I can get the distcc server to crash with just a
little fuzz (telnet to it and close the telnet) - but it is true I
haven't seen anything odd using the distcc client.Anyhow, my news is that using rc5 I have managed to reproduce it on
localhost - so it isn't just ingo anymore ! ;)I didn't have a 16 cpu machine at my disposal, so I was concerned I
wouldn't be able to make it happen - but I setup a 64 bit kvm image with
-smp 4, running on my core2-duo and created a makefile with 3 src files
that I just compiled as "while true; do make -j3 all; done" - the
makefile uses distcc to localhost and has intentionally broken
dependencies so it just keeps recompiling stuff. The input files are
approximately 135k, 98k, and 16k after running gcc -E on them (which I
what I assume distcc does before putting them down the socket).On rc5 I could get the lockup in under 20 minutes.. usually 10. I think
I did it 4 times. My compile test is probably a better trigger than the
kernel compile because the distcc connects are never staggered like they
would be in a large directory of files. (3 files, -j4).When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the
error at all in the first 90 minutes of testing. I'll let the test run
and update the list.I'm holding out hope that Ingo's report did not have the locking patch
on the distcc server end - because it certainly makes a difference for
me.-Patrick
--
In addition I think I've also seen some bits floating around that
occassionally distcc does something weird in a correct setup too.I briefly looked how distcc behaved while doing the stress_accept. Distcc
basically seems to have n processes each accept()ing and some kind of
memleak killer by limiting number of successive accepts then exit, while
the parent who did the listen is only periodically (had some sleep(1)s)Also Peter Z has reported it earlier, it was distcc+localhost for him as
It could be even easier if you make next in path gcc to play with
nice, trying a number of different values might reveal some really fast...He had some issue with different versions being deployed at least in
the past, and I failed to follow his latest answer :-).--
i.
--
Hm, the distcc server had the full 3-patch-revert from Ilpo, was that
supposed to fix the problem too, indirectly?The box is running that 3-patch revert right now as well:
phoenix:~> uptime
19:20:28 up 9:58, 2 users, load average: 7.75, 13.88, 30.95
phoenix:~> uname -a
Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux... and i never saw a single hang today in the 10 hours of uptime this
box has. (and it built a good 500 kernel today) Nor any hang yesterday,
and that was a good 500 kernels too.You can see it that the box built more than two thousand kernels in the
past few days alone, so it's a rather busy little bee. The other
testboxes built even more kernels - a quad box built and booted 2500
kernels:#define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008"
and i never saw a hang on that box either.
a third box has:
titan:~> uname -a
Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6
19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux(this is the one that showed the hang for the first time)
The total count of kernel bootups i did this week for -tip QA was
somewhere between five and ten thousand random build+bootups - and the
only time i got a hang was when i removed the 3-patch-revert
intentionally, on one of the boxes.Maybe that 3-patch-revert just makes this locking bug a bit less likely
to trigger, by accident? Out tip test-setup is specialized to find
arch/x86 and scheduler bugs, not primarily to find networking bugs. (but
at this test volume, and given that it makes use of distcc, it will
trigger them too.)i have a rather accurate timeline of when the hang first occured, do we
know the timeline of the introduction of the locking bug by any chance?
Which commit introduced it? (Ilpo's commit log does not say it)Your test results are compelling nevertheless so i'll do a retest in any
case, with all boxes either running an olde...
...I kind of fail to follow in general in this mail which patch have been
tested and where and when... But I understand that it's just due to number
of tests & hosts & kernels & what-else you use and know by heart (and we
don't do all that well :-)). But I'll try to still sort it out below...Yes, the problematic outside of locking portion shouldn't be there
No, part of the DEFER_ACCEPT stuff was postponed in 2.6.25..2.6.26-rc1
timeframe (ec3c0982a2dd1e671bad8e9d26c28dcba0039d87) so that one portion
of it ended up being added outside of the socket lock of the listening
socket, while touching its datastructures. Without
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 the deferred accept related
things happen earlier, ie., while we still are under the lock of the
listening socket. So that particular locking bug was _introduced_ by that
ec3c change, not made more likely or so....Of course software is known to have bugs, so we might always be
It seems to work quite well actually for this kind of networking related
Ah, sorry I forgot to add that one there, it was sent quite late in the
night and I just couldn't get sleep until sending the fix... :-) It was
one of the reverted ones that did it:If you want an older kernel, you would have to go basically to 2.6.25 or
so.To summarize. Both 3changes+1fix revert (you refer to it only as 3-patch
revert) _and_ the locking fix I made should fix the problem (obviously
they exclude each other). ...And end which is significant is the one which
has LISTENing sockets (please keep this in mind if you still get the hang
and provide some info).--
i.
--
correct, that's what i use as fallback, some distro kernel which is
2.6.25 or older.but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the
locking problem - so 2.6.25 is affected as well? This is a significant
question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on
the 16-way box. (all other build-boxes have 2.6.24 or older as aok.
For completeness, let me repeat the patch i referred to as the
'3-patch-revert' below. (which indeed is 3+1 as you note)this is the patch that appears to be working empirically. (Disclaimer:
it might just hide the problem, change timings, have a lucky code
layout, etc.)Ingo
----------------->
commit 9e5b6ca8fa8e07aceaddbfb0f2a324449e5ef014
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date: Sat May 31 15:18:56 2008 +0300tcp: revert DEFER_ACCEPT modifications
On Sat, 31 May 2008, Ilpo Järvinen wrote:
> On Sat, 31 May 2008, Ingo Molnar wrote:
>
> > ok, once i added back the localhost distcc component and the hung kernel
> > build + stuck TCP socket bug happened again overnight:
> >
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address Foreign Address State
> > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED
> > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED
> >
> > so it seems distcc over localhost was the aspect that made it fail.
> >
> > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_
> > sides of the connection. But that is just a theory - it could be timing,
> > etc.
>
> Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
> some post 2.6.25 changes into it)?Hmm, if so, please try this revert b...
This is all a bit confusing, but here are the conclusions I have drawn.
There definitely is a problem with the locking of the DA commit
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1
but it never appeared in 25. It exists in pretty much the same form in
rc5 (there was 1 patch to it over that time to fix a different problem).We're certain this code has a problem with the accept queue both because
of code inspection and the fact that Ingo can back it out (as the
significant part of the 3-patch revert) and the problem goes away in his
testing.I have run tests that can reproduce the hung socket with distcc over
localhost using 26-rc5. I can also apparently cure it using the locking
fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of rc5
+lockpatch is at 4.5+ hrs and counting without failures, it fails 6
times an hour with vanilla rc5)Based on all of that, the right thing to do seems to be to apply the
lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything -
just fix the code and I'll send Ilpo and Ingo cookies at Christmas time
for being great guys. Alternatively, Ingo could run the distcc servers
and clients on -tip with the lockpatch (nothing reverted) for more
testing.The only lingering problem is Ingo's report yesterday
http://marc.info/?l=linux-netdev&m=121267587715976&w=2
of a distcc hang. In this one it was not over localhost and the distcc
server had the ec3c DA changes totally reverted. (The server is really
the only stack that matters in this case - the client is not impacted by
the DA changes). This has to be a different issue, because the ec3c code
we're talking about here wasn't on the server at all. As Ilpo mentions,
Hakon is beleived to have a different problem and maybe you've tripped
over that too?If we're sure of that conclusion we should just take Ilpo's DA patch as
that will narrow the field for finding Hakon's issue. Its just with all
of these data points I'm not sure if I'm reaching the...
From: Patrick McManus <mcmanus@ducksong.com>
While the conclusion might be accurate, Ilpo's patch adds a lock
ordering problem that might introduce potential deadlocks (worst
case) or at least some lockdep warnings. Both of which are
unacceptable.I personally don't see any clear solution, and because this new
DA code moves an operation into a code block where the locking
order pretty much has to be inverted, I don't see any real short
term solution other than a revert of this particular DA commit.
--
...added Arjan.
Problems were at least these:
- Accept queue addition was racy and could leave dangling items
- Dangling items caused inconsistent sk_ack_backlog
- Checking for still in LISTEN state was racy, could be changed after
the check was made (shouldn't happen with distcc though)I didn't read ->sk_data_ready that carefully, it could have some
additional problems that are not listed (but they all should be fixed
by the added locking anyway).AFAICT, rest of that ec3c change is safe wrt. locking, just holding sk is
enough for the rest and those bits mostly shouldn't anyway be executedAnyway, we still would have an option to revert both the DA change + the
locking fix later if the problem is still clearly more likely than withIt definately didn't fit to picture that well if we would be talking just
a single bug here....The H
On Sat, 7 Jun 2008 00:12:55 +0300 (EEST)
at this point the observed hung I had seems to be the app doing
something bad (it's the app that hung) so for now please disregard my--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
> "Ilpo J
> * Ilpo J
On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote:
Ilpo, has anyone told you today that you rock? Well allow me - you
do_rcv() clearly has the listening socket locked in the non-DA case, and
in the DA case it is the 'child' ESTABLISHED socket that is locked -
leaving the accept queue unprotected. So simple.-Pat
--
On Tue, 3 Jun 2008, Patrick McManus wrote:
> On Wed, 2008-06-04 at 02:22 +0300, Ilpo J
Not commenting on the locking, but I think the code
would be clearer if tp->defer_tcp_accept was used in
a temporary instead.Perhaps:
net/ipv4/tcp_input.c | 33 ++++++++++++++++-----------------
1 files changed, 16 insertions(+), 17 deletions(-)diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b54d9d3..f846e11 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4535,18 +4535,18 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
static int tcp_defer_accept_check(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
-
- if (tp->defer_tcp_accept.request) {
- int queued_data = tp->rcv_nxt - tp->copied_seq;
- int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ?
+ struct tcp_deferred_accept_info *tdai = &tp->defer_tcp_accept;
+ if (tdai->request) {
+ int queued_data = tp->rcv_nxt - tp->copied_seq;
+ int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ?
tcp_hdr((struct sk_buff *)
sk->sk_receive_queue.prev)->fin : 0;if (queued_data && hasfin)
queued_data--;- if (queued_data &&
- tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
+ bh_lock_sock(tdai->listen_sk);
+ if (queued_data && tdai->listen_sk->sk_state == TCP_LISTEN) {
if (sock_flag(sk, SOCK_KEEPOPEN)) {
inet_csk_reset_keepalive_timer(sk,
keepalive_time_when(tp));
@@ -4554,23 +4554,22 @@ static int tcp_defer_accept_check(struct sock *sk)
inet_csk_delete_keepalive_timer(sk);
}- inet_csk_reqsk_queue_add(
- tp->defer_tcp_accept.listen_sk,
- tp->defer_tcp_accept.request,
- sk);
+ inet_csk_reqsk_queue_add(tdai->listen_sk,
+ tdai->request,
+ sk);- tp->defer_tcp_accept.listen_sk->sk_data_ready(
- tp->defer_tcp_accept.listen_sk, 0);
+ tdai->listen_sk->sk_data_ready(tdai->listen_sk, 0);- sock_put(tp->defer_tcp_accept....
On Tue, 3 Jun 2008, Joe Perches wrote:
> On Wed, 2008-06-04 at 02:22 +0300, Ilpo J
On Tue, 3 Jun 2008, David Miller wrote:
> From: "Ilpo_J
hm.
Here's a theory - the DA code put the socket in the accept queue after
data has been queued on the socket, instead of the other way around
which is normal.Could this be confusing the select() mechanism?
-Pat
--
So the code you will revert came from my fingers. The circumstances here
make me nervous; while I'm at a loss to explain what might be going on
in particular, let me offer an apology in advance should the revert help
resolve the issue.Here's what makes me nervous:
* not a lot of code uses DEFER_ACCEPT.. frankly it was pretty broken
before 26 - but not broken this way .. the correlation of your bug using
it is significant.* in 26, a server TCP socket (with DA) goes to ESTABLISHED when the 3rd
part of the handshake is received (as normal without DA), but the socket
isn't put on the accept queue until a real data packet arrives. (That's
the point of DA). In <= 25 this socket would have syn-recv until the
data packet arrived.- I did run tests where the server died in between the handshake being
completed and first data packet arriving - the client should see RST and
the server socket should disappear. But maybe something was missed?Do I understand this correctly, the server process is gone but the
socket is still in the table? And the client process is still there
waiting for the server to do something - having sent a bunch of data?Do we know if any data bytes (not handshake bytes) have been consumed by
the server side? If they were, that would seem to vindicate DA.Also pointing away from DA is that you started seeing this with rc3 -
that code was included in rc1.Is that a firm observation, or maybe there
weren't enough datapoints to conclude that rc1 and rc2 were clean?The most interesting patch is ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
if anyone wants to eyeball it.--
I believe Ingo problems come on long lived sockets (were many bytes were
exchanged between the peers), so I dont think DEFER_ACCEPT is the cullprit.I suggest to enable CONFIG_TIMER_STATS and to check timers, because
/proc/net/tcp
can display apparently large timer values when the timer is elapsed
(jiffies > icsk->icsk_timeout)
and jiffies_to_clock_t(timer_expires - jiffies) is then overflowing
doing a multiply and a divide.On a 64bits server running linux-2.6.24-rc2, I can see *strange* timers
values too in /proc/net/tcp, but not
stuck TCP sessions. On 64 bits, these strange values have 1AD7Fgrep 1AD7F /proc/net/tcp | obfuscate_IP_and_ports
2017: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA
00000001 0 0 0 2 ffff81067e7520c0
2019: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA
00000003 0 0 0 2 ffff8106c580bcc0
2029: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA
00000002 0 0 0 2 ffff81067313fe40
2032: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA
00000003 0 0 0 2 ffff8106c716c340
2039: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA
00000002 0 0 0 2 ffff8107d45b3f40
2041: local_peer remote_peer 03 00000000:00000000 01:1AD7F29AB37
00000000 0 0 0 2 ffff810718e221c0
6610: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA
00000000 0 0 136594789 1 ffff8107183fb940 94 10 16 2 -1
9925: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA
00000000 0 0 144451161 1 ffff8107051a9840 351 10 0 2 -1On TCP_SYN_RECV (03) sockets, timer can apparently be elapsed by many
ticks, while on TCP_ESTABLISHED (01) one, I get
jiffies_to_clock(-1) -> 1AD7F29ABCA value because the way
get_tcp4_sock() is coded (jiffies can change while running this function).
Note the 00: that means that no timer in my case.Running again the command one second later gives completely different
results (other sockets are d...
I turned off localhost distcc two days ago and there has not been a
single hung socket since then, so we now know it for sure that without
localhost distcc connections, -tip's QA will not produce any hung
sockets in about 1000 random-kernel-build+boot iterations.i've added those reverts this morning and added back the localhost
i'm wondering whether your suspicion on broken TCP timers is consistent
with the symptoms i've seen: the hung sockets clearly produced periodic
packet activity every 180 seconds, up to 8 hours, without ever changing
their receive of send queue. So at least a part of the TCP timer
mechanism for that specific stuck socket was working fine.is there no sysctl or other debug mechanism to somehow get its full TCP
state and the reasons for why it is stuck? I'm wondering how you debug
broken TCP state machines without enabling testers to be able to dump
all state and passing it to developers.I have a clearly reproducable testcase and i'd like to help out, but the
whole effort is stalled on 'not enough information' it appears. Doing
random reverts might help in truly helpless situations where a bug has
no debuggable state - but this situation seems really routine to me:
it's very difficult to trigger the bug but once it triggers the bug
scenario is stable and analyzable. I'd be glad to test any
instrumentation patch that makes similar scenarios more analyzable.Ingo
--
> > * Ilpo J
btw., i now also have a hung socket over real network:
titan:~> netstat -nt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 86752 10.0.1.14:49921 10.0.1.16:3632 ESTABLISHED(the scenario is similar to the incident i've analyzed before, so i wont
repeat the tcpdump, etc.)Ingo
--
last night i turned off distcc support, and got about 200 successful
bootups and zero TCP hangs (as expected - there's not much TCP traffic
if the distcc cluster is not utilized).but that's 200 overnight tests instead of the expected 600, so this is a
major and rather crippling bug to me.There's no good way to detect these hung sockets by me from userspace
and get rid of them. Has anyone before thought of the obvious: to write
a kernel-space "TCP socket watchdog" kernel feature that detects them
and tries to free them so that people can become aware of it?Hung sockets is a re-occuring bug in the TCP stack after all. (and it's
a natural property of it: state machine designs are always vulnerable to
lost event problems.)Ingo
--
...It is quite easy to kill a TCP flow with enough information (seqnos
which one could collect e.g., from a tcpdump), just a RST is necessary,
so user-space could do that too.--
i.
--
...Yes, also the previous one you quoted was already over the network.
But I think the key is in the receiving end rather than in the sending
end. Could try to extract any information about these I mentioned earlier,
to me there are mainly two weird things:1) Why we see orphaning with data in the first place. I think distcc would
be interested to read everything, unless some worker crashed in early...
Though some timeout in distcc could explain it as well but I don't know
too well how distcc does everything...2) Why the receiving end of the connection is still in ESTABLISHED
without an owner... If it had some unread data, should be in CLOSE or in
FIN_WAIT1 otherwise. Ie., tcp_close() would change the state of the flow.These both basically happen at the receiving end, though if there was
unread data the RST would also reflected state change to the sending side
and since there's window, also FIN would be sent right away and
cause sender to leave ESTABLISHED rather than that FIN would get stuck
into the tail of the TCP queue.How to collect the information, I'm not too sure, tcp_close might well
work as a plain printk because I think it shouldn't be that noisy. But
that would likely just show that it wasn't called for a stuck receiver
at all, so it would probably end up being a dead-end anyway.--
i.
--
Hi.
Are namespaces enabled? It is pretty non-trivial to make established
socket without process from userspace :)--
Evgeniy Polyakov
--
they are enabled in this particular kernel config, but there were a
couple of other failures where they were not enabled.Ingo
--
Just a quick note to say, me too!!
same scenario: distcc on localhost.
--
Me too, however with a completely different scenario; my hung connections
are not related to distcc at all. The output from /proc/net/tcp that Ingo
posted a few days ago are somewhat different from mine, however I believe
this is the same problem or at least related. Just as Ingo experienced,
netstat -p only shows PID/program as '-' for the hung connections while
for other connections it shows the expected results.I have recently bought a new PC and have started the process of copying
stuff from my old PC to the new PC. During this I have experienced this
hang several times. I started copying by using tar on both ends over a ssh
pipe but in order to eliminate possible ssh problems I also have tried tar
over a ttcp connection which also fails. There is no obvious pattern of
when this happens, I have experienced failures after transferring 1.15GB,
51.4GB and 23.6GB.Here is the output from netstat -n -o filtered for port 22 and slightly
edited. All the lines started with Proto == tcp and Recv-Q == 0.Send-Q Local Addr Foreign Addr State Timer
0 old_pc:22 new_pc:52667 ESTABLISHED keepalive (3513.93/0/0)
0 old_pc:22 new_pc:43825 ESTABLISHED keepalive (5467.38/0/0)
2896 old_pc:22 new_pc:58601 ESTABLISHED on (21020884.65/0/0)
4344 old_pc:22 new_pc:54105 ESTABLISHED on (21017016.33/0/0)
2896 old_pc:22 new_pc:34149 ESTABLISHED on (20986889.24/0/0)The first two connections are ongoing, working, interactive ssh
connections. The other three connections died days ago on my new PC.One thing that caught my eyes was these very high timer values.
Checking the netstat source reveals that the value printed is "(double)
time_len / HZ" and that time_len is extracted from /proc/net/tcp. While
my CONFIG_HZ is 1000, I assume netstat has picked up HZ as 100 from
/usr/include/asm/param.h, and then things really seems to imply that
there is some integer overflow since 2^31 = 2147483648.Looking into get_tcp4_sock in net/ipv4/tcp_ipv4.c I see that timer_expires
is initialized with icsk->icsk_ti...
Thanks for reporting!
On Sat, 31 May 2008, H
The ssh connection used for copying (using the command <ssh old_pc "cd
/directory; tar cvf - *" | pv | tar xvf ->) died in the following way:
...
Read from remote host old_pc: Connection timed out
51.4GB 4:26:19 [3.29MB/s] [<=> ]
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting nowand there are currently no traces of those ssh processes any longer on
the new PC, only these two active ssh interactive connections are present:new_pc>ps -elf | sed -n '1p;/ssh/p'
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S hlovdal 1310 1195 0 80 0 - 14139 select 09:28 pts/14 00:00:00 ssh old_pc
5 S root 2477 1 0 80 0 - 15203 - May26 ? 00:00:00 /usr/sbin/sshd
1 S hlovdal 2727 2701 0 80 0 - 12466 select May26 ? 00:00:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "/usr/bin/startkde"
0 S hlovdal 26436 19976 0 80 0 - 20375 pipe_w 19:43 pts/9 00:00:00 sed -n 1p;/ssh/p
0 S hlovdal 30065 18970 0 80 0 - 14174 select May28 pts/6 00:00:00 ssh old_pc
new_pc>On the old PC there are three corresponding zombies:
old_pc>ps -elf | sed -n '1p;/ssh/p'
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
5 S root 4402 1 0 80 0 - 1600 - May25 ? 00:00:02 /usr/sbin/sshd
1 S root 5418 1 0 80 0 - 1347 - May25 ? 00:00:00 ssh-agent
1 S hlovdal 5752 1 0 80 0 - 1347 - May25 ? 00:00:00 ssh-agent
4 S root 6824 4402 0 80 0 - 2315 - 09:28 ? 00:00:00 sshd: hlovdal [priv]
5 S hlovdal 6828 6824 0 80 0 - 2355 - 09:28 ? 00:00:01 sshd: hlovdal@pts/8
0 S hlovdal 17186 25557 0 80 0 - 1191 - 19:45 pts/0 00:00:00 sed -n 1p;/ssh/p
4 D root 24444 4402 0 80 0 - 0 - May27 ? 00:00:00 [sshd]
5 Z hlovdal 24448 24444 0 80 0 - ...
On Sat, 31 May 2008, H
This is only from checking after the connection was dead. By the way,
I just had to remotely reboot the new machine because the window
manager locked up, however the old PC are still listing the defunctI can of course keep on copying for testing purposes, but then I would like to
be able to dump only that single tcp connection, any tips of how to do that?
I found nothing specific in the manuals of wireshark and tcpdump. Of course
it is possible to capture everything and filter afterwards, but since I will be
transferring lots of data the logs will get huge and I would not like to haveI have not done any specific selection myself. On old_pc: bic, new_pc: cubic.
BR Håkon Løvdal
On Sat, 31 May 2008, H
MjAwOC81LzMxIElscG8gSsOkcnZpbmVuIDxpbHBvLmphcnZpbmVuQGhlbHNpbmtpLmZpPjoKPiBJ
IGRpZG4ndCByZWFsbHkgbWVhbiB0Y3BkdW1wLCBJIHdhcyBtb3JlIHRoaW5raW5nIG9mIHN5c2Nh
bGwgd2hhdCBpcyB0aGUKPiBzeXNjYWxsIHdoZXJlIHRoZSBwcm9jZXNzIGlzIHdhaXRpbmcuIFRo
b3VnaCB0Y3BkdW1wIG1pZ2h0IHJldmVhbAo+IHNvbWV0aGluZyBhcyB3ZWxsIGFib3V0IHRoZSBi
ZWhhdmlvciB3aGVuIG5lYXJpbmcgdGhlIHByb2JsZW0sCgo+IEhtbSwgcGVyaGFwcyBwZXJpb2Rp
Y2FsbHkgY2hlY2tpbmcgL3Byb2MvbmV0L3RjcCAoZS5nLiwgb25jZSBwZXIgMTBzKSBpZgo+IHRo
ZSB0aW1lb3V0IGlzIGxhcmdlciB0aGFuIFRDUF9SVE9fTUFYIG1pZ2h0IGFsbG93IHNvbWUgc2Ny
aXB0IHRvCj4gaW1tZWRpYXRlbHkgbm90aWNlIHdoZW4gdGhpbmdzIGJyb2tlIHdoaWxlIHJlcHJv
ZHVjaW5nIGl0LiBTdG9yaW5nIGFsbAo+IHRob3NlIG9uY2UgcGVyIDEwcyB2YWx1ZXMgc2hvdWxk
bid0IGJlIGEgdG9vIGJpZyBlaXRoZXIsIGl0IGNvdWxkIGV2ZW4gYmUKPiBkb25lIGluIGJvdGgg
ZW5kcyBmb3IgYSBzaW5nbGUgZmxvdyAoYnV0IEknbGwgbGVhdmUgYSBzY3JpcHQgdG8gZG8gdGhh
dCBvbgo+IE1vbmRheSkuCgpIaS4gSSBoYXZlIG5vdyB0ZXN0ZWQgYW5kIHRha2VuIHZhcmlvdXMg
bG9ncyBiZWZvcmUvdW5kZXIvYWZ0ZXIgKHBzLCBuZXRzdGF0LAp0Y3BkdW1wLCBzdHJhY2UsIC9w
cm9jL25ldC90Y3ApLiBUaGUgbG9ncyBkaWQgbm90IGdyb3cgYXMgYmlnIGFzIEkgZmVhcmVkCmhv
d2V2ZXIgdGhleSBhcmUgd2F5IHRvbyBiaWcgYXR0YWNoIGhlcmUuIEFueSB0aXBzIG9uIGhvdyB0
byBkaXN0cmlidXRlPwoKSSBtZW50aW9uZWQgYmVmb3JlIHRoYXQgdGhlIHNzaCBjb25uZWN0aW9u
IGRpZWQgb24gdGhlIG5ldyBQQywgYnV0IHNvbWV0aW1lcwppdCBqdXN0IGhhbmdzIGFzIHdlbGws
IGkuZS4gdGhlIHNzaCBjb25uZWN0aW9uIGlzIGFsaXZlIGJ1dCB0aGVyZSBpcwpubyBkYXRhIHRy
YW5zZmVycmVkLgpUaGUgbGFzdCB0ZXN0IEkgZGlkIGhhbmcgYWZ0ZXIganVzdCAyOTJNQiB0cmFu
c2ZlcnJlZC4gSW4gdGhpcyBjYXNlIHB2Cmp1c3QgaGFuZ3MKb24gdGhlIG5ldyBQQywgd2FpdGlu
ZyBmb3IgZGF0YSB0aGF0IG5ldmVyIGFycml2ZXMuIE9uIHRoZSBzZW5kaW5nCnNpZGUgdGFyIHdo
aWNoCmlzIHByb2R1Y2luZyBkYXRhIGhhdmUgZGllZCBhbmQgdGhlIGNvcnJlc3BvbmRpbmcgc3No
IGlzIGxvY2tlZCBpbiBzb21lCnVuaW50ZXJydXB0aWJsZSBzbGVlcDoKCi8vbmV0c3RhdCBiZWZv
cmUgdGhlIGhhbmc6ClByb3RvIFJlY3YtUSBTZW5kLVEgTG9jYWwgQWRkcmVzcyAgICAgICAgICAg
ICAgIEZvcmVpZ24gQWRkcmVzcwogICAgIFN0YXRlICAgICAgIFBJRC9Qcm9ncmFtIG5hbWUgICAg
VGltZXIKdGNwICAgICAgICAwICAgIDUwNCBuZXdfcGM6MjIgICAgICAgICAgICAgICAgICAgb2xk
X3BjOjM5NDY0CiAgICAg...
On Wed, 4 Jun 2008, H
MjAwOC82LzQgSWxwbyBKw6RydmluZW4gPGlscG8uamFydmluZW5AaGVsc2lua2kuZmk+Ogo+IE9u
IFdlZCwgNCBKdW4gMjAwOCwgSMOla29uIEzDuHZkYWwgd3JvdGU6Cj4KPj4gSGkuIEkgaGF2ZSBu
b3cgdGVzdGVkIGFuZCB0YWtlbiB2YXJpb3VzIGxvZ3MgYmVmb3JlL3VuZGVyL2FmdGVyIChwcywK
Pj4gbmV0c3RhdCwgdGNwZHVtcCwgc3RyYWNlLCAvcHJvYy9uZXQvdGNwKS4gVGhlIGxvZ3MgZGlk
IG5vdCBncm93IGFzIGJpZwo+PiBhcyBJIGZlYXJlZCBob3dldmVyIHRoZXkgYXJlIHdheSB0b28g
YmlnIGF0dGFjaCBoZXJlLiBBbnkgdGlwcyBvbiBob3cgdG8KPj4gZGlzdHJpYnV0ZT8KPgo+IFVz
dWFsbHkgc3VjaCBsb2dzIGNvbXByZXNzIHF1aXRlIHdlbGwuIEEgYmVzdCB0aGluZyBtaWdodCBi
ZSB0byBwdXQKPiBzb21ld2hlcmUgKGlmIHBvc3NpYmxlKSBhbmQgaHR0cDovLyByZWZlcmVuY2Ug
dGhlbS4KClRoZSBsYXJnZXN0IGZpbGUgd2FzIGEgc3RyYWNlIGxvZyBvZiAxLjVHQiB1bmNvbXBy
ZXNzZWQuIEkgd2lsbCB0cnkgdG8Kc29ydCBvdXQgc29tZXRoaW5nLgoKCj4+IC8vbmV0c3RhdCBi
ZWZvcmUgdGhlIGhhbmc6Cj4+IFByb3RvIFJlY3YtUSBTZW5kLVEgTG9jYWwgQWRkcmVzcyAgICAg
ICAgICAgICAgIEZvcmVpZ24gQWRkcmVzcwo+PiAgICAgIFN0YXRlICAgICAgIFBJRC9Qcm9ncmFt
IG5hbWUgICAgVGltZXIKPj4gdGNwICAgICAgICAwICAgIDUwNCBuZXdfcGM6MjIgICAgICAgICAg
ICAgICAgICAgb2xkX3BjOjM5NDY0Cj4+ICAgICAgRVNUQUJMSVNIRUQgMTcwNTQvc3NoZDogaGxv
dmRhbCBvbiAoMC4yMC8wLzApCj4+Cj4+IC8vIGFmdGVyIHRoZSBoYW5nOgo+PiBQcm90byBSZWN2
LVEgU2VuZC1RIExvY2FsIEFkZHJlc3MgICAgICAgICAgICAgICBGb3JlaWduIEFkZHJlc3MKPj4g
ICAgICBTdGF0ZSAgICAgICBQSUQvUHJvZ3JhbSBuYW1lICAgIFRpbWVyCj4+IHRjcCAgICAgICAg
MCAgIDI4OTYgb2xkX3BjOjIyICAgICAgICAgICAgICAgICAgIG5ld19wYzozOTQ2NAo+PiAgICAg
IEVTVEFCTElTSEVEIC0gICAgICAgICAgICAgICAgICAgb24gKDIxMzI5NDQxLjE5LzAvMCkKPgo+
ID8/PyBBcmUgdGhlc2Ugb24gdGhlIHNhbWUgbWFjaGluZT8gSG93IGNvbWUgdGhlIGlwLWFkZHJl
c3NlcyBnb3QgcmV2ZXJzZWQKPiAod2hpbGUgcG9ydHMgcmVtYWluIHRoZSBzYW1lKS4uLj8hPwoK
T29wcywgdGhhdCB3YXMgYW4gZWRpdGluZyBtaXN0YWtlIG9uIG15IHBhcnQsIG9sZCBhbmQgbmV3
IGFyZSBtaXhlZCB1cAppbiB0aGUgbGFzdCBsaW5lLgoKQlIgSMOla29uIEzDuHZkYWwK
--
On Wed, 4 Jun 2008, H
T24gMDQvMDYvMjAwOCwgSWxwbyBKw6RydmluZW4gPGlscG8uamFydmluZW5AaGVsc2lua2kuZmk+
IHdyb3RlOgo+IEkgZ3Vlc3Mgbm90IGFsbCBvZiB0aGUgc3RyYWNlIHdpbGwgYmUgbmVjZXNzYXJ5
IGFueXdheS4gWW91IGNvdWxkIGZpcnN0Cj4gIGxlYXZlIHRoYXQgb3V0IGNvbXBsZXRlbHkgYW5k
IGNvbmNlbnRyYXRlIG9uIHRoZSBvdGhlciBsb2dzIGZpcnN0LgoKSSBoYXZlIG5vdyBzZXQgdXAg
YSBzZXJ2ZXIgb24gbXkgbWFjaGluZSBhbmQgbWFkZSBhbGwgbG9ncyBhdmFpbGFibGUuCkkgd2ls
bCBmb2xsb3cgdXAgd2l0aCB0aGUgYWRyZXNzIGluIHByaXZhdGUgZW1haWwgdG8gdGhvc2UgdGhh
dCBhbHJlYWR5CmhhdmUgcG9zdGVkIGluIHRoaXMgdGhyZWFkLiBGb3Igb3RoZXJzLCBwbGVhc2Ug
YXNrIHRvIHJlY2VpdmUgaXQuCgpCUiBIw6Vrb24gTMO4dmRhbAoKUFMKSSB3aWxsIGJlIGF3YXkg
Zm9yIHRoaXMgd2Vla2VuZC4K
--
On Fri, 6 Jun 2008, H
MjAwOC82LzkgSWxwbyBKw6RydmluZW4gPGlscG8uamFydmluZW5AaGVsc2lua2kuZmk+Ogo+IE9u
IEZyaSwgNiBKdW4gMjAwOCwgSMOla29uIEzDuHZkYWwgd3JvdGU6Cj4+IEkgaGF2ZSBub3cgc2V0
IHVwIGEgc2VydmVyIG9uIG15IG1hY2hpbmUgYW5kIG1hZGUgYWxsIGxvZ3MgYXZhaWxhYmxlLgo+
PiBJIHdpbGwgZm9sbG93IHVwIHdpdGggdGhlIGFkcmVzcyBpbiBwcml2YXRlIGVtYWlsIHRvIHRo
b3NlIHRoYXQgYWxyZWFkeQo+PiBoYXZlIHBvc3RlZCBpbiB0aGlzIHRocmVhZC4gRm9yIG90aGVy
cywgcGxlYXNlIGFzayB0byByZWNlaXZlIGl0Lgo+Cj4gQ2FuIHlvdSBwbGVhc2UgZXhwbGFpbiB0
aGlzOgo+Cj4gaWpqYXJ2aW5Ad3JsLTU5On4vdG1wJCBidW56aXAyIC10IGNvcHl0ZXN0MDAzLmNh
cHR1cmUuYnoyCj4gYnVuemlwMjogY29weXRlc3QwMDMuY2FwdHVyZS5iejI6IGRhdGEgaW50ZWdy
aXR5IChDUkMpIGVycm9yIGluIGRhdGEKPgo+IFlvdSBjYW4gdXNlIHRoZSBgYnppcDJyZWNvdmVy
JyBwcm9ncmFtIHRvIGF0dGVtcHQgdG8gcmVjb3Zlcgo+IGRhdGEgZnJvbSB1bmRhbWFnZWQgc2Vj
dGlvbnMgb2YgY29ycnVwdGVkIGZpbGVzLgo+Cj4gaWpqYXJ2aW5Ad3JsLTU5On4vdG1wJCBscyAt
YWwgY29weXRlc3QwMDMuY2FwdHVyZS5iejIKPiAtcnctci0tLS0tIDEgaWpqYXJ2aW4gdGtvbCAy
NzU0NDE2MjAgIDkuIGtlc8ODwqTDgiDDgiAgMTY6MzkKPiBjb3B5dGVzdDAwMy5jYXB0dXJlLmJ6
Mgo+IGlqamFydmluQHdybC01OTp+L3RtcCQgbWQ1c3VtIGNvcHl0ZXN0MDAzLmNhcHR1cmUuYnoy
Cj4gNGJiZjgwMWNmMWZiZDUyMzk5MzIyYjA4OGMxYTIwNTggIGNvcHl0ZXN0MDAzLmNhcHR1cmUu
YnoyCj4gaWpqYXJ2aW5Ad3JsLTU5On4vdG1wJAo+Cj4gLi4uYWxzbyB0aGUgb25lIGZyb20gYWxs
ZmlsZXMuemlwIGlzIHNpbWlsYXJpbHkgZGFtYWdlZC4KPgo+IC4uLklmIHlvdSBkb24ndCBoYXZl
IGFuIGV4cGxhbmF0aW9uLCB0aGVtIGl0IG1pZ2h0IGJlIHRpbWUgdG8gY2hlY2sKPiB5b3VyIGhh
cmR3YXJlLi4uLgoKSSBkbyBub3QgaGF2ZSBhbnkgZ29vZCBleHBsYW5hdGlvbiAoYmFkIG1lbW9y
eT8pLiBXaGVuIGNoZWNraW5nIHRoZQpjb3B5dGVzdDAwMy5jYXB0dXJlLmJ6MiBmaWxlIGluIHRo
ZSBodHRwIHNlcnZlciBzdG9yYWdlIGRpcmVjdG9yeQpJIGdldCB0aGUgc2FtZSByZXN1bHRzIGFz
IHlvdS4gSG93ZXZlciBJIGFsc28gaGF2ZSB0aGUgb3JpZ2luYWwgZmlsZQpzdG9yZWQgc3RpbGwg
YW5kIGNvbXBhcmluZyB0aGVtIHdpdGggY21wIC1sIGdpdmVzCiAyMDU1ODY1NiAxMjQgMTA0CnNv
IG9idmlvdXNseSBhIGJpdCBoYXMgYmVlbiBmbGlwcGVkLiBJIGhhdmUgbm93IHJlcGxhY2VkIHRo
ZQplcnJvbmVvdXMgZmlsZSB3aXRoIGEgZ29vZCBjb3B5LCBzbyB5b3Ugc2hvdWxkIGJlIGFibGUg
dG8gZG93bmxvYWQgYWdhaW4uCgpCUiBIw6Vrb24gTMO4dmRhbAoKQnR3LCBJIGdvdCBhICJ0cmFk
aXRpb25hbCIgaGFuZyBu...
On Wed, 11 Jun 2008, H
Hi. I tested the old pc with memtest86 and memtest86+ for 16 and 50
hours without any errors detected, so apparently the ram is ok.I now just did another test from a freshley booted old pc, standing
idle for around 2 hours before I started the testing (no network activity
before I plugged in the cable just before starting testing).The copying failed after just 2.62GB, so the logs are short and
nice. Stored on same server as earlier, new 2008-06-19 directory. One
thing I noticed in the dmesg output on the old pc was the following:BUG: unable to handle kernel paging request at virtual address ffff3cff
printing eip: ffff3cff *pde = 00793067 *pte = 00000000
Oops: 0000 [#1] PREEMPT
Modules linked in: ...
Pid: 18103, comm: sshd Not tainted (2.6.24-gafd564a8 #4)
...
Call Trace:
[<f8c45b39>] nf_conntrack_find_get+0x36/0x3a [nf_conntrack]
[<f8c46379>] nf_conntrack_in+0x135/0x3e8 [nf_conntrack]
[<c0423c5d>] local_bh_disable+0xa/0xb
[<c0626d54>] _read_lock_bh+0x12/0x1a
...BR Håkon Løvdal
MjAwOC81LzMxIElscG8gSsOkcnZpbmVuIDxpbHBvLmphcnZpbmVuQGhlbHNpbmtpLmZpPjoKPiBP
biBTYXQsIDMxIE1heSAyMDA4LCBIw6Vrb24gTMO4dmRhbCB3cm90ZToKPj4gMjAwOC81LzMxIEls
cG8gSsOkcnZpbmVuIDxpbHBvLmphcnZpbmVuQGhlbHNpbmtpLmZpPjoKPj4gPiBTbyB5b3UgaGFk
IHRoYXQgJy0nIGVhcmxpZXIgYW5kIHlvdSBjaGVja2VkIGF0IHRoYXQgdGltZSBidXQgdGhlCj4+
ID4gY29ubmVjdGlvbiBpcyBub3cgYWxyZWFkeSBkZWFkPwo+Pgo+PiBUaGlzIGlzIG9ubHkgZnJv
bSBjaGVja2luZyBhZnRlciB0aGUgY29ubmVjdGlvbiB3YXMgZGVhZC4KPgo+IENvdWxkIHlvdSBw
bGVhc2UgcmVwaHJhc2UgdGhlIGFuc3dlciwgSSBmYWlsZWQgdG8gdW5kZXJzdGFuZCBpdC4uLiA6
LSkKPiAuLi5Zb3Ugc2FpZCBlYXJsaWVyIHRoYXQgeW91IGhhZCAnLScgb3duZWQgY29ubmVjdGlv
bnMgbGlrZSBJbmdvLCB3aGVuIGRpZAo+IHRoYXQgaGFwcGVuIChub3cgdGhlIGNvbm5lY3Rpb25z
IHdvbid0IGV4aXN0cyBhbnltb3JlLCBzbyBhdCB3aGF0IHBvaW50IG9mCj4gdGltZSB5b3Ugc2F3
IHRob3NlIG5vbi1vd25lZCBjb25uZWN0aW9ucyk/CgpXaGF0IEkgbWVhbiBpcyB0aGF0IEkganVz
dCB0ZXN0ZWQgIm5ldHN0YXQgLXAiIGFuZCBub3RpY2VkIHNpbWlsYXJpdGllcwp3aXRoIEluZ28n
cyByZXN1bHQgdG9kYXkgd2hpbGUgcmVwbHlpbmcgdG8gdGhpcyB0aHJlYWQsIHdoaWNoIGlzIGRh
eXMKYWZ0ZXIgdGhlIGNvbm5lY3Rpb25zIGVuZGVkLgoKQlIgSMOla29uIEzDuHZkYWwK
--
On Sat, 31 May 2008, Ilpo J
On Mon, 26 May 2008 13:56:28 +0200
interestingly enough, I think I'm seeing something very similar to this
with the claws email client; as of 2.6.26-rc3 (sorry, didn't run
earlier ones because they broke my wireless) claws gets "stuck" once in
a while, really stuck, when talking to the imap server.....
It seems to happen a little less frequently when running under strace,
but it still happens, so that could just be a timing thing.--
...but you can still tcpdump that particular flow once the situation is
discovered to see if TCP still tries to do something, no? One needs to
tcpdump couple of minutes at minimum. Also please get /proc/net/tcp forCan you please try with /proc/sys/net/ipv4/tcp_frto set to zero though
recv-q symptom seems weird would it be related to that (but there were
some recent fixes to FRTO and retrans_stamp change could have some
significance here)?Other than that, nothing since -rc1 seems suspicious to me (though
I hardly understand every part of networking).--
i.
--
On Mon, May 26, 2008 at 6:28 AM, Ilpo Järvinen
Okay, but in some sense you've already bisected this somewhat. I'm
assuming that your testing uses the latest tip and is refreshed daily.If that's the case, then I would (possibly naively) expect the culprit
to show up in a:
git log -p v2.6.26-rc1..78b58e549a3098
net/{compat.c,core,ipv4,netfilter,packet,sched,socket.c}There are only a few commits in there that appear to touch network behavior:
79d44516b4b178ffb6e2159c75584cfcfc097914
a1c1f281b84a751fdb5ff919da3b09df7297619f
62ab22278308a40bcb7f4079e9719ab8b7fe11b5Reverting just those three and running overnight might provide a clue.
OTOH, I'm in no way a net/ expert, so if you are already working a
debugging strategy then feel free to ignore this. I'm only piping up
ok, i will first wait for it to trigger on a box and will do the tcpdump
session (and /proc/net/tcp output), then i'll continue the tests with
this done in the rc.local:echo 0 > /proc/sys/net/ipv4/tcp_frto
and will see whether the hung connections still occur. The cycle of
testing will be very slow i suspect.Ingo
--
i got lucky ... the bug just reproduced again - find the netstat and
/proc/net/tcp output below. New config attached.i also ran this for 15 minutes:
[root@europe ~]# tcpdump src and dst 10.0.1.15 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytesbut there was no output at all - i.e. TCP has not attempted to do
anything to these connections.there's a hung distcc task on the system, waiting for socket action
forever:[root@europe ~]# strace -fp 19578
Process 19578 attached - interrupt to quit
select(5, NULL, [4], [4], {82, 90000} <unfinished ...>disturbing that task via strace did not change the state of the socket -
and that's not unexpected as it's a select(). [TCP state might be
affected if strace impacted a recvmsg or a sendmsg wait directly.]i'll keep this system in its hung state for an hour, just in case you
can think of anything else to extract out of it.Ingo
------------->
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.1.15:22 10.0.1.16:42843 ESTABLISHED
tcp 0 207232 10.0.1.15:37198 10.0.1.15:3632 ESTABLISHED
tcp 0 0 10.0.1.15:22 10.0.1.16:36284 ESTABLISHED
tcp 72283 0 10.0.1.15:3632 10.0.1.15:37198 ESTABLISHED
Tcp:
1931 active connections openings
149 passive connection openings
0 failed connection attempts
0 connection resets received
4 connections established
436004 segments received
98819 segments send out
4 segments retransmited
0 bad segments received.
0 resets sent
UdpLite:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 000...
Hmm, readfds is NULL isn't it?!? Are you sure you straced the right
I fail to understand this paragraph due to excessive negation... :-)
--
i.
--
i mean, sometimes a TCP connection can get 'unstuck' if you strace a
task - that is because the TCP related syscall the task sits in gets
interrupted. But in this case it's select() which doesnt explicitly take
the socket, doesnt do any tcp_push_pending_frames() processing, etc. -
it just its on the socket waitqueue AFAICS. And that's expected.Ingo
--
> * Ilpo J
it wasnt the receiving process. There's no receiving process - which is
weird:Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 207232 europe:37198 europe:distcc ESTABLISHED 19578/distcc
tcp 0 0 europe:ssh dione:36284 ESTABLISHED -
tcp 0 0 europe:ssh e2:45910 ESTABLISHED -i just gave it as a general example of why sometimes stracing a task can
'disturb' the observed system and can kick the TCP state machine out ofthe socket does not seem to be owned. It should have closed down?
Refcounting issue?find below the sysrq-t dump.
Ingo
-------------------------->
[11811.540000] SysRq : Show State
[11811.540000] task PC stack pid father
[11811.540000] init S f7c67b04 0 1 0
[11811.540000] f7c70000 00000086 00000002 f7c67b04 f7c67b0c 00000000 00000046 f7ca08b8
[11811.540000] f7c67b2c 00119272 00000002 80130c53 00000000 01ca0000 f7c67b08 f7c70164
[11811.540000] 82cbb080 0011907f 00000000 00000000 00000000 00000000 000000ff f7c67b2c
[11811.540000] Call Trace:
[11811.540000] [<80130c53>] internal_add_timer+0x103/0x110
[11811.540000] [<806ab7da>] schedule_timeout+0x4a/0xd0
[11811.540000] [<801310e0>] process_timeout+0x0/0x40
[11811.540000] [<806ab7d5>] schedule_timeout+0x45/0xd0
[11811.540000] [<8019868c>] do_select+0x3dc/0x500
[11811.540000] [<801982b0>] do_select+0x0/0x500
[11811.540000] [<80198d60>] __pollwait+0x0/0xe0
[11811.540000] [<80120070>] default_wake_function+0x0/0x10
[11811.540000] [<801852b6>] check_bytes_and_report+0x26/0xd0
[11811.540000] [<80184f6e>] slab_pad_check+0x7e/0x100
[11811.540000] [<80185695>] check_object+0xe5/0x1f0
[11811.54...
> * Ilpo J
yes, that would match the symptoms i think. I half-assumed that it's a
state machine problem so i didnt even check what the reader does - and
in this case it appears to not exist at all anymore ;-)Ingo
--
after ~7 hours of uptime the networking code produced this assertion:
[25441.140000] KERNEL: assertion (!sk->sk_ack_backlog) failed at
net/ipv4/inet_connection_sock.c (642)no further information in the logs. I kept the failing system booted up
for 8 hours and will reboot it now.Ingo
--
...Added Pavel & Denis.
I keep wondering why distcc in the first place has a connection orphaned
before it has read all data from it, I guess it should not yet be
speculatively executing anything or so :-). Anyway, there should be RST
due to that data because nobody is going read it after the process in gone
but since state is still ESTABLISHED, I doubt that tcp_close() was ever...I checked /proc/net/tcp output you gave earlier, and indeed, it shows
--
i.
--
... which is not surprising as it traced eth0 ;-) [10.0.1.15 is the IP
of the box and routes back to loopback]with the proper tcpdump it gave:
[root@europe ~]# tcpdump -i lo
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
16:15:05.260000 IP europe.37198 > europe.distcc: . ack 2022529163 win 257 <nop,nop,timestamp 209526 197526>
16:15:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0 <nop,nop,timestamp 209526 7020>hm, a window of 257 and zero, is that right?
Ingo
--
Thanks, it's much easier to diagnose right with symptoms that are based
...Yes, though that 257 is lacks the effect of window scaling (and it's
mostly irrelevant anyway since the problem occurs in opposite dir). The
sender is using just window probes (/proc/net/tcp shows that too), that's
why you see them occassionally. So the main problem seems to be that
receiver doesn't read, rest is just a consequence of that (from TCP POV).
I doubt that tcp_frto will affect to that (and I read what a stale
retrans_stamp could do, it would hard cause any troubles, and those minor
things wouldn't realize in your setup anyway).I'll take a look into pre -rc1 changes now but it may well be beoynd
what I'm able to figure out.--
i.
--
there's TCP ACK activity every 120 seconds:
16:19:05.260000 IP europe.37198 > europe.distcc: . ack 2022529163 win 257
<nop,nop,timestamp 233526 221526>
16:19:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0
<nop,nop,timestamp 233526 7020>
16:21:05.260000 IP europe.37198 > europe.distcc: . ack 1 win 257
<nop,nop,timestamp 245526 233526>
16:21:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0
<nop,nop,timestamp 245526 7020>
16:23:05.260000 IP europe.37198 > europe.distcc: . ack 1 win 257
<nop,nop,timestamp 257526 245526>
16:23:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0
<nop,nop,timestamp 257526 7020>
16:25:05.260000 IP europe.37198 > europe.distcc: . ack 1 win 257
<nop,nop,timestamp 269526 257526>
16:25:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0
<nop,nop,timestamp 269526 7020>
16:27:05.260000 IP europe.37198 > europe.distcc: . ack 1 win 257
<nop,nop,timestamp 281526 269526>
16:27:05.260000 IP europe.distcc > europe.37198: . ack 1 win 0
<nop,nop,timestamp 281526 7020>The socket on the other side of the localhost connection,
10.0.1.15:3632, produces no packets.a detail i forgot to mention: i reproduced the same TCP hang on a 32-bit
system and on a 64-bit system as well.Ingo
--
| Greg Kroah-Hartman | [PATCH 006/196] Chinese: add translation of oops-tracing.txt |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Borislav Petkov | 2.6.23-rc1: no setup signature found... |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | Re: [BUG] New Kernel Bugs |
