...In general I somewhat fail to follow, in this mail, which patches have been
tested, and where and when... But I understand that it's just due to the number
of tests & hosts & kernels & whatever else you use and know by heart (and which
we don't know all that well :-)). But I'll still try to sort it out below...
On Fri, 6 Jun 2008, Ingo Molnar wrote:
Yes, the problematic portion outside of the lock shouldn't be there
without those DA (DEFER_ACCEPT) changes.
...and you added the locking fix there instead? Or was this a removal?
No. In the 2.6.25..2.6.26-rc1 timeframe, part of the DEFER_ACCEPT processing
was postponed (ec3c0982a2dd1e671bad8e9d26c28dcba0039d87), and as a result one
portion of it ended up outside the socket lock of the listening socket, even
though it touches that socket's data structures. Without
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87, the deferred-accept-related work
happens earlier, i.e., while we are still holding the lock of the listening
socket. So that particular locking bug was _introduced_ by the ec3c change,
not merely made more likely.
...Of course, software is known to have bugs, so we might always be
(un?)lucky enough to hit another one and get confused... :-)
Actually, it seems to work quite well for this kind of networking-related
bug too, even though such bugs hardly depend on the network at all :-).
Ah, sorry, I forgot to add that one there; it was sent quite late at
night and I just couldn't get to sleep before sending the fix... :-) It was
one of the reverted ones that did it:
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87.
If you want an older kernel without it, you would basically have to go back
to 2.6.25 or so.
To summarize: both the 3-changes+1-fix revert (which you refer to only as
the 3-patch revert) _and_ the locking fix I made should fix the problem
(obviously they are mutually exclusive). ...And the end that matters is
the one with the LISTENing sockets (please keep this in mind if you still
get the hang, and provide some info).
--
i.