Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Sunday, January 13, 2008 - 3:35 am

I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is
listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it
has just fetched from an outside server via IMAP - it will often just hang and
not make any further progress. Looking at netstat shows something interesting:

% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED

On the IPv4 side, it thinks it's got 5108 bytes in the send queue - but on
the IPv6 side of that same connection, it's showing 0 in the receive queue,
and we're stuck there.
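
The mismatch described above can be checked mechanically. What follows is a minimal Python sketch, not part of the original report, that pairs up the two netstat entries for one loopback connection by folding the IPv4-mapped form (::ffff:127.0.0.1) back to plain IPv4, then flags the case where one side has unsent bytes queued while the other side's receive queue is empty. The names Conn, normalize, parse and stalled are hypothetical, and the column order is assumed to match the output quoted above. LISTEN lines with a "*" port are not handled.

```python
import ipaddress
from typing import NamedTuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: tuple   # (ip, port)
    remote: tuple  # (ip, port)


def normalize(endpoint: str) -> tuple:
    """Split 'addr:port' and fold ::ffff:a.b.c.d into plain a.b.c.d."""
    addr, _, port = endpoint.rpartition(":")
    ip = ipaddress.ip_address(addr)
    # Only IPv6Address has ipv4_mapped; it is None for non-mapped addresses.
    mapped = getattr(ip, "ipv4_mapped", None)
    return (str(ip if mapped is None else mapped), int(port))


def parse(line: str) -> Conn:
    # Assumed columns: proto recv-q send-q local remote state
    _proto, recv_q, send_q, local, remote, _state = line.split()
    return Conn(int(recv_q), int(send_q), normalize(local), normalize(remote))


def stalled(sender: Conn, receiver: Conn) -> bool:
    """True if both entries are one connection with bytes stuck in between."""
    same = sender.local == receiver.remote and sender.remote == receiver.local
    return same and sender.send_q > 0 and receiver.recv_q == 0


v4 = parse("tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED")
v6 = parse("tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED")
print(stalled(v4, v6))
```

A single snapshot only shows that data is in flight; the hang is indicated by the same result on repeated samples, as in the three identical netstat runs above.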

It's not consistent - sometimes Fetchmail will wedge on the very first mail,
and do so several times in a row. Other times, it will do well for a while -
at the moment, it's gone through 471 of the 1,470 currently queued mails just
fine, only to get wedged again on number 472.

For what it's worth, here's what 'echo w > /proc/sysrq-trigger' got, although I
don't see anything that looks odd to me given the netstat output above -
procmail has sent data, and is waiting for a response back, and sendmail is
waitin...

To: Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 12:15 pm

The IPv6 is apparently a red herring - this morning I'm seeing the same problem
with another totally separate pair of programs that are IPv4-only, hanging
on loopback.

To: <Valdis.Kletnieks@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 12:36 pm

Are you still only seeing these problems on loopback? I can't help but wonder
if this is the skb_clone() problem where it wasn't copying skb->iif causing
SELinux to silently drop the packets. Then again, I'm not sure if there is a
clone operation in the code path you are going down. From what I can remember I
only saw clones on some of the multicast stuff but I'm still learning some of
the darker corners of the stack.
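
To make the suspected failure mode concrete, here is a toy Python model. It is not kernel code and does not reflect the real skb_clone() or SELinux implementation; Packet, clone_without_iif, clone_with_iif and ingress_check are all hypothetical names. It only shows how a copy that forgets one field can make a later check reject the packet without any visible error.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: bytes
    iif: int = 0  # input interface index; 0 means "not set" in this toy model


def clone_without_iif(pkt: Packet) -> Packet:
    """Buggy clone: copies the payload but forgets the interface index."""
    return Packet(payload=pkt.payload)


def clone_with_iif(pkt: Packet) -> Packet:
    """Fixed clone: carries the interface index over to the copy."""
    return Packet(payload=pkt.payload, iif=pkt.iif)


def ingress_check(pkt: Packet) -> bool:
    """Hypothetical policy hook: silently reject packets with no valid iif."""
    return pkt.iif > 0


original = Packet(b"MAIL FROM:<a@example.org>", iif=1)
print(ingress_check(clone_without_iif(original)))  # copy lost iif, rejected
print(ingress_check(clone_with_iif(original)))     # copy kept iif, accepted
```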

If you've got some spare cycles, the kernel below should have both the
clone/iif fix (it's in Linus' tree now) and some printks when errors
occur, so packets are no longer silently dropped by SELinux.

* git://git.infradead.org/users/pcmoore/lblnet-2.6_testing

--
paul moore
linux security @ hp
--

To: Paul Moore <paul.moore@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 2:05 pm

Yes, I've only spotted it on loopback. The odd part is that I had reverted the
one commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 "Convert the netif code to
use ifindex values" - so either I managed to get the revert terribly wrong,
or there's something else odd going on. The first time around, I was seeing
hangs during a TCP 3-packet handshake - this time data flows for some number
of packets before hanging.

I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
moment, and seeing if there's already a fix in there for this.

To: <unlisted-recipients@...>, <@...>
Cc: Paul Moore <paul.moore@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 2:22 pm

Apparently the only new commit in there since the tree that was in
24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning
printk's. Would it be more productive to test against the full tree, or
leaving out the one commit I already reverted?

To: Paul Moore <paul.moore@...>, Andrew Morton <akpm@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 2:50 pm

<voice=Emily Litella> Nevermind... </voice> :)

The new commit won't apply with the other one reverted - it patches
security/selinux/netnode.c which was created by the problematic commit...

To: <Valdis.Kletnieks@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 3:07 pm

There have been quite a few changes in lblnet-2.6_testing since 2.6.24-rc6-mm1
so I would recommend taking the whole tree. I'm also not quite sure if
simply reverting the "Convert the netif code to use ifindex values" patch
would solve the problem as there are other patches in the rc6-mm1 tree that
rely on skb->iif being valid (new code, not converted code). If you want to
stick with a _relatively_ vanilla rc6-mm1 tree I would leave everything in
and simply apply the following patch which solved the skb_clone()/iif
problem:

http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdif...

--
paul moore
linux security @ hp
--

To: Paul Moore <paul.moore@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 7:04 pm

Initial testing indicates that 2.6.24-rc6-mm1 plus this one commit is
behaving itself correctly - my Tcl test case that reliably demonstrated wedges
during SYN handling is definitively fixed, and the current issue with hangs with
data pending seems to be gone as well (after admittedly light testing).
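
The Tcl test case itself is not shown in this thread. As a rough stand-in, a loopback round-trip probe along these lines could be used to look for both kinds of hang (during connect, or with data pending). This is only a sketch under stated assumptions: the function names echo_once and probe, the 5-second timeout and the 4096-byte read size are all made up for illustration.

```python
import socket
import threading


def echo_once(listener: socket.socket) -> None:
    """Accept one connection and echo data until the peer shuts down."""
    try:
        conn, _ = listener.accept()
        with conn:
            while chunk := conn.recv(4096):
                conn.sendall(chunk)
    except OSError:
        pass  # listener closed or connection reset; the client reports failure


def probe(payload: bytes, timeout: float = 5.0) -> bool:
    """Return True if payload makes a loopback round trip within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
        listener.listen(1)
        server = threading.Thread(target=echo_once, args=(listener,), daemon=True)
        server.start()
        try:
            with socket.create_connection(listener.getsockname(), timeout=timeout) as c:
                c.sendall(payload)
                c.shutdown(socket.SHUT_WR)  # signal end of data to the echo side
                received = bytearray()
                while chunk := c.recv(4096):
                    received += chunk
        except OSError:  # includes timeouts: a connect or read that never completes
            return False
        server.join(timeout)
        return bytes(received) == payload


if __name__ == "__main__":
    failures = sum(not probe(b"x" * 5108) for _ in range(100))
    print(f"{failures} of 100 round trips failed or timed out")
```

The payload is kept small on purpose: the client sends everything before reading, so a payload larger than the socket buffers could block for reasons unrelated to the bug.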

Thanks for finding the commit that fixed it...

To: <Valdis.Kletnieks@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 7:19 pm

No problem, glad to hear that fixed the problem. It's already in Linus' tree
so any future -mm kernels as well as 2.6.24 should be problem-free, at least
with respect to this ;)

--
paul moore
linux security @ hp
--

To: Paul Moore <paul.moore@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 3:37 pm

Weird. I did a 'git clone git://git.infradead.org/users/pcmoore/lblnet-2.6_testing'
into a new directory this morning, and doing a 'git log' against that only
showed the one added commit:

commit 5d95575903fd3865b884952bd93c339d48725c33
Author: Paul Moore <paul.moore@hp.com>
Date: Wed Jan 9 15:30:23 2008 -0500

SELinux: Add warning messages on network denial due to error

Currently network traffic can be sliently dropped due to non-avc errors which
can lead to much confusion when trying to debug the problem. This patch adds
warning messages so that when these events occur there is a user visible
notification.

Signed-off-by: Paul Moore <paul.moore@hp.com>

commit 9259ca5fd8b9fbdd2c3edade593dead905d8391e
Author: Paul Moore <paul.moore@hp.com>
Date: Wed Jan 9 15:30:23 2008 -0500

SELinux: Add network ingress and egress control permission checks
(already in 24-rc6-mm1).

OK, I'll go look at that..

To: <Valdis.Kletnieks@...>
Cc: Andrew Morton <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Monday, January 14, 2008 - 4:02 pm

It might be something on my end with managing the lblnet-2.6_testing git tree;
I'm still pretty clueless when it comes to git.

I've got a git tree on my dev machine which is backed against Linus' tree and
managed via stacked-git. I update the patches in this tree, refresh them
against new bits from Linus, etc and when something significant changes I
update the git tree on infradead.org and post a new patchset to the related
lists. The process of updating the git tree on infradead.org usually
involves deleting the entire tree located there, re-creating it, and then
doing a git-push from my dev machine. I have no idea if this is "correct" or
not, but I've often wondered if this is the "right" way to do it ...

--
paul moore
linux security @ hp
--

To: <Valdis.Kletnieks@...>
Cc: <akpm@...>, <linux-kernel@...>, <netdev@...>
Date: Sunday, January 13, 2008 - 5:46 pm

Please provide a packet dump on both sides (or at least the sender
side).

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
