[bisected] Re: [bug] networking broke, ssh: connect to port 22: Protocol error

To: David S. Miller <davem@...>, <linux-kernel@...>
Cc: <netdev@...>
Date: Wednesday, February 6, 2008 - 7:38 am

randconfig qa on x86.git ran into the following new networking-related
problem on latest -git: with the attached .config the testbox comes up
but cannot establish any TCP connections due to -EPROTO in
sys_connect().

The error comes from this condition in inet_stream_connect():

	/* Connection was closed by RST, timeout, ICMP error
	 * or another process disconnected us.
	 */
	if (sk->sk_state == TCP_CLOSE)
		goto sock_error;
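
For readers following along from userspace, here is a minimal sketch of how such a failure surfaces in a client. It is not code from this thread: try_connect() and connect_error_text() are hypothetical helpers. The only assumption taken from the report is that the "Protocol error" text in the subject line is the strerror() string for EPROTO, which is the case on Linux with glibc.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): attempt a TCP connect to an
 * IPv4 address. Returns 0 on success or a negative errno value, mirroring
 * the kernel's convention for the sys_connect() return value.
 */
int try_connect(const char *ipv4, unsigned short port)
{
	struct sockaddr_in addr;
	int fd, err = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
		return -EINVAL;	/* not a dotted-quad IPv4 address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	/* Save errno before close() has a chance to overwrite it. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		err = -errno;

	close(fd);
	return err;
}

/* Text a tool such as ssh would print for a failed try_connect(). */
const char *connect_error_text(int neg_errno)
{
	return strerror(-neg_errno);
}
```

A caller would print connect_error_text(rc) whenever try_connect() returns a negative rc. On the failing testbox that would be the same "Protocol error" string ssh reports.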

ICMP pings do work to the machine. Netfilter is on in the .config, maybe
some new option prevents TCP connections from being established?

CONFIG_SECURITY_NETWORK and CONFIG_SECURITY_SMACK are enabled as well.
(but that shouldn't throw a no-protocol error)

Ingo

To: <mingo@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Wednesday, February 6, 2008 - 7:42 am

From: Ingo Molnar <mingo@elte.hu>

Make sure you have the following fix in your tree.

It might be the cause.

commit 5d8c0aa9433b09387d9021358baef7939f9b32c4
Author: Pavel Emelyanov <xemul@openvz.org>
Date: Tue Feb 5 03:14:44 2008 -0800

[INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.

The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit

5ee31fc1ecdcbc234c8c56dcacef87c8e09909d8
[INET]: Consolidate inet(6)_hash_connect.

Return this logic back, by passing the port offset directly into the
consolidated function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 48ac620..97dc35a 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -389,7 +389,7 @@ static inline struct sock *inet_lookup(struct net *net,
 }
 
 extern int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-		struct sock *sk,
+		struct sock *sk, u32 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
 			struct sock *, __u16, struct inet_timewait_sock **),
 		void (*hash)(struct sock *sk));
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 90f422c..9cac6c0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -398,7 +398,7 @@ out:
 EXPORT_SYMBOL_GPL(inet_unhash);
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-		struct sock *sk,
+		struct sock *sk, u32 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
 			struct sock *, __u16, struct inet_timewait_sock **),
 		void (*hash)(struct sock *sk))
@@ -413,7 +413,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
...

To: David Miller <davem@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Wednesday, February 6, 2008 - 8:22 am

this is already upstream. As i mentioned above i tested latest -git.
(HEAD 551e4fb2465b8)

So no, it does not fix the problem. The config i sent is a rather
generic one, it should boot on most whitebox PCs. TCP connections will
fail immediately, all the time.

(I reverted 5d8c0aa943 as well, that didn't solve the problem either.)

Ingo
--

To: <mingo@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Wednesday, February 6, 2008 - 8:32 am

From: Ingo Molnar <mingo@elte.hu>

I suspect this got added recently with how often and how thoroughly
you test things :-)

If you can only give us the last GIT head that worked on that machine
it might help us narrow things down a lot.

If you have time for a bisect, even better but not absolutely
required.
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>, <netdev@...>
Date: Wednesday, February 6, 2008 - 9:11 am

yeah, although various other upstream breakages prevented real long
randconfig series in the past 2-3 days. I'd say it's either in this pull
from your tree:

Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Tue Feb 5 10:09:07 2008 -0800

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)

or perhaps in this one:

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon Feb 4 07:43:36 2008 -0800

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (77 commits)

i'll figure it out, it's totally reproducible so it should be easy to
bisect. Just wanted to know whether you had anything queued up already
for something like this.
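
As a side note on why a reproducible failure is cheap to bisect: the search is a binary search over the commit range, so the 77-commit merge above needs at most about seven builds. The sketch below illustrates the idea only. first_bad_commit(), is_bad_fn and bad_from() are hypothetical names, history is treated as a simple linear list, and every test is assumed to run under identical conditions (same .config, same hardware).

```c
#include <stddef.h>

/* Hypothetical test callback: nonzero if commit `idx` shows the bug. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Find the first bad commit in a linear history of n commits (n >= 2),
 * assuming commit 0 is known good, commit n-1 is known bad, and the bug
 * stays present once introduced. If tests_run is non-NULL it is
 * incremented once per tested commit.
 */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
			unsigned *tests_run)
{
	size_t good = 0, bad = n - 1;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (tests_run)
			(*tests_run)++;
		if (is_bad(mid, ctx))
			bad = mid;	/* bug already present at mid */
		else
			good = mid;	/* bug introduced after mid */
	}
	return bad;
}

/* Example predicate: every commit from *ctx onwards is bad. */
int bad_from(size_t idx, void *ctx)
{
	return idx >= *(size_t *)ctx;
}
```

If the test conditions drift between steps, the "good" and "bad" verdicts stop being comparable and the search can converge on an unrelated commit.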

Ingo
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>, <netdev@...>, Linus Torvalds <torvalds@...>, Casey Schaufler <casey@...>
Date: Wednesday, February 6, 2008 - 9:35 am

ok, i have bisected it down but the result made no sense, so i
double-checked it and noticed that the .config mutated during the test.

the diff below is the diff between the 'good' and 'bad' .config, with
this notable detail:

@@ -2336,7 +2350,7 @@ CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
-# CONFIG_SECURITY_SMACK is not set
+CONFIG_SECURITY_SMACK=y
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m

so i disabled CONFIG_SECURITY_SMACK, and voilà, just 2 hours of hard
work later networking works on my testbox again :-/
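
A check for this kind of .config drift can be sketched in a few lines. This is an illustration, not a tool used in this thread: config_enabled() and config_differs() are hypothetical helpers that operate on the text of two .config files already read into memory.

```c
#include <string.h>

/*
 * Return 1 if `symbol` (e.g. "CONFIG_SECURITY_SMACK") is set to y or m in
 * the given .config text, 0 otherwise. Lines of the form
 * "# CONFIG_FOO is not set" count as disabled.
 */
int config_enabled(const char *config_text, const char *symbol)
{
	size_t len = strlen(symbol);
	const char *line = config_text;

	while (line && *line) {
		/* Match "SYMBOL=y" or "SYMBOL=m" at the start of a line. */
		if (strncmp(line, symbol, len) == 0 && line[len] == '=' &&
		    (line[len + 1] == 'y' || line[len + 1] == 'm'))
			return 1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return 0;
}

/* Return 1 if the symbol's enabled state differs between two configs. */
int config_differs(const char *a, const char *b, const char *symbol)
{
	return config_enabled(a, symbol) != config_enabled(b, symbol);
}
```

Running config_differs() over the security options of the 'good' and 'bad' configs before trusting a bisection result would have flagged CONFIG_SECURITY_SMACK straight away.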

And we have this 1 day old commit:

commit e114e473771c848c3cfec05f0123e70f1cdbdc99
Author: Casey Schaufler <casey@schaufler-ca.com>
Date: Mon Feb 4 22:29:50 2008 -0800

Smack: Simplified Mandatory Access Control Kernel

that adds SMACK.

So unlike some other security modules like SELINUX, enabling SMACK
breaks un-aware userspace and breaks TCP networking?

I don't think that's expected behavior - and i'd definitely like to
enable SMACK in automated tests to check for regressions, etc.

Ingo

--- .config.good 2008-02-06 14:13:35.000000000 +0100
+++ .config.bad 2008-02-06 14:17:28.000000000 +0100
@@ -1,7 +1,7 @@
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24
-# Wed Feb 6 14:11:27 2008
+# Wed Feb 6 14:15:22 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
@@ -94,15 +94,16 @@ CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
# CONFIG_EPOLL is not set
CONFIG_SIGNALFD=y
-CONFIG_TIMERFD=y
+# CONFIG_TIMERFD is not set
CONFIG_EVENTFD=y
# CONFIG_SHMEM is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_SLAB is not set
# CONFIG_SLUB is not set
CONFIG_SLOB=y
-# CONFIG_PROFILING is not set
+CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
+CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_KPROBES=y
@@ -691,7 +692,...

To: Ingo Molnar <mingo@...>, David Miller <davem@...>
Cc: <linux-kernel@...>, <netdev@...>, Linus Torvalds <torvalds@...>, Casey Schaufler <casey@...>
Date: Wednesday, February 6, 2008 - 11:58 am

As Stephen mentions later, Smack uses CIPSO. sshd does not like
any IP options because of traceroute, and must be built with that
check disabled with the current Smack version. I have been looking
at using unlabeled packets for the "ambient" label, it appears that
doing so would make life simpler. I will get right on it.

Application behavior in the presence of IP options isn't
always what I think it ought to be.
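
To make the interaction concrete, here is a rough sketch of the kind of check a server can perform on an accepted connection. It is not sshd's actual code. socket_ip_options() and has_ip_option() are hypothetical helpers, and the sketch assumes Linux behavior, where getsockopt(IP_OPTIONS) on a TCP socket returns the options taken from the peer's initial packet. The value 134 is the CIPSO option type (IPOPT_CIPSO in <linux/ip.h>).

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

#define OPT_END   0	/* end of option list */
#define OPT_NOOP  1	/* one-byte padding */
#define OPT_CIPSO 134	/* CIPSO label, IPOPT_CIPSO in <linux/ip.h> */

/*
 * Fetch the IP options associated with a socket into buf. Returns the
 * number of option bytes (0 if there are none) or -1 on error.
 */
int socket_ip_options(int fd, unsigned char *buf, size_t buflen)
{
	socklen_t len = (socklen_t)buflen;

	if (getsockopt(fd, IPPROTO_IP, IP_OPTIONS, buf, &len) < 0)
		return -1;
	return (int)len;
}

/* Walk an IPv4 options buffer; return 1 if an option of `type` is present. */
int has_ip_option(const unsigned char *opts, size_t len, unsigned char type)
{
	size_t i = 0;

	while (i < len) {
		unsigned char t = opts[i];

		if (t == type)
			return 1;
		if (t == OPT_END)
			break;
		if (t == OPT_NOOP) {
			i++;
			continue;
		}
		/* Other options are type, length, data; length covers all. */
		if (i + 1 >= len || opts[i + 1] < 2)
			break;	/* malformed option, stop parsing */
		i += opts[i + 1];
	}
	return 0;
}
```

A daemon that rejects any connection where socket_ip_options() returns a positive length will also reject CIPSO-labeled peers. One that only wants to refuse source routing could use has_ip_option() to tell the cases apart.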

Casey Schaufler
casey@schaufler-ca.com
--

To: Casey Schaufler <casey@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, <netdev@...>, Linus Torvalds <torvalds@...>
Date: Thursday, February 7, 2008 - 7:44 am

ok - feel free to send me any patches to test.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, <netdev@...>, Linus Torvalds <torvalds@...>, Casey Schaufler <casey@...>
Date: Wednesday, February 6, 2008 - 9:55 am

It is expected behavior for Smack due to default use of CIPSO for packet
labeling, see:
--
Stephen Smalley
National Security Agency

--