Hi,

I've upgraded lists.arm.linux.org.uk to 2.6.25, and I'm now seeing some
very weird networking behaviour from the machine which seems to only
affect IPv4 - including ICMP and NFS(tcp).

tcpdump is available (all 4MB worth):

  http://www.home.arm.linux.org.uk/~rmk/ping.capture

Machines involved:

 dyn-67 - x86 box 2.6.20-1.2320.fc5
          (192.168.0.67 / 2002:4e20:1eda:1:201:80ff:fe4b:1778)
 n2100  - ARM box 2.6.24
          (78.32.30.221, has ipv6 as well)
 lists  - ARM box 2.6.25
          (78.32.30.220 / 2002:4e20:1eda:1:201:3dff:fe00:0156)

The dump shows three 8200 byte pings running - one IPv4 on n2100 against
lists, one IPv4 on dyn-67 against lists, and one IPv6 on dyn-67 against
lists. The tcpdump was running on lists itself.

Everything looks fine until around packet 1688, where n2100 sends an echo
request to lists, which doesn't get a reply. 300ms later, dyn-67 sends
an echo request to lists, which also coincidentally doesn't get a reply.

Note, however, how the IPv6 pings continue.

The stats for the pings upon their termination are:

rmk@dyn-67:[~]:<1005> ping6 -s 8192 lists
PING lists(lists.arm.linux.org.uk) 8192 data bytes
--- lists ping statistics ---
101 packets transmitted, 101 received, 0% packet loss, time 99990ms
rtt min/avg/max/mdev = 4.132/4.488/26.585/2.374 ms, pipe 2

rmk@dyn-67:[~]:<1051> ping -s 8192 lists
PING lists.arm.linux.org.uk (78.32.30.220) 8192(8220) bytes of data.
--- lists.arm.linux.org.uk ping statistics ---
101 packets transmitted, 54 received, 46% packet loss, time 99993ms
rtt min/avg/max/mdev = 4.139/6.027/35.274/6.405 ms

root@n2100:~# ping -s 8192 lists
PING lists.arm.linux.org.uk (78.32.30.220) 8192(8220) bytes of data.
--- lists.arm.linux.org.uk ping statistics ---
101 packets transmitted, 55 received, 45% packet loss, time 100020ms
rtt min/avg/max/mdev = 4.404/4.610/13.235/1.175 ms

Lastly, in /proc/net/snmp on lists, I find:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos ...
Forgot the config file for the problem kernel...

  http://www.home.arm.linux.org.uk/~rmk/bast-config-2.6.25

--
Russell King
--
From: Russell King <rmk@arm.linux.org.uk>

The ReasmTimeout and ReasmFails look interesting.

Maybe it was the namespace bits? Pavel, could you take a quick look?

Thanks.
--
Can you please also show the /proc/net/netstat contents - I'm interested
in IpExt statistics.

Thanks,
Pavel
--
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts
IpExt: 0 0 0 0 0 0
I suspect that you were expecting these to be non-zero.
I've just added some debug printks into ip_input.c, and I don't think the
IP stack itself is at fault (if it were, you'd be flooded with reports.)

int ip_local_deliver(struct sk_buff *skb)
{
...
	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 2\n");

	return NF_HOOK(PF_INET, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}

static int ip_local_deliver_finish(struct sk_buff *skb)
{
	__skb_pull(skb, ip_hdrlen(skb));

	/* Point into the IP datagram, just past the header. */
	skb_reset_transport_header(skb);

	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 3\n");

When the machine stops responding to pings, I see in the kernel message
log 'ping 2' but no 'ping 3' (whereas I get both when it does respond.)
I don't have the iptables binary installed, so there aren't any rules.
(Also, the iptable_filter module isn't loaded.)
I'll see if I can track the packet's progress through the netfilter code
today.
--
Russell King
--
(Adding netfilter mailing list. See
  http://marc.info/?t=120933809600001&r=1&w=2
for the initial problem description.)

Further to this, it's looking like there's a nf_conntrack issue.

Having placed similar printks in the netfilter code, I see the
ipv4_confirm() hook normally returning 1 (NF_ACCEPT), but it then starts
returning 0 (NF_DROP), and there are no ping replies.

-bash-3.1# cat /proc/net/stat/ip_conntrack
entries searched found new invalid ignore delete delete_list insert insert_failed drop early_drop icmp_error expect_new expect_create expect_delete
00000110 000000e2 000001c6 000003bb 00000140 00000000 000002ab 0000023a 0000034a 0000005f 00000000 00000000 0000000f 00000000 00000000 00000000

insert_failed increments when there aren't any ping replies.

The other interesting thing (though I'm not sure if it's really related
or helps) is:

-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=19 bytes=156180 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=21 bytes=172620 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=22 bytes=180840 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=23 bytes=189060 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 ...
From: Russell King <rmk@arm.linux.org.uk>
There's already been a report about specific hashing problems with
conntrack on ARM. It has something to do with how structures are
padded on ARM combined with the following patch made by Patrick:

commit 0794935e21a18e7c171b604c31219b60ad9749a9
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Jan 31 04:40:52 2008 -0800

    [NETFILTER]: nf_conntrack: optimize hash_conntrack()

    Avoid calling jhash three times and hash the entire tuple in one go.

      __hash_conntrack | -485 # 760 -> 275, # inlines: 3 -> 1, size inlines: 717 -> 252
     1 function changed, 485 bytes removed

    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ce4c4ba..4a2cce1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -73,15 +73,19 @@ static unsigned int nf_conntrack_hash_rnd;
 static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
 				  unsigned int size, unsigned int rnd)
 {
-	unsigned int a, b;
+	unsigned int n;
+	u_int32_t h;
 
-	a = jhash2(tuple->src.u3.all, ARRAY_SIZE(tuple->src.u3.all),
-		   (tuple->src.l3num << 16) | tuple->dst.protonum);
-	b = jhash2(tuple->dst.u3.all, ARRAY_SIZE(tuple->dst.u3.all),
-		   ((__force __u16)tuple->src.u.all << 16) |
-		    (__force __u16)tuple->dst.u.all);
+	/* The direction must be ignored, so we hash everything up to the
+	 * destination ports (which is a multiple of 4) and treat the last
+	 * three bytes manually.
+	 */
+	n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
+	h = jhash2((u32 *)tuple, n,
+		   rnd ^ (((__force __u16)tuple->dst.u.all << 16) |
+			  tuple->dst.protonum));
 
-	return ((u64)jhash_2words(a, b, rnd) * size) >> 32;
+	return ((u64)h * size) >> 32;
 }
 
 static inline u_int32_t hash_conntrack(const struct nf_conntrack_tuple *tuple)
--
Yup, reverting that appears to fix the problem.

Looking at the structure, it will contain two bytes of padding in the
'u' union and another two bytes in the 'dst' structure.

I suspect there'll be objections to packing the structure, in which case
what's the permanent fix?

--
Russell King
--
