Re: Possible race condition in conntracking

Previous thread: Re: [Bugme-new] [Bug 12484] New: Kernel panic 1gb lan connection by Andrew Morton on Tuesday, January 27, 2009 - 12:09 am. (1 message)

Next thread: Re: [atl2] warn_slowpath in dev_watchdog by Sitsofe Wheeler on Tuesday, January 27, 2009 - 2:30 am. (1 message)
From: Tobias Klausmann
Date: Tuesday, January 27, 2009 - 12:57 am

Hi!

I'm resending this to netdev (sent it to linux-net yesterday)
because I was told all the cool and relevant kids hang out here
rather than there.

It seems I've stumbled across a bug in the way Netfilter handles
packets. I have only been able to reproduce this with UDP, but it
might also affect other IP protocols. This first bit me when
updating from glibc 2.7 to 2.9.

Suppose a program calls getaddrinfo() to find the address of a
given hostname. Usually, the glibc resolver asks the name server
for both the A and AAAA records, gets two answers (addresses or
NXDOMAIN) and happily continues on. What is new with glibc 2.9 is
that it doesn't serialize the two requests in the same way as 2.7
did. The older version will ask for the A record, wait for the
answer, ask for the AAAA record, then wait for that answer. The
newer lib will fire off both requests in quick succession (usually
5-20 microseconds apart on the systems I tested with). Not only that,
it also uses the same socket fd (and thus source port) for both
requests.
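
For illustration, the traffic pattern described above can be approximated
from userspace. The following is only a sketch, not glibc's resolver code:
the function names, the fixed transaction IDs and the timeout are
assumptions made for this example. It hand-builds an A and an AAAA query
and sends both from one UDP socket, so both share a source port.

```python
import socket
import struct

QTYPE_A = 1
QTYPE_AAAA = 28


def build_query(hostname: str, qtype: int, txid: int) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS IN (1)
    return header + qname + struct.pack("!HH", qtype, 1)


def send_pair(hostname: str, server: str, timeout: float = 2.0) -> int:
    """Send A and AAAA queries back to back from one socket.

    Returns the number of replies received before the timeout (0, 1 or 2).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Same socket means same source port, hence the same conntrack tuple.
        sock.sendto(build_query(hostname, QTYPE_A, 0x1111), (server, 53))
        sock.sendto(build_query(hostname, QTYPE_AAAA, 0x2222), (server, 53))
        replies = 0
        for _ in range(2):
            try:
                sock.recvfrom(4096)
                replies += 1
            except socket.timeout:
                break
        return replies
```

Calling send_pair() in a loop against a name server behind the firewall
and counting the runs that return fewer than 2 replies would give a rough
failure rate. A missing reply can of course have other causes, so this
only narrows things down.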

Now if those packets traverse a Netfilter firewall, in the
glibc-2.7 case, they will create two conntrack entries, allowing
the answers back[0] and everything is peachy. In the glibc-2.9
case, sometimes, the second packet gets lost[1]. After
eliminating other causes (buggy checksum offloading, packet loss,
busy firewall and/or DNS server and a host of others), I'm sure
it's lost inside the firewall's Netfilter code. 

Using counting-only rules and building a dedicated setup with a
minimal Netfilter rule set, we could watch the counters, finding
two interesting facts for the failing case:

- The count in the NAT pre/postrouting chains is higher than for
  the case where the requests work. This points to the second
  packet being counted although it's part of the same connection
  as the first.
  
- All other counters increase, up to and including
  mangle/POSTROUTING. 

In essence, if you have N tries and one of them fails, you have
2N packets counted ...
From: Patrick McHardy
Date: Tuesday, January 27, 2009 - 2:20 am

[CCed netfilter-devel]


That sounds plausible, but we only discard the new conntrack
entry on clashes. The packet should be fine, unless you drop

Try tracing the packet using the TRACE target. That should show
whether it really disappears within netfilter and where.
--

From: Tobias Klausmann
Date: Tuesday, January 27, 2009 - 6:06 am

Hi! 

(I've now subscribed to netdev@, so no more CCs to me are necessary).


The ruleset currently does not contain any rules regarding

I've removed the irrelevant fields like TTL, PREC etc and timing
info from syslog from the trace after making sure nothing funky
was going on there.

Apart from the ID field, I ended up with two identical traces.

So, as far as rule-matching is concerned, the two packets are
handled identically. Whatever happens after this:

Jan 27 11:00:39 fw2 kernel: TRACE: nat:POSTROUTING:policy:3 IN=
OUT=eth2.188 SRC=194.97.7.116 DST=194.97.3.83 LEN=66 TOS=0x00
PREC=0x00 TTL=63 ID=46964 DF PROTO=UDP SPT=53452 DPT=53 LEN=46 

is making this very packet go away. The policy of nat/PR is
ACCEPT.

Presuming this:
http://xkr47.outerspace.dyndns.org/netfilter/packet_flow/packet_flow9.png

is accurate, I'm not sure what could drop the packet. We're not
using QoS or tunneling on the packetfilter in question. This
happens on two different machines (the machines are of the same
type, but they have different NICs), so I doubt it's a hardware
or driver issue.



-- 
printk("Cool stuff's happening!\n")
        linux-2.4.3/fs/jffs/intrep.c
--

From: Patrick McHardy
Date: Tuesday, January 27, 2009 - 6:14 am

This just means it passed through the last table/chain. The
only one following is conntrack confirmation.

Damn it :) I just noticed, we do indeed drop packets from
duplicate new connections in conntrack confirmation.

You should see the insert_failed conntrack counter show this
(/proc/net/stat/nf_conntrack).
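
A small helper for watching that counter might look like the sketch
below. It assumes the usual layout of the file: one header line with
column names, then one row per CPU with hexadecimal values. The function
names are made up for this example.

```python
def sum_conntrack_stat(text: str, column: str = "insert_failed") -> int:
    """Sum one column of nf_conntrack statistics over all CPU rows.

    Assumes the first non-empty line holds the column names and every
    following line holds hexadecimal counters for one CPU.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    idx = header.index(column)
    return sum(int(row[idx], 16) for row in rows)


def read_insert_failed(path: str = "/proc/net/stat/nf_conntrack") -> int:
    """Read the stats file and return the total insert_failed count."""
    with open(path) as fh:
        return sum_conntrack_stat(fh.read())
```

Sampling read_insert_failed() before and after a batch of lookups and
comparing the difference with the number of failed lookups would show
whether the two line up.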
--

From: Tobias Klausmann
Date: Tuesday, January 27, 2009 - 6:28 am

Hi! 


So the question remains what to do instead and how to do it. That

We do, as I said in my first mail. Near as I can tell,
nf_conntrack_confirm() is the only function that ever increases
that counter, so it's definitely dropped there. As to how one
could handle it differently, I have to defer to people with more
Netfilter expertise. No point in "fixing" this by breaking other
stuff.

Regards,
Tobias

-- 
printk("Cool stuff's happening!\n")
        linux-2.4.3/fs/jffs/intrep.c
--

From: Patrick McHardy
Date: Tuesday, January 27, 2009 - 6:48 am

Fixing this requires some rather intrusive changes. We need
to perform a lookup on the unconfirmed list when a conntrack
is not found in the hash and use the one we find there, if any.
The entries on that list are not reference counted and there
are a lot of assumptions in the code that an unconfirmed conntrack
is exclusively associated with a single packet. This needs to
be audited and fixed, but it looks quite hard.
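
To make the two behaviours concrete, here is a toy model in Python. It is
not kernel code: the class, its methods and the dictionary-based
"unconfirmed list" are invented for illustration, and it ignores the
reference counting and per-packet ownership problems mentioned above. It
only shows why the second packet is dropped today and why a lookup on the
unconfirmed list would avoid that.

```python
class ToyConntrack:
    """Toy model of conntrack creation and confirmation (illustrative only)."""

    def __init__(self, share_unconfirmed: bool = False):
        self.hash = {}         # confirmed entries, keyed by tuple
        self.unconfirmed = {}  # tuple -> not yet confirmed entry
        self.share_unconfirmed = share_unconfirmed
        self.insert_failed = 0

    def new_packet(self, tup):
        """Packet enters: find an existing entry or create a new one."""
        if tup in self.hash:
            return self.hash[tup]
        # Hypothetical fix: also look at entries that are not yet confirmed.
        if self.share_unconfirmed and tup in self.unconfirmed:
            return self.unconfirmed[tup]
        entry = {"tuple": tup, "confirmed": False}
        self.unconfirmed.setdefault(tup, entry)
        return entry

    def confirm(self, entry) -> bool:
        """Packet leaves the last chain. Returns False if it is dropped."""
        if entry["confirmed"]:
            return True
        if entry["tuple"] in self.hash:
            # Clash with an entry confirmed in the meantime: drop the packet.
            self.insert_failed += 1
            return False
        entry["confirmed"] = True
        self.hash[entry["tuple"]] = entry
        self.unconfirmed.pop(entry["tuple"], None)
        return True


def race(share_unconfirmed: bool):
    """Two packets with the same tuple, both seen before either is confirmed."""
    ct = ToyConntrack(share_unconfirmed)
    tup = ("udp", "194.97.7.116", 53452, "194.97.3.83", 53)
    first = ct.new_packet(tup)
    second = ct.new_packet(tup)
    return ct.confirm(first), ct.confirm(second), ct.insert_failed
```

In this model race(False) loses the second packet and bumps
insert_failed, while race(True) lets both packets through because they
share one entry. The hard part in the real code is exactly what the toy
leaves out: making it safe for two packets to hold the same unconfirmed
entry.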

--
