Re: Diagnose co-location networking problem

To: Stephan Wehner <stephanwehner@...>
Cc: <freebsd-net@...>
Date: Thursday, December 28, 2006 - 4:31 pm

On Wed, Dec 27, 2006 at 10:08:25PM -0800, Stephan Wehner wrote:

Ok, that explains the private 192.168 IP address I saw in your earlier
dumps: it was from the client (a detail you mentioned but that I overlooked).


Actually there's significant indication of lost packets and clues that
point to the location of the problem.  I'll explain.
 

Generally two TCP connections on different sockets will never interfere
with each other, except in extreme examples of congestion or pathologically
configured address-translating gateways.  


So we're looking at the client here and there are a few things of note:
      1. No significant interface errors are being recorded so it's
         not a layer-2 (ethernet) issue.
      2. The retransmit count went up by 9 while the overall transmit count
         went up by 191 packets, suggesting an approximate transient packet
         loss rate of 4.7% (9/191, fuzzy math) during the test, which is
         significantly greater than the system-wide average of 0.8%
         (66665/8605107).  This suggests that the client may have seen
         an abnormal packet loss rate during the test.  It may be
         the case that all of the successful connections experienced
         no packet loss and only the failed connect generated the
         retransmits.  I'm not sure whether initial SYN retransmits get
         counted in this column but I believe this may still be
         significant.  (The assumptions made in these calculations are
         so grossly oversimplified that the evidence derived from
         them is weak at best.)
      3. The loopback saw 90 packets of activity.  I don't know how
         long this test ran but that could be considered a little chatty.
         As a longshot, I'd run a tcpdump on loopback and run the test
         again, simply to make sure that no traffic is unintentionally
         getting diverted over the loopback interface (unlikely but I've
         actually seen bugs/bad firewall configs do this).
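
The loss-rate arithmetic in item 2 can be reproduced with a short
shell sketch.  The helper name loss_pct is made up for illustration;
the counter values are the ones quoted above, and the same caveat
applies: retransmits divided by packets sent is only a rough proxy
for packet loss.

```shell
#!/bin/sh
# loss_pct RETRANSMITS TOTAL
# Print RETRANSMITS as a percentage of TOTAL, rounded to one decimal place.
# (Hypothetical helper; the inputs are read off the netstat counters by hand.)
loss_pct() {
    awk -v r="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * r / t }'
}

loss_pct 9 191            # delta during the test: prints 4.7
loss_pct 66665 8605107    # system-wide totals:    prints 0.8
```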


And here are the server stats which seem to show very little but
in fact are quite informative. 
      1. No significant interface errors, again ruling out layer-2.
      2. pflog and pfsync devices are registered in the kernel,
         suggesting PF firewalling has been compiled in.  It doesn't
         seem that pflog is being used at all but this does raise the
         question: are you using any packet filtering on the server?
         If so, I'd suggest disabling the packet filter entirely and
         retesting to see if the issue is reproducible.
      3. The retransmit count has gone up by zero, suggesting the
         server never sent a packet that it later had to retransmit.
         This strongly suggests to me that the nature of the
         connection problems is that the server never sees the
         client's SYN packets.  This is fairly strong evidence
         pointing to an intelligent filtering device / proxy in
         the middle of the connection (or even a firewall
         configuration on the server itself).

Offhand, here's another test you can run:  try and determine if
this connection failure behavior is specific to HTTP or general
to all TCP services.  So far you've mentioned no troubles with
SSH, I think you should test that further.  Set up a similar 
test to your HTTP test but with SSH... I'd probably set up 
public-key authentication on an account on the server so that
I could log in without a password and then run simple remote commands
over ssh on the server:
          ssh myserver echo boink
over and over again to see if any of those connections fail with a
frequency similar to the HTTP test.  If you're unable to reproduce
the same failure behavior with a test like this then that suggests
that the problem is specific to HTTP, which is practically a
smoking gun that this is a firewall/loadbalancer/middlebox issue.
You need some smarts in the middle to selectively interfere with
one type of TCP traffic and not another; there's no way that a
routing problem could be so selective.  It's also still possible
that this could be a kernel issue since you've clearly tweaked your
configuration (compiled out bpf, compiled in PF).  If you compile
a GENERIC kernel and run it, can the problem still be reproduced?  This is
a more costly test but one to consider if all else fails.
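
A minimal sketch of the repeated SSH test described above, which
counts failures so the rate can be compared with the HTTP test.
The function name run_repeated and the choice of 100 iterations
are assumptions for illustration; "myserver" is the placeholder
host from the example above.  BatchMode and ConnectTimeout are
standard OpenSSH client options that keep a failed attempt from
hanging on a password prompt or a stalled connect.

```shell
#!/bin/sh
# run_repeated COUNT CMD [ARGS...]
# Run CMD COUNT times, discard its output, and print "failures/total",
# where a failure is any non-zero exit status.
run_repeated() {
    total=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$total" ]; do
        i=$((i + 1))
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures/$total"
}

# Example invocation (commented out; needs key-based login on the server):
# run_repeated 100 ssh -o BatchMode=yes -o ConnectTimeout=10 myserver echo boink
```

The same function can wrap the HTTP test (for instance by passing
a curl command instead of ssh) so that both failure rates are
measured the same way.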

Also, there's another possibility.  I noticed in your earlier messages
that the IP address of the server is 65.110.18.138 which in-addr.arpa
maps to VPS-18-138.virtualprivateservers.ca.  Looking at 
virtualprivateservers.ca's website it seems that they specialize in
virtualized servers, which raises the question: is your server running in
a virtual server (Xen, whatnot)?  If so then that opens up a slew
of other possible issues and is important information to know.

Oh, also, going back to the 192.168 address seen in the client dumps,
it's clear that you're going through a NAT firewall or VPN or something
on the way to your server.  Are you able to reproduce this problem
from a different external network?

Actually, I just realized that you've provided enough information for me
to run this test myself, which I've now done.  I ran the following test:

       i=0; while true; do ((i++)); echo $i; curl http://stbgo.org > /dev/null; done

I was able to make over 64 consecutive connections without a single failure
before I stopped the test (didn't want to spam your site).  How sure
are you that this isn't a client-side problem?

cheers.

--
Matthew Hudson


_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"