Re: CARP and pfsync weird behaviour

From: openbsd firewall
Date: Thursday, April 10, 2008 - 4:35 am

Hello,

I'm testing an OpenBSD 4.2 firewall with Iperf and I'm seeing some very
strange behaviour.
When I reboot the backup node, the connection rate drops while the backup
node is coming back.
Iperf log:
[  3] 233.0-234.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 234.0-235.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 235.0-236.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 236.0-237.0 sec  6.70 MBytes  56.2 Mbits/sec
[  3] 237.0-238.0 sec    288 KBytes  2.36 Mbits/sec
[  3] 238.0-239.0 sec  3.40 MBytes  28.5 Mbits/sec
[  3] 239.0-240.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 240.0-241.0 sec  3.55 MBytes  29.8 Mbits/sec
[  3] 241.0-242.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 242.0-243.0 sec  3.49 MBytes  29.3 Mbits/sec
[  3] 243.0-244.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 244.0-245.0 sec  3.49 MBytes  29.3 Mbits/sec
[  3] 245.0-246.0 sec  2.30 MBytes  19.3 Mbits/sec
[  3] 246.0-247.0 sec  5.23 MBytes  43.9 Mbits/sec
[  3] 247.0-248.0 sec  2.60 MBytes  21.8 Mbits/sec
[  3] 248.0-249.0 sec  5.37 MBytes  45.0 Mbits/sec
[  3] 249.0-250.0 sec  1.28 MBytes  10.7 Mbits/sec
[  3] 250.0-251.0 sec  4.69 MBytes  39.3 Mbits/sec
[  3] 251.0-252.0 sec  4.69 MBytes  39.3 Mbits/sec
[  3] 252.0-253.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 253.0-254.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 254.0-255.0 sec  6.62 MBytes  55.5 Mbits/sec

That drop in throughput happens while the rebooted node is coming back! Iperf
is being run between one machine behind one firewall interface and another
machine behind another firewall interface. One machine is running OpenBSD
and the other Linux.
Is there any reason for this behaviour? I do not expect the backup node to
have any influence over the flow on the active node.
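
To pin down exactly when the rate drops, the Iperf interval lines can be parsed and compared against the typical rate. The following is a minimal sketch, not part of the original test setup: the function names, the 0.8 threshold and the use of the median as baseline are all assumptions chosen for illustration.

```python
import re
from statistics import median

# Matches iperf interval report lines such as
# "[  3] 237.0-238.0 sec    288 KBytes  2.36 Mbits/sec"
LINE_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec"   # interval start and end
    r"\s+[\d.]+\s+\w+"                           # transferred amount and unit
    r"\s+([\d.]+)\s+([KMG]?)bits/sec"            # rate and its unit prefix
)

# Unit prefixes iperf prints in front of "bits/sec".
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_intervals(text):
    """Return a list of (start_sec, end_sec, bits_per_sec) tuples."""
    intervals = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            start, end, rate, prefix = m.groups()
            intervals.append((float(start), float(end), float(rate) * SCALE[prefix]))
    return intervals


def find_dips(intervals, threshold=0.8):
    """Return the intervals whose rate is below threshold * median rate.

    The threshold is a hypothetical choice; tune it to the link under test.
    """
    baseline = median(rate for _, _, rate in intervals)
    return [iv for iv in intervals if iv[2] < threshold * baseline]


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[  3] 234.0-235.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 235.0-236.0 sec  6.62 MBytes  55.5 Mbits/sec
[  3] 236.0-237.0 sec  6.70 MBytes  56.2 Mbits/sec
[  3] 237.0-238.0 sec    288 KBytes  2.36 Mbits/sec
[  3] 238.0-239.0 sec  3.40 MBytes  28.5 Mbits/sec
[  3] 239.0-240.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 252.0-253.0 sec  6.62 MBytes  55.5 Mbits/sec
"""

if __name__ == "__main__":
    for start, end, rate in find_dips(parse_intervals(SAMPLE)):
        print(f"{start:.1f}-{end:.1f} sec: {rate / 1e6:.2f} Mbits/sec")
```

Matching the timestamps of the flagged intervals against the boot messages of the backup node shows which step of its startup coincides with the drop.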

Related to this is a problem with pfsync. Sometimes I get a bad state after
the backup firewall comes back and then Iperf gets totally messed up,
sometimes recovering and sometimes not. It makes no difference whether pfsync
is configured with multicast or with syncpeer.
Log from the active node:
Apr 10 ...
From: Calomel
Date: Thursday, April 10, 2008 - 10:14 am

John,

I ran a test using iperf on an external OpenBSD system (client) through a carp
firewall to an internal OpenBSD system (server). All systems are running
OpenBSD v4.2 with the latest patches.

      external               ---> CARP --->  internal
(iperf -i 1 -t 600 -c carp0)                (iperf -s)

I did _not_ see any slow down through the MASTER when I rebooted the BACKUP
server. For example, I started the reboot of the BACKUP at 5 seconds and
the BACKUP finished rebooting at 102 seconds:

[  3]  1.0- 2.0 sec  81.2 MBytes    681 Mbits/sec
[  3]  2.0- 3.0 sec  82.3 MBytes    690 Mbits/sec
[  3]  3.0- 4.0 sec  83.8 MBytes    703 Mbits/sec
[  3]  4.0- 5.0 sec  86.6 MBytes    727 Mbits/sec -- start reboot
[  3]  5.0- 6.0 sec  86.8 MBytes    728 Mbits/sec
[  3]  6.0- 7.0 sec  86.3 MBytes    724 Mbits/sec
[  3]  7.0- 8.0 sec  82.8 MBytes    695 Mbits/sec
[  3]  8.0- 9.0 sec  86.7 MBytes    728 Mbits/sec
[  3]  9.0-10.0 sec  85.8 MBytes    720 Mbits/sec
[  3] 10.0-11.0 sec  86.1 MBytes    722 Mbits/sec

....cut....

[  3] 96.0-97.0 sec  83.4 MBytes    699 Mbits/sec
[  3] 97.0-98.0 sec  82.4 MBytes    692 Mbits/sec
[  3] 98.0-99.0 sec  81.9 MBytes    687 Mbits/sec
[  3] 99.0-100.0 sec  84.7 MBytes    710 Mbits/sec
[  3] 100.0-101.0 sec  83.3 MBytes    699 Mbits/sec
[  3] 101.0-102.0 sec  83.7 MBytes    702 Mbits/sec -- finished reboot
[  3] 102.0-103.0 sec  83.3 MBytes    699 Mbits/sec
[  3] 103.0-104.0 sec  83.6 MBytes    701 Mbits/sec
[  3] 104.0-105.0 sec  85.3 MBytes    716 Mbits/sec
[  3] 105.0-106.0 sec  83.4 MBytes    699 Mbits/sec

I also did not see any errors in the logs of either system running iperf
or on the firewalls. The load on the MASTER firewall was around 0.30.

Are the firewalls' kernels patched? Are there any hardware failures to
report? Are the firewalls overloaded?

You are welcome to check out some of the "how to's" I have at
http://calomel.org if you need to.
 
--
  Calomel @ http://calomel.org
  Open Source Research and Reference



From: openbsd firewall
Date: Thursday, April 10, 2008 - 3:07 pm

Hello,

This got even more interesting. After reading your email I had the idea of
turning off the various carp interfaces to see what the effect would be.
I have two onboard "Broadcom BCM5704C" NICs and an "Intel PRO/1000MT QP
(82546GB)" quad NIC.
One carp is configured on one onboard NIC and two others on the quad NIC.
I removed the two carps for the quad NIC on the backup node and rebooted it a
few times. There were no failures in the iperf test (I used a long run time to
make sure it was running throughout all the tests), which matches your
tests and is the normal expected result.
Removing the onboard carp and activating one or both of the quad NIC carps
gives the failures I reported previously. Without pfsync active on the
master node, I get a small failure in the iperf tests while the backup node is
coming back. If I activate pfsync, I get the same small failure plus
sometimes a total mess-up of the iperf connection states.
So it seems the problem is happening with the quad NIC. I don't see any
performance problems with the quad NIC, because I left iperf running for 2
days without any problem. CPU usage in interrupts is around 15% and load is
0.20 while doing tests. The firewall is still not in production, so the only
traffic is my test and internet junk being dropped.
The kernel is GENERIC 4.2 without any patches (I don't see any of them as
relevant to this problem). I doubt there are any hardware problems, because
the same happens if I exchange their roles as master and backup.

I can't understand how the backup node can generate these results with a
reboot. While writing this I thought of another test. I destroyed the
quad NIC carps (with ifconfig carpX destroy) and then brought them back with
sh /etc/netstart. Iperf keeps running smoothly this time... The master node
receives the bulk update requests without any problems. I did this a few times
and nothing happened.
Even weirder now!!! Something is being done when those interfaces come
up for the first time after the reboot!
Any ideas ...
From: Jason Dixon
Date: Thursday, April 10, 2008 - 4:02 pm

Is ACPI enabled?

-J.

On Apr 10, 2008, at 6:07 PM, "openbsd firewall" <openbsdfirewall@gmail.com 

From: openbsd firewall
Date: Thursday, April 10, 2008 - 4:08 pm

Hello,

It's booting with the default settings, so ACPI is not enabled.
Here's the dmesg output for the backup node (the master is exactly the same
hardware).

Apr 10 17:40:23 bbq /bsd: OpenBSD 4.2 (GENERIC) #375: Tue Aug 28 10:38:44 MDT 2007
Apr 10 17:40:23 bbq /bsd:     deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
Apr 10 17:40:23 bbq /bsd: cpu0: Dual-Core AMD Opteron(tm) Processor 1210 HE ("AuthenticAMD" 686-class, 1024KB L2 cache) 1.80 GHz
Apr 10 17:40:23 bbq /bsd: cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16
Apr 10 17:40:23 bbq /bsd: real mem  = 2146988032 (2047MB)
Apr 10 17:40:23 bbq /bsd: avail mem = 2068418560 (1972MB)
Apr 10 17:40:23 bbq /bsd: mainbus0 at root
Apr 10 17:40:23 bbq /bsd: bios0 at mainbus0: AT/286+ BIOS, date 02/08/08, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.4 @ 0xfbb50 (50 entries)
Apr 10 17:40:23 bbq /bsd: bios0: vendor American Megatrends Inc. version "080011 " date 02/08/2008
Apr 10 17:40:23 bbq /bsd: bios0: Supermicro H8SSL-I2
Apr 10 17:40:23 bbq /bsd: pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
Apr 10 17:40:23 bbq /bsd: pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf4d40/176 (9 entries)
Apr 10 17:40:23 bbq /bsd: pcibios0: no compatible PCI ICU found: ICU vendor 0x1166 product 0x0205
Apr 10 17:40:23 bbq /bsd: pcibios0: PCI bus #3 is the last bus
Apr 10 17:40:23 bbq /bsd: bios0: ROM list: 0xc0000/0xb000 0xcb000/0x3000! 0xce000/0x1600 0xcf800/0x1600 0xd1000/0x1000
Apr 10 17:40:23 bbq /bsd: acpi at mainbus0 not configured
Apr 10 17:40:23 bbq /bsd: cpu0 at mainbus0
Apr 10 17:40:23 bbq /bsd: pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
Apr 10 17:40:23 bbq /bsd: ppb0 at pci0 dev 1 function 0 "ServerWorks HT-1000 PCI" rev 0x00
Apr 10 17:40:23 bbq /bsd: pci1 at ppb0 bus 1
Apr 10 17:40:23 bbq /bsd: ppb1 at pci1 dev 13 function 0 "ServerWorks HT-1000 PCIX" rev 0xb2
Apr 10 17:40:23 bbq /bsd: pci2 at ppb1 bus 2
Apr 10 17:40:23 bbq /bsd: ppb2 at pci2 dev 1 function 0 ...
From: Jason Dixon
Date: Thursday, April 10, 2008 - 4:17 pm

I was implying that you should enable ACPI and try again.

-J.

On Apr 10, 2008, at 7:08 PM, "openbsd firewall" <openbsdfirewall@gmail.com 

From: openbsd firewall
Date: Friday, April 11, 2008 - 7:25 am

Same results with ACPI enabled on both nodes.


From: Jason Dixon
Date: Friday, April 11, 2008 - 7:42 am

Let's see your dmesg with acpi enabled.

---
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net

From: openbsd firewall
Date: Friday, April 11, 2008 - 8:10 am

Dmesg for backup node:

Apr 11 10:21:34 bbq /bsd: OpenBSD 4.2 (GENERIC) #375: Tue Aug 28 10:38:44 MDT 2007
Apr 11 10:21:34 bbq /bsd:     deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
Apr 11 10:21:34 bbq /bsd: cpu0: Dual-Core AMD Opteron(tm) Processor 1210 HE ("AuthenticAMD" 686-class, 1024KB L2 cache) 1.80 GHz
Apr 11 10:21:34 bbq /bsd: cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16
Apr 11 10:21:34 bbq /bsd: real mem  = 2146988032 (2047MB)
Apr 11 10:21:34 bbq /bsd: avail mem = 2068418560 (1972MB)
Apr 11 10:21:34 bbq /bsd: mainbus0 at root
Apr 11 10:21:34 bbq /bsd: bios0 at mainbus0: AT/286+ BIOS, date 02/08/08, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.4 @ 0xfbb50 (50 entries)
Apr 11 10:21:34 bbq /bsd: bios0: vendor American Megatrends Inc. version "080011 " date 02/08/2008
Apr 11 10:21:34 bbq /bsd: bios0: Supermicro H8SSL-I2
Apr 11 10:21:34 bbq /bsd: pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
Apr 11 10:21:34 bbq /bsd: pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf4d40/176 (9 entries)
Apr 11 10:21:34 bbq /bsd: pcibios0: no compatible PCI ICU found: ICU vendor 0x1166 product 0x0205
Apr 11 10:21:34 bbq /bsd: pcibios0: PCI bus #3 is the last bus
Apr 11 10:21:34 bbq /bsd: bios0: ROM list: 0xc0000/0xb000 0xcb000/0x3000! 0xce000/0x1600 0xcf800/0x1600 0xd1000/0x1000
Apr 11 10:21:34 bbq /bsd: acpi0 at mainbus0: rev 0
Apr 11 10:21:34 bbq /bsd: acpi0: tables DSDT FACP APIC OEMB
Apr 11 10:21:34 bbq /bsd: acpitimer at acpi0 not configured
Apr 11 10:21:34 bbq /bsd: acpiprt0 at acpi0: bus 0 (PCI0)
Apr 11 10:21:34 bbq /bsd: acpiprt1 at acpi0: bus 1 (P0P1)
Apr 11 10:21:34 bbq /bsd: acpiprt2 at acpi0: bus 2 (P1P2)
Apr 11 10:21:34 bbq /bsd: acpicpu at acpi0 not configured
Apr 11 10:21:34 bbq /bsd: acpicpu at acpi0 not configured
Apr 11 10:21:34 bbq /bsd: acpibtn at acpi0 not configured
Apr 11 10:21:34 bbq /bsd: acpibtn at acpi0 not configured
Apr 11 10:21:34 bbq /bsd: cpu0 at mainbus0
Apr 11 10:21:34 bbq ...
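
The two dmesg excerpts in this thread differ in how the ACPI device shows up: "acpi at mainbus0 not configured" when it is disabled, and "acpi0 at mainbus0: rev 0" when it is enabled. A minimal sketch of checking a saved dmesg for this, assuming only those two line shapes (the function name is hypothetical):

```python
import re


def acpi_attached(dmesg_text: str) -> bool:
    """Return True if an acpiN device attached at mainbus0.

    A line like "acpi at mainbus0 not configured" has no unit number
    after "acpi", so it does not match.
    """
    return re.search(r"\bacpi\d+ at mainbus0\b", dmesg_text) is not None


# One line from each dmesg posted in this thread.
WITHOUT_ACPI = "Apr 10 17:40:23 bbq /bsd: acpi at mainbus0 not configured"
WITH_ACPI = "Apr 11 10:21:34 bbq /bsd: acpi0 at mainbus0: rev 0"

if __name__ == "__main__":
    print(acpi_attached(WITHOUT_ACPI), acpi_attached(WITH_ACPI))
```
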
From: openbsd firewall
Date: Monday, April 14, 2008 - 7:01 am

Hello,

Some news about this... If I change the vhid on the backup node, this problem
doesn't occur, since the ARP entry for the master node is still in cache and
the backup node now has a different MAC address for the carp interfaces. Of
course, changing both vhid and IP doesn't give any trouble at all.
It seems the backup node is messing with ARP (maybe at switch level???)
when it's coming back!
All switches are Cisco 2900 and 3500. Is there any recommended configuration
for these switches?

Thanks,
John
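
For context on why a different vhid gives the backup node a different MAC address: CARP derives the virtual MAC of a carp interface from its vhid, in the form 00:00:5e:00:01:<vhid in hex>. Two nodes sharing a vhid therefore present the same MAC to the switch. A minimal sketch of that mapping (the function name and the example vhid values are illustrative, not taken from the poster's configuration):

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC address CARP uses for the given vhid."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    # Fixed prefix 00:00:5e:00:01, last octet is the vhid.
    return "00:00:5e:00:01:%02x" % vhid


if __name__ == "__main__":
    # Hypothetical vhids: the same vhid yields the same MAC on both nodes,
    # while a changed vhid yields a different one.
    for vhid in (1, 2):
        print(vhid, carp_virtual_mac(vhid))
```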

From: Henning Brauer
Date: Monday, April 14, 2008 - 8:07 am

yes. involves a nice pack of explosives and a lighter.

that said, i have used these shitty things in a dark time long long 
ago, and they don't require special config w/ carp.
just take care to not use port-security with static learning (they
might use different words to confuse the matter), which would leave the
carp mac statically bound to one of the ports.

-- 
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
