Re: CARP node crashing reproducibly (4.3-stable)

Previous thread: isakmpd & multiple CAs within one file? by Harald Dunkel on Friday, July 11, 2008 - 10:29 am. (1 message)

Next thread: cdce error by Need Coffee on Friday, July 11, 2008 - 3:42 pm. (1 message)
To: misc <misc@...>
Cc: Adrian M. Whatley <amw@...>, David Lawrence <dave@...>, berthierp <berthierp@...>
Date: Friday, July 11, 2008 - 10:55 am

Hello,

Here's all data I was able to get off our crashing machine, the backup
node of our CARP cluster, that used to run flawlessly since 3.7.

We can reproduce the problem by (no joke) installing an openSUSE 10.3
machine in one of our labs over the network. After 40 minutes, our
backup firewall crashes. Sounds preposterous, I know... We've not had
time to examined what packets are exactly sent out on the network by
this machine, yet.

The crashed machine is still in ddb, so just asked if I should execute
some more commands.

Should I rather file a bug report? I never know when I should just ask
here or rather file one, sorry... but thanks for your help anyway!

ddb> ps
PID PPID PGRP UID S FLAGS WAIT COMMAND
15843 7703 7703 556 2 0x100 nrpe
*27717 1 7703 556 7 0x100 nrpe
26537 24561 27862 0 3 0x4082 ttyin more
24561 27862 27862 0 3 0x4082 pause sh
27862 3290 27862 0 3 0x4082 wait man
10574 1244 10574 0 3 0x4082 ttyin ksh
1244 21740 1244 0 3 0x4180 select sshd
19759 15807 19759 0 3 0x4082 ttyin ksh
15807 21740 15807 0 3 0x4180 select sshd
3290 1 3290 0 3 0x4082 pause ksh
2463 1 2463 0 3 0x4082 ttyin getty
4032 1 4032 0 3 0x4082 ttyin getty
29698 1 29698 0 3 0x4082 ttyin getty
25598 1 25598 0 3 0x4082 ttyin getty
2451 1 2451 0 3 0x4082 ttyin getty
26554 1 26554 0 3 0x80 poll ntpd
7819 1 7819 0 3 0x80 select cron
21981 1 21981 0 3 0x80 kqread apmd
7703 1 7703 556 3 0x180 wait nrpe
21908 1 7188 0 3 0x80 select snmpd
...

To: misc <misc@...>
Date: Friday, July 11, 2008 - 3:32 pm

if you follow http://www.benzedrine.cx/crashreport.html we have a
chance to actually fix the bug...

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

To: misc <misc@...>
Cc: <lists-openbsd@...>, Stephan A. Rickauer <stephan.rickauer@...>, <reyk@...>
Date: Monday, July 14, 2008 - 6:38 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Henning Brauer wrote:
| * Stephan A. Rickauer <stephan.rickauer@ini.phys.ethz.ch> [2008-07-11
16:59]:
|> Here's all data I was able to get off our crashing machine, the backup
|> node of our CARP cluster, that used to run flawlessly since 3.7.
|>
|> We can reproduce the problem
|
| if you follow http://www.benzedrine.cx/crashreport.html we have a
| chance to actually fix the bug...
|

Hello,
I'm a colleague of Stephan Rickauer and I've been taking a look at this
problem.

It's a NULL pointer bug!

dmesg shows
kernel: page fault trap, code=0
Stopped at pf_send_icmp+0x2b: orb

and ddb trace shows:

$0x1,0x32(%eax)pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8)
at pf_send_icmp+0x2b

ddb registers shows (among others):

eax 0
eip 0xd02f56db pf_send_icmp+0x2b

and helpfully disassembles the faulting instruction thus:

pf_send_icmp+0x2b: orb $0x1,0x32(%eax)

which is from line 1726 in pf_send_icmp() in pf.c:

m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

The beginning of this function (up to the line with the or) is as follows:

pf_send_icmp(struct mbuf *m, u_int8_t type, u_int8_t code, sa_family_t af,
~ struct pf_rule *r)
{
struct mbuf *m0;

m0 = m_copy(m, 0, M_COPYALL);
m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

Thus we have m_copy (actually m_copym, since m_copy is a macro defined
in /usr/src/sys/sys/mbuf.h in terms of m_copym, which itself is a
one-line wrapper around m_copym0) returning a NULL pointer in eax (= m0)
and the subsequent OR getting a page fault when it tries to use it.

Looking at m_copym0, it looks like it can legitimately fail and return
NULL (it even increments a global variable MCFail when it does so) and
therefore the bug is that its return value is not being checked in
pf_send_icmp.

As far as I can see, the precise nature of the packet being handled at
the time of the crash is not important...

To: Adrian M. Whatley <amw@...>
Cc: <lists-openbsd@...>, Stephan A. Rickauer <stephan.rickauer@...>, misc <misc@...>, <reyk@...>
Date: Monday, July 14, 2008 - 8:22 am

perfect analysis!

looks like the only sane thing to do in that case is to bail and not
send the icmp.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.609
diff -u -p -r1.609 pf.c
--- pf.c 10 Jul 2008 07:41:21 -0000 1.609
+++ pf.c 14 Jul 2008 12:20:27 -0000
@@ -1819,7 +1819,9 @@ pf_send_icmp(struct mbuf *m, u_int8_t ty
{
struct mbuf *m0;

- m0 = m_copy(m, 0, M_COPYALL);
+ if ((m0 = m_copy(m, 0, M_COPYALL)) == NULL)
+ return;
+
m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

if (r->rtableid >= 0)

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

To: misc <misc@...>
Cc: Adrian M. Whatley <amw@...>, David Lawrence <dave@...>, berthierp <berthierp@...>
Date: Monday, July 14, 2008 - 11:12 am

I've compiled a new kernel with the patch. The machine is no longer
crashing on pf_send_icmp(). However, I now see memory leaking until the
machine locks up (it doesn't crash but its network becomes unusable).
Unfortunately, it then also puts all CARP interfaces in MASTER state,
though the other node works perfectly as master already. This will, of
course, knock down our entire network until I manually put down the carp
interfaces.

I have increased kern.maxclusters to gain more time for debugging of the
memory leak. However, all I could find out so far is that lots of mbufs
are allocated while there is no significant traffic to be handled
(remember the machine is the CARP backup). The machine crashes within 15
minutes after reboot.

Because of the line wrapping in this email, I've also put the output of
netstat and vmstat online)

http://www.ini.uzh.ch/~stephan/vmstat+netstat.txt

# vmstat -m
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 3549 10275 304244 1280 7725
32 303 209 51063 640 0
64 2968 360 93244 320 89
128 511 65 5665 160 0
256 189 131 12817 80 1065
512 351 9 3326 40 0
1024 2313 11 3302 20 0
2048 33 1 1536 10 0
4096 28 1 6834 5 0
8192 12 0 12 5 0
16384 6 0 6 5 0
32768 5 0 5 5 0
65536 1 0 1 5 0

Memory usage type by bucket size
Size Type(s)
16 devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash,
in_multi,
exec, xform_data, VM swap, UVM amap, ...

To: misc <misc@...>
Date: Monday, July 14, 2008 - 11:38 am

ok that is weird. icmp_error as called in pf_send_icmp does not m_free
anything but the passed mbuf, and we now just bail if tghe allocation
of it fails. so i have a hard time seeing this as related... might be
something completely different. and finding mbuf leaks tends to be
damn hard and following a lot of code...

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

To: misc <misc@...>
Cc: Henning Brauer <lists-openbsd@...>, Adrian M. Whatley <amw@...>, berthierp <berthierp@...>
Date: Wednesday, July 23, 2008 - 2:44 pm

Yes, you are right. The leak we've seen is due to a kernel build we must
have introduced by using an unclean source tree. Problem solved.
However, the patch you've implemented in 1.610 of pf.c does fix the
crashes we've seen before.

Thanks a lot!

--

Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics Tel +41 44 635 30 50
University / ETH Zurich Sec +41 44 635 30 52
Winterthurerstrasse 190 Fax +41 44 635 30 53
CH-8057 Zurich Web www.ini.uzh.ch

To: misc <misc@...>
Date: Friday, July 11, 2008 - 3:47 pm

Nice page. I'll have a look on Monday. Thanks.

--

Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics Tel +41 44 635 30 50
University / ETH Zurich Sec +41 44 635 30 52
Winterthurerstrasse 190 Fax +41 44 635 30 53
CH-8057 Zurich Web www.ini.uzh.ch

To: Stephan A. Rickauer <stephan.rickauer@...>
Cc: Adrian M. Whatley <amw@...>, David Lawrence <dave@...>, berthierp <berthierp@...>, misc <misc@...>
Date: Friday, July 11, 2008 - 11:09 am

hi stephan!

can you also show your carp configuration?

reyk

To: Reyk Floeter <reyk@...>
Cc: Adrian M. Whatley <amw@...>, David Lawrence <dave@...>, berthierp <berthierp@...>, misc <misc@...>
Date: Friday, July 11, 2008 - 11:25 am

Sure (just x'ed out the external IPs as well as passwords). We have a
simple master/backup system:

carp0: LAN
carp1: DMZ
carp2: WLAN
carp3: Internet

#
cat /etc/host*.carp*
inet 172.16.3.254 255.255.254.0 172.16.3.255 vhid 1 advskew 50 pass xxx
carpdev em0
inet 130.60.230.xxx 255.255.255.224 130.60.230.xxx vhid 2 advskew 50
pass xxx carpdev em1
inet 192.168.91.254 255.255.255.0 192.168.91.255 vhid 4 advskew 50 pass
xxx carpdev fxp0
inet 130.60.x.xx 255.255.255.252 130.60.x.xxx vhid 3 advskew 50 pass xxx
carpdev em2

('advskew 100' on backup node)

# sysctl net.inet.carp
net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.carp.log=0

cat /etc/hostname.carp*

# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
groups: lo
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:0c:f1:8f:a9:c4
media: Ethernet autoselect (1000baseT
full-duplex,rxpause,txpause)
status: active
inet 172.16.3.252 netmask 0xfffffe00 broadcast 172.16.3.255
inet6 fe80::20c:f1ff:fe8f:a9c4%em0 prefixlen 64 scopeid 0x1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:0e:0c:c3:39:74
media: Ethernet autoselect (1000baseT
full-duplex,rxpause,txpause)
status: active
inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
inet6 fe80::20e:cff:fec3:3974%em1 prefixlen 64 scopeid 0x2
em2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:0e:0c:c3:39:75
media: Ethernet autoselect (1000baseT
full-duplex,rxpause,txpause)
status: active
inet6 fe80::20e:cff:fec3:3975%em2 prefixlen 64 scopeid 0x3
em3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
...

To: Stephan A. Rickauer <stephan.rickauer@...>
Cc: Adrian M. Whatley <amw@...>, David Lawrence <dave@...>, berthierp <berthierp@...>, misc <misc@...>, Reyk Floeter <reyk@...>
Date: Friday, July 11, 2008 - 2:42 pm

I'm just guessing, but i had some problems with centos 4.3 (AFAICR), and
the avahi-daemon. It just started to use my gateway's ip address,
causing machines in the internal net not being able to navigate to the
internet. It was an openbsd 4.0, NOT using carp. I suggest you take a
look to see if the avahi-daemon is running on the suse machine. If it
is, shut it down and see it again. Also, try capturing some packets.

My regards,

--
Giancarlo Razzolini
http://lock.razzolini.adm.br
Linux User 172199
Red Hat Certified Engineer no:804006389722501
Verify:https://www.redhat.com/certification/rhce/current/
Moleque Sem Conteudo Numero #002
OpenBSD Stable
Ubuntu 8.04 Hardy Herom
4386 2A6F FFD4 4D5F 5842 6EA0 7ABE BBAB 9C0E 6B85

Previous thread: isakmpd & multiple CAs within one file? by Harald Dunkel on Friday, July 11, 2008 - 10:29 am. (1 message)

Next thread: cdce error by Need Coffee on Friday, July 11, 2008 - 3:42 pm. (1 message)