Server troubleshooting

Previous thread: OpenBSD kernel janitors by Karel Kulhavy on Tuesday, October 30, 2007 - 3:31 pm. (50 messages)

Next thread: In Memoriam: Jun-ichiro Hagino by Dragos Ruiu on Tuesday, October 30, 2007 - 6:10 pm. (18 messages)
To: <misc@...>
Date: Tuesday, October 30, 2007 - 2:49 pm

Background:

I'm running a web server with the Apache from the base install, PHP,
pure-ftpd and a PostgreSQL database to serve multiple websites. Each
website runs in its own instance of Apache, and one extra instance of
Apache acts as a reverse proxy, routing by domain name. In all, 5
independent Apache instances are started. I've done this to separate the
domains so that PHP in one domain can't access the data of another.

A simplified graphic representation:

Internet
|
NAT Firewall (OpenBSD)
|
+----------------------+
| | |
| Apache Reverse proxy | Web Server (OpenBSD 4.0)
| | | |
| dom1.com dom2.com |
+----------------------+

Problem:

This is the second time that, after a period of 1 to 3 months, the
server has stopped responding to http, ftp and ssh. The connection seems
to be established but the service does not respond. Ping responds fine.
The first time this happened the system was sitting at the ddb> prompt.
Since I'm not too familiar with kernel debugging I simply restarted the
system. :(

Question:

Instead of simply rebooting the system I would like to start learning
to troubleshoot the problem. Currently I'm physically away from the
system and can't look at the console. Since I can't connect
successfully via ssh, is there anything else I could be doing remotely?
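
One thing that can be done from a remote machine is to record exactly how far each service gets, since "the connection seems to be established but the service does not respond" is itself a useful data point. The following is a minimal sketch, not a tested monitoring tool: the function name `probe`, its state labels and the host name in the usage comments are made up for illustration. It assumes Python is available on the machine you are probing from (nothing needs to run on the server).

```python
import socket


def probe(host, port, send=b"", timeout=5.0):
    """Report how far a TCP service gets.

    Returns (state, data) where state is one of:
      "refused"          - nothing is listening on the port
      "no-connect"       - the handshake itself failed or timed out
      "connected-silent" - handshake completed, but no reply arrived
      "closed"           - peer closed or reset the connection before replying
      "reply"            - the service sent data (returned as bytes)
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused", b""
    except OSError:  # includes timeouts and unreachable hosts
        return "no-connect", b""

    with sock:
        sock.settimeout(timeout)
        try:
            if send:
                # Some protocols (HTTP) only answer after a request is sent.
                sock.sendall(send)
            data = sock.recv(256)
        except socket.timeout:
            return "connected-silent", b""
        except OSError:
            return "closed", b""
        return ("reply", data) if data else ("closed", b"")


# Hypothetical usage (the host name is a placeholder):
#   probe("server.example.org", 22)   # sshd sends its banner unprompted
#   probe("server.example.org", 21)   # ftpd sends a greeting unprompted
#   probe("server.example.org", 80, send=b"HEAD / HTTP/1.0\r\n\r\n")
```

If all three ports report "connected-silent" while ping still answers, that would be consistent with the kernel still completing handshakes while the daemons themselves are not getting to run. It narrows the problem down but does not by itself identify the cause.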

To: Claus <cniesen@...>
Cc: <misc@...>
Date: Tuesday, October 30, 2007 - 7:30 pm

...you could be researching a Lights-out-Management solution for your
server (Dell DRAC, Sun LOM). The best all-around solution is a PC-Weasel
(realweasel.com) connected to the system next to it (or a RAS
concentrator).

If the system is completing the 3-way TCP handshake but the services
never answer, then you're dead in the water. Consider making the system
highly available.

~BAS

To: Claus <cniesen@...>, <misc@...>
Date: Tuesday, October 30, 2007 - 6:47 pm

The console terminal didn't respond either. I could use Ctrl-Alt-F2 to
switch consoles, but the console terminal wouldn't respond at all to
keystrokes. I didn't see any error messages on the console itself
either. Is this faulty hardware, or a lack of RAM due to the multiple
Apache instances?

OpenBSD 4.0-stable (GENERIC) #3: Wed Mar 14 14:13:09 CDT 2007
claus@server1.xxxxxx.us:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium II ("GenuineIntel" 686-class, 512KB L2 cache) 266 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,MMX
real mem = 66678784 (65116K)
avail mem = 52568064 (51336K)
using 839 buffers containing 3436544 bytes (3356K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(96) BIOS, date 08/22/99, BIOS32 rev. 0 @ 0xec800, SMBIOS rev. 2.1 @ 0xf13e6 (54 entries)
bios0: Compaq Deskpro
pcibios0 at bios0: rev 2.1 @ 0xec800/0x3800
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf6ff0/176 (9 entries)
pcibios0: PCI Interrupt Router at 000:20:0 ("Intel 82371AB PIIX4 ISA" rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xe0000/0x8000!
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82443BX AGP" rev 0x02
ppb0 at pci0 dev 1 function 0 "Intel 82443BX AGP" rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "ATI Mach64 GD" rev 0x5c
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
xl0 at pci0 dev 16 function 0 "3Com 3c905C 100Base-TX" rev 0x6c: irq 11, address 00:01:02:66:8e:45
bmtphy0 at xl0 phy 24: Broadcom 3C905C internal PHY, rev. 4
pcib0 at pci0 dev 20 function 0 "Intel 82371AB PIIX4 ISA" rev 0x02
pciide0 at pci0 dev 20 function 1 "Intel 82371AB IDE" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <IC35L060AVER07-0>
wd0: 16-sector PIO, LBA, 58644MB, 120103200 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode ...

To: OpenBSD misc <misc@...>
Date: Wednesday, October 31, 2007 - 10:56 am

Oddly enough, I had this same problem when I set a console timeout on
our external web server (internal was fine with it). If anything caused
a console timeout (ssh, direct console access, etc) the box stopped
spawning new processes yet allowed existing ones to continue. I ended up
taking the console timeout off and it cleared up the problem. Could be


To: <misc@...>
Date: Wednesday, October 31, 2007 - 10:37 am

On 10/30/2007 4:58 PM, Karsten McMinn wrote:
> ddb(4) (trace and ps). Have a remotely accessible console on the server.
> Check for hardware problems. Check for irregular network traffic.

Thanks for your reply. As already mentioned, the system didn't get stuck
in ddb, so there is no info from it. The network traffic looked quite
light: 34 web hits between 11 am and ~11:44, when the system stopped
responding, and only 2 unsuccessful ftp access attempts that morning
(ftp requires TLS).

I'm wondering if this might be caused by a lack of memory (RAM)?

This is a wild and undereducated guess: did the system answer pings and
allow connections to be established, but nothing more, because spawned
processes weren't able to get the memory required to run?

Currently the memory stats of the system are (with nearly no load):
# sysctl -n hw.physmem
66678784

# sysctl -n hw.usermem
66256896

# vmstat
 procs        memory  page                      disks          traps      cpu
 r b w    avm    fre  flt  re  pi  po  fr  sr wd0 cd0  int  sys   cs us sy id
 0 0 0  46456   4908   46   0   0   0   0   2   7   0  232   74   18  1  1 99

# top -b
load averages: 0.39, 0.22, 0.18 09:15:24
62 processes: 61 idle, 1 on processor
CPU states: 0.7% user, 0.0% nice, 0.4% system,
0.3% interrupt, 98.7% idle
Memory: Real: 24M/51M act/tot Free: 5476K Swap: 26M/256M used/to

While monitoring the system with "vmstat -w 1" and accessing web pages I
noticed that the free memory can drop significantly. I once got it down
to 748 (from 4908 at idle) by running multiple http sessions
simultaneously.

 procs        memory  page                      disks          traps      cpu
 r b w    avm    fre  flt  re  pi  po  fr  sr wd0 cd0  int  sys   cs us sy id
 2 0 0  56096    748 1903   0   7   0   0   0 240   0  355 1545  354 50 10 40

So I assume adding more memory to the system would be a good investment
and not money wasted, right?
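
To put numbers on the two vmstat samples above, here is a small sketch that parses a vmstat data row and compares free memory against hw.physmem. It is an illustration only: the helper names and the 5% threshold are arbitrary choices, and it assumes that the "fre" column is in kilobytes (which is consistent with top reporting "Free: 5476K") and that the row has exactly the columns shown in this thread, including two disk columns.

```python
# Column names as they appear in the vmstat output quoted in this thread.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "wd0", "cd0", "int", "sys", "cs", "us", "sy", "id"]


def parse_vmstat_row(row):
    """Turn one whitespace-separated vmstat data row into a dict of ints."""
    values = [int(v) for v in row.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d"
                         % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def memory_pressure(sample, physmem_bytes, min_free_fraction=0.05):
    """Summarise a sample; assumes 'fre' is in kilobytes (see lead-in)."""
    free_fraction = sample["fre"] * 1024 / physmem_bytes
    return {
        "free_fraction": free_fraction,
        "paging_in": sample["pi"] > 0,           # pages being read back in
        "low": free_fraction < min_free_fraction,  # arbitrary threshold
    }


PHYSMEM = 66678784  # from "sysctl -n hw.physmem" above

idle = parse_vmstat_row("0 0 0 46456 4908 46 0 0 0 0 2 7 0 232 74 18 1 1 99")
busy = parse_vmstat_row("2 0 0 56096 748 1903 0 7 0 0 0 240 0 355 1545 354 50 10 40")

for name, sample in (("idle", idle), ("busy", busy)):
    print(name, memory_pressure(sample, PHYSMEM))
```

Under those assumptions the idle sample has roughly 7.5% of physical memory free, while the loaded sample has roughly 1.1% free with page-ins occurring. That points to memory pressure under concurrent requests, though it does not prove that memory exhaustion is what caused the hang.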

To: <misc@...>
Date: Tuesday, October 30, 2007 - 5:58 pm

ddb(4) (trace and ps). Have a remotely accessible console on the server.
Check for hardware problems. Check for irregular network traffic.
