Re: [SOLVED] Re: Running systat queues Leads to System Hang

Previous thread: EM_MIPS==LOONGSON? by Jordi Beltran Creix on Friday, June 18, 2010 - 8:18 pm. (2 messages)

Next thread: Re: ! by Marco Rivera on Saturday, June 19, 2010 - 12:09 am. (1 message)
From: Daniel Melameth
Date: Friday, June 18, 2010 - 10:08 pm

On my firewall at home, on occasion, running systat queues leaves me with an
unresponsive system.  Pings are not returned and the keyboard at the console
is unresponsive.  Sometimes the command works fine and sometimes it does
not--though it does seem the issue is more likely to occur when the system
has an uptime of more than a week or two.  I'm uncertain how to troubleshoot
this further and I have been unable to reproduce the issue on other
4.7-stable systems (though these other systems are not running the same
hardware and software).

Any ideas appreciated.
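
Since the hang is intermittent and seems tied to uptime, one way to gather evidence is to leave a breadcrumb on disk before each attempt, so that after a power cycle the log shows which command never returned.  The following is only a minimal sketch: the function names, the log format and the log path are hypothetical, and it assumes the filesystem manages to commit the fsync'd line before the lockup.

```python
import os
import subprocess
import time


def run_with_breadcrumb(cmd, log_path, runner=subprocess.run):
    """Log that cmd is about to run, force the line to disk, run it, log completion."""
    label = " ".join(cmd)
    with open(log_path, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} START {label}\n")
        log.flush()
        os.fsync(log.fileno())  # best effort: get the line on disk before a possible lockup
        result = runner(cmd)
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} DONE {label}\n")
        log.flush()
        os.fsync(log.fileno())
    return result


def unfinished_commands(log_path):
    """Return commands that have a START line with no later DONE line."""
    pending = []
    with open(log_path) as log:
        for line in log:
            parts = line.rstrip("\n").split(" ", 3)
            if len(parts) < 4:
                continue  # skip malformed or partially written lines
            _date, _clock, state, label = parts
            if state == "START":
                pending.append(label)
            elif state == "DONE" and label in pending:
                pending.remove(label)
    return pending
```

For example, calling run_with_breadcrumb(["systat", "queues"], "/var/log/hangtest.log") before each test and unfinished_commands("/var/log/hangtest.log") after a reboot would show which invocation was in flight when the box locked up; the timestamps can then be compared against the boot time to estimate the uptime at the moment of the hang.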



OpenBSD 4.7 (GENERIC) #0: Wed May 19 21:44:26 MDT 2010
    daniel@meth.internal.melameth.com:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III ("GenuineIntel" 686-class) 446 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
real mem  = 334458880 (318MB)
avail mem = 315371520 (300MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 10/12/01, BIOS32 rev. 0 @ 0xfbf92, SMBIOS rev. 2.3 @ 0xec000 (45 entries)
bios0: vendor TOSHIBA version "Version 2.60" date 10/12/2001
bios0: TOSHIBA Satellite Pro 4600
apm0 at bios0: Power Management spec V1.2
apm0: battery life expectancy 100%
apm0: AC on, battery charge high, estimated 2:12 hours
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf01c0/144 (7 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
pcibios0: PCI bus #5 is the last bus
bios0: ROM list: 0xc0000/0xc000 0xe0000/0x10000!
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82815 Host" rev 0x11
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xf0000000, size 0x2400000
ppb0 at pci0 dev 1 function 0 "Intel 82815 AGP" rev 0x11
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Trident CyberBlade XP" rev 0x63
wsdisplay0 at vga1 mux 1: console (80x25, vt100 ...
From: Daniel Melameth
Date: Friday, June 25, 2010 - 8:19 pm

Would love for someone to hit me with a clue stick here.  Once I run
this command, I don't see anything--and the box instantly locks up.

From: Daniel Melameth
Date: Wednesday, July 7, 2010 - 7:45 pm

I upgraded the system several days ago to a snapshot from just before
the hackathon, and the system appeared more stable, but I can now also
instantly kill the box by running netstat -m after about five days of
uptime.

Ideas appreciated...



OpenBSD 4.7-current (GENERIC) #34: Wed Jun 23 22:16:39 MDT 2010
    deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III ("GenuineIntel" 686-class) 446 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE
real mem  = 334458880 (318MB)
avail mem = 314679296 (300MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 10/12/01, BIOS32 rev. 0 @ 0xfbf92, SMBIOS rev. 2.3 @ 0xec000 (45 entries)
bios0: vendor TOSHIBA version "Version 2.60" date 10/12/2001
bios0: TOSHIBA Satellite Pro 4600
apm0 at bios0: Power Management spec V1.2
apm0: battery life expectancy 100%
apm0: AC on, battery charge high, estimated 1:54 hours
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf01c0/144 (7 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
pcibios0: PCI bus #5 is the last bus
bios0: ROM list: 0xc0000/0xc000 0xe0000/0x10000!
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82815 Host" rev 0x11
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xf0000000, size 0x2400000
ppb0 at pci0 dev 1 function 0 "Intel 82815 AGP" rev 0x11
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Trident CyberBlade XP" rev 0x63
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x03
pci2 at ppb1 bus 2
mem address conflict 0x14000000/0x1000
mem address conflict 0x14001000/0x1000
fxp0 at pci2 dev 8 function 0 "Intel ...
From: Richard Toohey
Date: Wednesday, July 7, 2010 - 11:34 pm

Hardware?

Tried different NICs?  RAM?  Put the HD in another machine?

No-one else seems to be seeing this (or reporting it) and you can't
reproduce on other machines, so worth eliminating hardware.

Anything unusual or different about this machine or what you run on it?

From: Scott McEachern
Date: Thursday, July 8, 2010 - 1:33 am

I said much the same thing to Daniel off-list when he first posted 
almost two weeks ago, suggesting he try both a new snapshot (at the 
time) and trying another after the hackathon.

Interestingly, since then I've installed the June 23rd snapshot (and 
built to -current on June 27th) and guess what?  Sporadic freezes under 
different circumstances, none of which are the same as Daniel's (netstat 
-m seems to work fine for me.)  When I say freeze, I mean locked up 
hard: no mouse, no keyboard, no pings, nothing; I have to power cycle it.

Two freezes have occurred when I wasn't using the system locally, just 
watching movies (on another PC) using Samba.  One freeze when I was 
reading my mail locally (like now), but an ssh network backup was taking 
place from /etc/daily.local.

I'll be trying a newer snap this weekend (or before) and see how things 
go.  This is using the same hardware and same setup that has been fine 
for almost two years (except a new HDD from Nov/09), so I seriously 
doubt it's hardware.  Three "random" freezes in a week and a half when 
it's never happened on this hardware before, ever.  My previous install 
was running -current from early(?) May.

Sorry for the completely vague message, I know it won't help anyone 
debug anything.  The problem can't be reproduced, but I'm guessing some 
networking changes have happened that are affecting Daniel and myself.

I'm only posting this in case there are other lurkers that this is 
happening to, who haven't mentioned anything because there just aren't 
any leads to go on.

So, anyone else having mysterious intermittent lockups when the network 
is in use?

Dmesg & processes: (the unmounted warning is from the last time it froze 
up, 27h ago)

OpenBSD 4.7-current (GENERIC.MP) #0: Sun Jun 27 01:54:59 EDT 2010
     root@blackstaff.erratic.ca:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Pentium(R) 4 CPU 3.20GHz ("GenuineIntel" 686-class) 3.20 GHz
cpu0: ...
From: Daniel Melameth
Date: Thursday, July 8, 2010 - 7:14 am

On Thu, Jul 8, 2010 at 12:34 AM, Richard Toohey

Perhaps, but the machine did not have this issue on 4.6 from just a
couple of months ago.  This box NEVER freezes--UNLESS I run either of
these two commands (so far).  I can recompile the kernel and userland

No.  With it being an older notebook and having three NICs, I'm a bit
limited in my flexibility and recreating the same environment on

Since it's for home use, but probably elaborate home use, maybe.  I
have multiple NICs, VLANs, pppoe, pflog, nfdump, named--it's also my
personal web server and is doing some minor nfs.

From: Daniel Melameth
Date: Monday, November 22, 2010 - 9:50 pm

FWIW, I thought I'd chime back with an update on this...  I was able
to reproduce the issue readily with 4.8-stable AND with different
hardware, but I have been unable to reproduce this since running
a snapshot from November 7.  The box currently has 14 days of uptime
and numerous netstats and systats have not been able to hang it.


OpenBSD 4.8-current (GENERIC) #473: Sun Nov  7 13:33:27 MST 2010
    deraadt@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) M processor 1600MHz ("GenuineIntel" 686-class) 1.60 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,TM,SBF,EST,TM2
real mem  = 2146856960 (2047MB)
avail mem = 2101690368 (2004MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 08/30/06, BIOS32 rev. 0 @ 0xf0000, SMBIOS rev. 2.3 @ 0xfa1ee (31 entries)
bios0: vendor Hewlett-Packard version "68BDD Ver. F.15" date 08/30/2006
bios0: Hewlett-Packard HP Compaq nc6000 (DQ880A#ABA)
apm at bios0 function 0x15 not configured
acpi0 at bios0: rev 0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT
acpi0: wakeup devices C056(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiprt0 at acpi0: bus 1 (C045)
acpiprt1 at acpi0: bus 2 (C056)
acpiprt2 at acpi0: bus 0 (C044)
acpiec0 at acpi0
acpicpu0 at acpi0: C3, C2, C1, PSS
acpipwrres0 at acpi0: C16D
acpipwrres1 at acpi0: C13D
acpipwrres2 at acpi0: C184
acpipwrres3 at acpi0: C18B
acpipwrres4 at acpi0: C195
acpipwrres5 at acpi0: C0E6
acpipwrres6 at acpi0: C20B
acpipwrres7 at acpi0: C20C
acpipwrres8 at acpi0: C20D
acpipwrres9 at acpi0: C20E
acpitz0 at acpi0: critical temperature 103 degC
acpitz1 at acpi0: critical temperature 115 degC
acpitz2 at acpi0: critical temperature 103 degC
acpibat0 at acpi0: C137 model "Primary" serial 31163 2004/02/06 type LIon oem "Hewlett-Packard"
acpibat1 at acpi0: C136 not present
acpiac0 at acpi0: AC unit online
acpibtn0 at acpi0: C139
acpibtn1 at acpi0: C138
acpivideo0 at acpi0: C0CF
acpivout0 ...
From: Daniel Melameth
Date: Wednesday, December 1, 2010 - 12:18 pm

Well, it would appear I spoke too soon.  After 22 days of uptime, a
simple netstat -m again caused the box to lock up and I was left with ...

From: Daniel Melameth
Date: Sunday, December 5, 2010 - 10:21 am

For the archives, this issue is now resolved.  I was able to isolate
the problem to nfsd and a fix has just been committed to -current--see
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/nfs/nfs_syscalls.c
revision 1.92 for details (it appears this issue might have been
introduced in revision 1.83.)
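
For anyone checking whether their own source tree falls in the suspect window, the comparison can be sketched as below.  This is only an illustration: the function names are hypothetical, the lower bound of 1.83 is the tentative guess from the message above rather than a confirmed fact, and the check only makes sense for trunk (-current) revisions of nfs_syscalls.c, such as the one shown in the file's $OpenBSD$ tag.

```python
def parse_revision(rev):
    """Turn a CVS revision string such as '1.92' into a tuple of ints, e.g. (1, 92)."""
    return tuple(int(part) for part in rev.split("."))


# Bounds taken from the message above; the lower one is only a suspicion.
SUSPECTED_INTRODUCED = parse_revision("1.83")
FIXED = parse_revision("1.92")


def possibly_affected(rev):
    """True if a trunk revision lies in [1.83, 1.92), the window named in this thread.

    Branch revisions (for example 1.83.2.1 on a -stable branch) are not handled
    meaningfully here, since a fix may or may not have been merged to a branch.
    """
    return SUSPECTED_INTRODUCED <= parse_revision(rev) < FIXED
```

Comparing the parsed tuples rather than the raw strings matters because CVS revisions are not decimal numbers: revision 1.9 predates 1.83, which a plain string or float comparison would get wrong.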
