[ yes, dear, i know i should not run current on production systems. but then, if no one does, how are we gonna shake out problems under load and real life conditions? someone has to do it. ]

this bug is now causing system lockup when the midnight gmt jobs run on one system and it is manifesting with less serious consequences on three others. the servers are all racked and remote but have serial console access. how can i be of help finding this one?

randy
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
oh, and other folk have reported on list of seeing the same. though they have not added to the pr.

randy
If you're using ZFS then you want to get to the tip of current to pick up the VM backpressure fixes that were added.

Scott
i cvsupped and built and installed new kernel and world, cvsup of May 26 00:36. it locks up solid very reliably.

i do not think this is a related bug.

randy
I'm still seeing occasional (i.e., I haven't pinned it down) ZFS write crashes (this is with a current of 23-May-2009). See my posts earlier today.

I expect to get a textdump with the UMA and malloc stats that Kip requested in the next 24-72 hours if it stays true to

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
the problem is worst on a zfs system, crashing. and it is upgrading now and does so once a week. the problem also manifests on non-zfs systems.

randy
Which arch?
How much memory?
What are your loader.conf settings?
-Kip
--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke
> Which arch?

on the worst one, which runs zfs

# grep -v ^# /boot/loader.conf*
/boot/loader.conf.local:loader_logo=beastie
/boot/loader.conf.local:console="comconsole vidconsole"
/boot/loader.conf.local:comconsole_speed=9600
/boot/loader.conf.local:vfs.zfs.prefetch_disable=1
/boot/loader.conf.local:zfs_load=YES
/boot/loader.conf.local:vfs.zfs.prefetch_disable=1
/boot/loader.conf.local:geom_mirror_load=YES
/boot/loader.conf.local:kern.maxvnodes=50000

on another which has gmirror, not zfs

# grep -v ^# /boot/loader.conf*
/boot/loader.conf.local:loader_logo=beastie
/boot/loader.conf.local:console="comconsole vidconsole"
/boot/loader.conf.local:comconsole_speed="9600"
/boot/loader.conf.local:vm.pmap.pg_ps_enabled=1
/boot/loader.conf.local:geom_mirror_load=YES

...

randy
yep. i am presuming that it is some kernel or other config aspect.

randy
What type of hard drives?
How big are your zpools?
Do you use compression?
(I'm wondering if compression and slow disks have something to do with it)
--
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke
a zfs system

ad4: 305245MB <Seagate ST3320620AS 3.AAK> at ata2-master SATA150
ad6: 305245MB <Seagate ST3320620AS 3.AAE> at ata3-master SATA150
ad8: 305245MB <Seagate ST3320620AS 3.AAE> at ata4-master SATA150
ad10: 305245MB <Seagate ST3320620AS 3.AAK> at ata5-master SATA150

a gmirror system

ad4: 238475MB <Seagate ST3250820NS 3.AEK> at ata2-master SATA150
ad5: 238475MB <Seagate ST3250820NS 3.AEK> at ata2-slave SATA150
ad6: 238475MB <Seagate ST3250820NS 3.AEK> at ata3-master SATA150

again, this is happening on non-zfs systems as well. i do not think this is zfs related. but the zfs system is the one with the worst lockups.

it looks like

Filesystem        1024-blocks      Used     Avail Capacity  Mounted on
/dev/mirror/boota     8122126    636960   6835396       9%  /
devfs                       1         1         0     100%  /dev
procfs                      4         4         0     100%  /proc
tank/data           653313024         0 653313024       0%  /data
tank/data/nfsen     845243776 191930752 653313024      23%  /data/nfsen
tank/data/rpki      653494144    181120 653313024       0%  /data/rpki
tank                653313024         0 653313024       0%  /tank
tank/usr            658919040   5606016 653313024       1%  /usr
tank/usr/home       660368256   7055232 653313024       1%  /usr/home
tank/usr/usr        658758144   5445120 653313024       1%  /usr/usr
tank/var            654433024   1120000 653313024       0%  /var
tank/var/log        653400960     87936 653313024       0%  /var/log
tank/var/spool      653337088     24064 653313024       0%  /var/spool
/dev/md0               253678        14    233370       0%  /tmp

nope

randy
I ran into this crash (I think*) yesterday too, albeit in an (amd64) VM with 768MB RAM. However, I had set arc_min="30M" and arc_max="100M" so I expected it to work, but it crashed within 10-15 minutes of make -j4 buildworld. I changed the values to 5 and 30M, and so far (~30 minutes) no crash.

The sources were from late May 21st, currently building rev. 192805 (since 192808 broke the build, at least on the tinderbox).

* "I think" because I went to check on it in the middle of the night, saw a page fault in kernel mode or whatever, and figured "damnit... well, I'll suspend the VM, turn the laptop off and check in the morning". I hit shutdown instead, so no backtrace or anything. D'oh!

Regards,
Thomas
Can you try not setting the ARC? I haven't had any problems on my comparably sized VMs.

-Kip
Uh oh, I think I replied to the wrong thread. After reading the PR in question, this doesn't appear to be the same problem that I'm having (which appears to be the ARC growing until it panics).

Anyway, when the build is complete and all that (~2.5 hours to go, plus other stuff after that), I'll try again with no ARC settings, when I have the time.

Regards,
Thomas
OK, I tried it, since it crashed even with my low ARC settings. With
*no* ARC settings, I get this:
cc -O2 -pipe -I. -DIN_GCC -DHAVE_CONFIG_H -DPREFIX=\"/usr\" -I/usr/obj/usr/src/gnu/usr.bin/cc/cc_tools/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_tools/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc/config -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/include -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/libcpp/include -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/libdecnumber -g -DGENERATOR_FILE -DHAVE_CONFIG_H -I/usr/obj/usr/src/tmp/legacy/usr/include -c /usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc/genattr.c
cc -O2 -pipe -I. -DIN_GCC -DHAVE_CONFIG_H -DPREFIX=\"/usr\" -I/usr/obj/usr/src/gnu/usr.bin/cc/cc_tools/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_tools/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc/config -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/include -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/libcpp/include -I/usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcclibs/libdecnumber -g -DGENERATOR_FILE -DHAVE_CONFIG_H -I/usr/obj/usr/src/tmp/legacy/usr/include -c /usr/src/gnu/usr.bin/cc/cc_tools/../../../../contrib/gcc/genautomata.c
*** drop to debugger here ***
---------------
# while :; do date; vmstat -m | grep -E 'Type|solaris'; sysctl kstat.zfs.misc.arcstats.size; sleep 10; done
[...]
Wed May 27 11:33:43 CEST 2009
Type InUse MemUse HighUse Requests Size(s)
solaris 44183 109686K - 9175781 16,32,64,128,256,512,1024,2048,4096
kstat.zfs.misc.arcstats.size: 159089184
Wed May 27 11:33:53 CEST 2009
Type InUse MemUse HighUse Requests Size(s)
solaris 37633 108437K - 9536555 ...

(Sorry if the quoting got FUBAR.)
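For anyone comparing the two figures in the samples above: the solaris MemUse column is reported in KiB, while kstat.zfs.misc.arcstats.size is in bytes. A minimal sketch for putting both in MiB follows. The helper names bytes_to_mib and kib_to_mib are made up for illustration and are not existing tools.

```sh
#!/bin/sh
# Sketch only: unit conversion helpers for the monitoring loop output.
# bytes_to_mib and kib_to_mib are hypothetical names, not system utilities.

# Convert a byte count (e.g. the value of kstat.zfs.misc.arcstats.size) to MiB.
bytes_to_mib() {
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024) }'
}

# Convert a vmstat -m MemUse value such as "109686K" to MiB.
# ${1%K} strips the trailing "K" suffix before the division.
kib_to_mib() {
    LC_ALL=C awk -v k="${1%K}" 'BEGIN { printf "%.1f\n", k / 1024 }'
}

# Values taken from the 11:33:43 sample:
bytes_to_mib 159089184
kib_to_mib 109686K
```

With the 11:33:43 sample this gives roughly 151.7 MiB for the ARC size against roughly 107.1 MiB for the solaris malloc type.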
I tried this "once" more, with 1GB VM RAM, no ARC settings and a 4GB
swap for dumps. (Apparently, when I had 640MB VM RAM, the dump created
was ~1150 MB. I figured it couldn't exceed RAM size.)
Between the previous tests and this, I had loads of crashes (with
640MB), even so bad that I couldn't boot because savecore would cause
a panic. I increased VM RAM and set arc_max in the loader and it
booted fine. Then, I tried (see below) with 1GB VM RAM and again no
ARC settings.
The wired count grew and grew and grew, until it crashed in
lzjb_decompress(), backtrace:
lzjb_decompress()
zio_decompress()
zio_done()
zio_execute()
zio_done()
zio_execute()
taskq_thread()
fork_exit()
fork_trampoline()
On "call doadump" I got "Fatal double fault", no dump, and a reboot.
Here's a LONG capture of vmstat output while running buildworld -j4.
Note how the wired count keeps increasing and increasing until it
breaks (in part, I guess this is intended, but it seems to grow a tad
out of hand):
Regards,
Thomas
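To read the wired-page counts in the log that follows as memory sizes, something like the sketch below could be run over a saved copy of the output. It assumes 4096-byte pages, which should be checked with sysctl hw.pagesize on the machine in question. The function name wired_mib is hypothetical.

```sh
#!/bin/sh
# Sketch: turn "N pages wired down" lines from `vmstat -s` output into MiB.
# PAGE_SIZE defaults to 4096 bytes, which is an assumption; verify it with
# `sysctl hw.pagesize` on the machine that produced the log.
# wired_mib is a hypothetical helper name, not an existing tool.
wired_mib() {
    LC_ALL=C awk -v pagesize="${PAGE_SIZE:-4096}" '
        /pages wired down/ {
            # $1 is the page count; print it next to its size in MiB
            printf "%d %.1f\n", $1, $1 * pagesize / (1024 * 1024)
        }'
}

# Example with the 13:54:50 and 13:56:10 samples from the log:
wired_mib <<'EOF'
63244 pages wired down
87862 pages wired down
EOF
```

Under that page-size assumption, the wired count goes from about 247 MiB at 13:54:50 to about 343 MiB at 13:56:10.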
------ NO arc settings below! ------
[serenity@clone ~]$ while :; do date; echo; vmstat -s | grep -E 'pages (cached|active|wired down|free$)'; echo; sleep 20; done
Wed May 27 13:54:50 CEST 2009
15716 pages cached
41358 pages active
63244 pages wired down
141368 pages free
Wed May 27 13:55:10 CEST 2009
15719 pages cached
65571 pages active
65515 pages wired down
113358 pages free
Wed May 27 13:55:30 CEST 2009
15719 pages cached
41471 pages active
70436 pages wired down
132093 pages free
Wed May 27 13:55:50 CEST 2009
15809 pages cached
11660 pages active
81518 pages wired down
153294 pages free
Wed May 27 13:56:10 CEST 2009
16124 pages cached
11013 pages active
87862 pages wired down
147672 pages free
Wed May 27 13:56:30 CEST 2009
16124 pages cached
10917 pages active
...