All,
I'm getting quite a lot of these errors in /var/log/messages and can't
seem to find an appropriate fix in the archives:

May 14 21:05:54 svr02 /bsd: uvm_mapent_alloc: out of static map entries
May 14 21:57:47 svr02 /bsd: uvm_mapent_alloc: out of static map entries
May 14 23:00:05 svr02 /bsd: uvm_mapent_alloc: out of static map entries
May 15 07:27:53 svr02 /bsd: uvm_mapent_alloc: out of static map entries
May 15 07:39:59 svr02 /bsd: uvm_mapent_alloc: out of static map entries

N.B. This machine serves mirror content for various F/OSS projects in
addition to standard www content, so it quite often has >350 users
concurrently connected downloading mirrored content (in addition to
visitors who're actually visiting the site).

These messages correspond almost exactly with two things:
1.) the sites having quite a few visitors
2.) the sites becoming unavailable. In most cases, it fixes itself
when the freeloaders (errr downloaders ;-) complete their file
transfers.

Possibly worth noting:
1.) We've had to crank various settings in Apache to keep serving
traffic, as the stock settings were too low: we were reaching the max
number of Apache daemons, so new visitors were just out of luck.
2.) When the system begins to knuckle under load, I'm taking a
snapshot of various bits like the following.

Here's one example:
load averages: 0.45, 0.47, 0.40 07:40:00
247 processes: 245 idle, 2 on processor
CPU0 states: 7.2% user, 0.0% nice, 2.6% system, 2.2% interrupt, 88.0% idle
CPU1 states: 3.6% user, 0.0% nice, 0.3% system, 1.9% interrupt, 94.3% idle
Memory: Real: 339M/737M act/tot Free: 1272M Swap: 0K/518M used/tot

From the archives, this seems to be something for which a fix *used* to
be cranking up the following:

maxusers 64
option BUFCACHEPERCENT=25
option MULTIPROCESSOR
option MAX_KMAPENT=4000

This hardly seems a real fix, though, especially given everyone's
hatred of knobs, custom kernels, and such ...
Are you using squid as well? You could try something like
restarting apache.

The problem seems related to certain long-running processes with
fragmented address spaces.

Basically, in order to manage address spaces, the kernel keeps track
of a bunch of maps. Entries in these maps are stored in... map
entries. In certain situations, the kernel can't wait to allocate a
map entry, so it grabs one from a static list. Previously, when those
ran out, the kernel panicked. Now it just says "uh oh". The kernel will
merrily go on making more static entries as needed.

I'd keep track of how often the message appears. At some point, it
should stop. But it's not really alarming, unless it keeps being
printed continuously.
the problem is not in userland.
the problem is in the i386 pmap, which abuses kmem_map (which is there
for malloc(9)'s use) and allocates pv_entries from it.
this leads to enormous kmem_map fragmentation and unaccounted
allocations that do not show up in vmstat, and it also leads
to livelocks (sleeping on kmem_map) and "out of space in kmem_map"
panics. there are a number of measures to remediate the
situation properly:
- convert pv_entries allocations to pool (i have a diff if you wanna)
- backend malloc w/ pool (filed in sendbug)
- a number of uvm fixes (such as amap ops) that reduce fragmentation.
cu
--
paranoic mickey (my employers have changed but, the name has remained)
Yes, please. Definitely... and thanks.
FWIW I can bring a spare server online this weekend to keep in the
wings in case something goes completely nutty with the diff, so no
worries about this affecting production per se. :-)
Can you please point me to where the diffs you refer to reside?
I'd definitely like to try them out.
Thank you,
Darrian
most of these are filed in sendbug (some for months) already...
here is a cumulative diff also w/ a bonus himem high-quality
software (in case you managed to squeeze more than 4g of memory
in your box ;).
cu
--
paranoic mickey (my employers have changed but, the name has remained)

Index: arch/i386/conf/GENERIC
===================================================================
RCS file: /cvs/src/sys/arch/i386/conf/GENERIC,v
retrieving revision 1.603
diff -u -r1.603 GENERIC
--- arch/i386/conf/GENERIC 25 Feb 2008 23:16:47 -0000 1.603
+++ arch/i386/conf/GENERIC 7 May 2008 12:55:43 -0000
@@ -37,6 +37,8 @@
config bsd swap generic

mainbus0 at root
+himem0 at root # himem.sys
+scsibus* at himem?

cpu0 at mainbus?
bios0 at mainbus0
Index: arch/i386/conf/files.i386
===================================================================
RCS file: /cvs/src/sys/arch/i386/conf/files.i386,v
retrieving revision 1.172
diff -u -r1.172 files.i386
--- arch/i386/conf/files.i386 4 Mar 2008 21:14:29 -0000 1.172
+++ arch/i386/conf/files.i386 7 May 2008 12:55:43 -0000
@@ -440,6 +440,10 @@
attach esm at mainbus
file arch/i386/i386/esm.c esm needs-flag

+device himem: scsi
+attach himem at root
+file arch/i386/i386/himem.c himem needs-flag
+
#
# VESA
#
Index: arch/i386/i386/autoconf.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/autoconf.c,v
retrieving revision 1.78
diff -u -r1.78 autoconf.c
--- arch/i386/i386/autoconf.c 27 Dec 2007 18:04:27 -0000 1.78
+++ arch/i386/i386/autoconf.c 7 May 2008 12:55:43 -0000
@@ -71,6 +71,7 @@
#include <dev/cons.h>

#include "ioapic.h"
+#include "himem.h"

#if NIOAPIC > 0
#include <machine/i82093var.h>
@@ -117,6 +118,10 @@

if (config_rootfound("mainbus", NULL) == NULL)
panic("cpu_configure: mainbus not configured");
+
+#if NHIMEM > 0
+ config_rootfound("himem", NULL);
+#endif

#if NIOAPIC > 0
if (nioapics > 0...

Funny you should ask. Yes and no. We are proxying some of the site's
content, but it's with apache's mod_proxy.

(No way around this from what we can see, as it solves some business
needs in terms of content delivery and is an easy fix to an otherwise
vexing problem.)

Restarting apache always solves the problem, but that's hardly a fix.
Sure, I could crontab it to do so automatically and just periodically
kick everyone off, but that's super yucky and still doesn't really
*solve* the problem.... I'd feel good about that being the only answer
I'll go out on a limb and assume this is the case, since some of the
files being downloaded are certainly ~100MB or more... some are entire
DVD ISOs. I'd say these downloads qualify as "long running processes."

It isn't alarming per se, but the sites on the server *definitely*
stop accepting new visitors at some point. This seems to correlate
directly with the uvm_mapent_alloc log events.

If it were only the mirror visitors who were getting turned away, that
would be one thing, but it's actually interfering with regular
traffic, too. :-(

In short, I'm trying to find a way to:
1.) serve the oodles of mirror content (since we ourselves rely so
heavily on F/OSS, we want to make sure the mirrors are running both for
ourselves and others) and
2.) also keep our normal site traffic humming along, too.

I'm hoping to get to the point where "It Just Works", and it sure
seems like the server itself has the horsepower to do it.

If the CPUs were sweating hard or we were swapping heavily, it would
make sense, but since it's knuckling under what seems to me to be a
relatively light load, I suspect there's something else I can do to
make it happy.

Knobs, dials, levers, custom kernels, and custom apache builds they
may be, but at this point I'm open to juuuuust about anything and
everything, including witch doctors, Chinese herbalists, and/or
exorcists, to get the problem solved. :-)

Thanks much,
Kevin
Hi Kevin,
Take a look at nginx. We found this server to be quite nice to use, and
it has a small memory footprint, too. For caching + proxying stuff, you
might want to take a look at Varnish, although I don't have first-hand
experience with it yet (nginx alone can proxy just fine, it just doesn't
cache anything).

Kind regards,
--Toni++
the problem is not the message. the problem is the unneeded
kmem_map fragmentation that is caused by bad code
misusing kmem_map.
cu
--
paranoic mickey (my employers have changed but, the name has remained)
Hi Mickey,
I think I do not understand: I'm not worried about the message, but
about the crash I recently experienced on a 4.2 box after seeing this
message. IOW, I'd already be happy if 4.3 prevented the crash and only
left a warning message.

Making the code better so the problem doesn't occur in the first place
is, of course, a laudable goal, but unfortunately I'm not
qualified to actually do anything about it.

Kind regards,
--Toni++
both the explanation of the problem and a big diff that fixes it
have already been posted (:
cu
--
paranoic mickey (my employers have changed but, the name has remained)
Hi Mickey,
thank you. I had already seen that diff (before sending my first
message), but I don't quite understand whether the crash was already
fixed in stock 4.3, with the diff additionally fixing the memory
fragmentation (and thus avoiding the warning message), or whether stock
4.3 is still susceptible to crashes and the patch fixes both problems.
To understand that, I'll have to give the diff a much closer look.

Kind regards,
--Toni++
I believe caching is in the works, but of course this does no good right
now. Hopefully soon.

--
Darrin Chandler | Phoenix BSD User Group | MetaBUG
dwchandler@stilyagin.com | http://phxbug.org/ | http://metabug.org/
http://www.stilyagin.com/ | Daemons in the Desert | Global BUG Federation
Hi Darrin,
I'm not sure I want another jack-of-all-trades kind of web
server, and I would have different priorities for it anyway (e.g.
full IPv6 support; I don't know whether that's in 0.6.x already). I
mainly wanted to point the original poster to what is, imho, the top
contender to lighttpd, which I personally don't like that much, despite
using it for a number of years.

Btw, thanks for providing the port.
Kind regards,
--Toni++
I wouldn't like nginx to include every possible thing either. I doubt
that such a thing will happen. But I think adding caching makes some
sense, since if you're using it as a proxy then it's a central place, and
it's also a very common configuration. Kind of like the old question of
whether 'sort' should have the '-u' option or not. 'sort|uniq' is such a

You're welcome :)
--
Darrin Chandler | Phoenix BSD User Group | MetaBUG
well, use an httpd that is better designed than apache. at least for the
static content that should be kinda easy with a couple of redirects and
a second IP. lighttpd is a good pick.

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
If we're talking about serving static content, mathopd does a really good job here.
--
Janusz Gumkowski
http://www.am.torun.pl/~ja
What output do you get from 'netstat -m'?
I might get yelled at for this since, as you mentioned, people seem to
hate custom kernels.

But I've had good luck with the following options. I'm not sure which
are still relevant, but they help.

option NKMEMPAGES_MAX=81920
option NKMEMPAGES=81920
option MAX_KMAPENT=8192

I've always received the error you described on any heavily loaded
OpenBSD box. Even with the above changes, you will eventually get the
same error once your new limits are reached.

If you come up with any better solutions, please let me know; I'd be
very interested to hear them.

-Darrian
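For a sense of scale, and assuming NKMEMPAGES is expressed in pages and the i386 page size is 4096 bytes, the NKMEMPAGES value above works out to:

```latex
81920 \times 4096\,\text{B} = 335\,544\,320\,\text{B} = 320\,\text{MiB}
```

Raising a value like this moves the ceiling rather than removing it, which fits the observation above that the error eventually comes back.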
sure, they help. at least if you want to believe they do.
randomly pushing buttons you don't understand until it feels better is
going to help how?

btw, two of the three up there are completely unrelated to the problem
at hand and useless these days.

--
Henning Brauer, hb@bsws.de, henning@openbsd.org
I see Allen beat me to the reply with the requested netstat data
below, but in the meantime, I'm going to do the unthinkable and build
a custom kernel with your mods and see where the chips fall. :-)

Thanks for the suggestion.
Kevin
Based on that netstat output, things look OK on your system. On some
of my more heavily loaded systems, I see the peak mbuf use hit the max.

Good luck, and as I said, if you come up with something better, please
let me know.

-Darrian
2867 mbufs in use:
2566 mbufs allocated to data
274 mbufs allocated to packet headers
27 mbufs allocated to socket names and addresses
1129/5450/6144 mbuf clusters in use (current/peak/max)
13028 Kbytes allocated to network (22% in use)
0 requests for memory denied
0 requests for memory delayed
