On Tue, Jan 22, 2008 at 09:59:33PM -0800, Xin LI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Kostik Belousov wrote:
> > On Tue, Jan 22, 2008 at 03:45:32PM -0800, Xin LI wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> Hi,
> >>
> >> I have got a lot of this in dmesg output for RELENG_7_0 as of today:
> >>
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >>
> >> Any idea?
> >
> > Does it cause any problems aside from printing these messages ?
>
> It causes some fork() to fail.
>
> > What workload do you put on the machine ?
>
> It was an rsync from NFS to ZFS with ~15M of files, and rsync will
> consume basically all physical memory. I end up with some 2GB active,
> 4GB wired thing. (The system has 8GB of RAM), and I added a "make -j9
> buildworld" into the chaos to see if things get worse, and it did :-)
>
> > The messages came from the failure of the kernel to allocate address
> > space for the kernel stack for a thread being created. Previously, the
> > system would panic encountering this situation.
>
> Yes, I knew, previously it just panic and hangs there, and thanks a lot
> for fixing it =-)
>
> > This may happen due to kernel_map address space depletion, for instance,
> > by having a lot (on i386 machines with > 1Gb memory, ~40000) threads.
>
> It seems that I have hit some sort of "leak" or some exhaustion issue.
> Say, when the workload is gone, the system did not recover from the
> situation, and reboot worked fine.
>
> The system is sort of in production and it is about 20 miles away from
> my office. Do you want me to do some experiments for this?
Yes, I want to know what exactly leaked. Ideally, I would like to see a
series of vmstat -z and vmstat -m outputs taken over some time before
the system bogs down. But even a single snapshot of the vmstat -z/-m
output taken immediately before things stop working would be good to
look at. The output of ps auxwwH would be helpful too.
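
One way to gather the series of snapshots asked for above is a small
collector left running until the machine degrades. The following is only
a rough sketch, not a tested tool: the function names (snapshot, collect),
the file naming scheme and the 60-second command timeout are all
assumptions made for illustration. Only the three commands themselves
come from this message.

```python
"""Save the output of a few diagnostic commands at regular intervals.

Illustrative sketch: the helper names and file layout are assumptions.
"""
import subprocess
import time
from pathlib import Path

# The three commands named in the message.
COMMANDS = [
    ["vmstat", "-z"],
    ["vmstat", "-m"],
    ["ps", "auxwwH"],
]


def snapshot(commands, out_dir, stamp):
    """Run each command once; write its output to out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in commands:
        # Build a file name from the stamp and the command words,
        # dropping leading dashes and replacing path separators.
        words = [part.strip("-").replace("/", "_") for part in cmd]
        path = out_dir / ("-".join([stamp] + words) + ".txt")
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                timeout=60,
            )
            text = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Starting a process can itself fail on a starved system;
            # record the failure rather than stopping the collection.
            text = f"failed to run {cmd!r}: {exc}\n"
        path.write_text(text)
        written.append(path)
    return written


def collect(commands, out_dir, interval, count):
    """Take `count` snapshots, sleeping `interval` seconds between them."""
    written = []
    for i in range(count):
        # The sequence number keeps file names unique within one second.
        stamp = f"{time.strftime('%Y%m%d-%H%M%S')}-{i:04d}"
        written.extend(snapshot(commands, out_dir, stamp))
        if i + 1 < count:
            time.sleep(interval)
    return written
```

For example, collect(COMMANDS, "snapshots", interval=60, count=120) would
keep roughly two hours of history in a directory named "snapshots" (a
placeholder name). The last few files written before fork() starts
failing are the ones of most interest.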