Re: sbrk(2) broken

To: <freebsd-current@...>
Cc: Poul-Henning Kamp <phk@...>
Date: Thursday, January 3, 2008 - 2:38 am

Poul-Henning noticed today that xchat fails to start if malloc uses sbrk 
internally.  This failure happens during the first call to malloc, with 
the following message:

Fatal error 'Can't allocate initial thread' at line 335 in file 
/usr/src/lib/libthr/thread/thr_init.c (errno = 12)

This can be worked around with MALLOC_OPTIONS=dM .

The problem does not appear to be specific to jemalloc; I reverted 
src/lib/libc/stdlib/malloc.c to revision 1.92 (last phkmalloc revision), 
which also uses sbrk, and the failure mode is the same.

The failure occurs on both i386 and amd64.  It appears that sbrk(0) 
returns an address that is in the address range normally used by mmap. 
So, the first call to sbrk with a non-zero increment is fantastically 
wrong.  On i386 (ktrace output):

   1013 xchat    CALL  break(0x28200000)
   1013 xchat    RET   break -1 errno 12 Cannot allocate memory

On amd64 (truss output):

   break(0x800900000)  ERR#12 'Cannot allocate memory'

sbrk is not a true system call, so it seems like the problem should have 
something to do with the _end data symbol.  I looked at it in gdb though 
and never saw an unreasonable value, despite bogus sbrk(0) results.  I 
do not know offhand how to get the addresses of .minbrk and .curbrk 
(register inspection within gdb while stepping through sbrk?), which are 
what sbrk actually uses (see src/lib/libc/amd64/sys/sbrk.S).  Perhaps 
the loader isn't initializing them correctly...
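The point that sbrk() is a libc wrapper around a cached break, not a system call of its own, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not FreeBSD's sbrk.S: a static array stands in for the data segment, two offsets play the roles of .minbrk and .curbrk, and toy_break(), toy_sbrk() and TOY_DATA_MAX are hypothetical names. It shows why a wrongly initialized cached break makes sbrk(0) report a bogus address without any system call, while the first non-zero increment is refused with ENOMEM.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: a fixed array stands in for the data segment,
 * and two cached offsets stand in for .minbrk and .curbrk. */
#define TOY_DATA_MAX 4096
static char toy_data[TOY_DATA_MAX];
static size_t toy_minbrk = 0;   /* lowest legal break */
static size_t toy_curbrk = 0;   /* cached current break */

/* Stand-in for the break(2) system call: accept or refuse a new break. */
static int toy_break(intptr_t newbrk)
{
    if (newbrk < (intptr_t)toy_minbrk || newbrk > (intptr_t)TOY_DATA_MAX) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}

/* sbrk(incr): returns the previous break on success, (void *)-1 on error. */
static void *toy_sbrk(intptr_t incr)
{
    intptr_t old = (intptr_t)toy_curbrk;

    if (incr == 0)
        return toy_data + old;   /* no "system call": just report the cache */
    if (incr > (intptr_t)TOY_DATA_MAX || incr < -(intptr_t)TOY_DATA_MAX) {
        errno = ENOMEM;          /* also keeps old + incr from overflowing */
        return (void *)-1;
    }
    if (toy_break(old + incr) == -1)
        return (void *)-1;
    toy_curbrk = (size_t)(old + incr);
    return toy_data + old;
}
```

If the cached break starts out in the wrong place, sbrk(0) faithfully reports that wrong value, and only the first real break request exposes the problem.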

I am quite pressed for time at the moment, and cannot look into this in 
any more detail for at least a couple of weeks.  If anyone knows what 
the problem is, please let me know.

Thanks,
Jason
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
To: Jason Evans <jasone@...>
Cc: <freebsd-current@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 8:21 am

Malloc() itself knows the amount of memory _really_ in use by a program and
could check that it doesn't go beyond the limits, but for this it needs a
run-time check via a getrlimit() call for each malloc() call (a program can
call setrlimit() itself). Tracking direct mmap()s and sbrk()s outside of
malloc() is also needed.

-- 
http://ache.pp.ru/
To: Andrey Chernov <ache@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 8:57 am

No, the VM system has a much better idea about this.

You need to think about this the right way:

There is address space allocated to the process (via sbrk/mmap)

A subset of this, is address space allocated by the program (via malloc)

...and then there is memory actually in use, which is an entirely different
thing, of which we currently only have some kind of clue in the VM
system.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Poul-Henning Kamp <phk@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 9:12 am

Then we need a sysctl to fetch that "memory actually in use" from the
kernel and compare it with getrlimit(), which would allow malloc() to
return 0 when needed.

-- 
http://ache.pp.ru/
To: Andrey Chernov <ache@...>
Cc: <freebsd-current@...>
Date: Friday, January 4, 2008 - 9:25 am

That won't help much -- malloc could have allocated some address space that
hasn't (yet) been touched by the process.  Just returning 0 when the
amount of memory "in use" hits a limit wouldn't stop the process from
then touching all the memory it has previously been allocated and
exceeding the limit.

-- 
David Taylor
To: <freebsd-current@...>
Date: Friday, January 4, 2008 - 10:22 am

In that case the process is subject to being killed by the system if it
exceeds its limits.
But... this is not a malloc() problem at all; malloc() is designed to detect
an overflow situation, not prevent it. The malloc() problem is that it does
not return 0.

-- 
http://ache.pp.ru/
To: Andrey Chernov <ache@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 9:28 am

[Empty message]
To: Jason Evans <jasone@...>
Cc: <freebsd-current@...>
Date: Thursday, January 3, 2008 - 4:39 pm

I cannot say definitely what happens, but please note that the _end
symbol is defined by the linker script, and it shall be present in all
executables and shared objects. The value you reported could naturally be
the _end value of some shared object.

I tried both the RELENG_7 and HEAD, and sbrk(0) correctly returns a
seemingly valid value like 0x8049644.

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
	void *p;

	/* sbrk(0) reports the current break without moving it. */
	p = sbrk(0);
	printf("%p\n", p);

	return (0);
}
To: Jason Evans <jasone@...>
Cc: <freebsd-current@...>, Poul-Henning Kamp <phk@...>
Date: Thursday, January 3, 2008 - 3:21 pm

The real question is why we would revert perfectly good code (jemalloc)
from using a modern interface to using one that has been obsolete for
twenty years, and marked as such in the man page for seven years.

If rwatson@ wants malloc() to respect resource limits, he can bloody
well fix mmap().  Until he does, the datasize limit is a joke anyway, as
anyone can circumvent it by either using mmap() instead of malloc() or
setting _malloc_options before calling malloc().

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Thursday, January 3, 2008 - 8:26 pm

The issue here was that there were a number of reports that out-of-control
applications were toasting systems that weren't getting toasted under 6.x.  I
experienced this on my web server, but the ports build cluster has been
running into it for months.  The symptom is that a single application exhausts
swap, causing all sorts of things to break (tm), killing of other large
processes, etc.  To be clear, in the new world order, instead of getting NULL
back from malloc(3), SIGKILL is delivered to large processes.

When I e-mailed Jason Evans and Alan Cox about it, I suggested that we
actually teach malloc(3) to enforce an allocation limit itself by querying a
limit once at process startup, and then using its own accounting to decide
when to start failing requests.  As an alternative model that would require
some more infrastructural changes, I suggested a new mmap() flag that hinted
to the kernel that the page should count against a swap/anonymous memory
limit, but that we should avoid more serious changes at the last minute before
a release.  Alan suggested the model Jason ended up implementing as a
lower-risk way to restore the 6.x resource limits non-disruptively.  As it
turned out, this proved much more complicated than expected.
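The first suggestion above (query a limit once at startup, then let the allocator's own accounting decide when to fail) can be sketched as a thin wrapper around malloc(). This is a minimal sketch, assuming RLIMIT_DATA is the limit consulted; acct_init(), acct_malloc() and acct_free() are hypothetical names, and this is not what jemalloc actually implemented. As noted earlier in the thread, such accounting cannot see mmap() or sbrk() calls made outside the allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical accounting wrapper; not thread-safe.  Call acct_init()
 * once, before the first allocation. */
static size_t acct_limit = SIZE_MAX;
static size_t acct_used;

/* Header in front of each block; the union keeps the payload aligned. */
typedef union {
    max_align_t align;
    size_t size;
} acct_hdr;

static void acct_init(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        acct_limit = (size_t)rl.rlim_cur;
}

static void *acct_malloc(size_t n)
{
    acct_hdr *h;

    if (n > SIZE_MAX - sizeof(acct_hdr) || n > acct_limit - acct_used) {
        errno = ENOMEM;   /* over budget: fail like malloc() would */
        return NULL;
    }
    h = malloc(sizeof(*h) + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    acct_used += n;
    return h + 1;
}

static void acct_free(void *p)
{
    acct_hdr *h;

    if (p == NULL)
        return;
    h = (acct_hdr *)p - 1;
    acct_used -= h->size;   /* give the budget back */
    free(h);
}
```

Reading the limit once keeps the fast path free of system calls; the cost is that a later setrlimit() by the program goes unnoticed.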

The right answer is presumably to introduce a new LIMIT_SWAP, which limits the
allocation of anonymous memory by processes, and size it to something like 90%
of swap space by default.  Since that won't be happening before 7.0, I believe
the consensus is to simply not MFC the changes for 7 and proceed with the
release.  However, having a resource limit on swap use in order to prevent the
above scenario is actually quite important: SIGKILL of arbitrary processes is
not a good way to deal with one run-away process, and the virtual memory size
limit, while also useful, prevents you from limiting the allocation of swap
without also ...
To: Robert Watson <rwatson@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 6:41 am

Huh??? Again, huh???
To: Igor Mozolevsky <igor@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:19 am

FreeBSD allows memory overcommit, both overcommit of physical memory resulting 
in paging, and overcommit of swap space.  For the last few years, resource 
limits on the data segment size, previously observed by malloc(), have 
prevented processes from mallocing enough memory individually to exhaust swap 
on 32-bit systems.  This is arguably a bug, because you actually want a single
process to be able to allocate enough memory to fill its address space, but
because the data segment size is used to make address space layout decisions
from the inception of the process, it is rather innate to using sbrk().  Jason's
new malloc uses mmap() of anonymous memory, which isn't affected by the data
segment limit, and hence, as a feature, isn't limited by the resource limit.
This turns out to be awkward if you have a run-away process, as where 
previously it would simply get back an error when it tried to exceed its 
resource limit, now it simply consumes all your swap, which then results in 
overcommit.

My hope was that we could re-introduce a resource limit on malloc'd memory 
without large changes, but that appears to have been more tricky than hoped. 
The goal is not to prevent overcommit, which is invaluable in UNIX systems due 
to the fork() model which pretty much pre-supposes it by design, rather, to 
prevent exhaustion of swap by a single process if not specifically allowed by 
the administrator (in the same way we limit all sorts of other things, like 
open files, mbufs, socket buffer memory, etc).  The right way to do it is to 
provide a specifically configurable process limit on swap use, the same way we 
did for data segment size, only not data segment size, but that was considered 
likely too risky for 7.0.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Igor Mozolevsky <igor@...>
Cc: <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 6:55 am

For the same reason as it has for the last 20 years or so: memory
overcommit, which means that malloc() allocates address space, not
memory.  Actual memory is allocated on-demand when the address space is
used (read from or written to).  If there is no RAM left and none can be
freed by swapping out, the process gets killed.  The process that gets
killed is not necessarily the memory hog, it is merely the process that
is unlucky enough to touch a new page at the wrong moment, i.e. when all
RAM and swap is exhausted *or* everything in RAM is wired down and
unswappable.
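The distinction between address space and memory can be observed with mincore(2), which reports which pages of a mapping are resident. This is a minimal sketch, assuming a page-aligned mapping of at most 64 pages; resident_pages() is a hypothetical helper name. A fresh anonymous mapping reports no resident pages, and writing to a page is what actually causes memory to be assigned to it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [p, p + len) that the kernel reports as resident.
 * P must be page-aligned (as an mmap() result is); at most 64 pages. */
static long resident_pages(void *p, size_t len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + pg - 1) / pg;
    unsigned char vec[64];
    long n = 0;

    if (npages > sizeof(vec))
        return -1;
    /* Linux declares the vector as unsigned char *, FreeBSD as char *;
     * passing it as void * matches either prototype. */
    if (mincore(p, len, (void *)vec) != 0)
        return -1;
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;   /* low bit set: page is resident */
    return n;
}
```

Typically only the pages that were written show up as resident; superpage or transparent huge page promotion may make more of a larger mapping resident at once.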

Of course, if you're afraid of memory overcommit and you know in advance
how much memory you need, you can simply allocate a sufficient amount of
address space at startup and touch it all.  This way, you will either be
killed right away, or be guaranteed to have sufficient memory for the
rest of your (process) lifetime.  Alternatively, do what Varnish does:
create a large file, mmap it, and allocate everything you need from that
area, so you have your own private swap space.  Just make sure to
actually allocate the disk space you need (by filling the file with
zeroes, or at the minimum writing a zero to the file every sb.st_blksize
bytes, preferably sequentially to avoid excessive fragmentation) or you
may run into the same problem as with malloc() if the disk fills up
while your backing file is still sparse.
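The backing-file approach above (create a file, write a zero every st_blksize bytes, sequentially, then mmap it) can be sketched as follows. This is a minimal sketch under stated assumptions: map_preallocated() is a hypothetical name, the 4096 fallback block size is an arbitrary choice, and on filesystems that compress or deduplicate zeroes (ZFS with compression enabled, for example) zero-filled blocks may still end up stored as holes.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create PATH, force every block to be allocated by
 * writing one zero byte per st_blksize bytes, then map it shared. */
static void *map_preallocated(const char *path, size_t size)
{
    struct stat sb;
    void *p = MAP_FAILED;
    int fd;

    if (size == 0)
        return NULL;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return NULL;
    if (fstat(fd, &sb) == 0) {
        size_t blk = sb.st_blksize > 0 ? (size_t)sb.st_blksize : 4096;
        size_t off;

        /* Sequential writes, one byte per block, to avoid fragmentation. */
        for (off = 0; off < size; off += blk)
            if (pwrite(fd, "", 1, (off_t)off) != 1)
                break;
        /* Every block was touched; now make the length exactly SIZE. */
        if (off >= size && pwrite(fd, "", 1, (off_t)(size - 1)) == 1)
            p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    close(fd);   /* an existing mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

An allocator would then carve its objects out of the returned region instead of out of anonymous memory.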

The ability to specify a backing file to use instead of anonymous
mappings would be a cool addition to jemalloc.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:18 am

Broadcasting SIGDANGER would be a much better option; followed by
SIGTERM to the memory hogger (to allow for graceful termination) and
only then SIGKILL. I can imagine a few (legitimate) scenarios when a


That would be really cool and even better if it allocated it in a
contiguous chunk.


Igor :-)
To: Igor Mozolevsky <igor@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 11:26 pm

That would create a nicely sized 'hole' in the starting blocks.  What
Dag-Erling describes is the correct(TM) way of making sure that all
blocks have been allocated from the backing store of the file.

To: Igor Mozolevsky <igor@...>
Cc: <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 8:45 am

We don't currently have SIGDANGER, but the signal code was rewritten
years ago to allow more than 32 signals precisely for the purpose of
implementing an AIX-like SIGDANGER.  This wasn't done, however, and
eventually SIGTHR was the first new signal to take advantage of the

No.  First of all, you're thinking of lseek(), not fseek().  Second, an
lseek() beyond the end of a file will not actually extend the file.
Third, ftruncate() (which *will* extend a file if it is shorter than the
requested length) or lseek() followed by write() will not allocate
physical disk space except for the data actually written; it will create
a sparse file, which when later written to will become fragmented,
resulting in horrible performance.
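The claim that ftruncate() only creates a sparse file can be checked by comparing a file's apparent size with the space actually allocated to it. This is a minimal sketch, assuming st_blocks counts 512-byte units (true on FreeBSD and Linux; POSIX leaves the unit unspecified); truncate_report() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH, extend it to SIZE bytes with ftruncate() only, and report
 * the apparent size and the disk space actually allocated. */
static int truncate_report(const char *path, off_t size,
    off_t *apparent, long long *allocated)
{
    struct stat sb;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1 || fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    *apparent = sb.st_size;
    *allocated = (long long)sb.st_blocks * 512;   /* assumed 512-byte units */
    return 0;
}
```

On filesystems that support holes, the allocated figure stays far below the apparent size until the file is actually written.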

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Friday, January 4, 2008 - 8:53 am

In message <86myrlahee.fsf@ds4.des.no>, Dag-Erling Smørgrav wr

SIGDANGER is not what we need.

What we need is an intelligent mechanism to tell applications what
the overall situation is, so that jemalloc and aware applications can
tune their usage pattern to the availability of physical and virtual
memory.

Instead of the binary "SIGDANGER" indication we need a more gradual
state, at the very least three states:  "plenty", "getting a bit
tight" and "crunchtime".

Having a signal to indicate changes of the state may make sense,
but in a crunch, you don't want to wake all processes and page them
in, just to tell them that you're short on memory, it would have
to be a signal that doesn't schedule the recipient process until
something else does.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Poul-Henning Kamp <phk@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 9:03 am

This makes memory management in the userland hideously and
unnecessarily complicated. It's simpler to have SIGDANGER (meaning,
free all you can) -> SIGTERM (terminate gracefully) -> SIGKILL (too
late, I'm killing you anyway); and maybe a MIB in sysctl like
...vm.overcommit_action  ='soft' being SIGDANGER->SIGTERM->SIGKILL and
= 'hard' being SIGKILL, so the sysadmin at least has a choice

Igor
To: Igor Mozolevsky <igor@...>
Cc: Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 9:12 am

You don't seem to understand what Poul-Henning was trying to point out,
which is that broadcasting SIGDANGER can make a bad situation much, much
worse by waking up and paging in every single process in the system,
including processes that are blocked and wouldn't otherwise run for
several minutes, hours or even days (getty, inetd, sshd, mountd, even
nfsd / nfsiod in some cases can sleep for days at a time waiting for
I/O)

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Friday, January 4, 2008 - 9:48 am

By making SIG_IGN the default action for SIGDANGER, this problem
would be mostly solved. Only processes that actually care about SIGDANGER
and install a handler for it would require some non-trivial and
resource-hungry operation.
To: Kostik Belousov <kostikbel@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Monday, January 7, 2008 - 5:08 am

In message <20080104134829.GA57756@deviant.kiev.zoral.com.ua>, Kostik Belousov

This is a non-starter, if SIGDANGER is to have any effect, all
processes that use malloc(3) should react to it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Poul-Henning Kamp <phk@...>
Cc: Kostik Belousov <kostikbel@...>, <freebsd-current@...>
Date: Monday, January 7, 2008 - 5:58 am

This depends on what SIGDANGER is supposed to indicate.  IMO, a single
signal is inadequate - you need a "free memory is less than desirable,
please reduce memory use if possible" and one (or maybe several levels
of) "memory is really short, if you're not important, please die".

The former could reasonably default to SIG_IGN - processes that are
in a position to release memory on demand could provide a handler to
do so.  (This could potentially include malloc returning space on
its freelist to the kernel).

The latter should default to "terminate process" and a process that
considers itself "important" enough can trap it.
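The "release memory on demand" side of this proposal is the ordinary flag-setting signal handler pattern. This is a minimal sketch, assuming SIGUSR1 as a stand-in, because no SIGDANGER exists in FreeBSD per this thread; install_pressure_handler() and take_trim_request() are hypothetical names. Unlike the SIG_IGN default proposed above, SIGUSR1's default action terminates the process, so the handler must be installed before any notification arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* SIGUSR1 is only a stand-in for a "please reduce memory use" signal. */
static volatile sig_atomic_t trim_requested;

static void on_pressure(int sig)
{
    (void)sig;
    trim_requested = 1;   /* async-signal-safe: only set a flag */
}

static int install_pressure_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_pressure;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Polled from the main loop.  Notifications arriving while one is pending
 * are coalesced, which is harmless: the caller is about to trim anyway. */
static int take_trim_request(void)
{
    if (!trim_requested)
        return 0;
    trim_requested = 0;
    return 1;
}
```

The actual freeing happens in the main loop, outside the handler, where calls such as free() or munmap() are safe to make.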

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.
To: Peter Jeremy <peterjeremy@...>
Cc: Kostik Belousov <kostikbel@...>, <freebsd-current@...>
Date: Monday, January 7, 2008 - 6:05 am

That's what I have been advocating for the last 10 years...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Poul-Henning Kamp <phk@...>
Cc: Kostik Belousov <kostikbel@...>, Peter Jeremy <peterjeremy@...>, <freebsd-current@...>
Date: Monday, January 7, 2008 - 9:15 am

That makes the userland side unnecessarily overcomplicated. If a
process handles SIGDANGER then let it do so and assume it's important
enough to be left alone, if a process doesn't handle SIGDANGER then
send SIGTERM to them then SIGKILL; but in any case SIGTERM *should*
precede SIGKILL - the processes ought to be allowed to terminate
gracefully.


Igor :-)
To: Igor Mozolevsky <igor@...>
Cc: Kostik Belousov <kostikbel@...>, Peter Jeremy <peterjeremy@...>, <freebsd-current@...>
Date: Monday, January 7, 2008 - 9:18 am

In message <a2b6592c0801070515g37735475kc0922af8f93723ca@mail.gmail.com>, "Igor

Yes, but you will not see this complication, it will be hidden
in the implementation of malloc(3).

Every problem has a simple, easy to understand solution that does
not work.  SIGDANGER is one of these.  It didn't work any good on
AIX and it won't do so on FreeBSD either.

The problem simply requires more than one bit of feedback information
to get a sensible regulation.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Poul-Henning Kamp <phk@...>
Cc: Kostik Belousov <kostikbel@...>, Peter Jeremy <peterjeremy@...>, <freebsd-current@...>, Igor Mozolevsky <igor@...>
Date: Monday, January 7, 2008 - 7:19 pm

On Mon, 07 Jan 2008 13:18:47 +0000

How could you hide it inside malloc?  Would malloc start
returning 0 after receiving the "less mem than desirable"
signal?  Would it ever go back to returning non-zero?

I thought that the idea of things like SIGDANGER was that
applications would be written to have a mode where they could
shut down some aspect of their operation, and free resources.  I
don't see how you can do that, autonomously, from within malloc?

Maybe introduce a special flavour of pointer value, returned by a
special version of malloc for "cache" objects, that the system is
allowed to automatically reclaim?  Then programs would need to be
able to handle SIGSEGV when accessing those...

Cheers,

-- 
Andrew
To: Andrew Reilly <andrew-freebsd@...>
Cc: Kostik Belousov <kostikbel@...>, Peter Jeremy <peterjeremy@...>, Poul-Henning Kamp <phk@...>, <freebsd-current@...>
Date: Monday, January 7, 2008 - 8:06 pm

I'm with Andrew on this one. The only (sensible) way I could see it
being hidden behind malloc() is if malloc() blocks until sufficient
memory becomes available.

I thought the real idea behind SIGDANGER was to tell the kernel "I
kind of know what I'm doing, so if you're gonna kill something don't kill
me" and that was achieved by AIX not SIGKILLing processes that had
sigaction(SIGDANGER) != SIG_IGN.

Igor :-)
To: Igor Mozolevsky <igor@...>
Cc: Kostik Belousov <kostikbel@...>, Andrew Reilly <andrew-freebsd@...>, <freebsd-current@...>, Peter Jeremy <peterjeremy@...>
Date: Monday, January 7, 2008 - 8:17 pm

In message <a2b6592c0801071606g4c0dcb9ap117e345fda5e7e5f@mail.gmail.com>, "Igor

You should read some recent literature on malloc(3), my own and
Jason's papers are good places to start.

For performance reasons, malloc(3) will hold on to a number of pages
that theoretically could be given back to the kernel, simply because
it expects to need them shortly.

Such parameters and many others of the malloc implementation can
be tweaked to "waste" more or less memory, in response to a sensibly
granular indication from the kernel about how bad things are.

Also, many subsystems in the kernel could adjust their memory use
in response to a "memory pressure" indication, if memory is tight,
we could cache vnodes and inodes less aggressively, if things are
going truly bad, we can even ditch all non-active entries from
these caches.

If one implements this with three states:

Green - "all clear"

Yellow - "tight" - free one before you allocate one if you can.

Red - "all out" - free all that you sensibly can.

And implemented strategies like I propose above (and have proposed
for the last 10 years), then it is very unlikely that the system
would ever get into the red state, because the yellow state will
mitigate and reduce the memory pressure.
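How an allocator might act on such a three-state indication can be sketched as a policy for how many unused pages it keeps cached. This is a minimal sketch under stated assumptions: the thread says no such kernel indication exists yet, so the state is derived here from a free-memory fraction with made-up cutoffs (20% and 5%), and classify(), cache_target() and pages_to_release() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical three-state indication with made-up cutoffs. */
enum mem_state { MEM_GREEN, MEM_YELLOW, MEM_RED };

static enum mem_state classify(double free_fraction)
{
    if (free_fraction >= 0.20)
        return MEM_GREEN;    /* "all clear" */
    if (free_fraction >= 0.05)
        return MEM_YELLOW;   /* "tight" */
    return MEM_RED;          /* "all out" */
}

/* How many unused pages the allocator keeps cached in each state. */
static size_t cache_target(enum mem_state s, size_t normal)
{
    switch (s) {
    case MEM_GREEN:
        return normal;       /* keep pages we expect to need shortly */
    case MEM_YELLOW:
        return normal / 4;   /* trim the cache, hand the rest back */
    case MEM_RED:
    default:
        return 0;            /* free all that we sensibly can */
    }
}

/* Pages to return to the kernel now, given how many are cached. */
static size_t pages_to_release(enum mem_state s, size_t cached, size_t normal)
{
    size_t target = cache_target(s, normal);

    return cached > target ? cached - target : 0;
}
```

Because the response is graduated, the yellow state already shrinks the cache, which is what would keep the system from reaching red in the first place.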

Nothing prevents an intelligent process from listening in and
doing sensible things, firefox could ditch the memory cache of
pages for instance.

But we can't get anywhere until some VM wizard produces the
three "lamps" for us to look at in the first place, that's where
we have been stuck for the last 10 years.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: <freebsd-current@...>
Cc: Kostik Belousov <kostikbel@...>, Andrew Reilly <andrew-freebsd@...>, Poul-Henning Kamp <phk@...>, Igor Mozolevsky <igor@...>, Peter Jeremy <peterjeremy@...>
Date: Monday, January 7, 2008 - 9:37 pm

Although the primary concern is malloc(), I would like to point out that
various programs implementing copying garbage collection could more
efficiently give memory back to the system than malloc(), and could therefore
benefit more than malloc() from some kind of feedback from the kernel.

There was concern over the complexity involved with intelligently doing
something about the memory pressure hints in userspace, but this does not
apply here since the allocator/garbage collection would be the equivalent of
malloc() and complexity there would not affect application code.

The problem with malloc() is that, unless I am missing something, malloc
will never be able to give back memory to the kernel except insofar as the
memory mapped is continuously unused between some location and the break (in
the case of sbrk()) or over the entire range (mmap()). malloc() cannot force
this to be the case, since pointers must remain valid. The possibility of
reclamation is then often going to be limited to completely unused space
being held by malloc() for future use, rather than also applying to areas
already used for allocation.

Programs implementing copying GC, or otherwise able to move allocated memory
around, could compact the heap and give back left-over memory. In some cases
this would only entail a temporary improvement due to defragmentation, but in
others (such as a long-running program spiking in memory use, only then to
drop a lot of that memory) it could have a pretty massive effect on memory
use.
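Why a moving collector can give memory back where malloc() cannot is easiest to see with a compaction step. This is a minimal sketch of sliding compaction, a simpler relative of the copying collection mentioned above, over a hypothetical heap of fixed-size slots; compact_slots() and SLOT_SIZE are hypothetical names, and a real collector would also use the forwarding table to rewrite every pointer into the heap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy heap made of fixed-size slots. */
#define SLOT_SIZE 64

/* Slide every live slot toward the start of the heap, filling the holes
 * left by dead ones.  forward[i] records where slot i now lives, or
 * (size_t)-1 if it was dead, so references can be rewritten.  Returns the
 * number of live slots; everything past that point is free and contiguous,
 * so its whole pages could be handed back to the kernel. */
static size_t compact_slots(unsigned char *heap, const int *live,
    size_t nslots, size_t *forward)
{
    size_t dst = 0;

    for (size_t src = 0; src < nslots; src++) {
        if (!live[src]) {
            forward[src] = (size_t)-1;
            continue;
        }
        if (dst != src)   /* dst < src, so the two slots never overlap */
            memcpy(heap + dst * SLOT_SIZE, heap + src * SLOT_SIZE, SLOT_SIZE);
        forward[src] = dst++;
    }
    return dst;
}
```

A plain malloc() has no forwarding table and cannot find the application's pointers, which is exactly why it cannot perform this step.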

Where a malloc()-using program might be unable to sbrk() or munmap() because
there happens to be some left-over non-free piece of memory at the top of the
mapped range, a GC could use indications from the system to ensure this is
not the case (depending on details of the implementation; for example,
compaction of tenured generations could be forced early, etc).

(This i...
To: Peter Schuller <peter.schuller@...>
Cc: Andrew Reilly <andrew-freebsd@...>, Peter Jeremy <peterjeremy@...>, Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Igor Mozolevsky <igor@...>, Kostik Belousov <kostikbel@...>
Date: Tuesday, January 8, 2008 - 2:36 pm

Actually, malloc(3) can use madvise(2) to notify the kernel that
arbitrary pages in the arena are unused and can be discarded.  The
current implementation will do so if the H option is specified.
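A free run inside an arena can be reported with madvise(2), but only the whole pages it contains can be advised, which is the limitation raised in the reply below. This is a minimal sketch, assuming MADV_FREE where the headers define it and MADV_DONTNEED otherwise; release_unused() is a hypothetical name and this is not jemalloc's code.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Tell the kernel that the whole pages inside [addr, addr + len) hold no
 * live data.  Partial pages at either end must be kept, so a free run
 * smaller than a page releases nothing.  Returns bytes advised, or -1. */
static long release_unused(void *addr, size_t len)
{
    uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = ((uintptr_t)addr + pg - 1) & ~(pg - 1);   /* round up */
    uintptr_t end = ((uintptr_t)addr + len) & ~(pg - 1);        /* round down */

    if (end <= start)
        return 0;
#ifdef MADV_FREE
    int advice = MADV_FREE;       /* kernel may reclaim the pages lazily */
#else
    int advice = MADV_DONTNEED;
#endif
    if (madvise((void *)start, end - start, advice) != 0)
        return -1;
    return (long)(end - start);
}
```

After the call the pages stay mapped, but their contents may be discarded, so the allocator must treat them as uninitialized when it reuses them.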

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: <freebsd-current@...>
Cc: Andrew Reilly <andrew-freebsd@...>, Peter Jeremy <peterjeremy@...>, Poul-Henning Kamp <phk@...>, Igor Mozolevsky <igor@...>, Kostik Belousov <kostikbel@...>, Dag-Erling <des@...>
Date: Wednesday, January 9, 2008 - 2:22 pm

Ah, interesting. I was not aware of that.

However, in this context it will likely only help partially since you still
need a full page to be free (and with a lot of programs many allocations will
be significantly smaller than that, and I have to assume no real-life malloc
will align all allocations to pages, or the overhead would be extreme).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
To: Peter Schuller <peter.schuller@...>
Cc: Andrew Reilly <andrew-freebsd@...>, Peter Jeremy <peterjeremy@...>, Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Igor Mozolevsky <igor@...>, Kostik Belousov <kostikbel@...>
Date: Thursday, January 10, 2008 - 6:04 am

Page-aligning every allocation would be supremely stupid, and jemalloc
does so only for allocations larger than a page.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Peter Schuller <peter.schuller@...>
Cc: Andrew Reilly <andrew-freebsd@...>, Peter Jeremy <peterjeremy@...>, Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Igor Mozolevsky <igor@...>, Kostik Belousov <kostikbel@...>
Date: Thursday, January 10, 2008 - 10:31 am

I misread your "no" as "any", so it seems we are in violent agreement.

However, most allocators these days are zone or slab allocators (or
similar in principle), and are pretty good at minimizing external
fragmentation except for pathological cases, which are surprisingly rare
in practice.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Poul-Henning Kamp <phk@...>
Cc: Kostik Belousov <kostikbel@...>, Andrew Reilly <andrew-freebsd@...>, <freebsd-current@...>, Peter Jeremy <peterjeremy@...>
Date: Monday, January 7, 2008 - 8:57 pm

Can you provide some refs/links, unfortunately googling for

I don't think it's the kernel that is being ill-mannered (unless, of
course, it's running ZFS ;-)) by eating up the memory, it's the user

How do you propose they 'eavesdrop' on the kernel? Bearing in mind that
most apps nowadays are written for Linux and are hacked to be portable
afterwards (just look at the number of patches in the ports tree),
it's much simpler to write a signal handler than FreeBSD-kernel

I think the problem is not in providing the lamps to indicate the
state, but figuring out an algorithm for judging green-&gt;yellow and
yellow-&gt;green transitions...

Igor
To: Igor Mozolevsky <igor@...>
Cc: Kostik Belousov <kostikbel@...>, Andrew Reilly <andrew-freebsd@...>, <freebsd-current@...>, Peter Jeremy <peterjeremy@...>
Date: Tuesday, January 8, 2008 - 4:31 am

In message <a2b6592c0801071657s43fcc739jac09baedef7b7532@mail.gmail.com>, "Igor

http://phk.freebsd.dk/pubs

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
To: Igor Mozolevsky <igor@...>
Cc: <freebsd-current@...>
Date: Monday, January 7, 2008 - 10:34 pm

On Tue, 8 Jan 2008 00:57:21 +0000
"Igor Mozolevsky" <igor@hybrid-lab.co.uk> wrote:


Try PHK+malloc or just phkmalloc for better results. Looking for
misspelled acronyms can be a frustrating and futile undertaking
indeed :)

--
Alexander Kabaev
To: Poul-Henning Kamp <phk@...>
Cc: Kostik Belousov <kostikbel@...>, Peter Jeremy <peterjeremy@...>, <freebsd-current@...>, Igor Mozolevsky <igor@...>
Date: Monday, January 7, 2008 - 8:28 pm

On Tue, 08 Jan 2008 00:17:04 +0000

Aah, OK, so there's some essentially system-level caching going
on behind the scenes, and that's readily malleable for this sort
of thing.  I thought that you were proposing some way to
propagate the "yellow" or "red" conditions to user-program
activity through malloc, which seems hard, since the only
official out-of-band signal there is a zero return.

I'll have to track down your papers, though, because I thought
that the whole problem revolved around the fact that malloc(3)
doesn't hand out physical pages at all: that was left up to the
kernel vm pager to do as needed.  Is it zeroed (and therefore

I agree.  That sort of auto-tuning of the space/speed trade-off

I imagine that even if the accounting can be managed efficiently,
the thresholds themselves would be fairly tricky to specify...

Cheers,

-- 
Andrew
To: Dag-Erling Smørgrav <des@...>
Cc: Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Friday, January 4, 2008 - 9:24 am

Another aspect of the problem is that applications have come to depend on
malloc(3) returning NULL when memory is getting tight, and while we have never
done exactly that, we have historically had malloc(3) return NULL when we get
close to the process data segment size.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Robert Watson <rwatson@...>
Cc: Poul-Henning Kamp <phk@...>, <freebsd-current@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Saturday, January 5, 2008 - 9:50 am

I don't do that any more.  Unless the program I'm writing is intended to
run for a long time and can gracefully handle an out-of-memory situation
(such as denying client requests until the situation improves), I write
malloc() wrappers which zero the allocated region before returning to
the caller, to force a SIGSEGV and spare the caller from having to check
the return value.

I sometimes also allocate a little bit extra and stick a magic signature
and an allocation length in there so my free() wrapper can check for
bugs and zero the allocated memory before freeing it.  I wouldn't need
any of this if my code only ran on FreeBSD, but most of my $DAYTIME_JOB
code these days runs on Linux first and FreeBSD second.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Igor Mozolevsky <igor@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:31 am

Do everyone a favour and research the topic in the archives, please. 
Another thread on the subject will just waste everyone's time.

Kris

To: Igor Mozolevsky <igor@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:22 am

That will create a sparse file without file system blocks to back it, and is 
effectively also over-commit.  When the file system runs out of room, you will 
get SIGSEGV when the vnode pager discovers it can't write a page to disk.  If 
you zero-fill it, the blocks are pre-allocated.  In a more ideal world, we 
might support an ioctl or system call to pre-allocate but not hook up the 
blocks until they were written to, in order to avoid writing lots of zeros to 
disk, but we don't live in that ideal world yet.

Allowing malloc to support alternative sources of pages for memory mapping, 
such as specific files, would be very neat indeed.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Robert Watson <rwatson@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:30 am

Surely you should not be allowed to overcommit on fseek() followed by
write(,,1); zeroing out gigs of hdd space seems rather silly...

Igor
To: Igor Mozolevsky <igor@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Friday, January 4, 2008 - 7:38 am

Sparse files are a feature.  It just becomes inconvenient at that point 
because you discover the lack of space asynchronously from a useful user 
process event.  When memory pressure gets high, the vnode pager decides it's 
time to push a dirty page to disk, and then discovers that there are no free 
blocks on the file system to write to.  As I mentioned in my e-mail, it would 
be nice if our file system supported a way to reserve blocks for files without 
hooking them up to the file's visible address space (in order to avoid
zeroing them, which is required if you do want to hook them up for an 
unprivileged process).  However, that feature doesn't currently exist.

Many systems with sensitivity to on-demand allocation costs and without 
security requirements allow files to be extended without zeroing.  On systems 
with security requirements, this becomes a privileged operation (such as on 
Mac OS X) because exposing unzeroed pages from other files or processes not 
explicitly shared is Not Allowed.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Robert Watson <rwatson@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Igor Mozolevsky <igor@...>
Date: Friday, January 4, 2008 - 8:48 am

Even for files which are intended to be filled up immediately, telling
the file system ahead of time how much data will be written would allow
it to make much better layout decisions.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Robert Watson <rwatson@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 5:32 am

Not a good solution on its own.  You need a per-process limit as well,
otherwise a malloc() bomb will still cause other processes to fail

Thank you :)

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 7:06 am


That was what I had in mind, the above should read RLIMIT_SWAP.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Robert Watson <rwatson@...>
Cc: Dag-Erling <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 9:54 am

Robert Watson wrote:
> On Fri, 4 Jan 2008, Dag-Erling Sm
To: Skip Ford <skip@...>
Cc: Dag-Erling <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 9:59 am

Oh, I thought that I was the sole user of the patch. What problems did you
encounter while testing it?

What do you mean by "do 90% of swap"?
To: Kostik Belousov <kostikbel@...>
Cc: Dag-Erling <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 10:11 am

> > > On Fri, 4 Jan 2008, Dag-Erling Sm
To: Skip Ford <skip@...>
Cc: Dag-Erling <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 10:18 am

OK. The patch really imposes two kinds of limits:
- the total amount of anon memory that can be allocated in the whole
  system (this is what I called "disabling overcommit")
- a per-user RLIMIT_SWAP limit, which accounts allocations by uid. This
  has some obvious problems with the setuid(2) syscall. AFAIR, I ended up
  not moving the accounted amounts to the new uid.

Both limits can be turned on/off independently.

Maybe it is time to revive it.
To: Kostik Belousov <kostikbel@...>
Cc: Dag-Erling <des@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 10:58 am

> > > > > On Fri, 4 Jan 2008, Dag-Erling Sm
To: Skip Ford <skip@...>
Cc: Kostik Belousov <kostikbel@...>, <freebsd-current@...>, Robert Watson <rwatson@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Saturday, January 5, 2008 - 10:01 am

Implementing a per-process limit would help fix the setuid() problem,
since the usage of the process calling setuid() would be known and could
be transferred to the new user.  There could however be a problem when a
process creates a MAP_SHARED | MAP_ANON mapping, then fork()s, and the
child calls setuid() (think privilege separation).  Hopefully, this case
is rare enough (malloc() always uses MAP_PRIVATE) that it can be handled
using the most restrictive interpretation possible rather than trying to
be painstakingly precise.

(BTW, Skip, I find your MUA's use of Mail-Followup-To: offensive; if you
don't want a copy of the followup, set the followup address to the list,
not to a random previous participant in the thread)

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Robert Watson <rwatson@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 8:34 am

You don't want the default to be so high.  You want a low default, with
the possibility for the admin to increase the limit for a particular
user in login.conf or similar without rebooting (which is currently not
possible since the default datasize == maxdsiz, which can only be
changed in the kernel config or loader.conf).

You may also want to have a collective limit for unprivileged users, so
root will still be able to log in if something goes wrong.

DES
-- 
Dag-Erling Smørgrav - des@des.no
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 9:26 am

This will presumably only work for console logins, as sshd (etc) will depend
on unprivileged users, but perhaps that is fine.  I'm less concerned with the
details of the implementation or policy than that we simply be able to support
even a basic policy and have it configured by default to prevent
foot-shooting.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: Robert Watson <rwatson@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>, Poul-Henning Kamp <phk@...>
Date: Friday, January 4, 2008 - 2:27 am

I'm not sure that I like that very much, at least the way that
it has been explained here, so correct me if I misunderstood.

I have long lived processes that continuously handle very valuable
data and potentially get very large (several GB).  I'd like that
process to be able to make a rational decision about what happens to its
memory contents when an allocation fails rather than having the
proverbial rug pulled out from under it.  Rug pulling at any point 
can cost an annual salary or two.

Ian

--
Ian Freislich

To: Ian FREISLICH <ianf@...>
Cc: <freebsd-current@...>
Date: Friday, January 4, 2008 - 5:51 am

[Empty message]
To: Peter Jeremy <peterjeremy@...>
Cc: Ian FREISLICH <ianf@...>, <freebsd-current@...>
Date: Friday, January 4, 2008 - 8:47 am

I need to make a slight correction there:

some time ago, the patch at
http://people.freebsd.org/~kib/overcommit/index.html
worked, at least I believe so. I implemented an overcommit turn-off knob
and did exact anonymous memory accounting. Quite possibly the code has
rotted since then.
To: Dag-Erling Smørgrav <des@...>
Cc: <freebsd-current@...>, Jason Evans <jasone@...>
Date: Thursday, January 3, 2008 - 6:23 pm

That is a pretty damning argument in my mind.  Why make such a major 
change right before the release when it's effectively useless?

Scott
To: <freebsd-current@...>
Cc: Dag-Erling <des@...>, Jason Evans <jasone@...>
Date: Thursday, January 3, 2008 - 6:46 pm

The motivation for the change is to preserve POLA, as malloc() did honor
RLIMIT_DATA in previous releases (4.x, 6.x, etc.).  That said, I think
RLIMIT_VMEM is probably more useful going forward.  I know at work we have
lots of hacks to deal with maxdsiz and to let apps that use large malloc()
and large mmap() cooperate.  Having one resource limit for malloc() + mmap()
is probably best for the future.

-- 
John Baldwin
To: John Baldwin <jhb@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Thursday, January 3, 2008 - 7:08 pm

If it were happening on a stable branch, I'd agree more with the POLA
argument. The tradeoff between last-minute destabilization, which is exactly
what happened here, and the highly imperfect and antiquated justification,
is pretty bogus.

Scott

To: Scott Long <scottl@...>
Cc: Dag-Erling Smørgrav <des@...>, <freebsd-current@...>, Jason Evans <jasone@...>
Date: Thursday, January 3, 2008 - 8:31 pm

The reason I'm more of a fan of introducing RLIMIT_SWAP is that I'd like to be
able to specifically avoid swap exhaustion by a process without preventing it 

When Alan proposed this as the approach, it was presumably under the 
assumption that it would be non-disruptive.  As it has proven highly 
disruptive, it's obviously not getting MFC'd for the release.  Instead we'll 
have to work on a solution for after .0, but make sure to document that the 
default swap resource limits effectively enforced in all prior FreeBSD 
releases are *not* enforced on 7.0, and that administrators wanting to prevent 
users from exhausting swap accidentally with something like the following:

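#include <err.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	char *c;

	/* Allocate and touch one page at a time until malloc() fails. */
	while (1) {
		c = malloc(getpagesize());
		if (c == NULL)
			err(1, "malloc");
		*c = 'a';	/* dirty the page so it must be backed */
	}
}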

will need to now manually set the virtual memory limit in login.conf.  Note 
that the above strongly resembles frequently run CGI scripts written by many 
naive CGI script authors, so is something that we'd like to be robust against 
in the same way we prefer to be robust against:

#include <unistd.h>

int
main(int argc, char *argv[])
{

	/* Classic fork bomb: every process keeps forking forever. */
	while (1) {
		fork();
	}
}

Smacking the user is obviously a good idea, but taking down the multi-user web 
server is not.

Robert N M Watson
Computer Laboratory
University of Cambridge
To: <freebsd-current@...>
Cc: Dag-Erling <des@...>, Jason Evans <jasone@...>
Date: Thursday, January 3, 2008 - 5:00 pm

Also, may I humbly inject a user-centric view here: it is pretty annoying to
be limited to 500 MB of mallocable memory on 32-bit machines when you expect
3 GB to be usable (with 1 GB mapped to the kernel).

I scratched my head for a long time as to why I was getting out-of-memory
errors in spite of carefully setting resource limits and ensuring virtual
memory was available; at some later point I discovered the hard-coded
distinction between sbrk():able and mmap():able memory. I am not sure where I
was supposed to find this in the documentation (I found it by chance while
Googling).

If sbrk() is indeed to be used by the default malloc, one definite
user-visible annoyance will be the 500 MB limit. With mmap() that will be
2.5 GB, unless I am mistaken, which is much closer to what one might expect
and thus less likely to cause problems in the common case.

Changing maxdsiz to be > 500 MB is probably bad too, from a user-centric
view, since you don't want to cause the equivalent problems for programs that
do not use malloc(), but are indeed coded with "modern virtual memory" (as
the man page calls it) in mind. Better to leave this problem to those
programs that use sbrk() directly.

Another consequence is that if the sysadmin really wants a maximum amount of
mmap():able memory