OpenBSD: Improved Memory Allocation, Beta Testing 3.8

Submitted by Jeremy
on August 23, 2005 - 6:32am

In a recent email, OpenBSD creator Theo de Raadt described a number of modifications to how OpenBSD allocates memory. In preparation for the upcoming 3.8 release, Theo asked for people to beta test -current, as the recent modifications will likely cause instabilities in many applications. One of the modifications was to make the mmap system call return a random memory address, as well as ensuring "that two objects are not mapped next to each other; in effect, this creates unallocated memory which we call a 'guard page'." Another was to update the malloc function to use mmap to obtain memory. Finally, the free function was updated to immediately return memory to the kernel and un-allocate it from the calling process. Additional changes were also made, but unlike these three the additional changes are not enabled by default as they are "too dangerous for normal software or cause too much of a slowdown".

Theo points out that these changes have a couple of significant impacts. He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer. With these recent memory allocation changes, such an attempt will cause the application to coredump with a SIGSEGV signal. Additionally, now that memory is unmapped as soon as it is freed, any attempt to access freed memory will also cause the application to coredump with a SIGSEGV signal. He explained, "we expect that our malloc will find more bugs in software, and this might hurt our user community in the short term. We know that what this new malloc is doing is perfectly legal, but that realistically some open source software is of such low quality that it is just not ready for these things to happen." Hence the request for beta testers to help track down these misbehaving applications. Theo concluded, "instead of saying that OpenBSD is busted in this regard, please realize that the software which is crashing is showing how shoddily it was written. Then help us fix it. For everyone.. not just OpenBSD users."


From: Theo de Raadt [email blocked]
To:  misc
Subject: 3.8 beta requests
Date: Mon, 22 Aug 2005 17:33:40 -0600

We are heading towards making the real 3.8 release soonish.  I would
like to ask the community to do lots of testing over the next week if
they can.

This release will bring a lot of new ideas from us.  One of them in
particular is somewhat risky.  I think it is time to talk about that
one, and let people know what is ahead on our road.

Traditionally, Unix malloc(3) has always just "extended the brk",
which means extending the traditional Unix process data segment to
allocate more memory.  malloc(3) would simply extend the data segment,
and then calve off little pieces to requesting callers as needed.  It
also remembered which pieces were which, so that free(3) could do its
job.

The way this was always done in Unix has had a number of consequences,
some of which we wanted to get rid of.  In particular, malloc & free
have not been able to provide strong protection against overflows or
other corruption.

Our malloc implementation is a lot more resistant (than Linux) to
"heap overflows in the malloc arena", but we wanted to improve things
even more.

Starting a few months ago, the following changes were made:

- We made the mmap(2) system call return random memory addresses.  As well
  the kernel ensures that two objects are not mapped next to each other;
  in effect, this creates unallocated memory which we call a "guard page".

- We have changed malloc(3) to use mmap(2) instead of extending the data
  segment via brk()

- We also changed free(3) to return memory to the kernel, un-allocating
  them out of the process.

- As before, objects smaller than a page are allocated within shared
  pages that malloc(3) maintains.  But their allocation is now somewhat
  randomized as well.

- A number of other similar changes which are too dangerous for normal
  software or cause too much of a slowdown are available as malloc options
  as described in the manual page.  These are very powerful for debugging
  buggy applications.

Other results:

- When you free an object that is >= 1 page in size, it is actually
  returned to the system.  Attempting to read or write to it after
  you free is no longer acceptable.  That memory is unmapped.  You get
  a SIGSEGV.

- For a decade and a bit, we have been fixing software for buffer overflows.
  Now we are finding a lot of software that reads before the start of the
  buffer, or reads too far off the end of the buffer.  You get a SIGSEGV.

To some of you, this will sound like what the Electric Fence toolkit
used to be for.  But these features are enabled by default.  Electric
Fence was also very slow.  It took nearly 3 years to write these
OpenBSD changes since performance was a serious consideration.  (Early
versions caused a nearly 50% slowdown).

Our changes have tremendous benefits, but until some bugs in external
packages are found and fixed, there are some risks as well.  Some
software making incorrect assumptions will be running into these new
security technologies.

I discussed this in talks I have given before: I said that we were
afraid to go ahead with guard pages, because a lot of software is just
written to such low standards.  Applications over-read memory all the
time, go 1 byte too far, read 1 byte too early, access memory after free,
etc etc etc.

Oh well -- we've decided that we will try to ship with this protection
mechanism in any case, and try to solve the problems as we run into
them.

Two examples:

Over the last two months, some OpenBSD users noticed that the X server
was crashing occasionally.  Two bugs have been diagnosed and fixed by
us.  One was a use-after-free bug in the X shared library linker.  The
other was a buffer-over-read bug deep down in the very lowest level
fb* pixmap compositing routines.  The latter bug in particular was
very difficult to diagnose and fix, and is about 10 years old.  We
have found other bugs like this in other external software, and even a
few in the base OpenBSD tree (though those were found a while back,
even as we started experimenting with the new malloc code).

I would bet money that the X fb* bug has crashed Linux (and other) X
servers before.  It is just that it was very rare, and no one ever
chased it.  The new malloc we have just makes code get lucky less
often, which lets us get to the source of a bug easier.  As a
programmer, I appreciate anything which makes bugs easier to
reproduce.

We expect that our malloc will find more bugs in software, and this
might hurt our user community in the short term.  We know that what
this new malloc is doing is perfectly legal, but that realistically
some open source software is of such low quality that it is just not
ready for these things to happen.

We ask our users to help us uncover and fix more of these bugs in
applications.  Some will even be exploitable.  Instead of saying that
OpenBSD is busted in this regard, please realize that the software
which is crashing is showing how shoddily it was written.  Then help
us fix it.  For everyone.. not just OpenBSD users.


Performance?

runtime (not verified)
on
August 23, 2005 - 12:57am

These changes are pretty exciting. OpenBSD has been making some daring changes in recent releases. Plus the whole open source community will benefit from bugs found and fixed on OpenBSD. I've used Electric Fence on Linux (and "PageHeap" on Windows), but it's VERY slow. I wonder how OpenBSD worked around these performance problems?

How long until similar features are added to the Linux kernel? One week? <:-)

re: Performance

on
August 23, 2005 - 5:45am

The effect on memory usage could be interesting. On the one
hand it looks like using at least a full page for every larger request
would use more memory, but on the other hand, since free() can then
immediately give the memory back, the kernel can reassign it to some
other process and ease memory pressure. It would be interesting to see
before/after measurements of memory usage.

Most malloc implementations (

mje (not verified)
on
August 23, 2005 - 7:34am

Most malloc implementations (GNU malloc included IIRC) will allocate whole pages for larger blocks anyway. Helps avoid heap fragmentation and cuts down on overhead.

Performance as a lower priority

Quality_First (not verified)
on
August 25, 2005 - 10:53am

I must not prematurely optimize. Premature optimization is the mind-killer. Premature optimization is the little-death that brings total obliteration. I will face my desire to prematurely optimize. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the desire has gone there will be nothing. Only I will remain.

one week? probably not at al

alan (not verified)
on
August 23, 2005 - 9:54am

one week? probably not at all. Linux has a much larger userbase, and responsible maintainers. They are much less likely to make sweeping changes to a core function such as memory allocation without studying the impact. I'd expect this sort of thing to be in a major release where application developers have some sort of warning. At the very least this behavior should be configurable in some way. Swapping out the expected behavior of the memory allocator (no matter how technically broken it is) and arrogantly pointing the finger back at your users claiming "your stuffs broken, haha!" is more of an OpenBSD thing to do.

Recall the vm changes in the early 2.4.x series and the move from devfs to udev.

The revolution is conservatively engineered.

introducting the changes

on
August 23, 2005 - 10:50am

Swapping out the expected behavior of the memory allocator (no matter how technically broken it is)

I wouldn't call it expected behaviour. By the C standard specs,
referring to freed memory or outside allocated areas results in
undefined behaviour. Everyone who cares even a bit about quality
writes his/her programs to avoid such things, and uses memory checkers like
ElectricFence and Valgrind as part of testing. I find it a major step
forward in reliability if at least some of the checking these tools
do really can be integrated into default runtime behaviour of malloc
without too much overhead. I agree not all Linux will change
overnight, but hopefully at least some distros start introducing
similar features soon.

_MALLOC_CHECK ...

lu_zero (not verified)
on
August 23, 2005 - 11:42am

glibc already provides a way to check memory allocation.

_MALLOC_CHECK

on
August 23, 2005 - 10:00pm

doesn't seem like too many people are using it.

malloccheck on by default on

Unanymous (not verified)
on
August 26, 2005 - 9:09am

malloccheck on by default on my distro

Not the same at all.

Anonymous Coward (not verified)
on
August 26, 2005 - 10:30pm

Only if the MALLOC_CHECK_ environment variable is set to a value above zero. On OpenBSD-current it is the default behavior. Also, OpenBSD-current will cause a segfault the second you access the subsequent page. Glibc's malloc() cannot do this. Glibc's malloc just does sanity checks at malloc/free calls, not at memory access.

before you blow more smoke ou

ac (not verified)
on
August 27, 2005 - 6:47pm

before you blow more smoke out of your... check out glibc's MMAP_THRESHOLD and related things and tell us again how glibc is unable to use mmap backed memory for malloc.

> They are much less likely t

Anonymous2 (not verified)
on
August 23, 2005 - 2:00pm

> They are much less likely to make sweeping changes to a core function such as memory allocation without studying the impact.

> Recall the vm changes in the early 2.4.x series

Are you trolling or serious?!

Re: one week? probably not at al

Ray Lillard (not verified)
on
August 23, 2005 - 2:27pm

All the configuration option gets you is procrastination (and 10 year old bugs). Buggy code is also insecure code. Fix your damn code and quit whining. Oh, and I hope you didn't mean to identify the VM debacle of 2.4 Linux as an example of conservative engineering practice.

VM debacle?

anon y mous (not verified)
on
August 24, 2005 - 7:51am

The VM "debacle" of 2.4?

The VM was shown to run badly on Intel PAE systems above about 16GB in some situations and thus some parts of the memory manager were reworked. This was followed by fairly extensive testing and most people with those systems found it to be a great improvement.

The operating systems of worship for the hordes of people who attacked Linux for this probably could never handle more than 4GB of memory in the same situations, even if they had such users in the first place.

You messing up. Discussion

Anonymous/00003 (not verified)
on
August 25, 2005 - 5:36am

You're messing up.

Discussions about such changes have floated around on LKML for ages.

Most of the time the VM hackers gave a blanket response: "UNIX is limited by brk() and we won't change that. Period."

What the article says, in fact, is that OpenBSD finally got rid of brk(). Memory management becomes more dynamic and more checks can be done. In Linux's libc, free() is more or less a dummy: there is no way to return memory to the OS once it has been allocated. See man brk for more info.

Linux VM is total crap just because the kernel can never know (only make wild guesses) what an app does with a particular region of memory. And as long as the kernel interface to memory allocation is brk() (which amounts to the absence of any interface), you can forget about any improvements.

I'm glad that OpenBSD can take such a huge step in the right direction. I hope Theo will succeed and Linux will be forced to throw away brk(). Finally.

>What article says in fact th

Anonymoos (not verified)
on
August 27, 2005 - 8:29pm

>What the article says, in fact, is that OpenBSD finally got rid of brk().
>Memory management becomes more dynamic and more checks can be done. In
>Linux's libc, free() is more or less a dummy: there is no way to return
>memory to the OS once it has been allocated. See man brk for more info.

I think you should "man brk". It appears that you have no idea what you're talking about, however I'm sure you must know enough to say "Linux VM is total crap".

Fixing the problem

dm (not verified)
on
August 25, 2005 - 6:15pm

pointing the finger back at your users claiming "your stuffs broken, haha!" is more of an OpenBSD thing to do.

Actually no. OpenBSD developers are fixing the problems as they find them. E.g. here and here.

They are much less likely to

DM (not verified)
on
August 26, 2005 - 6:29pm

They are much less likely to make sweeping changes to a core function

You mean like changing the VM system in the middle of a "stable" kernel series? :)

Um.

Anonymous Coward (not verified)
on
August 26, 2005 - 10:54pm

Whether OpenBSD is arrogant or not, they are in fact bugs, and thus, incorrect programs. Let us not forget that.

What OpenBSD is doing is great. They're using the CPU's page mapping feature to make Electric Fence type stuff the default behavior, only much faster. This will make it easier to debug on OpenBSD, as problems will manifest themselves sooner rather than later. This will lead to many bugs being discovered, which in turn should lead to less buggy Linux programs as well, as patches get merged upstream.

pointing the finger: will probably work well IMHO

on
October 3, 2005 - 7:47pm

pointing the finger back at your users claiming "your stuffs broken, haha!" is more of an OpenBSD thing to do

It does seem likely to be effective, however, and not particularly unreasonable to me. Theo is clearly asking developers to test rather than naive end users.

There is good logic in software faulting on a runtime error rather than silently failing.

(Disclaimer: I run Gentoo Linux, not OpenBSD.)

There is good logic in softwa

Anonymous (not verified)
on
October 13, 2005 - 9:26pm

There is good logic in software faulting on a runtime error rather than silently failing.
Exactly, and that's Unix behaviour as well.

If you need to fail, fail noisily.

I think this is another great thing done the Unix-way.

The malloc()/free() changes a

Anon (not verified)
on
August 23, 2005 - 2:51pm

The malloc()/free() changes are in libc (thus userspace), so they're not the kernel developers' business (the kernel only allocates memory via the brk()/sbrk() interface, iirc). This would most likely be a call for the glibc maintainers.

On the other hand, the mmap() "random address" policy looks like a good idea and doesn't look too difficult/risky to implement in the Linux kernel... It reminds me of the (also OpenBSD, if I'm not mistaken :-)) policy of assigning random port numbers to new tcp/ip sockets, so that potential attackers cannot predict which one is going to be used next.

not just brk/sbrk

on
August 23, 2005 - 5:16pm

It also provides memory via anonymous mmap(), which is how OpenBSD works its magic for large allocations. But you're right, the change in behavior for malloc/free sounds like it's almost entirely in userspace. (The randomization of mmap and addition of guard pages sounds like it may have some kernel support.)

As for mmap randomization, Linux has done something similar recently with stack randomization. And guess what? It breaks GCC and a couple other apps, so I hear. Notably, the precompiled header support in GCC was saving data structures to disk with raw pointers, even though the format apparently allowed for an offset-based approach. I think it's fixed now.

For what it's worth, a projec

Chris B. (not verified)
on
October 4, 2005 - 3:38pm

For what it's worth, a project I've been working on for some time provides a subset of these features for the Linux 2.4 series kernel. Specifically, mmap allocations are randomly placed throughout the user address space.

See http://chris.kavefish.net/projects.php for more info.

For the brave, try my patch to the 2.4.31 kernel:
http://chris.kavefish.net/files/aslp-1.5-2.4.31-i386-elf.patch

re: The malloc()/free() changes a

ratatask (not verified)
on
August 28, 2005 - 7:26am

>so they're not the kernel developers' business

No, but it should be, to some degree. The kernel provides mmap and brk. That's it. You have no way of telling the kernel to unmap a page or range of pages in a mapping (or to remap it, for that matter). There should be an interface for this. That's what OpenBSD has done. (And Windows has had it for ages, btw.)
It allows you to detect many more invalid accesses, as you now trap if you access such a page, which is what OpenBSD uses this for.
It also allows you to compact the heap much more, as you can unmap pages in the middle of a mapping; otherwise they eventually get swapped out, which means more work. This happens often: there is a lot of "unused" memory on the heap, and being able to unmap it is nice. That's what Windows uses this for :-)

Look at the manpage for munmap(2)

on
August 28, 2005 - 11:04am

The kernel provides mmap, brk, mremap and munmap. munmap can be used to invalidate a mapping or part of a mapping created by mmap, while mremap can be used to remap part or all of an existing mapping. Therefore, it can do everything you've suggested it can't.

Not quite that simple...

Anonymous (not verified)
on
October 3, 2005 - 12:19pm

If libc changes from a brk() based malloc to a malloc where every allocation of one page or more translates into an mmap, then it changes the requirements for the kernel. A process that previously used a few to maybe a few hundred mappings might now use thousands of mappings. The kernel can be expected to handle it, but probably not to be optimized for it. The basic support for an mmap based malloc is a user space thing, but without kernel support the performance penalty might be unacceptable.

Question ?

BrainBug (not verified)
on
August 23, 2005 - 4:05am

So does this mean that we will need better hardware ? What about hardware req. with these new security extras ?

Nope.

sjmorgan (not verified)
on
August 23, 2005 - 6:54am

Nope. They've simply (if you can call it that) made some very cool modifications to the standard library.

...

sjmorgan (not verified)
on
August 23, 2005 - 6:57am

and/or kernel.

Patent that sucka!

Crash Cash (not verified)
on
August 23, 2005 - 11:36am

This should be patented, so Microsoft can't use it or patent it themselves.

no

on
August 23, 2005 - 1:33pm

No, it shouldn't. It can't be patented right now, because this is clearly prior art. Any patent filed after today will be easily invalidated.

--
:wq

Maybe

Anonymous3 (not verified)
on
August 23, 2005 - 3:46pm

No, it shouldn't. It can't be patented right now, because this is clearly prior art. Any patent filed after today will be easily invalidated.

Until the US Congress changes the patent rules to a first-to-file instead of a first-to-develop system. 8(

Patent laws

on
August 24, 2005 - 4:05am

The difference between first to file and first to develop only matters when it comes to getting a patent. In both systems, the publication of prior art before you file for your patent is enough to invalidate your patent, since either you're patenting someone else's invention (not allowed), or your invention is obvious to someone skilled in the field (since someone else has come up with it).

The distinction between the two systems is in how they deal with inventions that aren't published. In a first to file system, the first person to file for a patent will get it, subject to no prior art. In a first to develop system, you could file for a patent, and lose it to Microsoft if they can show that they came up with it first, kept it secret, and never revealed it. Note that even in first to file, revealing an invention to the public prevents it being patented.

Reinventing the ElectricFence?

on
August 23, 2005 - 2:44pm

I wonder why a kernel-space solution is needed, when you can get the same effect by using LD_PRELOAD=/usr/lib/libefence.so.1 for selected
binaries (such as network daemons).

-Yenya

A lot of the changes were in

on
August 23, 2005 - 5:07pm

A lot of the changes were in userspace, since the malloc()/free() routines are in libc. But some of the changes needed kernel support; having mmap() return random addresses, for example, can't be done in userland.

And on top of that, Theo even mentioned that Electric Fence was way too slow for production use.

LD_PRELOAD doesn't always work

chadloder (not verified)
on
August 25, 2005 - 1:33pm

LD_PRELOAD is ignored for statically compiled binaries (e.g., most shells). Also in most circumstances it is ignored for setuid/setgid programs.

Plus Electric Fence is SLOW. It's a debugging tool.

If electric fence won't find/

Anonymous (not verified)
on
September 13, 2005 - 7:55am

If electric fence won't find/fix a problem then I can't see how the openbsd changes will. They both use exactly the same technique, reading the descriptions. With electric fence, you can have it never return the same memory address once it has been freed. This can consume a LOT of VM.

Electric fence should be part of everyone's software testing rig because it allows for you to check all of these sorts of things with malloc'd memory.

The random allocation thing is just a continuation of the "if you don't have randomness then you aren't secure" wank from openbsd.

no, it's just proactive

on
October 3, 2005 - 8:01pm

If electric fence won't find/fix a problem then I can't see how the openbsd changes will.

That statement betrays a shallow attitude to security. Even where a program is tested with Electric Fence, this does not cover all malicious data or runtime states that might happen to it later. These new measures just add to the arsenal of "preventive" measures, not just in development settings but in production as well. In development, a runtime fault can be diagnosed and fixed (a silent failure cannot); in production, a runtime fault will be trapped instead of becoming a security breakdown. Being omnipresent, they are easier to use than Electric Fence, and will therefore probably do more good.

Security of OpenBSD is great

on
August 23, 2005 - 3:31pm

It's amazing how much effort they put into constant security improvements! I remember when I found a small semi-security bug in libedit: the OpenBSD team was the first to really respond to the initial announcement and the first to produce a patch (that code was common to Open/Net/Free(ports)BSD and MySQL).

Good work.

So many wrong comments..

Anonymous
on
August 23, 2005 - 5:06pm

HU!

brk/sbrk are used to change the data segment size, so those addresses cannot be modified (only the base address, at process startup)! The only other source of memory allocation is mmap, which gives you virtual memory blocks (pages); those can be randomized on a per-call basis (each time you call mmap, a randomized unused address is returned). You can even allocate one page before and another after the address you mapped and mark those pages unreadable/unwriteable, thus catching buffer overflows, etc.

OpenBSD implemented "address space randomization", which is already implemented in Linux ;-p

> You can even alloc one page

Erik W (not verified)
on
August 24, 2005 - 3:29am

> You can even allocate one page before and another after the address you mapped and mark those pages unreadable/unwriteable, thus catching buffer overflows, etc.

Yes, but now it's done for all allocations automagically, without the developer having to lift a finger.

> OpenBSD implemented "address space randomization", which is already implemented in Linux ;-p

You sure that OBSD wasn't there first?

OpenBSD wasn't there first

Anonymouse! (not verified)
on
August 24, 2005 - 12:06pm

VMS had this behaviour for ages. It is also famous for being slow as a snail, may not be related though.

> Yes, but now it's done for

Anonymous
on
August 24, 2005 - 12:21pm

> Yes, but now it's done for all allocations automagically, without the developer having to lift a finger.

So for every allocated page there are two more pages?? Stupidity!

no, not quite.

tu (not verified)
on
August 24, 2005 - 12:55pm

no, not quite.

Not at all

Bombadil (not verified)
on
August 24, 2005 - 10:21pm

For every allocated page in the virtual address space, two pages around it in the virtual address space won't be usable for user data.

They don't take up any resources on the system and will not be missed unless the process tries to use all its available virtual address space (2/4 GB on a 32-bit system, 8/16 exabytes on a 64-bit system).

For more info on this issue, you had better learn the basics of virtual memory.

I know Virtual Memory..but th

Anonymous
on
August 24, 2005 - 11:36pm

I know Virtual Memory..but the question was..that's the default behaviour, they have just randomized the return addresses; every single modern OS disallows access to unmapped addresses.

Humm..so every process can only use one third of its virtual memory space (depending on the allocation pattern).. That's almost the same thing. Other processes will be able to use the memory..but the same process will not. I remain correct, that's stupidity.

Why is it stupid?

on
August 25, 2005 - 3:49am

I don't see the problem with a virtual address layout that in a pessimistic case (all allocations less than one page) limits you to 1/3rd the virtual memory limits. We have good 64-bit platforms out there (AMD64, SPARCv9, MIPS64) for users who need more than 1GB address space, and we're nowhere near any application that needs more than 5,000,000 TB address space. Even with AMD64's current 48 bit virtual addresses, you still get over 80 TB usable virtual address space with the pessimistic allocation.
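
As a quick check of the arithmetic, here is a small sketch that computes how much of a 2^bits-byte address space is left when every one-page allocation is accompanied by a fixed number of unusable guard pages. The function names are hypothetical, and the model is the pessimistic one described above (every allocation is exactly one page).

```c
#include <stdint.h>

/* Usable bytes in an address space of 2^bits bytes (bits < 64) when each
 * one-page allocation costs `guards` extra unusable pages. */
uint64_t usable_bytes(unsigned bits, unsigned guards)
{
    return (UINT64_C(1) << bits) / (guards + 1);
}

/* Convert bytes to whole tebibytes (2^40 bytes), rounding down. */
uint64_t to_tib(uint64_t bytes)
{
    return bytes >> 40;
}
```

With 48-bit addresses and two guard pages per allocation, to_tib(usable_bytes(48, 2)) gives 85 TiB, which is consistent with the "over 80 TB" figure. If adjacent allocations share a guard page, the cost drops to one guard per allocation and the result is 128 TiB.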

This is especially true when you think about the security tradeoff; doing things this way makes programs with buffer overflow errors, read/write after free errors, and buffer underflow errors crash with a SIGSEGV rather than do something bad.

its 1/2 not 1/3

virtualmemory (not verified)
on
August 25, 2005 - 4:42am

because you don't need two consecutive pages blocked.

Also you write

for every process, it can only use one third of its virtual memory space. [...] Other process will be able to use the memory..but the same process not.

Every process has its own virtual memory: the "blocked" pages do not exist, they are only virtual memory addresses that are not available to that particular process. What memory other processes use is completely independent.

As I said, other process will

Anonymous
on
August 25, 2005 - 8:18am

As I said, other processes will be able to use the _physical_ memory. Unless they share the same virtual space, in which case they will all be limited to the same "memory", as that is all they see. It's 1/3 if some process allocates pages of 4k each (plus one before and another after, of virtual memory, not physical). If the process has five threads, that will be 200 MB for each thread, and so on...
