OpenBSD: Improved Memory Allocation, Beta Testing 3.8

Submitted by Jeremy
on August 23, 2005 - 3:32am

In a recent email, OpenBSD creator Theo de Raadt described a number of modifications to how OpenBSD allocates memory. In preparation for the upcoming 3.8 release, Theo asked for people to beta test -current, as the recent modifications will likely cause instabilities in many applications. One of the modifications was to make the mmap system call return a random memory address, as well as ensuring "that two objects are not mapped next to each other; in effect, this creates unallocated memory which we call a 'guard page'." Another was to update the malloc function to use mmap to obtain memory. Finally, the free function was updated to immediately return memory to the kernel and un-allocate it from the calling process. Additional changes were also made, but unlike these three the additional changes are not enabled by default as they are "too dangerous for normal software or cause too much of a slowdown".

Theo points out that these changes have a couple of significant impacts. He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer. With these recent memory allocation changes, such an attempt will cause the application to coredump with a SIGSEGV signal. Additionally, now that memory is unmapped as soon as it is freed, any attempt to access freed memory will also cause the application to coredump with a SIGSEGV signal. He explained, "we expect that our malloc will find more bugs in software, and this might hurt our user community in the short term. We know that what this new malloc is doing is perfectly legal, but that realistically some open source software is of such low quality that it is just not ready for these things to happen." Hence the request for beta testers to help track down these misbehaving applications. Theo concluded, "instead of saying that OpenBSD is busted in this regard, please realize that the software which is crashing is showing how shoddily it was written. Then help us fix it. For everyone.. not just OpenBSD users."
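The behavior described above can be pictured with a toy model. The following Python sketch is not OpenBSD code: ToyAddressSpace, SegFault, and the 4096-byte page size are hypothetical names and values chosen for illustration. It places each mapping at a random page, keeps an unmapped page on either side, and treats any access to an unmapped page as a fault.

```python
import random

PAGE = 4096  # assumed page size in bytes


class SegFault(Exception):
    """Stands in for SIGSEGV in this toy model."""


class ToyAddressSpace:
    """Hypothetical model: every page number is either mapped or unmapped."""

    def __init__(self, total_pages=1 << 20, seed=None):
        self.total_pages = total_pages
        self.mapped = set()              # page numbers currently mapped
        self.rng = random.Random(seed)

    def mmap(self, npages):
        """Map npages at a random spot, leaving an unmapped page on each side."""
        while True:
            start = self.rng.randrange(1, self.total_pages - npages - 1)
            # The candidate range plus one page before and after must be free.
            span = range(start - 1, start + npages + 1)
            if not any(p in self.mapped for p in span):
                self.mapped.update(range(start, start + npages))
                return start * PAGE

    def munmap(self, addr, npages):
        """Give pages back; any later access to them faults."""
        first = addr // PAGE
        self.mapped.difference_update(range(first, first + npages))

    def read(self, addr):
        if addr // PAGE not in self.mapped:
            raise SegFault(hex(addr))
        return 0  # contents do not matter for the model


space = ToyAddressSpace(seed=1)
buf = space.mmap(2)            # an 8 KB object
space.read(buf)                # inside the object: fine
for bad in (buf - 1, buf + 2 * PAGE):
    try:
        space.read(bad)        # one byte before, one byte past the end
    except SegFault as fault:
        print("fault at", fault)
space.munmap(buf, 2)           # what free() now does for large objects
try:
    space.read(buf)            # use after free
except SegFault as fault:
    print("fault at", fault)
```

In the model, as in the article's description, an over-read or a use-after-free stops the program at the faulting access instead of silently returning whatever happened to be in neighboring memory.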


From: Theo de Raadt [email blocked]
To:  misc
Subject: 3.8 beta requests
Date: Mon, 22 Aug 2005 17:33:40 -0600

We are heading towards making the real 3.8 release soonish.  I would
like to ask the community to do lots of testing over the next week if
they can.

This release will bring a lot of new ideas from us.  One of them in
particular is somewhat risky.  I think it is time to talk about that
one, and let people know what is ahead on our road.

Traditionally, Unix malloc(3) has always just "extended the brk",
which means extending the traditional Unix process data segment to
allocate more memory.  malloc(3) would simply extend the data segment,
and then calve off little pieces to requesting callers as needed.  It
also remembered which pieces were which, so that free(3) could do its
job.

The way this was always done in Unix has had a number of consequences,
some of which we wanted to get rid of.  In particular, malloc & free
have not been able to provide strong protection against overflows or
other corruption.

Our malloc implementation is a lot more resistant (than Linux) to
"heap overflows in the malloc arena", but we wanted to improve things
even more.

Starting a few months ago, the following changes were made:

- We made the mmap(2) system call return random memory addresses.  As well
  the kernel ensures that two objects are not mapped next to each other;
  in effect, this creates unallocated memory which we call a "guard page".

- We have changed malloc(3) to use mmap(2) instead of extending the data
  segment via brk()

- We also changed free(3) to return memory to the kernel, un-allocating
  it out of the process.

- As before, objects smaller than a page are allocated within shared
  pages that malloc(3) maintains.  But their allocation is now somewhat
  randomized as well.

- A number of other similar changes which are too dangerous for normal
  software or cause too much of a slowdown are available as malloc options
  as described in the manual page.  These are very powerful for debugging
  buggy applications.

Other results:

- When you free an object that is >= 1 page in size, it is actually
  returned to the system.  Attempting to read or write to it after
  you free is no longer acceptable.  That memory is unmapped.  You get
  a SIGSEGV.

- For a decade and a bit, we have been fixing software for buffer overflows.
  Now we are finding a lot of software that reads before the start of the
  buffer, or reads too far off the end of the buffer.  You get a SIGSEGV.

To some of you, this will sound like what the Electric Fence toolkit
used to be for.  But these features are enabled by default.  Electric
Fence was also very slow.  It took nearly 3 years to write these
OpenBSD changes since performance was a serious consideration.  (Early
versions caused a nearly 50% slowdown).

Our changes have tremendous benefits, but until some bugs in external
packages are found and fixed, there are some risks as well.  Some
software making incorrect assumptions will be running into these new
security technologies.

I discussed this in talks I have given before: I said that we were
afraid to go ahead with guard pages, because a lot of software is just
written to such low standards.  Applications over-read memory all the
time, go 1 byte too far, read 1 byte too early, access memory after free,
etc etc etc.

Oh well -- we've decided that we will try to ship with this protection
mechanism in any case, and try to solve the problems as we run into
them.

Two examples:

Over the last two months, some OpenBSD users noticed that the X server
was crashing occasionally.  Two bugs have been diagnosed and fixed by
us.  One was a use-after-free bug in the X shared library linker.  The
other was a buffer-over-read bug deep down in the very lowest level
fb* pixmap compositing routines.  The latter bug in particular was
very difficult to diagnose and fix, and is about 10 years old.  We
have found other bugs like this in other external software, and even a
few in the base OpenBSD tree (though those were found a while back,
even as we started experimenting with the new malloc code).

I would bet money that the X fb* bug has crashed Linux (and other) X
servers before.  It is just that it was very rare, and no one ever
chased it.  The new malloc we have just makes code get lucky less
often, which lets us get to the source of a bug easier.  As a
programmer, I appreciate anything which makes bugs easier to
reproduce.

We expect that our malloc will find more bugs in software, and this
might hurt our user community in the short term.  We know that what
this new malloc is doing is perfectly legal, but that realistically
some open source software is of such low quality that it is just not
ready for these things to happen.

We ask our users to help us uncover and fix more of these bugs in
applications.  Some will even be exploitable.  Instead of saying that
OpenBSD is busted in this regard, please realize that the software
which is crashing is showing how shoddily it was written.  Then help
us fix it.  For everyone.. not just OpenBSD users.


tu (not verified)
on August 25, 2005 - 12:28pm

you remain incorrect. you can still use the entire address space, which is considerably larger than 1GB, btw.

obviously, you aren't using the software in question, so maybe you should refrain from "informing" the people who do use it what's happening.

Anonymous
on August 25, 2005 - 5:27pm

2 ^ 32 = 4294967296 bytes. Each page is 4096 bytes.
4294967296 / 4096 = 1048576 pages.

For every mmapped page there are two more reserved ones, e.g.: allocating a page at position 4096 makes the pages starting at offsets 0 and 8192 unusable (one page before and one after). If you follow this allocation pattern (allocating at offsets 4096, 16384, 28672), for every allocated page there are two unusable ones.

1048576 / 3 = 349525 pages. That's 1431654400 bytes or 1.33333206 gigabytes for a single process.

Those values can vary up or down depending on the process's allocation pattern.
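The arithmetic in this comment can be replayed as a short Python sketch. It only reproduces the commenter's own assumptions (a 32-bit address space, 4096-byte pages, and two private guard pages for every single-page allocation); the variable names are made up for illustration, and "gigabytes" is taken to mean 2^30 bytes.

```python
PAGE_SIZE = 4096
ADDRESS_SPACE = 2 ** 32                      # bytes addressable with 32 bits

total_pages = ADDRESS_SPACE // PAGE_SIZE     # every page in the address space
usable_pages = total_pages // 3              # one usable page out of every three
usable_bytes = usable_pages * PAGE_SIZE
usable_gib = usable_bytes / 2 ** 30          # gigabytes counted as 2**30 bytes

print(total_pages, usable_pages, usable_bytes, round(usable_gib, 8))
```

The figure depends entirely on the assumption that guard pages are never shared between neighboring allocations, which is the point the replies below dispute.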

ok.. that's the worst case...

Mr_Z
on August 25, 2005 - 6:08pm

But really, how many processes allocate gigabytes of storage 4K at a time? For very small allocations, it sounds like OpenBSD still aggregates the allocations. It's only for larger allocations that it mmap()'s with a guardband.

I'm willing to bet that the actual address space utilization is much larger than you suggest. For instance, if the average mmap()'d allocation is 8K, then the amount of unused virtual address space drops to 1.3GB, and the available address space rises to 2.7GB for a 32-bit machine.

I'm not exactly sure what OpenBSD does, but in general 32-bit OSes seem to favor a 2G/2G split in virtual address space as a default. I tend to believe that in practice no one hits an artificial malloc() limit for all but the most obscene allocation patterns. And you could argue that programs that exhibit those particular allocation patterns have other issues anyway.
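The 8K estimate above follows from a simple ratio, sketched here in Python. The function usable_fraction is a hypothetical helper, and the model is an assumption rather than a description of OpenBSD's allocator: every allocation has the same size, neighbors share a single guard page, and the whole 4GB space is available to the process.

```python
def usable_fraction(alloc_bytes, page_size=4096):
    """Fraction of address space holding data when each allocation
    is followed by one guard page shared with its neighbour."""
    pages = -(-alloc_bytes // page_size)   # ceiling division
    return pages / (pages + 1)


for size in (4096, 8192, 65536):
    frac = usable_fraction(size)
    print(size, round(4 * frac, 2), "GB usable,", round(4 * (1 - frac), 2), "GB in guard pages")
```

Under these assumptions an 8K allocation uses two pages out of every three, and the overhead shrinks quickly as allocations grow.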

tu (not verified)
on August 26, 2005 - 12:33pm

openbsd splits the address space 3.25 user .75 kernel for i386.

Michael (not verified)
on October 3, 2005 - 8:18am

I recently converted a Plan 9-based OS on ia32 from using the 2gig/2gig split to an arbitrary page-aligned split. This gave me some insights on why some OSs do this sort of thing (as well as ulcers). It is bizarre how the good people at Bell Labs thought testing the high bit (sign) of an address would be a suitable means of determining if an address was kernel or user space. The lesson taken from this is that from day one KZERO should be allowed to be any page-aligned address. Sigh.

-michael

it doesn't work like that.

tu (not verified)
on August 26, 2005 - 12:39pm

if you think about this for a second, you'll realize that the guard page coming after allocation A can be used as the guard page before allocation B. anyway, the actual implementation doesn't work like that either. also, even if you run out of space, it's not like the guard pages are completely off limits, they can be converted into non-guard pages at any time.

if you care, look at the sources or run some tests, don't speculate.

It is 1/2, not 1/3 - simple math

Anon (not verified)
on August 28, 2005 - 8:08am

It is one half, as you can obviously share guard pages. The way you describe it, any two subsequent 1-page allocs are separated by 2 empty guard pages. I trust the OpenBSD developers are smart enough to figure that one out :). 1-page allocs are also rare; apps either have a lot of tiny objects (which malloc handles differently than large objects), or pretty huge buffers.

No, no, no, no, no. Your estimation is wildly inaccurate.

Keith Gaughan (not verified)
on October 3, 2005 - 6:28pm

Aside from the fact that it assumes that you'll only ever be allocating single pages at a time with malloc or mmap, the way you arrived at the figure of only having access to 1/3 of the virtual address space is completely off.

Let's say for a moment that we had some pathological process that only ever allocated chunks of memory that were exactly one page in size, and let's say that it somehow managed to keep allocating memory until it filled the whole of the address space.

You'll discover something: many of the guard pages for the various blocks of memory overlap. Yup, blocks can share guard pages. You might even get a sequence of pages like the following:


_M_M_M_M_M_   where:
              M = physical page mapped onto virtual address space
              _ = a guard page

So you see, the usable fraction in the worst-case, horribly pathological memory allocation scenario actually varies between 1/3 and 1/2. Of course, anything below 1/2 is an edge case, and proclaiming that it's stupid to allocate memory like this because of such edge cases is akin to saying that, say, we shouldn't use quicksort because even though its average performance is O(n log n), it has a worst-case performance of O(n^2), or that we shouldn't use languages like ML or Haskell that use type inference because the algorithms to do this are potentially insoluble in a few pathological edge cases.
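The two bounds can be checked with a small Python sketch that builds page maps in the same notation as the diagram above. The helpers layout and mapped_fraction are hypothetical, and the model covers only single-page allocations packed as tightly as the guard pages allow.

```python
def layout(n_allocs, shared=True):
    """Return a page map: 'M' is a mapped page, '_' is a guard page."""
    if shared:
        # Neighbouring allocations share the guard page between them.
        return "_" + "M_" * n_allocs
    # Each allocation keeps a private guard page before and after it.
    return "_M_" * n_allocs


def mapped_fraction(pages):
    """Fraction of the pages in the map that hold data."""
    return pages.count("M") / len(pages)


print(layout(5))                  # _M_M_M_M_M_
print(layout(5, shared=False))    # _M__M__M__M__M_
print(mapped_fraction(layout(100000)))
print(mapped_fraction(layout(100000, shared=False)))
```

With shared guard pages the mapped fraction approaches 1/2 as the number of allocations grows; without sharing it stays at 1/3.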

PaX

Yosef (not verified)
on August 23, 2005 - 3:10pm

Doesn't PaX/GRSecurity do all that and more? And when are they going to go mainline?

Anonümous (not verified)
on August 24, 2005 - 1:37am

Come on... Don't be so mean - let the OpenBSD fanboys have their moment of joy...

like duh

mordr (not verified)
on August 24, 2005 - 2:49am

no, it doesn't

Me Who Else (not verified)
on August 24, 2005 - 3:39am

Why the heck do these BSD guys have to insult people every time they have the chance to do so?

poor thing

wouter
on August 25, 2005 - 1:20pm

I think OpenBSD's communication might be more 'direct' (like hammer and anvil) than other projects, but that does not take away the technical advances made or even the truth in Theo's arguments. If security is your goal, it would be seriously stupid to let political correctness and the easily offended get in the way. As long as they do have a valid point to make, they can phrase it like they want in my opinion.

A program that touches memory it doesn't own (anymore/yet) can be called shoddy, and these are bugs that should be fixed. It's almost sad that operating system developers have to force user software developers to fix their bugs to obtain some form of security. I hope developers will make good use of this chance to improve the quality of open source software.

And off you go to fix some software...

What about system call for each free(3) call?

Bombadil (not verified)
on August 24, 2005 - 7:24pm

How about the performance of the added system call on each free(3) call?
I understand the big advantage of giving back the page to the system, both for memory conservation and for security purposes, but one of the advantages of malloc()/free() in userland is that they save system calls and therefore are "easier" on the CPU.

tu (not verified)
on August 25, 2005 - 12:30pm

there is some caching done.

Would you mind elaborating a bit?

Bombadil (not verified)
on August 25, 2005 - 11:38pm

Would you mind elaborating a bit about the caching?

From the news item I gather that free(3) just calls munmap(2).

If it doesn't do this right away, then when does the munmap occur, and wouldn't it lose part of the security advantage of having the page outside the userspace's reach?
Thanks.

tu (not verified)
on August 26, 2005 - 12:44pm

64k can be kept on the freelist. there are options to adjust this.

Thanks. So that reduces the security for these?

Bombadil (not verified)
on August 28, 2005 - 7:03pm

Thanks.

So the security enhancement of munmap(2)'ing won't be relevant for these cached pages, right?

(I suppose they'll still benefit from their own random virtual address).

Cheers.
