In a recent email, OpenBSD creator Theo de Raadt described a number of modifications to how OpenBSD allocates memory. In preparation for the upcoming 3.8 release, Theo asked for people to beta test -current, as the recent modifications will likely cause instabilities in many applications. One of the modifications was to make the mmap system call return a random memory address, as well as ensuring "that two objects are not mapped next to each other; in effect, this creates unallocated memory which we call a 'guard page'." Another was to update the malloc function to use mmap to obtain memory. Finally, the free function was updated to immediately return memory to the kernel and un-allocate it from the calling process. Additional changes were also made, but unlike these three, the additional changes are not enabled by default as they are "too dangerous for normal software or cause too much of a slowdown".
Theo points out that these changes have a couple of significant impacts. He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer. With these recent memory allocation changes, such an attempt will cause the application to coredump with a SIGSEGV signal. Additionally, now that memory is unmapped as soon as it is freed, any attempt to access freed memory will also cause the application to coredump with a SIGSEGV signal. He explained, "we expect that our malloc will find more bugs in software, and this might hurt our user community in the short term. We know that what this new malloc is doing is perfectly legal, but that realistically some open source software is of such low quality that it is just not ready for these things to happen." Hence the request for beta testers to help track down these misbehaving applications. Theo concluded, "instead of saying that OpenBSD is busted in this regard, please realize that the software which is crashing is showing how shoddily it was written. Then help us fix it. For everyone.. not just OpenBSD users."
From: Theo de Raadt [email blocked]
To: misc
Subject: 3.8 beta requests
Date: Mon, 22 Aug 2005 17:33:40 -0600

We are heading towards making the real 3.8 release soonish. I would like to ask the community to do lots of testing over the next week if they can.

This release will bring a lot of new ideas from us. One of them in particular is somewhat risky. I think it is time to talk about that one, and let people know what is ahead on our road.

Traditionally, Unix malloc(3) has always just "extended the brk", which means extending the traditional Unix process data segment to allocate more memory. malloc(3) would simply extend the data segment, and then calve off little pieces to requesting callers as needed. It also remembered which pieces were which, so that free(3) could do its job.

The way this was always done in Unix has had a number of consequences, some of which we wanted to get rid of. In particular, malloc & free have not been able to provide strong protection against overflows or other corruption.

Our malloc implementation is a lot more resistant (than Linux) to "heap overflows in the malloc arena", but we wanted to improve things even more.

Starting a few months ago, the following changes were made:

- We made the mmap(2) system call return random memory addresses. As well the kernel ensures that two objects are not mapped next to each other; in effect, this creates unallocated memory which we call a "guard page".
- We have changed malloc(3) to use mmap(2) instead of extending the data segment via brk()
- We also changed free(3) to return memory to the kernel, un-allocating them out of the process.
- As before, objects smaller than a page are allocated within shared pages that malloc(3) maintains. But their allocation is now somewhat randomized as well.
- A number of other similar changes which are too dangerous for normal software or cause too much of a slowdown are available as malloc options as described in the manual page.
These are very powerful for debugging buggy applications.

Other results:

- When you free an object that is >= 1 page in size, it is actually returned to the system. Attempting to read or write to it after you free is no longer acceptable. That memory is unmapped. You get a SIGSEGV.
- For a decade and a bit, we have been fixing software for buffer overflows. Now we are finding a lot of software that reads before the start of the buffer, or reads too far off the end of the buffer. You get a SIGSEGV.

To some of you, this will sound like what the Electric Fence toolkit used to be for. But these features are enabled by default. Electric Fence was also very slow. It took nearly 3 years to write these OpenBSD changes since performance was a serious consideration. (Early versions caused a nearly 50% slowdown).

Our changes have tremendous benefits, but until some bugs in external packages are found and fixed, there are some risks as well. Some software making incorrect assumptions will be running into these new security technologies.

I discussed this in talks I have given before: I said that we were afraid to go ahead with guard pages, because a lot of software is just written to such low standards. Applications over-read memory all the time, go 1 byte too far, read 1 byte too early, access memory after free, etc etc etc.

Oh well -- we've decided that we will try to ship with this protection mechanism in any case, and try to solve the problems as we run into them.

Two examples:

Over the last two months, some OpenBSD users noticed that the X server was crashing occasionally. Two bugs have been diagnosed and fixed by us. One was a use-after-free bug in the X shared library linker. The other was a buffer-over-read bug deep down in the very lowest level fb* pixmap compositing routines. The latter bug in particular was very difficult to diagnose and fix, and is about 10 years old.
We have found other bugs like this in other external software, and even a few in the base OpenBSD tree (though those were found a while back, even as we started experimenting with the new malloc code).

I would bet money that the X fb* bug has crashed Linux (and other) X servers before. It is just that it was very rare, and no one ever chased it. The new malloc we have just makes code get lucky less often, which lets us get to the source of a bug easier. As a programmer, I appreciate anything which makes bugs easier to reproduce.

We expect that our malloc will find more bugs in software, and this might hurt our user community in the short term. We know that what this new malloc is doing is perfectly legal, but that realistically some open source software is of such low quality that it is just not ready for these things to happen.

We ask our users to help us uncover and fix more of these bugs in applications. Some will even be exploitable. Instead of saying that OpenBSD is busted in this regard, please realize that the software which is crashing is showing how shoddily it was written. Then help us fix it. For everyone.. not just OpenBSD users.
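To make the two failure modes from the email concrete (reading just outside a buffer, and touching memory after free), here is a minimal Python sketch of a toy address space. It is a model under stated assumptions, not OpenBSD's implementation: `ToyAddressSpace` and `SegFault` are hypothetical names, addresses are handed out sequentially rather than randomly so the behaviour is easy to follow, and sub-page allocations in shared pages are not modelled.

```python
PAGE = 4096  # page size used in the discussion; an assumption of this model


class SegFault(Exception):
    """Stands in for SIGSEGV in this toy model."""


class ToyAddressSpace:
    """Hypothetical model that only tracks which page numbers are mapped."""

    def __init__(self):
        self.mapped = set()   # page numbers that are currently mapped
        self.next_page = 1    # page 0 is deliberately left unmapped

    def alloc(self, nbytes):
        """Map enough pages for nbytes, then skip one page as a guard."""
        npages = -(-nbytes // PAGE)          # ceiling division
        start = self.next_page
        self.mapped.update(range(start, start + npages))
        self.next_page = start + npages + 1  # the skipped page is the guard
        return start * PAGE

    def free(self, addr, nbytes):
        """Unmap the pages immediately, as free(3) is described as doing."""
        npages = -(-nbytes // PAGE)
        start = addr // PAGE
        self.mapped.difference_update(range(start, start + npages))

    def read(self, addr):
        """Return a dummy byte, or raise SegFault if the page is unmapped."""
        if addr // PAGE not in self.mapped:
            raise SegFault(hex(addr))
        return 0


space = ToyAddressSpace()
buf = space.alloc(PAGE)
space.read(buf + PAGE - 1)       # last valid byte: fine
for bad in (buf - 1, buf + PAGE):
    try:
        space.read(bad)          # one byte too early / one byte too far
    except SegFault as fault:
        print("fault at", fault)
```

In this model, calling `space.free(buf, PAGE)` and then `space.read(buf)` raises `SegFault` as well, which mirrors the use-after-free case the email describes for objects of at least one page.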
you remain incorrect. you can still use the entire address space, which is considerably larger than 1GB, btw.
obviously, you aren't using the software in question, so maybe you should refrain from "informing" the people who do use it what's happening.
2 ^ 32 = 4294967296 bytes. Each page is 4096 bytes.
4294967296 / 4096 = 1048576 pages.
For every mmap'ed page there are two more reserved ones, e.g. allocating a page at position 4096 makes the pages starting at offsets 0 and 8192 unusable (one page before and one after). If you follow this allocation pattern (allocating at offsets 4096, 12288, 16384), for every allocated page there are two unusable ones.
1048576 / 3 = 349525 pages. That's 1431654400 bytes or 1.33333206 gigabytes for a single process.
Those values can vary up or down depending on the process's allocation pattern.
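The arithmetic in this comment can be checked with a few lines of Python. This only reproduces the commenter's own assumption (every mapped page costs two further pages that nothing else can use); the function name `usable_bytes` is made up for the illustration, and whether the assumption matches the real allocator is exactly what the replies dispute.

```python
ADDRESS_SPACE = 2 ** 32   # bytes addressable with 32 bits
PAGE = 4096               # bytes per page, as stated in the comment


def usable_bytes(address_space=ADDRESS_SPACE, page=PAGE, guards_per_page=2):
    """Bytes left for data if each mapped page also consumes guard pages."""
    total_pages = address_space // page
    usable_pages = total_pages // (1 + guards_per_page)
    return usable_pages * page


total = ADDRESS_SPACE // PAGE
usable = usable_bytes()
print(total)                        # 1048576 pages
print(usable // PAGE)               # 349525 pages
print(usable)                       # 1431654400 bytes
print(round(usable / 2 ** 30, 8))   # 1.33333206 (GiB)
```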
ok.. that's the worst case...
But really, how many processes allocate gigabytes of storage 4K at a time? For very small allocations, it sounds like OpenBSD still aggregates the allocations. It's only for larger allocations that it mmap()'s with a guardband.
I'm willing to bet that the actual address space utilization is much larger than you suggest. For instance, if the average mmap()'d allocation is 8K, then the amount of unused virtual address space drops to 1.3GB, and the available address space rises to 2.7GB for a 32-bit machine.
I'm not exactly sure what OpenBSD does, but in general 32-bit OSes seem to favor a 2G/2G split in virtual address space as a default. I tend to believe that in practice no one hits an artificial malloc() limit for all but the most obscene allocation patterns. And you could argue that programs that exhibit those particular allocation patterns have other issues anyway.
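The 8K estimate above works out if each allocation is charged one guard page, which is an assumption read into the comment rather than something it states. A small sketch (the helper `usable_fraction` is hypothetical) shows how the usable share grows with the average allocation size:

```python
from fractions import Fraction

PAGE = 4096
GIB = 2 ** 30


def usable_fraction(alloc_bytes, page=PAGE, guard_pages=1):
    """Share of address space holding data when each allocation of
    alloc_bytes is accompanied by guard_pages unmapped pages."""
    data_pages = -(-alloc_bytes // page)   # ceiling division
    return Fraction(data_pages, data_pages + guard_pages)


for size in (4096, 8192, 65536, 1 << 20):
    share = usable_fraction(size)
    usable_gib = float(share) * 2 ** 32 / GIB
    print(f"{size:>8} bytes per allocation: {share} usable, about {usable_gib:.2f} GiB of 4 GiB")
```

For 8K allocations the share is 2/3, or roughly 2.67 GiB usable and 1.33 GiB lost to guards out of a full 4 GiB, which matches the figures quoted. The kernel/user split mentioned in the thread would scale both numbers down.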
openbsd splits the address space 3.25 user .75 kernel for i386.
I recently converted a plan9 based OS on ia32 from using the 2gig/2gig split to an arbitrary page aligned split. This gave me some insights on why some OSs do this sort of thing (as well as ulcers). It is bizarre how the good people at Bell Labs thought testing the high bit (sign) of an address would be a suitable means of determining if an address was kernel or user space. The lesson taken from this is that from day one KZERO should be allowed to be any page aligned address. Sigh.
-michael
it doesn't work like that.
if you think about this for a second, you'll realize that the guard page coming after allocation A can be used as the guard page before allocation B. anyway, the actual implementation doesn't work like that either. also, even if you run out of space, it's not like the guard pages are completely off limits, they can be converted into non-guard pages at any time.
if you care, look at the sources or run some tests, don't speculate.
It is 1/2, not 1/3 - simple math
It is one half, as you can obviously share guard pages - the way you describe it, any two subsequent 1-page allocs are separated by 2 empty guard pages - I trust the OpenBSD developers are smart enough to figure that one out :). 1-page allocs are also rare, apps either have a lot of tiny objects (which malloc handles differently than large objects), or pretty huge buffers.
No, no, no, no, no. Your estimation is wildly inaccurate.
Aside from the fact that it assumes that you'll only ever be allocating single pages at a time with malloc or mmap, the way you arrived at the figure of only having access to 1/3 of the virtual address space is completely off.
Let's say for a moment that we had some pathological process that only ever allocated chunks of memory that were exactly one page in size, and let's say that it somehow managed to keep allocating memory until it filled the whole of the address space.
You'll discover something: many of the guard pages for the various blocks of memory overlap. Yup, blocks can share guard pages. You might even get a sequence of pages like the following:
_M_M_M_M_M_

where:
M = physical page mapped onto virtual address space
_ = a guard page

So you see, in the worst-case, horribly pathological memory allocation scenario, the usable share of the address space actually varies between 1/3 and 1/2. Of course, anything below 1/2 is an edge case, and proclaiming that allocating memory like this is stupid because of such edge cases is akin to saying that, say, we shouldn't use quicksort because even though its average performance is O(n log n), it has a worst-case performance of O(n^2), or we shouldn't use languages like ML or Haskell that use type inference because the algorithms to do this are potentially insoluble in a few pathological edge cases.
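The diagram can be generated and counted mechanically. The sketch below is only an illustration of the two layouts being argued about (guards shared between neighbours versus two private guards per allocation); `layout` and `mapped_fraction` are made-up helper names, and neither layout is claimed to be what the kernel actually produces.

```python
def layout(n_allocs, pages_each=1, share_guards=True):
    """Render mapped pages as 'M' and guard pages as '_'."""
    block = "M" * pages_each
    if share_guards:
        # One guard between neighbours, plus one at each end.
        return "_" + "_".join([block] * n_allocs) + "_"
    # Every allocation carries its own guard before and after.
    return "".join("_" + block + "_" for _ in range(n_allocs))


def mapped_fraction(diagram):
    """Fraction of pages in the diagram that hold data."""
    return diagram.count("M") / len(diagram)


shared = layout(5)
private = layout(5, share_guards=False)
print(shared, mapped_fraction(shared))
print(private, mapped_fraction(private))
```

With shared guards the mapped fraction approaches 1/2 as the number of one-page allocations grows, while private guards give exactly 1/3, which is the range the comment describes.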
PaX
Doesn't PaX/GRSecurity do all that and more? and when are they going to go mainline?
Come on... Don't be so mean - let the OpenBSD fanboys have their moment of joy...
like duh
no, it doesn't
Why the heck do these BSD guys have to insult people every time they have the chance to do so?
poor thing
I think OpenBSD's communication might be more 'direct' (like hammer and anvil) than other projects, but that does not take away the technical advances made or even the truth in Theo's arguments. If security is your goal, it would be seriously stupid to let political correctness and the easily offended get in the way. As long as they do have a valid point to make, they can phrase it like they want in my opinion.
A program that touches memory it doesn't own (anymore/yet) can be called shoddy, and these are bugs that should be fixed. It's almost sad that operating system developers have to force user software developers to fix their bugs to obtain some form of security. I hope developers will make good use of this chance to improve the quality of open source software.
And off you go to fix some software...
What about system call for each free(3) call?
How about performance of the added system call on each free(3) call?
I understand the big advantage of giving back the page to the system, both for memory conservation and for security purposes, but one of the advantages of malloc()/free() in userland is that they save system calls and therefore are "easier" on CPU.
there is some caching done.
Would you mind elaborating a bit?
Would you mind elaborating a bit about the caching?
From the news item I gather that free(3) just calls munmap(2).
If it doesn't do this right away, then when does the unmap occur, and wouldn't it lose part of the security advantage of having the page outside the userspace's reach?
Thanks.
64k can be kept on the freelist. there are options to adjust this.
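As a rough illustration of what a bounded free cache could look like, here is a hedged sketch. Only the 64k figure comes from the comment above; the class name `CachedFreeList`, the "cache if it fits, otherwise unmap" policy, and the exact-size reuse rule are all assumptions made for the example and should not be read as the actual malloc code.

```python
class CachedFreeList:
    """Hypothetical bounded cache of freed regions (not OpenBSD's code)."""

    def __init__(self, limit_bytes=64 * 1024):
        self.limit_bytes = limit_bytes
        self.cached = []        # (addr, size) regions that stay mapped
        self.cached_bytes = 0
        self.unmapped = []      # regions handed back to the kernel

    def free(self, addr, size):
        """Keep the region if the cache has room, otherwise unmap it."""
        if self.cached_bytes + size <= self.limit_bytes:
            self.cached.append((addr, size))
            self.cached_bytes += size
            return "cached"
        self.unmapped.append((addr, size))   # a real allocator would unmap here
        return "unmapped"

    def reuse(self, size):
        """Hand back a cached region of exactly this size, avoiding a syscall."""
        for i, (addr, cached_size) in enumerate(self.cached):
            if cached_size == size:
                del self.cached[i]
                self.cached_bytes -= size
                return addr
        return None


fl = CachedFreeList()
results = [fl.free(i * 8192, 4096) for i in range(17)]
print(results.count("cached"), results.count("unmapped"))
```

Under these assumptions the first sixteen 4K regions fill the 64k cache and the seventeenth is unmapped. Regions sitting in the cache remain mapped, so a use after free on them would not fault until they are evicted or reused.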
Thanks. So that reduces the security for these?
Thanks.
So the security enhancement of munmap(2)'ing these pages won't be relevant for these pages, right?
(I suppose they'll still benefit from their own random virtual address).
Cheers.