Re: some cleanup of of uvm_map.c

Previous thread: LIFE GOOD NEWSLETTER no 60 by Akis Angelakis on Saturday, January 30, 2010 - 2:00 pm. (1 message)

Next thread: compilation error with moko kernel by dermiste on Sunday, January 31, 2010 - 9:35 am. (1 message)
From: Anton Maksimenkov
Date: Sunday, January 31, 2010 - 5:43 am

Here is some cleanup of uvm_map.c code.

At first, in uvm_map_advice() function.
Let it looks as simple as uvm_map_inherit() - no need to lock/unlock map
just to realize that sanity check fail. Also no need to do it in every loop.

Second, remove redundant "temp_entry" variable from both functions.
One "entry" variable is enough to do the job.

$ cvs diff -Nup sys/uvm/uvm_map.c
Index: sys/uvm/uvm_map.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_map.c,v
retrieving revision 1.123
diff -u -p -r1.123 uvm_map.c
--- sys/uvm/uvm_map.c   28 Aug 2009 00:40:03 -0000      1.123
+++ sys/uvm/uvm_map.c   31 Jan 2010 11:35:56 -0000
@@ -2473,7 +2473,7 @@ int
 uvm_map_inherit(struct vm_map *map, vaddr_t start, vaddr_t end,
     vm_inherit_t new_inheritance)
 {
-       struct vm_map_entry *entry, *temp_entry;
+       struct vm_map_entry *entry;
        UVMHIST_FUNC("uvm_map_inherit"); UVMHIST_CALLED(maphist);
        UVMHIST_LOG(maphist,"(map=%p,start=0x%lx,end=0x%lx,new_inh=0x%lx)",
            map, start, end, new_inheritance);
@@ -2492,11 +2492,10 @@ uvm_map_inherit(struct vm_map *map, vadd

        VM_MAP_RANGE_CHECK(map, start, end);

-       if (uvm_map_lookup_entry(map, start, &temp_entry)) {
-               entry = temp_entry;
+       if (uvm_map_lookup_entry(map, start, &entry)) {
                UVM_MAP_CLIP_START(map, entry, start);
        } else {
-               entry = temp_entry->next;
+               entry = entry->next;
        }

        while ((entry != &map->header) && (entry->start < end)) {
@@ -2519,18 +2518,29 @@ uvm_map_inherit(struct vm_map *map, vadd
 int
 uvm_map_advice(struct vm_map *map, vaddr_t start, vaddr_t end, int new_advice)
 {
-       struct vm_map_entry *entry, *temp_entry;
+       struct vm_map_entry *entry;
        UVMHIST_FUNC("uvm_map_advice"); UVMHIST_CALLED(maphist);
        UVMHIST_LOG(maphist,"(map=%p,start=0x%lx,end=0x%lx,new_adv=0x%lx)",
            map, start, end, ...
From: Bob Beck
Date: Monday, February 1, 2010 - 10:48 am

I don't think you want to do that.. uvm_map_lookup_entry can screw
with that pointer even when it fails.

bad juju

From: Anton Maksimenkov
Date: Monday, February 1, 2010 - 12:56 pm

Sorry, I don't clearly understand what you mean. What is bad?
I think that "screw with that pointer even when it fails" is exactly
what uvm_map_lookup_entry must do and what we need here.

uvm_map_protect() exploit exactly same trick. And uvm_map_extract() do the same:
...
if (uvm_map_lookup_entry(srcmap, start, &entry)) {
    ...do some work with entry...
} else {
    /* "start" is not within an entry ... skip to next entry */
    if (flags & UVM_EXTRACT_CONTIG) {
        error = EINVAL;
        goto bad;    /* definite hole here ... */
    }
    entry = entry->next;
    fudge = 0;
}

AFAIS, uvm_map_lookup_entry() set the "entry" either to the
vm_map_entry which contain "start" address (success)
or to the last vm_map_entry in the map (fail).
If it fail and set "entry" to the last vm_map_entry then "entry =
entry->next" move entry to the map->header.
So loop with
 while ((entry != &map->header) && (entry->start < end)) {
will never be executed (it is right thing).

Overcomplicated using of RB_TREE in uvm_map_lookup_entry() making it
hard to understand.
I thinking about drop this RB_TREE out from uvm_map.c or completely
rewrite using of this RB_TREE.
-- 
antonvm

From: Anton Maksimenkov
Date: Monday, February 1, 2010 - 1:21 pm

Oh, sorry, I mean "in uvm_map_findspace()" of course.
In uvm_map_lookup_entry() the use of RB_TREE is understandable.
-- 
antonvm

From: Ariane van der Steldt
Date: Wednesday, February 3, 2010 - 8:33 am

I'm pretty sure that function is bugged: I think it doesn't do
wrap-around on the search for an entry. (Should be easily provable, just
hint=vm_map_max(map) when entering uvm_map_findspace().)
You won't notice in kernel space, since every caller will set
hint=vm_map_min(map), but I suspect a number of failed allocations in

Both functions are rather complex. And both functions pre-date random
allocation. All this map->hint business has no reason to remain, since
random allocation is here to stay and the hint has a high failure rate
since then (if it is used, which it usually isn't anyway).

The RB_TREE code in uvm_map_findspace is a pretty good solution given
the map entry struct. You will need to understand the RB_AUGMENT hack
that is only used in this place of the kernel though.

Ciao,
-- 
Ariane

From: Owain Ainsworth
Date: Wednesday, February 3, 2010 - 10:10 am

Yes. Henning has a bgpd core router where bgpd occasionally dies due to
an allocation failure and needs restarting (should be plenty of memory
left, it looks like map starvation). I'm 90% sure that this is the
cause.

It's not so simple to fix though. you can't just go straight to the
start of the map. Say i386 where W^X is done using segments. if you dump
something right at the other end of the segment, that pretty much screws
it. Perhaps going back to the min hint for the protection and trying to
push just a little bit down may work?

-0-
-- 
In defeat, unbeatable; in victory, unbearable.
		-- Winston Churchill, on General Montgomery

From: Anton Maksimenkov
Date: Wednesday, February 3, 2010 - 11:51 pm

Let me introduce my idea.

In fact, we have only two functions in uvm_map.c which make searching
in maps. These are uvm_map_lookup_entry() and uvm_map_findspace().
The uvm_map_lookup_entry() do search by address (VA), while
uvm_map_findspace() do search by free space.

And we have a RB_HEAD(uvm_tree, vm_map_entry) rbhead, which is
"indexed by address" (see uvm_compare()). Since that this RB_TREE
provide a "searching by address" option to us. This is what the
RB_TREE used for.
But when we try to use that RB_TREE for track "free space" and to
SEARCH by free space  we do dirty and ugly things!
Then we add a RB_AUGMENT hack (it is so ugly and hard to use right
with others that you can't find it anywhere in source tree... exept
the uvm_map.c) and other tricks... In the end we got very unclear
uvm_map.c code in these places, which is very hard to understand and
track, and which is overloaded and overcomplexed by that tricks and
crutches.
We must stop it, because it keeps hold our hands and brains.

We can do things very clear and easy. Let me show how exactly.

Let's just add second RB_TREE, which is "indexed by space", let's call
it RB_HEAD(uvm_tree_by_space, vm_map_entry) rbhead_by_space, for
example. That uvm_tree_by_space will contain vm_map_entry'es sorted by
free space. And that uvm_tree_by_space will be used only in
uvm_map_findspace() to search for  vm_map_entry with needed free
space. RB_NFIND() is our best friend here! Simple, reasonable fast,
clearly and easy to understand.
Actually, since we can have two or more vm_map_entry'es with equal
space, so each RB_TREE element will be the list of vm_map_entry'es
with that equal space. We can use some additional logic here when we
push/pop vm_map_entry from that list  we can pop vm_map_entry with
smallest start address, for example.

Then we must free the first RB_TREE (uvm_tree, "indexed by address")
from tracking space/ownspace and other nasty tricks, and stop using it
in uvm_map_findspace(). This uvm_tree must be ...
From: Ariane van der Steldt
Date: Thursday, February 4, 2010 - 6:50 am

uvm_map_findspace() is also used when searching by addr:
mmap(..., MAP_FIXED) requires a search by addr.
uvm_map_findspace() should also be able to take an addr constraint: like

Nah, RB_AUGMENT is easy. When an item in the tree is deleted, each node
in the tree that is altered (position or insertion/removal) has each of
its children and each of its parents RB_AUGMENTed. The RB_AUGMENT calls
are done in such a way that each node is process prior to

I use that trick in uvm_pmemrange. It's a nice algorithm, easy to
implement. Keep in mind however: you'll need to support random
allocation, or the diff will be rejected.

I was actually thinking along the same lines: rb-tree of free space,
ordered by size.
Let: start = first rb-entry with enough space for allocation
Let: end = rb_max(free tree)
Let: chosen = random chunk in [start, end]

You'll want indices on the tree, so the random generator can be given a
number to use. (If the number is too large, you'll get bias on the
largest segments, if too small, you won't consider them.)

Once you have 'chosen', you need to generate a random address inside.

Let: offset = random number between 0 and (end-start)
Now you have a random address inside chosen:
chosen.start + offset.

Disadvantage of the algorithm is that it requires 2 random numbers (the
current algorithm only requires 1). Advantage is that this algorithm
always produces a valid address, so no forward searching is necessary
anymore.

The address may have to be shifted because pmap_prefer says so. If
that's the case, do so whenever possible: it removes aliasing problems
on some architectures (afaik pmap has a way of dealing with it, but

You can't simply take the lowest start address. It would make addresses

RB_AUGMENT (as it is implemented) is a hack. However the idea behind
RB_AUGMENT is sound. It enables some algorithms that would be hideously
expensive without (for example, RB_AUGMENT can produce O(1) size lookups
and make the tree indexed). I agree that it is ...
From: Anton Maksimenkov
Date: Thursday, February 4, 2010 - 10:57 am

I remember about MAP_FIXED, just not mentioned it in my not-so-short message ;)
And I don't want 2 vm_map_entry per allocation, I only need to keep
each vm_map_entry in both trees. One vm_map_entry can contain 2
separate RB_TREE entries (for both trees), so each tree can work with
that vm_map_entry independantly.


Can anyone explain me what is the problem with i386 segments? Or
better supply some links to docs which explain it.
I don't understand what you mean - MAP_FIXED flag is the problem or it
used as workaround for some problem?
-- 
antonvm

From: Ariane van der Steldt
Date: Thursday, February 4, 2010 - 11:54 am

Oh, I didn't explain clearly what I meant there.

Suppose you have a 4MB entry with free memory.
You need to allocate, say, 4kB. This would split the entry in:
[1] free range in front of allocation
[2] allocated range
[3] free range behind allocation


The idea is that one part of the memory is executable, the other half is
not. We like the executable code of a process to be mapped in the
executable half, while we want data to not be in that part.
Paper by Theo has a good explanation on how the i386 works:
http://cvs.openbsd.org/papers/ven05-deraadt/index.html

Implementation can be seen in uvm_map_hint (the part between
#ifdef __i386__).

Wrt MAP_FIXED: it indicates that the caller wants the allocation on this
specific address. So you're not allowed to choose. This may mean the
range has to be broken in 3 parts, hence requiring 2 additional
map_entry (on top of the one you're editing).
Since the current code only requires one new entry, you'll be eating
entries twice as fast, which may starve uvm_km_getpage.

Ciao,
-- 
Ariane

From: Ariane van der Steldt
Date: Wednesday, February 3, 2010 - 8:06 am

The functional change is that a wrong advice on an empty range will now

I agree temp_entry is a redundant variable in both functions. I think
the compiler may already optimize that away (otoh, we are talking gcc

Ok with me.
-- 
Ariane

From: Anton Maksimenkov
Date: Thursday, April 22, 2010 - 8:56 pm

So, can I remind this simple cleanup?





--
antonvm

From: Ted Unangst
Date: Thursday, April 22, 2010 - 9:51 pm

Previous thread: LIFE GOOD NEWSLETTER no 60 by Akis Angelakis on Saturday, January 30, 2010 - 2:00 pm. (1 message)

Next thread: compilation error with moko kernel by dermiste on Sunday, January 31, 2010 - 9:35 am. (1 message)