[PATCH 07/31] lmb: Add reserve_lmb/free_lmb

Previous thread: [PATCH] rcu: don't call rcu_preempt_note_context_switch() in rcu_check_callbacks() by Lai Jiangshan on Sunday, March 28, 2010 - 7:47 pm. (11 messages)

Next thread: [PATCH 31/31] x86: make e820 to be __initdata by Yinghai Lu on Sunday, March 28, 2010 - 7:43 pm. (1 message)
From: Yinghai Lu
Date: Sunday, March 28, 2010 - 7:43 pm

They will check if the region array is big enough.

__check_and_double_region_array will try to double the region if that array spare
slots if not big enough.
find_lmb_area() is used to find good postion for new region array.
Old array will be copied to new array.

Arch code should provide to get_max_mapped, so the new array have accessiable
address

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/lmb.h |    4 ++
 mm/lmb.c            |   89 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 05234bd..95ae3f4 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -83,9 +83,13 @@ lmb_end_pfn(struct lmb_region *type, unsigned long region_nr)
 	       lmb_size_pages(type, region_nr);
 }
 
+void reserve_lmb(u64 start, u64 end, char *name);
+void free_lmb(u64 start, u64 end);
+void add_lmb_memory(u64 start, u64 end);
 u64 __find_lmb_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
 			 u64 size, u64 align);
 u64 find_lmb_area(u64 start, u64 end, u64 size, u64 align);
+u64 get_max_mapped(void);
 
 #include <asm/lmb.h>
 
diff --git a/mm/lmb.c b/mm/lmb.c
index d5d5dc4..9798458 100644
--- a/mm/lmb.c
+++ b/mm/lmb.c
@@ -551,6 +551,95 @@ int lmb_find(struct lmb_property *res)
 	return -1;
 }
 
+u64 __weak __init get_max_mapped(void)
+{
+	u64 end = max_low_pfn;
+
+	end <<= PAGE_SHIFT;
+
+	return end;
+}
+
+static void __init __check_and_double_region_array(struct lmb_region *type,
+			 struct lmb_property *static_region,
+			 u64 ex_start, u64 ex_end)
+{
+	u64 start, end, size, mem;
+	struct lmb_property *new, *old;
+	unsigned long rgnsz = type->nr_regions;
+
+	/* Do we have enough slots left ? */
+	if ((rgnsz - type->cnt) > max_t(unsigned long, rgnsz/8, 2))
+		return;
+
+	old = type->region;
+	/* Double the array size */
+	size = sizeof(struct lmb_property) * rgnsz * 2;
+	if (old == static_region)
+		start = ...
From: Michael Ellerman
Date: Monday, March 29, 2010 - 5:22 am

^ This is (sort of) what lmb.rmo_size represents. So maybe instead of
adding this function, we could just say that the arch code needs to set
rmo_size up with an appropriate value, and then use that below. Though
maybe that's conflating things.


Doesn't this mean that if I call lmb_alloc() or lmb_free() too many
times then I'll potentially run out of space? So doesn't that
essentially break the existing API?

It seems to me that rather than adding these "special" routines that
check for enough space on the way in, instead you should be checking in
lmb_add_region() - which is where AFAICS all allocs/frees/reserves
eventually end up if they need to insert a new region.

cheers


From: Yinghai Lu
Date: Monday, March 29, 2010 - 9:45 am

ok

will have another patch following this patchset. to use rmo_size replace get_max_mapped()

long __init_lmb lmb_add(u64 base, u64 size)
{
        struct lmb_region *_rgn = &lmb.memory;

        /* On pSeries LPAR systems, the first LMB is our RMO region. */
        if (base == 0)
                lmb.rmo_size = size;

        return lmb_add_region(_rgn, base, size);

}

looks scary.

No, I didn't touch existing API, arches other than x86 should have little change about 
lmb.memory.region
lmb.reserved.region

later i prefer to replace lmb_alloc with find_lmb_area + reserve_lmb.

Thanks

Yinghai
--

From: Michael Ellerman
Date: Monday, March 29, 2010 - 3:20 pm

No don't, Benh's idea was better. Leave rmo_size for now, we can clean
that up later.

We just need a lmb.alloc_limit and a lmb_set_alloc_limit() which arch
code calls when it knows what the alloc limit is (and can call multiple
times during boot). Or maybe it should be called "default_alloc_limit",

It's not really scary, and it gives you a hint where the code came from
originally :)

We can remove that later though, with some powerpc code to detect the

But that's my point. You shouldn't need to touch the existing API, and
you shouldn't need to add a new parallel API. You should just be able to
add the logic for doubling the array in the lmb core, and then everyone
gets dynamically expandable lmb. I don't see any reason why we want to

Why? The existing code has been working for years and is well tested?

cheers


From: Yinghai Lu
Date: Monday, March 29, 2010 - 3:37 pm

that will have too much change for all x86 caller.

that set of API is __init, and will be freed later. 


We need to find_lmb_area to find good position for new reserved region array.

static void __init __reserve_lmb(u64 start, u64 end, char *name)
{
        __check_and_double_region_array(&lmb.reserved, &lmb_reserved_region[0], start, end);
        lmb_reserve(start, end - start);
}

with the execlude area in __check_and_double_region_lmb() will not put the new array overlap with area that we are going to reserve.

and find_lmb_area + reserve_lmb should be identical to lmb_alloc...

Thanks

Yinghai Lu
--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 4:34 pm

No. Do it now. get_max_mapped() sucks as an identifier.

Cheers,
Ben.

 

--

From: Yinghai Lu
Date: Monday, March 29, 2010 - 4:53 pm

ok, can i reuse rmo_size, or introduce one new member in struct lmb.

default_limit?

Thanks

Yinghai
--

From: Michael Ellerman
Date: Monday, March 29, 2010 - 9:13 pm

alloc_limit or default_alloc_limit

cheers
From: Yinghai Lu
Date: Monday, March 29, 2010 - 9:21 pm

looks that is not accurate.

if someone want to find some area, but not going to access that range, then we should let them alloc it.

how about access_limit?

Thanks

Yinghai Lu
--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 10:29 pm

No, no you don't get it. default_alloc_limit is fine. It's the -default-
limit. It doesn't apply to lmb_alloc_base() that takes an explicit
limit...

Something along those lines.

Cheers,
Ben.


--

From: Yinghai Lu
Date: Monday, March 29, 2010 - 10:40 pm

fine, will use default_limit.

Yinghai
--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 10:24 pm

lmb.default_limit sounds good to me.

Cheers,
Ben.


--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 4:31 pm

I still don't totally understand why he needs a find_lmb_area()
anyways. 

It might be justified ... or not. I just want it to be better
documented.

Cheers,
Ben.


--

From: Yinghai Lu
Date: Monday, March 29, 2010 - 5:03 pm

current changelog for that

------------------

Subject: [PATCH 6/31] lmb: Add lmb_find_area()

It will try find area according with size/align in specified range (start, end).

Need use it find correct buffer for new lmb.reserved.region.

also make it more easy for x86 to use lmb.
x86 early_res is using find/reserve pattern instead of alloc.

lmb_find_area() will honor goal

When we need temporary buff for range array etc for range work, if We are using
lmb_alloc(), We will need to add some post fix code for buffer that is used
by range array, because it is in the lmb.reserved already.

----------------

in short: It could make us to avoid use the range that we are going to reserve,
      when we try to get new position new lmb.reserved.region.

Thanks

Yinghai
--

From: Yinghai Lu
Date: Monday, March 29, 2010 - 11:12 pm

1. you want to reserve rangeA
2. before that will check if region array is big enough,
3. if region is not big enough, will call lmb_alloc to get new range.
   lmb_alloc could return rangB that is overlapped with rangeA
4. current lmb_alloc only honor limit, and doesn't honor low limit.

another usage is: for temporary buffer, like range array for subtraction.

that is some difference between them, and lmb_alloc doesn't honor low limit.

we can provide
lmb_find_area
lmb_reserve_area
lmb_free_area

and use lmb_find_area + lmb_reserve_area to get one lmb_alloc()

x86 sometime is using find_lmb_area to find big area, and use those area and later reserve actually used area.

you could use lmb_alloc, and later lmb_free not used. that is equal to lmb_find + lmb_reserve + lmb_free ...

Thanks

Yinghai Lu
--

From: Michael Ellerman
Date: Monday, March 29, 2010 - 11:46 pm

So instead you do it the other way.

1. you want to reserve rangeA
2. you reserve rangeA
3. if reserving rangeA consumed a slot in the array then you check if
you have at least two free slots. If not you realloc. You don't need any
special tricks because you have space to lmb_alloc() a new area and move
everything over.

cheers
From: Yinghai Lu
Date: Monday, March 29, 2010 - 11:57 pm

so that is check it later. should work.

one less find_lmb_area user.

Thanks

Yinghai
--

From: Benjamin Herrenschmidt
Date: Tuesday, March 30, 2010 - 2:30 pm

I see. This is a direct consequence of you wanting to use find/reserve
instead of alloc tho :-)

This is also easily fixed. Instead of doing the resize "on demand" like
I originally proposed, do it at the end of reserve/alloc. If the number
of free entries is down to 1 or 0, then alloc a new chunk.

Of course, all of that requires that reservations done from FW to take
memory out because it must not be accessed need to be all done before
the first grow of the array, and so the static array must be sized
accordingly. We may want to catch these things too. We don't want to


If you want a low and a high limit, then add the low limit to
lmb_alloc_base(), it's easy to fix all callers, there aren't many, and
make not one but two defaults for lmb_alloc(), one for the low limit and

I still don't understand why you insist on using find + reserve instead
of alloc in x86 land. Can you give me a proper explanation as to why
that is needed since it seems to be causing problems, and so far and I

That's very wrong. If you use something, alloc/reserve it. You can

Sure, then just do alloc + later free.

Cheers,
Ben.


--

From: Yinghai Lu
Date: Tuesday, March 30, 2010 - 3:42 pm

ok, I will replace that later after it get stable.
but at first will expose the find_lmb_area.

will send out -v11, please check that version.

Thanks

Yinghai
--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 10:26 pm

I'm not too sure I follow you. For the resizing, I would just basically
call a low level variant of alloc (__lmb_alloc ?) that explicitely
doesn't honor the total-2 "reserved" entries in the array.

Ie. It should all be one single find/allocation function.

In fact, you want to split lmb_find from lmb_reserve, then just make
lmb_alloc use the above, I don't want 2 implementations of the same
thing (maybe call it __lmb_find to expose the fact that it's a low level
function to avoid for normal use).

Cheers,
Ben.


--

From: Benjamin Herrenschmidt
Date: Monday, March 29, 2010 - 2:49 pm

Well... not quite.

The RMO (which really is the RMA, historical misnaming) is the region of
memory we can access very early during boot (in real mode on ppc64 but
I plan to use it to represent the boot-time fixed mapping on ppc32 at
some stage). It's not strictly equivalent to max lowmem. However, on
ppc64, it happens to be the size of the first added LMB entry/

In any case, the LMBs should really not care. They allocate where you
tell them to. 

IE. That stuff is arch specific enough that I suspect we should just
move it out, while the concept of max_lowmem is common enough (at least
for 32-bit archs) that I'm happy to have some provisions for it inside
the LMB core.

Maybe what we need is an arch call to set the allocation "limit". It
could be set in stages during boot. To the initial mapped memory (bolted
TLB) on ppc32 very early, and then pushed up to max_low_pfn as soon as
the full MMU setup is done for example. IE. All we need is an
lmb_set_alloc_limit() called by the arch in the right spots, that
defines the default allocation limit for lmb_alloc() though of course
lmb_alloc_base() can be used by callers to enforce explicit limits.

Cheers,
Ben.


--

Previous thread: [PATCH] rcu: don't call rcu_preempt_note_context_switch() in rcu_check_callbacks() by Lai Jiangshan on Sunday, March 28, 2010 - 7:47 pm. (11 messages)

Next thread: [PATCH 31/31] x86: make e820 to be __initdata by Yinghai Lu on Sunday, March 28, 2010 - 7:43 pm. (1 message)