Re: [PATCH 0/4] big chunk memory allocator v4

Previous thread: [PATCH 0/2] Generic hardware error reporting support by Huang Ying on Friday, November 19, 2010 - 1:10 am. (40 messages)

Next thread: Re: [RFC/RFT PATCH v3] sched: automated per tty task groups ... ps jaxk sid ... You are using the wrong number. by Robert de Bath on Friday, November 19, 2010 - 1:22 am. (3 messages)
From: KAMEZAWA Hiroyuki
Date: Friday, November 19, 2010 - 1:10 am

Hi, this is an updated version. 

No major changes from the last one except for the page allocation function.
Removed RFC.

Order of patches is

[1/4] move some functions from memory_hotplug.c to page_isolation.c
[2/4] search physically contiguous range suitable for big chunk alloc.
[3/4] allocate big chunk memory based on memory hotplug(migration) technique
[4/4] modify page allocation function.

For what:

  I hear there are requirements to allocate a chunk of pages larger than
  MAX_ORDER. Currently, some (embedded) devices use a big memory chunk. To get
  it, they hide some memory range with a boot option (mem=) and use the hidden
  memory for their own purposes. But this looks like a missing feature in
  memory management.

  This patch adds 
	alloc_contig_pages(start, end, nr_pages, gfp_mask)
  to allocate a chunk of nr_pages pages from the physical address range
  [start, end). It uses logic similar to memory unplug, which tries to
  offline the pages in [start, end). With this, drivers can allocate a 30M,
  128M or much bigger memory chunk on demand (I allocated a 1G chunk in my
  test).

  But yes, because of fragmentation, this cannot guarantee 100% success.
  If alloc_contig_pages() is called at system boot, or if a movable zone is
  used, the allocation succeeds at a high rate.

  I tested this on x86-64, and it seems to work as expected. But feedback from
  embedded folks is appreciated, because I think they are the main users of
  this function.

Thanks,
-Kame


  


--

From: KAMEZAWA Hiroyuki
Date: Friday, November 19, 2010 - 1:12 am

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Memory hotplug contains logic for making the pages in a specified pfn range
unused. So some of its core logic can be reused for other purposes, such as
allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This makes it easier to add a large-allocation function
to page_isolation.c based on the memory-unplug technique.
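One of the helpers being moved, test_pages_in_a_zone(), confirms that every page in [start, end) belongs to the same zone. The following is only a minimal user-space model of that check, not the kernel code: the zone_of_pfn[] table and the name model_pages_in_a_zone() are hypothetical stand-ins for struct page and page_zone(), and pfn holes are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: zone id of each pfn (stand-in for page_zone()). */
#define MODEL_NR_PFNS 16
static const int zone_of_pfn[MODEL_NR_PFNS] = {
	0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
};

/* Return true if every pfn in [start_pfn, end_pfn) lies in a single zone. */
static bool model_pages_in_a_zone(size_t start_pfn, size_t end_pfn)
{
	assert(start_pfn < end_pfn && end_pfn <= MODEL_NR_PFNS);
	int zone = zone_of_pfn[start_pfn];

	for (size_t pfn = start_pfn + 1; pfn < end_pfn; pfn++)
		if (zone_of_pfn[pfn] != zone)
			return false;	/* range straddles a zone boundary */
	return true;
}
```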

Changelog: 2010/10/26
 - adjusted to mmotm-1024 + Bob's 3 clean ups.
Changelog: 2010/10/21
 - adjusted to mmotm-1020

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |    7 ++
 mm/memory_hotplug.c            |  108 ---------------------------------------
 mm/page_isolation.c            |  111 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 108 deletions(-)

Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
 
 #endif
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -615,114 +615,6 @@ int is_mem_section_removable(unsigned lo
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
-	unsigned long pfn;
-	struct zone *zone = NULL;
-	struct ...
From: Minchan Kim
Date: Sunday, November 21, 2010 - 8:07 am

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Friday, November 19, 2010 - 1:14 am

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Unlike memory hotplug, the address may not matter when allocating a
contiguous memory range. IOW, if a requester wants to allocate 100M of
contiguous memory, the placement of the allocated memory may not be a
problem. So "finding a range of memory which seems to be MOVABLE" is required.

This patch adds a function to isolate a given length of memory within
[start, end). It returns the pfn of the first page of the isolated contiguous
chunk of that length within [start, end).

If no_search=true is passed as an argument, the start address is always the
same as the specified "base" address.
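The search described here can be modelled in user space: walk aligned candidate start pfns in [base, end) and accept the first window of nr_pages in which every page is either free or movable. This is a simplified sketch, not the patch's __get_contig_block(); the page-state enum, the state array and the name model_find_contig() are assumptions, and a real implementation also has to look at pageblock migratetypes and zone boundaries.

```c
#include <assert.h>
#include <stdbool.h>

enum model_page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_UNMOVABLE };

/*
 * Hypothetical model of the range search.  Returns the first pfn of an
 * aligned window of nr_pages inside [base, end) that contains no
 * unmovable page, or -1 if there is none.  With no_search set, only the
 * window starting exactly at base is tried.
 */
static long model_find_contig(const enum model_page_state *state,
			      unsigned long base, unsigned long end,
			      unsigned long nr_pages, unsigned int align_order,
			      bool no_search)
{
	unsigned long align = 1UL << align_order;
	/* Round base up to the requested alignment. */
	unsigned long pfn = (base + align - 1) & ~(align - 1);

	assert(nr_pages > 0);
	if (no_search)
		pfn = base;

	for (; pfn + nr_pages <= end; pfn += align) {
		unsigned long i;

		for (i = 0; i < nr_pages; i++)
			if (state[pfn + i] == PAGE_UNMOVABLE)
				break;
		if (i == nr_pages)
			return (long)pfn;	/* every page is free or movable */
		if (no_search)
			break;
	}
	return -1;
}
```

Free and movable pages are treated alike here because, after isolation, movable pages are expected to be reclaimed or migrated away; only unmovable pages disqualify a window.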

After isolation, free memory within this area will never be allocated.
But some pages will remain as "Used/LRU" pages. They should be dropped by
page reclaim or migration.
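The isolate-then-migrate step can be sketched as follows. This is a hypothetical user-space model (the enum, PAGE_TAKEN and model_isolate_and_migrate() are illustrative names, not kernel APIs): pages inside the isolated range are never used as migration targets, each movable page is copied to a free page outside the range, and page reclaim is not modelled at all.

```c
#include <assert.h>

enum model_page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_UNMOVABLE, PAGE_TAKEN };

/*
 * Hypothetical model of "isolate, then migrate the remaining used pages".
 * Pages inside [start, end) are treated as isolated and are never chosen
 * as migration targets.  Every movable page in the range is moved to a
 * free page outside it; on success the whole range is marked PAGE_TAKEN
 * (handed to the caller) and the number of migrated pages is returned.
 * Returns -1 if an unmovable page is found or no free target is left.
 */
static int model_isolate_and_migrate(enum model_page_state *state,
				     unsigned long nr_total,
				     unsigned long start, unsigned long end)
{
	unsigned long pfn, dst = 0;
	int migrated = 0;

	assert(start < end && end <= nr_total);
	for (pfn = start; pfn < end; pfn++)
		if (state[pfn] == PAGE_UNMOVABLE)
			return -1;	/* this range cannot be offlined */

	for (pfn = start; pfn < end; pfn++) {
		if (state[pfn] != PAGE_MOVABLE)
			continue;
		/* Find a free target page outside the isolated range. */
		while (dst < nr_total &&
		       ((dst >= start && dst < end) || state[dst] != PAGE_FREE))
			dst++;
		if (dst == nr_total)
			return -1;	/* no memory left for migration targets */
		state[dst] = PAGE_MOVABLE;	/* the copy lands here */
		state[pfn] = PAGE_FREE;		/* the source page becomes free */
		migrated++;
	}

	for (pfn = start; pfn < end; pfn++)
		state[pfn] = PAGE_TAKEN;
	return migrated;
}
```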

Changelog: 2010-11-17
 - fixed some coding style issues (if-then-else)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/page_isolation.c |  146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -7,6 +7,7 @@
 #include <linux/pageblock-flags.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
+#include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
 #include "internal.h"
 
@@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
 out:
 	return ret;
 }
+
+/*
+ * Functions for getting contiguous MOVABLE pages in a zone.
+ */
+struct page_range {
+	unsigned long base; /* Base address of searching contigouous block */
+	unsigned long end;
+	unsigned long pages;/* Length of contiguous block */
+	int align_order;
+	unsigned long align_mask;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+	struct page_range *blockinfo = arg;
+	unsigned long ...
From: Minchan Kim
Date: Sunday, November 21, 2010 - 8:21 am

Acked-by: Minchan Kim <minchan.kim@gmail.com>

Just some trivial comments below.

Intentionally, I don't add Reviewed-by.
Instead, I add Acked-by since I support this work.

I reviewed your old version but have forgotten it. :(




-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Sunday, November 21, 2010 - 5:11 pm

On Mon, 22 Nov 2010 00:21:31 +0900


ok.

Thanks,
-Kame

--

From: Minchan Kim
Date: Monday, November 22, 2010 - 4:20 am

On Fri, Nov 19, 2010 at 5:14 PM, KAMEZAWA Hiroyuki

Nitpick.
You used nr_pages elsewhere.

Do we really need this field 'align_mask'?




Could we make sure that passing __trim_zone guarantees every pfn is in the
zone we want?
Repeating the zone check is rather annoying.
I mean, __get_contig_block or __trim_zone already checks the zone.

Could we check get_pageblock_migratetype(page) == MIGRATE_MOVABLE in

If the base is 0, isn't it impossible to return pfn 0?
That can't happen on x86 with FLATMEM, but I think it might be possible on
some architectures. Just guessing.

How about returning a negative value, and returning the first page pfn and last
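The calling convention suggested here (an error code as the return value, the found range through out-parameters) can be sketched like this. It is a hypothetical illustration: model_find_range(), the movable[] byte map and the inclusive last_pfn are assumptions, not the patch's interface. The point is that pfn 0 then becomes an ordinary, valid result instead of colliding with a failure value.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical convention: return 0 or a negative errno, and report the
 * found range via out-parameters so that pfn 0 is a legal answer.
 * movable[pfn] != 0 means the page can be freed or migrated.
 */
static int model_find_range(const unsigned char *movable, unsigned long base,
			    unsigned long end, unsigned long nr_pages,
			    unsigned long *first_pfn, unsigned long *last_pfn)
{
	unsigned long pfn, run = 0;

	assert(first_pfn && last_pfn && nr_pages > 0);
	for (pfn = base; pfn < end; pfn++) {
		run = movable[pfn] ? run + 1 : 0;	/* length of current run */
		if (run == nr_pages) {
			*first_pfn = pfn + 1 - nr_pages;
			*last_pfn = pfn;		/* inclusive */
			return 0;
		}
	}
	return -ENOMEM;
}
```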



-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, November 23, 2010 - 5:15 pm

On Mon, 22 Nov 2010 20:20:14 +0900




I'm not sure that's very good. The pageblock type can be fragmented, and even
if the pageblock type is not MIGRATE_MOVABLE, all pages in the pageblock may
be free.

Hmm, will add a check.
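One possible shape for such a check, written as a hypothetical user-space prefilter (the enum, struct model_pageblock and model_block_usable() are illustrative, not kernel types): a pageblock is a candidate if it is MOVABLE, or if all of its pages are free whatever its type. A MOVABLE block would still have to pass the per-page checks afterwards.

```c
#include <assert.h>
#include <stdbool.h>

enum model_migratetype { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE };

struct model_pageblock {
	enum model_migratetype type;
	unsigned int nr_pages;	/* pages in the block */
	unsigned int nr_free;	/* of which currently free */
};

/*
 * Hypothetical prefilter: because pageblock types can be fragmented, the
 * type alone under-reports usable blocks, so a fully free block of any
 * type is accepted as well.
 */
static bool model_block_usable(const struct model_pageblock *pb)
{
	assert(pb->nr_free <= pb->nr_pages);
	return pb->type == MT_MOVABLE || pb->nr_free == pb->nr_pages;
}
```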

Thanks,

--

From: KAMEZAWA Hiroyuki
Date: Friday, November 19, 2010 - 1:15 am

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Add a function to allocate contiguous memory larger than MAX_ORDER.
The main difference from the usual page allocator is that this uses the
memory offline technique (isolate pages and migrate the remaining pages).

I think this is not a 100% solution because we can't avoid fragmentation,
but we have the kernelcore= boot option and can create a MOVABLE zone. That
helps us allocate a contiguous range on demand.

The new function is

  alloc_contig_pages(base, end, nr_pages, alignment)

This function will allocate nr_pages contiguous pages from the range
[base, end). If [base, end) is bigger than nr_pages, some pfn which
meets the alignment will be allocated. If the alignment is smaller than
MAX_ORDER, it will be raised to MAX_ORDER.
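The alignment rule can be sketched as below. It is a hypothetical helper, not code from the patch: MODEL_MAX_ORDER = 11 is an assumed value (MAX_ORDER is configuration-dependent), and the sketch follows the description literally by treating MAX_ORDER itself as the minimum alignment order.

```c
#include <assert.h>

/* Assumed value for illustration; MAX_ORDER depends on the configuration. */
#define MODEL_MAX_ORDER 11

/*
 * Hypothetical helper: raise align_order to at least MODEL_MAX_ORDER,
 * round base up to that alignment, and return the first candidate start
 * pfn, or -1 if nr_pages no longer fits below end.
 */
static long model_first_candidate(unsigned long base, unsigned long end,
				  unsigned long nr_pages, unsigned int align_order)
{
	unsigned long align, pfn;

	assert(nr_pages > 0);
	if (align_order < MODEL_MAX_ORDER)
		align_order = MODEL_MAX_ORDER;
	align = 1UL << align_order;
	pfn = (base + align - 1) & ~(align - 1);	/* round base up */
	if (pfn >= end || end - pfn < nr_pages)
		return -1;
	return (long)pfn;
}
```

For example, with base = 1 and a requested alignment order of 0, the first candidate becomes pfn 2048, because the order is raised to 11.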

__alloc_contig_pages() has many more arguments.


Some drivers allocate contiguous pages via bootmem or by hiding some memory
from the kernel at boot. But if contiguous pages are needed only in some
situations, the kernelcore= boot option plus page migration is an alternative.

Changelog: 2010-11-19
 - removed no_search
 - removed some drain_ functions because they are heavy.
 - check -ENOMEM case

Changelog: 2010-10-26
 - support gfp_t
 - support zonelist/nodemask
 - support [base, end) 
 - support alignment

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |   15 ++
 mm/page_alloc.c                |   29 ++++
 mm/page_isolation.c            |  242 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 286 insertions(+)

Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/page-isolation.h>
 #include <linux/pageblock-flags.h>
+#include <linux/swap.h>
 #include <linux/memcontrol.h>
 #include <linux/migrate.h>
 #include <linux/memory_hotplug.h>
@@ -396,3 ...
From: Minchan Kim
Date: Sunday, November 21, 2010 - 8:25 am

Acked-by: Minchan Kim <minchan.kim@gmail.com>


We need to #include <linux/bootmem.h> to use max_pfn.

-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Sunday, November 21, 2010 - 5:13 pm

On Mon, 22 Nov 2010 00:25:56 +0900

will add that.

Thanks,
-Kame

--

From: Minchan Kim
Date: Monday, November 22, 2010 - 4:44 am

On Fri, Nov 19, 2010 at 5:15 PM, KAMEZAWA Hiroyuki

And later we can use compaction and reclaim, too.


I understand the goal of the function.

Personally, I don't like the function name.
How about "__adjust_search_range"?


Why do we have to care about __GFP_IO|__GFP_FS?




-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, November 23, 2010 - 5:20 pm

On Mon, 22 Nov 2010 20:44:03 +0900

Ah, yes. I'll check what this was for and remove it.

Thanks,
-Kame

--

From: KAMEZAWA Hiroyuki
Date: Friday, November 19, 2010 - 1:16 am

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Old story.
Because we cannot predict which memory section will be offlined next,
hotremove_migrate_alloc() just uses alloc_page(), i.e. it makes no decision
about where the page should be migrated to. Given the nature of memory
hotplug, a section near the one being removed is likely to be removed next.
So migrating pages to the same node as the original page doesn't make sense
in many cases; it just increases load. The migration destination page is
allocated from the node where the offlining script runs.

Now, contiguous-alloc uses do_migrate_range(). In this case, the migration
destination node should be the same as the node of the migration source page.

This patch modifies hotremove_migrate_alloc() to take a "nid" argument.
Memory hotremove passes -1, so pages are still moved to the node where the
offlining script runs... no behavior change.
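The node selection described above reduces to a small rule, sketched here as a hypothetical helper (model_migration_target_node() is an illustrative name, not a kernel function). It assumes, as the description states, that -1 means "the node the offlining script runs on", while the contiguous allocator passes the source page's node.

```c
#include <assert.h>

/*
 * Hypothetical model of choosing the node for a migration target page.
 * nid < 0 keeps the old memory-hotremove behaviour (allocate on the node
 * the offlining task runs on); otherwise the caller's node, e.g. the
 * source page's node for contiguous allocation, is used so the copy
 * stays local.
 */
static int model_migration_target_node(int nid, int current_nid)
{
	assert(current_nid >= 0);
	return nid < 0 ? current_nid : nid;
}
```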

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page-isolation.h |    3 ++-
 mm/memory_hotplug.c            |    2 +-
 mm/page_isolation.c            |   21 ++++++++++++++++-----
 3 files changed, 19 insertions(+), 7 deletions(-)

Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -41,7 +41,8 @@ extern void alloc_contig_freed_pages(uns
 
 int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
 unsigned long scan_lru_pages(unsigned long start, unsigned long end);
-int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
+int do_migrate_range(unsigned long start_pfn,
+	unsigned long end_pfn, int node);
 
 /*
  * For large alloc.
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -724,7 +724,7 @@ repeat:
 
 	pfn = ...
From: Minchan Kim
Date: Monday, November 22, 2010 - 5:01 am

On Fri, Nov 19, 2010 at 5:16 PM, KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim
--

From: Andrew Morton
Date: Friday, November 19, 2010 - 1:56 pm

On Fri, 19 Nov 2010 17:10:33 +0900

So this is an alternative implementation for the functionality offered

From where I sit, feedback from the embedded guys is *vital*, because
they are indeed the main users.

Michal, I haven't made a note of all the people who are interested in
and who are potential users of this code.  Your patch series has a
billion cc's and is up to version 6.  Could I ask that you review and
test this code, and also hunt down other people (probably at other
organisations) who can do likewise for us?  Because until we hear from
those people that this work satisfies their needs, we can't really
proceed much further.

Thanks.



--

From: KAMEZAWA Hiroyuki
Date: Sunday, November 21, 2010 - 5:04 pm

On Fri, 19 Nov 2010 12:56:53 -0800

Yes, this will be a backend for that kind of work.

I think there are two ways to allocate contiguous pages larger than MAX_ORDER.

1) hide some memory at boot and add another memory allocator.
2) support a range allocator as [start, end)

This is an attempt at 2). I used the memory-hotplug technique because I know it to some extent.
This patch itself has no "map" or "management" function, so it should be

yes. please.

Thanks,
-Kame

--

From: Michał Nazarewicz
Date: Tuesday, November 23, 2010 - 8:46 am

As a matter of fact, CMA's v6 tries to use code "borrowed" from the alloc_contig_pages()
patches.

The most important difference is that alloc_contig_pages() would look for a chunk
of memory that can be allocated and then perform migration whereas CMA assumes that
regions it controls are always "migratable".


Yes, this is also a valid point.  From my use cases, the alloc_contig_pages()

Ah, yes...  I was thinking about shrinking the cc list but didn't want to
seem rude or anything removing ppl who have shown interest in the previous

A few things, then:

1. As Felipe mentioned, on ARM it is often desired to have the memory
    mapped as non-cacheable, which most often means that the memory never
    reaches the page allocator.  This means that alloc_contig_pages()
    would not be suitable for cases where one needs such memory.

    Or could this be overcome by adding the memory back as highmem?  But
    then, it would force highmem support to be compiled in even if the
    platform does not really need it.

2. Device drivers should not, by themselves, know which ranges of memory to
    allocate from.  Moreover, some device drivers could require allocating
    different buffers from different ranges.  As such, this would require
    some management code on top of alloc_contig_pages().

3. When posting hwmem, Johan Mossberg mentioned that he'd like to see the
    notion of "pinning" chunks (so that unpinned chunks can be moved
    around to defragment memory when hardware does not use them).  This
    would again require some management code on top of
    alloc_contig_pages().

4. I might be mistaken here, but the way I understand ZONE_MOVABLE to work
    is that it is cut off from the end of memory.  Or am I talking nonsense?
    My concern is that at least one chip I'm working with requires
    allocations from different memory banks, which would basically mean that
    there would have to be two movable zones, i.e.:

    +-------------------+-------------------+
    | Memory ...
From: KAMEZAWA Hiroyuki
Date: Tuesday, November 23, 2010 - 5:36 pm

On Tue, 23 Nov 2010 16:46:03 +0100

I'll continue to update the patches; you can freely reuse my code and
integrate this set into yours. I worked on this first for EMBEDDED, but I want
this to be a _generic_ function for general-purpose architectures.
There may be people who want a 1G page on a host with tons of free memory.


Thanks,
-Kame
 

--

From: Felipe Contreras
Date: Sunday, November 21, 2010 - 5:30 pm

On Fri, Nov 19, 2010 at 10:56 PM, Andrew Morton

Actually, that's not needed any more now; we can use memblock:

As I've explained before, a contiguous memory allocator would be nice,
but on ARM many drivers not only need contiguous memory, but
non-cacheable, and this requires removing the memory from normal
kernel mapping in early boot.

Cheers.

-- 
Felipe Contreras
--

From: Kleen, Andi
Date: Monday, November 22, 2010 - 1:59 am

I see them more as orthogonal: Michal's code relies on preallocation
and manages the memory after that.

This code supplies the infrastructure to replace preallocation
with just using movable zones.

-Andi


--

From: Michał Nazarewicz
Date: Tuesday, November 23, 2010 - 8:44 am

Yes and no.  The v6 version adds not-yet-finished support for sharing
the preallocated blocks with the page allocator (so if CMA is not using the
memory, the page allocator can allocate it, and when CMA finally wants to
use it, the allocated pages are migrated).

In the v6 implementation I have added a new migration type (I cannot seem
to find who first proposed such an approach).  Once I finish debugging the
code, I'll try to make things work without adding an additional entity (that
is, a new migration type).

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
