Hi, this is an updated version.

No major changes from the last one except for the page allocation function.
Removed RFC.

Order of patches is:

[1/4] move some functions from memory_hotplug.c to page_isolation.c
[2/4] search physically contiguous range suitable for big chunk alloc.
[3/4] allocate big chunk memory based on memory hotplug (migration) technique
[4/4] modify page allocation function.

For what:

I hear there are requirements to allocate a chunk of pages which is larger
than MAX_ORDER. Now, some (embedded) devices use a big memory chunk. To use
memory, they hide some memory range by boot option (mem=) and use the hidden
memory for their own purpose. But this seems to be a missing feature in memory
management.

This patch adds
	alloc_contig_pages(start, end, nr_pages, gfp_mask)
to allocate a chunk of pages whose length is nr_pages from the [start, end)
physical address range. This uses logic similar to memory-unplug, which tries
to offline [start, end) pages. With this, drivers can allocate a 30M or 128M or
much bigger memory chunk on demand. (I allocated a 1G chunk in my test.)

But yes, because of fragmentation, this cannot guarantee 100% alloc.
If alloc_contig_pages() is called at system boot up or movable_zone is used,
this allocation succeeds at a high rate.

I tested this on x86-64, and it seems to work as expected. But feedback from
embedded guys is appreciated because I think they are the main users of this
function.

Thanks,
-Kame
--
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Memory hotplug is logic for making pages unused in a specified range
of pfns. So, some of its core logic can be used for other purposes, such as
allocating a very large contiguous memory block.
This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps in adding a function for large allocations
to page_isolation.c using the memory-unplug technique.
Changelog: 2010/10/26
- adjusted to mmotm-1024 + Bob's 3 clean ups.
Changelog: 2010/10/21
- adjusted to mmotm-1020
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page-isolation.h | 7 ++
mm/memory_hotplug.c | 108 ---------------------------------------
mm/page_isolation.c | 111 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 108 deletions(-)
Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
#endif
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -615,114 +615,6 @@ int is_mem_section_removable(unsigned lo
}
/*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct zone *zone = NULL;
- struct ...
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim
--
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Unlike memory hotplug, when allocating a contiguous memory range, the address
may not matter. IOW, if a requester of memory wants to allocate 100M of
contiguous memory, the placement of the allocated memory may not be a problem.
So, "finding a range of memory which seems to be MOVABLE" is required.
This patch adds a function to isolate a length of memory within [start, end).
This function returns the pfn of the first page of an isolated contiguous chunk
of the given length within [start, end).
If no_search=true is passed as an argument, the start address is always the
same as the specified "base" address.
After isolation, free memory within this area will never be allocated.
But some pages will remain as "Used/LRU" pages. They should be dropped by
page reclaim or migration.
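The search described above can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the code in this patch: the page_state enum, the flat map[] indexed by pfn, and the name find_contig_block() are all hypothetical. It also reports the result through an out-parameter rather than returning the pfn directly, which is a choice of the sketch so that pfn 0 stays a valid answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for this user-space model (not kernel types). */
enum page_state { PAGE_FREE, PAGE_LRU, PAGE_UNMOVABLE };

/*
 * Scan pfns in [start, end) for the first run of nr_pages pages that are
 * all free or on the LRU, beginning at a pfn aligned to (1 << align_order).
 * map[] is indexed by pfn and must cover [0, end).
 * Returns 0 and stores the first pfn in *found, or -1 if no such run exists.
 */
int find_contig_block(const enum page_state *map,
                      unsigned long start, unsigned long end,
                      unsigned long nr_pages, int align_order,
                      unsigned long *found)
{
    unsigned long align = 1UL << align_order;
    unsigned long base, pfn;

    assert(map != NULL && found != NULL);
    assert(nr_pages > 0 && start <= end);

    /* Round the first candidate up to the requested alignment. */
    base = (start + align - 1) & ~(align - 1);

    while (base + nr_pages <= end) {
        /* Walk the candidate run until an unmovable page blocks it. */
        for (pfn = base; pfn < base + nr_pages; pfn++)
            if (map[pfn] == PAGE_UNMOVABLE)
                break;

        if (pfn == base + nr_pages) {
            *found = base;
            return 0;
        }

        /* Restart just past the blocking page, realigned. */
        base = (pfn + 1 + align - 1) & ~(align - 1);
    }
    return -1;
}
```

In this model, pages left in the PAGE_LRU state inside the returned run are the "Used/LRU" pages that would still have to be reclaimed or migrated afterwards.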
Changelog: 2010-11-17
- fixed some coding style (if-then-else)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/page_isolation.c | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 146 insertions(+)
Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -7,6 +7,7 @@
#include <linux/pageblock-flags.h>
#include <linux/memcontrol.h>
#include <linux/migrate.h>
+#include <linux/memory_hotplug.h>
#include <linux/mm_inline.h>
#include "internal.h"
@@ -250,3 +251,148 @@ int do_migrate_range(unsigned long start
out:
return ret;
}
+
+/*
+ * Functions for getting contiguous MOVABLE pages in a zone.
+ */
+struct page_range {
+ unsigned long base; /* Base address of searching contiguous block */
+ unsigned long end;
+ unsigned long pages;/* Length of contiguous block */
+ int align_order;
+ unsigned long align_mask;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+ struct page_range *blockinfo = arg;
+ unsigned long ...
Acked-by: Minchan Kim <minchan.kim@gmail.com>

Just some trivial comments below.

Intentionally, I don't add Reviewed-by. Instead, I add Acked-by since I
support this work. I reviewed your old version but have forgotten it. :(
--
Kind regards,
Minchan Kim
--
On Mon, 22 Nov 2010 00:21:31 +0900

ok.

Thanks,
-Kame
--
On Fri, Nov 19, 2010 at 5:14 PM, KAMEZAWA Hiroyuki

Nitpick. You used nr_pages in other places.

Do we really need this field 'align_mask'?

Could we make sure that passing __trim_zone guarantees that every pfn in the
range is in the zone we want? Repeating the zone check is rather annoying.
I mean, let __get_contig_block or __trim_zone already do the zone check.

Could we check get_pageblock_migratetype(page) == MIGRATE_MOVABLE in

If the base is 0, isn't it impossible to return pfn 0?
x86 in FLAT isn't impossible but I think some architecture might be possible.
Just guessing.

How about returning negative value and return first page pfn and last
--
Kind regards,
Minchan Kim
--
On Mon, 22 Nov 2010 20:20:14 +0900

I'm not sure that's very good. pageblock-type can be fragmented and even if
pageblock-type is not MIGRATABLE, all pages in pageblock may be free.

Hmm, will add a check.

Thanks,
--
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Add a function to allocate contiguous memory larger than MAX_ORDER.
The main difference from the usual page allocator is that this uses the
memory offline technique (isolate pages and migrate the remaining pages).

I think this is not a 100% solution because we can't avoid fragmentation,
but we have the kernelcore= boot option and can create a MOVABLE zone. That
helps us allocate a contiguous range on demand.

The new function is

	alloc_contig_pages(base, end, nr_pages, alignment)

This function will allocate contiguous pages of nr_pages from the range
[base, end). If [base, end) is bigger than nr_pages, some pfn which
meets the alignment will be allocated. If alignment is smaller than MAX_ORDER,
it will be raised to MAX_ORDER.

__alloc_contig_pages() has many more arguments.

Some drivers allocate contig pages by bootmem or by hiding some memory
from the kernel at boot. But if contig pages are necessary only in some
situations, the kernelcore= boot option and page migration are a choice.

Changelog: 2010-11-19
- removed no_search
- removed some drain_ functions because they are heavy.
- check -ENOMEM case
Changelog: 2010-10-26
- support gfp_t
- support zonelist/nodemask
- support [base, end)
- support alignment
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page-isolation.h | 15 ++
mm/page_alloc.c | 29 ++++
mm/page_isolation.c | 242 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 286 insertions(+)
Index: mmotm-1117/mm/page_isolation.c
===================================================================
--- mmotm-1117.orig/mm/page_isolation.c
+++ mmotm-1117/mm/page_isolation.c
@@ -5,6 +5,7 @@
#include <linux/mm.h>
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
+#include <linux/swap.h>
#include <linux/memcontrol.h>
#include <linux/migrate.h>
#include <linux/memory_hotplug.h>
@@ -396,3 ...
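The alignment rule in the description above ("if alignment is smaller than MAX_ORDER, it will be raised to MAX_ORDER") can be sketched as plain arithmetic. This is an illustration only, assuming the alignment is expressed as an order (a power of two in pages); SKETCH_MAX_ORDER and both helper names are hypothetical, and the value 11 is an assumed example because the kernel's MAX_ORDER depends on configuration.

```c
#include <assert.h>

/* Assumed value for illustration; the real MAX_ORDER is config-dependent. */
#define SKETCH_MAX_ORDER 11

/* Raise a requested alignment order to at least SKETCH_MAX_ORDER. */
int effective_align_order(int align_order)
{
    assert(align_order >= 0);
    return align_order < SKETCH_MAX_ORDER ? SKETCH_MAX_ORDER : align_order;
}

/* Round pfn up to the next multiple of (1 << order) pages. */
unsigned long align_pfn_up(unsigned long pfn, int order)
{
    unsigned long mask = (1UL << order) - 1;

    return (pfn + mask) & ~mask;
}
```

With these helpers, a caller asking for order-0 alignment starting at pfn 1 would begin its search at align_pfn_up(1, effective_align_order(0)).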
Acked-by: Minchan Kim <minchan.kim@gmail.com>

We need to add #include <linux/bootmem.h> for using max_pfn.
--
Kind regards,
Minchan Kim
--
On Mon, 22 Nov 2010 00:25:56 +0900

will add that.

Thanks,
-Kame
--
On Fri, Nov 19, 2010 at 5:15 PM, KAMEZAWA Hiroyuki

And later we can use compaction and reclaim, too.

I understand the goal of the function.
Personally, I don't like the function name.
How about "__adjust_search_range"?

Why do we have to care about __GFP_IO|__GFP_FS?
--
Kind regards,
Minchan Kim
--
On Mon, 22 Nov 2010 20:44:03 +0900

Ah, yes. I'll check what this was for and remove it.

Thanks,
-Kame
--
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Old story.
Because we cannot assume which memory section will be offlined next,
hotremove_migrate_alloc() just uses alloc_page(), i.e. it makes no decision
about where the page should be migrated to. Considering memory hotplug's
nature, the memory section next to the one being removed is likely to be
removed next. So, migrating pages to the same node as the original page
doesn't make sense in many cases; it just increases load.
The migration destination page is allocated from the node where the offlining
script runs.

Now, contiguous-alloc uses do_migrate_range(). In this case, the migration
destination node should be the same node as the migration source page.

This patch modifies hotremove_migrate_alloc() and passes "nid" to it.
Memory hotremove will pass -1. So, for hotremove the page will still be moved
to the node where the offlining script runs; no behavior changes.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page-isolation.h | 3 ++-
mm/memory_hotplug.c | 2 +-
mm/page_isolation.c | 21 ++++++++++++++++-----
3 files changed, 19 insertions(+), 7 deletions(-)
Index: mmotm-1117/include/linux/page-isolation.h
===================================================================
--- mmotm-1117.orig/include/linux/page-isolation.h
+++ mmotm-1117/include/linux/page-isolation.h
@@ -41,7 +41,8 @@ extern void alloc_contig_freed_pages(uns
int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
unsigned long scan_lru_pages(unsigned long start, unsigned long end);
-int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
+int do_migrate_range(unsigned long start_pfn,
+ unsigned long end_pfn, int node);
/*
* For large alloc.
Index: mmotm-1117/mm/memory_hotplug.c
===================================================================
--- mmotm-1117.orig/mm/memory_hotplug.c
+++ mmotm-1117/mm/memory_hotplug.c
@@ -724,7 +724,7 @@ repeat:
pfn = ...
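The node-selection rule from the description above (hot-remove passes -1 and keeps today's behavior, contiguous-alloc passes the source page's node) can be written down as a tiny user-space model. This is a sketch under assumptions, not the patched hotremove_migrate_alloc(): the name pick_migration_node(), the NODE_ANY macro, and the explicit local_node argument are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical name for the "-1" that memory hot-remove passes. */
#define NODE_ANY (-1)

/*
 * Pick the node a migration destination page is allocated from.
 *   nid == NODE_ANY : memory hot-remove; use the node the caller runs on,
 *                     which matches the behavior before the patch.
 *   nid >= 0        : contiguous allocation; stay on the requested node,
 *                     i.e. the node of the migration source page.
 */
int pick_migration_node(int nid, int local_node)
{
    assert(local_node >= 0);
    assert(nid >= NODE_ANY);

    return nid == NODE_ANY ? local_node : nid;
}
```

Keeping -1 as the "no preference" value is what lets the existing hot-remove caller stay unchanged apart from the extra argument.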
On Fri, Nov 19, 2010 at 5:16 PM, KAMEZAWA Hiroyuki

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim
--
On Fri, 19 Nov 2010 17:10:33 +0900

So this is an alternative implementation for the functionality offered

From where I sit, feedback from the embedded guys is *vital*, because they
are indeed the main users.

Michal, I haven't made a note of all the people who are interested in
and who are potential users of this code. Your patch series has a
billion cc's and is up to version 6. Could I ask that you review and
test this code, and also hunt down other people (probably at other
organisations) who can do likewise for us? Because until we hear from
those people that this work satisfies their needs, we can't really
proceed much further.

Thanks.
--
On Fri, 19 Nov 2010 12:56:53 -0800

Yes, this will be a backend for that kind of work.

I think there are two ways to allocate contiguous pages larger than MAX_ORDER.

1) hide some memory at boot and add another memory allocator.
2) support a range allocator as [start, end)

This is a trial of 2). I used the memory-hotplug technique because I know
some of it. This patch itself has no "map" and "management" function, so it should be

yes. please.

Thanks,
-Kame
--
As a matter of fact CMA's v6 tries to use code "borrowed" from the alloc_contig_pages()
patches.
The most important difference is that alloc_contig_pages() would look for a chunk
of memory that can be allocated and then perform migration whereas CMA assumes that
regions it controls are always "migratable".
Yes, this is also a valid point. From my use cases, the alloc_contig_pages()
Ah, yes... I was thinking about shrinking the cc list but didn't want to
seem rude or anything removing ppl who have shown interest in the previous
A few things then:
1. As Felipe mentioned, on ARM it is often desired to have the memory
mapped as non-cacheable, which most often means that the memory never
reaches the page allocator. This means that alloc_contig_pages()
would not be suitable for cases where one needs such memory.
Or could this be overcome by adding the memory back as highmem? But
then, it would force compiling in highmem support even if the platform
does not really need it.
2. Device drivers should not by themselves know what ranges of memory to
allocate memory from. Moreover, some device drivers could require
allocating different buffers from different ranges. As such, this
would require some management code on top of alloc_contig_pages().
3. When posting hwmem, Johan Mossberg mentioned that he'd like to see a
notion of "pinning" chunks (so that not-pinned chunks can be moved
around when hardware does not use them to defragment memory). This
would again require some management code on top of
alloc_contig_pages().
4. I might be mistaken here, but the way I understand ZONE_MOVABLE works
is that it is cut off from the end of memory. Or am I talking nonsense?
My concern is that at least one chip I'm working with requires
allocations from different memory banks which would basically mean that
there would have to be two movable zones, ie:
+-------------------+-------------------+
| Memory ...
On Tue, 23 Nov 2010 16:46:03 +0100

I'll continue to update the patches; you can freely reuse my code and
integrate this set into yours. I am working on this firstly for EMBEDDED but
I want this to be a _generic_ function for general-purpose architectures.
There may be guys who want a 1G page on a host with tons of free memory.

Thanks,
-Kame
--
On Fri, Nov 19, 2010 at 10:56 PM, Andrew Morton

Actually, now that's not needed any more by using memblock:

As I've explained before, a contiguous memory allocator would be nice,
but on ARM many drivers not only need contiguous memory, but
non-cacheable, and this requires removing the memory from normal
kernel mapping in early boot.

Cheers.
--
Felipe Contreras
--
I see them more as orthogonal: Michal's code relies on preallocation
and manages the memory after that.

This code supplies the infrastructure to replace preallocation
with just using movable zones.

-Andi
--
Yes and no. The v6 version adds not-yet-finished support for sharing
the preallocated blocks with the page allocator (so if CMA is not using the
memory, the page allocator can allocate it, and when CMA finally wants to
use it the allocated pages are migrated).

In the v6 implementation I have added a new migration type (I cannot seem
to find who proposed such an approach first). When I finish debugging the
code I'll try to work things out without adding an additional entity (that
is, a new migration type).

--
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
