Hello everyone,
The following patchset implements a Contiguous Memory Allocator. For
those who have not yet stumbled across CMA, here is an excerpt from
the documentation:
The Contiguous Memory Allocator (CMA) is a framework, which allows
setting up a machine-specific configuration for physically-contiguous
memory management. Memory for devices is then allocated according
to that configuration.
The main role of the framework is not to allocate memory, but to
parse and manage memory configurations, as well as to act as an
in-between between device drivers and pluggable allocators. It is
thus not tied to any memory allocation method or strategy.
For more information please refer to the second patch from the
patchset which contains the documentation.
Links to the previous versions of the patchsets:
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573/>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986/>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669/>
v4: 1. The "asterisk" flag has been removed in favour of requiring
that the platform provide a "*=<regions>" rule in the map
attribute.
2. The terminology has been changed slightly, renaming "kind" to
"type" of memory. In the previous revisions, the documentation
indicated that device drivers define memory kinds and now,
v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with a list of regions but an array of regions.
2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should be considered an "asterisk" region.
3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes a list of regions.
v2: 1. The "cma_map" command line have ...
Added a rb_root_init() function which initialises a rb_root
structure as a red-black tree with at most one element. The
rationale is that using rb_root_init(root, node) is more
straightforward and cleaner than first initialising an
empty tree followed by an insert operation.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
include/linux/rbtree.h | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 7066acb..5b6dc66 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,6 +130,17 @@ static inline void rb_set_color(struct rb_node *rb, int color)
}
#define RB_ROOT (struct rb_root) { NULL, }
+
+static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
+{
+ root->rb_node = node;
+ if (node) {
+ node->rb_parent_color = RB_BLACK; /* black, no parent */
+ node->rb_left = NULL;
+ node->rb_right = NULL;
+ }
+}
+
#define rb_entry(ptr, type, member) container_of(ptr, type, member)
#define RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)
--
1.7.1
--
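As a small illustration of the helper added by the patch above, here is a userspace sketch of how rb_root_init() might be used to set up a tree that starts with a single element. The struct definitions are simplified stand-ins for the kernel's rbtree types, and struct chunk and chunk_tree_init() are hypothetical names invented for this example; only rb_root_init() itself mirrors the patch.

```c
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel's rbtree types (assumption:
 * the colour lives in the low bit of rb_parent_color, black == 1).
 */
#define RB_BLACK 1UL

struct rb_node {
	unsigned long rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

struct rb_root {
	struct rb_node *rb_node;
};

/* Same logic as the helper added by the patch. */
static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
{
	root->rb_node = node;
	if (node) {
		node->rb_parent_color = RB_BLACK; /* black, no parent */
		node->rb_left = NULL;
		node->rb_right = NULL;
	}
}

/* Hypothetical user: an object that embeds an rb_node. */
struct chunk {
	struct rb_node node;
	unsigned long start;
};

/* Build a tree holding exactly one chunk, or an empty tree for NULL. */
static void chunk_tree_init(struct rb_root *tree, struct chunk *first)
{
	rb_root_init(tree, first ? &first->node : NULL);
}
```

Calling chunk_tree_init(&tree, &c) leaves c as a black root with no children, and passing NULL yields an empty tree, so a caller does not need a separate insert step for the first element.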
The Contiguous Memory Allocator framework is a set of APIs for
allocating physically contiguous chunks of memory.
Various chips require contiguous blocks of memory to operate. Those
chips include devices such as cameras, hardware video decoders and
encoders, etc.
The code is highly modular and customisable to suit the needs of
various users. The set of regions reserved for CMA can be configured
per-platform and it is easy to add custom allocator algorithms if one
has such a need.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Pawel Osciak <p.osciak@samsung.com>
---
Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 541 +++++++++++++++++++++
include/linux/cma.h | 431 +++++++++++++++++
mm/Kconfig | 34 ++
mm/Makefile | 2 +
mm/cma-best-fit.c | 407 ++++++++++++++++
mm/cma.c | 910 +++++++++++++++++++++++++++++++++++
7 files changed, 2327 insertions(+), 0 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma-best-fit.c
create mode 100644 mm/cma.c
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 8dfc670..f93e787 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -94,6 +94,8 @@ connector/
 - docs on the netlink based userspace<->kernel space communication mod.
console/
 - documentation on Linux console drivers.
+contiguous-memory.txt
+ - documentation on physically-contiguous memory allocation framework.
cpu-freq/
 - info on CPU frequency and voltage scaling.
cpu-hotplug.txt
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
new file mode 100644
index 0000000..8fc2400
--- /dev/null
+++ b/Documentation/contiguous-memory.txt
@@ -0,0 +1,541 @@
+ ...
The SysFS development interface lets one change the map attribute
at run time as well as observe what regions have been reserved.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
.../ABI/testing/sysfs-kernel-mm-contiguous | 53 +++
Documentation/contiguous-memory.txt | 4 +
include/linux/cma.h | 7 +
mm/Kconfig | 18 +-
mm/cma.c | 345 +++++++++++++++++++-
5 files changed, 423 insertions(+), 4 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-contiguous
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-contiguous b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
new file mode 100644
index 0000000..8df15bc
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
@@ -0,0 +1,53 @@
+What: /sys/kernel/mm/contiguous/
+Date: August 2010
+Contact: Michal Nazarewicz <m.nazarewicz@samsung.com>
+Description:
+ If CMA has been built with SysFS support,
+ /sys/kernel/mm/contiguous/ contains a file called
+ "map", a file called "allocators" and a directory
+ called "regions".
+
+ The "map" file lets one change the CMA's map attribute
+ at run-time.
+
+ The "allocators" file list all registered allocators.
+ Allocators with no name are listed as a single minus
+ sign.
+
+ The "regions" directory list all reserved regions.
+
+ For more details see
+ Documentation/contiguous-memory.txt.
+
+What: /sys/kernel/mm/contiguous/regions/
+Date: August 2010
+Contact: Michal Nazarewicz <m.nazarewicz@samsung.com>
+Description:
+ The /sys/kernel/mm/contiguous/regions/ directory
+ contain directories for each registered CMA region.
+ The name of the directory is the same as the start
+ address of the region.
+
+ If region is named there is also a symbolic link named
+ like the region pointing to the ...
This patch adds a "cma" misc device which lets user space use the
CMA API. This device is meant for testing. A testing application
is also provided.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
drivers/misc/Kconfig | 8 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 185 ++++++++++++++++++++++++
include/linux/cma.h | 30 ++++
tools/cma/cma-test.c | 373 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 597 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 tools/cma/cma-test.c
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 0b591b6..f93e812 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -395,4 +395,12 @@ source "drivers/misc/eeprom/Kconfig"
source "drivers/misc/cb710/Kconfig"
source "drivers/misc/iwmc3200top/Kconfig"
+config CMA_DEVICE
+ tristate "CMA misc device (DEVELOPEMENT)"
+ depends on CMA_DEVELOPEMENT
+ help
+ The CMA misc device allows allocating contiguous memory areas
+ from user space. This is mostly for testing of the CMA
+ framework.
+
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 255a80d..2e82898 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -35,3 +35,4 @@ obj-y += eeprom/
obj-y += cb710/
obj-$(CONFIG_VMWARE_BALLOON) += vmware_balloon.o
obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
+obj-$(CONFIG_CMA_DEVICE) += cma-dev.o
diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c
new file mode 100644
index 0000000..de534f0
--- /dev/null
+++ b/drivers/misc/cma-dev.c
@@ -0,0 +1,185 @@
+/*
+ * Contiguous Memory Allocator userspace driver
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz (m.nazarewicz@samsung.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License ...
Added the CMA initialisation code to two Samsung platforms.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
arch/arm/mach-s5pv210/mach-aquila.c | 31 +++++++++++++++++++++++++++++++
arch/arm/mach-s5pv210/mach-goni.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 0dda801..3561859 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -19,6 +19,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -493,6 +494,35 @@ static void __init aquila_map_io(void)
s3c24xx_init_uarts(aquila_uartcfgs, ARRAY_SIZE(aquila_uartcfgs));
}
+static void __init aquila_reserve(void)
+{
+ static struct cma_region regions[] = {
+ {
+ .name = "fw",
+ .size = 1 << 20,
+ { .alignment = 128 << 10 },
+ },
+ {
+ .name = "b1",
+ .size = 32 << 20,
+ .asterisk = 1,
+ },
+ {
+ .name = "b2",
+ .size = 16 << 20,
+ .start = 0x40000000,
+ .asterisk = 1,
+ },
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s3c-mfc5/f=fw;s3c-mfc5/a=b1;s3c-mfc5/b=b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init aquila_machine_init(void)
{
/* PMIC */
@@ -523,4 +553,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = aquila_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index 53754d7..edeb93f 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -19,6 +19,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include ...
This patch adds a pair of early parameters ("cma" and
"cma.map") which let one override the CMA configuration
given by the platform without the need to recompile the kernel.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
Documentation/contiguous-memory.txt | 85 ++++++++++++++++++++++--
Documentation/kernel-parameters.txt | 7 ++
mm/Kconfig | 6 ++
mm/cma.c | 125 +++++++++++++++++++++++++++++++++++
4 files changed, 218 insertions(+), 5 deletions(-)
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index 8d189b8..95faec1 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -88,6 +88,20 @@
early region and the framework will handle the rest
including choosing the right early allocator.
+ 4. CMA allows a run-time configuration of the memory regions it
+ will use to allocate chunks of memory from. The set of memory
+ regions is given on command line so it can be easily changed
+ without the need for recompiling the kernel.
+
+ Each region has it's own size, alignment demand, a start
+ address (physical address where it should be placed) and an
+ allocator algorithm assigned to the region.
+
+ This means that there can be different algorithms running at
+ the same time, if different devices on the platform have
+ distinct memory usage characteristics and different algorithm
+ match those the best way.
+
** Use cases
Let's analyse some imaginary system that uses the CMA to see how
@@ -162,7 +176,6 @@
This solution also shows how with CMA you can assign private pools
of memory to each device if that is required.
-
Allocation mechanisms can be replaced dynamically in a similar
manner as well. Let's say that during testing, it has been
...
What's the rationale for having those #ifdef CONFIG_CMA_SYSFS
sprinkled in the C code? Is SysFS not used on StrongARM? Why not
implicitly include the SysFS support?
--
The SysFS CMA interface is meant for development only, and because of
that I decided to separate it from the core into a separate patch and
enable it only when explicitly requested.
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
I am not that familiar with how StrongARM works, and I took a brief
look at arch/arm/mach-s* and then at some of
drivers/video/media/video/cx88 to get an idea of how the hardware
video decoders would work with this.
What I got from this patch review is that you are writing an IOMMU
that is on steroids. It essentially knows that this device and that
device can both share the same region, and it has a fancy plugin
system to deal with fragmentation and offers a simple API for others
to write their own "allocators".
Even better, during init, the sub-platform can use
cma_early_regions_reserve(<func>) to register its own function
for reserving large regions of memory. Which from my experience (with
Xen) means that there is a mechanism in place to have it set up
contiguous regions using sub-platform code.
This is how I think it works, but I am not sure if I got it right. From
looking at 'cma_alloc' and 'cma_alloc_from_region' - both return
a dma_addr_t, which is what is usually fed into the DMA API. And
looking at the cx88 driver I see it using that API.
I do understand that on the ARM platform you might not have a need for
DMA at all, and you use the 'dma_addr_t' just as handles, but on
other platforms this would be used.
So here is the bit where I am confused. Why not have this
as a software IOMMU that would utilize the IOMMU API? There would be
some technical questions to be answered (such as what to do when you
have another IOMMU, and whether you can stack them on top of each
other).
A light review below:
Should be 'bus address'. On some platforms the physical != PCI address.
Uh, really? Why? Why not just simplify your life and make it \0?
^^- C++, eh?
^^^^^^^^^ - It is gone, isn't it?
I am having a hard time understanding that statement. Can you simplify
it a bit?
Those two #ifdefs are pretty ugly. What if you defined in a header
something along these lines:
#ifdef CONFIG_HAVE_MEMBLOCK
int __init ...
No. CMA's designed for systems without IOMMU. If system has IOMMU
then there is no need for contiguous memory blocks since all
discontiguousnesses ...
Essentially that's the idea. Platform init code adds early regions
and later on reserves memory for all of the early regions. For the
former some ...
In the first version I've used unsigned long as return type but then
it was suggested that maybe dma_addr_t would be better. This is
easily ...
If I understood you correctly this is something I'm thinking about.
I'm actually thinking of ways to integrate CMA with Zach's IOMMU
proposal posted some time ago. The idea would be to define a subset
of functionalities of the IOMMU API that would work on systems with
and without hardware IOMMU. If platform had no IOMMU CMA would be
used. I'm currently trying to fully understand Zach's proposal to
see how such an ...
This is a consequence of how map is stored. It's stored as a single
string ...
I wanted the function to try all possible allocators. As a matter
of fact, ...
Actually, I would prefer to leave it. It may be useful for platform
initialisation code. Especially if the platform has some special
regions which are allocated in a different way but for the rest wants
to use the default CMA's reserve call.
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
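The review suggestion above about moving the #ifdefs into a header can be illustrated with the usual stub pattern. This is only a sketch of the general technique, not the code that was proposed or merged: SKETCH_HAVE_MEMBLOCK stands in for a Kconfig symbol such as CONFIG_HAVE_MEMBLOCK, and struct sketch_region, sketch_reserve_memblock() and sketch_reserve() are hypothetical names made up for the example.

```c
#include <stddef.h>

/* Stand-in for a Kconfig symbol such as CONFIG_HAVE_MEMBLOCK. */
#ifndef SKETCH_HAVE_MEMBLOCK
#define SKETCH_HAVE_MEMBLOCK 0
#endif

#define SKETCH_ENOSYS 38 /* hypothetical "not implemented" code */

struct sketch_region {
	unsigned long start;
	unsigned long size;
};

/*
 * "Header" part: the only place that mentions the config symbol.
 * When the back end is not built, callers get an inline stub.
 */
#if SKETCH_HAVE_MEMBLOCK
int sketch_reserve_memblock(struct sketch_region *reg);
#else
static inline int sketch_reserve_memblock(struct sketch_region *reg)
{
	(void)reg;
	return -SKETCH_ENOSYS; /* back end not built in */
}
#endif

/* "C file" part: no #ifdef needed at the call site. */
static int sketch_reserve(struct sketch_region *reg)
{
	if (!reg || !reg->size)
		return -1;
	return sketch_reserve_memblock(reg);
}
```

With the symbol off, sketch_reserve() still compiles and simply reports that the back end is missing, which is the property that keeps conditional compilation out of the main code path.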
The Contiguous Memory Allocator framework is a set of APIs for
allocating physically contiguous chunks of memory.
Various chips require contiguous blocks of memory to operate. Those
chips include devices such as cameras, hardware video decoders and
encoders, etc.
The code is highly modular and customisable to suit the needs of
various users. The set of regions reserved for CMA can be configured
per-platform and it is easy to add custom allocator algorithms if one
has such a need.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Pawel Osciak <p.osciak@samsung.com>
---
Just a quick bugfix (in certain conditions, CMA went into an infinite
loop) and an update in response to Konrad's comments.
Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 544 +++++++++++++++++++++
include/linux/cma.h | 432 +++++++++++++++++
mm/Kconfig | 34 ++
mm/Makefile | 2 +
mm/cma-best-fit.c | 407 ++++++++++++++++
mm/cma.c | 911 +++++++++++++++++++++++++++++++++++
7 files changed, 2332 insertions(+), 0 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma-best-fit.c
create mode 100644 mm/cma.c
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 8dfc670..f93e787 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -94,6 +94,8 @@ connector/
 - docs on the netlink based userspace<->kernel space communication mod.
console/
 - documentation on Linux console drivers.
+contiguous-memory.txt
+ - documentation on physically-contiguous memory allocation framework.
cpu-freq/
 - info on CPU frequency and voltage scaling.
cpu-hotplug.txt
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
new file mode 100644
index ...
Please do not consider this a proper review. I'm only glancing
through ...
So more than 6MB of memory means the page allocator cannot
automatically grant the requests. That's fine but I'd still like to
be as close to the ...
An important consideration is if the alignment is always a natural
alignment? i.e. a 64K buffer must be 64K aligned, 128K must be 128K
aligned etc. I ask because the buddy allocator is great at granting
natural alignments ...
If drivers are using bootmem and custom allocators, I agree that some
common framework is needed.
If every device depended on bootmem, there would be huge chunks of
unusable memory. i.e. At first glance, I think this is important ...
It'd be very nice if the shared regions could also be used by normal
movable memory allocations to minimise the amount of wastage. I
imagine this would be particularly important on memory-constrained
devices.
So right now, we have
#define MIGRATE_UNMOVABLE 0
#define MIGRATE_RECLAIMABLE 1
#define MIGRATE_MOVABLE 2
#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
#define MIGRATE_RESERVE 3
#define MIGRATE_ISOLATE 4 /* can't allocate from here */
#define MIGRATE_TYPES 5
Conceptually speaking we also want
MIGRATE_MOVABLE_STICKY /* Set by CMA, used by CMA and GFP_MOVABLE */
MIGRATE_MOVABLE_EXCLUSIVE /* Set by CMA, exclusive use of a device */
Sticky would be usable by the page allocator and other than forcing
the migrate_type to be MIGRATE_MOVABLE, it would otherwise be normal
memory.
Exclusive would be isolated from normal usage by taking the pages
from the normal free lists and putting them on a free list managed
by CMA. Normally the page allocator uses zone->free_area[] for its
free lists. The allocator would need to handle either free_area from
a zone or one provided by a device using CMA. Would be tricky to pass
through admittedly.
I recognise this is not straight-forward so consider these to be
suggestions, not requirements.
Glancing through, ...
I'm not sure what you mean by "natural alignment". If 1M alignment
of a 64K buffer is natural then yes, the presented API requires
alignment to be natural. In short, alignment must be a power of two
and is never less than PAGE_SIZE but can be more ...
Yes. I hope to come up with a CMA version that will allow reserved
space to be reused by the rest of the memory management code. For
now, I won't respond to your suggestions regarding the use of the
page allocator but I hope to write something later today in response
to Peter's and Minchan's mails. I'll make sure to cc you ...
It's in the header to allow cma_info() and cma_info_about() to be
static inlines. The idea is not to generate too many exported
symbols. Also, it's not like usage ...
No WARN() is generated but -ENOENT is returned so it is considered
an error. I've also changed the code to use pr_err() when a chunk is
not found (it used pr_debug() previously). I'm still wondering
whether the use of an address is the best idea or whether passing
a cma_chunk structure would be a better option. In this way,
cma_alloc() ...
It's meant to be used by drivers even though the idea is that most
drivers will ...
I'm not sure how resources.c could be reused. It puts resources in
a hierarchy whereas CMA does not care about hierarchy that much plus
has only two levels ...
I dunno, I first created two header files but then decided to put
everything in one file. I dunno if anything is gained by exporting
a few functions to ...
Why wouldn't it? All it says is that this particular file can be
distributed under GPLv2 or GPLv3 (or any later if the FSF decides to
publish an updated version).
I haven't looked at bootmem from such a perspective. I'll add that
to my TODO list. On the other hand, however, it seems bootmem is
passée so I'm not sure if it's a good idea to integrate with it that
much.
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz ...
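The alignment rule stated in the reply above (a power of two, never less than PAGE_SIZE) can be written down as a short check. This is a userspace sketch under assumptions: SKETCH_PAGE_SIZE is fixed at 4 KiB for illustration, and alignment_ok() and align_up() are hypothetical helpers, not functions from the patchset.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* True if alignment is a power of two and at least one page. */
static bool alignment_ok(unsigned long alignment)
{
	return alignment >= SKETCH_PAGE_SIZE &&
	       (alignment & (alignment - 1)) == 0;
}

/* Round addr up to the next multiple of a valid alignment. */
static unsigned long align_up(unsigned long addr, unsigned long alignment)
{
	return (addr + alignment - 1) & ~(alignment - 1);
}
```

Under this rule a 64K buffer may ask for 1M alignment, which is stricter than the "natural" alignment the question was about, while an alignment of 12K or of half a page would be rejected.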
Just wondering: is alignment really needed since we already align to the
This really should not be optional but compulsory. 'type' has the same function
as the GFP flags with kmalloc. They tell the kernel where the memory should be
allocated. Only if you do not care at all can you pass in NULL. But in almost
all cases the memory should be at least DMA-able (and yes, for a lot of SoCs that
is the same as any memory -- for now).
Memory types should be defined in the platform code. Some can be generic
like 'dma' (i.e. any DMAable memory), 'dma32' (32-bit DMA) and 'common' (any
memory). Others are platform specific like 'banka' and 'bankb'.
A memory type definition can either be a start address/size pair but it can
perhaps also be a GFP type (e.g. .name = "dma32", .gfp = GFP_DMA32).
Regions should be of a single memory type. So when you define the region it
should have a memory type field.
Drivers request memory of whatever type they require. The mapping just maps
one or more regions to the driver and the cma allocator will pick only those
So this would become something like this:
static struct cma_memtype types[] = {
{ .name = "a", .size = 32 << 20 },
{ .name = "b", .size = 32 << 20, .start = 512 << 20 },
// For example:
{ .name = "dma", .gfp = GFP_DMA },
{ }
};
static struct cma_region regions[] = {
// size may of course be smaller than the memtype size.
{ .name = "a", .type = "a", .size = 32 << 20 },
{ .name = "b", .type = "b", .size = 32 << 20 },
{ }
};
static const char map[] __initconst = "*=a,b";
No need to do anything special for driver foo here: cma_alloc will pick the
correct region based on the memory type requested by the driver.
It is probably no longer needed to specify the memory type in the mapping when
This would be something for the driver to ...
Our video coder needs its firmware aligned to 128K plus it has to be
located before any other buffers allocated for the chip. Because of
those, we have defined a separate region just for the coder's
firmware which is small (256K ...
I'm not entirely happy with such a scheme. For one, types may
overlap: i.e. the whole "banka" may be "dma" as well. This means that
a single region could be of several different types.
Moreover, as I've mentioned, the video coder needs to allocate
buffers from different banks. However, on a newer platform there's
only one bank (actually two but they are interlaced) so allocations
from different banks no longer make sense. Instead of changing the
driver though, I'd prefer to only change the mapping in the platform.
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
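Since the argument above is that changing the platform's map string should be enough, it may help to see how strings such as "s3c-mfc5/f=fw;s3c-mfc5/a=b1" or "*=a,b" could be consulted. The sketch below assumes a deliberately simplified grammar (rules separated by ';', each of the form name=regions, exact name match first, a "*" rule as fallback); the matching rules in the actual patchset are richer, and map_lookup() is a hypothetical helper invented for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find the region list for key (e.g. "s3c-mfc5/a") in a map string.
 * Returns a pointer to the list inside map and stores its length in
 * *len, or returns NULL if neither the key nor a "*" rule is present.
 */
static const char *map_lookup(const char *map, const char *key, size_t *len)
{
	const char *fallback = NULL;
	size_t fallback_len = 0;
	size_t key_len = strlen(key);

	while (*map) {
		const char *end = strchr(map, ';');
		size_t rule_len = end ? (size_t)(end - map) : strlen(map);
		const char *eq = memchr(map, '=', rule_len);

		if (eq) {
			size_t name_len = (size_t)(eq - map);
			size_t list_len = rule_len - name_len - 1;

			/* An exact match wins immediately. */
			if (name_len == key_len && !memcmp(map, key, key_len)) {
				*len = list_len;
				return eq + 1;
			}
			/* Remember the first "*" rule as a fallback. */
			if (name_len == 1 && *map == '*' && !fallback) {
				fallback = eq + 1;
				fallback_len = list_len;
			}
		}
		map += rule_len;
		if (*map == ';')
			map++;
	}
	if (fallback)
		*len = fallback_len;
	return fallback;
}
```

With a lookup of this kind, moving a device from two banks to one is a matter of editing the string handed over by the platform code, and the driver keeps asking for the same names.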
So the idea is to grab a large chunk of memory at boot time and then
later allow some device to use it?
I'd much rather we'd improve the regular page allocator to be smarter
about this. We recently added a lot of smarts to it like memory
compaction, which allows large gobs of contiguous memory to be freed
for things like huge pages.
If you want guarantees you can free stuff, why not add constraints to
the page allocation type and only allow MIGRATE_MOVABLE pages inside
a certain region, those pages are easily freed/moved aside to satisfy
large contiguous allocations.
Also, please remove --chain-reply-to from your git config. You're
using 1.7 which should do the right thing (--no-chain-reply-to) by
default.
--
On Fri, 20 Aug 2010 15:15:10 +0200
That would be good. Although I expect that the allocation would need
to be 100% rock-solid reliable, otherwise the end user has
a non-functioning device. Could generic core VM provide the required
level of service?
Anyway, these patches are going to be hard to merge but not
impossible. Keep going. Part of the problem is cultural, really: the
consumers of this interface are weird dinky little devices which the
core MM guys tend not to work with a lot, and it adds code which they
wouldn't use.
I agree that having two "contiguous memory allocators" floating about
on the list is distressing. Are we really all 100% diligently certain
that there is no commonality here with Zach's work?
I agree that Peter's above suggestion would be the best thing to do.
Please let's take a look at that without getting into sunk cost
fallacies with existing code!
It would help (a lot) if we could get more attention and buy-in and
feedback from the potential clients of this code. rmk's feedback is
valuable. Have we heard from the linux-media people? What other
subsystems might use it? ieee1394 perhaps? Please help identify
specific subsystems and I can perhaps help to wake people up.
And I agree that this code (or one of its alternatives!) would
benefit from having a core MM person take a close interest. Any
volunteers?
Please cc me on future emails on this topic?
--
There is some commonality with Zach's work, but Zach should be
following all of this development, so presumably he has no issues
with Michal's changes. I think Zach's solution has a similar
direction to this.
If Michal is active (he seems more so than Zach) and follows
community comments (including Zach's, but I haven't seen any) then we
can defer to that solution.
Daniel
-- 
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
--
Comments are always welcome. :)
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
On Wed, 25 Aug 2010 15:58:14 -0700
The original OLPC has a camera controller which requires three
contiguous, image-sized buffers in memory. That system is a little
memory constrained (OK, it's desperately short of memory), so, in the
past, the chances of being able to allocate those buffers anytime
some kid decides to start taking pictures was poor. Thus, cafe_ccic.c
has an option to snag the memory at initialization time and never let
go even if you threaten its family. Hell hath no fury like a little
kid whose new toy^W educational tool stops taking pictures.
That, of course, is not a hugely efficient use of memory on
a memory-constrained system. If the VM could reliably satisfy those
allocation requests, life would be wonderful. Seems difficult. But it
would be a nicer solution than CMA, which, to a great extent, is
really ...
If this code had been present when I did the Cafe driver, I would
have used it. I think it could be made useful to a number of low-end
camera drivers if the videobuf layer were made to talk to it in a way
which Just Works.
With a bit of tweaking, I think it could be made useful in other
situations: the viafb driver, for example, really needs an allocator
for framebuffer memory and it seems silly to create one from scratch.
Of course, there might be other possible solutions, like adding
a "zones" concept to LMB^W memblock.
The problem which is being addressed here is real.
That said, the complexity of the solution still bugs me a bit, and
the core idea is still to take big chunks of memory out of service
for specific needs. It would be far better if the VM could just
provide big chunks on demand. Perhaps compaction and the pressures of
making transparent huge pages work will get us there, but I'm not
sure we're there yet.
jon
--
The main problem is of course fragmentation, for this there is no
solution in CMA. It has a feature intended to at least reduce memory
usage though, if only a little bit. It is region sharing. It allows
platform architects to define regions shared by more than one driver,
as explained by Michal in the RFC. So we can at least try to reuse
each chunk of memory as much as possible and not hold separate
regions for each driver when they are not intended to work
simultaneously. Not a ...
I am working on new videobuf which will (hopefully) Just Work. CMA is
intended to be pluggable into it, as should be any other allocator
for that matter.
-- 
Best regards,
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
--
At this moment it seems nothing more than that, but the way I see it
is that with a common, standardised, centrally-managed mechanism for
grabbing memory we can start thinking about ways to reuse the memory.
If each driver were to grab its own memory in a way known only to
itself, the memory is truly lost, but with CMA not only can regions
be reused among devices but the framework can also manage the
unallocated memory and try to utilize it in other ways (movable
pages? cache? buffers? some kind of compressed memory swap?).
What I'm trying to say is that I totally agree with your and others'
comments about CMA essentially grabbing memory and never releasing
it, but I believe this can be combated with time, once the overall
idea of how the CMA API should look is agreed upon.
-- 
Best regards,                                         _     _
| Humble Liege of Serenely Enlightened Majesty of   o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
I agree. Compaction and the movable zone would be one good solution.
If some driver needs a big contiguous chunk to work, it should make
sure that enough memory is available for it before going ahead.
To ensure that, we have to consider compaction of ZONE_MOVABLE.
But one problem is anonymous pages, which can effectively act as
pinned pages on a system without swap, and most embedded systems have
no swap.
But it's not hard to solve.
We need Mel's opinion, too.
-- 
Kind regards,
Minchan Kim
--
Let me elaborate on my statement to avoid the confusion caused by
using the term _pinned page_. I mean that anon pages are not
a fragmentation problem but a space problem for the devices.
-- 
Kind regards,
Minchan Kim
--
Well, compaction can move those around, but if you've got too many of
them it's a simple matter of over-commit and for that we've got the
OOM-killer ;-)
--
As I said in the follow-up mail, I was talking about the free space
problem. Of course, compaction could move anon pages somewhere. But
where is that somewhere? In the end, it's the same zone. It can
prevent the fragmentation problem but does not increase the amount of
free space. So I mean it would be better to move them into another
zone (e.g. HIGHMEM) rather than OOM-kill.
-- 
Kind regards,
Minchan Kim
--
Real machines don't have highmem, highmem sucks!! /me runs
Does cross zone movement really matter, I thought these crappy devices were mostly used on crappy hardware with very limited memory, so pretty much everything would be in zone_normal.. no? But sure, if there's really a need we can look at maybe doing cross zone movement. --
That's another topic. I agree highmem isn't pretty. But isn't my desktop a real machine? The important thing is that we already have highmem and many guys
No. Until now, many embedded devices have had small memory. In that case, there is only a DMA zone in the system. But as far as I know, mobile phones will start to use big(?) memory like 1G or more sooner or later, so they will start to use HIGHMEM, or else a 2G/2G address space configuration. Some embedded devices use a many-thread model to make porting from an RTOS easier. In that case, they don't have enough address space for applications if a 2G/2G model is used. -- Kind regards, Minchan Kim --
Even more offtopic ;-) I have exactly 0 machines in daily use that use highmem, I had to test that kmap stuff in a 32bit qemu. Sadly some hardware folks still think it's a sane thing to do, like ARM announcing 40bit PAE, I mean really?! At least AMD announced a 64bit tiny-chip and hopefully Intel Atom will soon be all 64bit too (please?!). --
On Wed, 25 Aug 2010 15:58:14 -0700
Hmm, you may not like this... but how about the following kind of interface?

Memory hotplug now supports the following operation to free and _isolate_
a memory region:
    # echo offline > /sys/devices/system/memory/memoryX/state
Then, a region of memory will be isolated. (This succeeds if there is free memory.)

Add a new interface:
    % echo offline > /sys/devices/system/memory/memoryX/state
    # extract memory from System RAM and make it invisible to the buddy allocator.
    % echo cma > /sys/devices/system/memory/memoryX/state
    # move the invisible memory to cma.
Then, a chunk of memory will be moved into the contiguous-memory-allocator.

To turn a "cma" region back into a usual region:
    # echo offline > /sys/devices/system/memory/memoryX/state
    # echo online > /sys/devices/system/memory/memoryX/state

Maybe "used-for-cma" memory can be reported via /proc/iomem, as:
    100000000-63fffffff : System RAM
    640000000-800000000 : Contiguous RAM (Used for drivers)
(And you have to skip small memory holes by looking at this file.)

Of course, cma guys can continue to use their own boot option.
With memory hotplug, the kernelcore=xxxM interface can be used for creating
ZONE_MOVABLE.

Some complicated work may be needed, such as
    # echo movable > /sys/devices/system/memory/memoryX/state
    (online pages and move them into ZONE_MOVABLE)

If anyone is interested, I may be able to offer some help.

Thanks,
-Kame
--
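To illustrate the /proc/iomem idea above, here is a minimal user-space sketch of how a tool could compute the size of one "start-end : name" entry. The helper name iomem_range_size() is made up for this example and is not an existing kernel or libc function; the sketch assumes the usual /proc/iomem convention that the printed end address is inclusive.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): parse one /proc/iomem style
 * line of the form "start-end : name" and return the size of the range
 * in bytes, or 0 if the line cannot be parsed.
 */
static unsigned long long iomem_range_size(const char *line)
{
	unsigned long long start, end;

	assert(line != NULL);
	/* Both addresses are printed in hex without a 0x prefix. */
	if (sscanf(line, "%llx-%llx :", &start, &end) != 2 || end < start)
		return 0;
	return end - start + 1;	/* the end address is inclusive */
}
```

Calling it on the "System RAM" line quoted above would give the size of that region; a tool looking for holes could compare the end of one entry with the start of the next.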
At this point I need to say that I have no experience with hotplug memory but I think that for this to make sense the regions of memory would have to be smaller. Unless I'm misunderstanding something, the above would convert a region of sizes in order of GiBs to use for CMA. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- --
On Thu, 26 Aug 2010 04:12:10 +0200
Now, x86's section size is
==
#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
#  define SECTION_SIZE_BITS     29
#  define MAX_PHYSADDR_BITS     36
#  define MAX_PHYSMEM_BITS      36
# else
#  define SECTION_SIZE_BITS     26
#  define MAX_PHYSADDR_BITS     32
#  define MAX_PHYSMEM_BITS      32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS      27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS      44
# define MAX_PHYSMEM_BITS       46
#endif
==
128MB... too big? But it depends on the config.
IBM's ppc guys used 16MB sections, and recently a new interface to reduce the number of /sys files was added, which may be usable.
One good thing about this approach is that you can create "cma" memory before installing the driver.
But yes, it's complicated and needs some work.
Bye,
-Kame
--
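The section sizes follow directly from SECTION_SIZE_BITS, since one section covers 2^SECTION_SIZE_BITS bytes. A small sketch of the arithmetic (section_size_mib() is a made-up helper for illustration, not something from the kernel tree):

```c
#include <assert.h>

/*
 * Hypothetical helper: size of one memory section in MiB for a given
 * SECTION_SIZE_BITS value. A section spans 2^bits bytes and 1 MiB is
 * 2^20 bytes, so the result is 2^(bits - 20).
 */
static unsigned long long section_size_mib(unsigned int section_size_bits)
{
	assert(section_size_bits >= 20 && section_size_bits < 64);
	return 1ULL << (section_size_bits - 20);
}
```

With the values quoted above this gives 128 MiB for 64-bit x86 (27 bits), 64 MiB for 32-bit without PAE (26 bits) and 512 MiB with PAE (29 bits).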
On Thu, 26 Aug 2010 11:50:17 +0900
Ah, I need to clarify what I want to say.
Compaction is helpful, but I think you can't get contiguous memory larger than MAX_ORDER with it. To get memory larger than MAX_ORDER on demand, the memory hot-plug code has almost everything that is necessary.
You may be able to add
    # echo 0xa0000000-0xa80000000 > /sys/devices/system/memory/cma
to get contiguous isolated memory.
BTW, just curious... does the memory for cma not need to be saved at hibernation? Or do drivers have to write their own hibernation ops using driver suspend, udev or the like?
Thanks,
-Kame
--
On embedded systems it may be like half of the RAM. Or a quarter. So bigger
That's how CMA works at the moment. But if I understand you correctly, what you are proposing would allow reserving memory *at* *runtime* long after system
Hibernation has not been considered yet, but I think it's the device driver's responsibility more than CMA's, especially since it may make little sense to save some of the buffers -- i.e. no need to keep a frame from a camera since it'll be overwritten just after the system wakes up from hibernation. It may also be better to stop playback and resume it later on rather than trying to save the decoder's state. Again though, I haven't thought about hibernation yet. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- --
On Thu, 26 Aug 2010 06:01:56 +0200
mm/memory_hotplug.c::offline_pages() does
1. disallow new allocation of memory in [start_pfn...end_pfn)
2. move all LRU pages to other regions than [start_pfn...end_pfn)
3. finally, mark all pages as PG_reserved (see __offline_isolated_pages())
What's required for cma will be
a. remove _section_ limitation, which is done as BUG_ON().
b. replace 'step 3' with cma code.
Maybe you can do similar just using compaction logic. The biggest difference will
Hmm, ok, use-case dependent and it's a job of a driver.
Thanks,
-Kame
--
Yeah, if we can do this, that will avoid rebooting for kdump to reserve memory. Thanks. --
On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki True. Wouldn't the idea from Christoph's patch help here? http://lwn.net/Articles/200699/ -- Kind regards, Minchan Kim --
Of course, it can't meet our requirements by itself, but the idea of range allocation seems good. -- Kind regards, Minchan Kim --
On Thu, 26 Aug 2010 13:06:28 +0900 Yes, I think so. But, IIRC, the purpose of Christoph's work was to remove zones. Please be careful about what's really necessary. Thanks, -Kame --
On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki Ahh. Sorry for missing the point. You're right. The patch can't help with our problem. How about the following change? The thing is that MAX_ORDER is static. But we want to avoid a too big MAX_ORDER for all zones just to support devices which require big allocation chunks. So let's add a MAX_ORDER to each zone, so that each zone can have a different max order. For example, while DMA[32], NORMAL and HIGHMEM keep the normal value of 11, the MOVABLE zone could have 15. Does this approach have a big side effect? -- Kind regards, Minchan Kim --
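To put numbers on the per-zone MAX_ORDER idea: the buddy allocator hands out blocks of order 0 to MAX_ORDER - 1, so the largest block is 2^(MAX_ORDER - 1) pages. A rough sketch of that calculation follows; max_buddy_block_kib() is a hypothetical helper, and the 4 KiB page size used in the note below is an assumption that depends on the architecture and configuration.

```c
#include <assert.h>

/*
 * Hypothetical helper: size in KiB of the largest block the buddy
 * allocator can return, assuming valid orders are 0 .. max_order - 1.
 */
static unsigned long long max_buddy_block_kib(unsigned int max_order,
					       unsigned int page_size_kib)
{
	assert(max_order >= 1 && max_order <= 32);
	return (unsigned long long)page_size_kib << (max_order - 1);
}
```

Under the 4 KiB page assumption, MAX_ORDER 11 tops out at 4 MiB, while a MOVABLE zone with a max order of 15 would allow single blocks of 64 MiB.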
On Thu, 26 Aug 2010 18:36:24 +0900
Hm...need to check hard coded MAX_ORDER usages...I don't think the
side-effect is big. Hmm. But I think enlarging MAX_ORDER isn't an
important thing. Code which strips contiguous chunks of pages from the
buddy allocator is a necessary thing, as..
What I can think of at first is...
==
int steal_pages(unsigned long start_pfn, unsigned long end_pfn)
{
    /* Be careful about mutual exclusion with memory hotplug, because this reuses its code */
    split [start_pfn, end_pfn) to pageblock_order
    for each pageblock in the range {
        Mark this block as MIGRATE_ISOLATE
        try-to-free pages in the range or
        migrate pages in the range to somewhere.
        /* Here all pages in the range are on buddy allocator
           and free and never be allocated by anyone else. */
    }
    please see __rmqueue_fallback(). it selects migration-type at 1st.
    Then, if you can pass start_migratetype of MIGRATE_ISOLATE,
    you can automatically strip all MIGRATE_ISOLATE pages from free_area[].
    return chunk of pages.
}
==
Thanks,
-Kame
--
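The steal_pages() outline above can be modelled in user space to make the three steps concrete. What follows is only a toy simulation under stated assumptions: the page states, struct toy_mem, find_target() and toy_steal_pages() are all invented for this sketch and do not correspond to real kernel data structures, and "migration" is reduced to moving a marker from one array slot to another.

```c
#include <assert.h>

#define NR_PAGES     64
#define BLOCK_PAGES  8			/* toy pageblock size */
#define NR_BLOCKS    (NR_PAGES / BLOCK_PAGES)

enum page_state { PG_FREE, PG_MOVABLE, PG_PINNED, PG_STOLEN };

struct toy_mem {
	enum page_state page[NR_PAGES];
	int isolated[NR_BLOCKS];	/* 1 = allocator must not touch */
};

/* Find a free page in a non-isolated block; return -1 if there is none. */
static int find_target(const struct toy_mem *m)
{
	for (int pfn = 0; pfn < NR_PAGES; pfn++)
		if (m->page[pfn] == PG_FREE && !m->isolated[pfn / BLOCK_PAGES])
			return pfn;
	return -1;
}

/*
 * Toy steal_pages(): returns 0 on success, -1 if the range cannot be
 * emptied. The range is [start_pfn, end_pfn) and must be block aligned.
 */
static int toy_steal_pages(struct toy_mem *m, int start_pfn, int end_pfn)
{
	assert(start_pfn % BLOCK_PAGES == 0 && end_pfn % BLOCK_PAGES == 0);
	assert(0 <= start_pfn && start_pfn < end_pfn && end_pfn <= NR_PAGES);

	/* Step 1: isolate every pageblock so nothing new lands in the range. */
	for (int b = start_pfn / BLOCK_PAGES; b < end_pfn / BLOCK_PAGES; b++)
		m->isolated[b] = 1;

	/* Step 2: migrate movable pages out; give up on anything else. */
	for (int pfn = start_pfn; pfn < end_pfn; pfn++) {
		int target;

		if (m->page[pfn] == PG_FREE)
			continue;
		target = (m->page[pfn] == PG_MOVABLE) ? find_target(m) : -1;
		if (target < 0)
			goto fail;
		m->page[target] = PG_MOVABLE;
		m->page[pfn] = PG_FREE;
	}

	/* Step 3: the range is free and isolated; hand it to the caller. */
	for (int pfn = start_pfn; pfn < end_pfn; pfn++)
		m->page[pfn] = PG_STOLEN;
	return 0;

fail:
	/* Undo the isolation; pages already migrated simply stay moved. */
	for (int b = start_pfn / BLOCK_PAGES; b < end_pfn / BLOCK_PAGES; b++)
		m->isolated[b] = 0;
	return -1;
}
```

The point of the model is the ordering: isolation comes first so that the migration targets are guaranteed to lie outside the range, and a single pinned page is enough to make the whole request fail.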
The side effect of increasing MAX_ORDER is that page allocations get more expensive since the buddy tree gets larger, yielding more Right, once we can explicitly free the pages we want, crossing MAX_ORDER isn't too hard like you say, we can simply continue with freeing the next in order page. --
On Fri, 27 Aug 2010 17:16:39 +0900
Here is some rough code for this.
I'm sorry I don't have time to show good enough code. Maybe this cannot be
compiled. But you may be able to see what can be done with memory hotplug
or compaction code. I'll brush this up if someone is interested.
==
This is code for creating an isolated memory block of contiguous pages.
find_isolate_contig_block(unsigned long hint, unsigned long size)
will return [start, start+size] of isolated pages
- start > hint,
- no memory holes within it.
- page allocator will never touch pages within the range.
Of course, this can fail. This code makes use of memory-hotunplug's code.
But yes, you can think of reusing the compaction code. This is an example.
Not compiled at all...please don't look at the details.
---
mm/isolation.c | 236 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 236 insertions(+)
Index: kametest/mm/isolation.c
===================================================================
--- /dev/null
+++ kametest/mm/isolation.c
@@ -0,0 +1,233 @@
+struct page_range {
+ unsigned long base, end, pages;
+};
+
+int __get_contig_block(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+ struct page_range *blockinfo = arg;
+
+ if (nr_pages > blockinfo->pages) {
+ blockinfo->base = pfn;
+ blockinfo->end = pfn + nr_pages;
+ return 1;
+ }
+ return 0;
+}
+
+
+unsigned long __find_contig_block(unsigned long base,
+ unsigned long end, unsigned long pages)
+{
+ unsigned long pfn, tmp, index;
+ struct page_range blockinfo;
+ int ret;
+
+ /* Skip memory holes */
+retry:
+ blockinfo.base = base;
+ blockinfo.end = end;
+ blockinfo.pages = pages;
+ ret = walk_system_ram_range(base, end - base, &blockinfo,
+ __get_contig_block);
+ if (!ret)
+ return 0;
+ /* Ok, we found a contiguous memory chunk of the requested size. Isolate it. */
+ for (pfn = blockinfo.base; pfn + pages < blockinfo.end;) {
+
+ for (index = 0; index < nr_pages; index += ...
On Thu, 2 Sep 2010 17:54:24 +0900
Here is a _tested_ one.
If I tested correctly, I allocated 40MB of contiguous pages with the new function.
I'd be glad if this can be a hint for people.
Thanks,
-Kame
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
This patch adds a memory allocator for contiguous memory larger than MAX_ORDER.
alloc_contig_pages(hint, size, list);
This function allocates 'size' of contiguous pages, whose physical address
is higher than 'hint'. size is specified in bytes.
Allocated pages are all linked into the list and all of their page_count()
are set to 1. Return value is the top page.
free_contig_pages(list)
returns all pages in the list.
This patch does
- find an area which can be ISOLATED.
- migrate remaining pages in the area.
- steal chunk of pages from allocator.
Limitation is:
- returned pages will be aligned to MAX_ORDER.
- returned length of page will be aligned to MAX_ORDER.
(so, the caller may have to return tails of pages by itself.)
- may allocate contiguous pages which overlap node/zones.
This is fully experimental and written as example.
(Maybe need more patches to make this complete.)
This patch moves some code from memory_hotplug.c to
page_isolation.c and is based on the page-offline technique used by
memory_hotplug.c
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/page-isolation.h | 10 +
mm/memory_hotplug.c | 84 --------------
mm/page_alloc.c | 32 +++++
mm/page_isolation.c | 244 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 287 insertions(+), 83 deletions(-)
Index: mmotm-0827/mm/page_isolation.c
===================================================================
--- mmotm-0827.orig/mm/page_isolation.c
+++ mmotm-0827/mm/page_isolation.c
@@ -3,8 +3,11 @@
*/
#include <linux/mm.h>
+#include <linux/swap.h>
#include <linux/page-isolation.h>
#include ...
Great! I didn't look into the details but the concept seems good. If someone doesn't need complex intelligence (e.g. shared, private, [first|best] fit, buddy), this is enough. So I think this will be good regardless of CMA. I will look into this in more detail and think about ideas to improve it.
Hmm.. I can't understand the loop below. -- Kind regards, Minchan Kim --
On Mon, 6 Sep 2010 00:57:53 +0900
hint is physical address. What's annoying me is x86-32, should I use physaddr_t or
Ah, the allocator returns MAX_ORDER aligned pages, then,
    [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyy]
    x+y = allocated
    x = will be used.
    y = will be unused.
Unnecessary. Please consider this a BUG.
This code just checks the pfn of the allocated area but doesn't check which zone/node the pfn is tied to. For example, I hear IBM has the following kind of memory layout:
    | Node0 | Node1 | Node2 | Node0 | Node2 | Node1| .....
So, some check should be added to avoid allocating a chunk of pages that spreads across multiple nodes.
(I hope walk_page_range() can do enough of the job for us, but I'm not sure.)
Thank you for review.
Regards,
-Kame
--
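The x/y picture above boils down to rounding the request up to the MAX_ORDER granularity and returning the tail. A minimal sketch of that arithmetic, with loud caveats: struct contig_span, contig_span_for() and TOY_MAX_ORDER_NR_PAGES are made up for this example, and the value 1024 assumes MAX_ORDER 11 with 4 KiB pages.

```c
#include <assert.h>

#define TOY_MAX_ORDER_NR_PAGES 1024UL	/* assumed: MAX_ORDER 11, 4 KiB pages */

struct contig_span {
	unsigned long allocated;	/* x + y: pages the allocator returns */
	unsigned long unused;		/* y: tail pages the caller gives back */
};

/* Round a request of nr_pages up to the assumed MAX_ORDER granularity. */
static struct contig_span contig_span_for(unsigned long nr_pages)
{
	struct contig_span s;

	assert(nr_pages > 0);
	s.allocated = (nr_pages + TOY_MAX_ORDER_NR_PAGES - 1)
		      / TOY_MAX_ORDER_NR_PAGES * TOY_MAX_ORDER_NR_PAGES;
	s.unused = s.allocated - nr_pages;
	return s;
}
```

Under these assumptions a 5 MiB request (1280 pages) comes back as 2048 pages with a 768-page tail, while a 40 MiB request (10240 pages) is already aligned and has no tail.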
Hi Andrew, Thank you for your comments and interest in this! This is encouraging, thanks. Merging a contiguous allocator seems like a lost cause, with a relative disinterest of non-embedded people, and on the other hand because of the difficulty to satisfy those actually interested. With virtually everybody having their own, custom solutions, I think Zach's work is more focused on IOMMU and on unifying virtual memory handling. As far as I understand, any physical allocator can be As a media developer myself, I talked with people and many have expressed their interest. Among them were developers from ST-Ericsson, Intel and TI, to name a few. Their SoCs, like ours at Samsung, require contiguous memory allocation schemes as well. I am working on a driver framework for media for memory management (on the logical, not physical level). One of the goals is to allow plugging in custom allocators and memory handling functions (cache management, etc.). CMA is intended to be used as one of the pluggable allocators for it. Right now, many media drivers have to provide their own, more or less complicated, memory handling, which is of course undesirable. Some of those make it to the kernel, many are maintained outside the mainline. The problem is that, as far as I am aware, there have already been quite a few proposals for such allocators and none made it to the mainline. So companies develop their own solutions and maintain them outside the mainline. I think that the interest is definitely there, but people have their deadlines and assume that it is close to impossible to have a contiguous allocator merged. Your help and support would be very much appreciated. Working in embedded Linux for some time now, I feel that the need is definitely there and is quite substantial. -- Best regards, Pawel Osciak Linux Platform Group Samsung Poland R&D Center --
Hello Andrew, I think Pawel has replied to most of your comments, so I'll just add my own I think that the biggest problem is fragmentation here. For instance, I think that a situation where there is enough free space but it's fragmented so no single contiguous chunk can be allocated is a serious problem. However, I would argue that if there's simply no space left, a multimedia device could fail and even though it's not desirable, it would not be such a big issue in my eyes. So, if only movable or discardable pages are allocated in CMA managed regions all should work well. When a device needs memory discardable pages would get freed and movable moved unless there is no space left on the device in which case allocation would fail. Critical devices (just a hypothetical entities) could have separate regions on which only discardable pages can be allocated so that memory As Pawel said, I think Zach's trying to solve a different problem. No matter, as I've said in response to Konrad's message, I have thought about unifying Zach's IOMMU and CMA in such a way that devices could work on both systems with and without IOMMU if only they would limit Not a problem. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- --
If you'd actually looked at the page allocator you'd see it's capable of doing exactly that! It has the notion of movable pages, it can defragment free space (called compaction). Use it! --
For handling fragmentation, there is the option of ZONE_MOVABLE so it's usable by normal allocations but the CMA can take action to get it cleared out if necessary. Another option that is trickier but less disruptive would be to select a range of memory in a normal zone for CMA and mark it MIGRATE_MOVABLE so that movable pages are allocated from it. The trickier part is you need to make that bit stick so that non-movable pages are never allocated from that range. That would be trickish to implement but possible and it would avoid the fragmentation -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Yes, indeed. And you have to be careful as well how you move pages around. Say that you have a capture and an output v4l device: the first one needs 64 MB contiguous memory and so it allocates that amount, moving pages around as needed. Once allocated that memory is pinned in place since it is needed for DMA. So if the output device also needs 64 MB, then you must have a guarantee that the first allocation didn't fragment the available contiguous memory. I also wonder how expensive it is to move all the pages around. E.g. if you have a digital camera and want to make a hires picture, then it wouldn't do if it takes a second to move all the pages around making room for the captured picture. The CPUs in many SoCs are not very powerful compared to your average desktop. And how would memory allocations in specific memory ranges (e.g. memory banks) work? Note also that these issues are not limited to embedded systems, also PCI(e) boards can sometimes require massive amounts of DMA-able memory. I have had this happen in the past with the ivtv driver with customers that had 15 or so capture cards in one box. And I'm sure it will happen in the future as well, esp. with upcoming 4k video formats. Video is a major memory consumer, particularly in embedded systems. And there It's not really that weird. The same problems can actually occur as well with the more 'mainstream' consumer level video boards, although you need I'm doing the reviewing for linux-media. It would be really nice to have a good system for this in place. For example, the current TI davinci capture driver will only work reliably (memory-wise) if you also use the out-of-tree TI cmem module. Hardly a desirable situation. Basically a fair amount of custom hacks is required at the memory to have reliable video streaming on The video subsystem is the other candidate. Probably not for the current generation of GPUs (these all have hardware IOMMUs I suspect), but definitely for the framebuffer based devices that ...
Isn't the proposed CMA thing vulnerable to the exact same problem? If you allow sharing of regions and plug some allocator in there you get the same problem. If you can solve it there, you can solve it for any
Well, that's a trade-off, if you want to have the memory be usable for anything else (which I understood people did want) then you have to pay for cleaning it up when you need to use it. As for the cost of compaction vs regular page-out of random page-cache memory, compaction is actually cheaper, since it doesn't need to write out dirty data, and page-out driven writeback sucks due to the
I would sincerely hope PCI(e) devices come with an IOMMU (and all memory lines wired up), really, any hardware that doesn't isn't worth the silicon it's engraved in. Just don't buy it. --
Since with cma you can assign a region exclusively to a driver you can ensure that this problem does not occur. Of course, if you allow sharing then you will end up with the same type of problem unless you know that there is only one There is obviously a trade-off. I was just wondering how costly it is. E.g. would it be a noticeable delay making 64 MB memory available in this In the case of the ivtv driver the PCI device had a broken scatter-gather DMA engine, which is the underlying reason for these issues. Since I was maintainer of this driver for a few years I would love to have a reliable solution for the memory issues. It's not a big deal, 99.99% of all users will never notice anything, but still... And I don't think there are any affordable or easily obtainable alternatives to this hardware with similar feature sets, even after all these years. Anyway, I agree with your sentiment, but reality can be disappointingly different :-( And especially with regards to video hardware the creativity of the hardware designers is boundless -- to the dismay of us linux-media developers. Regards, Hans -- Hans Verkuil - video4linux developer - sponsored by TANDBERG, part of Cisco --
I think you could do the same thing, the proposed page allocator solutions still needs to manage pageblock state, you can manage those the same as you would your cma regions -- the difference is that you get the option of letting the rest of the system use the memory in a Right, dunno really, rather depends on the memory bandwidth of your arm device I suspect. It is something you'd have to test. In case the machine isn't fast enough, there really isn't anything you can do but keep the memory empty at all times; unless of course the device in question needs it. --
All FireWire controllers are OHCI and use scatter-gather lists. Most USB controllers require continuous memory for USB packets; the USB framework has its own DMA buffer cache. Some sound cards have no IOMMU; the ALSA framework preallocates buffers for those. Regards, Clemens --
I'm aware that grabbing a large chunk at boot time is a bit of a waste of space and because of that I'm hoping to come up with a way of reusing the space when it's not used by CMA-aware devices. My current idea was to
OK. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- --
Right, so to me that looks like going at the problem backwards. That will complicate the page-cache instead of your bad hardware drivers (really, hardware should use IOMMUs already). So why not work on the page allocator to improve its contiguous allocation behaviour? If you look at the thing you'll find pageblocks and migration types. If you change it so that you pin the migration type of one or a number of contiguous pageblocks to, say, MIGRATE_MOVABLE, so that they cannot be used for anything but movable pages, you're pretty much there. --
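As a rough illustration of what pinning a pageblock's migration type would mean, here is a toy user-space model. Everything in it (struct toy_block, the pinned flag, toy_alloc()) is invented for the sketch; the real allocator's fallback logic is far more involved, and the only thing shown is that the fallback path must skip pinned blocks.

```c
#include <assert.h>

enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE };

struct toy_block {
	enum toy_migratetype type;
	int pinned;		/* 1 = type may never be changed */
	int free_pages;
};

/* Allocate one page of the wanted type; returns block index or -1. */
static int toy_alloc(struct toy_block *blk, int nr_blocks,
		     enum toy_migratetype want)
{
	assert(nr_blocks > 0);

	/* First choice: a block that already has the wanted type. */
	for (int i = 0; i < nr_blocks; i++)
		if (blk[i].type == want && blk[i].free_pages > 0) {
			blk[i].free_pages--;
			return i;
		}

	/* Fallback: take over a block of another type, unless pinned. */
	for (int i = 0; i < nr_blocks; i++)
		if (!blk[i].pinned && blk[i].free_pages > 0) {
			blk[i].type = want;
			blk[i].free_pages--;
			return i;
		}
	return -1;
}
```

In this model an unmovable request fails rather than spilling into a pinned movable block, so such a block only ever holds pages that can later be migrated away to build a contiguous range.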
And that's exactly where I'm headed. I've created an API that seems to be usable and meets my and others' requirements (note that I'm not saying it cannot be improved -- I'm always happy to hear comments) and now I'm starting to concentrate on reusing the grabbed memory. At first I wasn't sure how this could be managed but thanks to many comments (including yours, thanks!) I have an idea of how the thing should work and what I should do from now on. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- --
I'm only taking a quick look at this - slow as ever so pardon me if I
Quick glance tells me that buffer sizes of 20MB are being thrown about which the core page allocator doesn't handle very well (and couldn't without major modification). Fragmentation avoidance only works well on sizes < MAX_ORDER_NR_PAGES which likely will be 2MB or 4MB.
That said, there are things the core VM can do to help. One is related to ZONE_MOVABLE and the second is on the use of MIGRATE_ISOLATE.
ZONE_MOVABLE is set up when the command line has kernelcore= or movablecore= specified. In ZONE_MOVABLE only pages that can be migrated are allocated (or huge pages if specifically configured to be allowed). The zone is set up during initialisation by slicing pieces from the end of existing zones and for various reasons, it would be best to maintain that behaviour unless CMA had a specific requirement for memory in the middle of an existing zone.
So let's say the maximum amount of contiguous memory required by all devices is 64M and ZONE_MOVABLE is 64M. During normal operation, normal order-0 pages can be allocated from this zone meaning the memory is not pinned and unusable by anybody else. This avoids wasting memory. When a device needs a new buffer, compaction would need some additional smarts to compact or reclaim the size of memory needed by the driver but because all the pages in the zone are movable, it should be possible. Ideally it would have swap to reclaim because if not, compaction needs to know how to move pages outside a zone (something it currently avoids).
Essentially, cma_alloc() would be a normal alloc_pages that uses ZONE_MOVABLE for buffers < MAX_ORDER_NR_PAGES but would need additional compaction smarts for the larger buffers. I think it would reuse as much of the existing VM as possible but without reviewing the code, I don't
Relatively handy to do something like this.
It can also be somewhat constrained by doing something similar to MIGRATE_ISOLATE to have contiguous regions of ...
