The following is a first release of an allocator based on SLAB and SLUB that integrates the best approaches from both allocators. The per cpu queuing is like the two prior releases. The NUMA facilities were much improved vs V2. Shared and alien cache support was added to track the cache hot state of objects. After this patches SLUB will track the cpu cache contents like SLAB attemped to. There are a number of architectural differences: 1. SLUB accurately tracks cpu caches instead of assuming that there is only a single cpu cache per node or system. 2. SLUB object expiration is tied into the page reclaim logic. There is no periodic cache expiration. 3. SLUB caches are dynamically configurable via the sysfs filesystem. 4. There is no per slab page metadata structure to maintain (aside from the object bitmap that usually fits into the page struct). 5. Keeps all the other good features of SLUB as well. SLUB+Q is a merging of SLUB with some queuing concepts from SLAB and a new way of managing objects in the slabs using bitmaps. It uses a percpu queue so that free operations can be properly buffered and a bitmap for managing the free/allocated state in the slabs. It is slightly more inefficient than SLUB (due to the need to place large bitmaps --sized a few words--in some slab pages if there are more than BITS_PER_LONG objects in a slab) but in general does not increase space use too much. The SLAB scheme of not touching the object during management is adopted. SLUB+Q can efficiently free and allocate cache cold objects without causing cache misses. I have had limited time for benchmarking this release so far since I was more focused on getting SLAB features merged in and making it work reliably with all the usual SLUB bells and whistles. The queueing scheme from the SLUB+Q V1/V2 releases was not changed so that the basic SMP performance is still the same. V1 and V2 did not have NUMA clean queues and therefore the performance on NUMA system was not ...
count_free() == available()
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
mm/slub.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-30 18:44:54.767739966 -0500
+++ linux-2.6/mm/slub.c 2010-07-30 18:45:24.248349179 -0500
@@ -1697,11 +1697,6 @@
return 1;
}
-static int count_free(struct page *page)
-{
- return available(page);
-}
-
static unsigned long count_partial(struct kmem_cache_node *n,
int (*get_count)(struct page *))
{
@@ -1750,7 +1745,7 @@
if (!n)
continue;
- nr_free = count_partial(n, count_free);
+ nr_free = count_partial(n, available);
nr_slabs = node_nr_slabs(n);
nr_objs = node_nr_objs(n);
@@ -3906,7 +3901,7 @@
x = atomic_long_read(&n->total_objects);
else if (flags & SO_OBJECTS)
x = atomic_long_read(&n->total_objects) -
- count_partial(n, count_free);
+ count_partial(n, available);
else
x = atomic_long_read(&n->nr_slabs);
@@ -4792,7 +4787,7 @@
nr_partials += n->nr_partial;
nr_slabs += atomic_long_read(&n->nr_slabs);
nr_objs += atomic_long_read(&n->total_objects);
- nr_free += count_partial(n, count_free);
+ nr_free += count_partial(n, available);
}
nr_inuse = nr_objs - nr_free;
--
Allow resizing of cpu queue and batch size. This is done in the
basic steps that are also followed by SLAB.
Careful: The ->cpu pointer is becoming volatile. References
to the ->cpu pointer either
A. Occur with interrupts disabled. This guarantees that nothing on the
processor itself interferes. This only serializes access to a single
processor specific area.
B. Occur with slub_lock taken for operations on all per cpu areas.
Taking the slub_lock guarantees that no resizing operation will occur
while accessing the percpu areas. The data in the percpu areas
is volatile even with slub_lock since the alloc and free functions
do not take slub_lock and will operate on fields of kmem_cache_cpu.
C. Are racy: Tolerable for statistics. The ->cpu pointer must always
point to a valid kmem_cache_cpu area.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/slub_def.h | 9 -
mm/slub.c | 218 +++++++++++++++++++++++++++++++++++++++++------
2 files changed, 197 insertions(+), 30 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-31 18:25:53.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-31 19:02:05.003563067 -0500
@@ -195,10 +195,19 @@
#endif
+/*
+ * We allow stat calls while slub_lock is taken or while interrupts
+ * are enabled for simplicities sake.
+ *
+ * This results in potential inaccuracies. If the platform does not
+ * support per cpu atomic operations vs. interrupts then the counters
+ * may be updated in a racy manner due to slab processing in
+ * interrupts.
+ */
static inline void stat(struct kmem_cache *s, enum stat_item si)
{
#ifdef CONFIG_SLUB_STATS
- __this_cpu_inc(s->cpu_slab->stat[si]);
+ __this_cpu_inc(s->cpu->stat[si]);
#endif
}
@@ -303,7 +312,7 @@
static inline int queue_full(struct kmem_cache_queue *q)
{
- return q->objects == QUEUE_SIZE;
+ return q->objects == ...kmalloc caches are statically defined and may take up a lot of space just
because the sizes of the node array has to be dimensioned for the largest
node count supported.
This patch makes the size of the kmem_cache structure dynamic throughout by
creating a kmem_cache slab cache for the kmem_cache objects. The bootstrap
occurs by allocating the initial one or two kmem_cache objects from the
page allocator.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/slub_def.h | 7 -
mm/slub.c | 181 +++++++++++++++++++++++++++++++++++------------
2 files changed, 139 insertions(+), 49 deletions(-)
Index: linux-2.6/include/linux/slub_def.h
===================================================================
--- linux-2.6.orig/include/linux/slub_def.h 2010-07-26 14:25:16.000000000 -0500
+++ linux-2.6/include/linux/slub_def.h 2010-07-26 14:26:24.000000000 -0500
@@ -136,19 +136,16 @@ struct kmem_cache {
#ifdef CONFIG_ZONE_DMA
#define SLUB_DMA __GFP_DMA
-/* Reserve extra caches for potential DMA use */
-#define KMALLOC_CACHES (2 * SLUB_PAGE_SHIFT)
#else
/* Disable DMA functionality */
#define SLUB_DMA (__force gfp_t)0
-#define KMALLOC_CACHES SLUB_PAGE_SHIFT
#endif
/*
* We keep the general caches in an array of slab caches that are used for
* 2^x bytes of allocations.
*/
-extern struct kmem_cache kmalloc_caches[KMALLOC_CACHES];
+extern struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
/*
* Sorry that the following has to be that ugly but some versions of GCC
@@ -213,7 +210,7 @@ static __always_inline struct kmem_cache
if (index == 0)
return NULL;
- return &kmalloc_caches[index];
+ return kmalloc_caches[index];
}
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-26 14:26:20.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-26 14:26:24.000000000 -0500
@@ -179,7 ...Extract the code that memory checkers and other verification tools use from
the hotpaths. Makes it easier to add new ones and reduces the disturbances
of the hotpaths.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
mm/slub.c | 49 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 11 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-26 14:26:24.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-26 14:26:33.000000000 -0500
@@ -793,6 +793,37 @@ static void trace(struct kmem_cache *s,
}
/*
+ * Hooks for other subsystems that check memory allocations. In a typical
+ * production configuration these hooks all should produce no code at all.
+ */
+static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
+{
+ lockdep_trace_alloc(flags);
+ might_sleep_if(flags & __GFP_WAIT);
+
+ return should_failslab(s->objsize, flags, s->flags);
+}
+
+static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, void *object)
+{
+ kmemcheck_slab_alloc(s, flags, object, s->objsize);
+ kmemleak_alloc_recursive(object, s->objsize, 1, s->flags, flags);
+}
+
+static inline void slab_free_hook(struct kmem_cache *s, void *x)
+{
+ kmemleak_free_recursive(x, s->flags);
+}
+
+static inline void slab_free_hook_irq(struct kmem_cache *s, void *object)
+{
+ kmemcheck_slab_free(s, object, s->objsize);
+ debug_check_no_locks_freed(object, s->objsize);
+ if (!(s->flags & SLAB_DEBUG_OBJECTS))
+ debug_check_no_obj_freed(object, s->objsize);
+}
+
+/*
* Tracking of fully allocated slabs for debugging purposes.
*/
static void add_full(struct kmem_cache_node *n, struct page *page)
@@ -1698,10 +1729,7 @@ static __always_inline void *slab_alloc(
gfpflags &= gfp_allowed_mask;
- lockdep_trace_alloc(gfpflags);
- might_sleep_if(gfpflags & __GFP_WAIT);
-
- if (should_failslab(s->objsize, gfpflags, s->flags))
+ if ...From: Tejun Heo <tj@kernel.org>
In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
size. There's no use case for forcing 0 and the upcoming early alloc
support always requires non-zero dynamic size. Make @dyn_size always
mean minimum dyn_size.
While at it, make pcpu_build_alloc_info() static which doesn't have
any external caller as suggested by David Rientjes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Index: work/include/linux/percpu.h
===================================================================
--- work.orig/include/linux/percpu.h
+++ work/include/linux/percpu.h
@@ -104,16 +104,11 @@ extern struct pcpu_alloc_info * __init p
int nr_units);
extern void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai);
-extern struct pcpu_alloc_info * __init pcpu_build_alloc_info(
- size_t reserved_size, ssize_t dyn_size,
- size_t atom_size,
- pcpu_fc_cpu_distance_fn_t cpu_distance_fn);
-
extern int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
void *base_addr);
#ifdef CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK
-extern int __init pcpu_embed_first_chunk(size_t reserved_size, ssize_t dyn_size,
+extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
pcpu_fc_alloc_fn_t alloc_fn,
Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -1013,20 +1013,6 @@ phys_addr_t per_cpu_ptr_to_phys(void *ad
return page_to_phys(pcpu_addr_to_page(addr));
}
-static inline size_t pcpu_calc_fc_sizes(size_t static_size,
- size_t reserved_size,
- ssize_t *dyn_sizep)
-{
- size_t size_sum;
-
- size_sum = PFN_ALIGN(static_size + reserved_size +
- ...Move the gfpflags masking into the hooks for checkers and into the slowpaths.
gfpflag masking requires access to a global variable and thus adds an
additional cacheline reference to the hotpaths.
If no hooks are active then the gfpflag masking will result in
code that the compiler can toss out.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
mm/slub.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-26 14:26:33.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-26 14:26:47.000000000 -0500
@@ -798,6 +798,7 @@ static void trace(struct kmem_cache *s,
*/
static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags)
{
+ flags &= gfp_allowed_mask;
lockdep_trace_alloc(flags);
might_sleep_if(flags & __GFP_WAIT);
@@ -806,6 +807,7 @@ static inline int slab_pre_alloc_hook(st
static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, void *object)
{
+ flags &= gfp_allowed_mask;
kmemcheck_slab_alloc(s, flags, object, s->objsize);
kmemleak_alloc_recursive(object, s->objsize, 1, s->flags, flags);
}
@@ -1679,6 +1681,7 @@ new_slab:
goto load_freelist;
}
+ gfpflags &= gfp_allowed_mask;
if (gfpflags & __GFP_WAIT)
local_irq_enable();
@@ -1727,8 +1730,6 @@ static __always_inline void *slab_alloc(
struct kmem_cache_cpu *c;
unsigned long flags;
- gfpflags &= gfp_allowed_mask;
-
if (!slab_pre_alloc_hook(s, gfpflags))
return NULL;
--
Serialize kmem_cache_create and kmem_cache_destroy using the slub_lock. Only
possible after the use of the slub_lock during dynamic dma creation has been
removed.
Then make sure that the setup of the slab sysfs entries does not race
with kmem_cache_create and kmem_cache destroy.
If a slab cache is removed before we have setup sysfs then simply skip over
the sysfs handling.
V1->V2:
- Do proper synchronization to address race conditions
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
mm/slub.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-27 22:51:41.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-27 22:51:43.000000000 -0500
@@ -2482,7 +2482,6 @@ void kmem_cache_destroy(struct kmem_cach
s->refcount--;
if (!s->refcount) {
list_del(&s->list);
- up_write(&slub_lock);
if (kmem_cache_close(s)) {
printk(KERN_ERR "SLUB %s: %s called for cache that "
"still has objects.\n", s->name, __func__);
@@ -2491,8 +2490,8 @@ void kmem_cache_destroy(struct kmem_cach
if (s->flags & SLAB_DESTROY_BY_RCU)
rcu_barrier();
sysfs_slab_remove(s);
- } else
- up_write(&slub_lock);
+ }
+ up_write(&slub_lock);
}
EXPORT_SYMBOL(kmem_cache_destroy);
@@ -3147,14 +3146,12 @@ struct kmem_cache *kmem_cache_create(con
*/
s->objsize = max(s->objsize, (int)size);
s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
- up_write(&slub_lock);
if (sysfs_slab_alias(s, name)) {
- down_write(&slub_lock);
s->refcount--;
- up_write(&slub_lock);
goto err;
}
+ up_write(&slub_lock);
return s;
}
@@ -3163,14 +3160,12 @@ struct kmem_cache *kmem_cache_create(con
if (kmem_cache_open(s, name,
size, align, flags, ctor)) {
list_add(&s->list, ...The cacheline with the flags is reachable from the hot paths after the
percpu allocator changes went in. So there is no need anymore to put a
flag into each slab page. Get rid of the SlubDebug flag and use
the flags in kmem_cache instead.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
include/linux/page-flags.h | 2 --
mm/slub.c | 33 ++++++++++++---------------------
2 files changed, 12 insertions(+), 23 deletions(-)
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h 2010-07-28 12:03:19.000000000 -0500
+++ linux-2.6/include/linux/page-flags.h 2010-07-28 12:44:57.000000000 -0500
@@ -128,7 +128,6 @@ enum pageflags {
/* SLUB */
PG_slub_frozen = PG_active,
- PG_slub_debug = PG_error,
};
#ifndef __GENERATING_BOUNDS_H
@@ -215,7 +214,6 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEAR
__PAGEFLAG(SlobFree, slob_free)
__PAGEFLAG(SlubFrozen, slub_frozen)
-__PAGEFLAG(SlubDebug, slub_debug)
/*
* Private page markings that may be used by the filesystem that owns the page
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-28 12:44:56.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-28 12:44:57.000000000 -0500
@@ -107,11 +107,17 @@
* the fast path and disables lockless freelists.
*/
+#define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
+ SLAB_TRACE | SLAB_DEBUG_FREE)
+
+static inline int kmem_cache_debug(struct kmem_cache *s)
+{
#ifdef CONFIG_SLUB_DEBUG
-#define SLABDEBUG 1
+ return unlikely(s->flags & SLAB_DEBUG_FLAGS);
#else
-#define SLABDEBUG 0
+ return 0;
#endif
+}
/*
* Issues still to be resolved:
@@ -1157,9 +1163,6 @@ static struct page *new_slab(struct kmem
inc_slabs_node(s, page_to_nid(page), page->objects);
page->slab = s;
...Remove the dynamic dma slab allocation since this causes too many issues with
nested locks etc etc. The change avoids passing gfpflags into many functions.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
---
mm/slub.c | 151 ++++++++++++++++----------------------------------------------
1 file changed, 40 insertions(+), 111 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-07-27 22:51:36.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-07-27 22:51:36.000000000 -0500
@@ -2065,7 +2065,7 @@ init_kmem_cache_node(struct kmem_cache_n
static DEFINE_PER_CPU(struct kmem_cache_cpu, kmalloc_percpu[KMALLOC_CACHES]);
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s, gfp_t flags)
+static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
{
if (s < kmalloc_caches + KMALLOC_CACHES && s >= kmalloc_caches)
/*
@@ -2092,7 +2092,7 @@ static inline int alloc_kmem_cache_cpus(
* when allocating for the kmalloc_node_cache. This is used for bootstrapping
* memory on a fresh node that has no slab structures yet.
*/
-static void early_kmem_cache_node_alloc(gfp_t gfpflags, int node)
+static void early_kmem_cache_node_alloc(int node)
{
struct page *page;
struct kmem_cache_node *n;
@@ -2100,7 +2100,7 @@ static void early_kmem_cache_node_alloc(
BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
- page = new_slab(kmalloc_caches, gfpflags, node);
+ page = new_slab(kmalloc_caches, GFP_KERNEL, node);
BUG_ON(!page);
if (page_to_nid(page) != node) {
@@ -2144,7 +2144,7 @@ static void free_kmem_cache_nodes(struct
}
}
-static int init_kmem_cache_nodes(struct kmem_cache *s, gfp_t gfpflags)
+static int init_kmem_cache_nodes(struct kmem_cache *s)
{
int node;
@@ -2152,11 +2152,11 @@ static int init_kmem_cache_nodes(struct
struct kmem_cache_node *n;
if (slab_state == DOWN) {
- early_kmem_cache_node_alloc(gfpflags, ...This insta-reboots on my netperf benchmarking servers (but works with numa=off), so I'll have to wait until I can hook up a serial before benchmarking this series. --
There are potential issues with 1. The size of per cpu reservation on bootup and the new percpu code that allows allocations for per cpu areas during bootup. Sometime I wonder if I should just go back to static allocs for that. 2. The topology information provided by the machine for the cache setup. 3. My code of course. Bootlog would be appreciated. --
I bisected this to patch 8 but still don't have a bootlog. I'm assuming in the meantime that something is kmallocing DMA memory on this machine prior to kmem_cache_init_late() and get_slab() is returning a NULL pointer. --
There is a kernel option "earlyprintk=..." that allows you to see early
boot messages.
If this indeed is a problem with the DMA caches then try the following
patch:
Subject: slub: Move dma cache initialization up
Do dma kmalloc initialization in kmem_cache_init and not in kmem_cache_init_late()
Signed-off-by: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2010-08-05 12:24:21.000000000 -0500
+++ linux-2.6/mm/slub.c 2010-08-05 12:28:58.000000000 -0500
@@ -3866,13 +3866,8 @@ void __init kmem_cache_init(void)
#ifdef CONFIG_SMP
register_cpu_notifier(&slab_notifier);
#endif
-}
-void __init kmem_cache_init_late(void)
-{
#ifdef CONFIG_ZONE_DMA
- int i;
-
/* Create the dma kmalloc array and make it operational */
for (i = 0; i < SLUB_PAGE_SHIFT; i++) {
struct kmem_cache *s = kmalloc_caches[i];
@@ -3891,6 +3886,10 @@ void __init kmem_cache_init_late(void)
#endif
}
+void __init kmem_cache_init_late(void)
+{
+}
+
/*
* Find a mergeable slab cache
*/
--
Ok, so this is panicking because of the error handling when trying to create sysfs directories with the same name (in this case, :dt-0000064). I'll look into while this isn't failing gracefully later, but I isolated this to the new code that statically allocates the DMA caches in kmem_cache_init_late(). The iteration runs from 0 to SLUB_PAGE_SHIFT; that's actually incorrect since the kmem_cache_node cache occupies the first spot in the kmalloc_caches array and has a size, 64 bytes, equal to a power of two that is duplicated later. So this patch tries creating two DMA kmalloc caches with 64 byte object size which triggers a BUG_ON() during kmem_cache_release() in the error handling later. The fix is to start the iteration at 1 instead of 0 so that all other caches have their equivalent DMA caches created and the special-case kmem_cache_node cache is excluded (see below). I'm really curious why nobody else ran into this problem before, especially if they have CONFIG_SLUB_DEBUG enabled so struct kmem_cache_node has the same size. Perhaps my early bug report caused people not to test the series... I'm adding Tejun Heo to the cc because of another thing that may be problematic: alloc_percpu() allocates GFP_KERNEL memory, so when we try to allocate kmem_cache_cpu for a DMA cache we may be returning memory from a node that doesn't include lowmem so there will be no affinity between the struct and the slab. I'm wondering if it would be better for the percpu allocator to be extended for kzalloc_node(), or vmalloc_node(), when allocating memory after the slab layer is up. There're a couple more issues with the patch as well: - the entire iteration in kmem_cache_init_late() needs to be protected by slub_lock. The comment in create_kmalloc_cache() should be revised since you're no longer calling it only with irqs disabled. kmem_cache_init_late() has irqs enabled and, thus, slab_caches must be protected. - a BUG_ON(!name) needs to ...
Hello, Hmmm... do you mean adding @gfp_mask to percpu allocation function? I've been thinking about adding it for atomic allocations (Christoph, do you still want it?). I've been sort of against it because I primarily don't really like atomic allocations (it often just pushes error handling complexities elsewhere where it becomes more complex) and it would also require making vmalloc code do atomic allocations. Most of percpu use cases seem pretty happy with GFP_KERNEL allocation, so I'm still quite reluctant to change that. We can add a semi internal interface w/ @gfp_mask but w/o GFP_ATOMIC support, which is a bit ugly. How important would this be? Thanks. -- tejun --
DMA caches may only exist on certain nodes because others do not have a DMA zone. Their role is quite limited these days. DMA caches allocated on nodes without DMA zones would have their percpu area allocated on the node but the DMA allocations would be redirected to the closest node with DMA At this point I would think that we do not need that support. --
The kmem_cache_node cache is no longer at position 0. I moved it to kmem_cache_init() which is run when we only have one execution thread. That takes care of the issue and ensures that the dma Ok. --
If you do not apply all patches then you can be at a stage were kmalloc_caches[0] is still used for kmem_cache_node. Then things break. --
They do after patch 11 when you introduce dynamically sized kmalloc caches, but not after only patches 1-8 were applied. Since this wasn't booting on my system, I bisected the problem to patch 8 where kmem_cache_init_late() would create two DMA caches of size 64 bytes: one becauses of kmalloc_caches[0] (kmem_cache_node) and one because of kmalloc_caches[6] (2^6 = 64). So my fixes are necessary for patch 8 but I didn't know if that was a debugging patch for me or if you wanted to push that as part of your series, I'm not sure if you actually need to move it to kmem_cache_init() now that slub_state is protected by slub_lock. I'm not sure if we want to allocate DMA objects between kmem_cache_init() and kmem_cache_init_late(). --
Drivers may allocate dma buffers during initialization. --
Ok, I moved the DMA cache creation from kmem_cache_init_late() to kmem_cache_init(). Note: the kasprintf() will need to use GFP_NOWAIT and not GFP_KERNEL now. --
ok. I have revised the patch since there is also a problem with the indirection on kmalloc_caches. --
