A new kmemleak version is available. Thanks to all who reviewed the code
and gave feedback. Kmemleak can also be found on this git tree:
git://linux-arm.org/linux-2.6.git kmemleak
Please note that even though I already got the ack from the
slab/slob/slub maintainers, I haven't included the corresponding lines
in the patch description because the patches were slightly modified to
pass the GFP flags to the kmemleak callbacks. I would be grateful if you
review these changes and re-acknowledge them.
Changes since the previous release:
- fatal errors for kmemleak are no longer fatal for the full system.
Kmemleak can disable and clean-up after itself at run-time if such a
condition occurs
- re-worked locking in the memleak.c file together with documentation on
how it works
- kmemleak internal allocations use the GFP flags of the caller and are
no longer restricted to GFP_ATOMIC
- better (hopefully) comments all over the code
- implemented most of the comments received so far (an important
omission here is the tracking of alloc_bootmem calls since it looks a
bit difficult to pair them with free_bootmem, the latter being used
with reserve_bootmem or without a corresponding alloc_bootmem)
Still to do:
- run-time and boot-time configuration like task stacks scanning,
disabling kmemleak, enabling/disabling the automatic scanning
Thanks for your comments.
Catalin Marinas (15):
kmemleak: Add the corresponding MAINTAINERS entry
kmemleak: Simple testing module for kmemleak
kmemleak: Keep the __init functions after initialization
kmemleak: Enable the building of the memory leak detector
kmemleak: Remove some of the kmemleak false positives
arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file
x86: Provide _sdata in the vmlinux_*.lds.S files
kmemleak: Add modules support
kmemleak: Add memleak_alloc callback from alloc_large_system_hash
kmemleak: Add the vmalloc memory allocation/freeing ...This patch adds the Documentation/kmemleak.txt file with some information about how kmemleak works. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- Documentation/kmemleak.txt | 127 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 127 insertions(+), 0 deletions(-) create mode 100644 Documentation/kmemleak.txt diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt new file mode 100644 index 0000000..6617ce8 --- /dev/null +++ b/Documentation/kmemleak.txt @@ -0,0 +1,127 @@ +Kernel Memory Leak Detector +=========================== + +Introduction +------------ + +Kmemleak provides a way of detecting possible kernel memory leaks in a +way similar to a tracing garbage collector +(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage...), +with the difference that the orphan objects are not freed but only +reported via /sys/kernel/debug/memleak. A similar method is used by the +Valgrind tool (memcheck --leak-check) to detect the memory leaks in +user-space applications. + +Usage +----- + +CONFIG_DEBUG_MEMLEAK in "Kernel hacking" has to be enabled. A kernel +thread scans the memory every 10 min (by default) and prints any new +unreferenced objects found. To trigger an intermediate scan and display +all the possible memory leaks: + + # mount -t debugfs nodev /sys/kernel/debug/ + # cat /sys/kernel/debug/memleak + +Note that the orphan objects are listed in the order they were allocated +and one object at the beginning of the list may cause other subsequent +objects to be reported as orphan. + +Basic Algorithm +--------------- + +The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and +friends are traced and the pointers, together with additional +information like size and stack trace, are stored in a prio search tree. +The corresponding freeing function calls are tracked and the pointers +removed from the kmemleak data structures. + +An allocated block of memory is ...
This patch adds the callbacks to memleak_(alloc|free) functions from the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to avoid recursive calls to kmemleak when it allocates its own data structures. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> --- include/linux/slab.h | 2 ++ mm/slab.c | 18 ++++++++++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 000da12..d72ad0b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -62,6 +62,8 @@ # define SLAB_DEBUG_OBJECTS 0x00000000UL #endif +#define SLAB_NOLEAKTRACE 0x00800000UL /* Avoid kmemleak tracing */ + /* The following flags affect the page allocator grouping pages by mobility */ #define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */ #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */ diff --git a/mm/slab.c b/mm/slab.c index 0918751..d11112f 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -106,6 +106,7 @@ #include <linux/string.h> #include <linux/uaccess.h> #include <linux/nodemask.h> +#include <linux/memleak.h> #include <linux/mempolicy.h> #include <linux/mutex.h> #include <linux/fault-inject.h> @@ -177,13 +178,13 @@ SLAB_STORE_USER | \ SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \ SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \ - SLAB_DEBUG_OBJECTS) + SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE) #else # define CREATE_MASK (SLAB_HWCACHE_ALIGN | \ SLAB_CACHE_DMA | \ SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \ SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \ - SLAB_DEBUG_OBJECTS) + SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE) #endif /* @@ -2610,6 +2611,13 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp, /* Slab management obj is off-slab. */ slabp = kmem_cache_alloc_node(cachep->slabp_cache, local_flags & ~GFP_THISNODE, ...
You might also want to try linux-mm@kvack.org for these. You'll probably get some better review there, and they're just _kinda_ mm related. ;) -- Dave --
It would be really nice here to say *how* it is avoiding false negatives. :) How about: /* Don't let the pointer from the slab itself count as referencing */ -- Dave --
Hi Catalin, Heh, I run into this part again and as I have a long term memory of a goldfish I had to look up the discussion we had. So may I suggest you change the comment to: /* * If the first object in the slab is leaked (it's allocated but no * one has a reference to it), we want to make sure kmemleak does not * treat the ->s_mem pointer as a reference to the object. Otherwise * we will not report the leak. For this, maybe something like this: /* * To avoid a false negative, if an object that is in one of the * per-CPU caches is leaked, we need to make sure kmemleak doesn't * treat the array pointers as a reference to the object. Do you take care of the per-node lists as well? --
I can't figure out what other location should be erased. -- Catalin --
As far as I can tell, you need to tell kmemleak not to scan the alien caches and the shared array that is shared by all CPUs that belong to one node. I'm adding Christoph to the CC in case he wants to comment on this. --
In the ____cache_alloc() kmemleak clears the cachep->array->entry[ac->avail] pointer but this may not be enough as freed and later re-allocated objects may have pointers in the alien cache (is that correct?). A better approach (haven't tried it yet) would be not to scan objects allocated via alloc_arraycache() at all. However, there is still the initarray_cache/generic which are automatically scanned via the data section (unless I add an attribute to place them in a different, not scanned, section). -- Catalin --
An allocated object is not part of any cache in SLAB. Only freed objects are kept in the slab queues. A freed object can only be in one queue at a time. --
OK, but is there a chance that an stale pointer remains in such caches? There seems to be the transfer_objects() function that moves pointers around but doesn't clear the source values. -- Catalin --
Definitely. The pointers are never cleared. There are counters in the No need to. The counter updates take care of things. --
On Thu, Dec 18, 2008 at 9:35 PM, Christoph Lameter For kmemleak, that's a problem. Unless we explicitly annotate the caches, it will scan them and think that there's a pointer to a leaked object (i.e. false negative). Catalin already took care of the per-CPU caches but AFAICT we still need to take care of the per-node caches and the shared caches. --
Why doesnt kmemleak simply use the counter as a boundary and only access those pointers that are valid? --
Since the valid pointers in these caches only point to freed objects (which aren't tracked by kmemleak), it's better for kmemleak not to scan such structures at all. I added a kmemleak_no_scan() annotation for this. Thanks for clarification. -- Catalin --
This patch adds the callbacks to memleak_(alloc|free) functions from the
slob allocator.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
---
mm/slob.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/slob.c b/mm/slob.c
index cb675d1..ff5a98d 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -60,6 +60,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mm.h>
+#include <linux/memleak.h>
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -463,6 +464,7 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
unsigned int *m;
int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+ void *ret;
if (size < PAGE_SIZE - align) {
if (!size)
@@ -472,18 +474,18 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
if (!m)
return NULL;
*m = size;
- return (void *)m + align;
+ ret = (void *)m + align;
} else {
- void *ret;
-
ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node);
if (ret) {
struct page *page;
page = virt_to_page(ret);
page->private = size;
}
- return ret;
}
+
+ memleak_alloc(ret, size, 1, gfp);
+ return ret;
}
EXPORT_SYMBOL(__kmalloc_node);
@@ -493,6 +495,7 @@ void kfree(const void *block)
if (unlikely(ZERO_OR_NULL_PTR(block)))
return;
+ memleak_free(block);
sp = (struct slob_page *)virt_to_page(block);
if (slob_page(sp)) {
@@ -555,12 +558,14 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
} else if (flags & SLAB_PANIC)
panic("Cannot create slab cache %s\n", name);
+ memleak_alloc(c, sizeof(struct kmem_cache), 1, GFP_KERNEL);
return c;
}
EXPORT_SYMBOL(kmem_cache_create);
void kmem_cache_destroy(struct kmem_cache *c)
{
+ memleak_free(c);
slob_free(c, sizeof(struct kmem_cache));
}
EXPORT_SYMBOL(kmem_cache_destroy);
@@ -577,6 +582,7 @@ ...Is this different than the last one? -- Mathematics is the supreme nostalgia of our time. --
Slightly. It now passes the gfp flags to the memleak_alloc callback Thanks. -- Catalin --
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> --
This patch adds the callbacks to memleak_(alloc|free) functions from
vmalloc/vfree.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
mm/vmalloc.c | 29 ++++++++++++++++++++++++++---
1 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f3f6e07..b15e29e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -23,6 +23,7 @@
#include <linux/rbtree.h>
#include <linux/radix-tree.h>
#include <linux/rcupdate.h>
+#include <linux/memleak.h>
#include <asm/atomic.h>
#include <asm/uaccess.h>
@@ -1196,6 +1197,9 @@ static void __vunmap(const void *addr, int deallocate_pages)
void vfree(const void *addr)
{
BUG_ON(in_interrupt());
+
+ memleak_free(addr);
+
__vunmap(addr, 1);
}
EXPORT_SYMBOL(vfree);
@@ -1305,8 +1309,17 @@ fail:
void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot)
{
- return __vmalloc_area_node(area, gfp_mask, prot, -1,
- __builtin_return_address(0));
+ void *addr = __vmalloc_area_node(area, gfp_mask, prot, -1,
+ __builtin_return_address(0));
+
+ /*
+ * This needs ref_count = 2 since vm_struct also contains a
+ * pointer to this address. The guard page is also subtracted
+ * from the size.
+ */
+ memleak_alloc(addr, area->size - PAGE_SIZE, 2, gfp_mask);
+
+ return addr;
}
/**
@@ -1325,6 +1338,8 @@ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot,
int node, void *caller)
{
struct vm_struct *area;
+ void *addr;
+ unsigned long real_size = size;
size = PAGE_ALIGN(size);
if (!size || (size >> PAGE_SHIFT) > num_physpages)
@@ -1336,7 +1351,15 @@ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot,
if (!area)
return NULL;
- return __vmalloc_area_node(area, gfp_mask, prot, node, caller);
+ addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
+
+ /*
+ * This needs ref_count = 2 since the vm_struct also contains
+ * a pointer to this address.
+ ...The alloc_large_system_hash function is called from various places in the kernel and it contains pointers to other allocated structures. It therefore needs to be traced by kmemleak. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- mm/page_alloc.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d8ac014..27efeb0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -46,6 +46,7 @@ #include <linux/page-isolation.h> #include <linux/page_cgroup.h> #include <linux/debugobjects.h> +#include <linux/memleak.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -4570,6 +4571,8 @@ void *__init alloc_large_system_hash(const char *tablename, if (_hash_mask) *_hash_mask = (1 << log2qty) - 1; + memleak_alloc(table, size, 1, GFP_ATOMIC); + return table; } --
Why is this sucker GFP_ATOMIC? Since alloc_large_system_hash() is using bootmem (and is called early), I'm a little surprised that it is OK to call into memleak_alloc() which uses kmem_cache_alloc(). Is the slab even set up at this point? -- Dave --
It doesn't need to be. Early callbacks like this are logged by kmemleak in a buffer and properly registered once the slab allocator is fully initialised (slab initialisation needs to allocate some memory for itself as well). -- Catalin --
Actually, for consistency is should be GFP_ATOMIC even if the flag might not be used. All the other allocations in this function (vmalloc, __get_free_pages) use GFP_ATOMIC. -- Catalin --
Ahh, thanks for the clarification. Could you add something to the code to this effect? -- Dave --
Do you mean a comment? I can do this. -- Catalin --
Yeah, something like /* * kmemleak doesn't actually allocate memory when called this early * so the GFP_ATOMIC here is actually meaningless, but consistent * with the rest of this function. */ Maybe that's too verbose. :) -- Dave --
I'd suggest just doing a separate kmemleak_early_alloc() hook without the gfp flag. --
It looks to me like alloc_large_system_hash() could also be called at some later point and it may even invoke __vmalloc() if hashdist is set. So I would prefer not to introduce another hook and additional if's to know which one to call. BTW, I think the callback should actually be (to avoid duplicating the vmalloc call, with proper comment): if (!hashdist) memleak_alloc(table, size, 1, GFP_ATOMIC); -- Catalin --
Does memleak_alloc() detect if it gets called twice on the same memory? Also, is alloc_large_system_hash() contained in the tests that you can compile for kmemleak? -- Dave --
It does, and panics (disables itself). I think if this happens it is a No. This memleak_alloc() callback was mainly added to avoid plenty of false reports from various parts of the kernel (especially the IPv4 stack). It wasn't really meant to track the allocated hash blocks. -- Catalin --
This patch handles the kmemleak operations needed for modules loading so
that memory allocations from inside a module are properly tracked.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
kernel/module.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 1f4cc00..7198681 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -51,6 +51,7 @@
#include <asm/sections.h>
#include <linux/tracepoint.h>
#include <linux/ftrace.h>
+#include <linux/memleak.h>
#if 0
#define DEBUGP printk
@@ -409,6 +410,7 @@ static void *percpu_modalloc(unsigned long size, unsigned long align,
unsigned long extra;
unsigned int i;
void *ptr;
+ int cpu;
if (align > PAGE_SIZE) {
printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
@@ -438,6 +440,11 @@ static void *percpu_modalloc(unsigned long size, unsigned long align,
if (!split_block(i, size))
return NULL;
+ /* add the per-cpu scanning areas */
+ for_each_possible_cpu(cpu)
+ memleak_alloc(ptr + per_cpu_offset(cpu), size, 0,
+ GFP_KERNEL);
+
/* Mark allocated */
pcpu_size[i] = -pcpu_size[i];
return ptr;
@@ -452,6 +459,7 @@ static void percpu_modfree(void *freeme)
{
unsigned int i;
void *ptr = __per_cpu_start + block_size(pcpu_size[0]);
+ int cpu;
/* First entry is core kernel percpu data. */
for (i = 1; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
@@ -463,6 +471,10 @@ static void percpu_modfree(void *freeme)
BUG();
free:
+ /* remove the per-cpu scanning areas */
+ for_each_possible_cpu(cpu)
+ memleak_free(freeme + per_cpu_offset(cpu));
+
/* Merge with previous? */
if (pcpu_size[i-1] >= 0) {
pcpu_size[i-1] += pcpu_size[i];
@@ -1833,6 +1845,36 @@ static void *module_alloc_update_bounds(unsigned long size)
return ret;
}
+#ifdef CONFIG_DEBUG_MEMLEAK
+static void memleak_load_module(struct module *mod, ..._sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/vmlinux_32.lds.S | 1 +
arch/x86/kernel/vmlinux_64.lds.S | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index a9b8560..b5d2b49 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -62,6 +62,7 @@ SECTIONS
/* writeable */
. = ALIGN(PAGE_SIZE);
+ _sdata = .; /* Start of data section */
.data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
DATA_DATA
CONSTRUCTORS
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index 46e0544..8ad376c 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -52,6 +52,7 @@ SECTIONS
RODATA
. = ALIGN(PAGE_SIZE); /* Align data segment to page size boundary */
+ _sdata = .; /* Start of data section */
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) {
DATA_DATA
--
_sdata and __bss_stop are common symbols defined by many architectures
and made available to the kernel via asm-generic/sections.h. Kmemleak
uses these symbols when scanning the data sections.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
---
arch/arm/kernel/vmlinux.lds.S | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 4898bdc..3cf1d44 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -120,6 +120,7 @@ SECTIONS
.data : AT(__data_loc) {
__data_start = .; /* address in memory */
+ _sdata = .;
/*
* first, the init task union, aligned
@@ -171,6 +172,7 @@ SECTIONS
__bss_start = .; /* BSS */
*(.bss)
*(COMMON)
+ __bss_stop = .;
_end = .;
}
/* Stabs debugging sections. */
--
This patch adds the base support for the kernel memory leak detector. It traces the memory allocation/freeing in a way similar to the Boehm's conservative garbage collector, the difference being that the unreferenced objects are not freed but only shown in /sys/kernel/debug/memleak. Enabling this feature introduces an overhead to memory allocations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- include/linux/memleak.h | 93 +++ init/main.c | 4 mm/memleak.c | 1263 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1359 insertions(+), 1 deletions(-) create mode 100644 include/linux/memleak.h create mode 100644 mm/memleak.c diff --git a/include/linux/memleak.h b/include/linux/memleak.h new file mode 100644 index 0000000..340b9fc --- /dev/null +++ b/include/linux/memleak.h @@ -0,0 +1,93 @@ +/* + * include/linux/memleak.h + * + * Copyright (C) 2008 ARM Limited + * Written by Catalin Marinas <catalin.marinas@arm.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef __MEMLEAK_H +#define __MEMLEAK_H + +#ifdef CONFIG_DEBUG_MEMLEAK + +extern void memleak_init(void); +extern void memleak_alloc(const void *ptr, size_t ...
Looks good to me from an RCU viewpoint! --
Hi Catalin,
Few minor nits below.
On Wed, Dec 10, 2008 at 8:26 PM, Catalin Marinas
This could be
if (WARN_ON(object->flags & OBJECT_ALLOCATED))
How come you don't dump_stack() or even WARN_ON() unconditionally in
kmemleak_panic() which is called bit later so you can remove this kind
Hmm, dump_stack() is called in quite a few places. Might make sense to
--
I'm not sure just warning would be enough. If this happens, its a severe bug in kmemleak and the tool is no longer useful (it could even leak memory or free already freed blocks). I could change it to a memleak_panic call but if the object use_count isn't reliable, the See my comment above, these are genuine kmemleak bugs and it shouldn't just warn. Hopefully they will never happen unless a get/put_object is missing. Thanks. -- Catalin --
Oh, we use WARN_ON() for things like this as well to maximize the likelihood of the oops actually reaching the user. But whatever works for you the best. --
Thanks for reviewing it. FYI, in the version I'm going to post this week I added another mutex to ensure the exclusive opening of the /sys/kernel/debug/memleak file as one can now use this file to configure kmemleak at run-time. The RCU locking isn't affected and I'll add your "Reviewed-by:" line. -- Catalin --
There are allocations for which the main pointer cannot be found but
they are not memory leaks. This patch fixes some of them. For more
information on false positives, see Documentation/kmemleak.txt.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
drivers/char/vt.c | 7 +++++++
include/linux/percpu.h | 5 +++++
2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/char/vt.c b/drivers/char/vt.c
index a5af607..a36221b 100644
--- a/drivers/char/vt.c
+++ b/drivers/char/vt.c
@@ -104,6 +104,7 @@
#include <linux/io.h>
#include <asm/system.h>
#include <linux/uaccess.h>
+#include <linux/memleak.h>
#define MAX_NR_CON_DRIVER 16
@@ -2882,6 +2883,12 @@ static int __init con_init(void)
*/
for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct vc_data));
+ /*
+ * Kmemleak does not track the memory allocated via
+ * alloc_bootmem() but this block contains pointers to
+ * other blocks allocated via kmalloc.
+ */
+ memleak_alloc(vc, sizeof(struct vc_data), 1, GFP_ATOMIC);
INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
visual_init(vc, currcons, 1);
vc->vc_screenbuf = (unsigned short *)alloc_bootmem(vc->vc_screenbuf_size);
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 9f2a375..4d1ce18 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -69,7 +69,12 @@ struct percpu_data {
void *ptrs[1];
};
+/* pointer disguising messes up the kmemleak objects tracking */
+#ifndef CONFIG_DEBUG_MEMLEAK
#define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
+#else
+#define __percpu_disguise(pdata) (struct percpu_data *)(pdata)
+#endif
/*
* Use this to get to a cpu's version of the per-cpu object dynamically
* allocated. Non-atomic access to the current CPU's version should
--
This patch adds the callbacks to memleak_(alloc|free) functions from the slub allocator. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> --- mm/slub.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 749588a..d9b07cb 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -18,6 +18,7 @@ #include <linux/seq_file.h> #include <linux/cpu.h> #include <linux/cpuset.h> +#include <linux/memleak.h> #include <linux/mempolicy.h> #include <linux/ctype.h> #include <linux/debugobjects.h> @@ -140,7 +141,7 @@ * Set of flags that will prevent slab merging */ #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \ - SLAB_TRACE | SLAB_DESTROY_BY_RCU) + SLAB_TRACE | SLAB_DESTROY_BY_RCU | SLAB_NOLEAKTRACE) #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \ SLAB_CACHE_DMA) @@ -1608,6 +1609,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s, if (unlikely((gfpflags & __GFP_ZERO) && object)) memset(object, 0, objsize); + memleak_alloc_recursive(object, objsize, 1, s->flags, gfpflags); return object; } @@ -1710,6 +1712,7 @@ static __always_inline void slab_free(struct kmem_cache *s, struct kmem_cache_cpu *c; unsigned long flags; + memleak_free_recursive(x, s->flags); local_irq_save(flags); c = get_cpu_slab(s, smp_processor_id()); debug_check_no_locks_freed(object, c->objsize); --
Hmm, I'm not sure I understand why struct kmem_cache_cpu ->freelist is never scanned. For SMP, I suppose kmemleak doesn't scan the per-CPU areas? But for UP, struct kmem_cache is allocated with kmalloc() and that contains struct kmem_cache_cpu as well. And I suppose we never scan struct pages either. Otherwise ->freelist there would be a problem as well. Pekka --
Did you get any false positives? Or were you expecting false negatives It should scan the per-CPU areas in the memleak_scan() function: #ifdef CONFIG_SMP /* per-cpu sections scanning */ for_each_possible_cpu(i) scan_block(__per_cpu_start + per_cpu_offset(i), __per_cpu_end + per_cpu_offset(i), NULL); It was scanning the mem_map arrays in the past but removed this part and haven't seen any problems (on ARM). Why would the ->freelist be a problem? I don't fully understand the slub allocator. Aren't objects added to the freelist only after they were freed? In __slab_alloc there seems to be a line: c->page->freelist = NULL; so the freelist won't count as a reference anymore. After freeing an object, kmemleak no longer cares about references to it. -- Catalin --
Hi Catalin, I haven't tested kmemleak so I'm just commenting on the code. I was thinking about false negatives, not false positives. I think we're talking about two different things here. Don't we then have false negatives because we reach ->freelist of struct kmem_cache_cpu which contains a pointer to an object that is free'd (take a look at slab_free() fast-path)? Pekka --
Hi Pekka, Just to make sure I understand it correctly, the slab_free() fast path stores the pointer to the freed object into c->freelist. However, this object is no longer tracked by kmemleak because of the kmemleak_free_recursive() call at the beginning of this function (false negatives make sense only for allocated objects). On the slab_alloc() fast path, the pointer to an allocated object is obtained from the c->freelist pointer but this seems to be overridden by the pointer to the next free object, object[c->offset], which isn't yet tracked by kmemleak. So, during a memory scan, it shouldn't matter that the kmem_cache_cpu structures are called as they don't contain any pointer to an allocated (not free) object. The new slabs are allocated with alloc_pages() and these are not tracked by kmemleak. Is my understanding correct? Thanks. -- Catalin --
Hi Catalin, Indeed. For SLAB, it's a problem because the per-CPU cache pointer is not cleared from the struct array_cache upon _allocation_ which is the culprit of the false negative there. Yes, it is and I was just confused. Thanks! Pekka --
This patch adds the Kconfig.debug and Makefile entries needed for building kmemleak into the kernel. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- lib/Kconfig.debug | 23 +++++++++++++++++++++++ mm/Makefile | 1 + 2 files changed, 24 insertions(+), 0 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index b0f239e..1e59827 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -290,6 +290,29 @@ config SLUB_STATS out which slabs are relevant to a particular load. Try running: slabinfo -DA +config DEBUG_MEMLEAK + bool "Kernel memory leak detector" + default n + depends on EXPERIMENTAL + select DEBUG_SLAB if SLAB + select SLUB_DEBUG if SLUB + select DEBUG_FS + select STACKTRACE + select FRAME_POINTER + select KALLSYMS + help + Say Y here if you want to enable the memory leak + detector. The memory allocation/freeing is traced in a way + similar to the Boehm's conservative garbage collector, the + difference being that the orphan objects are not freed but + only shown in /sys/kernel/debug/memleak. Enabling this + feature will introduce an overhead to memory + allocations. See Documentation/kmemleak.txt for more + details. + + In order to access the memleak file, debugfs needs to be + mounted (usually at /sys/kernel/debug). + config DEBUG_PREEMPT bool "Debug preemptible kernel" depends on DEBUG_KERNEL && PREEMPT && (TRACE_IRQFLAGS_SUPPORT || PPC64) diff --git a/mm/Makefile b/mm/Makefile index c06b45a..3e43536 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -34,3 +34,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o +obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o --
So, not all architectures have STACKTRACE or FRAME_POINTER. I think a few of these should at least be done with depends. Is this feature accessible if DEBUG_FS=n? It seems to compile OK, but I wonder if it is useful. -- Dave --
I think it could depend on STACKTRACE_SUPPORT. Alternatively, it could select STACKTRACE only if it is supported, though for architectures without it, the kmemleak reports wouldn't be very useful. Does FRAME_POINTER even matter? I think STACKTRACE should be enough to get the backtrace. I even have some ARM patches for stack unwinding Well, it is recommended. If you don't have this, you can't trigger a scan manually by reading the /sys/kernel/debug/memleak file (have to rely on the automatic thread). In my local tree (not published yet), I also added support for run-time configuration by writing to this file. Is there any disadvantage in always selecting DEBUG_FS? Thanks. -- Catalin --
I think they'd still be pretty useful. They would be certainly harder
to track down, but "something allocating N bytes from the slab is
It is supposed to give cleaner stack traces. But if you don't strictly
require it, I'd leave it up to whatever the user had set before. If
people are getting crappy stack traces from anywhere in the kernel, I
I was just worried that debugfs was the only mechanism for this feature
to get its data in and out of the kernel. If it was not there, that
this feature was useless.
The problem with 'select' when it is "mixed" with 'depends':
config DEBUG_FS
bool "Debug Filesystem"
depends on SYSFS
When you 'select DEBUG_FS' it will *not* turn on SYSFS:
$ egrep 'DEBUG_FS|MEMLEAK|G_SYSFS' .config
# CONFIG_SYSFS is not set
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_MEMLEAK=y
CONFIG_DEBUG_MEMLEAK_TEST=y
I also tried compiling your feature with a bunch of things turned off in
my .config: http://sr71.net/~dave/linux/kmemleak.config
I got a compile error:
/home/dave/work/temp/linux-2.6/mm/memleak-test.c: In function ‘memleak_test_init’:
/home/dave/work/temp/linux-2.6/mm/memleak-test.c:61: error: ‘files_cachep’ undeclared (first use in this function)
/home/dave/work/temp/linux-2.6/mm/memleak-test.c:61: error: (Each undeclared identifier is reported only once
/home/dave/work/temp/linux-2.6/mm/memleak-test.c:61: error: for each function it appears in.)
make allnoconfig is cool. :)
-- Dave
--
This patch adds the CONFIG_DEBUG_KEEP_INIT option which preserves the
.init.* sections after initialization. Memory leaks happening during
this phase can be more easily tracked.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
include/linux/init.h | 6 ++++++
lib/Kconfig.debug | 12 ++++++++++++
2 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/include/linux/init.h b/include/linux/init.h
index 68cb026..41321ad 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -40,9 +40,15 @@
/* These are for everybody (although not all archs will actually
discard it in modules) */
+#ifdef CONFIG_DEBUG_KEEP_INIT
+#define __init
+#define __initdata
+#define __initconst
+#else
#define __init __section(.init.text) __cold notrace
#define __initdata __section(.init.data)
#define __initconst __section(.init.rodata)
+#endif
#define __exitdata __section(.exit.data)
#define __exit_call __used __section(.exitcall.exit)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e59827..72cde77 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -313,6 +313,18 @@ config DEBUG_MEMLEAK
In order to access the memleak file, debugfs needs to be
mounted (usually at /sys/kernel/debug).
+config DEBUG_KEEP_INIT
+ bool "Do not free the __init code/data"
+ default n
+ depends on DEBUG_MEMLEAK
+ help
+ This option moves the __init code/data out of the
+ .init.text/.init.data sections. It is useful for identifying
+ memory leaks happening during the kernel or modules
+ initialization.
+
+ If unsure, say N.
+
config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && (TRACE_IRQFLAGS_SUPPORT || PPC64)
--
This patch manipulate the section names of these functions. The better way would be to keep the section names as they are and then in init.h decide where to add these sections. This will require a new set of CONFIG_ symbols but then it is obvious what happens. Something like: config KEEP_INIT bool config KMEMLEAK ... select KEEP_INIT select DEBUG_KEEP_CPUINIT select DEBUG_KEEP_MEMINIT config HOTPLUG ... select KEEP_INIT And then use these symbols in vmlinux.lds.h Sam --
Sam, Thanks for your comments. I had a look at the vmlinux.lds.h file and there are indeed better options like DEV_KEEP etc. Anyway, I think I'll drop this patch completely. All that kmemleak needs is actually the kernel symbols to be able to print the stack trace. These don't seem to be removed together with the .init sections cleanup. -- Catalin --
This patch adds a loadable module that deliberately leaks memory. It is used for testing various memory leaking scenarios. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- lib/Kconfig.debug | 11 +++++ mm/Makefile | 1 mm/memleak-test.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 122 insertions(+), 0 deletions(-) create mode 100644 mm/memleak-test.c diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 72cde77..205c1da 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -313,6 +313,17 @@ config DEBUG_MEMLEAK In order to access the memleak file, debugfs needs to be mounted (usually at /sys/kernel/debug). +config DEBUG_MEMLEAK_TEST + tristate "Test the kernel memory leak detector" + default n + depends on DEBUG_MEMLEAK + help + Say Y or M here to build a test for the kernel memory leak + detector. This option enables a module that explicitly leaks + memory. + + If unsure, say N. + config DEBUG_KEEP_INIT bool "Do not free the __init code/data" default n diff --git a/mm/Makefile b/mm/Makefile index 3e43536..deb5935 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -35,3 +35,4 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o +obj-$(CONFIG_DEBUG_MEMLEAK_TEST) += memleak-test.o diff --git a/mm/memleak-test.c b/mm/memleak-test.c new file mode 100644 index 0000000..0f3e651 --- /dev/null +++ b/mm/memleak-test.c @@ -0,0 +1,110 @@ +/* + * mm/memleak-test.c + * + * Copyright (C) 2008 ARM Limited + * Written by Catalin Marinas <catalin.marinas@arm.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; ...
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> --- MAINTAINERS | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 24741de..44ee125 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2503,6 +2503,12 @@ L: kernel-janitors@vger.kernel.org W: http://www.kerneljanitors.org/ S: Maintained +KERNEL MEMORY LEAK DETECTOR +P: Catalin Marinas +M: catalin.marinas@arm.com +L: linux-kernel@vger.kernel.org +S: Maintained + KERNEL NFSD, SUNRPC, AND LOCKD SERVERS P: J. Bruce Fields M: bfields@fieldses.org --
Something I forgot to mention here - I dropped the mem_map array scanning since it doesn't work in all the kernel configurations and, as Andrew Morton pointed out, it goes too deep into the kernel structures. I didn't see any false positives on ARM but it may need more testing. -- Catalin --
