The following patches enables Valgrind to produce useful results for User Mode Linux. Valgrind requires some patches as well. A list of the limitations and a recipe for building and using everything can be found on this page: http://uml.jfdi.org/uml/Wiki.jsp?page=ValgrindingUML Documentation/uml/valgrind-suppressions | 95 arch/um/Makefile | 4 arch/um/include/valgrind.h | 12 arch/um/kernel/physmem.c | 21 arch/um/kernel/process.c | 31 arch/um/kernel/skas/process.c | 3 arch/um/os-Linux/skas/process.c | 7 include/asm-um/thread_info.h | 8 include/linux/memcheck.h | 285 ++ include/linux/sched.h | 6 include/linux/valgrind.h | 3928 ++++++++++++++++++++++++++++++++ lib/Kconfig.debug | 20 mm/page_alloc.c | 4 mm/slab.c | 20 mm/vmalloc.c | 15 15 files changed, 4456 insertions(+), 3 deletions(-) -- Steve --
Kconfig option for Valgrind. Suppression file, for known non-issues. Valgrind header files (svn 8534) that define the client request mechanism used to annotate programs plus a couple lines to integrate with Kconfig. Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net> --- Index: linux-2.6.27-rc5/lib/Kconfig.debug =================================================================== --- linux-2.6.27-rc5.orig/lib/Kconfig.debug 2008-08-29 14:24:28.000000000 -0700 +++ linux-2.6.27-rc5/lib/Kconfig.debug 2008-08-29 14:24:31.000000000 -0700 @@ -752,6 +752,26 @@ Say N if you are unsure. +config VALGRIND_SUPPORT + bool "Add support to run UML under Valgrind (EXPERIMENTAL)" + depends on UML && EXPERIMENTAL + help + This option adds support and annotations to the UML kernel image + so that Valgrind can run and provide meaningful output. See + <http://uml.jfdi.org/uml/Wiki.jsp?page=ValgrindingUML> for more + information. + + If unsure, say N. + +config VALGRIND_LOCATION + string "Prefix path for Valgrind header files." + depends on VALGRIND_SUPPORT + default "/usr" + help + This option tells UML where to look for Valgrind header files. + If you've built Valgrind from source, use the value of the + --prefix parameter to configure. + source "samples/Kconfig" source "lib/Kconfig.kgdb" Index: linux-2.6.27-rc5/include/linux/memcheck.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.27-rc5/include/linux/memcheck.h 2008-08-29 14:24:31.000000000 -0700 @@ -0,0 +1,285 @@ + +/* + ---------------------------------------------------------------- + + Notice that the following BSD-style license applies to this one + file (memcheck.h) only. The rest of Valgrind is licensed under the + terms of the GNU General Public License, version 2, unless + otherwise indicated. See the COPYING file in the source + distribution for details. + + ...
Couldn't you just get the valgrind.h header from the currently installed valgrind? UML relies on host includes anyways, so I don't see a reason to not do that for valgrind too. -Andi --
Kernel code can't get headers from the system, unless you're thinking
#include "/usr/include/valgrind.h"
or something similar.
Jeff
--
Work email - jdike at linux dot intel dot com
--
Some um files at least least include sys/* files which are surely user space? If they can include that why not valgrind.h? -Andi --
The userspace side of UML can (and, indeed must) use the system headers. Here, we can just #include <valgrind.h>. However, the kernel side of UML, where the allocator annotations are, can't use system headers. The kernel side of things has to be self-contained within the kernel tree. Jeff -- Work email - jdike at linux dot intel dot com --
Add a flag to tell Valgrind to run the forked child natively. Necessary
because Valgrind makes additional system calls to instrumented processes,
which confuse UML.
Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
---
Index: linux-2.6.27-rc5/arch/um/os-Linux/skas/process.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/um/os-Linux/skas/process.c 2008-08-29 15:50:24.000000000 -0700
+++ linux-2.6.27-rc5/arch/um/os-Linux/skas/process.c 2008-08-29 15:51:45.000000000 -0700
@@ -26,6 +26,7 @@
#include "skas_ptrace.h"
#include "user.h"
#include "sysdep/stub.h"
+#include "valgrind.h"
int is_skas_winch(int pid, int fd, void *data)
{
@@ -297,8 +298,12 @@
flags = CLONE_FILES;
if (proc_mm)
flags |= CLONE_VM;
- else
+ else {
flags |= SIGCHLD;
+#ifdef UML_CONFIG_VALGRIND_SUPPORT
+ flags |= VALGRIND_CLONE_LETGO;
+#endif
+ }
pid = clone(userspace_tramp, (void *) sp, flags, (void *) stub_stack);
if (pid < 0) {
--
I keep forgetting that I have a really bad feeling about this: +#define VALGRIND_CLONE_LETGO 0x80000000 /* do not track fork like childr en*/ This is effectively appropriating part of the kernel's ABI for valgrind's use. Not to mention that that bit is already taken: #define CLONE_IO 0x80000000 /* Clone io context */ Could you do this with an annotation that says "let the next clone run untraced"? Jeff -- Work email - jdike at linux dot intel dot com --
UML is part of the kernel, so getting a memory reference checker (valgrind) running in UML is part of the kernel, too. The concept of "escape from the It wasn't taken a few months ago when the valgrind patches for UML were first proposed. The list of free bits in that flag word is now empty. There may be some overlap of concept with CLONE_UNTRACED, which might Why wouldn't that be a race between the next _NR_clone from this thread and the next _NR_clone from any other existing thread [in the same process]? Valgrind can pre-pend a block of code at the start of the new thread, but almost immediately that code will want to "unvirtualize." Doing so at _NR_clone itself is convenient all around. -- John Reiser, jreiser@BitWagon.com --
Irrelevant - what if UML, or anything else for that matter, starts using CLONE_IO? All of a sudden, valgrind will start letting those Yeah, if you cloned in a signal handler, that would be a problem. How about sticking the annotation in the thread itself? This may be Jeff -- Work email - jdike at linux dot intel dot com --
Track and tell valgrind about kernel mode stacks. Valgrind gets confused
without these annotations because it expects processes to only use their
initial stack and stacks created for threads by way of clone.
Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
---
Index: linux-2.6.27-rc5/arch/um/kernel/process.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/um/kernel/process.c 2008-08-29 14:17:31.000000000 -0700
+++ linux-2.6.27-rc5/arch/um/kernel/process.c 2008-08-29 14:41:34.000000000 -0700
@@ -16,6 +16,7 @@
#include <linux/sched.h>
#include <linux/tick.h>
#include <linux/threads.h>
+#include <linux/valgrind.h>
#include <asm/current.h>
#include <asm/pgtable.h>
#include <asm/uaccess.h>
@@ -62,10 +63,40 @@
if (atomic)
flags = GFP_ATOMIC;
page = __get_free_pages(flags, order);
+ /* There are long lived stacks and we won't free them */
+ if (page)
+ VALGRIND_STACK_REGISTER(page + (PAGE_SIZE << order) - 1, page);
return page;
}
+struct thread_info *alloc_thread_info(struct task_struct *tsk)
+{
+ struct thread_info *ti;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+ gfp_t mask = GFP_KERNEL | __GFP_ZERO;
+#else
+ gfp_t mask = GFP_KERNEL;
+#endif
+ ti = (struct thread_info *) __get_free_pages(mask,
+ CONFIG_KERNEL_STACK_ORDER);
+#ifdef CONFIG_VALGRIND_SUPPORT
+ if (ti) {
+ VALGRIND_MALLOCLIKE_BLOCK(ti, sizeof(*ti), 0, 0);
+ ti->valgrind_sid = VALGRIND_STACK_REGISTER((unsigned long)ti
+ + UM_THREAD_SIZE - 1, ti + 1);
+ }
+#endif
+ return ti;
+}
+
+void free_thread_info(struct thread_info *ti)
+{
+ VALGRIND_STACK_DEREGISTER(ti->valgrind_sid);
+ VALGRIND_FREELIKE_BLOCK(ti, 0);
+ free_pages((unsigned long)ti, CONFIG_KERNEL_STACK_ORDER);
+}
+
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
int pid;
Index: linux-2.6.27-rc5/arch/um/kernel/skas/process.c
===================================================================
--- ...Valgrind annotations for valgrind: memory is addressable once it's been
alloced, and unaddressable when it is freed again. Can't use malloc-like
and free-like because valgrind considers a malloc-like chunk indivisible.
Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
---
Index: linux-2.6.27-rc5/mm/page_alloc.c
===================================================================
--- linux-2.6.27-rc5.orig/mm/page_alloc.c 2008-08-29 14:24:27.000000000 -0700
+++ linux-2.6.27-rc5/mm/page_alloc.c 2008-08-29 14:24:37.000000000 -0700
@@ -46,6 +46,7 @@
#include <linux/page-isolation.h>
#include <linux/memcontrol.h>
#include <linux/debugobjects.h>
+#include <linux/memcheck.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1080,6 +1081,7 @@
if (!page)
goto failed;
}
+ VALGRIND_MAKE_MEM_UNDEFINED(page_address(page), PAGE_SIZE << order);
__count_zone_vm_events(PGALLOC, zone, 1 << order);
zone_statistics(preferred_zone, zone);
@@ -1679,6 +1681,8 @@
void __free_pages(struct page *page, unsigned int order)
{
if (put_page_testzero(page)) {
+ VALGRIND_MAKE_MEM_NOACCESS(page_address(page),
+ PAGE_SIZE << order);
if (order == 0)
free_hot_page(page);
else
--
Valgrind provides a header file that defines these annotation functions. For the ease of tracking changes to this header file, I've made minimal changes to it (just a couple lines to integrate with Kconfig). If the interfaces and/or style is objectionable to the kernel community at large, than we will have to decide to either wrap the interface that Valgrind provides or modify the header and track changes manually. -- Steve --
Hi Steve,
On Wed, Sep 3, 2008 at 8:25 AM, Steve VanDeBogart
That's trivially fixable by adding a wrapper for the ugly valgrind
macros and hiding that in some header file. Preferably something
similar to what the kmemcheck hooks look like so that we can also use
them in the future if we implement tracking at page allocator level
too.
Pekka
--
Valgrind annotations for the slab allocator: Malloc-like and free-like
for cache_alloc and free. Telling Valgrind a region is free-like clears
all the valid bits, so slabs with constructors need different treatment;
tell Valgrind about slab objects when first constructed and free them
when the slab is destroyed.
Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
---
Index: linux-2.6.27-rc5/mm/slab.c
===================================================================
--- linux-2.6.27-rc5.orig/mm/slab.c 2008-08-29 14:24:25.000000000 -0700
+++ linux-2.6.27-rc5/mm/slab.c 2008-08-29 14:24:42.000000000 -0700
@@ -111,6 +111,7 @@
#include <linux/rtmutex.h>
#include <linux/reciprocal_div.h>
#include <linux/debugobjects.h>
+#include <linux/memcheck.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>
@@ -1906,6 +1907,8 @@
int i;
for (i = 0; i < cachep->num; i++) {
void *objp = index_to_obj(cachep, slabp, i);
+ if (cachep->ctor)
+ VALGRIND_FREELIKE_BLOCK(objp, 0);
if (cachep->flags & SLAB_POISON) {
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1932,6 +1935,15 @@
#else
static void slab_destroy_debugcheck(struct kmem_cache *cachep, struct slab *slabp)
{
+#ifdef CONFIG_VALGRIND_SUPPORT
+ int i;
+ if (cachep->ctor) {
+ for (i = 0; i < cachep->num; i++) {
+ void *objp = index_to_obj(cachep, slabp, i);
+ VALGRIND_FREELIKE_BLOCK(objp, 0);
+ }
+ }
+#endif
}
#endif
@@ -2635,6 +2647,9 @@
for (i = 0; i < cachep->num; i++) {
void *objp = index_to_obj(cachep, slabp, i);
+ if (cachep->ctor)
+ VALGRIND_MALLOCLIKE_BLOCK(objp, cachep->buffer_size,
+ 0, 0);
#if DEBUG
/* need to poison the objs? */
if (cachep->flags & SLAB_POISON)
@@ -3466,6 +3481,8 @@
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
prefetchw(objp);
+ if (!cachep->ctor)
+ VALGRIND_MALLOCLIKE_BLOCK(objp, cachep->buffer_size, 0, 0);
if (unlikely((flags & __GFP_ZERO) && objp))
memset(objp, 0, ...Hi Steve, On Sat, Aug 30, 2008 at 2:17 AM, Steve VanDeBogart OK, I'm biased (I'm one of the kmemcheck developers) but these hooks to SLAB are too ugly to live with. My preferred solution is that you reuse the kmemcheck annotations kmemcheck_slab_alloc()/kmemcheck_slab_free() we have: http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=30532cb... I'm not sure why you want to treat caches with constructor differently. Sure, the memory regions *are* initialized but from programmer's point of view you're not supposed to be touching the memory unless you got it from kmalloc() or kmem_cache_alloc(). Same goes for kfree() and kmem_cache_free() -- no touchy touchy after you pass a pointer to either of the functions (unless you're RCU, of course). Pekka --
Hi, I developed the predecessor to Steve's patches. The patch treats a cache with a constructor differently because the semantics of kmalloc are different in that case. If a cache has a constructor then the object returned by kmalloc has been initialized for certain. Namely, the constructor initialized the object before it was kmalloc'ed the first time, and each corresponding caller [re-]initialized the object before calling kfree. This is emphasized by comments in mm/slab.c: * This means, that your constructor is used only for newly allocated * slabs and you must pass objects with the same initializations to * kmem_cache_free. The slab allocator has the property that the contents of a block [object] that is returned by kmalloc are identical to the contents that were passed to kfree (or back as result from the constructor) unless something special happens, such as SLAB_POISON. A block [object] which is handed back and forth between kfree and kmalloc builds and retains history throughout the lifetime of the cache. This property is directly contrary to the semantics of libc malloc. The contents of the result of libc malloc must be considered to be *un*initialized always. Yes, there is at least one kmalloc+kfree slab object cache that depends on retaining state from kfree to kmalloc. My first attempt treated kmalloc+kfree like libc malloc+free, but I soon learned that was not the actual semantics. Because the slab allocator deals in objects that persist through kfree and kmalloc, then there are only two indications that objects ever become uninitialized: SLAB_POISON, and not having a constructor. Obviously the application of SLAB_POISON means that the object becomes [logically] uninit. Not having a constructor *tends* to indicate that kmalloc() implies uninit, by chain of reasoning based on the comment quoted above from mm/slab.c. If the object was uninit upon first kmalloc, and if kfree is supposed to receive objects that are "ready for kmalloc", and because a ...
Hi,
For RCU, I was thinking of adding a new RCU_WAITING state that a RCU-freed
object is set to upon freeing that we can use to detect whatever errors
there might be. I've included an incomplete implementation of SLAB hooks
for that. Obviously there would need to be some additional code in
kmemcheck for that. No idea if that kind of model works for valgrind,
though.
Pekka
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c 2008-08-30 13:42:17.000000000 +0300
+++ linux-2.6/mm/slab.c 2008-08-30 13:42:22.000000000 +0300
@@ -1647,11 +1647,23 @@
free_pages((unsigned long)addr, cachep->gfporder);
}
+static void kmemcheck_rcu_free(struct kmem_cache *cache, struct slab *slab)
+{
+ int i;
+
+ for (i = 0; i < cache->num; i++) {
+ void *obj = index_to_obj(cache, slab, i);
+
+ kmemcheck_slab_rcu_free(cache, obj, obj_size(cache));
+ }
+}
+
static void kmem_rcu_free(struct rcu_head *head)
{
struct slab_rcu *slab_rcu = (struct slab_rcu *)head;
struct kmem_cache *cachep = slab_rcu->cachep;
+ kmemcheck_rcu_free(cachep, (struct slab *)slab_rcu);
kmem_freepages(cachep, slab_rcu->addr);
if (OFF_SLAB(cachep))
kmem_cache_free(cachep->slabp_cache, slab_rcu);
Index: linux-2.6/include/linux/kmemcheck.h
===================================================================
--- linux-2.6.orig/include/linux/kmemcheck.h 2008-08-30 13:42:21.000000000 +0300
+++ linux-2.6/include/linux/kmemcheck.h 2008-08-30 13:42:22.000000000 +0300
@@ -9,6 +9,7 @@
KMEMCHECK_UNINITIALIZED,
KMEMCHECK_INITIALIZED,
KMEMCHECK_FREED,
+ KMEMCHECK_RCU_WAITING, /* waiting for RCU to free the object */
};
#ifdef CONFIG_KMEMCHECK
@@ -22,6 +23,7 @@
void kmemcheck_free_shadow(struct kmem_cache *s, struct page *page, int order);
void kmemcheck_slab_alloc(struct kmem_cache *s, gfp_t gfpflags, void *object,
size_t size);
+void kmemcheck_slab_rcu_free(struct kmem_cache *s, void *object, size_t ...Thank you for pointing out the location of the kmemcheck code. I looked around briefly but didn't find the latest version. It does seem that kmemcheck and Valgrind annotations want to live in the same places so it makes perfect sense to combine them where possible. I'm not exactly sure It is true that code above the allocator should not be touching free'd slab objects. However, it is also true that objects from slabs that have a constructor should retain their per byte un/initialized state through allocation and free cycles (just the semantic of slabs with constructors AFAICT). Ideally, we'd tell Valgrind that the bytes of a free'd slab object are no longer accessible, but the initialized state should remain the same until the object is made accessible again by the next allocation of the object. Unfortunately, the compression method for A & V bits in Valgrind doesn't allow a region to be inaccessible and retain validness bits. The best we can do without spending extra space or extra cycles it to leave the memory accessible while it is free in a slab. -- Steve --
Hi Steve,
On Wed, Sep 3, 2008 at 8:08 AM, Steve VanDeBogart
Upper case macros and littering #ifdefs in the code. See the kmemcheck
hooks how to do it in a clean way.
On Wed, Sep 3, 2008 at 8:08 AM, Steve VanDeBogart
Sorry for being unclear, sure, object should be marked as initialized
when they're returned from kmem_cache_alloc(). However, what I
disagree with is *where* you're marking them as initialized. Surely
they're not semantically initialized when the slabs are allocated
(although technically they are).
On Wed, Sep 3, 2008 at 8:08 AM, Steve VanDeBogart
I don't see why you should mark them initialized all the time. Just
mark them as uninitialized on kmem_cache_free() and again as
initialized when they're about to be returned from kmem_cache_alloc()
like we do in kmemcheck.
Pekka
--
Btw, actually, we don't mark them as uninitialized at the moment, but that's a bug in kmemcheck. --
Hello Pekka, But which bits of a slab object should be marked as initialized at kmem_cache_alloc() time? We can't mark all of them as initialized because the constructor may not initialize all of them (in fact, I've programmatically confirmed that there are constructors that don't initialize all the bytes of an object). The only place to get the information of interest is to mark all bytes uninitialized and then This question has the same answer as above. -- Steve --
Hi Steve,
On Wed, Sep 3, 2008 at 6:42 PM, Steve VanDeBogart
Aah, indeed. We should fix kmemcheck as well then.
Pekka
--
Valgrind annotations for vmalloc: Valgrind doesn't understand memory
that is mapped to more than one address. Approximate validness by
assuming that the physical mapping won't be used while it is vmalloc'd
and copy the valid bits from the physical page when the fault handler
maps it in.
Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net>
---
Index: linux-2.6.27-rc5/arch/um/kernel/physmem.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/um/kernel/physmem.c 2008-08-29 14:17:31.000000000 -0700
+++ linux-2.6.27-rc5/arch/um/kernel/physmem.c 2008-08-29 14:24:46.000000000 -0700
@@ -6,6 +6,7 @@
#include "linux/bootmem.h"
#include "linux/mm.h"
#include "linux/pfn.h"
+#include "linux/memcheck.h"
#include "asm/page.h"
#include "as-layout.h"
#include "init.h"
@@ -71,6 +72,26 @@
panic("map_memory(0x%lx, %d, 0x%llx, %ld, %d, %d, %d) failed, "
"err = %d\n", virt, fd, offset, len, r, w, x, err);
}
+#ifdef CONFIG_VALGRIND_SUPPORT
+ if (virt >= VMALLOC_START && virt < VMALLOC_END) {
+ /* As far as I know, the backing pages were just page alloc'd,
+ * so they are addressable, but may be either valid or invalid
+ * (depending on gfp_mask). The virtual address may be part of
+ * a vmalloc region, or a guard page, so inaddressability is ok.
+ */
+#define CHUNK_SIZE (PAGE_SIZE/8)
+ char vbits[CHUNK_SIZE];
+ int i;
+ if (len % (CHUNK_SIZE) != 0)
+ panic("Expected len to be a multiple of page size");
+ for (i = 0; i < len; i += CHUNK_SIZE) {
+ if (VALGRIND_GET_VBITS(__va(phys + i), vbits,
+ CHUNK_SIZE) > 1)
+ panic("Expected addressable source memory");
+ VALGRIND_SET_VBITS(virt + i, vbits, CHUNK_SIZE);
+ }
+ }
+#endif
}
extern int __syscall_stub_start;
Index: linux-2.6.27-rc5/mm/vmalloc.c
===================================================================
--- linux-2.6.27-rc5.orig/mm/vmalloc.c 2008-08-29 14:17:38.000000000 -0700
+++ ...