Hi everybody, As usual, the kmemtrace userspace repo is located at git://repo.or.cz/kmemtrace-user.git It's not updated now, but I will rebase it. So re-clone it, don't just git-rebase it. The changes were too extensive and I'd like to keep the revision history clean. Changes in kmemtrace: - new ABI, supports variable sized packets and it's much shorter (it has specific fields for allocations) - we'll use splice() in userspace - replaced timestamps with sequence numbers, since timestamps don't have a good enough resolution (though they could be added as an additional feature) - used relay_reserve() as Mathieu Desnoyers suggested - moved additional docs into a different commit and documented the replacement of inline with __always_inline in those commits Please have a look and let me know what you think. Eduard - Gabriel Munteanu (5): kmemtrace: Core implementation. kmemtrace: Additional documentation. kmemtrace: SLAB hooks. kmemtrace: SLUB hooks. kmemtrace: SLOB hooks. Documentation/ABI/testing/debugfs-kmemtrace | 71 ++++++ Documentation/kernel-parameters.txt | 10 + Documentation/vm/kmemtrace.txt | 126 ++++++++++ MAINTAINERS | 6 + include/linux/kmemtrace.h | 85 +++++++ include/linux/slab_def.h | 68 +++++- include/linux/slob_def.h | 9 +- include/linux/slub_def.h | 53 ++++- init/main.c | 2 + lib/Kconfig.debug | 28 +++ mm/Makefile | 2 +- mm/kmemtrace.c | 335 +++++++++++++++++++++++++++ mm/slab.c | 71 +++++- mm/slob.c | 37 +++- mm/slub.c | 66 +++++- 15 files changed, 933 insertions(+), 36 deletions(-) create mode 100644 ...
kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
Documentation/kernel-parameters.txt | 10 +
MAINTAINERS | 6 +
include/linux/kmemtrace.h | 85 +++++++++
init/main.c | 2 +
lib/Kconfig.debug | 28 +++
mm/Makefile | 2 +-
mm/kmemtrace.c | 335 +++++++++++++++++++++++++++++++++++
7 files changed, 467 insertions(+), 1 deletions(-)
create mode 100644 include/linux/kmemtrace.h
create mode 100644 mm/kmemtrace.c
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b52f47d..446a257 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -49,6 +49,7 @@ parameter is applicable:
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
+ KMEMTRACE kmemtrace is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
@@ -941,6 +942,15 @@ and is between 256 and 4096 characters. It is defined in the file
use the HighMem zone if it exists, and the Normal
zone if it does not.
+ kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
+ Controls whether kmemtrace is enabled
+ at boot-time.
+
+ kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
+ subbufs kmemtrace's relay channel has. Set this
+ higher than default (KMEMTRACE_N_SUBBUFS in code) if
+ you experience buffer overruns.
+
movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This ...Documented kmemtrace's ABI, purpose and design. Also includes a short usage guide, FAQ, as well as a link to the userspace application's Git repository, which is currently hosted at repo.or.cz. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> --- Documentation/ABI/testing/debugfs-kmemtrace | 71 +++++++++++++++ Documentation/vm/kmemtrace.txt | 126 +++++++++++++++++++++++++++ 2 files changed, 197 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace create mode 100644 Documentation/vm/kmemtrace.txt diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace new file mode 100644 index 0000000..a5ff9a6 --- /dev/null +++ b/Documentation/ABI/testing/debugfs-kmemtrace @@ -0,0 +1,71 @@ +What: /sys/kernel/debug/kmemtrace/ +Date: July 2008 +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> +Description: + +In kmemtrace-enabled kernels, the following files are created: + +/sys/kernel/debug/kmemtrace/ + cpu<n> (0400) Per-CPU tracing data, see below. (binary) + total_overruns (0400) Total number of bytes which were dropped from + cpu<n> files because of full buffer condition, + non-binary. (text) + abi_version (0400) Kernel's kmemtrace ABI version. (text) + +Each per-CPU file should be read according to the relay interface. That is, +the reader should set affinity to that specific CPU and, as currently done by +the userspace application (though there are other methods), use poll() with +an infinite timeout before every read(). Otherwise, erroneous data may be +read. The binary data has the following _core_ format: + + Event ID (1 byte) Unsigned integer, one of: + 0 - represents an allocation (KMEMTRACE_EVENT_ALLOC) + 1 - represents a freeing of previously allocated memory + (KMEMTRACE_EVENT_FREE) + Type ID (1 byte) Unsigned integer, one of: + 0 - this is a kmalloc() / kfree() + 1 - this is a kmem_cache_alloc() / ...
This adds hooks for the SLAB allocator, to allow tracing with kmemtrace.
We also convert some inline functions to __always_inline to make sure
_RET_IP_, which expands to __builtin_return_address(0), always works
as expected.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slab_def.h | 68 ++++++++++++++++++++++++++++++++++++++------
mm/slab.c | 71 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 123 insertions(+), 16 deletions(-)
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 39c3a5e..7555ce9 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -14,6 +14,7 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
+#include <linux/kmemtrace.h>
/* Size description struct for general caches. */
struct cache_sizes {
@@ -28,8 +29,26 @@ extern struct cache_sizes malloc_sizes[];
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);
-static inline void *kmalloc(size_t size, gfp_t flags)
+#ifdef CONFIG_KMEMTRACE
+extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags);
+extern size_t slab_buffer_size(struct kmem_cache *cachep);
+#else
+static __always_inline void *
+kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags)
{
+ return kmem_cache_alloc(cachep, flags);
+}
+static inline size_t slab_buffer_size(struct kmem_cache *cachep)
+{
+ return 0;
+}
+#endif
+
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+ struct kmem_cache *cachep;
+ void *ret;
+
if (__builtin_constant_p(size)) {
int i = 0;
@@ -50,10 +69,17 @@ static inline void *kmalloc(size_t size, gfp_t flags)
found:
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
- return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep,
- flags);
+ cachep = ...This adds hooks for the SLUB allocator, to allow tracing with kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slub_def.h | 53 ++++++++++++++++++++++++++++++++++--
mm/slub.c | 66 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 110 insertions(+), 9 deletions(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d117ea2..d77012a 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -10,6 +10,7 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
+#include <linux/kmemtrace.h>
enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
@@ -203,13 +204,31 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);
+#ifdef CONFIG_KMEMTRACE
+extern void *kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags);
+#else
+static __always_inline void *
+kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags)
+{
+ return kmem_cache_alloc(s, gfpflags);
+}
+#endif
+
static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
{
- return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size));
+ unsigned int order = get_order(size);
+ void *ret = (void *) __get_free_pages(flags, order);
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
+ size, PAGE_SIZE << order, flags);
+
+ return ret;
}
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
+ void *ret;
+
if (__builtin_constant_p(size)) {
if (size > PAGE_SIZE)
return kmalloc_large(size, flags);
@@ -220,7 +239,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
if (!s)
return ZERO_SIZE_PTR;
- return kmem_cache_alloc(s, flags);
+ ret = kmem_cache_alloc_notrace(s, flags);
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
+ ...This adds hooks for the SLOB allocator, to allow tracing with kmemtrace.
We also convert some inline functions to __always_inline to make sure
_RET_IP_, which expands to __builtin_return_address(0), always works
as expected.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slob_def.h | 9 +++++----
mm/slob.c | 37 +++++++++++++++++++++++++++++++------
2 files changed, 36 insertions(+), 10 deletions(-)
diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h
index 59a3fa4..0ec00b3 100644
--- a/include/linux/slob_def.h
+++ b/include/linux/slob_def.h
@@ -3,14 +3,15 @@
void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
-static inline void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+static __always_inline void *kmem_cache_alloc(struct kmem_cache *cachep,
+ gfp_t flags)
{
return kmem_cache_alloc_node(cachep, flags, -1);
}
void *__kmalloc_node(size_t size, gfp_t flags, int node);
-static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
return __kmalloc_node(size, flags, node);
}
@@ -23,12 +24,12 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
* kmalloc is the normal method of allocating memory
* in the kernel.
*/
-static inline void *kmalloc(size_t size, gfp_t flags)
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
return __kmalloc_node(size, flags, -1);
}
-static inline void *__kmalloc(size_t size, gfp_t flags)
+static __always_inline void *__kmalloc(size_t size, gfp_t flags)
{
return kmalloc(size, flags);
}
diff --git a/mm/slob.c b/mm/slob.c
index a3ad667..23375ed 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -65,6 +65,7 @@
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/list.h>
+#include <linux/kmemtrace.h>
#include <asm/atomic.h>
/*
@@ -463,27 ...On Sun, Aug 10, 2008 at 8:14 PM, Eduard - Gabriel Munteanu I think Matt acked this already but as you dropped the tags, I'll ask once more before I merge this. --
Yeah, that's fine. Acked-by: Matt Mackall <mpm@selenic.com> -- Mathematics is the supreme nostalgia of our time. --
You could simplify the stuff in slub.h if you would fall back to the uninlined functions in the case that kmemtrace is enabled. IMHO adding additional inline _RET_IP == __builtin_return_address(0) right? Put that into a local variable? At least we need consistent usage within one function. Maybe convert And again. And another one. --
So, if CONFIG_KMEMTRACE is enabled, make the inlined version go away completely? I'm okay with that though I wonder if that means we now take a performance hit when CONFIG_KMEMTRACE is enabled but tracing is I think we should just convert SLUB to use _RET_IP_ everywhere. Eduard, care to make a patch and send it and rebase this on top of that? --
We already take a performance hit because of the additional function calls. With the above approach the kernel binary will grow significantly because you are now inserting an additional function call at all call sites. --
The function call is supposed to go away when we convert kmemtrace to use Mathieu's markers but I suppose even then we have a problem with inlining? --
The function calls are overwritten with NOPs? Or how does that work? --
Yes, the code is patched at runtime. But AFAIK markers already provide this stuff (called "immediate values"). Mathieu's tracepoints also do it. But it's not available on all arches. x86 and x86-64 work as far as I remember. --
The markers present in mainline kernel does not use immediate values. However, immediate values in tip does implement a load immediate/test/branch for x86, x86_64 and powerpc. I also have support for sparc64 in my lttng tree. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 --
Did we ever see size(1) numbers for kernels with and without this support? I'm still a bit worried about adding branches to such a popular inline. Simply multiplying the branch test by the number of locations is pretty substantial, never mind the unlikely part of the branch. -- Mathematics is the supreme nostalgia of our time. --
I believe in the latest version they are just a variable test. But when Mathieu's immediate code makes it in (which it is in linux-tip), we will be overwriting the conditionals with nops (Mathieu, correct me if I'm wrong here). But the calls themselves are done in the unlikely branch. This is important, as Mathieu stated in previous thread. The reason is that all the stack setup for the function call is also in the unlikely branch, and the normal fast path does not take a hit for the function call setup. -- Steve --
The current immediate values in tip does a load immediate, test, branch, which removes the cost of the memory load. We will try to get gcc support to be able to declare patchable static jump sites, which could be patched with NOPs when disabled. But that will probably not happen "now". -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 --
The kmemtrace_mark_alloc_node itself is an inline function, which calls another inline function "trace_mark" which is designed to test a read_mostly variable, and will do an "unlikely" jmp if the variable is set (which it is when tracing is enabled), to the actual function call. There should be no extra function calls when this is configured on but tracing disabled. We try very hard to keep the speed of the tracer as close to a non tracing kernel as possible when tracing is disabled. -- Steve --
Makes sense. But then we have even more code bloat because of the tests that are inserted in all call sites of kmalloc. --
Are you talking about the tests that implement checking whether a marker is active or not? Those checks are already efficient, and will get more so with the "immediate values" optimization in or near the tree. - FChE --
AFAICT: Each test also adds an out of line call to the tracing facility. --
Frank, Christoph is correct. He's not bringing up the issue of efficiency, but the issue of bloat. The marker code will be added to everyplace that calls kmalloc. Which can be quite a lot. I'd be interested in seeing the size of the .text section with and without this patch added and makers configure in. -- Steve --
Hi - Yes, but that call is normally placed out of the cache-hot path with unlikely(). - FChE --
The long-term goal is to turn the tests into NOPs, but only once we get gcc support. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 --
Oh, good. I'm also thinking to add a macro that expands to simple inline when Sure. Will get back soon. --
Why is this "- 4 bytes"? Is there an implied alignment byte somewhere in the ... --- ~Randy Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA http://linuxplumbersconf.org/ --
