Hi everyone,
I hopefully fixed all your previous objections. I have also set up a git tree
for anyone who'd like to try kmemtrace (gitweb URL):http://repo.or.cz/w/linux-2.6/kmemtrace.git
Comment on the patchset and please try running kmemtrace if possible. Check
the docs for information on how to get the userspace tool and set it up.Important: the kmemtrace-user repo went stable and I'll not alter the revision
history anymore. BTW, don't be scared if you see many errors being reported by
kmemtrace-report, this is a known issue (I could use some advice on this if
you know what's going on).Changes since last submission:
1. fixed allocator tracing
2. wrote more documentation
3. reworked the ABI and documented it in Documentation/ABI; we don't include
kernel headers in userspace anymore
4. added support for disabling kmemtrace at boot-time
5. added provisions for disabling kmemtrace at runtime
6. changed slab allocators to use __always_inline instead of plain inline,
so that we're sure the return address is valid
7. removed some useless cast, as pointed out by Pekka EnbergSince the changes were quite extensive, I chose not to preserve any tags such
as "Reviewed-by".I'm waiting for your input on this.
Thanks,
EduardP.S.: Pekka, I followed your advice on adding a field containing the struct
size (managed to make room for it without adding to the current struct size).
This allows us to do crazy stuff in the future, like exporting the whole
stack trace on every allocation. Not sure how useful this is right now, but
let's keep the ABI extensible.Eduard - Gabriel Munteanu (4):
kmemtrace: Core implementation.
kmemtrace: SLAB hooks.
kmemtrace: SLUB hooks.
kmemtrace: SLOB hooks.Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++
Documentation/kernel-parameters.txt | 10 +
Documentation/vm/kmemtrace.txt | 126 ++++++++++++++
MAINTAINERS | 6 +
include/linux/kmemtrace.h ...
kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
Documentation/ABI/testing/debugfs-kmemtrace | 58 +++++++
Documentation/kernel-parameters.txt | 10 +
Documentation/vm/kmemtrace.txt | 126 ++++++++++++++
MAINTAINERS | 6 +
include/linux/kmemtrace.h | 110 ++++++++++++
init/main.c | 2 +
lib/Kconfig.debug | 28 +++
mm/Makefile | 2 +-
mm/kmemtrace.c | 244 +++++++++++++++++++++++++++
9 files changed, 585 insertions(+), 1 deletions(-)
create mode 100644 Documentation/ABI/testing/debugfs-kmemtrace
create mode 100644 Documentation/vm/kmemtrace.txt
create mode 100644 include/linux/kmemtrace.h
create mode 100644 mm/kmemtrace.cdiff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace
new file mode 100644
index 0000000..466c2bb
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-kmemtrace
@@ -0,0 +1,58 @@
+What: /sys/kernel/debug/kmemtrace/
+Date: July 2008
+Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
+Description:
+
+In kmemtrace-enabled kernels, the following files are created:
+
+/sys/kernel/debug/kmemtrace/
+ cpu<n> (0400) Per-CPU tracing data, see below. (binary)
+ total_overruns (0400) Total number of bytes which were dropped from
+ cpu<n> files because of full buffer condition,
+ non-binary. (text)
+ abi_version (0400) Kernel's kmemtrace ABI version. (text)
+
+Each per-CPU file should be re...
Hi,
[I'm cc'ing Mathieu if he wants to comment on this.]
--
Hmm ? why record an invalid event ?? I see it's not used in the code, is
If only valid for event ID 1 and only in NUMA case, please don't waste
8 bytes for GFP flags ?? Whoah, that's a lot of one-hot bits ! :) I knew
Using a magic number in the trace header lets you deal with
cross-endianness.Saving the type sizes in the trace header lets you deal with different
LTTng will be faster though : per-cpu atomic ops instead of interrupt
Automatic description of markers and dynamic assignation of IDs to
Change the documentation to prefix a root command line by "#" instead of
What in the world can be causing that ? Shouldn't it be fixed ? It might
be due to unexpected allocator behavior, non-instrumented alloc/freeSee below for detail, but this event record is way too big and not
Isn't this overridable by a command line param ? Shouldn't it be called
Hrm, I'd leave that as a kernel command line option, not config option.
Argh, and you do a supplementary copy here. You could simply alias the
buffers and write directly to them after reserving the correct amount ofktime_get is monotonic, but with potentially coarse granularity. I see
that you use ktime_to_ns here, which gives you a resolution of 1 timer
tick in the case where the TSCs are not synchronized. While it should be
"good enough" for the scheduler, I doubt it's enough for a tracer.It also takes the xtime seqlock, which adds a potentially big delay to
the tracing code (if you read the clock while the writer lock is taken).Also, when NTP modifies the clock, although it stays monotonic, the rate
at which it increments can dramatically change. I doubt you want to useI think the standard is to use =0, =1 here, not =yes, =no ?
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
The relay interface is really inconsistent and produces erroneous output
if it's not used in a specific way. It's nice to be able to catch thesePekka suggested we use types that have constant size on every arch. I
This could change too, but if the number of GFP flags is too close to
32, I'd rather keep the ABI stable, providing for a larger number of GFPHmm, I'll look at lttng's code and see what exactly you are talking
ASCII/text? :)
I'm not sure what you meant, but non-binary traces would result in hugeSure, cross-endianness is not even currently implemented in the
I'm not sure how one could record a timestamp orderly into relay buffers
Dynamic assignation makes it hard to preserve ABI compatibility with the
userspace app. And Pekka suggested it's important to preserve it in
order to allow distros to include kmemtrace.Yes, it's probably better. I just wanted to avoid a user taking that as
Of course it will be fixed. But this FAQ entry also serves as a warning
that future allocator patches could introduce untraced functions.Okay, will rebase my patches on a tracepoints-enabled branch. How close
Not quite true. I saw a few kernel subsystems that provide compile-time opt=
ionsWhat would you suggest instead?
Please keep in mind timer resolution is not that critical, we're not
benchmarking the allocators cycle-wise; instead we're merely looking at
allocation lifetimes, fragmentation, patterns etc.. Timestamps are most68
Thanks for your comments.
Cheers,
Eduard
Hi Eduard-Gabriel,
Mathieu does have a good point of optimizing the memory use of an
individual event so I'm okay with that. But we really don't want to
force people the analyze the dump on same architecture where we captured
it. So as long as that is taken care of, I'm happy.Pekka
--
We've also a new Documentation/ maintainer these days (in addition to
Randy) who should get cc:ed:--
Mathematics is the supreme nostalgia of our time.--
This adds hooks for the SLAB allocator, to allow tracing with kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slab_def.h | 68 ++++++++++++++++++++++++++++++++++++++------
mm/slab.c | 71 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 123 insertions(+), 16 deletions(-)diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 39c3a5e..7555ce9 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -14,6 +14,7 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
+#include <linux/kmemtrace.h>/* Size description struct for general caches. */
struct cache_sizes {
@@ -28,8 +29,26 @@ extern struct cache_sizes malloc_sizes[];
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);-static inline void *kmalloc(size_t size, gfp_t flags)
+#ifdef CONFIG_KMEMTRACE
+extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags);
+extern size_t slab_buffer_size(struct kmem_cache *cachep);
+#else
+static __always_inline void *
+kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags)
{
+ return kmem_cache_alloc(cachep, flags);
+}
+static inline size_t slab_buffer_size(struct kmem_cache *cachep)
+{
+ return 0;
+}
+#endif
+
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
+{
+ struct kmem_cache *cachep;
+ void *ret;
+
if (__builtin_constant_p(size)) {
int i = 0;@@ -50,10 +69,17 @@ static inline void *kmalloc(size_t size, gfp_t flags)
found:
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
- return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep,
- flags);
+ cachep = malloc_sizes[i].cs_dmacachep;
+ else
#endif
- return kmem_cache_alloc(malloc_sizes[i].cs_cachep, flags);
+ cachep = malloc_sizes[i].cs_cachep;
+
...
You're introducing this to work around the fact that struct kmem_cache
is defined in mm/slab.c? Please note that for kmemcheck we already moved
struct kmem_cache to a header file so we should be able to use the same--
This adds hooks for the SLUB allocator, to allow tracing with kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slub_def.h | 53 ++++++++++++++++++++++++++++++++++--
mm/slub.c | 66 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 110 insertions(+), 9 deletions(-)diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d117ea2..d77012a 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -10,6 +10,7 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
+#include <linux/kmemtrace.h>enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
@@ -203,13 +204,31 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);+#ifdef CONFIG_KMEMTRACE
+extern void *kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags);
+#else
+static __always_inline void *
+kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags)
+{
+ return kmem_cache_alloc(s, gfpflags);
+}
+#endif
+
static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
{
- return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size));
+ unsigned int order = get_order(size);
+ void *ret = (void *) __get_free_pages(flags, order);
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
+ size, PAGE_SIZE << order, flags);
+
+ return ret;
}static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
+ void *ret;
+
if (__builtin_constant_p(size)) {
if (size > PAGE_SIZE)
return kmalloc_large(size, flags);
@@ -220,7 +239,13 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
if (!s)
return ZERO_SIZE_PTR;- return kmem_cache_alloc(s, flags);
+ ret = kmem_cache_alloc_notrace(s, flags);
+
+ kmemtrace_mark_alloc(KME...
--
This adds hooks for the SLOB allocator, to allow tracing with kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slob_def.h | 9 +++++----
mm/slob.c | 37 +++++++++++++++++++++++++++++++------
2 files changed, 36 insertions(+), 10 deletions(-)diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h
index 59a3fa4..0ec00b3 100644
--- a/include/linux/slob_def.h
+++ b/include/linux/slob_def.h
@@ -3,14 +3,15 @@void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
-static inline void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+static __always_inline void *kmem_cache_alloc(struct kmem_cache *cachep,
+ gfp_t flags)
{
return kmem_cache_alloc_node(cachep, flags, -1);
}void *__kmalloc_node(size_t size, gfp_t flags, int node);
-static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
return __kmalloc_node(size, flags, node);
}
@@ -23,12 +24,12 @@ static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
* kmalloc is the normal method of allocating memory
* in the kernel.
*/
-static inline void *kmalloc(size_t size, gfp_t flags)
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
return __kmalloc_node(size, flags, -1);
}-static inline void *__kmalloc(size_t size, gfp_t flags)
+static __always_inline void *__kmalloc(size_t size, gfp_t flags)
{
return kmalloc(size, flags);
}
diff --git a/mm/slob.c b/mm/slob.c
index a3ad667..23375ed 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -65,6 +65,7 @@
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/list.h>
+#include <linux/kmemtrace.h>
#include <asm/atomic.h>/*
@@ -463,27 +464,38 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
unsigned int *m;
int align = max(ARCH_KMALLOC_MINA...
The patch description could be little less terse and maybe explain the
__always_inline changes but:--
Why is this needed? builtin_return?
--
Mathematics is the supreme nostalgia of our time.--
If we don't use __always_inline, we can't be sure whether it's inlined
or not. And we don't know if we need _THIS_IP_ or _RET_IP_ (equivalent
to __builtin_return_address(0)). Simple, plain 'inline' does not
guarantee GCC will inline that function, nor does it warn us if it is
not inlined.--
| Greg Kroah-Hartman | [PATCH 006/196] Chinese: add translation of oops-tracing.txt |
| Linus Torvalds | Linux 2.6.21-rc1 |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Vladislav Bolkhovitin | Re: Integration of SCST in the mainstream Linux kernel |
| Alexey Dobriyan | Re: [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Evgeniy Polyakov | Re: [BUG] New Kernel Bugs |
git: | |
