Re: [RFC PATCH 1/4] kmemtrace: Core implementation.

Previous thread: [scsi_dh] [PATCH] Verify "dev" is a sdev before accessing it. by Chandra Seetharaman on Wednesday, July 16, 2008 - 8:35 pm. (1 message)

Next thread: Re: 2.6.26-rc9-git9 doesn't boot on Macintel by Yinghai Lu on Wednesday, July 16, 2008 - 8:49 pm. (35 messages)
To: <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, July 16, 2008 - 8:46 pm

Hello everybody,

I hopefully fixed previous complaints. Also wrote some documentation and
fixed some missing stuff in SLAB.

Please take a look and comment.

BTW, see Documentation/vm/kmemtrace.txt for details on how to use this and
for info on design details.

Eduard

Eduard - Gabriel Munteanu (4):
kmemtrace: Core implementation.
kmemtrace: SLAB hooks.
kmemtrace: SLUB hooks.
kmemtrace: SLOB hooks.

Documentation/kernel-parameters.txt | 6 +
Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++
MAINTAINERS | 6 +
include/linux/kmemtrace.h | 110 ++++++++++++++++++
include/linux/slab_def.h | 56 ++++++++-
include/linux/slub_def.h | 9 ++-
init/main.c | 2 +
lib/Kconfig.debug | 4 +
mm/Makefile | 2 +-
mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++
mm/slab.c | 61 +++++++++-
mm/slob.c | 37 +++++-
mm/slub.c | 47 +++++++-
13 files changed, 617 insertions(+), 27 deletions(-)
create mode 100644 Documentation/vm/kmemtrace.txt
create mode 100644 include/linux/kmemtrace.h
create mode 100644 mm/kmemtrace.c

--

To: <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, July 16, 2008 - 8:46 pm

kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
Documentation/kernel-parameters.txt | 6 +
Documentation/vm/kmemtrace.txt | 96 ++++++++++++++++
MAINTAINERS | 6 +
include/linux/kmemtrace.h | 110 ++++++++++++++++++
init/main.c | 2 +
lib/Kconfig.debug | 4 +
mm/Makefile | 2 +-
mm/kmemtrace.c | 208 +++++++++++++++++++++++++++++++++++
8 files changed, 433 insertions(+), 1 deletions(-)
create mode 100644 Documentation/vm/kmemtrace.txt
create mode 100644 include/linux/kmemtrace.h
create mode 100644 mm/kmemtrace.c

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b52f47d..b230aff 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -49,6 +49,7 @@ parameter is applicable:
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
JOY Appropriate joystick support is enabled.
+ KMEMTRACE kmemtrace is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
LOOP Loopback device support is enabled.
@@ -941,6 +942,11 @@ and is between 256 and 4096 characters. It is defined in the file
use the HighMem zone if it exists, and the Normal
zone if it does not.

+ kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
+ subbufs kmemtrace's relay channel has. Set this
+ higher than default (KMEMTRACE_N_SUBBUFS in code) if
+ you experience buffer overruns.
+
movablecore=nn[KMG] [KNL,X86-32,IA-64,PPC,X86-64] This para...

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <penberg@...>, <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 5:34 pm

Otherwise looks nice. Thanks.

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
--

To: Randy Dunlap <rdunlap@...>
Cc: <penberg@...>, <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 7:49 pm

--

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Thursday, July 17, 2008 - 4:01 am

Hi,

[Adding Randy to cc for the Documentation/ parts and Matt for the core.]

On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu

As I mentioned in private, I would prefer we drop autoconf from the

I think you're supposed to document the actual filesystem in

I still think kernel vs. cache is confusing because both allocations

So why don't we have the ABI version embedded here like blktrace has
so that user-space can check if the format matches its expectations?
That should be future-proof as well: as long as y ou keep the existing
fields where they're at now, you can always add new fields at the end

--

To: Pekka Enberg <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Thursday, July 17, 2008 - 2:32 pm

Yes, I'm working on a legible plain Makefile. However, I'd leave both
the autoconf variant and the plain Makefile in the package for now. Most
developers can use autoconf since it's part of the standard toolset for

We keep this here because we see all-zeros events when relay errors
occur. I'd like to keep it until I'm sure the relay problem was solved
(although I've not seen these errors in a while since I patched

You can't add fields at the end, because the struct size will change and
reads will be erroneous. Also, stamping every 'packet' with ABI version

We do it to preserve ordering of timestamps. Otherwise, the CPU might
get preempted (by IRQs or otherwise) and the event might not be logged
in the order timestamps were taken.

I thought the previous comment about 'ev.timestamp' was enough. I'll
--

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Friday, July 18, 2008 - 4:48 am

Hi Eduard-Gabriel,

It's an ABI so you want to make it backwards compatible and extensible.
Yes, it's just for debugging, so the rules are bit more relaxed here but
that's not an excuse for not designing the ABI properly.

I really wish we would follow the example set by blktrace here. It uses a
fixed-length header that knows the length of the rest of the packet.

Pekka
--

To: Pekka J Enberg <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Friday, July 18, 2008 - 6:13 am

I do expect to keep things source-compatible, but even
binary-compatible? Developers debug and write patches on the latest kernel,
not on a 6-month-old kernel. Isn't it reasonable that they would
recompile kmemtrace along with the kernel?

I would deem one ABI or another stable, but then we have to worry about
not breaking it, which leads to either bloating the kernel, or keeping

I'd rather export the header length through a separate debugfs entry,
rather than add this to every packet. I don't think we need variable
length packets, unless we intend to export the whole stack trace, for
example.

By the way, do you anticipate the need for such a stack trace? It would seem
nice, but is it worth the trouble? (/me writes this down as a possible

--

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Friday, July 18, 2008 - 10:38 am

Hi Eduard-Gabriel,

On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu

Yes, I do think it's unreasonable. I, for one, am hoping distributions
will pick up the kmemtrace userspace at some point after which I don't
need to ever compile it myself.

On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu

Like I've said before, it's debugging/tracing infrastructure so the
rules are bit more relaxed. That said, what we should do is (1) make
the ABI as future-proof as we can, (2) explicitly mark it as unstable
by documenting it in Documentation/ABI/testing and (3) at some point
in time move it in Documentation/ABI/stable and hopefully never break
it again. But sure, we probably don't need to keep any "bloat" around
like we do with the syscall interface, for example.

And hopefully, the ABI is good enough to allow adding *new* tracing
events while retaining the old ones nicely in a backwards compatible
way.

On Fri, Jul 18, 2008 at 1:13 PM, Eduard - Gabriel Munteanu

Sure, makes sense.

Pekka
--

To: Pekka Enberg <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>, Matt Mackall <mpm@...>
Date: Friday, July 18, 2008 - 3:40 pm

Ok, I agree it's nice to have it in distros. I wasn't planning for this,
but it's good to know others' expectations.

Then I'll also add a turn-off mechanism, so maybe it makes it into distro
kernels too (either debug or not). And we don't need to include kernel
headers from userspace anymore and I'll just provide a copy.

BTW, I also expect the kmemtrace-user git repo to become stable soon

Sounds like a good plan. I'll also update the docs (Documentation/ABI/ and
--

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: Pekka Enberg <penberg@...>, <cl@...>, <linux-mm@...>, <linux-kernel@...>, Randy Dunlap <rdunlap@...>
Date: Friday, July 18, 2008 - 4:07 pm

It's worth pointing out that this is one of the big downfalls of things
like systemtap. If a tool can't just work out of the box for a distro,
it's basically a non-starter for most users.

--
Mathematics is the supreme nostalgia of our time.

--

To: <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, July 16, 2008 - 8:46 pm

This adds hooks for the SLAB allocator, to allow tracing with kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slab_def.h | 56 +++++++++++++++++++++++++++++++++++++-----
mm/slab.c | 61 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 104 insertions(+), 13 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 39c3a5e..77b8045 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -14,6 +14,7 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
+#include <linux/kmemtrace.h>

/* Size description struct for general caches. */
struct cache_sizes {
@@ -28,8 +29,20 @@ extern struct cache_sizes malloc_sizes[];
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *__kmalloc(size_t size, gfp_t flags);

+#ifdef CONFIG_KMEMTRACE
+extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags);
+#else
+static inline void *kmem_cache_alloc_notrace(struct kmem_cache *cachep,
+ gfp_t flags)
+{
+ return kmem_cache_alloc(cachep, flags);
+}
+#endif
+
static inline void *kmalloc(size_t size, gfp_t flags)
{
+ void *ret;
+
if (__builtin_constant_p(size)) {
int i = 0;

@@ -50,10 +63,17 @@ static inline void *kmalloc(size_t size, gfp_t flags)
found:
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
- return kmem_cache_alloc(malloc_sizes[i].cs_dmacachep,
- flags);
+ ret = kmem_cache_alloc_notrace(
+ malloc_sizes[i].cs_dmacachep, flags);
+ else
#endif
- return kmem_cache_alloc(malloc_sizes[i].cs_cachep, flags);
+ ret = kmem_cache_alloc_notrace(
+ malloc_sizes[i].cs_cachep, flags);
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret,
+ size, malloc_sizes[i].cs_size, flags);
+
+ return ret;
}
return __kmalloc(size, flags);
...

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 3:38 am

Hi Eduard-Gabriel,

On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu

We have malloc_sizes[i].cs_size here as the _allocated_ size (which

...here as well. Why?

AFAICT, you should always use ->buffer_size as the_allocated_ size. Hmm?
--

To: <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, July 16, 2008 - 8:46 pm

This adds hooks for the SLUB allocator, to allow tracing with kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
include/linux/slub_def.h | 9 +++++++-
mm/slub.c | 47 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d117ea2..0cef121 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -10,6 +10,7 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
+#include <linux/kmemtrace.h>

enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
@@ -205,7 +206,13 @@ void *__kmalloc(size_t size, gfp_t flags);

static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
{
- return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size));
+ unsigned int order = get_order(size);
+ void *ret = (void *) __get_free_pages(flags, order);
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_KERNEL, _THIS_IP_, ret,
+ size, PAGE_SIZE << order, flags);
+
+ return ret;
}

static __always_inline void *kmalloc(size_t size, gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 315c392..a6f930f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -23,6 +23,7 @@
#include <linux/kallsyms.h>
#include <linux/memory.h>
#include <linux/math64.h>
+#include <linux/kmemtrace.h>

/*
* Lock order:
@@ -1652,14 +1653,25 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,

void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
- return slab_alloc(s, gfpflags, -1, __builtin_return_address(0));
+ void *ret = slab_alloc(s, gfpflags, -1, __builtin_return_address(0));
+
+ kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
+ s->objsize, s->size, gfpflags);
+
+ return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc);

#ifdef CONFIG_NUMA
void *kmem_cache_alloc...

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 3:46 am

On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu

Don't cast flags to unsigned long. The kmemtrace core should use gfp_t

--

To: Pekka Enberg <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 2:06 pm

Yes, I should cut all these casts off. Will resubmit soon.

Eduard

--

To: <penberg@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Wednesday, July 16, 2008 - 8:46 pm

This adds hooks for the SLOB allocator, to allow tracing with kmemtrace.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
mm/slob.c | 37 +++++++++++++++++++++++++++++++------
1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index a3ad667..0335c01 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -65,6 +65,7 @@
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/list.h>
+#include <linux/kmemtrace.h>
#include <asm/atomic.h>

/*
@@ -463,27 +464,38 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
unsigned int *m;
int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+ void *ret;

if (size < PAGE_SIZE - align) {
if (!size)
return ZERO_SIZE_PTR;

m = slob_alloc(size + align, gfp, align, node);
+
if (!m)
return NULL;
*m = size;
- return (void *)m + align;
+ ret = (void *)m + align;
+
+ kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL,
+ _RET_IP_, ret,
+ size, size + align, gfp, node);
} else {
- void *ret;
+ unsigned int order = get_order(size);

- ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node);
+ ret = slob_new_page(gfp | __GFP_COMP, order, node);
if (ret) {
struct page *page;
page = virt_to_page(ret);
page->private = size;
}
- return ret;
+
+ kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KERNEL,
+ _RET_IP_, ret,
+ size, PAGE_SIZE << order, gfp, node);
}
+
+ return ret;
}
EXPORT_SYMBOL(__kmalloc_node);

@@ -501,6 +513,8 @@ void kfree(const void *block)
slob_free(m, *m + align);
} else
put_page(&sp->page);
+
+ kmemtrace_mark_free(KMEMTRACE_TYPE_KERNEL, _RET_IP_, block);
}
EXPORT_SYMBOL(kfree);

@@ -569,10 +583,19 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
{
void *b;

- if (c->size < PAGE_SIZE)
+ if (c->size < PAGE_SIZE) {
b = slob_allo...

To: Eduard - Gabriel Munteanu <eduard.munteanu@...>
Cc: <cl@...>, <linux-mm@...>, <linux-kernel@...>, Matt Mackall <mpm@...>
Date: Thursday, July 17, 2008 - 3:43 am

Hi,

[Adding Matt as cc.]

On Thu, Jul 17, 2008 at 3:46 AM, Eduard - Gabriel Munteanu

--

To: Pekka Enberg <penberg@...>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@...>, <cl@...>, <linux-mm@...>, <linux-kernel@...>
Date: Thursday, July 17, 2008 - 11:46 am

Acked-by: Matt Mackall <mpm@selenic.com>

--
Mathematics is the supreme nostalgia of our time.

--

Previous thread: [scsi_dh] [PATCH] Verify "dev" is a sdev before accessing it. by Chandra Seetharaman on Wednesday, July 16, 2008 - 8:35 pm. (1 message)

Next thread: Re: 2.6.26-rc9-git9 doesn't boot on Macintel by Yinghai Lu on Wednesday, July 16, 2008 - 8:49 pm. (35 messages)