reduce the total stack usage of slab_err & object_err.
Introduce a new function to display a simple slab bug message, and call
this when vprintk is not needed.
before: (stack size as reported by checkstack on x86_64)
slab_err/object_err -> slab_bug(328)->printk
after:
slab_err/object_err -> slab_bug_message(8) -> printk
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
----
I've been trying to build a tool to estimate the maximum stack usage in
the kernel, & noticed that most of the biggest stack users are the error
handling routines.
This simple change will reduced the stack used handling some slub errors
on both 64 & 32 bit platforms, although I haven't measured it on 32 bit
it will save at least 100 bytes.
I haven't spotted anywhere that overflows the stack but this change
should reduce the chance of it happening.
It boots & run successfully on my AMD x86_64 desktop -- but I haven't
seen any slub errors so the new code hasn't been run.
regards
Richard
diff --git a/mm/slub.c b/mm/slub.c
index 0c83e6a..0646452 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -419,6 +419,15 @@ static void print_page_info(struct page *page)
}
+static void slab_bug_message(struct kmem_cache *s, char *msg)
+{
+ printk(KERN_ERR "========================================"
+ "=====================================\n");
+ printk(KERN_ERR "BUG %s: %s\n", s->name, msg);
+ printk(KERN_ERR "----------------------------------------"
+ "-------------------------------------\n\n");
+}
+
static void slab_bug(struct kmem_cache *s, char *fmt, ...)
{
va_list args;
@@ -427,11 +436,7 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...)
va_start(args, fmt);
vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
- printk(KERN_ERR "========================================"
- "=====================================\n");
- printk(KERN_ERR "BUG %s: %s\n", s->name, buf);
- printk(KERN_ERR ...I've got a better idea: use vprintk. -- Mathematics is the supreme nostalgia of our time. --
You could simply get rid of the 100 byte buffer by using vprintk? Same method
could be used elsewhere in the kernel and does not require additional
functions. Compiles, untestted.
Subject: Slub reduce slab_bug stack usage by using vprintk
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2008-09-30 10:34:40.000000000 -0500
+++ linux-2.6/mm/slub.c 2008-09-30 10:36:10.000000000 -0500
@@ -422,15 +422,14 @@
static void slab_bug(struct kmem_cache *s, char *fmt, ...)
{
va_list args;
- char buf[100];
va_start(args, fmt);
- vsnprintf(buf, sizeof(buf), fmt, args);
- va_end(args);
printk(KERN_ERR "========================================"
"=====================================\n");
- printk(KERN_ERR "BUG %s: %s\n", s->name, buf);
- printk(KERN_ERR "----------------------------------------"
+ printk(KERN_ERR "BUG %s: ", s->name);
+ vprintk(fmt, args);
+ va_end(args);
+ printk(KERN_ERR "\n----------------------------------------"
"-------------------------------------\n\n");
}
--
I'm fixing a bunch of these in the kernel right now. -- Mathematics is the supreme nostalgia of our time. --
Yes, using vprintk is better but you still have this path : ( with your patch applied) object_err -> slab_bug(208) -> printk(216) instead of object_err -> slab_bug_message(8) -> printk(216) unfortunately the overhead for having var_args is pretty big, at least on x86_64. I haven't measured it on 32 bit yet. Richard --
(Funny, I added vprintk precisely to fix the extra buffer issue in
ext2/3 years ago, but never hit the rest of the kernel..)
Use vprintk rather that vsnprintf where possible
This does away with lots of large static and on-stack buffers as well
as a few associated locks.
Signed-off-by: Matt Mackall <mpm@selenic.com>
diff -r 6841f3ce32be -r 9ac0727d59d8 drivers/cpufreq/cpufreq.c
--- a/drivers/cpufreq/cpufreq.c Mon Sep 29 17:28:44 2008 -0500
+++ b/drivers/cpufreq/cpufreq.c Tue Sep 30 11:37:30 2008 -0500
@@ -220,9 +220,7 @@
void cpufreq_debug_printk(unsigned int type, const char *prefix,
const char *fmt, ...)
{
- char s[256];
va_list args;
- unsigned int len;
unsigned long flags;
WARN_ON(!prefix);
@@ -235,15 +233,10 @@
}
spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
- len = snprintf(s, 256, KERN_DEBUG "%s: ", prefix);
-
va_start(args, fmt);
- len += vsnprintf(&s[len], (256 - len), fmt, args);
+ printk(KERN_DEBUG "%s: ", prefix);
+ vprintk(fmt, args);
va_end(args);
-
- printk(s);
-
- WARN_ON(len < 5);
}
}
EXPORT_SYMBOL(cpufreq_debug_printk);
diff -r 6841f3ce32be -r 9ac0727d59d8 drivers/scsi/arm/fas216.c
--- a/drivers/scsi/arm/fas216.c Mon Sep 29 17:28:44 2008 -0500
+++ b/drivers/scsi/arm/fas216.c Tue Sep 30 11:37:30 2008 -0500
@@ -290,10 +290,8 @@
static void
fas216_do_log(FAS216_Info *info, char target, char *fmt, va_list ap)
{
- static char buf[1024];
-
- vsnprintf(buf, sizeof(buf), fmt, ap);
- printk("scsi%d.%c: %s", info->host->host_no, target, buf);
+ printk("scsi%d.%c: ", info->host->host_no, target, buf);
+ vprintk(fmt, ap);
}
static void fas216_log_command(FAS216_Info *info, int level,
diff -r 6841f3ce32be -r 9ac0727d59d8 drivers/scsi/nsp32.c
--- a/drivers/scsi/nsp32.c Mon Sep 29 17:28:44 2008 -0500
+++ b/drivers/scsi/nsp32.c Tue Sep 30 11:37:30 2008 -0500
@@ -323,36 +323,31 @@
#define NSP32_DEBUG_INIT BIT(17)
#define NSP32_SPECIAL_PRINT_REGISTER BIT(20)
-#define ...Really 208 bytes for a va arg parameter declaration? I expected it to be simply a null terminated pointer list. --
That's fascinating. I tried a simple test case in userspace:
#include <stdarg.h>
#include <stdio.h>
void p(char *fmt, ...)
{
va_list args;
va_start(args, fmt);
vprintf(fmt, args);
va_end(args);
}
On 32-bit, I'm seeing 32 bytes of stack vs 216 on 64-bit. Disassembly
suggests it's connected to va_list fiddling with XMM registers, which
seems quite odd.
--
Mathematics is the supreme nostalgia of our time.
--
Ok, on closer inspection, this is part of the x86_64 calling convention. When calling a varargs function, the caller passes the number of floating point SSE regs used in rax. The callee then has to save these away for va_list use. The GCC prologue apparently sets aside space for xmm0-xmm7 (16 bytes each) all the time (plus rdi, rsi, rdx, rcx, r8, and r9). Obviously, we're never passing floating point args in the kernel, so we're taking about a 40+ byte hit in code size and 128 byte hit in stack size for every varargs call. Looks like the gcc people have a patch in progress: http://gcc.gnu.org/ml/gcc-patches/2008-08/msg02165.html So I think we should assume that x86_64 will sort this out eventually. -- Mathematics is the supreme nostalgia of our time. --
thanks for the info. so vprintk _is_ the best solution. I did think it a bit strange to have to add code to remove varargs to save stack space, but I didn't even consider that it might be a gcc issue. --
Looks like this got fixed for x86_64 in gcc 4.4. For most other arches, the overhead should be reasonable. -- Mathematics is the supreme nostalgia of our time. --
Cool! I once did the same, although the code has severely bitrotted by now. Is the code available somewhere? Jörn -- "Error protection by error detection and correction." -- from a university class --
No I haven't made it available as it's really only a proof of concept, and I still don't have any sensible ideas how to deal with pointers to functions. Plus I'm still testing it to see if the results are anything like reasonable. Also it's finding lots of potentially recursive code paths and my heuristic to deal with them is very basic. I'm just adding a feature so that I can ignore some code paths, so maybe that will help. Richard --
Sounds very familiar. ;) Function pointers are fairly easy. When a function pointer is part of a structure, simply consider that pointer to be a pseudo-function that doesn't consume any stack space. Whenever that pointer is written to, that value can be "called" from the pseudo-function. Callback functions that are passed as function parameters can be handles similarly. Getting this information wasn't too hard with smatch, but smatch depends on gcc 3.1, which has *ahem* matured a bit. Recursions essentially consume an infinite amount of stack unless you know the upper bound for them. I handled this two-fold. First, every single recursion is reported. Secondly, every recursion is assumed to be taken exactly once when calculating stack consumption. This is the minimal sane value. Feel free to pick two or three if you prefer. The main function code was done in two stages, iirc. First stage simply creates the call graph in memory. Somewhere in the range of a million objects. Then I collapsed the graph from the leaves. If function A calls functions B, C and D, you first throw away two of the called functions and keep the one with the biggest stack footprint. Then A is turned into a function A' that has the combined stack footprint of A and B (assuming C and D are lighter) and is a leaf function. Add some annotation that B is called, along with anything B itself called before it was collapsed. If you use this method, recursions will sooner or later turn into a pattern where A calls A. Trivial to detect. Maybe my thesis has a few more details: http://wh.fh-wedel.de/~joern/quality.pdf Jörn -- Joern's library part 13: http://www.chip-architect.com/ --
