Noticed today that the combination of 4KSTACKS and DEBUG_STACKOVERFLOW
config options is a bit deadly.
DEBUG_STACKOVERFLOW warns in do_IRQ if we're within THREAD_SIZE/8 of the
end of useable stack space, or 512 bytes on a 4k stack.
If we are, then it goes down the dump_stack path, which uses most, if
not all, of the remaining stack, thereby turning a well-intentioned
warning into a full-blown catastrophe.
The callchain from the warning looks something like this, with stack
usage shown as found on my x86 box:
4 dump_stack
4 show_trace
8 show_trace_log_lvl
4 dump_trace
print_context_stack
12 print_trace_address
print_symbol
232 __print_symbol
164 sprint_symbol
20 printk
___
448
448 bytes to tell us that we're within 512 bytes (or less) of certain
doom... and I think there's call overhead on top of that?
The large stack usage in those 2 functions is due to big char arrays, of
size KSYM_NAME_LEN (128 bytes) and KSYM_SYMBOL_LEN (223 bytes).
IOW, the stack warning effectively reduces useful stack left in our itty
bitty 4k stacks by over 10%.
Any suggestions for ways around this? The warning is somewhat helpful,
and I guess the obvious option is to lighten up the dump_stack path, but
it's still effectively reducing precious available stack space by some
amount.
With CONFIG_DEBUG_STACK_USAGE, we could print at oops time: "oh, and by
the way, you blew your stack" if there is no zeroed stack space left, as
a post-mortem. Even without that option, I think we could still check
whether the *current* %esp at oops time has gone too far? But if we
blew the stack, returned, and *then* oops, I think it'd be hard to know
without the DEBUG_STACK_USAGE option that we ran out of room.
-Eric
-
| Jens Axboe | Re: [BUG] New Kernel Bugs |
| Faik Uygur | Re: Linux 2.6.21-rc1 |
| Ingo Molnar | [patch 02/13] syslets: add syslet.h include file, user API/ABI definitions |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
git: | |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | Re: Data corruption issue with splice() on 2.6.27.10 |
| Steven Rostedt | Re: -rt scheduling: wakeup bug? |
