Re: [2/2] vmallocinfo: Add caller information

Previous thread: [0/2] vmalloc: Add /proc/vmallocinfo to display mappings by Christoph Lameter on Tuesday, March 18, 2008 - 3:27 pm. (5 messages)

Next thread: [1/2] vmalloc: Show vmalloced areas via /proc/vmallocinfo by Christoph Lameter on Tuesday, March 18, 2008 - 3:27 pm. (5 messages)
From: Christoph Lameter
Date: Tuesday, March 18, 2008 - 3:27 pm

Add caller information so that /proc/vmallocinfo shows where the allocation
request for a slice of vmalloc memory originated.

Result in output like this:

0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 ...
From: Ingo Molnar
Date: Wednesday, March 19, 2008 - 2:42 pm

please use one simple save_stack_trace() instead of polluting a dozen 

	Ingo
--

From: Christoph Lameter
Date: Wednesday, March 19, 2008 - 5:03 pm

save_stack_trace() depends on CONFIG_STACKTRACE which is only available 
when debugging is compiled it. I was more thinking about this as a 
generally available feature.


--

From: Ingo Molnar
Date: Friday, March 21, 2008 - 4:00 am

then make STACKTRACE available generally via the patch below.

	Ingo

------------------------------------------->
Subject: debugging: always enable stacktrace
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Mar 21 11:48:32 CET 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 lib/Kconfig.debug |    1 -
 1 file changed, 1 deletion(-)

Index: linux-x86.q/lib/Kconfig.debug
===================================================================
--- linux-x86.q.orig/lib/Kconfig.debug
+++ linux-x86.q/lib/Kconfig.debug
@@ -387,7 +387,6 @@ config DEBUG_LOCKING_API_SELFTESTS
 
 config STACKTRACE
 	bool
-	depends on DEBUG_KERNEL
 	depends on STACKTRACE_SUPPORT
 
 config DEBUG_KOBJECT
--

From: Christoph Lameter
Date: Friday, March 21, 2008 - 10:35 am

How do I figure out which nesting level to display if we'd do this?
--

From: Ingo Molnar
Date: Friday, March 21, 2008 - 11:45 am

the best i found for lockdep was to include a fair number of them, and 
to skip the top 3. struct vm_area that vmalloc uses isnt space-critical, 
so 4-8 entries with a 3 skip would be quite ok. (but can be more than 
that as well)

	Ingo
--

From: Christoph Lameter
Date: Friday, March 21, 2008 - 12:16 pm

STACKTRACE depends on STACKTRACE_SUPPORT which is not available on 
all arches? alpha blackfin ia64 etc are missing it?

I thought there were also issues on x86 with optimizations leading to 
weird stacktraces?


--

From: Ingo Molnar
Date: Friday, March 21, 2008 - 1:55 pm

at most there can be extra stack trace entries. This is for debugging, 
so if someone wants exact stacktraces, FRAME_POINTERS will certainly 
improve them.

	Ingo
--

From: Mike Frysinger
Date: Friday, March 21, 2008 - 7:40 pm

as long as the new code in question is properly ifdef-ed, making it
rely on STACKTRACE sounds fine.  i'll open an item in our Blackfin
tracker to add support for it.
-mike
--

From: Ingo Molnar
Date: Tuesday, April 29, 2008 - 1:48 am

i _specifically_ objected to the uglification that this patch brings 
with itself to the modified arch/x86 files (see the diff excerpt below), 
in:

    http://lkml.org/lkml/2008/3/19/450

i pointed out how it should be done _much cleaner_ (and much smaller - 
only a single patch needed) via stack-trace, without changing a dozen 
architectures, and even gave a patch to make it all easier for you:

    http://lkml.org/lkml/2008/3/19/568
    http://lkml.org/lkml/2008/3/21/88

in fact, a stacktrace printout is much more informative as well to 
users, than a punny __builtin_return_address(0)!

but you did not reply to my objections in substance, hence i considered 
the issue closed - but you apparently went ahead without addressing my 
concerns (which are rather obvious to anyone doing debug code) and now 
this ugly code is upstream.

If lockdep can get stacktrace samples from all around the kernel without 
adding "caller" info parameters to widely used APIs, then the MM is 
evidently able to do it too. _Saving_ a stacktrace is relatively fast 
[printing it to the console is what is slow], and vmalloc() is an utter 
slowpath anyway [and 1 million file descriptors does not count as a 
fastpath].

If performance is of any concern then make it dependent on 
CONFIG_DEBUG_VM or whatever debug switch in the MM - that will be 
_faster_ in the default case than the current 
pass-parameter-deep-down-the-arch crap you've pushed here. I dont 
remember the last time i genuinely needed the allocation site of a 
vmalloc().

I any case, do _NOT_ pollute any architectures with stack debugging 
hacks (and that holds for future similar patches too), that's why we 
wrote stacktrace. This needs to be reverted or fixed properly.


--

From: Christoph Lameter
Date: Tuesday, April 29, 2008 - 10:08 am

Sorry lost track of this issue. Adding stracktrace support is not a 
trivial thing and will change the basic handling of vmallocinfo.

Not sure if stacktrace support can be enabled without a penalty on various 
platforms. Doesnt this require stackframes to be formatted in a certain 
way?

--

From: Arjan van de Ven
Date: Monday, April 28, 2008 - 12:48 pm

On Tue, 29 Apr 2008 10:08:29 -0700 (PDT)

it doesn't.
--

From: Christoph Lameter
Date: Tuesday, April 29, 2008 - 11:49 am

Hmmm... Why do we have CONFIG_FRAMEPOINTER then?

The current implementation of vmalloc_caller() follows what we have done 
with kmalloc_track_caller. Its low overhead and always on.

It would be great if we could have stacktrace support both for kmalloc and 
vmalloc in the same way also with low overhead but I think following a 
backtrace requires much more than simply storing the caller address. A 
mechanism like that would require an explicit kernel CONFIG option. A 
year or so ago we had patches to implement stacktraces in the slab 
allocators but they were not merged due to various arch specific issues 
with backtraces.

We could dump the offending x86_64 pieces. Some detail of what
/proc/vmallocinfo would be lost then.
--

From: Arjan van de Ven
Date: Monday, April 28, 2008 - 2:00 pm

On Tue, 29 Apr 2008 11:49:54 -0700 (PDT)


stacktraces aren't entirely free, the cost is O(nr of modules) unfortunately ;(
--

From: Christoph Lameter
Date: Tuesday, April 29, 2008 - 12:09 pm

Well so we display out of whack backtraces? There are also issues on 
platforms that do not have a stack in the classic sense (rotating register 
file on IA64 and Sparc64 f.e.). Determining a backtrace can be very 

The current implementation /proc/vmallocinfo avoids these issues and 
with just one caller address it can print one line per vmalloc request. 
--

From: Pekka Enberg
Date: Tuesday, April 29, 2008 - 12:23 pm

I think that's the key question here whether we need to enable this on
production systems? If yes, why? If it's just a debugging aid, then I
see Ingo's point of save_stack_trace(); otherwise the low-overhead
__builtin_return_address() makes more sense.

And btw, why is this new file not in /sys/kernel....?
--

From: Pekka Enberg
Date: Tuesday, April 29, 2008 - 12:29 pm

Actually, this is vmalloc() so why do we even care? If there are
callers in the tree that use vmalloc() for performance sensitive
stuff, they ought to be converted to kmalloc() anyway, no?
--

From: Ingo Molnar
Date: Tuesday, April 29, 2008 - 12:29 pm

they have to solve that for kernel oopses and for lockdep somehow 
anyway. Other users of stacktrace are: fault injection, kmemcheck, 
latencytop, ftrace. All new debugging and instrumentation code uses it, 
and for a good reason.

	Ingo
--

Previous thread: [0/2] vmalloc: Add /proc/vmallocinfo to display mappings by Christoph Lameter on Tuesday, March 18, 2008 - 3:27 pm. (5 messages)

Next thread: [1/2] vmalloc: Show vmalloced areas via /proc/vmallocinfo by Christoph Lameter on Tuesday, March 18, 2008 - 3:27 pm. (5 messages)