Another disadvantage is one that came up earlier when markers were initially thought up: that something so invisible to the compiler (no code being generated in the instruction stream) may, after optimization, be impossible to locate: not just the statement but also the putative parameters.
Long ago, someone proposed inserting asm("nop") mini-markers into the instruction stream, which could then be used as an anchor to tie a kprobe to, so that would solve the statement-location problem. But it doesn't help assure that the parameters will be available in dwarf, so someone else proposed adding another asm that just asks for the parameters to be evaluated and placed *somewhere*. Each asm input constraint was to be the loosest possible, so as to not force the compiler to put the values into registers (and evict their normal tracing-ignorant tenants).
I believe this combination was never actually built/tested, partly because people realized that then the compiler would always have to evaluate parameters unconditionally, whether or not a kprobe is present. (To do it otherwise would IIRC require the asm code to include control-flow-modification instructions, which would surprise gcc.)
So that's roughly how we arrived at recent markers. They expose the parameters to the compiler, but arrange not to evaluate them unless necessary. The most recent markers code patches nops over most or all of the hot path instructions, so there is no tangible performance impact.
- FChE
--
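For readers who have not seen the older proposal, the nop-anchor plus loose-constraint idea described in the message above could look roughly like the sketch below. This is only an illustration under assumptions: the macro name MINI_MARK and the function sum_with_marker are hypothetical, and (as the message says) the combination was apparently never built or tested in the kernel. It assumes GCC-style extended asm and an architecture whose assembler accepts a "nop" mnemonic.

```c
#include <stddef.h>

/*
 * Hypothetical mini-marker (not from any posted patch).
 * First asm:  a single nop that a kprobe could be anchored to.
 * Second asm: emits no instructions, but lists the parameters as inputs
 *             with the loosest general constraint "g" (register, memory
 *             or immediate), so the compiler has to keep the values
 *             available somewhere without being forced into registers.
 * Note the drawback raised in the thread: the parameters are evaluated
 * unconditionally, whether or not a probe is attached.
 */
#define MINI_MARK(a, b)                                   \
    do {                                                  \
        __asm__ __volatile__("nop");                      \
        __asm__ __volatile__("" : : "g"(a), "g"(b));      \
    } while (0)

/* Example of a marked loop: sums n integers, marking each iteration. */
static int sum_with_marker(const int *v, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        total += v[i];
        MINI_MARK(i, total);
    }
    return total;
}
```

The marker does not change what the function computes; it only constrains what the optimizer may discard at that point, which is exactly the cost the later replies argue about.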
Actually, I listed that one as an advantage. But, in order to be
completely zero impact, the probe cannot interfere with optimisation,
and so you run the risk of having the probe point do strange things
(like it's in the middle of a loop that gets unrolled) or that the
variables you want to advertise get optimised away.
All of this is mitigated by correct selection of the probe points and
Actually, it does. Assuming the probe is placed in the code by someone
who knows what they're doing and is using it, you can ensure that what
you're advertising actually exists. If you look at the SCSI example I
gave, both the probe points and the variables actually exist, and will
Yes there are. There are actually two performance impacts:
1. The nops themselves take cycles to execute ... small, granted,
but it adds up with lots of probe points
2. The probes interfere with optimisation since to replace them
with a function call, they must be barriers.
I didn't say use simple probes to replace markers ... I just said it's
an alternative for things like I/O subsystems that don't want the
perturbation.
James
--
Hi -
Well, you can test your theory: replace some "tracepoints" or markers or printk's with this, and see if systemtap (or gdb) can get at the same data. When "correct selection" is a function of any particular compiler's optimization algorithms, it will be difficult for a human programmer
That's *if* the line number ends up being resolvable back to a PC. In fact, since there is no code emitted for it, that particular line
You misunderstood - I am not talking about whether the variables exist in the context of the source code. The question is which of those variables still exist, live & addressable, in the machine code and execution state. You may be surprised to what extent compiler
That's why I qualified it with "tangible". Please confirm your intuition about these costs.
- FChE
--
Not necessarily. A tracepoint by a barrier will always be pretty much OK, as will variables that are either passed in or passed to functions (since they have to be instantiated to pass as arguments). Plus screw-ups are easily detectable by a tool that parses the dwarf. The essential point is that we need zero impact trace points and that makes them difficult to place in this fashion. However, the burden of placing and verifying them rests with the people in the actual subsystem
Erm, no ... dwarf is designed to emit an entry for every line in the file (whether it contains a statement or not). The empty lines get elided in the line number program (because you can attach them to the first statement following) but a correct parser will recover them (by
No ... I'm used to optimisation strangeness. Again, I'm not trying to eliminate it because that would defeat the zero impact purpose. I'm trying to build a system that can be useful without any impact. The consequence is going to be that certain trace points can't be used because of the optimiser, but that's the tradeoff. As long as the people placing the trace points are subject matter experts in the
1 is pretty obvious ... the nops have a defined cycle time in every instruction architecture. The optimisation costs are very difficult to quantify since they vary so much from compiler to compiler and function to function.
James
--
So as I understand things, your light-weight tracepoints are designed for very performance-sensitive code paths where we don't want to disturb the optimization in the deactivated state. In non-performance sensitive parts of the kernel, where cycle counting is not so important, tracepoints can and probably should still be used. So I don't think you were proposing eliminating the current kernel markers in favor of this approach, yes?
When you said a tool could determine if the tracepoint had gotten optimized away, or the variables were no longer present, I assume you meant at compile time, right? So with the right tool built into the kbuild infrastructure, if we could simply print warnings when tracepoints had gotten optimized away, that would make your simple tracepoints quite safe for general use, I would think.
- Ted
P.S. When you said that the current kernel markers are "a bit heavyweight", how bad are they in practice? Hundreds of cycles? More?
--
That's right ... I started from the position that the current markers were too heavy for an I/O subsystem, but I'm sure they have many other
Yes and no. Yes because a tool will be able to detect the problems, but no if you're thinking an actual kernel compile would do it (unless some tool is designed for this and integrated into the build ... the obvious
Yes ... but someone has to come up with the tool. I suppose rebuilding the line number matrix and finding the variables at the location is easy mechanical dwarf stuff ... but it will give the kernel build a lot of external dependencies it didn't have before. Plus, this level of checking can only be done if dwarf is generated (i.e. CONFIG_KERNEL_DEBUG_INFO is y).
James
--
Hi -
It will be interesting to see how frequently such a warning appears for a good suite of such mini markers, on a diversity of architectures
Good question. The only performance measurements I have seen posted indicate negligible effects.
- FChE
--
This is just an incremental update based on feedback. The most significant was that making the marker a compiler barrier will free the inserter from worrying about the mark sliding around changes to named variables (and thus having to worry about this in placement) at practically zero optimisation cost.
I also updated the code to drop an asm section instead of using the static variable scheme. I also added documentation and made the module loader ignore them (since modules don't go through the vmlinux.lds transformations).
I also added a simple versioning scheme (basically tack the version on to the end of the section name). It can be used simply and even provides backwards compatibility (just emit the old and the new sections).
If everyone's happy with this, I'll follow it up with the systemtap changes to make use of them ... they've been incredibly helpful debugging some of the CDROM problems for me so far.
James
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Wed, 9 Jul 2008 16:18:16 -0500
Subject: [PATCH] add simple marker trace point infrastructure
This patch adds incredibly simple markers which are designed to be used via kprobes. All it does is add an extra section to the kernel (and modules) which annotates the location in source file/line of the marker and a description of the variables of interest. Tools like systemtap can then use the kernel dwarf2 debugging information to transform this to a precise probe point that gives access to the named variables.
The beauty of this scheme is that it has zero cost in the unactivated case (the extra section is discardable if you're not interested in the information, and nothing is actually added into the routine being marked). The disadvantage is that it's really unusable for rolling your own marker probes because it relies on the dwarf2 information to locate the probe point for kprobes and unravel the local variables of interest, so you need an external tool like systemtap to help ...
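To make the idea in the patch description concrete, here is a minimal user-space sketch of a marker that records file, line and a variable description in a dedicated section and acts as a compiler barrier. It is not the code from the patch (which is elided above and, per the cover note, emits the section from asm rather than from a static variable); the record layout, the SIMPLE_MARK macro and the helper names are all hypothetical. It assumes GCC or Clang on an ELF target linked with GNU ld or a compatible linker, which defines __start_<name> and __stop_<name> symbols for sections whose name is a valid C identifier.

```c
#include <stddef.h>

/* Hypothetical record layout; the real patch's format is not shown here. */
struct simple_marker {
    const char *file;   /* source file containing the marker */
    int line;           /* source line of the marker */
    const char *vars;   /* description of the variables of interest */
};

/* Section bounds, provided by the linker (assumption: GNU ld or compatible). */
extern struct simple_marker __start___simple_marker[];
extern struct simple_marker __stop___simple_marker[];

/*
 * Drop one record into the __simple_marker section and issue a compiler
 * barrier. Nothing is executed at the marked location: the record is pure
 * data and the barrier asm emits no instructions, it only stops the
 * compiler from moving memory accesses across the marker.
 */
#define SIMPLE_MARK(vars_desc)                                            \
    do {                                                                  \
        static struct simple_marker sm_rec_                               \
            __attribute__((section("__simple_marker"), used,              \
                           aligned(__alignof__(struct simple_marker)))) = \
            { __FILE__, __LINE__, vars_desc };                            \
        __asm__ __volatile__("" : : : "memory");                          \
    } while (0)

/* Number of marker records present in this program. */
static size_t simple_marker_count(void)
{
    return (size_t)(__stop___simple_marker - __start___simple_marker);
}

/* Example of a marked routine (hypothetical). */
static int queue_request(int tag, int len)
{
    SIMPLE_MARK("tag len");
    return tag + len;
}
```

A tool in the role of systemtap would read the section out of the object file, then use the debugging information to turn each file/line pair into a probe address and to locate the named variables; this sketch only walks the records from inside the same program.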
This is the systemtap piece that allows you to use simple markers as
probe points for people who want to play around with the functionality.
James
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Fri, 11 Jul 2008 09:32:34 -0500
Subject: Add simple_marker statement
Now that the kernel drops simple markers in a __simple_marker section, update systemtap to parse for them by introducing an extra
<module>.simple_mark(<marker str>)
statement. It would be nice to reuse the existing mark() directive,
but unfortunately, the parser can't cope with semantic dependent
parsing (it won't allow the registration of two identical patterns),
so the easiest way to get this to work is to introduce an additional
statement type.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
tapsets.cxx | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 121 insertions(+), 3 deletions(-)
diff --git a/tapsets.cxx b/tapsets.cxx
index adfe10e..ce59102 100644
--- a/tapsets.cxx
+++ b/tapsets.cxx
@@ -458,6 +458,7 @@ static string TOK_MAXACTIVE("maxactive");
static string TOK_STATEMENT("statement");
static string TOK_ABSOLUTE("absolute");
static string TOK_PROCESS("process");
+static string TOK_SIMPLE_MARK("simple_mark");
// Can we handle this query with just symbol-table info?
enum dbinfo_reqt
@@ -571,7 +572,15 @@ module_cache
};
typedef struct module_cache module_cache_t;
+struct marker_map_data {
+ string file;
+ int line;
+
+ marker_map_data(void) : line(-1) { };
+};
+
#ifdef HAVE_TR1_UNORDERED_MAP
+typedef tr1::unordered_map<string,struct marker_map_data> marker_map_t;
typedef tr1::unordered_map<string,Dwarf_Die> cu_function_cache_t;
typedef tr1::unordered_map<string,cu_function_cache_t*> mod_cu_function_cache_t; // module:cu -> function -> die
#else
@@ -579,6 +588,7 @@ struct stringhash {
size_t operator() (const string& s) const { hash<const char*> h; return h(s.c_str()); }
};
...
Clever. We can include support for this as soon as the kernel-side simple_mark widget goes upstream. (For completeness, the code would need test cases, docs, and desirably support for wildcarding as in probe kernel.simple_mark("*").)
- FChE
--
Hi James,
I'm very interested in your approach. IMHO, as Aoki investigated, the overhead of markers is not so big unless we put a lot of them into the kernel. And from the "active" overhead point of view, it takes less than tens of nanoseconds, while kprobes takes hundreds of nanoseconds. Kprobes also has a limitation on probeable points: it can't probe "__kprobes" marked functions. So, the original markers still have advantages.
However, your approach is also useful, especially for embedding thousands of markers in the kernel or drivers. So I think it's better to use both of them as the situation demands.
I just have one comment on its name. Since it doesn't trace anything, I'd rather like notation() or note_mark() than trace_simple(). :-)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com
--
That's the case which I started from. The point is that if passive markers have a cost, we have to be very careful about placing them to
Yes ... the zero impact markers are completely dependent on the kprobes overhead for activation ... on the other hand, one of the vendor complaints is cost of activation of kprobes, so it's nicely tied into
Certainly ... as I said to Ted, I'm not planning to replace the current
well ... the current markers code uses trace_mark as its base .. I was just trying to fit into that scheme.
Also, don't rely on anything in this code yet ... that's why it's an RFC; I'm still playing around with the section formats and the information.
After more discussions with people, I'm actually coming to the conclusion that dropping the address of the simple marker might be very useful (in place of file and line). It makes the marker section need relocation, but it would also mean they could be used simply from within the kernel as well.
James
--
