[PATCH 7/9 - v2][RFC] tracing: Move print functions into event class

Previous thread: [GIT PULL] RCU fixes for 2.6.34 by Paul E. McKenney on Monday, May 3, 2010 - 8:36 pm. (2 messages)

Next thread: [PATCH 1/3] lockdep: Provide off case for redundant_hardirqs_on increment by Frederic Weisbecker on Monday, May 3, 2010 - 8:45 pm. (1 message)
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

[Version 2]

This is an RFC patch set that also affects kprobes and perf.

At the Linux Collaboration Summit, I talked with Mathieu and others about
lowering the footprint of trace events. I spent all of last week
trying to get the size as small as I could.

Currently, each TRACE_EVENT() macro adds 1 - 5K per tracepoint. I got various
results by adding a TRACE_EVENT() with the compiler, depending on
config options that did not seem related. The new tracepoint I added
would add between 1 and 5K, but I did not investigate enough to
see what the true size was.

What was consistent, was the DEFINE_EVENT(). Currently, it adds
a little over 700 bytes per DEFINE_EVENT().

This patch series does not seem to affect TRACE_EVENT() much (had
the same various sizes), but consistently brings DEFINE_EVENT()s
down from 700 bytes to 250 bytes per DEFINE_EVENT(). Since syscalls
use one "class" and are equivalent to DEFINE_EVENT() this can
be a significant savings.

With events and syscalls (82 events and 616 syscalls), before this
patch series, the size of vmlinux was: 16161794, and afterward: 16058182.

That is 103,612 bytes in savings! (over 100K)


Without tracing syscalls (82 events), it brought the size of vmlinux
down from 1591046 to 15999394.

22,071 bytes in savings.

This is just an RFC (for now), to get peoples opinions on the changes.
It does a bit of rewriting of the CPP macros, just to warning you ;-)

Changes in v2:

 o  Ported to latest tip/tracing/core

 o  Removed DECLARE_TRACE_DATA() and made DECLARE_TRACE() have
    the ability to pass a data parameter. This makes DECLARE_TRACE()
    not work with no args. A new DECLARE_TRACE_NOARGS() has been created
    that also allows data to be passed, but does is for tracepoint(void).

 o  Made the callbacks be "proto, void *data" and typecast the data
    within the function.


-- Steve

The code can be found at:

  ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

The raw_init function pointer in the event is used to initialize
various kinds of events. The type of initialization needed is usually
classed to the kind of event it is.

Two events with the same class will always have the same initialization
function, so it makes sense to move this to the class structure.

Perhaps even making a special system structure would work since
the initialization is the same for all events within a system.
But since there's no system structure (yet), this will just move it
to the class.

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5774567	1297492	9351592	16423651	 fa9ae3	vmlinux.fields
5774510	1293204	9351592	16419306	 fa89ea	vmlinux.init

The text grew very slightly, but this is a constant growth that happened
with the changing of the C files that call the init code.
The bigger savings is the data which will be saved the more events share
a class.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h  |    2 +-
 include/linux/syscalls.h      |    2 --
 include/trace/ftrace.h        |    9 ++++-----
 kernel/trace/trace_events.c   |   12 ++++++------
 kernel/trace/trace_export.c   |    2 +-
 kernel/trace/trace_kprobe.c   |    6 +++---
 kernel/trace/trace_syscalls.c |    2 ++
 7 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 479c3c1..393a839 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -133,6 +133,7 @@ struct ftrace_event_class {
 	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	*(*get_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
+	int			(*raw_init)(struct ftrace_event_call *);
 };
 
 struct ftrace_event_call {
@@ -144,7 +145,6 @@ struct ftrace_event_call {
 	int			enabled;
 ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

This patch creates a ftrace_event_class struct that event structs point to.
This class struct will be made to hold information to modify the
events. Currently the class struct only holds the events system name.

This patch slightly increases the size of the text as well as decreases
the data size. The overall change is still a slight increase, but
this change lays the ground work of other changes to make the footprint
of tracepoints smaller.

With 82 standard tracepoints, and 616 system call tracepoints:

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5792282	1333796	9351592	16477670	 fb6de6	vmlinux.class

This patch also cleans up some stale comments in ftrace.h.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h       |    6 ++++-
 include/linux/syscalls.h           |    6 +++-
 include/trace/ftrace.h             |   40 +++++++++++++++--------------------
 kernel/trace/trace_events.c        |   20 +++++++++---------
 kernel/trace/trace_events_filter.c |    6 ++--
 kernel/trace/trace_export.c        |    6 ++++-
 kernel/trace/trace_kprobe.c        |   12 +++++-----
 kernel/trace/trace_syscalls.c      |    4 +++
 8 files changed, 54 insertions(+), 46 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 39e71b0..496eea8 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -113,10 +113,14 @@ void tracing_record_cmdline(struct task_struct *tsk);
 
 struct event_filter;
 
+struct ftrace_event_class {
+	char			*system;
+};
+
 struct ftrace_event_call {
 	struct list_head	list;
+	struct ftrace_event_class *class;
 	char			*name;
-	char			*system;
 	struct dentry		*dir;
 	struct trace_event	*event;
 	int			enabled;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 057929b..ac5791d 100644
--- a/include/linux/syscalls.h
+++ ...
From: Frederic Weisbecker
Date: Thursday, May 6, 2010 - 9:21 pm

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

--

From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

The filter_active and enable both use an int (4 bytes each) to
set a single flag. We can save 4 bytes per event by combining the
two into a single integer.

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5761074	1262596	9351592	16375262	 f9ddde	vmlinux.id
5761007	1256916	9351592	16369515	 f9c76b	vmlinux.flags

This gives us another 5K in savings.

The modification of both the enable and filter fields are done
under the event_mutex, so it is still safe to combine the two.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h       |   21 +++++++++++++++++++--
 kernel/trace/trace.h               |    2 +-
 kernel/trace/trace_events.c        |   14 +++++++-------
 kernel/trace/trace_events_filter.c |   10 +++++-----
 kernel/trace/trace_kprobe.c        |    2 +-
 5 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 0be0285..5ac97a4 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -143,6 +143,16 @@ struct ftrace_event_class {
 	int			(*raw_init)(struct ftrace_event_call *);
 };
 
+enum {
+	TRACE_EVENT_FL_ENABLED_BIT,
+	TRACE_EVENT_FL_FILTERED_BIT,
+};
+
+enum {
+	TRACE_EVENT_FL_ENABLED	= (1 << TRACE_EVENT_FL_ENABLED_BIT),
+	TRACE_EVENT_FL_FILTERED	= (1 << TRACE_EVENT_FL_FILTERED_BIT),
+};
+
 struct ftrace_event_call {
 	struct list_head	list;
 	struct ftrace_event_class *class;
@@ -154,8 +164,15 @@ struct ftrace_event_call {
 	void			*mod;
 	void			*data;
 
-	int			enabled;
-	int			filter_active;
+	/*
+	 * 32 bit flags:
+	 *   bit 1:		enabled
+	 *   bit 2:		filter_active
+	 *
+	 *  Must hold event_mutex to change.
+	 */
+	unsigned int		flags;
+
 	int			perf_refcount;
 };
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c88c563..6356259 100644
--- a/kernel/trace/trace.h
+++ ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

Currently, every event has its own trace_event structure. This is
fine since the structure is needed anyway. But the print function
structure (trace_event_functions) is now separate. Since the output
of the trace event is done by the class (with the exception of events
defined by DEFINE_EVENT_PRINT), it makes sense to have the class
define the print functions that all events in the class can use.

This makes a bigger deal with the syscall events since all syscall events
use the same class. The savings here is another 37K.

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5774574	1293204	9351592	16419370	 fa8a2a	vmlinux.init
5761154	1268356	9351592	16381102	 f9f4ae	vmlinux.print

To accomplish this, and to let the class know what event is being
printed, the event structure is embedded in the ftrace_event_call
structure. This should not be an issues since the event structure
was created for each event anyway.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h  |    2 +-
 include/linux/syscalls.h      |   18 +++------------
 include/trace/ftrace.h        |   47 +++++++++++++++++-----------------------
 kernel/trace/trace_events.c   |    6 ++--
 kernel/trace/trace_kprobe.c   |   14 +++++-------
 kernel/trace/trace_syscalls.c |    8 +++++++
 6 files changed, 42 insertions(+), 53 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4f77932..b1a007d 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -148,7 +148,7 @@ struct ftrace_event_call {
 	struct ftrace_event_class *class;
 	char			*name;
 	struct dentry		*dir;
-	struct trace_event	*event;
+	struct trace_event	event;
 	int			enabled;
 	int			id;
 	const char		*print_fmt;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

Now that the trace_event structure is embedded in the ftrace_event_call
structure, there is no need for the ftrace_event_call id field.
The id field is the same as the trace_event type field.

Removing the id and re-arranging the structure brings down the tracepoint
footprint by another 5K.

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5761154	1268356	9351592	16381102	 f9f4ae	vmlinux.print
5761074	1262596	9351592	16375262	 f9ddde	vmlinux.id

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h       |    5 ++---
 include/trace/ftrace.h             |   12 ++++++------
 kernel/trace/trace_event_perf.c    |    4 ++--
 kernel/trace/trace_events.c        |    7 +++----
 kernel/trace/trace_events_filter.c |    2 +-
 kernel/trace/trace_export.c        |    4 ++--
 kernel/trace/trace_kprobe.c        |   18 ++++++++++--------
 kernel/trace/trace_syscalls.c      |   14 ++++++++------
 8 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index b1a007d..0be0285 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -149,14 +149,13 @@ struct ftrace_event_call {
 	char			*name;
 	struct dentry		*dir;
 	struct trace_event	event;
-	int			enabled;
-	int			id;
 	const char		*print_fmt;
-	int			filter_active;
 	struct event_filter	*filter;
 	void			*mod;
 	void			*data;
 
+	int			enabled;
+	int			filter_active;
 	int			perf_refcount;
 };
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 9765454..db839d9 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -150,7 +150,7 @@
  *
  *	entry = iter->ent;
  *
- *	if (entry->type != event_<call>.id) {
+ *	if (entry->type != event_<call>->event.type) {
  *		WARN_ON_ONCE(1);
  *		return TRACE_TYPE_UNHANDLED;
  *	}
@@ -221,7 +221,7 @@ ftrace_raw_output_##call(struct ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

This patch removes the register functions of TRACE_EVENT() to enable
and disable tracepoints. The registering of a event is now down
directly in the trace_events.c file. The tracepoint_probe_register()
is now called directly.

The prototypes are no longer type checked, but this should not be
an issue since the tracepoints are created automatically by the
macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
other macros will catch it.

The trace_event_class structure now holds the probes to be called
by the callbacks. This removes needing to have each event have
a separate pointer for the probe.

To handle kprobes and syscalls, since they register probes in a
different manner, a "reg" field is added to the ftrace_event_class
structure. If the "reg" field is assigned, then it will be called for
enabling and disabling of the probe for either ftrace or perf. To let
the reg function know what is happening, a new enum (trace_reg) is
created that has the type of control that is needed.

With this new rework, the 82 kernel events and 616 syscall events
has their footprint dramatically lowered:

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5792282	1333796	9351592	16477670	 fb6de6	vmlinux.class
5793448	1333780	9351592	16478820	 fb7264	vmlinux.tracepoint
5796926	1337748	9351592	16486266	 fb8f7a	vmlinux.data
5774316	1306580	9351592	16432488	 fabd68	vmlinux.regs

The size went from 16477030 to 16432488, that's a total of 44K
in savings. With tracepoints being continuously added, this is
critical that the footprint becomes minimal.

v2: Changed the callback probes to pass void * and typecast the
    value within the function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h    |   19 +++++--
 include/linux/syscalls.h        |   29 ++---------
 include/trace/ftrace.h          |  111 ++++++---------------------------------
 ...
From: Frederic Weisbecker
Date: Thursday, May 6, 2010 - 9:20 pm

Agreed. Typechecking matters for human code but not in this context.
Considering that the tracepoint and the probe are created by the same
CPP code, bugs will be tracked down quickly and located to a single



Very nice!!

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 5:42 am

Thanks! 

-- Steve 


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
From: Mathieu Desnoyers
Date: Friday, May 7, 2010 - 7:54 am

So it seems that I am the only one asking for extra type-checking and
caring about problems that can appear subtily on architectures where the
number of caller/callee arguments must match. And also the only one
considering that passing more arguments to a callback that does not
expect all of them might be a problem on some architectures.

Am I the only one thinking there is something fishy there ? I might be
entirely over-paranoid, but this approach has rarely failed me in the
past.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 8:12 am

I think you are the only one not realizing that the caller and callee
are created automatically with the same data. There is no human
intervention here.

You are asking to add a check that I can not see helping. The only way
to add a check, is to use the automated process to check the automation.
If the automated process fails, it is very likely the check will also be
broken and will not catch the bug either.

-- Steve


--

From: Mathieu Desnoyers
Date: Friday, May 7, 2010 - 8:31 am

Tell me where to fetch the git head, I'll add it myself. ;)

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 8:43 am

It was posted in the 0/9 patch. Although I am modifying it now to
include the latest changes.

-- Steve


--

From: Frederic Weisbecker
Date: Friday, May 7, 2010 - 11:01 am

You are confusing two different and unrelated issues here.

One is the lost typechecking when we _register_ a probe because we now use
directly tracepoint_probe_register() for that.
I don't care personally because both tracepoint and probe are created by
the same automation. There is very few risk of callback type mess and
if there is, it will be located in a small and isolated code.

The second is this extra parameter passed whether or not it is needed.
And although we suppose it is safe, I don't feel comfortable with it.
So if we can find a more proper way to avoid it, I'm all for it.

Thanks.

--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 12:08 pm

Now I'm making the extra parameter mandatory for all tracepoint
probes. ;-)

But this time, it will be at the start not the end.

	void probe(void *data, proto);


Unfortunately we can't avoid it. In order to remove the extra code
(registering and unregistering) and even share the probe among several
events, we need a way to pass the data to the probe to let the probe
know what event it is dealing with (to put in the event id into the
buffer, to let the tracer output code know what event this data is for).

The current method is that only the proto that the tracepoint uses is
passed to the probe. This gives us no way to add any more information.

This new method allows data to be assigned at probe register, and the
probe gets this data as the first parameter.

The register_* functions will still do typechecking of the probes, they
just add the "void *" at the beginning.

Actually, here is a place that I can see where Mathieu's check does come
in handy. If we add the check test to each probe, and the tracepoint
proto changes, it will flag it.

Mathieu, you've been explaining this wrong ;-)

I'm not worried about changes to ftrace.h breaking things. I'm worried
about changes to tracepoint.h breaking ftrace.h.  This is where your
check comes in. As I change the void *data from the end to the start,
I'm nervous about catching all the probes that are registered this way.
(ftrace events, syscalls, kprobes, and perf)

-- Steve


--

From: Frederic Weisbecker
Date: Friday, May 7, 2010 - 1:03 pm

Yeah right, I see the point.

--

From: Mathieu Desnoyers
Date: Friday, May 7, 2010 - 1:58 pm

Yep, it seems like I've done a terrible job at trying to explain my motivation

I'm nervous about catching all tracepoint-related problems that may arise from
both tracepoint and tracepoint probes prototype changes, yes.

I'm glad we agree then. :-)

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Li Zefan
Date: Friday, May 7, 2010 - 1:20 am

Accessing of ->perf_probe needs to be guarded with CONFIG_PERF_EVENTS,
otherwise it won't pass compile.

The original code is fine, because ->perf_event_enable is always there
regardless of CONFIG_PERF_EVENTS.
--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 5:59 am

Good catch! I wanted to test the !CONFIG_PERF_EVENTS but that needs to
be done on a non x86 box. I'll need to do that before posting my non-RFC
patch set.

-- Steve


--

From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

Move the defined fields from the event to the class structure.
Since the fields of the event are defined by the class they belong
to, it makes sense to have the class hold the information instead
of the individual events. The events of the same class would just
hold duplicate information.

After this change the size of the kernel dropped another 8K:

   text	   data	    bss	    dec	    hex	filename
5788186	1337252	9351592	16477030	 fb6b66	vmlinux.orig
5774316	1306580	9351592	16432488	 fabd68	vmlinux.reg
5774503	1297492	9351592	16423587	 fa9aa3	vmlinux.fields

Although the text increased, this was mainly due to the C files
having to adapt to the change. This is a constant increase, where
new tracepoints will not increase the Text. But the big drop is
in the data size (as well as needed allocations to hold the fields).
This will give even more savings as more tracepoints are created.

Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
with several DEFINE_EVENT()s, then the savings will be lost. But
we are pushing developers to consolidate events with DEFINE_EVENT()
so this should not be an issue.

The kprobes define a unique class to every new event, but are dynamic
so it should not be a issue.

The syscalls however have a single class but the fields for the individual
events are different. The syscalls use a metadata to define the
fields. I moved the fields list from the event to the metadata and
added a "get_fields()" function to the class. This function is used
to find the fields. For normal events and kprobes, get_fields() just
returns a pointer to the fields list_head in the class. For syscall
events, it returns the fields list_head in the metadata for the event.

v2:  Fixed the syscall fields. The syscall metadata needs a list
     of fields for both enter and exit.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: ...
From: Frederic Weisbecker
Date: Thursday, May 6, 2010 - 9:49 pm

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

Now I need to recover from my CPP headache before reviewing
this set further ;)

--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 5:57 am

I know how you feel. Actually writing CPP takes your brain away from the
ability to write documentation, hence my lack of it ;-)

-- Steve


--

From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

Multiple events may use the same method to print their data.
Instead of having all events have a pointer to their print funtions,
the trace_event structure now points to a trace_event_functions structure
that will hold the way to print ouf the event.

The event itself is now passed to the print function to let the print
function know what kind of event it should print.

This opens the door to consolidating the way several events print
their output.

v2: Fix the new function graph tracer event calls to handle this change.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h         |   17 +++-
 include/linux/syscalls.h             |   10 ++-
 include/trace/ftrace.h               |   12 ++-
 include/trace/syscall.h              |    6 +-
 kernel/trace/blktrace.c              |   13 ++-
 kernel/trace/kmemtrace.c             |   28 +++++--
 kernel/trace/trace.c                 |    9 +-
 kernel/trace/trace_functions_graph.c |   13 ++-
 kernel/trace/trace_kprobe.c          |   22 ++++--
 kernel/trace/trace_output.c          |  137 +++++++++++++++++++++++-----------
 kernel/trace/trace_output.h          |    2 +-
 kernel/trace/trace_syscalls.c        |    6 +-
 12 files changed, 186 insertions(+), 89 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 393a839..4f77932 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -70,18 +70,25 @@ struct trace_iterator {
 };
 
 
+struct trace_event;
+
 typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
-					      int flags);
-struct trace_event {
-	struct hlist_node	node;
-	struct list_head	list;
-	int			type;
+				      int flags, struct trace_event *event);
+
+struct trace_event_functions {
 	trace_print_func	trace;
 	trace_print_func	raw;
 	trace_print_func	hex;
 	trace_print_func	binary;
 ...
From: Steven Rostedt
Date: Monday, May 3, 2010 - 8:40 pm

From: Steven Rostedt <srostedt@redhat.com>

This patch allows data to be passed to the tracepoint callbacks
if the tracepoint was created to do so.

The DECLARE_TRACE() now adds two new functions:

	register_trace_mytracepoint_data()
	unregister_trace_mytracepoint_data()

These two are the same as the original

	register_trace_mytracepoint()
	unregister_trace_mytracepoint()

But now allow you to pass a private data pointer that will
be passed to the callback handle. For example:

DECLARE_TRACE(mytracepoint, int value, value);

will create a function called trace_mytracepoint()

	void trace_mytracepoint(int value);

If the user wants to pass data to register a function to this tracepoint
and have data also passed to this callback, they can use:

	int mycallback(int value, void *data);

	register_trace_mytracepoint_data(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the parameter after
the args.

A more detailed example:

  DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  /* In the C file */

  DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  [...]

       trace_mytacepoint(status);

  /* In a file registering this tracepoint */

  int my_callback(int status, void *data)
  {
	struct my_struct my_data = data;
	[...]
  }

  [...]
	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
	init_my_data(my_data);
	register_trace_mytracepoint_data(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is the different. Note, the data must also be used
to unregister the callback:

	unregister_trace_mytracepoint_data(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

  DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

  DECLARE_TRACE_NOARGS(mytracepoint);

Since there ...
From: Frederic Weisbecker
Date: Thursday, May 6, 2010 - 8:52 pm

So, we had a talk about this and we concluded that it is probably fine
on every archs to push one more argument than needed in a function.

But I think it would be nice to add a comment about this. Firstly
because this line breaks all the self-explanation of the code, I mean
I tried hard to find how the non-data callback case was handled :)







It seems that the on and off cases are exactly the same for DECLARE_TRACE*(),
you could provide a single version and let the __DECLARE_TRACE() do
the on/off trick.

Thanks.

--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 7:09 am

Hmm, I replied originally from my Google phone, but I don't see it in
LKML. So apologies if this is duplicate.







I don't know what you mean here. How would __DECLARE_TRACE() do what
both DECLARE_TRACE() and DECLARE_TRACE_NOARGS() do? It will fail the
compile if proto is "void".

-- Steve


--

From: Frederic Weisbecker
Date: Friday, May 7, 2010 - 11:06 am

No, what I meant is that you have:

#ifdef CONFIG_TRACEPOINTS
[...]
+#define DECLARE_TRACE_NOARGS(name)                                       \
      __DECLARE_TRACE(name, void, , void *__data, __data)

#define DECLARE_TRACE(name, proto, args)                         \
      __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),      \
                      PARAMS(proto, void *__data),            \
                      PARAMS(args, __data))
[...]
#else
[...]
+#define DECLARE_TRACE_NOARGS(name)                                       \
      __DECLARE_TRACE(name, void, , void *__data, __data)

#define DECLARE_TRACE(name, proto, args)                         \
      __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),      \
                      PARAMS(proto, void *__data),            \
                      PARAMS(args, __data)
[...]
#endif


See? They seem to be the exact same version, so this could be only
one version outside the ifdef.
And the CONFIG_TRACEPOINTS on/off case is dealt from __DECLARE_TRACE().

--

From: Steven Rostedt
Date: Friday, May 7, 2010 - 12:10 pm

Ah, I see (said the blind man as he slipped and fell on the ice).

-- Steve



--

Previous thread: [GIT PULL] RCU fixes for 2.6.34 by Paul E. McKenney on Monday, May 3, 2010 - 8:36 pm. (2 messages)

Next thread: [PATCH 1/3] lockdep: Provide off case for redundant_hardirqs_on increment by Frederic Weisbecker on Monday, May 3, 2010 - 8:45 pm. (1 message)