login
Header Space

 
 

Re: [RFC PATCH 1/2] Marker probes in futex.c

Previous thread: [PATCH] vfs: Fix possible deadlock in ext2, ext3, ext4 when using xattrs by Jan Kara on Tuesday, April 15, 2008 - 7:38 am. (1 message)

Next thread: 2.6.25 regression: 60417f5976df breaks Thinkpad suspend by George Spelvin on Tuesday, April 15, 2008 - 7:53 am. (6 messages)
To: <linux-kernel@...>
Cc: <tglx@...>, <a.p.zijlstra@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 15, 2008 - 7:50 am

Hi All,
	The following set of patches a)Implant marker probes in futex.c
and b)define a sample use-case for the markers in futex.

The marker handler patches also make use of the enhancements proposed to
the 'trace' infrastructure i.e. debugfs_printk() interface.
http://lkml.org/lkml/2008/4/15/117

The patches are generated over 2.6.25-rc8-mm1 and have been tested on an
i386 machine.
 
Thanks,
K.Prasad

--
To: <linux-kernel@...>
Cc: <tglx@...>, <a.p.zijlstra@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 15, 2008 - 7:53 am

We place a probe at the function entry for each futex operation,
and also at each point where a futex operation fails.
 
Signed-off-by: K.Prasad &lt;prasad@linux.vnet.ibm.com&gt;
---
 kernel/futex.c |  110 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 88 insertions(+), 22 deletions(-)

Index: linux-2.6.25-rc8-mm2/kernel/futex.c
===================================================================
--- linux-2.6.25-rc8-mm2.orig/kernel/futex.c
+++ linux-2.6.25-rc8-mm2/kernel/futex.c
@@ -737,8 +737,14 @@ static int futex_wake(u32 __user *uaddr,
 	union futex_key key;
 	int ret;
 
-	if (!bitset)
-		return -EINVAL;
+	trace_mark(futex_wake_called, "uaddr:%p fshared:%p nr_wake:%d "
+			"bitset:%d",
+			uaddr, fshared, nr_wake, bitset);
+
+	if (unlikely(!bitset)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	futex_lock_mm(fshared);
 
@@ -770,6 +776,8 @@ static int futex_wake(u32 __user *uaddr,
 	spin_unlock(&amp;hb-&gt;lock);
 out:
 	futex_unlock_mm(fshared);
+	if (unlikely(ret != 0))
+		trace_mark(futex_wake_failed, "uaddr:%p ret:%d", uaddr, ret);
 	return ret;
 }
 
@@ -901,6 +909,14 @@ static int futex_requeue(u32 __user *uad
 	struct futex_q *this, *next;
 	int ret, drop_count = 0;
 
+	/*
+	 * Call this probe futex_cmp_requeue_called, although this function is
+	 * common to both FUTEX_REQUEUE and FUTEX_CMP_REQUEUE. This is because
+	 * FUTEX_REQUEUE is deprecated.
+	 */
+	trace_mark(futex_cmp_requeue_called, "uaddr1:%p fshared:%p uaddr2:%p "
+			"nr_wake:%d nr_requeue:%d cmpval:%p",
+			uaddr1, fshared, uaddr2, nr_wake, nr_requeue, cmpval);
  retry:
 	futex_lock_mm(fshared);
 
@@ -908,8 +924,12 @@ static int futex_requeue(u32 __user *uad
 	if (unlikely(ret != 0))
 		goto out;
 	ret = get_futex_key(uaddr2, fshared, &amp;key2);
-	if (unlikely(ret != 0))
+	if (unlikely(ret != 0)) {
+		trace_mark(futex_cmp_requeue_uaddr2_failed, "uaddr1:%p "
+				"uaddr2:%p ret:%d",
+				uaddr1, uaddr2, ret);
 		goto out;
+	}
 
 	hb1 = hash_futex(&amp;k...
To: <prasad@...>
Cc: <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 15, 2008 - 8:02 am

This is some seriuosly ugly looking gunk, why would we want stuff like
that scattered across the code?

What is wrong with a few simple hooks like:

  trace_futex_wait(uaddr, fshares, val, abs_time, bitset);

and then deal with that.

Also, you seem to expose way too much futex internals; do you really
need that? People will go use this marker crap like ABI and further
restrain us from changing the code.

/me unhappy.



--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Friday, April 18, 2008 - 6:44 am

My 2c: if we even begin to countenance merging stuff like that, it's time
for all the trace code to be removed.

&lt;checks the email headers&gt;

Nope, it wasn't April 1st.  Guys, get a grip.
--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 15, 2008 - 11:52 am

The idea is to export as much data as possible (considering that the
cost involved in doing so is less) and use them only when required.

If we were to log just the futex_ops, just as you had suggested,
"Just log:
 
  futex: &lt;uaddr&gt; wait
  futex: &lt;uaddr&gt; wakeup"
it would provide only cursory information about code-flow and not enough
debug information to help the user solve the issue he's trying to debug.

Say for e.g. in futex_wait you may want to know if abs_time had a
non-zero value but the primitive debugging information would end
up being a handicap from discovering that.

If you can specifically point me to information you think would be
absolutely unnecessary, I can get them out of the trace_mark().

--K.Prasad
--
To: <prasad@...>
Cc: <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Wednesday, April 16, 2008 - 11:51 am

I'm thinking everything is superflous; you're basically logging what
strace already gives you; except worse by encoding local variable names
and exposing kernel pointers.

Also get rid of that futex_markers.h thing - its utterly useless and
makes the code less readable.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Thursday, April 17, 2008 - 6:02 pm

But we don't want to run strace just for this stuff.  As you probably
know, strace involves invasive user-space context-switching between

The pointers are probably excessive, the and the names don't really
matter.  What does matter is providing enough information for a
problem diagnosis tool &amp; person to reconstruct what the kernel must
have been thinking when it did something noteworthy.


- FChE
--
To: Frank Ch. Eigler <fche@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Friday, April 18, 2008 - 2:46 am

Then what do we do when someone comes along and changes one of those
names; do we go around changing the markers and then requiring all tools
to change as well?

(And no this isn't far fetched; I'm thinking of changing fshared in the
near future).

Sounds like people will complain and generate back pressure against such
changes - something we should avoid. As soon as these markers place a

Sure, but then just make a strace like tracer and be done with it - no
need to pollute the futex code with that.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Frank Ch. Eigler <fche@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>
Date: Saturday, April 19, 2008 - 10:13 am

We should really think about what we are doing before we add a marker in
the kernel code. The information extracted should be both useful and
expected not to change too much between versions. Ideally,
implementation details should not be exported. Exporting useless
information "just because we can" would indeed put pressure on
maintainers. That's where I expect them to be the best persons to tell
what is an implementation detail likely to change, and what is a more
"conceptually stable" information. e.g. a context switch is a context
switch, this does not change with the underlying implementation.

I think that whenever we can add a more "generic" marker which solves
many special cases, we should do so. In this case, using the system call
instrumentation found in my architecture specific instrumentation
patchset would comprehend futex instrumentation. By adding extraction of
all system call parameters, things such as futexes should be covered.
However, we would still need to instrument read() or exec() to extract
the file names. Otherwise, we would have to start doing
architecture-specific code which would "know" what arguments are passed
to each system call. I guess we could do that if it lessens
instrumentation intrusiveness, but we would have to deal with a system
call tracing infrastructure tied closely to system call parameters.
System call audit code seems to already do that, so I guess we could go
that way.

Then, I think we should turn to inner-kernel instrumentation only when
the information extracted from the stable kernel ABI (e.g. system calls)
is not complete enough to understand how things work. That would be the
case for block I/O tracing for instance.


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Frank Ch. Eigler <fche@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>
Date: Saturday, April 19, 2008 - 10:56 am

Agreed - so this futex instrumentation will not go anywhere. Prasad
could perhaps help out with your arch specific syscall tracer.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Friday, April 18, 2008 - 10:29 am

Hi -


At least two answers apply.  The markers being put in should be chosen
with the concurrence of the subsystem maintainer, who should help
identify those key quantities that are likely to be both long-lived
and good diagnostic value.  So that's the discussion we're having
right now: which values should be passed.  If they're long-lived, then
they can be given a long-lived name - and it doesn't have to be a
low-level C variable name.  (There's no reason why we can't also have
a slew of short-lived pure-debugging sorts of markers compiled in.  A
marker naming convention like "__futex..." can be adopted for such
purposes - where nothing is promised for version n+1, just hoping to
help diagnose problems in this version.)

The other answer is that we should ensure that tools do not assume
that the set of markers is fixed.  Let's never set such an
expectation.  (In systemtap, we have several abstraction and
version-adaptation facilities that aim to hide such changes.)
The kernel guy can choose at least two methods to help tools
without contraining himself too much.  He can change

        trace_mark(futex_foo, "p1 %d p2 %d", p1, p2);

to a pair:

        trace_mark(futex_foo, "p1 %d p2 %d", 0, p2); // backward compat.
        trace_mark(futex_foo2, "p2 %d p3 %d", p2, p3); // new marker

or even just drop the backward compatibility one altogether.

It will need judicious choices by the kernel and willingness by the
tools to keep up.  Those that don't will simply notice fewer events
coming in, but nothing important *breaking*.

The current crop of tools (lttng, systemtap) are both from friendly
groups who recognize that they have more of an expendable diagnostic
rather than operational role, and thus are willing to carry that
burden.  By the time new tools will show up, we will have more
experience with the degree and type of marker changes over time, and
they won't be in a position to place new constraints on the

Indeed.  This is why it's important for the subsystem mainta...
To: Frank Ch. Eigler <fche@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Saturday, April 19, 2008 - 10:52 am

I'm thinking the two use-cases are confused here. So we have

 a) permanent markers
 b) ad-hoc debug markers

I'm thinking that for the first class the compilation level coupling is
no problem at all. And for the second class it doesn't matter how ugly
they are as long as it works on the spot.

So I'm arguing these two should be separated.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Saturday, April 19, 2008 - 2:03 pm

Hi -


OTOH I think this is a false dichotomy.  Debugging is not only done by
a subsystem maintainer during the merge/rc period.  When something
goes wrong on a deployed machine, problem diagnosis requires data,
which ideally should be gathered as non-intrusively as possible - that
means no recompiling / rebooting, and ideally very little slowdown.

I bet that many of those markers you might consider "ad-hoc" would in
fact have some post-release diagnostic value.  Now, maybe in the case
of futexes, the kernel makes no sophisticated decisions that may need
subsequent review.  Maybe all the internals are directly calculable
from the results visible at the system call level (which threads block
and which return).  But this is certainly not so for many other parts
of the kernel.

With markers, you don't have to make this an agonizing decision.  You
just insert markers, regardless of whether it's high-level, universal,
"permanent"; or whether it's low-level, diagnostic, temporary.  The
same consumer tools will work for them all.

- FChE
--
To: Frank Ch. Eigler <fche@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Saturday, April 19, 2008 - 2:29 pm

This again tries into the argument about not making markers depend on
the code structure or implementation details.

I'm really wanting to avoid ever having to be obstructed by a marker. So
any marker that does not represent a solid high level event (to take
Mathieu's example: a context switch is a context switch, and we'll
always have one) I'm not comfortable with merging that upstream.

So even though these ad-hoc markers might have some diagnostic value -
I'll never support merging them. If a customer might have some issue I
can hand him a custom kernel with these markers added in - I see
absolutely no reason to burden upstream with these.

So we _do_ have to make this decision; some shite should just not be
merged but be kept separate.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Saturday, April 19, 2008 - 2:48 pm

Hi -



It is your prerogative as a subsystem maintainer to make a guess about
this.  Others may make their own decisions differently, considering

Perhaps the kinds of bugs in your code, coupled with the kinds of
customers who experience those bugs, make tolerable this means of
diagnosis (requiring a reboot of their machines into a custom
debugging kernel).  The customers I have dealt with (and frankly, I
too) need to diagnose problems on a live running system as much as
possible.  They don't run every kernel du jour, so they don't need the
purely hypothetical tools that are dependent on a permanent set of
markers.
 

- FChE
--
To: Frank Ch. Eigler <fche@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 22, 2008 - 1:50 pm

Sun solved this problem for DTrace by applying their stability
attributes to all probe points -- i.e. the probes that represent an ABI
are marked as such and the temporary debugging probes are marked as
unstable.

-- 
Nicholas Miell &lt;nmiell@comcast.net&gt;

--
To: <a.p.zijlstra@...>
Cc: <fche@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Saturday, April 19, 2008 - 8:28 am

Hi Peter,
	Adding to what Frank and others have said before, it would be
nice to have you point out which markers or parameters exported by
them are superfluous and I'm willing to have them removed from my patch.

I've just sent out another patch over marker infrastructure to Mathieu
which will eliminate the need to re-send the format string during
initialisation and if accepted, the patch should take care of any
concerns about duplication of code.

Indeed any markers in futex.c that you think would be absolutely
unnecessary can be removed on a case-by-case basis, but do let us know
them.

Thanks,
K.Prasad

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 8:32 am

I don't really see how it differs so much from printks, which kernel

If any of your variable type changes, then you are exporting an unknown
data structure to user-space. _That_ would break a userspace tracer
whenever you change any of these kernel variables and you don't want
that.

Exporting the field names and variable types helps to identify the
variables by their given names rather than their respective order.
Having the field type insures binary compatibility.

Clearly we can turn your trace_futex_wait(uaddr, fshares, val, abs_time,
bitset); into a trace_mark() with a simple define, and I don't see any
problem with that. I just want to make sure the event name, field names
and field types are exported, and this is done by markers. However, I
wonder why none of the kernel printk() are turned into specialized
defines to make the code "cleaner".. maybe it's because it is useful to

Because we extract the field names and types, we can create tracer
plugins that would hook on field names rather than expect a specific
number of fields and fixed field types. It makes it possible to tolerate
missing fields pretty easily. But yes, tracer tools might have to be
adapted to internal kernel changes, since they must follow kernel
structure changes. However, staying as close as possible to a canonical
representation of event fields, staying far from the specific
implemetation, would help to lessen the inter-dependency. On the other
hand, it would probably hurt trace compactness and efficiency.


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 8:56 am

Which never last longer than the debug session and are never exposed to

trace_futex_wait()'s signature would make the compiler issue a complaint

I must be missing something here, printks don't need to look pretty
because they never see the light of lkml. They get ripped out as soon as

See, these tracer tools are my nightmare as member of an enterprise
linux team. They'll make an already hard job even harder, no thanks!

At least reduce the hooks to the very bare minimum; only log when the
mutex changes state; not this: we entered futex_wait; we exited with
state crap.

Just log:

  futex: &lt;uaddr&gt; wait
  futex: &lt;uaddr&gt; wakeup

And other fundamental events, the rest is just not needed.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 9:25 am

Yes, but then you would have to create new code for each event you want
to trace. In the end, it would increase the icache footprint

The thing is that the trace_marks really fills two purpose : they
extract information from the core kernel, which is meant to be shipped
on production systems so tracing tools can report what is happening on
the system and they also allow kernel hackers to add markers of their
own, so they can extract information about specific events they are
interested in along with the standard kernel instrumentation.

So, part of it is meant to be standard kernel information, part of it
can be used for debugging. And since the kernel code evolves through
time, it makes sense to have an infrastructure flexible enough to follow
these changes easily.

Your proposal is interesting though. We could keep the flexible markers
as the core infrastructure to declare static instrumentation and add
static inlines in C files to map function prototypes to markers, such
as:

static inline void trace_futex_wait(void *uaddr)
{
  trace_mark(futex_wait, "uaddr %p", uaddr);
}

static inline void trace_futex_wakeup(void *uaddr)
{
  trace_mark(futex_wakeup, "uaddr %p", uaddr);
}

But personnally I don't really see how these static inlines will look

I totally agree with you on that. This is the approach I've used in the
LTTng instrumentation.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 11:51 am

Looking at patch 2/2 in this series that seems to be needed anyway. So

The trouble I have with these free form thingies is that its too hard to
keep them consistent.



Why the need to duplicate it; that's utter madness.

--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>
Date: Thursday, April 17, 2008 - 3:19 pm

Hi -


That, plus the new hand-written function (trace_futex_wait) would
still need to manage the packaging of the arguments for consumption by
separately compiled pieces.  It is desirable not to require such
hand-written functions to *also* be declared in headers for these

The main thing is type checking by engaging gcc's printf format
checking logic.  In my original markers proposal, the types were
encoded into the function name, sort of as in C++:

  trace_mark_nnnnn(futex_wake_called, uaddr, fshares, val, abs_time, bitset);

where each "n" stands for some integral value, and could be chosen
amongst a small number of other types (say -- "s": char* string, "p":
void*, "l":64-bit long).  Then, type checking could be done by the
core compiler for both event producers and consumers.  One downside
was that the trace_mark_* permutations themselves would have to be
generated by some shell/perl script [1], and some deemed this probably
unacceptable.  I'm still not sure...

[1] some systemtap archaeology:

This second instance is optional and is used as a consistency check
for the event consumer to hook up exactly to the intended producer.
The string could be empty.


- FChE
--
To: Frank Ch. Eigler <fche@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>
Date: Friday, April 18, 2008 - 2:56 am

*blink* so all this is so you don't have to put a declarion in a header
file?

How about we put these premanent markers in a header - Mathieu says
there are &lt;200. Surely that's not too much trouble.

Then you can keep this trace_mark() (perhaps trace_printf() is a better

So instead of writing normal C code and placing a declarion in a header,
you've come up with a scheme that needs to duplicate a text string to
check integrity. Sounds like a real good way to confuse people.

--
To: Frank Ch. Eigler <fche@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>
Date: Thursday, April 17, 2008 - 4:16 pm

* Frank Ch. Eigler (fche@redhat.com) wrote:


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Frank Ch. Eigler <fche@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>
Date: Saturday, April 19, 2008 - 8:13 am

Atleast until 2.6.25-rc8-mm2, I would expect the marker registration to
fail if the format strings don't match.
I find an early return in set_marker() in marker.c if the format
strings don't match.

Mathieu,
	Can you let me know if you would find the patch below - which
eliminates the need to pass format string to marker_probe_register()
function, acceptable?
I have done some elementary tests with the patch, but I know it would
fail if there are duplicate markers with different format strings.
However I presume that duplicate markers are meant to appear primarily
when markers are present in inlined functions (in which case their
format strings would be the same).

Thanks,
K.Prasad

Signed-off-by: K.Prasad &lt;prasad@linux.vnet.ibm.com&gt;
---
 include/linux/marker.h          |    4 ++--
 kernel/marker.c                 |   40 +++++++++++++++++-----------------------
 samples/markers/probe-example.c |    3 ---
 3 files changed, 19 insertions(+), 28 deletions(-)

Index: linux-register_marker_patch/kernel/marker.c
===================================================================
--- linux-register_marker_patch.orig/kernel/marker.c
+++ linux-register_marker_patch/kernel/marker.c
@@ -364,7 +364,7 @@ static struct marker_entry *get_marker(c
  * Add the marker to the marker hash table. Must be called with markers_mutex
  * held.
  */
-static struct marker_entry *add_marker(const char *name, const char *format)
+static struct marker_entry *add_marker(const char *name)
 {
 	struct hlist_head *head;
 	struct hlist_node *node;
@@ -372,9 +372,15 @@ static struct marker_entry *add_marker(c
 	size_t name_len = strlen(name) + 1;
 	size_t format_len = 0;
 	u32 hash = jhash(name, name_len-1, 0);
+	struct marker *iter;
+
+	/* Search for the marker with 'name' in __markers section */
+	for (iter = __start___markers;
+		((iter &lt; __stop___markers) &amp;&amp; (strcmp(name, iter-&gt;name)));
+			iter++);
 
-	if (format)
-		format_len = strlen(format) + 1;
+	if (iter-&gt;format)
+...
To: K. Prasad <prasad@...>
Cc: Frank Ch. Eigler <fche@...>, Peter Zijlstra <a.p.zijlstra@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>
Date: Saturday, April 19, 2008 - 5:33 pm

Sorry, I will have to NACK this one. A probe which registers on a marker
has two choices : either is passes a NULL format string, which tells the
marker infrastructure that th probe will dynamically parse the format
string to grab the arguments, like printk.

However, if the probe is expecting a particular layout of arguments, It
must tell so upon registration, so the marker infrastructure can check
if the marker it's trying to connect to still ave the same arguments.
That will make sure specialized probes won't break between kernel
version changes.


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 9:17 am

i'm clearly NAK-ing all futex trace hooks until the true impact of the 
whole marker facility is better understood. I've tried them for the 
scheduler and they were a clear failure: too bloated and too persistent.

but more importantly, as things stand today i've yet to see a _any_ 
bugreport where these 'tracer' tools that are being referred to were 
actually used in the field to fix something. The latency tracers (and 
the other tracer variants in -rt) on the other hand have a documented 
track record of being useful in fixing bugs and instrumenting the 
kernel.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 9:47 am

I have not seen any counter argument for the in-depth analysis of the
instruction cache impact of the optimized markers I've done. Arguing
that the markers are "bloated" based only on "size kernel/sched.o"

You will probably be interested in the following paper, which explains
various situations in which using a tracer has solved real problems at
Google, IBM, Autodesk, which are Linux users running large clusters or
Linux systems with soft RT constraints.

Linux Kernel Debugging on Google-sized clusters at Ottawa Linux Symposium 2007 
http://ltt.polymtl.ca/papers/bligh-Reprint.pdf

Now for some performance impact :

Here are some results I have taken comparing the optimized markers
approach with the dynamic ftrace approach. These runs with some ALU work
in tight loops, using clflush() to flush the cache lines pointing to
"global" data (pointer read : current-&gt;pid) used in the loop. I also
have the numbers for running the loop without the ALU work, but I leave
them out since they only make the tables harder to read : basically, the
cached impact for running the empty loop with markers or ftrace
instrumentation is about 0 to 3 cycles. It's the uncached impact which
clearly makes the difference between both approaches.

On AMD64, adding the markers or ftrace statement actually accelerates
the runs when executed with an ALU work baseline. It adds 1 to 2 cycles
with executed alone in the loop without any work.

Frank Ch. Eigler is preparing some macrobenchmarks. I hope he will find
time to post them soon.

Results in cycles per loop

baseline :
Cycles for ALU loop                    28.10013
(will be substracted for cached runs)

Cycles for clflush() and ALU loop     230.11087
(will be substracted from non-cached runs)

gcc version 4.1.3 20070812 (prerelease) (Debian 4.1.2-15), -O2

------------------------------------------------------------------------------
|x86 Pentium 4, 3.0GHz, Linux 2.6.25-rc7           |  cached    |  uncached  |
--------------------------------...
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 12:48 pm

absent from your figures is dyn-ftrace :-)

dyn-ftrace is a young but rather nifty tool that extends on old-style 
static tracepoints rather forcefully, by adding 75,000+ tracepoints to 
the kernel here and now, with no in-source maintainence overhead at all. 
(It's a nice add-on to SystemTap because it adds space for a jprobe, but 
i digress ;-)

anyway, my objections against markers would be reduced quite 
significantly, if we were able to improve the immediate values 
optimization to not actually patch the MOV's constant portion, but to 
just patch both the MOV and the branch instruction, into two NOPs.

in fact, we can do even better: we should turn the immediate values 
optimization into a simple "patch out a jump to the slowpath" 
optimization. I.e. just a single instruction to patch in and out. (and 
that makes the NMI impact all that easier to handle as well)

That would pretty much meet my "the typical trace point should have the 
cost of a single NOP" threshold for fastpath tracing overhead, which 
i've been repeating for 2 years now, and which would make the scheduler 
markers a lot more palatable ;-)

hm?

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 5:38 pm

I think we could do something there. Let's have a look at a few
marker+immediate values fast paths on x86_32 :


    4631:       b0 00                   mov    $0x0,%al
    4633:       84 c0                   test   %al,%al
    4635:       0f 85 c6 00 00 00       jne    4701 &lt;try_to_wake_up+0xea&gt;


    7059:       b0 00                   mov    $0x0,%al
    705b:       84 c0                   test   %al,%al
    705d:       75 63                   jne    70c2 &lt;sched_exec+0xb6&gt;


    83ac:       b0 00                   mov    $0x0,%al
    83ae:       84 c0                   test   %al,%al
    83b0:       75 29                   jne    83db &lt;wait_task_inactive+0x69&gt;


If we want to support NMI context and have the ability to instrument
preemptable code without too much headache, we must insure that every
modification will leave the code in a "correct" state and that we do not
grow the size of any reachable instruction.  Also, we must insure gcc
did not put code between these instructions. Modifying non-relocatable
instructions would also be a pain, since we would have to deal with
instruction pointer relocation in the breakpoint code when the code
modification is being done.

Luckily, gcc almost never place any code between the mov, test and jne
instructions. But since we cannot we sure, we could dynamically check
for this code pattern after the mov instruction. If we find it, then we
play with it as if it was a single asm block, but if we don't find what
we expect, then we use standard immediate values for that. I expect the
heavily optimised version will be usable almost all the time.

This heavily optimized version could consist of a simple jump to the
address following the "jne" instruction. To activate the immediate
value, we could simply put back a mov $0x1,%al. I don't think we care
_that_ much about the active tracing performance, since we take a
supplementary function call already in that case.

We could probably force the mov into %al to make sur...
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>, Arjan van de Ven <arjan@...>
Date: Wednesday, April 16, 2008 - 9:17 am

sounds like a promising plan to me.

would you be interested in putting back the sched-tracer markers into 
sched.c, using the redo-markers patch i posted some time ago (attached 
below again) - and send me a series combined with immediate values 
against sched-devel/latest, so that i can have another look at their 
impact? [That would be the current optimized-markers approach initially, 
not the proposed super-optimized-markers approach.]

i wont worry much about the slowpath impact, you and Frank are right in 
that -freorder-blocks will probably hide the slowpath nicely and it will 
have other benefits as well.

Maybe a functional -freorder-blocks patch for x86 would help in judging 
that impact precisely - i think Arjan made one some time ago? Last time 
i tried it (a few weeks ago) it crashed early during bootup - but it 
would be an interesting feature to have.

	Ingo

----------------------------------&gt;
Subject: sched: reintroduce markers
From: Ingo Molnar &lt;mingo@elte.hu&gt;

---
 include/linux/sched.h             |   36 --------
 kernel/sched.c                    |   11 +-
 kernel/trace/trace.h              |   12 --
 kernel/trace/trace_functions.c    |    2 
 kernel/trace/trace_sched_switch.c |  163 +++++++++++++++++++++++++++++++++++---
 kernel/trace/trace_sched_wakeup.c |  146 +++++++++++++++++++++++++++++++---
 6 files changed, 299 insertions(+), 71 deletions(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -2031,42 +2031,6 @@ extern int sched_mc_power_savings, sched
 
 extern void normalize_rt_tasks(void);
 
-#ifdef CONFIG_CONTEXT_SWITCH_TRACER
-extern void
-ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next);
-#else
-static inline void
-ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next)
-{
-}
-#endif
-
-#ifdef CONFIG_SCHED_TRACER
-extern void
-ftrace_wake_up_task(struct task...
To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>, Arjan van de Ven <arjan@...>
Date: Wednesday, April 16, 2008 - 4:10 pm

Hi,

I am trying to figure out why the following code seems to work fine on
x86_32, but the update fails to happen on x86_64. Any idea why ? From
what I see the memcpy seems to fall in a memory area which is not the
same as the one I test at the end of the function. It only fails for
module addresses, which are vmalloc'd. Once I fix this, I'll be able to
send the patchset.

(from my own alternative.c) :

void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
        unsigned long flags;
        char *vaddr;
        int nr_pages = 2;
        struct page *pages[2];
        int i;

        if (*((uint8_t *)addr - 1) != BREAKPOINT_INSTRUCTION) {
                BUG_ON(len &gt; sizeof(long));
                BUG_ON((((long)addr + len - 1) &amp; ~(sizeof(long) - 1))
                        - ((long)addr &amp; ~(sizeof(long) - 1)));
        }
        if (is_vmalloc_addr(addr)) {
                pages[0] = vmalloc_to_page(addr);
                pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
        } else {
                pages[0] = virt_to_page(addr);
                pages[1] = virt_to_page(addr + PAGE_SIZE);
        }
        BUG_ON(!pages[0]);
        if (!pages[1])
                nr_pages = 1;
        vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
        BUG_ON(!vaddr);
        local_irq_save(flags);
        memcpy(&amp;vaddr[(unsigned long)addr &amp; ~PAGE_MASK], opcode, len);
        local_irq_restore(flags);
        vunmap(vaddr);
        sync_core();
        /* Could also do a CLFLUSH here to speed up CPU recovery; but
           that causes hangs on some VIA CPUs. */
        for (i = 0; i &lt; len; i++)
                BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
        return addr;
}

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>, Arjan van de Ven <arjan@...>
Date: Wednesday, April 16, 2008 - 5:22 pm

Ah, found it. the is_vmalloc_addr() is somewhat misleading because it
does not include the vmalloc'd memory for modules, which is in a
different address space. I wonder if it has other unknown side-effects.

I used !core_kernel_text() instead.


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Ingo Molnar <mingo@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 9:46 am

you also need to make sure no cpu is executing that code ever.. 

I expect gcc to start using the macro-fusion capable ones more and more over time at least,
and for that the compare and jmp need to be consecutive.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 10:00 am

By "insure that every modification will leave the code in a "correct"
state", I mean that at any given time before, during or after the code
modification, if an NMI comes on any CPU and try to run the modified
code, it should have a valid version of the code to execute. Does it

Early reasults of the work I've done last night : I can detect about 96%
of the ~120 markers I've put in my instrumented kernel.

Not only does the compare and jmp need to be consecutive, but the movb
$0x0,%al also does. I *could* try to detect specific code inserted in
between, but I really have to make sure I don't get burned by the
compiler inserting a jmp there.

I'll post my work shortly.


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 10:24 am

On Wed, 16 Apr 2008 10:00:09 -0400

I understand your words. My concern is that I don't quite understand how you
guarantee that you'll not be executing the code you're modifying.
Just saying "it's consistent before and after" sounds nice but probably isn't

I wonder if just sticking in 2 barriers around your code make gcc stop moving stuff too much

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--
To: Arjan van de Ven <arjan@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 10:48 am

Ah, I see. I insert a breakpoint and execute a bypass rather than the
code being modified. I only put back the 1st instruction byte after the

I'm not sure people would like the side-effect for the rest of
optimizations, but it should be tried.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 11:32 am

On Wed, 16 Apr 2008 10:48:14 -0400

sorry but I'm not convinced that that is safe without a real exclusion mechanism.
--
To: Arjan van de Ven <arjan@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 11:39 am

I'll post the patchset soon. Please have a look at the rather lengthy
comment in arch/x86/kernel/immediate.c for details. I'll be glad to
answer any question that remains.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Arjan van de Ven <arjan@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Wednesday, April 16, 2008 - 10:29 am

hm, an extra optimization barrier might have worse effects than even an 
extra instruction. I think if the detection and patching can be made 
100% safe, we dont care about the remaining 4% of markers that gcc 
somehow reorders.

and in parallel gcc folks might want to start helping us achieve 
single-instruction branch points? Currently there's no way to get flags 
values out of inline assembly, except via a register intermediary which 
adds another instruction. For flags that are unaffected by gcc's 
input/output constraint generation code it would make sense to allow 
them to be exported out of inline assembly.

	Ingo
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 10:04 am

uhm, i'm not sure what you mean - how else would you quantify bloat than 
to look at the size of the affected subsystem?

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 10:39 am

Data cache bloat inspection :
If you use the "size" output, it will take into account all the data
placed in special sections. At link time, these sections are put
together far from the actual cache hot kernel data.

Instruction cache bloat inspection :
If a code region is placed with cache cold instructions (unlikely
branches), it should not increase the cache impact, since although we
might use one more cache line, it won't be often loaded in cache because
all the code that shares this cache line is unlikely.

TLB entries bloat :
If code is added in unlikely branches, the instruction size increase
could increase the number of TLB entries required to keep cache hot
code. However, in our case, adding 10 (hot) + 50 (cold) bytes to the
scheduler code per optimized marker would require 68 markers to occupy a
whole 4kB TLB entry. Statistically, we could suppose that adding less
than 34 markers to the scheduler should not use any supplementary TLB
entry.  Adding 3 markers is therefore very unlikely to increase the TLB
impact. Given we have about 1024 TLB entries, adding 1/25th of a TLB
entry to the cache hot kernel instructions should not matter much,
especially since it might be absorbed by alignment.

And since the kernel core code is placed in "Huge TLB pages" on many
architectures nowadays, I really don't think the impact of a few bytes
out of 4MB is significant.

I therefore think that looking only at code size is misleading when
considering the cache impact of markers, since they have been designed
to put the bytes as far away as possible from cache-hot memory.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To: Ingo Molnar <mingo@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, Christoph Hellwig <hch@...>
Date: Tuesday, April 15, 2008 - 10:21 am

Hi -


For example, by measuring how much of that "bloat" is actually
experienced (e.g., in the form of presence in cache or additional time
taken.) when that instrumentation is turned off vs. on.  That should
sound familiar, as this is the sort of data that you outlined (but I
don't recall whether you actually provided numbers/recipes) when
describing the "dyn-ftrace" nop-padding work.

- FChE
--
To: Mathieu Desnoyers <mathieu.desnoyers@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 8:50 am

They aren't in every -E codepath, nor they are at the start
and at the end of every important function (like system call).

Such printks are usually inserted during debugging when you don't care
about ugliness and these patches will eventually make kernel looks like
being permanently debugged one.

--
To: Alexey Dobriyan <adobriyan@...>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...>, Peter Zijlstra <a.p.zijlstra@...>, <prasad@...>, <linux-kernel@...>, <tglx@...>, <mingo@...>, Christoph Hellwig <hch@...>, Frank Ch. Eigler <fche@...>
Date: Tuesday, April 15, 2008 - 12:13 pm

Wondering how useful would printks be when debugging large systems (say
analysing futex contention on a 32-way CPU system)....also to mention
the inability to do a binary dump of the interested data structures.

--K.Prasad
 
--
To: <linux-kernel@...>
Cc: <tglx@...>, <a.p.zijlstra@...>, <mingo@...>, <mathieu.desnoyers@...>
Date: Tuesday, April 15, 2008 - 7:55 am

These files define and implement marker handlers for the probes in
futex.c. They may be built by choosing the CONFIG_FUTEX_DEBUG option
(can be built as a kernel module also).
 
They export data to the user using debugfs_printk and the output
is available in futex_debug_trace/ directory under the debugfs mount.
Separate directories under this are created for each marker probe.

Signed-off-by: K.Prasad &lt;prasad@linux.vnet.ibm.com&gt;
---
 include/linux/futex_markers.h |   63 +++++++++++++
 init/Kconfig                  |   13 ++
 kernel/Makefile               |    1 
 kernel/futex_markers.c        |  201 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 278 insertions(+)

Index: linux-2.6.25-rc8-mm1/init/Kconfig
===================================================================
--- linux-2.6.25-rc8-mm1.orig/init/Kconfig
+++ linux-2.6.25-rc8-mm1/init/Kconfig
@@ -646,6 +646,19 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config FUTEX_DEBUG
+	tristate "Enable debugging for Futex - currently stats in debugfs"
+	depends on FUTEX
+	depends on DEBUG_FS &amp;&amp; MARKERS &amp;&amp; TRACE
+	default m
+	help
+	  This option provides debugging data for Futex system calls
+	  in debugfs.
+
+	  Say Y/M here if you want to enable Futex debugging
+	  in-kernel/module.
+	  Say N if you are unsure.
+
 config ANON_INODES
 	bool
 
Index: linux-2.6.25-rc8-mm1/kernel/futex_markers.c
===================================================================
--- /dev/null
+++ linux-2.6.25-rc8-mm1/kernel/futex_markers.c
@@ -0,0 +1,201 @@
+/*
+ * futex_markers.c - Marker handler for probes in futex.c
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software;  you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
...
Previous thread: [PATCH] vfs: Fix possible deadlock in ext2, ext3, ext4 when using xattrs by Jan Kara on Tuesday, April 15, 2008 - 7:38 am. (1 message)

Next thread: 2.6.25 regression: 60417f5976df breaks Thinkpad suspend by George Spelvin on Tuesday, April 15, 2008 - 7:53 am. (6 messages)
speck-geostationary