Re: [PATCH v2] perf lock: add "info" subcommand for dumping misc information

Previous thread: 2.6.34-rc5-git3 - INFO: possible circular locking dependency detected - (s_active#91){++++.+}, at: [<ffffffff81142c4a>] sysfs_addrm_finish+0x31/0x50 by Miles Lane on Friday, April 23, 2010 - 6:56 pm. (2 messages)

Next thread: Re: [PATCH 1/9] PM: Add suspend block api. by Alan Stern on Friday, April 23, 2010 - 7:30 pm. (2 messages)
From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (6):
      perf: Generalize perf lock's sample event reordering to the session layer
      perf: Use generic sample reordering in perf sched
      perf: Use generic sample reordering in perf kmem
      perf: Use generic sample reordering in perf trace
      perf: Use generic sample reordering in perf timechart
      perf: Add a perf trace option to check samples ordering reliability

Hitoshi Mitake (1):
      perf lock: Fix state machine to recognize lock sequence

Stephane Eranian (1):
      perf: Fix initialization bug in parse_single_tracepoint_event()

William Cohen (1):
      perf: Some perf-kvm documentation edits


 tools/perf/Documentation/perf-kvm.txt |    9 +-
 tools/perf/builtin-kmem.c             |    6 +-
 tools/perf/builtin-lock.c             |  595 ++++++++++++++++++++-------------
 tools/perf/builtin-sched.c            |    8 +-
 tools/perf/builtin-timechart.c        |  112 +------
 tools/perf/builtin-trace.c            |   13 +
 tools/perf/util/parse-events.c        |   13 +-
 tools/perf/util/session.c             |  179 ++++++++++-
 tools/perf/util/session.h             |   10 +
 9 files changed, 583 insertions(+), 362 deletions(-)
--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

From: Stephane Eranian &lt;eranian@google.com&gt;

The parse_single_tracepoint_event() was setting some attributes
before it validated the event was indeed a tracepoint event. This
caused problems with other initialization routines like in the
builtin-top.c module whereby sample_period is not set if not 0.

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
LKML-Reference: &lt;4bcf232b.698fd80a.6fbe.ffffb737@mx.google.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
---
 tools/perf/util/parse-events.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3b4ec67..600d327 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -418,12 +418,6 @@ parse_single_tracepoint_event(char *sys_name,
 	u64 id;
 	int fd;
 
-	attr-&gt;sample_type |= PERF_SAMPLE_RAW;
-	attr-&gt;sample_type |= PERF_SAMPLE_TIME;
-	attr-&gt;sample_type |= PERF_SAMPLE_CPU;
-
-	attr-&gt;sample_period = 1;
-
 	snprintf(evt_path, MAXPATHLEN, &quot;%s/%s/%s/id&quot;, debugfs_path,
 		 sys_name, evt_name);
 
@@ -442,6 +436,13 @@ parse_single_tracepoint_event(char *sys_name,
 	attr-&gt;type = PERF_TYPE_TRACEPOINT;
 	*strp = evt_name + evt_length;
 
+	attr-&gt;sample_type |= PERF_SAMPLE_RAW;
+	attr-&gt;sample_type |= PERF_SAMPLE_TIME;
+	attr-&gt;sample_type |= PERF_SAMPLE_CPU;
+
+	attr-&gt;sample_period = 1;
+
+
 	return EVT_HANDLED;
 }
 
-- 
1.6.2.3

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

From: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;

Previous state machine of perf lock was really broken.
This patch improves it a little.

This patch prepares the list of state machine that represents
lock sequences for each threads.

These state machines can be one of these sequences:

      1) acquire -&gt; acquired -&gt; release
      2) acquire -&gt; contended -&gt; acquired -&gt; release
      3) acquire (w/ try) -&gt; release
      4) acquire (w/ read) -&gt; release

The case of 4) is a little special.
Double acquire of read lock is allowed, so the state machine
counts read lock number, and permits double acquire and release.

But, things are not so simple. Something in my model is still wrong.
I counted the number of lock instances with bad sequence,
and ratio is like this (case of tracing whoami): bad:233, total:2279

version 2:
 * threads are now identified with tid, not pid
 * prepared SEQ_STATE_READ_ACQUIRED for read lock.
 * bunch of struct lock_seq_stat is now linked list
 * debug information enhanced (this have to be removed someday)
   e.g.
     | === output for debug===
     |
     | bad:233, total:2279
     | bad rate:0.000000
     | histogram of events caused bad sequence
     |     acquire: 165
     |    acquired: 0
     |   contended: 0
     |     release: 68

Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
LKML-Reference: &lt;1271852634-9351-1-git-send-email-mitake@dcl.info.waseda.ac.jp&gt;
[rename SEQ_STATE_UNINITED to SEQ_STATE_UNINITIALIZED]
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
---
 tools/perf/builtin-lock.c |  410 +++++++++++++++++++++++++++++++++++++--------
 1 files changed, 342 insertions(+), 68 deletions(-)

diff --git ...
From: Hitoshi Mitake
Date: Saturday, April 24, 2010 - 3:43 am

Hi Frederic,

I added &quot;info&quot; subcommand to perf lock,
this can be used as dumping metadata like thread or address of lock instances.
&quot;map&quot; was removed because info should do the work of it.

This will be useful not only for debugging but also for ordinary analyzing.

I made this patch on perf/core of your tree, could you queue this?

Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
---
 tools/perf/builtin-lock.c |   68 +++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..c54211e 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -778,18 +778,61 @@ static void print_result(void)
 	}
 }
 
+static int			info_threads;
+static int			info_map;
+
+static void rec_dump_threads(struct rb_node *node)
+{
+	struct thread_stat *st;
+	struct thread *t;
+
+	if (!node)
+		return;
+
+	if (node-&gt;rb_left)
+		rec_dump_threads(node-&gt;rb_left);
+
+	st = container_of(node, struct thread_stat, rb);
+	BUG_ON(!st);
+	t = perf_session__findnew(session, st-&gt;tid);
+	BUG_ON(!t);
+
+	printf(&quot;%10d: %s\n&quot;, st-&gt;tid, t-&gt;comm);
+
+	if (node-&gt;rb_right)
+		rec_dump_threads(node-&gt;rb_right);
+}
+
+static void dump_threads(void)
+{
+	printf(&quot;%10s: comm\n&quot;, &quot;Thread ID&quot;);
+	rec_dump_threads(thread_stats.rb_node);
+}
+
 static void dump_map(void)
 {
 	unsigned int i;
 	struct lock_stat *st;
 
+	printf(&quot;Address of instance: name of class\n&quot;);
 	for (i = 0; i &lt; LOCKHASH_SIZE; i++) {
 		list_for_each_entry(st, &amp;lockhash_table[i], hash_entry) {
-			printf(&quot;%p: %s\n&quot;, st-&gt;addr, st-&gt;name);
+			printf(&quot; %p: %s\n&quot;, st-&gt;addr, st-&gt;name);
 		}
 	}
 }
 ...
From: Hitoshi Mitake
Date: Saturday, April 24, 2010 - 3:46 am

On 04/24/10 19:43, Hitoshi Mitake wrote:
 &gt; Hi Frederic,
 &gt;
 &gt; I added &quot;info&quot; subcommand to perf lock,
 &gt; this can be used as dumping metadata like thread or address of lock 
instances.
 &gt; &quot;map&quot; was removed because info should do the work of it.
 &gt;
 &gt; This will be useful not only for debugging but also for ordinary 
analyzing.
 &gt;
 &gt; I made this patch on perf/core of your tree, could you queue this?

Sorry, I should add example of usage :(
I resend v2, please discard it.
--

From: Hitoshi Mitake
Date: Saturday, April 24, 2010 - 3:46 am

Hi Frederic,

I added &quot;info&quot; subcommand to perf lock,
this can be used as dumping metadata like thread or address of lock instances.
&quot;map&quot; was removed because info should do the work of it.

This will be useful not only for debugging but also for ordinary analyzing.

I made this patch on perf/core of your tree, could you queue this?

v2: adding example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bbb41ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bf165ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800b9576a98: &amp;p-&gt;cred_guard_mutex
|  0xffff8800bb890a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b9522a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bb8aaa08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bba72a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bf18ea08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b8a0d8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88009bf818a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88004c66b8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff8800bb6478a0: &amp;(shost-&gt;host_lock)-&gt;rlock
Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
---
 tools/perf/builtin-lock.c |   68 +++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..c54211e 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -778,18 +778,61 @@ static void print_result(void)
 	}
 }
 
+static int			info_threads;
+static ...
From: Frederic Weisbecker
Date: Saturday, April 24, 2010 - 6:41 am

Thanks, I queue it for the next pass.

--

From: Frederic Weisbecker
Date: Friday, April 30, 2010 - 11:49 am

I've eventually not queued it because of some various


That only walks the left nodes of the rbtree, imagine the following
rbtree, W are visited nodes, U are the unvisited:

                   Root
                 /      \
                W        U
              /  \      / \
             W    U    U   U




You won't ever have !st because container_of computes an address
based on a struct type and a contained address inside this struct.

struct thread_stat {
	struct list_head	hash_entry;
	struct rb_node		rb;
	[...]
} ts;

If ts-&gt;rb == 1000, ts == 1000 - 16 or something like this.




Please don't use printf anymore (I did the same mistakes lately),
now that are using a TUI and we might use a GUI one day, we
can't assume anymore we are dealing with a normal stdout.






No it's not ugly, it's ok, we do this everywhere in perf tools :)

Thanks.

--

From: Hitoshi Mitake
Date: Sunday, May 2, 2010 - 10:11 pm

On 05/01/10 03:49, Frederic Weisbecker wrote:
 &gt; On Sat, Apr 24, 2010 at 07:46:41PM +0900, Hitoshi Mitake wrote:
 &gt;&gt; Hi Frederic,
 &gt;&gt;
 &gt;&gt; I added &quot;info&quot; subcommand to perf lock,
 &gt;&gt; this can be used as dumping metadata like thread or address of lock 
instances.
 &gt;&gt; &quot;map&quot; was removed because info should do the work of it.
 &gt;&gt;
 &gt;&gt; This will be useful not only for debugging but also for ordinary 
analyzing.
 &gt;&gt;
 &gt;&gt; I made this patch on perf/core of your tree, could you queue this?
 &gt;&gt;
 &gt;&gt; v2: adding example of usage
 &gt;&gt; % sudo ./perf lock info -t
 &gt;&gt;   | Thread ID: comm
 &gt;&gt;   | 	 0: swapper
 &gt;&gt;   |         1: init
 &gt;&gt;   |        18: migration/5
 &gt;&gt;   |        29: events/2
 &gt;&gt;   |        32: events/5
 &gt;&gt;   |        33: events/6
 &gt;&gt; ...
 &gt;&gt;
 &gt;&gt; % sudo ./perf lock info -m
 &gt;&gt; | Address of instance: name of class
 &gt;&gt; |  0xffff8800b95adae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800bbb41ae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800bf165ae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800b9576a98:&amp;p-&gt;cred_guard_mutex
 &gt;&gt; |  0xffff8800bb890a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800b9522a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bb8aaa08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bba72a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bf18ea08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800b8a0d8a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff88009bf818a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff88004c66b8a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff8800bb6478a0:&amp;(shost-&gt;host_lock)-&gt;rlock
 &gt;&gt; Signed-off-by: Hitoshi Mitake&lt;mitake@dcl.info.waseda.ac.jp&gt;
 &gt;&gt; Cc: Ingo Molnar&lt;mingo@elte.hu&gt;
 &gt;&gt; Cc: Peter Zijlstra&lt;a.p.zijlstra@chello.nl&gt;
 &gt;&gt; Cc: Paul Mackerras&lt;paulus@samba.org&gt;
 &gt;&gt; Cc: Arnaldo Carvalho de Melo&lt;acme@redhat.com&gt;
 &gt;&gt; Cc: Jens Axboe&lt;jens.axboe@oracle.com&gt;
 &gt;&gt; Cc: Jason Baron&lt;jbaron@redhat.com&gt;
 &gt;&gt; Cc: Xiao Guangrong&lt;xiaoguangrong@cn.fujitsu.com&gt;
 &gt;
 &gt;
 &gt; I've eventually not queued it because of some various
 &gt; problems, see below:

Thanks for your advice, Frederic!

 &gt;
 &gt;
 ...
From: Hitoshi Mitake
Date: Sunday, May 2, 2010 - 10:12 pm

Hi Frederic,

I added &quot;info&quot; subcommand to perf lock,
this can be used as dumping metadata like thread or address of lock instances.
&quot;map&quot; was removed because info should do the work of it.

This will be useful not only for debugging but also for ordinary analyzing.

I made this patch on perf/core of your tree, could you queue this?

v2: adding example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bbb41ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bf165ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800b9576a98: &amp;p-&gt;cred_guard_mutex
|  0xffff8800bb890a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b9522a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bb8aaa08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bba72a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bf18ea08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b8a0d8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88009bf818a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88004c66b8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff8800bb6478a0: &amp;(shost-&gt;host_lock)-&gt;rlock

v3: fixed some problems Frederic pointed out
 * better rbtree tracking in dump_threads()
 * removed printf() and used pr_info() and pr_debug()

Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
---
 tools/perf/builtin-lock.c |   96 ++++++++++++++++++++++++++++++++++-----------
 1 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..010a7c8 100644
--- ...
From: Frederic Weisbecker
Date: Wednesday, May 5, 2010 - 2:10 pm

Applied, thanks Hitoshi!

--

From: Hitoshi Mitake
Date: Thursday, May 6, 2010 - 2:31 am

On 05/06/10 06:10, Frederic Weisbecker wrote:
 &gt; On Mon, May 03, 2010 at 02:12:00PM +0900, Hitoshi Mitake wrote:
 &gt;&gt; Hi Frederic,
 &gt;&gt;
 &gt;&gt; I added &quot;info&quot; subcommand to perf lock,
 &gt;&gt; this can be used as dumping metadata like thread or address of lock 
instances.
 &gt;&gt; &quot;map&quot; was removed because info should do the work of it.
 &gt;&gt;
 &gt;&gt; This will be useful not only for debugging but also for ordinary 
analyzing.
 &gt;&gt;
 &gt;&gt; I made this patch on perf/core of your tree, could you queue this?
 &gt;&gt;
 &gt;&gt; v2: adding example of usage
 &gt;&gt; % sudo ./perf lock info -t
 &gt;&gt;   | Thread ID: comm
 &gt;&gt;   | 	 0: swapper
 &gt;&gt;   |         1: init
 &gt;&gt;   |        18: migration/5
 &gt;&gt;   |        29: events/2
 &gt;&gt;   |        32: events/5
 &gt;&gt;   |        33: events/6
 &gt;&gt; ...
 &gt;&gt;
 &gt;&gt; % sudo ./perf lock info -m
 &gt;&gt; | Address of instance: name of class
 &gt;&gt; |  0xffff8800b95adae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800bbb41ae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800bf165ae0:&amp;(&amp;sighand-&gt;siglock)-&gt;rlock
 &gt;&gt; |  0xffff8800b9576a98:&amp;p-&gt;cred_guard_mutex
 &gt;&gt; |  0xffff8800bb890a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800b9522a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bb8aaa08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bba72a08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800bf18ea08:&amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
 &gt;&gt; |  0xffff8800b8a0d8a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff88009bf818a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff88004c66b8a0:&amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
 &gt;&gt; |  0xffff8800bb6478a0:&amp;(shost-&gt;host_lock)-&gt;rlock
 &gt;&gt;
 &gt;&gt; v3: fixed some problems Frederic pointed out
 &gt;&gt;   * better rbtree tracking in dump_threads()
 &gt;&gt;   * removed printf() and used pr_info() and pr_debug()
 &gt;
 &gt;
 &gt;
 &gt; Applied, thanks Hitoshi!
 &gt;

Thanks Frederic!

And I prepared another patch for tracking only specified threads.

I'll send the new patch in this thread later.
But, the new one is not so big and hardly depends on &quot;info -t&quot; patch.
If you think the new patch and previous one should be in single patch,
please tell me. I'll merge ...
From: Hitoshi Mitake
Date: Thursday, May 6, 2010 - 2:32 am

I implemented the feature of tracking only specified threads to perf lock.
With -t option, users can specify which threads should be tracked.

Example of usage:
| % sudo ./perf lock info -t                    # info -t is convenient with this feature
|  Thread ID: comm
|          0: swapper
|          1: init
|         12: migration/3
|         13: ksoftirqd/3
|         27: events/0
|         28: events/1
|         29: events/2
|         30: events/3
|         31: events/4
|        857: kondemand/0
|        858: kondemand/1
|        859: kondemand/2
| ...
| % sudo ./perf lock -t 27,28,29,30,31 report   # track only these threads
|                 Name   acquired  contended total wait (ns)   max wait (ns)   min wait (ns)
|
|  &amp;(&amp;cwq-&gt;lock)-&gt;r...          4          0               0               0               0
|  &amp;(&amp;cwq-&gt;lock)-&gt;r...          4          0               0               0               0
|                  key          2          0               0               0               0
|                  key          2          0               0               0               0
|  &amp;(&amp;tty-&gt;buf.lock...          2          0               0               0               0
|  &amp;(&amp;tty-&gt;buf.lock...          2          0               0               0               0
|       tty_ldisc_lock          2          0               0               0               0
|  &amp;(&amp;base-&gt;lock)-&gt;...          2          0               0               0               0
|  &amp;(&amp;(&amp;(*({ do { c...          2          0               0               0               0
|  &amp;(&amp;(&amp;(*({ do { c...          2          0               0               0               0
|  &amp;(&amp;cwq-&gt;lock)-&gt;r...          2          0               0               0               0

I believe that this is convenient feature because perf itself acquires lots of locks.
Could you queue this, Frederic?

Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo ...
From: Frederic Weisbecker
Date: Thursday, May 6, 2010 - 5:49 pm

I'm not sure we want such per thread granularity filtering. I'm not
sure it will be very useful. But per process yeah.

And actually we should do that on tracing time rather than on post-processing.
This will lower the tracing overhead a lot.

Ideally I think we need:
	./perf lock record ls -R /

This would trace locks taken by this instance of ls only, ie: drop the -a
if we pass a command line.

What do you think?

--

From: Hitoshi Mitake
Date: Saturday, May 8, 2010 - 1:02 am

On 05/07/10 09:49, Frederic Weisbecker wrote:
 &gt; On Thu, May 06, 2010 at 06:32:56PM +0900, Hitoshi Mitake wrote:
 &gt;&gt; I implemented the feature of tracking only specified threads to perf 
lock.
 &gt;&gt; With -t option, users can specify which threads should be tracked.
 &gt;&gt;
 &gt;&gt; Example of usage:
 &gt;&gt; | % sudo ./perf lock info -t                    # info -t is 
convenient with this feature
 &gt;&gt; |  Thread ID: comm
 &gt;&gt; |          0: swapper
 &gt;&gt; |          1: init
 &gt;&gt; |         12: migration/3
 &gt;&gt; |         13: ksoftirqd/3
 &gt;&gt; |         27: events/0
 &gt;&gt; |         28: events/1
 &gt;&gt; |         29: events/2
 &gt;&gt; |         30: events/3
 &gt;&gt; |         31: events/4
 &gt;&gt; |        857: kondemand/0
 &gt;&gt; |        858: kondemand/1
 &gt;&gt; |        859: kondemand/2
 &gt;&gt; | ...
 &gt;&gt; | % sudo ./perf lock -t 27,28,29,30,31 report   # track only these 
threads
 &gt;&gt; |                 Name   acquired  contended total wait (ns)   max 
wait (ns)   min wait (ns)
 &gt;
 &gt;
 &gt;
 &gt; I'm not sure we want such per thread granularity filtering. I'm not
 &gt; sure it will be very useful. But per process yeah.
 &gt;
 &gt; And actually we should do that on tracing time rather than on 
post-processing.
 &gt; This will lower the tracing overhead a lot.
 &gt;
 &gt; Ideally I think we need:
 &gt; 	./perf lock record ls -R /
 &gt;
 &gt; This would trace locks taken by this instance of ls only, ie: drop the -a
 &gt; if we pass a command line.
 &gt;
 &gt; What do you think?
 &gt;
 &gt;

Ah, I completely agree with your opinion :)
Even if user wants to append &quot;-a&quot;,
   ./perf lock record -a ls -R /
is enough. &quot;-a&quot; should not in default set of arguments.

I'll send the patch to drop &quot;-a&quot; later.

Thanks,
	Hitoshi
--

From: Hitoshi Mitake
Date: Saturday, May 8, 2010 - 1:10 am

This patch drops &quot;-a&quot; from record_args, which is passed to cmd_record().

Even if user wants to record all lock events during process runs,
        perf lock record -a &lt;program&gt; &lt;argument&gt; ...
is enough for this purpose.

This can reduce size of perf.data.

% sudo ./perf lock record whoami
root
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
% sudo ./perf lock record -a whoami   # with -a option
root
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]

This patch was made on perf/test of random-tracing.git,
could you queue this, Frederic?

Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;

---
 tools/perf/builtin-lock.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index d7dde9c..7f72abc 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -930,7 +930,6 @@ static const struct option lock_options[] = {
 
 static const char *record_args[] = {
 	&quot;record&quot;,
-	&quot;-a&quot;,
 	&quot;-R&quot;,
 	&quot;-f&quot;,
 	&quot;-m&quot;, &quot;1024&quot;,
-- 
1.6.5.2

--

From: Frederic Weisbecker
Date: Saturday, May 8, 2010 - 9:14 am

Thanks, will test it and if it's fine I'll queue.

I did a lot of tests these last days to understand what was going on
with perf lock, I mean the fact we have various bad locking scenario.

So far, the state machine looks rather good. In fact, the real problem
is that we don't have every events. We lose a _lot_ of them and that's
because the frequency of lock events is too high and perf record
can't keep up.

I think I'm going to unearth the injection code to reduce the size
of these events.

--

From: Hitoshi Mitake
Date: Sunday, May 9, 2010 - 7:53 am

On 05/09/10 01:14, Frederic Weisbecker wrote:
 &gt; On Sat, May 08, 2010 at 05:10:29PM +0900, Hitoshi Mitake wrote:
 &gt;&gt; This patch drops &quot;-a&quot; from record_args, which is passed to cmd_record().
 &gt;&gt;
 &gt;&gt; Even if user wants to record all lock events during process runs,
 &gt;&gt;          perf lock record -a&lt;program&gt;  &lt;argument&gt;  ...
 &gt;&gt; is enough for this purpose.
 &gt;&gt;
 &gt;&gt; This can reduce size of perf.data.
 &gt;&gt;
 &gt;&gt; % sudo ./perf lock record whoami
 &gt;&gt; root
 &gt;&gt; [ perf record: Woken up 1 times to write data ]
 &gt;&gt; [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
 &gt;&gt; % sudo ./perf lock record -a whoami   # with -a option
 &gt;&gt; root
 &gt;&gt; [ perf record: Woken up 0 times to write data ]
 &gt;&gt; [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 
samples) ]
 &gt;&gt;
 &gt;&gt; This patch was made on perf/test of random-tracing.git,
 &gt;&gt; could you queue this, Frederic?
 &gt;&gt;
 &gt;&gt; Cc: Ingo Molnar&lt;mingo@elte.hu&gt;
 &gt;&gt; Cc: Peter Zijlstra&lt;a.p.zijlstra@chello.nl&gt;
 &gt;&gt; Cc: Paul Mackerras&lt;paulus@samba.org&gt;
 &gt;&gt; Cc: Arnaldo Carvalho de Melo&lt;acme@redhat.com&gt;
 &gt;&gt; Cc: Jens Axboe&lt;jens.axboe@oracle.com&gt;
 &gt;&gt; Cc: Jason Baron&lt;jbaron@redhat.com&gt;
 &gt;&gt; Cc: Xiao Guangrong&lt;xiaoguangrong@cn.fujitsu.com&gt;
 &gt;&gt; Signed-off-by: Hitoshi Mitake&lt;mitake@dcl.info.waseda.ac.jp&gt;
 &gt;
 &gt;
 &gt; Thanks, will test it and if it's fine I'll queue.
 &gt;
 &gt; I did a lot of tests these last days to understand what was going on
 &gt; with perf lock, I mean the fact we have various bad locking scenario.
 &gt;
 &gt; So far, the state machine looks rather good. In fact, the real problem
 &gt; is that we don't have every events. We lose a _lot_ of them and that's
 &gt; because the frequency of lock events is too high and perf record
 &gt; can't keep up.

Really, I didn't think about lack of events :(

 &gt;
 &gt; I think I'm going to unearth the injection code to reduce the size
 &gt; of these events.
 &gt;
 &gt;

Yeah, injection will be really helpful thing.

And I have a rough idea for reducing event frequency.

Many lock event sequences are like this ...
From: Frederic Weisbecker
Date: Monday, May 10, 2010 - 11:48 pm

You can observe this by looking at perf trace output and search
for long series of the same type of events without the others.

For example it's common to find long series of lock_contended
events without corresponding lock_acquire/release/acquired.

This is because lock_contended is the lowest frequent lock event,
when the others have their buffers full, this one still has rooms


You're right, we could optimize the lock events sequence layout.
What I'm afraid of is to break userspace, but ripping the name from
the lock events while introducing injection will break userspace anyway :-(

May be we can think about providing new lock events and announce the
deprecation of the old ones and remove them later. I'm not sure yet.

But summing up in only one event is not possible. Having only one
lock_release event (and a lock init for name mapping) is suitable
for latency measurements only (timestamp + lock addr + wait time +
acquired time).
And once you dig into finer grained analysis like latency induced
by dependencies (take lock A and then take lock B under A, latency
of B depends of A), then you're screwed, because you only know
you've released locks at given times but you don't know when you
acquired them, hence you can't build a tree of dependencies with
sequences inside.

Ideally, I think that we need to remove lock contended and only
trigger lock_acquired when we contended. We obviously don't need
lock_contended/lock_acquired if the lock wasn't contended.

We could even remove lock_acquire and only play with lock_acquired,
changing a bit the rules, but that will make us lose all the dependencies
&quot;intra-lock&quot;. I mean there are locks which slowpath are implemented on top
of other locks: mutexes use mutex-&gt;wait_lock spinlock for example.

--

From: Hitoshi Mitake
Date: Wednesday, May 12, 2010 - 3:23 am

On 05/11/10 15:48, Frederic Weisbecker wrote:

 &gt;&gt;&gt;
 &gt;&gt;&gt; I think I'm going to unearth the injection code to reduce the size
 &gt;&gt;&gt; of these events.
 &gt;&gt;&gt;
 &gt;&gt;&gt;
 &gt;&gt;
 &gt;&gt; Yeah, injection will be really helpful thing.
 &gt;&gt;
 &gt;&gt; And I have a rough idea for reducing event frequency.
 &gt;&gt;
 &gt;&gt; Many lock event sequences are like this form:
 &gt;&gt;   * acquire -&gt;  acquired -&gt;  release
 &gt;&gt;   * acquire -&gt;  contended -&gt;  acquired -&gt;  release
 &gt;&gt; I think that making 3 or 4 events per each lock sequences
 &gt;&gt; is waste of CPU time and memory space.
 &gt;&gt;
 &gt;&gt; If threads store time of each events
 &gt;&gt; and make only 1 event at time of release,
 &gt;&gt; we will be able to reduce lots of time and space.
 &gt;&gt;
 &gt;&gt; For example, ID of each lock instance is 8 byte in x86_64.
 &gt;&gt; In this scheme 8 * 4 byte for ID will be only 8 byte.
 &gt;&gt; I think this optimization has worth to consider because of
 &gt;&gt; high frequency of lock events.
 &gt;&gt;
 &gt;&gt; How do you think?
 &gt;
 &gt;
 &gt; You're right, we could optimize the lock events sequence layout.
 &gt; What I'm afraid of is to break userspace, but ripping the name from
 &gt; the lock events while introducing injection will break userspace 
anyway :-(

Really? For me, at least ripping the name with injection
doesn't make bad things for userspace.
What does the word &quot;break&quot; mean in this context?

 &gt;
 &gt; May be we can think about providing new lock events and announce the
 &gt; deprecation of the old ones and remove them later. I'm not sure yet.
 &gt;
 &gt; But summing up in only one event is not possible. Having only one
 &gt; lock_release event (and a lock init for name mapping) is suitable
 &gt; for latency measurements only (timestamp + lock addr + wait time +
 &gt; acquired time).
 &gt; And once you dig into finer grained analysis like latency induced
 &gt; by dependencies (take lock A and then take lock B under A, latency
 &gt; of B depends of A), then you're screwed, because you only know
 &gt; you've released locks at given times but you don't know when you
 &gt; acquired them, hence you can't build a tree of ...
From: Frederic Weisbecker
Date: Wednesday, May 12, 2010 - 8:51 am

The fact that tools which relied on the name field in the lock events






Ok it would be indeed more efficient for what we want with perf lock.
But I see drawbacks with this: other people might be interested
in only few of these events. Someone may want to count lock_contended
events only to produce simple contention stats for example, and this
user will have in turn larger traces than he was supposed to with
this batch event.

I think it's usually better to divide the problems into smaller
problems. And here it's about dividing a high level meaning
(a lock sequence) into smaller and lower level meanings (a lock
event). Following this rule makes tracing much more powerful IMO.

A lock acquire event has also an isolated meaning outside the
whole lock sequence, like in my above lock_contended example,




But still I think it's useful to keep lock_contended and
lock_acquired. They have an important meaning alone.

--

From: Hitoshi Mitake
Date: Saturday, May 15, 2010 - 1:54 am

On 05/13/10 00:51, Frederic Weisbecker wrote:
 &gt; On Wed, May 12, 2010 at 07:23:23PM +0900, Hitoshi Mitake wrote:
 &gt;&gt; On 05/11/10 15:48, Frederic Weisbecker wrote:
 &gt;&gt;
 &gt;&gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt;&gt; I think I'm going to unearth the injection code to reduce the size
 &gt;&gt;&gt;&gt;&gt; of these events.
 &gt;&gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; Yeah, injection will be really helpful thing.
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; And I have a rough idea for reducing event frequency.
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; Many lock event sequences are like this form:
 &gt;&gt;&gt;&gt;    * acquire -&gt;   acquired -&gt;   release
 &gt;&gt;&gt;&gt;    * acquire -&gt;   contended -&gt;   acquired -&gt;   release
 &gt;&gt;&gt;&gt; I think that making 3 or 4 events per each lock sequences
 &gt;&gt;&gt;&gt; is waste of CPU time and memory space.
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; If threads store time of each events
 &gt;&gt;&gt;&gt; and make only 1 event at time of release,
 &gt;&gt;&gt;&gt; we will be able to reduce lots of time and space.
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; For example, ID of each lock instance is 8 byte in x86_64.
 &gt;&gt;&gt;&gt; In this scheme 8 * 4 byte for ID will be only 8 byte.
 &gt;&gt;&gt;&gt; I think this optimization has worth to consider because of
 &gt;&gt;&gt;&gt; high frequency of lock events.
 &gt;&gt;&gt;&gt;
 &gt;&gt;&gt;&gt; How do you think?
 &gt;&gt;&gt;
 &gt;&gt;&gt;
 &gt;&gt;&gt; You're right, we could optimize the lock events sequence layout.
 &gt;&gt;&gt; What I'm afraid of is to break userspace, but ripping the name from
 &gt;&gt;&gt; the lock events while introducing injection will break userspace
 &gt;&gt; anyway :-(
 &gt;&gt;
 &gt;&gt; Really? For me, at least ripping the name with injection
 &gt;&gt; doesn't make bad things for userspace.
 &gt;&gt; What does the word &quot;break&quot; mean in this context?
 &gt;
 &gt;
 &gt; The fact that tools which relied on the name field in the lock events
 &gt; won't work anymore as it will be present on lock_init_class only.

Ah, but rewriting perf lock will be required anyway
because this is still very alpha program :)

 &gt;
 &gt;
 &gt;
 &gt;&gt;&gt;
 &gt;&gt;&gt; May be we can think about providing new lock events and announce the
 &gt;&gt;&gt; deprecation of the old ones and remove them later. I'm not sure yet.
 &gt;&gt;&gt;
 &gt;&gt;&gt; But summing up in only one event is not possible. Having only one
 &gt;&gt;&gt; ...
From: tip-bot for Hitoshi Mitake
Date: Monday, May 10, 2010 - 12:19 am

Commit-ID:  26242d859c9be9eea61f7f19514e9d272ae8ce26
Gitweb:     http://git.kernel.org/tip/26242d859c9be9eea61f7f19514e9d272ae8ce26
Author:     Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
AuthorDate: Mon, 3 May 2010 14:12:00 +0900
Committer:  Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
CommitDate: Sun, 9 May 2010 13:45:24 +0200

perf lock: Add &quot;info&quot; subcommand for dumping misc information

This adds the &quot;info&quot; subcommand to perf lock which can be used
to dump metadata like threads or addresses of lock instances.
&quot;map&quot; was removed because info should do the work for it.

This will be useful not only for debugging but also for ordinary
analyzing.

v2: adding example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bbb41ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800bf165ae0: &amp;(&amp;sighand-&gt;siglock)-&gt;rlock
|  0xffff8800b9576a98: &amp;p-&gt;cred_guard_mutex
|  0xffff8800bb890a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b9522a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bb8aaa08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bba72a08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800bf18ea08: &amp;(&amp;p-&gt;alloc_lock)-&gt;rlock
|  0xffff8800b8a0d8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88009bf818a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff88004c66b8a0: &amp;(&amp;ip-&gt;i_lock)-&gt;mr_lock
|  0xffff8800bb6478a0: &amp;(shost-&gt;host_lock)-&gt;rlock

v3: fixed some problems Frederic pointed out
 * better rbtree tracking in dump_threads()
 * removed printf() and used pr_info() and pr_debug()

Signed-off-by: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Jason Baron ...
From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

Use the new generic sample events reordering from perf kmem,
this drops the need of multiplexing the buffers on record time,
improving the scalability of perf kmem.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
---
 tools/perf/builtin-kmem.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index db474bb..ab906cb 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -335,8 +335,9 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 }
 
 static struct perf_event_ops event_ops = {
-	.sample	= process_sample_event,
-	.comm	= event__process_comm,
+	.sample			= process_sample_event,
+	.comm			= event__process_comm,
+	.ordered_samples	= true,
 };
 
 static double fragmentation(unsigned long n_req, unsigned long n_alloc)
@@ -730,7 +731,6 @@ static const char *record_args[] = {
 	&quot;record&quot;,
 	&quot;-a&quot;,
 	&quot;-R&quot;,
-	&quot;-M&quot;,
 	&quot;-f&quot;,
 	&quot;-c&quot;, &quot;1&quot;,
 	&quot;-e&quot;, &quot;kmem:kmalloc&quot;,
-- 
1.6.2.3

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

Use the new generic sample events reordering from perf trace.
Before that, the displayed traces were ordered as they were
in the input as recorded by perf record (not time ordered).

This makes eventually perf trace displaying the events as beeing
time ordered.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
---
 tools/perf/builtin-trace.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 2eefb33..1d034f6 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -108,6 +108,7 @@ static struct perf_event_ops event_ops = {
 	.event_type = event__process_event_type,
 	.tracing_data = event__process_tracing_data,
 	.build_id = event__process_build_id,
+	.ordered_samples = true,
 };
 
 extern volatile int session_done;
-- 
1.6.2.3

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

Use the new generic sample events reordering from perf sched,
this drops the need of multiplexing the buffers on record time,
improving the scalability of perf sched.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
---
 tools/perf/builtin-sched.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 09ddc8e..94453f1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1651,9 +1651,10 @@ static int process_lost_event(event_t *event __used,
 }
 
 static struct perf_event_ops event_ops = {
-	.sample	= process_sample_event,
-	.comm	= event__process_comm,
-	.lost	= process_lost_event,
+	.sample			= process_sample_event,
+	.comm			= event__process_comm,
+	.lost			= process_lost_event,
+	.ordered_samples	= true,
 };
 
 static int read_events(void)
@@ -1850,7 +1851,6 @@ static const char *record_args[] = {
 	&quot;record&quot;,
 	&quot;-a&quot;,
 	&quot;-R&quot;,
-	&quot;-M&quot;,
 	&quot;-f&quot;,
 	&quot;-m&quot;, &quot;1024&quot;,
 	&quot;-c&quot;, &quot;1&quot;,
-- 
1.6.2.3

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

The sample events recorded by perf record are not time ordered
because we have one buffer per cpu for each event (even demultiplexed
per task/per cpu for task bound events). But when we read trace events
we want them to be ordered by time because many state machines are
involved.

There are currently two ways perf tools deal with that:

- use -M to multiplex every buffers (perf sched, perf kmem)
  But this creates a lot of contention in SMP machines on
  record time.

- use a post-processing time reordering (perf timechart, perf lock)
  The reordering used by timechart is simple but doesn't scale well
  with huge flow of events, in terms of performance and memory use
  (unusable with perf lock for example).
  Perf lock has its own samples reordering that flushes its memory
  use in a regular basis and that uses a sorting based on the
  previous event queued (a new event to be queued is close to the
  previous one most of the time).

This patch proposes to export perf lock's samples reordering facility
to the session layer that reads the events. So if a tool wants to
get ordered sample events, it needs to set its
struct perf_event_ops::ordered_samples to true and that's it.

This prepares tracing based perf tools to get rid of the need to
use buffers multiplexing (-M) or to implement their own
reordering.

Also lower the flush period to 2 as it's sufficient already.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
---
 tools/perf/builtin-lock.c |  197 +++++----------------------------------------
 tools/perf/util/session.c |  179 ++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/session.h |   10 +++
 3 files changed, 210 insertions(+), 176 deletions(-)

diff ...
From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

Use the new generic sample events reordering from perf timechart,
this drops the ad hoc sample reordering it was using before.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
---
 tools/perf/builtin-timechart.c |  112 ++--------------------------------------
 1 files changed, 5 insertions(+), 107 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 96f4a09..c35aa44 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -143,9 +143,6 @@ struct wake_event {
 static struct power_event    *power_events;
 static struct wake_event     *wake_events;
 
-struct sample_wrapper *all_samples;
-
-
 struct process_filter;
 struct process_filter {
 	char			*name;
@@ -566,88 +563,6 @@ static void end_sample_processing(void)
 	}
 }
 
-static u64 sample_time(event_t *event, const struct perf_session *session)
-{
-	int cursor;
-
-	cursor = 0;
-	if (session-&gt;sample_type &amp; PERF_SAMPLE_IP)
-		cursor++;
-	if (session-&gt;sample_type &amp; PERF_SAMPLE_TID)
-		cursor++;
-	if (session-&gt;sample_type &amp; PERF_SAMPLE_TIME)
-		return event-&gt;sample.array[cursor];
-	return 0;
-}
-
-
-/*
- * We first queue all events, sorted backwards by insertion.
- * The order will get flipped later.
- */
-static int queue_sample_event(event_t *event, struct perf_session *session)
-{
-	struct sample_wrapper *copy, *prev;
-	int size;
-
-	size = event-&gt;sample.header.size + sizeof(struct sample_wrapper) + 8;
-
-	copy = malloc(size);
-	if (!copy)
-		return 1;
-
-	memset(copy, 0, size);
-
-	copy-&gt;next = NULL;
-	copy-&gt;timestamp = sample_time(event, session);
-
-	memcpy(&amp;copy-&gt;data, event, ...
From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

From: William Cohen &lt;wcohen@redhat.com&gt;

asciidoc does not allow the &quot;===&quot; to be longer than the line
above it.
Also fix a couple types and formatting errors.

Signed-off-by: William Cohen &lt;wcohen@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
LKML-Reference: &lt;4BD204C5.9000504@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
---
 tools/perf/Documentation/perf-kvm.txt |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 93400a0..d004e19 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -1,5 +1,5 @@
 perf-kvm(1)
-==============
+===========
 
 NAME
 ----
@@ -41,9 +41,9 @@ There are a couple of variants of perf kvm:
 OPTIONS
 -------
 --host=::
-        Collect host side perforamnce profile.
+        Collect host side performance profile.
 --guest=::
-        Collect guest side perforamnce profile.
+        Collect guest side performance profile.
 --guestmount=&lt;path&gt;::
 	Guest os root file system mount directory. Users mounts guest os
         root directories under &lt;path&gt; by a specific filesystem access method,
@@ -64,4 +64,5 @@ OPTIONS
 
 SEE ALSO
 --------
-linkperf:perf-top[1] perf-record[1] perf-report[1] perf-diff[1] perf-buildid-list[1]
+linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
-- 
1.6.2.3

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:05 pm

To ensure sample events time reordering is reliable, add a -d option
to perf trace to check that automatically.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Hitoshi Mitake &lt;mitake@dcl.info.waseda.ac.jp&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Masami Hiramatsu &lt;mhiramat@redhat.com&gt;
Cc: Tom Zanussi &lt;tzanussi@gmail.com&gt;
---
 tools/perf/builtin-trace.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 1d034f6..77f556f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -11,6 +11,8 @@
 
 static char const		*script_name;
 static char const		*generate_script_lang;
+static bool			debug_ordering;
+static u64			last_timestamp;
 
 static int default_start_script(const char *script __unused,
 				int argc __unused,
@@ -87,6 +89,14 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	}
 
 	if (session-&gt;sample_type &amp; PERF_SAMPLE_RAW) {
+		if (debug_ordering) {
+			if (data.time &lt; last_timestamp) {
+				pr_err(&quot;Samples misordered, previous: %llu &quot;
+					&quot;this: %llu\n&quot;, last_timestamp,
+					data.time);
+			}
+			last_timestamp = data.time;
+		}
 		/*
 		 * FIXME: better resolve from pid from the struct trace_entry
 		 * field, although it should be the same than this perf
@@ -532,6 +542,8 @@ static const struct option options[] = {
 		   &quot;generate perf-trace.xx script in specified language&quot;),
 	OPT_STRING('i', &quot;input&quot;, &amp;input_name, &quot;file&quot;,
 		    &quot;input file name&quot;),
+	OPT_BOOLEAN('d', &quot;debug-ordering&quot;, &amp;debug_ordering,
+		   &quot;check that samples time ordering is monotonic&quot;),
 
 	OPT_END()
 };
-- 
1.6.2.3

--

From: Masami Hiramatsu
Date: Saturday, April 24, 2010 - 9:13 am

Hi Frederic,


Just out of curiosity, is there any possibility to add 
other debugging options to perf-trace?
If so, I think we can pass some arguments for
this -d option, e.g. &quot;-d ordering,events&quot; and so on.
Just an idea.

Thank you,



-- 
Masami Hiramatsu
e-mail: mhiramat@redhat.com
--

From: Frederic Weisbecker
Date: Sunday, April 25, 2010 - 11:08 am

Sure, if somebody wants to come with another debugging option
in perf trace, no problem.

--

From: Frederic Weisbecker
Date: Friday, April 23, 2010 - 7:27 pm

I forgot to highlight some things here.

- The -M option is not used anymore. Well actually I just checked and it's
  used by the record perl/python scripts. But it's not needed there anymore, so
  I'll drop it in another pass. But globally it's over with the buffers
  multiplexing needs.

- But I haven't plugged the reordering thing to the live mode, because I'm
  not sure exactly if that would be welcome. With live mode we want the
  events as they arrive, using the reordering there would make it get the
  events per bunches of 2 seconds slices. I guess we'll figure out a solution
  for that.

- Perf lock gets into a better shape. There is still some work to make
  it truly usable though. I need to unearth the event injection thing
  to lower the size of the events, profile by lock classes, etc...

- Various important fixes

Thanks.

--

From: Ingo Molnar
Date: Tuesday, April 27, 2010 - 2:15 am

Pulled, thanks a lot Frederic!

	Ingo
--

Previous thread: 2.6.34-rc5-git3 - INFO: possible circular locking dependency detected - (s_active#91){++++.+}, at: [<ffffffff81142c4a>] sysfs_addrm_finish+0x31/0x50 by Miles Lane on Friday, April 23, 2010 - 6:56 pm. (2 messages)

Next thread: Re: [PATCH 1/9] PM: Add suspend block api. by Alan Stern on Friday, April 23, 2010 - 7:30 pm. (2 messages)