Ingo,
Please pull the perf/core branch that can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
perf/core
Thanks,
Frederic
---
Frederic Weisbecker (6):
perf: Generalize perf lock's sample event reordering to the session layer
perf: Use generic sample reordering in perf sched
perf: Use generic sample reordering in perf kmem
perf: Use generic sample reordering in perf trace
perf: Use generic sample reordering in perf timechart
perf: Add a perf trace option to check samples ordering reliability
Hitoshi Mitake (1):
perf lock: Fix state machine to recognize lock sequence
Stephane Eranian (1):
perf: Fix initialization bug in parse_single_tracepoint_event()
William Cohen (1):
perf: Some perf-kvm documentation edits
tools/perf/Documentation/perf-kvm.txt | 9 +-
tools/perf/builtin-kmem.c | 6 +-
tools/perf/builtin-lock.c | 595 ++++++++++++++++++++-------------
tools/perf/builtin-sched.c | 8 +-
tools/perf/builtin-timechart.c | 112 +------
tools/perf/builtin-trace.c | 13 +
tools/perf/util/parse-events.c | 13 +-
tools/perf/util/session.c | 179 ++++++++++-
tools/perf/util/session.h | 10 +
9 files changed, 583 insertions(+), 362 deletions(-)
--
From: Stephane Eranian <eranian@google.com> The parse_single_tracepoint_event() was setting some attributes before it validated the event was indeed a tracepoint event. This caused problems with other initialization routines like in the builtin-top.c module whereby sample_period is not set if not 0. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <4bcf232b.698fd80a.6fbe.ffffb737@mx.google.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- tools/perf/util/parse-events.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 3b4ec67..600d327 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -418,12 +418,6 @@ parse_single_tracepoint_event(char *sys_name, u64 id; int fd; - attr->sample_type |= PERF_SAMPLE_RAW; - attr->sample_type |= PERF_SAMPLE_TIME; - attr->sample_type |= PERF_SAMPLE_CPU; - - attr->sample_period = 1; - snprintf(evt_path, MAXPATHLEN, "%s/%s/%s/id", debugfs_path, sys_name, evt_name); @@ -442,6 +436,13 @@ parse_single_tracepoint_event(char *sys_name, attr->type = PERF_TYPE_TRACEPOINT; *strp = evt_name + evt_length; + attr->sample_type |= PERF_SAMPLE_RAW; + attr->sample_type |= PERF_SAMPLE_TIME; + attr->sample_type |= PERF_SAMPLE_CPU; + + attr->sample_period = 1; + + return EVT_HANDLED; } -- 1.6.2.3 --
From: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Previous state machine of perf lock was really broken.
This patch improves it a little.
This patch prepares the list of state machine that represents
lock sequences for each threads.
These state machines can be one of these sequences:
1) acquire -> acquired -> release
2) acquire -> contended -> acquired -> release
3) acquire (w/ try) -> release
4) acquire (w/ read) -> release
The case of 4) is a little special.
Double acquire of read lock is allowed, so the state machine
counts read lock number, and permits double acquire and release.
But, things are not so simple. Something in my model is still wrong.
I counted the number of lock instances with bad sequence,
and ratio is like this (case of tracing whoami): bad:233, total:2279
version 2:
* threads are now identified with tid, not pid
* prepared SEQ_STATE_READ_ACQUIRED for read lock.
* bunch of struct lock_seq_stat is now linked list
* debug information enhanced (this have to be removed someday)
e.g.
| === output for debug===
|
| bad:233, total:2279
| bad rate:0.000000
| histogram of events caused bad sequence
| acquire: 165
| acquired: 0
| contended: 0
| release: 68
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1271852634-9351-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
[rename SEQ_STATE_UNINITED to SEQ_STATE_UNINITIALIZED]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
tools/perf/builtin-lock.c | 410 +++++++++++++++++++++++++++++++++++++--------
1 files changed, 342 insertions(+), 68 deletions(-)
diff --git ...Hi Frederic,
I added "info" subcommand to perf lock,
this can be used as dumping metadata like thread or address of lock instances.
"map" was removed because info should do the work of it.
This will be useful not only for debugging but also for ordinary analyzing.
I made this patch on perf/core of your tree, could you queue this?
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
tools/perf/builtin-lock.c | 68 +++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 65 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..c54211e 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -778,18 +778,61 @@ static void print_result(void)
}
}
+static int info_threads;
+static int info_map;
+
+static void rec_dump_threads(struct rb_node *node)
+{
+ struct thread_stat *st;
+ struct thread *t;
+
+ if (!node)
+ return;
+
+ if (node->rb_left)
+ rec_dump_threads(node->rb_left);
+
+ st = container_of(node, struct thread_stat, rb);
+ BUG_ON(!st);
+ t = perf_session__findnew(session, st->tid);
+ BUG_ON(!t);
+
+ printf("%10d: %s\n", st->tid, t->comm);
+
+ if (node->rb_right)
+ rec_dump_threads(node->rb_right);
+}
+
+static void dump_threads(void)
+{
+ printf("%10s: comm\n", "Thread ID");
+ rec_dump_threads(thread_stats.rb_node);
+}
+
static void dump_map(void)
{
unsigned int i;
struct lock_stat *st;
+ printf("Address of instance: name of class\n");
for (i = 0; i < LOCKHASH_SIZE; i++) {
list_for_each_entry(st, &lockhash_table[i], hash_entry) {
- printf("%p: %s\n", st->addr, st->name);
+ printf(" %p: %s\n", st->addr, st->name);
}
}
}
...On 04/24/10 19:43, Hitoshi Mitake wrote: > Hi Frederic, > > I added "info" subcommand to perf lock, > this can be used as dumping metadata like thread or address of lock instances. > "map" was removed because info should do the work of it. > > This will be useful not only for debugging but also for ordinary analyzing. > > I made this patch on perf/core of your tree, could you queue this? Sorry, I should add example of usage :( I resend v2, please discard it. --
Hi Frederic, I added "info" subcommand to perf lock, this can be used as dumping metadata like thread or address of lock instances. "map" was removed because info should do the work of it. This will be useful not only for debugging but also for ordinary analyzing. I made this patch on perf/core of your tree, could you queue this? v2: adding example of usage % sudo ./perf lock info -t | Thread ID: comm | 0: swapper | 1: init | 18: migration/5 | 29: events/2 | 32: events/5 | 33: events/6 ... % sudo ./perf lock info -m | Address of instance: name of class | 0xffff8800b95adae0: &(&sighand->siglock)->rlock | 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock | 0xffff8800bf165ae0: &(&sighand->siglock)->rlock | 0xffff8800b9576a98: &p->cred_guard_mutex | 0xffff8800bb890a08: &(&p->alloc_lock)->rlock | 0xffff8800b9522a08: &(&p->alloc_lock)->rlock | 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock | 0xffff8800bba72a08: &(&p->alloc_lock)->rlock | 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock | 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock | 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock | 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock | 0xffff8800bb6478a0: &(shost->host_lock)->rlock Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> --- tools/perf/builtin-lock.c | 68 +++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 65 insertions(+), 3 deletions(-) diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index ce27675..c54211e 100644 --- a/tools/perf/builtin-lock.c +++ b/tools/perf/builtin-lock.c @@ -778,18 +778,61 @@ static void print_result(void) } } +static int info_threads; +static ...
Thanks, I queue it for the next pass. --
I've eventually not queued it because of some various
That only walks the left nodes of the rbtree, imagine the following
rbtree, W are visited nodes, U are the unvisited:
Root
/ \
W U
/ \ / \
W U U U
You won't ever have !st because container_of computes an address
based on a struct type and a contained address inside this struct.
struct thread_stat {
struct list_head hash_entry;
struct rb_node rb;
[...]
} ts;
If ts->rb == 1000, ts == 1000 - 16 or something like this.
Please don't use printf anymore (I did the same mistakes lately),
now that are using a TUI and we might use a GUI one day, we
can't assume anymore we are dealing with a normal stdout.
No it's not ugly, it's ok, we do this everywhere in perf tools :)
Thanks.
--
On 05/01/10 03:49, Frederic Weisbecker wrote: > On Sat, Apr 24, 2010 at 07:46:41PM +0900, Hitoshi Mitake wrote: >> Hi Frederic, >> >> I added "info" subcommand to perf lock, >> this can be used as dumping metadata like thread or address of lock instances. >> "map" was removed because info should do the work of it. >> >> This will be useful not only for debugging but also for ordinary analyzing. >> >> I made this patch on perf/core of your tree, could you queue this? >> >> v2: adding example of usage >> % sudo ./perf lock info -t >> | Thread ID: comm >> | 0: swapper >> | 1: init >> | 18: migration/5 >> | 29: events/2 >> | 32: events/5 >> | 33: events/6 >> ... >> >> % sudo ./perf lock info -m >> | Address of instance: name of class >> | 0xffff8800b95adae0:&(&sighand->siglock)->rlock >> | 0xffff8800bbb41ae0:&(&sighand->siglock)->rlock >> | 0xffff8800bf165ae0:&(&sighand->siglock)->rlock >> | 0xffff8800b9576a98:&p->cred_guard_mutex >> | 0xffff8800bb890a08:&(&p->alloc_lock)->rlock >> | 0xffff8800b9522a08:&(&p->alloc_lock)->rlock >> | 0xffff8800bb8aaa08:&(&p->alloc_lock)->rlock >> | 0xffff8800bba72a08:&(&p->alloc_lock)->rlock >> | 0xffff8800bf18ea08:&(&p->alloc_lock)->rlock >> | 0xffff8800b8a0d8a0:&(&ip->i_lock)->mr_lock >> | 0xffff88009bf818a0:&(&ip->i_lock)->mr_lock >> | 0xffff88004c66b8a0:&(&ip->i_lock)->mr_lock >> | 0xffff8800bb6478a0:&(shost->host_lock)->rlock >> Signed-off-by: Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp> >> Cc: Ingo Molnar<mingo@elte.hu> >> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl> >> Cc: Paul Mackerras<paulus@samba.org> >> Cc: Arnaldo Carvalho de Melo<acme@redhat.com> >> Cc: Jens Axboe<jens.axboe@oracle.com> >> Cc: Jason Baron<jbaron@redhat.com> >> Cc: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com> > > > I've eventually not queued it because of some various > problems, see below: Thanks for your advice, Frederic! > > ...
Hi Frederic, I added "info" subcommand to perf lock, this can be used as dumping metadata like thread or address of lock instances. "map" was removed because info should do the work of it. This will be useful not only for debugging but also for ordinary analyzing. I made this patch on perf/core of your tree, could you queue this? v2: adding example of usage % sudo ./perf lock info -t | Thread ID: comm | 0: swapper | 1: init | 18: migration/5 | 29: events/2 | 32: events/5 | 33: events/6 ... % sudo ./perf lock info -m | Address of instance: name of class | 0xffff8800b95adae0: &(&sighand->siglock)->rlock | 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock | 0xffff8800bf165ae0: &(&sighand->siglock)->rlock | 0xffff8800b9576a98: &p->cred_guard_mutex | 0xffff8800bb890a08: &(&p->alloc_lock)->rlock | 0xffff8800b9522a08: &(&p->alloc_lock)->rlock | 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock | 0xffff8800bba72a08: &(&p->alloc_lock)->rlock | 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock | 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock | 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock | 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock | 0xffff8800bb6478a0: &(shost->host_lock)->rlock v3: fixed some problems Frederic pointed out * better rbtree tracking in dump_threads() * removed printf() and used pr_info() and pr_debug() Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> --- tools/perf/builtin-lock.c | 96 ++++++++++++++++++++++++++++++++++----------- 1 files changed, 73 insertions(+), 23 deletions(-) diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index ce27675..010a7c8 100644 --- ...
On 05/06/10 06:10, Frederic Weisbecker wrote: > On Mon, May 03, 2010 at 02:12:00PM +0900, Hitoshi Mitake wrote: >> Hi Frederic, >> >> I added "info" subcommand to perf lock, >> this can be used as dumping metadata like thread or address of lock instances. >> "map" was removed because info should do the work of it. >> >> This will be useful not only for debugging but also for ordinary analyzing. >> >> I made this patch on perf/core of your tree, could you queue this? >> >> v2: adding example of usage >> % sudo ./perf lock info -t >> | Thread ID: comm >> | 0: swapper >> | 1: init >> | 18: migration/5 >> | 29: events/2 >> | 32: events/5 >> | 33: events/6 >> ... >> >> % sudo ./perf lock info -m >> | Address of instance: name of class >> | 0xffff8800b95adae0:&(&sighand->siglock)->rlock >> | 0xffff8800bbb41ae0:&(&sighand->siglock)->rlock >> | 0xffff8800bf165ae0:&(&sighand->siglock)->rlock >> | 0xffff8800b9576a98:&p->cred_guard_mutex >> | 0xffff8800bb890a08:&(&p->alloc_lock)->rlock >> | 0xffff8800b9522a08:&(&p->alloc_lock)->rlock >> | 0xffff8800bb8aaa08:&(&p->alloc_lock)->rlock >> | 0xffff8800bba72a08:&(&p->alloc_lock)->rlock >> | 0xffff8800bf18ea08:&(&p->alloc_lock)->rlock >> | 0xffff8800b8a0d8a0:&(&ip->i_lock)->mr_lock >> | 0xffff88009bf818a0:&(&ip->i_lock)->mr_lock >> | 0xffff88004c66b8a0:&(&ip->i_lock)->mr_lock >> | 0xffff8800bb6478a0:&(shost->host_lock)->rlock >> >> v3: fixed some problems Frederic pointed out >> * better rbtree tracking in dump_threads() >> * removed printf() and used pr_info() and pr_debug() > > > > Applied, thanks Hitoshi! > Thanks Frederic! And I prepared another patch for tracking only specified threads. I'll send the new patch in this thread later. But, the new one is not so big and hardly depends on "info -t" patch. If you think the new patch and previous one should be in single patch, please tell me. I'll merge ...
I implemented the feature of tracking only specified threads to perf lock.
With -t option, users can specify which threads should be tracked.
Example of usage:
| % sudo ./perf lock info -t # info -t is convenient with this feature
| Thread ID: comm
| 0: swapper
| 1: init
| 12: migration/3
| 13: ksoftirqd/3
| 27: events/0
| 28: events/1
| 29: events/2
| 30: events/3
| 31: events/4
| 857: kondemand/0
| 858: kondemand/1
| 859: kondemand/2
| ...
| % sudo ./perf lock -t 27,28,29,30,31 report # track only these threads
| Name acquired contended total wait (ns) max wait (ns) min wait (ns)
|
| &(&cwq->lock)->r... 4 0 0 0 0
| &(&cwq->lock)->r... 4 0 0 0 0
| key 2 0 0 0 0
| key 2 0 0 0 0
| &(&tty->buf.lock... 2 0 0 0 0
| &(&tty->buf.lock... 2 0 0 0 0
| tty_ldisc_lock 2 0 0 0 0
| &(&base->lock)->... 2 0 0 0 0
| &(&(&(*({ do { c... 2 0 0 0 0
| &(&(&(*({ do { c... 2 0 0 0 0
| &(&cwq->lock)->r... 2 0 0 0 0
I believe that this is convenient feature because perf itself acquires lots of locks.
Could you queue this, Frederic?
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo ...I'm not sure we want such per thread granularity filtering. I'm not sure it will be very useful. But per process yeah. And actually we should do that on tracing time rather than on post-processing. This will lower the tracing overhead a lot. Ideally I think we need: ./perf lock record ls -R / This would trace locks taken by this instance of ls only, ie: drop the -a if we pass a command line. What do you think? --
On 05/07/10 09:49, Frederic Weisbecker wrote: > On Thu, May 06, 2010 at 06:32:56PM +0900, Hitoshi Mitake wrote: >> I implemented the feature of tracking only specified threads to perf lock. >> With -t option, users can specify which threads should be tracked. >> >> Example of usage: >> | % sudo ./perf lock info -t # info -t is convenient with this feature >> | Thread ID: comm >> | 0: swapper >> | 1: init >> | 12: migration/3 >> | 13: ksoftirqd/3 >> | 27: events/0 >> | 28: events/1 >> | 29: events/2 >> | 30: events/3 >> | 31: events/4 >> | 857: kondemand/0 >> | 858: kondemand/1 >> | 859: kondemand/2 >> | ... >> | % sudo ./perf lock -t 27,28,29,30,31 report # track only these threads >> | Name acquired contended total wait (ns) max wait (ns) min wait (ns) > > > > I'm not sure we want such per thread granularity filtering. I'm not > sure it will be very useful. But per process yeah. > > And actually we should do that on tracing time rather than on post-processing. > This will lower the tracing overhead a lot. > > Ideally I think we need: > ./perf lock record ls -R / > > This would trace locks taken by this instance of ls only, ie: drop the -a > if we pass a command line. > > What do you think? > > Ah, I completely agree with your opinion :) Even if user wants to append "-a", ./perf lock record -a ls -R / is enough. "-a" should not in default set of arguments. I'll send the patch to drop "-a" later. Thanks, Hitoshi --
This patch drops "-a" from record_args, which is passed to cmd_record().
Even if user wants to record all lock events during process runs,
perf lock record -a <program> <argument> ...
is enough for this purpose.
This can reduce size of perf.data.
% sudo ./perf lock record whoami
root
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
% sudo ./perf lock record -a whoami # with -a option
root
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]
This patch was made on perf/test of random-tracing.git,
could you queue this, Frederic?
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
---
tools/perf/builtin-lock.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index d7dde9c..7f72abc 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -930,7 +930,6 @@ static const struct option lock_options[] = {
static const char *record_args[] = {
"record",
- "-a",
"-R",
"-f",
"-m", "1024",
--
1.6.5.2
--
Thanks, will test it and if it's fine I'll queue. I did a lot of tests these last days to understand what was going on with perf lock, I mean the fact we have various bad locking scenario. So far, the state machine looks rather good. In fact, the real problem is that we don't have every events. We lose a _lot_ of them and that's because the frequency of lock events is too high and perf record can't keep up. I think I'm going to unearth the injection code to reduce the size of these events. --
On 05/09/10 01:14, Frederic Weisbecker wrote: > On Sat, May 08, 2010 at 05:10:29PM +0900, Hitoshi Mitake wrote: >> This patch drops "-a" from record_args, which is passed to cmd_record(). >> >> Even if user wants to record all lock events during process runs, >> perf lock record -a<program> <argument> ... >> is enough for this purpose. >> >> This can reduce size of perf.data. >> >> % sudo ./perf lock record whoami >> root >> [ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ] >> % sudo ./perf lock record -a whoami # with -a option >> root >> [ perf record: Woken up 0 times to write data ] >> [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ] >> >> This patch was made on perf/test of random-tracing.git, >> could you queue this, Frederic? >> >> Cc: Ingo Molnar<mingo@elte.hu> >> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl> >> Cc: Paul Mackerras<paulus@samba.org> >> Cc: Arnaldo Carvalho de Melo<acme@redhat.com> >> Cc: Jens Axboe<jens.axboe@oracle.com> >> Cc: Jason Baron<jbaron@redhat.com> >> Cc: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com> >> Signed-off-by: Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp> > > > Thanks, will test it and if it's fine I'll queue. > > I did a lot of tests these last days to understand what was going on > with perf lock, I mean the fact we have various bad locking scenario. > > So far, the state machine looks rather good. In fact, the real problem > is that we don't have every events. We lose a _lot_ of them and that's > because the frequency of lock events is too high and perf record > can't keep up. Really, I didn't think about lack of events :( > > I think I'm going to unearth the injection code to reduce the size > of these events. > > Yeah, injection will be really helpful thing. And I have a rough idea for reducing event frequency. Many lock event sequences are like this ...
You can observe this by looking at perf trace output and search for long series of the same type of events without the others. For example it's common to find long series of lock_contended events without corresponding lock_acquire/release/acquired. This is because lock_contended is the lowest frequent lock event, when the others have their buffers full, this one still has rooms You're right, we could optimize the lock events sequence layout. What I'm afraid of is to break userspace, but ripping the name from the lock events while introducing injection will break userspace anyway :-( May be we can think about providing new lock events and announce the deprecation of the old ones and remove them later. I'm not sure yet. But summing up in only one event is not possible. Having only one lock_release event (and a lock init for name mapping) is suitable for latency measurements only (timestamp + lock addr + wait time + acquired time). And once you dig into finer grained analysis like latency induced by dependencies (take lock A and then take lock B under A, latency of B depends of A), then you're screwed, because you only know you've released locks at given times but you don't know when you acquired them, hence you can't build a tree of dependencies with sequences inside. Ideally, I think that we need to remove lock contended and only trigger lock_acquired when we contended. We obviously don't need lock_contended/lock_acquired if the lock wasn't contended. We could even remove lock_acquire and only play with lock_acquired, changing a bit the rules, but that will make us lose all the dependencies "intra-lock". I mean there are locks which slowpath are implemented on top of other locks: mutexes use mutex->wait_lock spinlock for example. --
On 05/11/10 15:48, Frederic Weisbecker wrote: >>> >>> I think I'm going to unearth the injection code to reduce the size >>> of these events. >>> >>> >> >> Yeah, injection will be really helpful thing. >> >> And I have a rough idea for reducing event frequency. >> >> Many lock event sequences are like this form: >> * acquire -> acquired -> release >> * acquire -> contended -> acquired -> release >> I think that making 3 or 4 events per each lock sequences >> is waste of CPU time and memory space. >> >> If threads store time of each events >> and make only 1 event at time of release, >> we will be able to reduce lots of time and space. >> >> For example, ID of each lock instance is 8 byte in x86_64. >> In this scheme 8 * 4 byte for ID will be only 8 byte. >> I think this optimization has worth to consider because of >> high frequency of lock events. >> >> How do you think? > > > You're right, we could optimize the lock events sequence layout. > What I'm afraid of is to break userspace, but ripping the name from > the lock events while introducing injection will break userspace anyway :-( Really? For me, at least ripping the name with injection doesn't make bad things for userspace. What does the word "break" mean in this context? > > May be we can think about providing new lock events and announce the > deprecation of the old ones and remove them later. I'm not sure yet. > > But summing up in only one event is not possible. Having only one > lock_release event (and a lock init for name mapping) is suitable > for latency measurements only (timestamp + lock addr + wait time + > acquired time). > And once you dig into finer grained analysis like latency induced > by dependencies (take lock A and then take lock B under A, latency > of B depends of A), then you're screwed, because you only know > you've released locks at given times but you don't know when you > acquired them, hence you can't build a tree of ...
The fact that tools which relied on the name field in the lock events Ok it would be indeed more efficient for what we want with perf lock. But I see drawbacks with this: other people might be interested in only few of these events. Someone may want to count lock_contended events only to produce simple contention stats for example, and this user will have in turn larger traces than he was supposed to with this batch event. I think it's usually better to divide the problems into smaller problems. And here it's about dividing a high level meaning (a lock sequence) into smaller and lower level meanings (a lock event). Following this rule makes tracing much more powerful IMO. A lock acquire event has also an isolated meaning outside the whole lock sequence, like in my above lock_contended example, But still I think it's useful to keep lock_contended and lock_acquired. They have an important meaning alone. --
On 05/13/10 00:51, Frederic Weisbecker wrote: > On Wed, May 12, 2010 at 07:23:23PM +0900, Hitoshi Mitake wrote: >> On 05/11/10 15:48, Frederic Weisbecker wrote: >> >>>>> >>>>> I think I'm going to unearth the injection code to reduce the size >>>>> of these events. >>>>> >>>>> >>>> >>>> Yeah, injection will be really helpful thing. >>>> >>>> And I have a rough idea for reducing event frequency. >>>> >>>> Many lock event sequences are like this form: >>>> * acquire -> acquired -> release >>>> * acquire -> contended -> acquired -> release >>>> I think that making 3 or 4 events per each lock sequences >>>> is waste of CPU time and memory space. >>>> >>>> If threads store time of each events >>>> and make only 1 event at time of release, >>>> we will be able to reduce lots of time and space. >>>> >>>> For example, ID of each lock instance is 8 byte in x86_64. >>>> In this scheme 8 * 4 byte for ID will be only 8 byte. >>>> I think this optimization has worth to consider because of >>>> high frequency of lock events. >>>> >>>> How do you think? >>> >>> >>> You're right, we could optimize the lock events sequence layout. >>> What I'm afraid of is to break userspace, but ripping the name from >>> the lock events while introducing injection will break userspace >> anyway :-( >> >> Really? For me, at least ripping the name with injection >> doesn't make bad things for userspace. >> What does the word "break" mean in this context? > > > The fact that tools which relied on the name field in the lock events > won't work anymore as it will be present on lock_init_class only. Ah, but rewriting perf lock will be required anyway because this is still very alpha program :) > > > >>> >>> May be we can think about providing new lock events and announce the >>> deprecation of the old ones and remove them later. I'm not sure yet. >>> >>> But summing up in only one event is not possible. Having only one >>> ...
Commit-ID: 26242d859c9be9eea61f7f19514e9d272ae8ce26 Gitweb: http://git.kernel.org/tip/26242d859c9be9eea61f7f19514e9d272ae8ce26 Author: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> AuthorDate: Mon, 3 May 2010 14:12:00 +0900 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Sun, 9 May 2010 13:45:24 +0200 perf lock: Add "info" subcommand for dumping misc information This adds the "info" subcommand to perf lock which can be used to dump metadata like threads or addresses of lock instances. "map" was removed because info should do the work for it. This will be useful not only for debugging but also for ordinary analyzing. v2: adding example of usage % sudo ./perf lock info -t | Thread ID: comm | 0: swapper | 1: init | 18: migration/5 | 29: events/2 | 32: events/5 | 33: events/6 ... % sudo ./perf lock info -m | Address of instance: name of class | 0xffff8800b95adae0: &(&sighand->siglock)->rlock | 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock | 0xffff8800bf165ae0: &(&sighand->siglock)->rlock | 0xffff8800b9576a98: &p->cred_guard_mutex | 0xffff8800bb890a08: &(&p->alloc_lock)->rlock | 0xffff8800b9522a08: &(&p->alloc_lock)->rlock | 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock | 0xffff8800bba72a08: &(&p->alloc_lock)->rlock | 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock | 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock | 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock | 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock | 0xffff8800bb6478a0: &(shost->host_lock)->rlock v3: fixed some problems Frederic pointed out * better rbtree tracking in dump_threads() * removed printf() and used pr_info() and pr_debug() Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jason Baron ...
Use the new generic sample events reordering from perf kmem,
this drops the need of multiplexing the buffers on record time,
improving the scalability of perf kmem.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
tools/perf/builtin-kmem.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index db474bb..ab906cb 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -335,8 +335,9 @@ static int process_sample_event(event_t *event, struct perf_session *session)
}
static struct perf_event_ops event_ops = {
- .sample = process_sample_event,
- .comm = event__process_comm,
+ .sample = process_sample_event,
+ .comm = event__process_comm,
+ .ordered_samples = true,
};
static double fragmentation(unsigned long n_req, unsigned long n_alloc)
@@ -730,7 +731,6 @@ static const char *record_args[] = {
"record",
"-a",
"-R",
- "-M",
"-f",
"-c", "1",
"-e", "kmem:kmalloc",
--
1.6.2.3
--
Use the new generic sample events reordering from perf trace.
Before that, the displayed traces were ordered as they were
in the input as recorded by perf record (not time ordered).
This makes eventually perf trace displaying the events as beeing
time ordered.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
tools/perf/builtin-trace.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 2eefb33..1d034f6 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -108,6 +108,7 @@ static struct perf_event_ops event_ops = {
.event_type = event__process_event_type,
.tracing_data = event__process_tracing_data,
.build_id = event__process_build_id,
+ .ordered_samples = true,
};
extern volatile int session_done;
--
1.6.2.3
--
Use the new generic sample events reordering from perf sched,
this drops the need of multiplexing the buffers on record time,
improving the scalability of perf sched.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
tools/perf/builtin-sched.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 09ddc8e..94453f1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1651,9 +1651,10 @@ static int process_lost_event(event_t *event __used,
}
static struct perf_event_ops event_ops = {
- .sample = process_sample_event,
- .comm = event__process_comm,
- .lost = process_lost_event,
+ .sample = process_sample_event,
+ .comm = event__process_comm,
+ .lost = process_lost_event,
+ .ordered_samples = true,
};
static int read_events(void)
@@ -1850,7 +1851,6 @@ static const char *record_args[] = {
"record",
"-a",
"-R",
- "-M",
"-f",
"-m", "1024",
"-c", "1",
--
1.6.2.3
--
The sample events recorded by perf record are not time ordered because we have one buffer per cpu for each event (even demultiplexed per task/per cpu for task bound events). But when we read trace events we want them to be ordered by time because many state machines are involved. There are currently two ways perf tools deal with that: - use -M to multiplex every buffers (perf sched, perf kmem) But this creates a lot of contention in SMP machines on record time. - use a post-processing time reordering (perf timechart, perf lock) The reordering used by timechart is simple but doesn't scale well with huge flow of events, in terms of performance and memory use (unusable with perf lock for example). Perf lock has its own samples reordering that flushes its memory use in a regular basis and that uses a sorting based on the previous event queued (a new event to be queued is close to the previous one most of the time). This patch proposes to export perf lock's samples reordering facility to the session layer that reads the events. So if a tool wants to get ordered sample events, it needs to set its struct perf_event_ops::ordered_samples to true and that's it. This prepares tracing based perf tools to get rid of the need to use buffers multiplexing (-M) or to implement their own reordering. Also lower the flush period to 2 as it's sufficient already. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Tom Zanussi <tzanussi@gmail.com> --- tools/perf/builtin-lock.c | 197 +++++---------------------------------------- tools/perf/util/session.c | 179 ++++++++++++++++++++++++++++++++++++++++- tools/perf/util/session.h | 10 +++ 3 files changed, 210 insertions(+), 176 deletions(-) diff ...
Use the new generic sample events reordering from perf timechart,
this drops the ad hoc sample reordering it was using before.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
tools/perf/builtin-timechart.c | 112 ++--------------------------------------
1 files changed, 5 insertions(+), 107 deletions(-)
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 96f4a09..c35aa44 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -143,9 +143,6 @@ struct wake_event {
static struct power_event *power_events;
static struct wake_event *wake_events;
-struct sample_wrapper *all_samples;
-
-
struct process_filter;
struct process_filter {
char *name;
@@ -566,88 +563,6 @@ static void end_sample_processing(void)
}
}
-static u64 sample_time(event_t *event, const struct perf_session *session)
-{
- int cursor;
-
- cursor = 0;
- if (session->sample_type & PERF_SAMPLE_IP)
- cursor++;
- if (session->sample_type & PERF_SAMPLE_TID)
- cursor++;
- if (session->sample_type & PERF_SAMPLE_TIME)
- return event->sample.array[cursor];
- return 0;
-}
-
-
-/*
- * We first queue all events, sorted backwards by insertion.
- * The order will get flipped later.
- */
-static int queue_sample_event(event_t *event, struct perf_session *session)
-{
- struct sample_wrapper *copy, *prev;
- int size;
-
- size = event->sample.header.size + sizeof(struct sample_wrapper) + 8;
-
- copy = malloc(size);
- if (!copy)
- return 1;
-
- memset(copy, 0, size);
-
- copy->next = NULL;
- copy->timestamp = sample_time(event, session);
-
- memcpy(&copy->data, event, ...From: William Cohen <wcohen@redhat.com>
asciidoc does not allow the "===" to be longer than the line
above it.
Also fix a couple types and formatting errors.
Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4BD204C5.9000504@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
tools/perf/Documentation/perf-kvm.txt | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 93400a0..d004e19 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -1,5 +1,5 @@
perf-kvm(1)
-==============
+===========
NAME
----
@@ -41,9 +41,9 @@ There are a couple of variants of perf kvm:
OPTIONS
-------
--host=::
- Collect host side perforamnce profile.
+ Collect host side performance profile.
--guest=::
- Collect guest side perforamnce profile.
+ Collect guest side performance profile.
--guestmount=<path>::
Guest os root file system mount directory. Users mounts guest os
root directories under <path> by a specific filesystem access method,
@@ -64,4 +64,5 @@ OPTIONS
SEE ALSO
--------
-linkperf:perf-top[1] perf-record[1] perf-report[1] perf-diff[1] perf-buildid-list[1]
+linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
--
1.6.2.3
--
To ensure sample events time reordering is reliable, add a -d option
to perf trace to check that automatically.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
tools/perf/builtin-trace.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 1d034f6..77f556f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -11,6 +11,8 @@
static char const *script_name;
static char const *generate_script_lang;
+static bool debug_ordering;
+static u64 last_timestamp;
static int default_start_script(const char *script __unused,
int argc __unused,
@@ -87,6 +89,14 @@ static int process_sample_event(event_t *event, struct perf_session *session)
}
if (session->sample_type & PERF_SAMPLE_RAW) {
+ if (debug_ordering) {
+ if (data.time < last_timestamp) {
+ pr_err("Samples misordered, previous: %llu "
+ "this: %llu\n", last_timestamp,
+ data.time);
+ }
+ last_timestamp = data.time;
+ }
/*
* FIXME: better resolve from pid from the struct trace_entry
* field, although it should be the same than this perf
@@ -532,6 +542,8 @@ static const struct option options[] = {
"generate perf-trace.xx script in specified language"),
OPT_STRING('i', "input", &input_name, "file",
"input file name"),
+ OPT_BOOLEAN('d', "debug-ordering", &debug_ordering,
+ "check that samples time ordering is monotonic"),
OPT_END()
};
--
1.6.2.3
--
Hi Frederic, Just out of curiosity, is there any possibility to add other debugging options to perf-trace? If so, I think we can pass some arguments for this -d option, e.g. "-d ordering,events" and so on. Just an idea. Thank you, -- Masami Hiramatsu e-mail: mhiramat@redhat.com --
Sure, if somebody wants to come with another debugging option in perf trace, no problem. --
I forgot to highlight some things here. - The -M option is not used anymore. Well actually I just checked and it's used by the record perl/python scripts. But it's not needed there anymore, so I'll drop it in another pass. But globally it's over with the buffers multiplexing needs. - But I haven't plugged the reordering thing to the live mode, because I'm not sure exactly if that would be welcome. With live mode we want the events as they arrive, using the reordering there would make it get the events per bunches of 2 seconds slices. I guess we'll figure out a solution for that. - Perf lock gets into a better shape. There is still some work to make it truly usable though. I need to unearth the event injection thing to lower the size of the events, profile by lock classes, etc... - Various important fixes Thanks. --
Pulled, thanks a lot Frederic! Ingo --
