I recently noticed that the new reordering design is broken when it deals with tons of events. This patchset provides another algorithm to deal with that, tested without any problem. And since it involves more frequent flushes, I guess it could plug nicely with the live mode. Frederic Weisbecker (2): perf: Introduce a new "round of buffers read" pseudo event perf: Provide a new deterministic events reordering algorithm tools/perf/builtin-record.c | 34 ++++++++++---- tools/perf/util/event.h | 3 +- tools/perf/util/session.c | 106 +++++++++++++++++++++++++++++++------------ tools/perf/util/session.h | 36 +++++++++------ 4 files changed, 123 insertions(+), 56 deletions(-) --
In order to provide a more robust and deterministic reordering
algorithm, we need to know when we reach a point where we just
did a pass through every counter buffers to read everything they
had.
This patch introduces a new PERF_RECORD_FINISHED_ROUND pseudo event
that only consists in an event header and doesn't need to contain
anything.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
---
tools/perf/builtin-record.c | 34 ++++++++++++++++++++++++----------
tools/perf/util/event.h | 3 ++-
2 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0ff67d1..96bbf42 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -494,6 +494,29 @@ static void event__synthesize_guest_os(struct machine *machine, void *data)
" relocation symbol.\n", machine->pid);
}
+static struct perf_event_header finished_round_event = {
+ .size = sizeof(struct perf_event_header),
+ .type = PERF_RECORD_FINISHED_ROUND,
+};
+
+static void mmap_read_all(void)
+{
+ int i, counter, thread;
+
+ for (i = 0; i < nr_cpu; i++) {
+ for (counter = 0; counter < nr_counters; counter++) {
+ for (thread = 0; thread < thread_num; thread++) {
+ if (mmap_array[i][counter][thread].base)
+ mmap_read(&mmap_array[i][counter][thread]);
+ }
+
+ }
+ }
+
+ if (perf_header__has_feat(&session->header, HEADER_TRACE_INFO))
+ write_output(&finished_round_event, sizeof(finished_round_event));
+}
+
static int __cmd_record(int argc, const char **argv)
{
int i, counter;
@@ -735,16 +758,7 @@ static int __cmd_record(int argc, const char **argv)
int hits = samples;
int thread;
- for (i = 0; i < nr_cpu; i++) {
- for (counter = 0; ...The current events reordering algorithm is based on a heuristic that
gets broken once we deal with a very fast flow of events and/or with
a lot of cpus.
Indeed the time period based flushing is not suitable anymore
in the following case, assuming we have a flush period of two
seconds.
CPU 0 | CPU 1
|
cnt1 timestamps | cnt1 timestamps
|
0 | 0
1 | 1
2 | 2
3 | 3
[...] | [...]
10 seconds later
If we spend too much time to read the buffers (case of a lot of
events to record in each buffers or when we have a lot of CPU buffers
to read), in the next pass the CPU 0 buffer could contain a slice
of several seconds of events. We'll read them all and notice we've
reached the period to flush. In the above example we flush the first
half of the CPU 0 buffer, then we read the CPU 1 buffer where we
have events that were on the flush slice and then the reordering
fails.
It's simple to reproduce with:
perf lock record perf bench sched messaging
To solve this, we use a new solution that doesn't rely on a
heuristical time slice period anymore but on a deterministic basis
based on how perf record does its job.
perf record saves the buffers through passes. A pass is a tour
on every buffers from every CPUs. This is made in order: for
each CPU we read the buffers of every counters. So the more
buffers we visit, the later will be the timestamps of their events.
When perf record finishes a pass it records a
PERF_RECORD_FINISHED_ROUND pseudo event.
We record the max timestamp t found in the pass n. Assuming these
timestamps are monotonic across cpus, we know that if a buffer
still has events with timestamps below t, they will be all available
and then read in the pass n + 1.
Hence when we start to read the pass n + 2, we can safely flush every
events with timestamps below t.
...Hi Frederic,
Very nice! I tried these out with live mode and it seems to work fine,
after applying the patch below.
I initially had a problem with 'unexpected end of event stream' errors,
then noticed that the FINISHED_ROUND events were basically just headers
with no data, which caused the read of the 0-length payload to appear as
end-of-stream.
I'll do some more testing (and fix some warnings in the scripts that
this mode seems to elicit), but it seems so far to work pretty well for
live mode...
Tom
From: Tom Zanussi <tzanussi@gmail.com>
Date: Wed, 5 May 2010 00:27:40 -0500
Subject: [PATCH] perf/live-mode: handle payload-less events
Some events, such as the PERF_RECORD_FINISHED_ROUND event consist of
only an event header and no data. In this case, a 0-length payload
will be read, and the 0 return value will be wrongly interpreted as an
'unexpected end of event stream'.
This patch allows for proper handling of data-less events by skipping
0-length reads.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
---
tools/perf/util/session.c | 19 +++++++++++--------
1 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 9401909..00ab298 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -696,15 +696,18 @@ more:
p = &event;
p += sizeof(struct perf_event_header);
- err = do_read(self->fd, p, size - sizeof(struct perf_event_header));
- if (err <= 0) {
- if (err == 0) {
- pr_err("unexpected end of event stream\n");
- goto done;
- }
+ if (size - sizeof(struct perf_event_header)) {
+ err = do_read(self->fd, p,
+ size - sizeof(struct perf_event_header));
+ if (err <= 0) {
+ if (err == 0) {
+ pr_err("unexpected end of event stream\n");
+ goto done;
+ }
- pr_err("failed to read event data\n");
- goto out_err;
+ pr_err("failed to read event data\n");
+ goto out_err;
+ }
}
if (size == 0 ||
--
1.6.4.GIT
--
Cool, I'm applying your patch then, --
Commit-ID: 794e43b56c18b95fc9776c914a2659e7d558a352 Gitweb: http://git.kernel.org/tip/794e43b56c18b95fc9776c914a2659e7d558a352 Author: Tom Zanussi <tzanussi@gmail.com> AuthorDate: Wed, 5 May 2010 00:27:40 -0500 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Sun, 9 May 2010 13:49:52 +0200 perf/live-mode: Handle payload-less events Some events, such as the PERF_RECORD_FINISHED_ROUND event consist of only an event header and no data. In this case, a 0-length payload will be read, and the 0 return value will be wrongly interpreted as an 'unexpected end of event stream'. This patch allows for proper handling of data-less events by skipping 0-length reads. Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <1273038527.6383.51.camel@tropicana> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- tools/perf/util/session.c | 19 +++++++++++-------- 1 files changed, 11 insertions(+), 8 deletions(-) diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 9401909..00ab298 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -696,15 +696,18 @@ more: p = &event; p += sizeof(struct perf_event_header); - err = do_read(self->fd, p, size - sizeof(struct perf_event_header)); - if (err <= 0) { - if (err == 0) { - pr_err("unexpected end of event stream\n"); - goto done; - } + if (size - sizeof(struct perf_event_header)) { + err = do_read(self->fd, p, + size - sizeof(struct perf_event_header)); + if (err <= 0) { + if (err == 0) { + pr_err("unexpected end of event stream\n"); + goto done; + } - pr_err("failed to read event data\n"); - goto out_err; + pr_err("failed to read event data\n"); + goto out_err; + } } if (size == 0 || --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo TX accounting. |
| Linux Kernel Mailing List | ixgbe: fix several counter register errata |
| Linux Kernel Mailing List | b43: fix build with CONFIG_SSB_PCIHOST=n |
| Linux Kernel Mailing List | 9p: block-based virtio client |
| Michael Breuer | Re: [PATCH] af_packet: Don |
