Andi,
On Fri, Nov 16, 2007 at 04:15:56PM +0100, Andi Kleen wrote:
No, he is talking about something similar to what was in perfctr.
The kernel emulates 64-bit counters in software and that is what you
get back when you read the counters. If you read via RDPMC, you
get 40 bits. To reconstruct the full 64-bit value from user land
you need the upper bits. One approach is for the kernel to allow
you to remap a page that has the 64-bit (software) counters. With
that and a bit of mask/shifting you can reconstruct the full value.
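Roughly, the user-level reconstruction could look like the sketch below.
It assumes the software value read from the remapped page carries only the
upper bits (its low 40 bits are zero) and that the kernel advances it on each
hardware overflow. That layout, the 40-bit width, and the names
HW_COUNTER_BITS and full_count() are assumptions for illustration, not the
perfmon2 interface. The raw RDPMC result is passed in as a parameter so the
arithmetic stays portable.

```c
#include <stdint.h>

/* Assumed width of the value returned by RDPMC (40 bits, as stated above). */
#define HW_COUNTER_BITS 40
#define HW_COUNTER_MASK ((UINT64_C(1) << HW_COUNTER_BITS) - 1)

/*
 * soft: 64-bit software counter read from the remapped page.
 *       Assumption: only the bits above HW_COUNTER_BITS are meaningful.
 * hw:   raw value returned by RDPMC; bits above HW_COUNTER_BITS are ignored.
 *
 * Returns the full 64-bit count: upper bits from software, lower bits
 * from hardware.
 */
uint64_t full_count(uint64_t soft, uint64_t hw)
{
    return (soft & ~HW_COUNTER_MASK) | (hw & HW_COUNTER_MASK);
}
```

A real reader would also have to cope with an overflow landing between the
two reads, for instance by re-reading the software value and retrying if it
changed.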
What I dropped is enabling cr4.pce for self-monitoring sessions.
Read my follow-up message to Dean's message.
Perfmon2 allows you to have an in-kernel sampling buffer. The idea is
not new, Oprofile has this as well. The problem here is that if the
buffer is in the kernel the format of the samples is fixed, and it
should not have to be. Tools may want to record samples in different formats
and as you said some may need extra information gathered in the kernel.
Some may want to aggregate samples in the kernel (Oprofile used to
do that), some may want to use a double-buffer approach to minimize
blind spots, others may simply use the counter overflow mechanism to
record something that is non-PMU related, e.g., kernel call stack.
I have built such a module and it was quite interesting to collect
the call stack when you hit a last-level cache miss.
The idea behind customizable sampling formats is simple: extract the
format from the perfmon core and put it into a kernel module. The
core provides a simple registration mechanism and the two communicate
via a set of callbacks.
Perfmon2 comes with a basic default format which works on all
platforms. But it is possible to develop others without having to
patch the kernel, recompile, or reboot. At its core, each format provides
a handler routine which is called on counter overflow. The handler routine
controls what is recorded, how it is recorded, how it is exported to
userland, and whether overflow notifications need to be sent.
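To make the registration/callback split concrete, here is a small
user-space sketch of the pattern. Every name in it (struct smpl_format,
fmt_register(), fmt_find(), core_overflow(), basic_format) is hypothetical
and chosen for illustration; it is not the perfmon2 kernel interface. The
handler decides what gets recorded and returns nonzero when an overflow
notification should be sent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the perfmon2 interface. */

struct ovfl_event {
    uint64_t ip;       /* interrupted instruction pointer */
    uint64_t counter;  /* 64-bit value of the counter that overflowed */
};

struct smpl_format {
    const char *name;
    /* Called by the core on counter overflow.
     * Returns nonzero if an overflow notification should be sent. */
    int (*handler)(void *buf, size_t buf_size, size_t *used,
                   const struct ovfl_event *ev);
};

#define MAX_FORMATS 8
static const struct smpl_format *formats[MAX_FORMATS];

/* Look up a registered format by name; NULL if not found. */
const struct smpl_format *fmt_find(const char *name)
{
    for (size_t i = 0; i < MAX_FORMATS; i++)
        if (formats[i] && strcmp(formats[i]->name, name) == 0)
            return formats[i];
    return NULL;
}

/* Register a format with the core; -1 on duplicate name or full table. */
int fmt_register(const struct smpl_format *fmt)
{
    if (fmt_find(fmt->name))
        return -1;
    for (size_t i = 0; i < MAX_FORMATS; i++) {
        if (!formats[i]) {
            formats[i] = fmt;
            return 0;
        }
    }
    return -1;
}

/* A basic format: append one fixed-size record per overflow. */
static int basic_handler(void *buf, size_t buf_size, size_t *used,
                         const struct ovfl_event *ev)
{
    if (*used + sizeof(*ev) > buf_size)
        return 1;  /* no room: drop the sample, ask for a notification */
    memcpy((char *)buf + *used, ev, sizeof(*ev));
    *used += sizeof(*ev);
    /* Notify as soon as the next record would no longer fit. */
    return *used + sizeof(*ev) > buf_size;
}

const struct smpl_format basic_format = { "basic", basic_handler };

/* What the core does on overflow: hand the event to the format. */
int core_overflow(const struct smpl_format *fmt, void *buf, size_t buf_size,
                  size_t *used, const struct ovfl_event *ev)
{
    return fmt->handler(buf, buf_size, used, ev);
}
```

The core never looks inside the buffer, so a different format (aggregating,
double-buffered, or recording call stacks) only needs a different handler.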
Using this mechanism, for instance, we were able to connect the
Oprofile kernel code to perfmon2 on Itanium with 100 lines of
code. The exact same approach would work for Oprofile on x86 as well.
This is also how we support PEBS because, as you said, the format of the
samples is not under your control. If you want zero-copy PEBS support,
you have to follow the PEBS format.
I am sure other processors have and will have hardware buffers as well.
Yes, you could do that without changing the core implementation of
perfmon2.
--
-Stephane