This patchkit implements architectural perfmon support in oprofile. This allows generic profiling of a few standard events on all newer Intel CPUs, including Atom and Nehalem. The CPU describes its events in CPUID, so they can be used without knowing anything else about the CPU.

The code requires some changes to the oprofile userland, which I am posting separately to the oprofile list.

-Andi
--
From: Andi Kleen <ak@linux.intel.com>
Make num_counters and num_controls non-const so that they can be modified at runtime.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/oprofile/op_x86_model.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/oprofile/op_x86_model.h b/arch/x86/oprofile/op_x86_model.h
index 45b605f..575e08e 100644
--- a/arch/x86/oprofile/op_x86_model.h
+++ b/arch/x86/oprofile/op_x86_model.h
@@ -32,8 +32,8 @@ struct pt_regs;
* various x86 CPU models' perfctr support.
*/
struct op_x86_model_spec {
- unsigned int const num_counters;
- unsigned int const num_controls;
+ unsigned int num_counters;
+ unsigned int num_controls;
void (*fill_in_addresses)(struct op_msrs * const msrs);
void (*setup_ctrs)(struct op_msrs const * const msrs);
int (*check_ctrs)(struct pt_regs * const regs,
--
1.5.6
--
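To illustrate why the const qualifiers had to go, here is a minimal user-space sketch. It is not the kernel code: struct model_spec, example_spec and spec_set_counters() are hypothetical stand-ins, and the default of 2 is only illustrative. With const members the assignments below would not compile; without const the counts can be overwritten once the real number of counters is known.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for struct op_x86_model_spec. */
struct model_spec {
	unsigned int num_counters;	/* no longer const: may be set at init time */
	unsigned int num_controls;
};

/* Static defaults, as a model driver might declare them (values illustrative). */
struct model_spec example_spec = {
	.num_counters = 2,
	.num_controls = 2,
};

/* Overwrite the counts with a value discovered at runtime. */
void spec_set_counters(struct model_spec *spec, unsigned int n)
{
	assert(n > 0);
	spec->num_counters = n;
	spec->num_controls = n;
}
```

A caller would run spec_set_counters(&example_spec, n) once, early during setup, before anything sizes arrays from these fields.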
From: Andi Kleen <ak@linux.intel.com>

Newer Intel CPUs (Core1+) have support for architectural events described in CPUID 0xA. See the IA32 SDM Vol3b.18 for details.

The advantage of this is that it can be done without knowing about the specific CPU, because the CPU itself describes which performance events are supported. This is only a fallback because only a limited set of 6 events is supported. This allows profiling on Nehalem and on Atom systems (the latter not tested).

This patch implements support for that in oprofile's Intel Family 6 profiling module. It also has the advantage of supporting an arbitrary number of events now, as reported by the CPU. Also allow arbitrary counter widths >32bit while we're at it.

Requires a patched oprofile userland to support the new architecture.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/kernel-parameters.txt |    5 ++
 arch/x86/oprofile/nmi_int.c         |   32 +++++++++--
 arch/x86/oprofile/op_model_ppro.c   |  104 +++++++++++++++++++++++++++-------
 arch/x86/oprofile/op_x86_model.h    |    3 +
 4 files changed, 116 insertions(+), 28 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 056742c..10c8b1b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1486,6 +1486,11 @@ and is between 256 and 4096 characters. It is defined in the file
 	oprofile.timer=	[HW]
 			Use timer interrupt instead of performance counters
+	oprofile.force_arch_perfmon=1 [X86]
+			Force use of architectural perfmon performance counters
+			in oprofile on Intel CPUs. The kernel selects the
+			correct default on its own.
+
 	osst=		[HW,SCSI] SCSI Tape Driver
 			Format: <buffer_size>,<write_threshold>
 			See also Documentation/scsi/st.txt.
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 36d2f92..6438c32 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -430,6 ...
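As a rough illustration of the enumeration described above, here is a user-space sketch that decodes the EAX and EBX values returned by CPUID leaf 0xA. It is not taken from the patch: struct arch_perfmon_info, decode_cpuid_0a() and counter_reset_value() are hypothetical names, and the reset helper only shows one common way to make a counter of arbitrary width overflow after a given number of events. The field layout (version, counter count, counter width, event mask length in EAX, and a set bit in EBX meaning "event not available") follows the SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of CPUID leaf 0xA (hypothetical struct, not from the patch). */
struct arch_perfmon_info {
	unsigned int version;		/* EAX[7:0]   */
	unsigned int num_counters;	/* EAX[15:8]  */
	unsigned int bit_width;		/* EAX[23:16] */
	unsigned int mask_length;	/* EAX[31:24] */
	unsigned int events_avail;	/* events whose EBX bit is clear */
};

struct arch_perfmon_info decode_cpuid_0a(uint32_t eax, uint32_t ebx)
{
	struct arch_perfmon_info info;
	unsigned int i;

	info.version = eax & 0xff;
	info.num_counters = (eax >> 8) & 0xff;
	info.bit_width = (eax >> 16) & 0xff;
	info.mask_length = (eax >> 24) & 0xff;
	info.events_avail = 0;

	/* A set bit in EBX means the corresponding event is NOT available. */
	for (i = 0; i < info.mask_length && i < 32; i++)
		if (!(ebx & (1u << i)))
			info.events_avail++;

	return info;
}

/* Value to load so a counter of the given width overflows after `count` events. */
uint64_t counter_reset_value(unsigned int bit_width, uint64_t count)
{
	uint64_t mask;

	assert(bit_width > 0 && bit_width <= 64);
	mask = (bit_width == 64) ? ~0ULL : ((1ULL << bit_width) - 1);
	return (0 - count) & mask;
}
```

The raw register values would come from executing CPUID with EAX=0xA. Keeping the decoding in a pure function makes it easy to check against sample values without running on the target CPU.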
From: Andi Kleen <ak@linux.intel.com>

This essentially reverts Linus' earlier 4b9f12a3779c548b68bc9af7d94030868ad3aa1b commit. Nehalem is not core_2, so it shouldn't be reported as such. However with the earlier arch perfmon patch it will fall back to arch perfmon mode now, so there is no need to fake it as core_2.

The only drawback is that Linus will need to patch the arch perfmon support into his oprofile binary now, but I think he can do that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/oprofile/nmi_int.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 6438c32..669a713 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -418,9 +418,6 @@ static int __init ppro_init(char **cpu_type)
 	case 15: case 23:
 		*cpu_type = "i386/core_2";
 		break;
-	case 26:
-		*cpu_type = "i386/core_2";
-		break;
 	default:
 		/* Unknown */
 		return 0;
--
1.5.6
--
From: Andi Kleen <ak@linux.intel.com>

It's actually useless now, but document it anyways.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/kernel-parameters.txt |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 10c8b1b..5e77e1a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1491,6 +1491,12 @@ and is between 256 and 4096 characters. It is defined in the file
 			in oprofile on Intel CPUs. The kernel selects the
 			correct default on its own.
+	oprofile.p4force=1 [X86]
+			On Intel NetBurst CPUs assume new models are compatible
+			to older ones. This might allow oprofile to be used when
+			the kernel doesn't know the CPU, but is slightly dangerous.
+			Should be obsolete by now.
+
 	osst=		[HW,SCSI] SCSI Tape Driver
 			Format: <buffer_size>,<write_threshold>
 			See also Documentation/scsi/st.txt.
--
1.5.6
--
We should rework or remove this (maybe later) if it no longer makes sense to keep it. If we had a force_cpu_type implementation, this could be thrown away. But as long as it is in, it's better to have it documented.

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
I will send this patch upstream together with the architectural perfmon implementation and when the userland part is upstream.

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
Could you create a separate patch that introduces this new kernel parameter? This would make it easier to send all other changes upstream.

We already discussed the need for this parameter. Maybe it would fit better to have a more generalized parameter for this that could then be reused by other archs/models as well. Something like force_pmu_detection that could be used for all new CPUs (also other models) that do not yet have a specific kernel implementation.

Even better would be a sysfs entry with which we can specify which cpu type to use:

 echo "i386/arch_perfmon" > /sys/module/oprofile/parameters/cpu_type

That would allow us to switch the pmu at runtime and also from the

Put this in an init function of op_x86_model_spec. Then it could also be static.

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
The parameter only makes sense together with something that uses it. So an additional one-liner patch (+ docs) would be a patch depending on the earlier arch perfmon patch. If you really want that I can do it, but frankly it doesn't make sense to me. It's really only a debugging feature; I can also just take it out.

I thought the result of the discussion was that it was not useful because there's no equivalent of arch perfmon on any other x86 CPUs?

You mean something like pmu=<oprofile arch string> to force

Switching at runtime would be complicated changes I think

Also
--
I think this would be the best solution, providing a parameter

 oprofile.force_pmu=<oprofile arch string>

This can easily be implemented and also reused by others. I would be

Right, this is overhead nobody will use.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
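For reference, the selection logic discussed in this subthread could look roughly like the following user-space sketch. It is only an assumption about how such a parameter might behave, not an existing kernel interface: select_cpu_type(), cpu_type_is() and the has_arch_perfmon flag are hypothetical. The model numbers and the "i386/core_2" and "i386/arch_perfmon" strings are the ones quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pick the oprofile cpu type string (hypothetical helper).
 * force_pmu:        value of a force_pmu-style parameter, or NULL if unset
 * model:            CPU model number within Intel family 6
 * has_arch_perfmon: non-zero if the CPU reports architectural perfmon
 */
const char *select_cpu_type(const char *force_pmu, int model,
			    int has_arch_perfmon)
{
	assert(model >= 0);

	/* An explicit override always wins. */
	if (force_pmu && force_pmu[0] != '\0')
		return force_pmu;

	/* Models with a specific implementation (subset shown). */
	switch (model) {
	case 15:
	case 23:
		return "i386/core_2";
	default:
		break;
	}

	/* Unknown model: fall back to arch perfmon if available. */
	return has_arch_perfmon ? "i386/arch_perfmon" : NULL;
}

/* Convenience comparison that tolerates a NULL type. */
int cpu_type_is(const char *type, const char *name)
{
	return type != NULL && strcmp(type, name) == 0;
}
```

With this shape, model 26 with no override ends up in arch perfmon mode, while an override string is passed through unchanged. Validating the override against the list of known types is left out of the sketch.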
Ok, I can implement that, but it'll be a separate patch. It might be next week before I can work on it, though.

-Andi
--
Thanks Andi,

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
Andi,

I have uploaded all pending OProfile patches to my kernel.org repository. As we already talked about, there are changes in it that implement model-specific init/exit functions. Please change your patch so that it uses these functions. This will make your implementation cleaner.

I will also send some more comments on the patches themselves. It would help me if you could send the new patches relative to my tree.

Thanks a lot,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
--
