This is the latest version of the perf offcore / extra regs patchkit. I addressed all earlier comments I believe and this should be ready for merge now. - Allocating multiple registers is supported now - I kept the percore allocation because the manual kmalloc method turned out to be more complicated and alloc_percpu() really already did what I wanted. However I switched to using the CPU notifier to set up the topology for hotplug, together with a bugfix to make sure x86 sets up the topology before calling it. - A few other improvements. -Andi --
From: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> --- tools/perf/Documentation/perf-list.txt | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 399751b..700afd7 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -58,6 +58,13 @@ raw encoding of 0x1A8 can be used: perf stat -e r1a8 -a sleep 1 perf record -e r1a8 ... +Some special events on x86 encode additional data in the normally unused +[32;63] bits of the raw value. This is particularly used for +the OFFCORE_RESPONSE events on Intel Core i7. The 16bit +mask in the OFFCORE_RESPONSE register is put into bits [32;48]. +For example the OFFCORE_RESPONSE_0.ANY_DATA.ANY_CACHE_DRAM event +is encoded as r00007f11004301b7. + You should refer to the processor specific documentation for getting these details. Some of them are referenced in the SEE ALSO section below. -- 1.7.1 --
From: Andi Kleen <ak@linux.intel.com> Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch adds: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the unused upper 32bits of the config word. - Add support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Full git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git perf-offcore2 Cc: eranian@google.com v2: Lots of updates based on review feedback. Also fixes some issues v3: Fix hotplug. Handle multiple extra registers. Fix init order. Various improvements. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event.c | 70 ++++++++++++ arch/x86/kernel/cpu/perf_event_intel.c | 192 ++++++++++++++++++++++++++++++++ include/linux/perf_event.h | 2 + 3 files changed, 264 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index ed63101..346006a 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -93,6 +93,8 @@ struct amd_nb { struct event_constraint event_constraints[X86_PMC_IDX_MAX]; }; +struct intel_percore; + #define MAX_LBR_ENTRIES 16 struct cpu_hw_events { @@ -127,6 +129,13 @@ struct cpu_hw_events { struct perf_branch_stack lbr_stack; ...
Stuff like that shouldn't be mixed in with the tags and usually goes below the --- separator since its not supposed to end up in the Just wondering, shouldn't we program the extra msr _before_ we flip the I've got a patch moving the whole pmu init to early_initcall(), which is after mm_init() so it would actually work. --
So do you want to make this patchkit depend on that patch? Or just integrate it and then change later? -Andi -- ak@linux.intel.com -- Speaking for myself only. --
Doesn't really matter, I can fix it up afterwards, just wanted to let you know.. I can also flip that enable thing. I'll take these 4 patches and fixup these things and then see what falls out ;-) --
OK, that initcall patch hit -tip and Ingo wants that allocation stuff fixed up before committing those patches. Could you run another version of these patches (you can ignore the checkpatch output that results from the non-std struct initialization layout)? Just make the allocation path like the AMD NB stuff, use the cpu_prepare and cpu_dead hooks to alloc/free and add a refcount in the percpu object. --
While looking at the AMD code, I think we can do away with amd_nb_lock, all the hotplug stuff is serialized on the hotplug mutex anyway. --
WARNING: please, no space before tabs #59: FILE: arch/x86/kernel/cpu/perf_event.c:136: +^Iint ^I^I^I^Ipercore_used; /* Used by this CPU? */$ WARNING: please, no space before tabs #80: FILE: arch/x86/kernel/cpu/perf_event.c:198: +^Iu64 ^I^I^Iconfig_mask;$ WARNING: please, no space before tabs #81: FILE: arch/x86/kernel/cpu/perf_event.c:199: +^Iu64 ^I^I^Ivalid_mask;$ WARNING: please, no space before tabs #85: FILE: arch/x86/kernel/cpu/perf_event.c:203: +^I.event = (e), ^I\$ WARNING: please, no space before tabs #86: FILE: arch/x86/kernel/cpu/perf_event.c:204: +^I.msr = (ms),^I ^I\$ WARNING: please, no space before tabs #87: FILE: arch/x86/kernel/cpu/perf_event.c:205: +^I.config_mask = (m), ^I\$ WARNING: please, no space before tabs #177: FILE: arch/x86/kernel/cpu/perf_event_intel.c:12: +^Iint ^I^Iref;^I^I/* reference count */$ WARNING: please, no space before tabs #179: FILE: arch/x86/kernel/cpu/perf_event_intel.c:14: +^Iu64 ^I^Iextra_config;^I/* extra MSR config */$ WARNING: please, no space before tabs #183: FILE: arch/x86/kernel/cpu/perf_event_intel.c:18: +^Iraw_spinlock_t ^I^Ilock;^I^I/* protect structure */$ WARNING: please, no space before tabs #184: FILE: arch/x86/kernel/cpu/perf_event_intel.c:19: +^Istruct er_account ^Iregs[MAX_EXTRA_REGS];$ WARNING: please use device_initcall() instead of __initcall() #408: FILE: arch/x86/kernel/cpu/perf_event_intel.c:1108: +__initcall(init_intel_percore); Fixed those up, please be more careful next time. --
This "separate register" is MSR_OFFCORE_RSP_0, right? But from the SDM, MSR_OFFCORE_RSP_0 is "thread" scope, see SDM 3b, Appendix B.4 MSRS IN THE INTEL® MICROARCHITECTURE CODENAME NEHALEM Or am I missing some obvious thing? Thanks, Lin Ming --
I thought we agreed it made more sense to program the extra msr before enabling the counter. --
Ah sorry, I seem to have mixed up the various versions. The latest does indeed do as we agreed. Sorry for the noise. --
From: Andi Kleen <ak@linux.intel.com>
The generic perf LLC-* events do count the L2 caches, not the real
L3 LLC on Intel Nehalem and Westmere. This lead to quite some confusion.
Fixing this properly requires use of the special OFFCORE_RESPONSE
events which need a separate mask register. This has been implemented
in a earlier patch.
Now use this infrastructure to set correct events for the LLC-*
on Nehalem and Westmere
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/cpu/perf_event.c | 12 ++++----
arch/x86/kernel/cpu/perf_event_intel.c | 46 +++++++++++++++++++++++---------
2 files changed, 39 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 346006a..b1abe89 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -467,8 +467,9 @@ static inline int x86_pmu_initialized(void)
}
static inline int
-set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event_attr *attr)
+set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
{
+ struct perf_event_attr *attr = &event->attr;
unsigned int cache_type, cache_op, cache_result;
u64 config, val;
@@ -494,9 +495,8 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event_attr *attr)
if (val == -1)
return -EINVAL;
- hwc->config |= val;
-
- return 0;
+ hwc->config |= val & X86_RAW_EVENT_MASK;
+ return x86_pmu_extra_regs(val, event);
}
static int x86_setup_perfctr(struct perf_event *event)
@@ -521,10 +521,10 @@ static int x86_setup_perfctr(struct perf_event *event)
}
if (attr->type == PERF_TYPE_RAW)
- return 0;
+ return x86_pmu_extra_regs(event->attr.config, event);
if (attr->type == PERF_TYPE_HW_CACHE)
- return set_ext_hw_attr(hwc, attr);
+ return set_ext_hw_attr(hwc, event);
if (attr->config >= x86_pmu.max_events)
return -EINVAL;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c ...From: Andi Kleen <ak@linux.intel.com> When booting up a CPU set the various topology masks before calling the CPU_STARTING notifier. This way the notifier can actually use the masks. This is needed for a perf change. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/smpboot.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 083e99d..9d2980e 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -281,6 +281,10 @@ static void __cpuinit smp_callin(void) */ smp_store_cpu_info(cpuid); + /* This must be done before setting cpu_online_mask */ + set_cpu_sibling_map(raw_smp_processor_id()); + wmb(); + notify_cpu_starting(cpuid); /* @@ -322,10 +326,6 @@ notrace static void __cpuinit start_secondary(void *unused) legacy_pic->unmask(0); } - /* This must be done before setting cpu_online_mask */ - set_cpu_sibling_map(raw_smp_processor_id()); - wmb(); - /* * We need to hold call_lock, so there is no inconsistency * between the time smp_call_function() determines number of -- 1.7.1 --
Here's an updated patch.
x86: set cpu masks before calling CPU_STARTING notifiers
When booting up a CPU set the various topology masks before
calling the CPU_STARTING notifier. This way the notifier
can actually use the masks.
This is needed for a perf change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 083e99d..c019c03 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -281,6 +281,13 @@ static void __cpuinit smp_callin(void)
*/
smp_store_cpu_info(cpuid);
+ /*
+ * This must be done before setting cpu_online_mask
+ * or calling notify_cpu_starting.
+ */
+ set_cpu_sibling_map(raw_smp_processor_id());
+ wmb();
+
notify_cpu_starting(cpuid);
/*
@@ -322,10 +329,6 @@ notrace static void __cpuinit start_secondary(void *unused)
legacy_pic->unmask(0);
}
- /* This must be done before setting cpu_online_mask */
- set_cpu_sibling_map(raw_smp_processor_id());
- wmb();
-
/*
* We need to hold call_lock, so there is no inconsistency
* between the time smp_call_function() determines number of
--
Commit-ID: 5ef428c4b5950dddce7311e84321abb3aff7ebb0 Gitweb: http://git.kernel.org/tip/5ef428c4b5950dddce7311e84321abb3aff7ebb0 Author: Andi Kleen <ak@linux.intel.com> AuthorDate: Thu, 18 Nov 2010 11:47:31 +0100 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 26 Nov 2010 15:14:56 +0100 x86: Set cpu masks before calling CPU_STARTING notifiers When booting up a CPU set the various topology masks before calling the CPU_STARTING notifier. This way the notifier can actually use the masks. This is needed for a perf change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/kernel/smpboot.c | 11 +++++++---- 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index f0a0624..68f61ac 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -281,6 +281,13 @@ static void __cpuinit smp_callin(void) */ smp_store_cpu_info(cpuid); + /* + * This must be done before setting cpu_online_mask + * or calling notify_cpu_starting. + */ + set_cpu_sibling_map(raw_smp_processor_id()); + wmb(); + notify_cpu_starting(cpuid); /* @@ -316,10 +323,6 @@ notrace static void __cpuinit start_secondary(void *unused) */ check_tsc_sync_target(); - /* This must be done before setting cpu_online_mask */ - set_cpu_sibling_map(raw_smp_processor_id()); - wmb(); - /* * We need to hold call_lock, so there is no inconsistency * between the time smp_call_function() determines number of --
