Re: [PATCH 3/9] oprofile, perf, x86: introduce new functions to reserve perfctrs by index

Previous thread: [BUG] Linux-2.6.33 to 33-git9 intel_drm: drm_fb_helper panic by Tarkan Erimer on Thursday, March 4, 2010 - 7:34 am. (3 messages)

Next thread: [PATCH] V4L/DVB: mx1-camera: compile fix by =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= on Thursday, March 4, 2010 - 9:54 am. (13 messages)
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

This patch set improves the perfctr reservation code. New functions
are available to reserve a counter by its index only. It is no longer
necessary to allocate both msrs of a counter which also improves the
code and makes it easier.

For oprofile a handler is implemented that returns an error now if a
counter is already reserved by a different subsystem such as perf or
watchdog. Before, oprofile silently ignored that counter. Finally the
new reservation functions can be used to allocate special parts of the
pmu such as IBS, which is necessary to use IBS with perf too.

The patches are available in the oprofile tree:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

If there are no objections, I suggest to merge it into the
tip/perf/core too, maybe after pending patches went in. If there are
already conflicts, I will do the merge for this.

See enclose changelog.

-Robert


The following changes since commit bb1165d6882f423f90fc7007a88c6c993b7c2ac4:
  Robert Richter (1):
        perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

Robert Richter (9):
      perf, x86: reduce number of CONFIG_X86_LOCAL_APIC macros
      oprofile, perf, x86: do not allocate evntsel counter msr
      oprofile, perf, x86: introduce new functions to reserve perfctrs by index
      tsc, x86: use new perfctr reservation functions in tsc code
      perf, x86: use new perfctr reservation functions in perf code
      oprofile/x86: rework error handler in nmi_setup()
      oprofile/x86: return -EBUSY if counters are already reserved
      oprofile/x86: group IBS code
      oprofile/x86: implement perfctr reservation for IBS

 arch/x86/include/asm/nmi.h             |    2 +
 arch/x86/include/asm/perf_event.h      |    3 +
 arch/x86/kernel/cpu/perf_event.c       |   36 ++---
 arch/x86/kernel/cpu/perfctr-watchdog.c |   35 +++--
 arch/x86/kernel/tsc.c          ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

This codes improves the error handler in nmi_setup(). Most parts of
the code are moved to allocate_msrs(). In case of an error
allocate_msrs() also frees already allocated memory. nmi_setup()
becomes easier and better extendable.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/oprofile/nmi_int.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 2c505ee..c0c21f2 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -295,6 +295,7 @@ static void free_msrs(void)
 		kfree(per_cpu(cpu_msrs, i).controls);
 		per_cpu(cpu_msrs, i).controls = NULL;
 	}
+	nmi_shutdown_mux();
 }
 
 static int allocate_msrs(void)
@@ -307,14 +308,21 @@ static int allocate_msrs(void)
 		per_cpu(cpu_msrs, i).counters = kzalloc(counters_size,
 							GFP_KERNEL);
 		if (!per_cpu(cpu_msrs, i).counters)
-			return 0;
+			goto fail;
 		per_cpu(cpu_msrs, i).controls = kzalloc(controls_size,
 							GFP_KERNEL);
 		if (!per_cpu(cpu_msrs, i).controls)
-			return 0;
+			goto fail;
 	}
 
+	if (!nmi_setup_mux())
+		goto fail;
+
 	return 1;
+
+fail:
+	free_msrs();
+	return 0;
 }
 
 static void nmi_cpu_setup(void *dummy)
@@ -342,17 +350,7 @@ static int nmi_setup(void)
 	int cpu;
 
 	if (!allocate_msrs())
-		err = -ENOMEM;
-	else if (!nmi_setup_mux())
-		err = -ENOMEM;
-	else
-		err = register_die_notifier(&profile_exceptions_nb);
-
-	if (err) {
-		free_msrs();
-		nmi_shutdown_mux();
-		return err;
-	}
+		return -ENOMEM;
 
 	/* We need to serialize save and setup for HT because the subset
 	 * of msrs are distinct for save and setup operations
@@ -374,9 +372,17 @@ static int nmi_setup(void)
 
 		mux_clone(cpu);
 	}
+
+	err = register_die_notifier(&profile_exceptions_nb);
+	if (err)
+		goto fail;
+
 	on_each_cpu(nmi_cpu_setup, NULL, 1);
 	nmi_enabled = 1;
 	return 0;
+fail:
+	free_msrs();
+	return err;
 }
 
 ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

Moving code in preparation of the next patch. This groups all IBS code
together.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/oprofile/op_model_amd.c |  202 +++++++++++++++++++-------------------
 1 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 128d84c..9c0d978 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -105,107 +105,6 @@ static u32 get_ibs_caps(void)
 	return ibs_caps;
 }
 
-#ifdef CONFIG_OPROFILE_EVENT_MULTIPLEX
-
-static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,
-			       struct op_msrs const * const msrs)
-{
-	u64 val;
-	int i;
-
-	/* enable active counters */
-	for (i = 0; i < NUM_COUNTERS; ++i) {
-		int virt = op_x86_phys_to_virt(i);
-		if (!reset_value[virt])
-			continue;
-		rdmsrl(msrs->controls[i].addr, val);
-		val &= model->reserved;
-		val |= op_x86_get_ctrl(model, &counter_config[virt]);
-		wrmsrl(msrs->controls[i].addr, val);
-	}
-}
-
-#endif
-
-/* functions for op_amd_spec */
-
-static void op_amd_shutdown(struct op_msrs const * const msrs);
-
-static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
-{
-	int i;
-
-	for (i = 0; i < NUM_COUNTERS; i++) {
-		if (reserve_perfctr(i)) {
-			msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
-			msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
-		} else if (counter_config[i].enabled) {
-			op_x86_warn_reserved(i);
-			op_amd_shutdown(msrs);
-			return -EBUSY;
-		}
-	}
-
-	return 0;
-}
-
-static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
-			      struct op_msrs const * const msrs)
-{
-	u64 val;
-	int i;
-
-	/* setup reset_value */
-	for (i = 0; i < NUM_VIRT_COUNTERS; ++i) {
-		if (counter_config[i].enabled
-		    && msrs->counters[op_x86_virt_to_phys(i)].addr)
-			reset_value[i] = counter_config[i].count;
-		else
-			reset_value[i] = 0;
-	}
-
-	/* clear all counters */
-	for ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

For sharing IBS with oprofile and perf a resource allocation is
needed. This patch implements IBS allocation by extending the perfctr
reservation code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/perf_event.h |    3 +
 arch/x86/oprofile/op_model_amd.c  |  104 +++++++++++++++++++++++++------------
 2 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 80e6936..2a95e01 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -117,6 +117,9 @@ union cpuid10_edx {
  */
 #define X86_PMC_IDX_FIXED_BTS				(X86_PMC_IDX_FIXED + 16)
 
+#define X86_PMC_IDX_SPECIAL_IBS_FETCH			(X86_PMC_IDX_FIXED + 17)
+#define X86_PMC_IDX_SPECIAL_IBS_OP			(X86_PMC_IDX_FIXED + 18)
+
 /* IbsFetchCtl bits/masks */
 #define IBS_FETCH_RAND_EN		(1ULL<<57)
 #define IBS_FETCH_VAL			(1ULL<<49)
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 9c0d978..9bd040d 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -61,6 +61,7 @@ struct op_ibs_config {
 };
 
 static struct op_ibs_config ibs_config;
+static u64 ibs_fetch_ctl;
 static u64 ibs_op_ctl;
 
 /*
@@ -105,6 +106,63 @@ static u32 get_ibs_caps(void)
 	return ibs_caps;
 }
 
+static void shutdown_ibs(void)
+{
+	if (ibs_fetch_ctl) {
+		ibs_fetch_ctl = 0;
+		release_perfctr(X86_PMC_IDX_SPECIAL_IBS_FETCH);
+	}
+
+	if (ibs_op_ctl) {
+		ibs_op_ctl = 0;
+		release_perfctr(X86_PMC_IDX_SPECIAL_IBS_OP);
+	}
+}
+
+static int setup_ibs(void)
+{
+	ibs_fetch_ctl = 0;
+	if (ibs_caps && ibs_config.fetch_enabled) {
+		if (!reserve_perfctr(X86_PMC_IDX_SPECIAL_IBS_FETCH))
+			goto fail;
+		ibs_fetch_ctl =
+			((ibs_config.max_cnt_fetch >> 4) & IBS_FETCH_MAX_CNT)
+			| (ibs_config.rand_en ? IBS_FETCH_RAND_EN : 0)
+			| IBS_FETCH_ENABLE;
+	}
+
+	ibs_op_ctl = 0;
+	if (ibs_caps && ibs_config.op_enabled) {
+		if ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

Current perfctr reservation code allocates single pmu msrs. The msr
addresses may differ depending on the model and offset calculation is
necessary. This can be easier implemented by reserving a counter by
its index only.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/nmi.h             |    2 ++
 arch/x86/kernel/cpu/perfctr-watchdog.c |   29 ++++++++++++++++++++---------
 arch/x86/oprofile/op_model_amd.c       |    4 ++--
 arch/x86/oprofile/op_model_ppro.c      |    4 ++--
 4 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 93da9c3..f5db47d 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -19,6 +19,8 @@ extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
 extern int check_nmi_watchdog(void);
 extern int nmi_watchdog_enabled;
 extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int reserve_perfctr(unsigned int);
+extern void release_perfctr(unsigned int);
 extern int reserve_perfctr_nmi(unsigned int);
 extern void release_perfctr_nmi(unsigned int);
 extern int reserve_evntsel_nmi(unsigned int);
diff --git a/arch/x86/kernel/cpu/perfctr-watchdog.c b/arch/x86/kernel/cpu/perfctr-watchdog.c
index 5c129ee..0868b8b 100644
--- a/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ b/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -117,11 +117,8 @@ int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
 }
 EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
 
-int reserve_perfctr_nmi(unsigned int msr)
+int reserve_perfctr(unsigned int counter)
 {
-	unsigned int counter;
-
-	counter = nmi_perfctr_msr_to_bit(msr);
 	/* register not managed by the allocator? */
 	if (counter > NMI_MAX_COUNTER_BITS)
 		return 1;
@@ -130,19 +127,33 @@ int reserve_perfctr_nmi(unsigned int msr)
 		return 1;
 	return 0;
 ...
From: Andi Kleen
Date: Friday, March 19, 2010 - 10:45 pm

Sorry reviewing old patch. This doesn't work for the fixed counters on intel,
which don't have a index (or rather they have a separate number space)

I had a old patch to fix the reservation for them (and a matching
patch to perf to use it).

How to resolve this?

-Andi

---

---
 arch/x86/kernel/cpu/perfctr-watchdog.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.32-ak/arch/x86/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux-2.6.32-ak.orig/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ linux-2.6.32-ak/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -70,9 +70,13 @@ static inline unsigned int nmi_perfctr_m
 	case X86_VENDOR_AMD:
 		return msr - MSR_K7_PERFCTR0;
 	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+			if (msr >= MSR_CORE_PERF_FIXED_CTR0 &&
+			    msr < MSR_CORE_PERF_FIXED_CTR0 + 8)
+				return NMI_MAX_COUNTER_BITS -
+						(msr - MSR_CORE_PERF_FIXED_CTR0);
 			return msr - MSR_ARCH_PERFMON_PERFCTR0;
-
+		}
 		switch (boot_cpu_data.x86) {
 		case 6:
 			return msr - MSR_P6_PERFCTR0;




-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Robert Richter
Date: Thursday, March 25, 2010 - 8:52 am

Andi,

so far it does not seem this reservation patches will go upstream. So
we still do not have a solution of how to share the pmu with perf. The
current approach is a global pmu lock. I don't think this is a good
solution and we already see questions on the oprofile mailing list why
counters are not available to use. This will become much worse if perf
is using counters permanently in the kernel (e.g. the perf nmi
watchdog). This will make oprofile unusable.


Fixed counter reservation is not really used in the kernel except for
p4. Oprofile only reserves generic counters. Unless there is a general
solution how to reserve counters I don't see the need to extend the
reservation code for fixed counters since the only subsystem using it
would be the nmi watchdog.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com

--

From: Andi Kleen
Date: Thursday, March 25, 2010 - 12:33 pm

There's the current reservation 
code which has some issues and is butt ugly, but does its job mostly.

The only real problem I have with it is that it doesn't support fixed counters.

That's not very hard to fix (patches posted), but of course requires
some basic cooperation from impartial maintainers. I think extending

NMI watchdog is not on by default luckily.

Anyways I don't really understand what the problem with just
allocating counters properly in perf_events like everyone else.
They need to do that anyways to cooperate with firmware or VMs using these
counters.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
--

From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

Two msrs must be reserverd per generic counter though both are
depending each other. Allocating the evntsel counter msr is not
necessary if the corresponding perfctr msr is already reserved. This
patch removes evntsel counter msr reservation.

This is implemented for AMD and P6 models only. for P4 still the
legacy code is used, so no changes there.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   15 +--------------
 arch/x86/kernel/cpu/perfctr-watchdog.c |    6 ------
 arch/x86/kernel/tsc.c                  |    2 --
 arch/x86/oprofile/op_model_amd.c       |   11 ++---------
 arch/x86/oprofile/op_model_ppro.c      |   11 ++---------
 5 files changed, 5 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 96b3a98..62a1a5e 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -248,19 +248,8 @@ static bool reserve_pmc_hardware(void)
 			goto perfctr_fail;
 	}
 
-	for (i = 0; i < x86_pmu.num_events; i++) {
-		if (!reserve_evntsel_nmi(x86_pmu.eventsel + i))
-			goto eventsel_fail;
-	}
-
 	return true;
 
-eventsel_fail:
-	for (i--; i >= 0; i--)
-		release_evntsel_nmi(x86_pmu.eventsel + i);
-
-	i = x86_pmu.num_events;
-
 perfctr_fail:
 	for (i--; i >= 0; i--)
 		release_perfctr_nmi(x86_pmu.perfctr + i);
@@ -275,10 +264,8 @@ static void release_pmc_hardware(void)
 {
 	int i;
 
-	for (i = 0; i < x86_pmu.num_events; i++) {
+	for (i = 0; i < x86_pmu.num_events; i++)
 		release_perfctr_nmi(x86_pmu.perfctr + i);
-		release_evntsel_nmi(x86_pmu.eventsel + i);
-	}
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		enable_lapic_nmi_watchdog();
diff --git a/arch/x86/kernel/cpu/perfctr-watchdog.c b/arch/x86/kernel/cpu/perfctr-watchdog.c
index fb329e9..5c129ee 100644
--- a/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ b/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -313,17 +313,11 @@ static int single_msr_reserve(void)
 {
 	if ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

In case a counter is already reserved by the watchdog or perf_event
subsystem, oprofile ignored this counters silently. This case is
handled now and oprofile_setup() now reports an error.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/oprofile/nmi_int.c       |    5 ++++-
 arch/x86/oprofile/op_model_amd.c  |   10 +++++++++-
 arch/x86/oprofile/op_model_p4.c   |   15 ++++++++++++++-
 arch/x86/oprofile/op_model_ppro.c |   10 +++++++++-
 arch/x86/oprofile/op_x86_model.h  |    2 +-
 5 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index c0c21f2..9f001d9 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -357,7 +357,10 @@ static int nmi_setup(void)
 	 */
 
 	/* Assume saved/restored counters are the same on all CPUs */
-	model->fill_in_addresses(&per_cpu(cpu_msrs, 0));
+	err = model->fill_in_addresses(&per_cpu(cpu_msrs, 0));
+	if (err)
+		goto fail;
+
 	for_each_possible_cpu(cpu) {
 		if (!cpu)
 			continue;
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 905995d..128d84c 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -129,7 +129,9 @@ static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,
 
 /* functions for op_amd_spec */
 
-static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
+static void op_amd_shutdown(struct op_msrs const * const msrs);
+
+static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
 {
 	int i;
 
@@ -137,8 +139,14 @@ static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
 		if (reserve_perfctr(i)) {
 			msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
 			msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
+		} else if (counter_config[i].enabled) {
+			op_x86_warn_reserved(i);
+			op_amd_shutdown(msrs);
+			return -EBUSY;
 		}
 	}
+
+	return 0;
 }
 
 static void op_amd_setup_ctrs(struct op_x86_model_spec ...
From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

Use reserve_perfctr()/release_perfctr() in tsc code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/tsc.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ff98ef2..2cdacae 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -871,7 +871,7 @@ static unsigned long __init calibrate_cpu(void)
 	unsigned long flags;
 
 	for (i = 0; i < 4; i++)
-		if (avail_to_resrv_perfctr_nmi_bit(i))
+		if (reserve_perfctr(i))
 			break;
 	no_ctr_free = (i == 4);
 	if (no_ctr_free) {
@@ -881,8 +881,6 @@ static unsigned long __init calibrate_cpu(void)
 		rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
 		wrmsrl(MSR_K7_EVNTSEL3, 0);
 		rdmsrl(MSR_K7_PERFCTR3, pmc3);
-	} else {
-		reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
 	}
 	local_irq_save(flags);
 	/* start measuring cycles, incrementing from 0 */
@@ -900,7 +898,7 @@ static unsigned long __init calibrate_cpu(void)
 		wrmsrl(MSR_K7_PERFCTR3, pmc3);
 		wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
 	} else {
-		release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+		release_perfctr(i);
 	}
 
 	return pmc_now * tsc_khz / (tsc_now - tsc_start);
-- 
1.7.0


--

From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

The function reserve_pmc_hardware() and release_pmc_hardware() were
hard to read. This patch improves readablity of the code by removing
most of the CONFIG_X86_LOCAL_APIC macros.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6531b4b..96b3a98 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -234,9 +234,10 @@ again:
 static atomic_t active_events;
 static DEFINE_MUTEX(pmc_reserve_mutex);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+
 static bool reserve_pmc_hardware(void)
 {
-#ifdef CONFIG_X86_LOCAL_APIC
 	int i;
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
@@ -251,11 +252,9 @@ static bool reserve_pmc_hardware(void)
 		if (!reserve_evntsel_nmi(x86_pmu.eventsel + i))
 			goto eventsel_fail;
 	}
-#endif
 
 	return true;
 
-#ifdef CONFIG_X86_LOCAL_APIC
 eventsel_fail:
 	for (i--; i >= 0; i--)
 		release_evntsel_nmi(x86_pmu.eventsel + i);
@@ -270,12 +269,10 @@ perfctr_fail:
 		enable_lapic_nmi_watchdog();
 
 	return false;
-#endif
 }
 
 static void release_pmc_hardware(void)
 {
-#ifdef CONFIG_X86_LOCAL_APIC
 	int i;
 
 	for (i = 0; i < x86_pmu.num_events; i++) {
@@ -285,9 +282,15 @@ static void release_pmc_hardware(void)
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		enable_lapic_nmi_watchdog();
-#endif
 }
 
+#else
+
+static bool reserve_pmc_hardware(void) { return true; }
+static void release_pmc_hardware(void) {}
+
+#endif
+
 static inline bool bts_available(void)
 {
 	return x86_pmu.enable_bts != NULL;
-- 
1.7.0


--

From: Robert Richter
Date: Thursday, March 4, 2010 - 8:22 am

Use reserve_perfctr()/release_perfctr() in perf code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 62a1a5e..16ae7b4 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -244,7 +244,7 @@ static bool reserve_pmc_hardware(void)
 		disable_lapic_nmi_watchdog();
 
 	for (i = 0; i < x86_pmu.num_events; i++) {
-		if (!reserve_perfctr_nmi(x86_pmu.perfctr + i))
+		if (!reserve_perfctr(i))
 			goto perfctr_fail;
 	}
 
@@ -252,7 +252,7 @@ static bool reserve_pmc_hardware(void)
 
 perfctr_fail:
 	for (i--; i >= 0; i--)
-		release_perfctr_nmi(x86_pmu.perfctr + i);
+		release_perfctr(i);
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		enable_lapic_nmi_watchdog();
@@ -265,7 +265,7 @@ static void release_pmc_hardware(void)
 	int i;
 
 	for (i = 0; i < x86_pmu.num_events; i++)
-		release_perfctr_nmi(x86_pmu.perfctr + i);
+		release_perfctr(i);
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		enable_lapic_nmi_watchdog();
-- 
1.7.0


--

From: Peter Zijlstra
Date: Thursday, March 4, 2010 - 10:59 am

Right, so cleaning up that reservation code is nice, but wouldn't it be
much nicer to simply do away with all that and make everything use the
(low level) perf code?

You expressed interest in both making the oprofile kernel space user the
perf apis as well as making oprofile user space use the perf abi, in
which case the kernel entirely side goes away.



--

From: Peter Zijlstra
Date: Thursday, March 11, 2010 - 4:48 am

Alternatively, could we maybe further simplify this reservation into:

int  reserve_pmu(void);
void release_pmu(void);

And not bother with anything finer grained.

--

From: Ingo Molnar
Date: Thursday, March 11, 2010 - 5:47 am

Yeah, that looks quite a bit simpler.

It's all about making it easier to live with legacies anyway - all modern 
facilities will use perf events to access the PMU.

	Ingo
--

From: Robert Richter
Date: Thursday, March 11, 2010 - 8:45 am

It does not solve the current problem that some parts of the pmu *are*
used simultaneously by different subsystems. But, even if only perf
would be used in the kernel you still can't be sure that all parts of
the pmu are available to be used, you simply don't have it under your
control. So why such limitations as an 'int reserve_pmu(int index)' is
almost the same but provides much more flexibility?

The question already arose if the watchdog would use perf permanently
and thus would block oprofile by making it unusable. The current
reservation code would provide a framework to solves this, sharing
perfctrs with watchdog, perf and oprofile. And, since the pmu becomes
more and more features other than perfctrs, why shouldn't it be

Scheduling events with perf is also 'some sort' of reservation, so
this code could be moved later into perf at all. In this case we also
will have to be able to reserve single counters or features by its
index.

For now, I don't think it is possible to change oprofile to use perf
in a big bang. This will disrupt oprofile users. I want to do the
switch to perf in a series of small changes and patch sets to make
sure, oprofile will not break. And this new reservation code is a step
towards perf.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com

--

Previous thread: [BUG] Linux-2.6.33 to 33-git9 intel_drm: drm_fb_helper panic by Tarkan Erimer on Thursday, March 4, 2010 - 7:34 am. (3 messages)

Next thread: [PATCH] V4L/DVB: mx1-camera: compile fix by =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= on Thursday, March 4, 2010 - 9:54 am. (13 messages)