This patch adds Intel architectural PMU support (version 1, 2, and 3). Signed-off-by: Stephane Eranian <eranian@gmail.com> -- Index: o3/arch/x86/perfmon/Makefile =================================================================== --- o3.orig/arch/x86/perfmon/Makefile 2008-11-25 18:09:47.000000000 +0100 +++ o3/arch/x86/perfmon/Makefile 2008-11-25 18:21:33.000000000 +0100 @@ -3,3 +3,4 @@ # Contributed by Stephane Eranian <eranian@hpl.hp.com> # obj-$(CONFIG_PERFMON) += perfmon.o +obj-$(CONFIG_X86_PERFMON_INTEL_ARCH) += perfmon_intel_arch.o Index: o3/arch/x86/perfmon/perfmon_intel_arch.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ o3/arch/x86/perfmon/perfmon_intel_arch.c 2008-11-25 18:22:00.000000000 +0100 @@ -0,0 +1,628 @@ +/* + * This file contains the Intel architectural perfmon v1, v2, v3 + * description tables. + * + * Architectural perfmon was introduced with Intel Core Solo/Duo + * processors. + * + * Copyright (c) 2006-2007 Hewlett-Packard Development Company, L.P. + * Contributed by Stephane Eranian <eranian@hpl.hp.com> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA + * 02111-1307 USA + */ +#include <linux/kprobes.h> +#include <linux/perfmon_kern.h> +#include <asm/msr.h> +#include <asm/apic.h> + +static u64 enable_mask[PFM_MAX_PMCS]; +static u16 max_enable; +static int ...
in arch/x86/include/asm/intel_arch_perfmon.h we have already:
union cpuid10_eax {
struct {
unsigned int version_id:8;
unsigned int num_counters:8;
unsigned int bit_width:8;
unsigned int mask_length:8;
} split;
unsigned int full;
};
A) static
So we have:
set->used_pmcs and enable_mask and max_enable.
Why can set->used_pmcs contain bits which are not in the enable_mask
in the first place ? Why does the arch code not tell the generic code
which pmcs are available so we can avoid all this mask, weight
whatever magic ?
We store the same information in slightly different incarnations in
various places and then we need to mingle them all together to get to
Why are the counters enabled at all when an overflow is pending, which
stopped the counters anyway ?
Thanks,
tglx
--
Thomas, Ok, some explanations are needed here. Both perfmon_intel_arch.c and perfmon_amd64.c are supposed to be kernel modules. They are hardcoded right now to make the patch simpler. Have PMU description be modules is a key features because it allows adding new processor support without necessarily patch the core kernel or rebooting. This has been working nicely on Itanium. With the introduction of Intel architectural Perfmon (IA32 SDM chapter 18), this becomes possible on Intel X86 as well. on AMD64 processor, this is not really an issue as they all use the same architected PMU, except for family 10h which has nice extensions. In anycase, the idea is to encapsulate as much as possible code related into a PMU model into each module. That is why you are seing some redundancy. There is a difference between enable_mask and used_pmcs. The used_pmcs bitmasks shows all the config registers in use. Whereas enable_mask shows the all config registers which have start/stop capabilities. For the basic AMD64 PMU (4 counters) used_pmcs and enable_mask are equivalent, but that is not the case on Barcelona once we support IBS and sampling. So Because used_pmcs is part of generic code and enable_mask is a x86 construct. As I said above, for now, I could drop enable_mask. The arch code already export the list of available pmcs and pmds in impl_pmcs and impl_pmds. There is a difference between impl_pmcs and used_pmcs. The former list everything that is available. The latter shows what we are currently using. We may be using fewer registers than what is available, and we use this information to avoid saving/restoring MSR on context switch Because on Intel and AMD64, counters are not automatically frozen on interrupt. On Intel X86, they can be configured to do so, but it is an all or nothing setting. I am not using this option because we would then have a problem with the NMI watchdog given that it is also using a counter. --
Understood. If we need that in the near future then it's ok to keep it, it just did not make any sense from the current code. But I think you should do this once when you set up the context and keep that as a separate mask. Right now you evaluate enable_mask and Well, my question was: why do we have to stop the counters when an overflow is pending already ? The overflow pending is set inside of stop_save() and cleared somewhere else. stop_save() is called from pfm_arch_stop() and pfm_arch_ctxswout_thread(). The first thing it does is to disable the counters. Now at some points the counters are obviously reenabled for this context, but why are they reenabled _before_ the pending overflow has been resolved ? For N counters that N * 2 wrmsrl() overhead. Thanks, tglx --
To install the new processor in the first place you have to reboot anyways. Also typically at least for new families (which tend to be the only ones with radically new performance counters) it's typically needed to change some things in the core kernel anyways. So It becomes possible, but without having to use any modules. It should just work. Probably even without it worked -- at least if you limit yourself to family 6 -- because the register layout all stayed the same too. That said having modular PMUs is probably a good thing for distribution kernels, but there is really no need for any code compromises just to avoid a core kernel patch now and then. -Andi --
Well, I need more than eax. We could rewrite this union to include
eax, edx, So I propose we call it union cpuid10 and define it as:
union cpuid_eax {
struct {
unsigned int version_id:8;
unsigned int num_counters:8;
unsigned int bit_width:8;
unsigned int mask_length:8;
} split_eax;
struct {
unsigned int num_counters:5;
unsigned int bit_width:8;
unsigned int reserved:19;
} split_edx;
unsigned int full;
}
--
