Hi, This is a trivial change to fix the cpu_khz value returned when running on a virtualized environment. We have seen instances when the cpu_khz value is off by couple of MHz's when running on VMware's platform on AMD hardware. -- Since the TSC frequency read from hypervisor is accurate for the guest, and since the hypervisor will always clock the vcpu at the TSC frequency, there is no need to calibrate it again. To avoid any calibration errors through calibrate_cpu this patch skips calling calibrate_cpu for kernel running under hypervisors. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: K. Y. Srinivasan <ksrinivasan@novell.com> Cc: Greg KH <greg@kroah.com> Cc: H. Peter Anvin <hpa@zytor.com> Index: linux-x86-tree.git/arch/x86/kernel/tsc.c =================================================================== --- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700 +++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-13 15:07:08.000000000 -0700 @@ -927,7 +927,7 @@ void __init tsc_init(void) } if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !x86_hyper) cpu_khz = calibrate_cpu(); printk("Detected %lu.%03lu MHz processor.\n", --
I'm somewhat reluctant to take this one, since it assumes all the hypervisors act the same. This seems rather inherently wrong. In fact, the whole statement is fishy as heck... instead of being dependent on AMD and so on, this should either be a function pointer or a CPU (mis)feature bit. Can we do this saner? -hpa --
Thanks for taking a look. In any case, I agree that my previous patch did assume all hypervisors to be same, which might be wrong. How about using the X86_FEATURE_TSC_RELIABLE bit for this too ? i.e. Skip cpu_calibrate call if TSC_RELIABLE bit is set. As of now that bit is set for vmware only. Something like the below. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: H. Peter Anvin <hpa@zytor.com> Index: linux-x86-tree.git/arch/x86/kernel/tsc.c =================================================================== --- linux-x86-tree.git.orig/arch/x86/kernel/tsc.c 2010-08-03 12:21:20.000000000 -0700 +++ linux-x86-tree.git/arch/x86/kernel/tsc.c 2010-08-16 21:59:32.000000000 -0700 @@ -927,7 +927,8 @@ void __init tsc_init(void) } if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && + !(cpu_has(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE))) cpu_khz = calibrate_cpu(); printk("Detected %lu.%03lu MHz processor.\n", --
I know it was... and calibrate_cpu() seems to be an AMD-specific function, but that's rather crappy. I'm thinking that perhaps we should make it an x86_init function, then the AMD CPU detection can install it That seems like a much better approach. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
From: "H. Peter Anvin" <hpa@zytor.com>
Btw, can we revisit this AMD-specific issue? IIUC, Alok you're seeing
a mismatch between the calibrated TSC value and the cpu frequency even
on cpus which have the CONSTANT_TSC bit set, i.e. their TSC is counting
with P0 frequency. Can you please elaborate more on what systems you're
seeing this (cpu family, chipset, etc)?
Thanks.
--
Regards/Gruss,
Boris.
--
Hi Borislav, We have seen these issues when running inside a Virtual Machine on VMware's platform. Please look at the vmware_set_cpu_features function, it relies on the hypervisor to provide a constant/reliable TSC. Though still when running the kernel on virtual cpus, as compared to running on physical cpus, the timing characteristics are different, since virtual cpus have to time share physical cpus with each other, which may result in errors during calibration. As a result its better to get these values directly from the hypervisor rather than trying to calibrate them. And just to clarify, we have never seen this on a physical machine. Thanks, --
From: Alok Kataria <akataria@vmware.com>
Date: Tue, Aug 17, 2010 at 09:45:32AM -0700
Thanks for clarifying this. I kinda stumbled over "AMD hardware" in your
Anyway, your patch got me looking into the code and currently we do
calibrate_cpu() on all AMD systems with invariant TSC. However, machines
starting with F10h revB have a setting in MSRC001_0015[24] which, when
set, denotes that the TSC counts with P0 frequency and thus equals the
core frequency. So for those machines we won't need to calibrate the cpu
frequency. I'm figuring out some additional details internally and will
be sending a fix soon.
Thanks.
--
Regards/Gruss,
Boris.
--
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, the TSC increment rate is
denoted by MSRC001_0015[24] and when this bit is set (which is normally
done by the BIOS,) the TSC increments with the P0 frequency so the
calibration is not needed and booting can be a couple of mcecs faster on
those machines.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/tsc.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..41b2b8b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -927,8 +927,18 @@ void __init tsc_init(void)
}
if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) {
+
+ if (boot_cpu_data.x86 > 0x10 ||
+ (boot_cpu_data.x86 == 0x10 &&
+ boot_cpu_data.x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ cpu_khz = calibrate_cpu();
+ }
+ }
printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
--
1.7.1
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
--
Yuck! There are a number of problems with this code, quite a few of which are, of course, pre-existing, but this makes it worse. calibrate_cpu() is AMD-specific, despite the generic name. (It is also, strangely enough, only compiled on 64 bits for some reason???) Either which way, it is definitely not okay for the test for when the code applies to be this distant from the code itself. The easy way to fix this is to rename it amd_calibrate_cpu() and move the applicability test into the routine itself. That is probably okay as long as there are no other users. However, if there are other users, then this really should move into x86_init and have a function pointer associated with it. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
From: "H. Peter Anvin" <hpa@zytor.com>
Yeah, this is ancient stuff, who knows. I will test the fix on 32-bit
Right, do you have strong preferences between x86_init and x86_platform?
The version below uses x86_platform because it has the calibrate_tsc()
function in there too. Also, the version below nicely moves all that
AMD-specific code to cpu/amd.c.
I didn't opt for a calibrate_cpu_noop stub because I didn't want to
pollute x86_init.c with yet another noop prototype. But I guess I should
do that since the pointer testing is still executed while stubs are
removed completely by smart compilers :).
Anyway, the version below builds, I'll test tomorrow and send an
official version then:
--
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index baa579c..f00ed28 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -146,6 +146,7 @@ struct x86_cpuinit_ops {
*/
struct x86_platform_ops {
unsigned long (*calibrate_tsc)(void);
+ unsigned long (*calibrate_cpu)(void);
unsigned long (*get_wallclock)(void);
int (*set_wallclock)(unsigned long nowtime);
void (*iommu_shutdown)(void);
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 60a57b1..6478bd5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/apic.h>
#include <asm/cpu.h>
+#include <asm/nmi.h>
#include <asm/pci-direct.h>
#ifdef CONFIG_X86_64
@@ -381,6 +382,56 @@ static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
#endif
}
+/*
+ * calibrate_cpu is used on systems with fixed rate TSCs to determine
+ * processor frequency
+ */
+#define TICK_COUNT 100000000
+unsigned long __init amd_calibrate_cpu(void)
+{
+ int tsc_start, tsc_now;
+ int i, no_ctr_free;
+ unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
+ unsigned long flags;
+
+ for (i = 0; i < 4; i++)
+ if ...Don't think it matters much, but tglx might have an opinion. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
Hi Boris, Thanks for doing this. I already had a patch ready doing just this, though this change looks much nicer since you have moved all the code to the amd file. Guess I just have to wait for you to do the noop stuff if you are doing that. Once the patch is completed I can then just initialize this func ptr to the noop routine when on VMware's platform. --
From: Alok Kataria <akataria@vmware.com> Date: Wed, Aug 18, 2010 at 01:51:35PM -0400 yeah, actually I was on the verge of completely removing that calibrate_cpu code since maybe 99% of the systems out there should have a TSC counting with the P0 rate so the issue becomes moot and no need for fixing. But according to Murphy, there'll be this one system which needs it and thus we have to keep it, unfortunately. Anyway, I'll keep you posted. Thanks. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 --
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, the TSC increment rate is
denoted by MSRC001_0015[24] and when this bit is set (which is normally
done by the BIOS,) the TSC increments with the P0 frequency so the
calibration is not needed and booting can be a couple of mcecs faster on
those machines.
While at it, make the code work on 32-bit. In addition, use the 4th
perfctr since using perfctr 0 might clash with perfctr-watchdog.c during
LAPIC init. Finally, warn about wrongly calibrated value in the most
seldom cases when the core TSC is not incrementing with P0 frequency.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
Here's the new version, had to change quite a lot and check all families
first.
@Alok, I think in your case you will want to do
x86_cpuinit.calibrate_cpu = NULL;
since this means no need to recalibrate.
Sorry for the delay.
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/kernel/cpu/amd.c | 74 +++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/tsc.c | 65 ++++------------------------------
3 files changed, 83 insertions(+), 57 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index baa579c..c63ab76 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -133,6 +133,7 @@ struct x86_init_ops {
*/
struct x86_cpuinit_ops {
void (*setup_percpu_clockev)(void);
+ unsigned long (*calibrate_cpu)(void);
};
/**
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ba5f62f..236bcff 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/apic.h>
#include <asm/cpu.h>
+#include <asm/nmi.h>
#include <asm/pci-direct.h>
#ifdef CONFIG_X86_64
@@ -381,6 +382,62 @@ static void __cpuinit ...Hi Borislav, Thanks for working on this change.. Right, I will send that patch once this is committed. --
Build failure: /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c: In function ‘amd_calibrate_cpu’: /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:397: error: implicit declaration of function ‘avail_to_resrv_perfctr_nmi_bit’ /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:409: error: implicit declaration of function ‘reserve_perfctr_nmi’ /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:410: error: implicit declaration of function ‘reserve_evntsel_nmi’ /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:429: error: implicit declaration of function ‘release_perfctr_nmi’ /home/hpa/kernel/linux-2.6-tip.cpu/arch/x86/kernel/cpu/amd.c:430: error: implicit declaration of function ‘release_evntsel_nmi’ Reproducible by doing "make ARCH=i386 allnoconfig". -hpa --
From: "H. Peter Anvin" <hpa@zytor.com> Sh*t, I can't catch a break with that Kconfig dependency stuff, can I? This happens because perfctr-watchdog.c gets pulled in by CONFIG_X86_LOCAL_APIC which is, _of course_, not selected in an allnoconfig build. Fixing this would mean exporting all that perfcounter reservation functionality for the allnoconfig case, which is of course doable but I'm starting to question the need for recalibrating the TSC at all: I mean, in the 99% of the cases MSRC001_0015[24] should be set by the BIOS and if not then the BIOS which does that is pretty b0rked anyway. So I'm thinking of removing the recalibration code and simply warning the user instead, for the 1% case. Andreas, what do you think? -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 --
I opt for removing the recalibration code plus keeping a FIRMWARE_WARN for borked BIOSes (just in case that there are any old systems with the wrong setting). Andreas -- Operating | Advanced Micro Devices GmbH System | Einsteinring 24, 85609 Dornach b. München, Germany Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München (OSRC) | Registergericht München, HRB Nr. 43632 --
... and checking the HWCR MSR and issuing the firmware warning should only happen if not running on a hypervisor. (Validate whether CPU has X86_FEATURE_HYPERVISOR bit set or not.) Andreas -- Operating | Advanced Micro Devices GmbH System | Einsteinring 24, 85609 Dornach b. München, Germany Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München (OSRC) | Registergericht München, HRB Nr. 43632 --
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, however, the TSC increment
rate is denoted by MSRC001_0015[24] and when this bit is set (which
should be done by the BIOS) the TSC increments with the P0 frequency
so the calibration is not needed and booting can be a couple of mcecs
faster on those machines.
Besides, there should be virtually no machines out there which don't
have this bit set, therefore this calibration can be safely removed. It
is a shaky hack anyway since it assumes implicitly that the core is in
P0 when BIOS hands off to the OS, which might not always be the case.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/amd.c | 17 +++++++++++++
arch/x86/kernel/tsc.c | 58 ---------------------------------------------
2 files changed, 17 insertions(+), 58 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ba5f62f..fc563fa 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -412,6 +412,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /* We need to do the following only once */
+ if (c != &boot_cpu_data)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce8e502..13b6a6c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -854,60 +854,6 @@ static void __init init_tsc_clocksource(void)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}
-#ifdef ...Commit-ID: acf01734b1747b1ec4be6f159aff579ea5f7f8e2 Gitweb: http://git.kernel.org/tip/acf01734b1747b1ec4be6f159aff579ea5f7f8e2 Author: Borislav Petkov <bp@amd64.org> AuthorDate: Wed, 25 Aug 2010 18:28:23 +0200 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Wed, 25 Aug 2010 13:32:52 -0700 x86, tsc: Remove CPU frequency calibration on AMD 6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency calibration code for AMD CPUs whose TSCs didn't increment with the core's P0 frequency. From F10h, revB onward, however, the TSC increment rate is denoted by MSRC001_0015[24] and when this bit is set (which should be done by the BIOS) the TSC increments with the P0 frequency so the calibration is not needed and booting can be a couple of mcecs faster on those machines. Besides, there should be virtually no machines out there which don't have this bit set, therefore this calibration can be safely removed. It is a shaky hack anyway since it assumes implicitly that the core is in P0 when BIOS hands off to the OS, which might not always be the case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100825162823.GE26438@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> --- arch/x86/kernel/cpu/amd.c | 17 +++++++++++++ arch/x86/kernel/tsc.c | 58 --------------------------------------------- 2 files changed, 17 insertions(+), 58 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index ba5f62f..fc563fa 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -412,6 +412,23 @@ static void __cpuinit early_init_amd(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_EXTD_APICID); } #endif + + /* We need to do the following only once */ + if (c != &boot_cpu_data) + return; + + if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) { + + if (c->x86 > 0x10 || + (c->x86 == 0x10 && c->x86_model >= 0x2)) { + u64 val; + + rdmsrl(MSR_K7_HWCR, val); + if ...
Nice... this works for us too, we don't muck with that MSR bit either, its directly passed as is from the h/w to the guest. So no additional changes would be needed for us with this. Hope that the 3rd time is a charm for you too :) Thanks, --
From: Alok Kataria <akataria@vmware.com> That's nice, KVM appears to not hit it either due to unsynchronized Yeah, I think it is :). Sorry for taking so long but removing code which is actively executed from the kernel is not such a light decision. But the hw guys made sure that this bit is always set so we don't need the calibration. It wouldn't work in all cases anyway (hint: boosted cores). Thanks. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 --
Very cool. This was brought up on an earlier thread and is really nice because it also avoids the 50+ppm variability easily seen in the calibrate results boot to boot. That variance causes difficulty keeping close NTP sync across reboots, as the persistent drift value is invalid after a reboot. I recently proposed a timer based solution that doesn't block bootup, and allows for very accurate calibration. This might help fill the gaps on legacy systems that do not provide TSC freq information. I'd be interested if you had any comments. https://kerneltrap.org/mailarchive/linux-kernel/2010/7/28/4598868 Am I just missing the point in the code where you actually use the msr value to assign cpu_khz? Or is the idea that the default tsc calibration already is correct, and we don't need to further refine it? And yea, the calibrate_cpu function needs to be renamed. thanks -john --
From: john stultz <johnstul@us.ibm.com> Date: Thu, Aug 19, 2010 at 02:47:35PM -0400 Yes. Actually Alok brought the code to my attention originally, and what puzzled me was the fact that we do calibrate_cpu() on all F10h and later AMD machines needlessly (well, almost all, maybe 99%). This is because, on F10h, revB machines and later we support "TSC increments with P0 frequency" so what native_calibrate_tsc() comes up with in terms of tsc_khz should be the same as cpu_khz. So the MSR bit check above is to see whether the TSC increments with P0 frequency and call calibrate_cpu only if _not_. As a result, this patch basically drops the additional cpu_khz calibration when it isn't needed. And as such it doesn't have a whole lot to do with the actual TSC calibration - this is up to you guys :). The original reason for the calibrate_cpu() is that there's the possibility for the TSC to count with the northbridge clockrate and there we need to recalibrate obviously. Done, the whole code is moved to cpu/amd.c anyway. I'm currently testing the new version and will be sending out maybe tomorrow or so. Thanks. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 --
Hi HPA, I am planning to add a calibrate_apic function ptr in x86_platform_ops, for getting the APIC frequency too directly from the hypervisor. If you want I can add this calibrate_cpu function ptr too or is the patch below okay for now ? Thanks, --
