Hi hpa, ingo, and tglx, The following changes are based on commit: d94e93d495514c69d4a7a553c0938cd777267e5d Changes include addition of vitual RTC driver and many clockevent tweaks in order to support Medfield (next generation) in the same binary kernel image. We consolidated tsc and lapic calibration into platform specific routine under x86_init for MRST. There will be a follow-up RFC patch does the similar consolidation for standard PC with HPET. *** BLURB HERE *** Feng Tang (2): x86/platform: add a wallclock_init func to x86_platforms ops x86/mrst: add vrtc driver which serves as a wall clock device Jacob Pan (6): x86: avoid check hlt if no timer interrupts x86/mrst/pci: return 0 for non-present pci bars x86/apic: allow use of lapic timer early calibration result x86/mrst: change clock selection logic to support medfield x86/apbt: support more timer configurations on mrst x86/mrst: Add nop functions to x86_init mpparse functions arch/x86/include/asm/apb_timer.h | 2 +- arch/x86/include/asm/bugs.h | 1 + arch/x86/include/asm/fixmap.h | 4 + arch/x86/include/asm/mrst.h | 30 ++++++++ arch/x86/include/asm/vrtc.h | 27 +++++++ arch/x86/include/asm/x86_init.h | 2 + arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/apb_timer.c | 18 +++-- arch/x86/kernel/apic/apic.c | 21 +++++- arch/x86/kernel/cpu/bugs.c | 4 + arch/x86/kernel/mrst.c | 143 ++++++++++++++++++++++++++++++++++--- arch/x86/kernel/setup.c | 2 + arch/x86/kernel/vrtc.c | 100 ++++++++++++++++++++++++++ arch/x86/kernel/x86_init.c | 2 + arch/x86/pci/mrst.c | 2 +- 15 files changed, 336 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/vrtc.h create mode 100644 arch/x86/kernel/vrtc.c --
From: Jacob Pan <jacob.jun.pan@intel.com>
check hlt requires external timer interrupt to wake up the
cpu. for platforms equipped with only per cpu timers, we don't
have external interrupts during check hlt.
this patch checks such condition to avoid kernel hang at hlt.
it also saves boot time in that hlt four times in the current code
requires four ticks to break out of them.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/bugs.h | 1 +
arch/x86/kernel/cpu/bugs.c | 4 ++++
2 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/bugs.h b/arch/x86/include/asm/bugs.h
index 08abf63..1e0cef8 100644
--- a/arch/x86/include/asm/bugs.h
+++ b/arch/x86/include/asm/bugs.h
@@ -2,6 +2,7 @@
#define _ASM_X86_BUGS_H
extern void check_bugs(void);
+extern struct clock_event_device *global_clock_event;
#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_X86_32)
int ppro_with_ram_bug(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 01a2652..c0d9688 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -90,6 +90,10 @@ static void __init check_hlt(void)
return;
printk(KERN_INFO "Checking 'hlt' instruction... ");
+ if (!global_clock_event) {
+ printk(KERN_CONT "no clockevent to wake up, skipped.\n");
+ return;
+ }
if (!boot_cpu_data.hlt_works_ok) {
printk("disabled\n");
return;
--
1.6.3.3
--
I really wish I knew the exact systems affected by the HLT bug. If I remember correctly, it was some 386 systems -- or possibly 486 systems as well -- a very long time ago. This test just provides a diagnosis if the system really is bad (it hangs with an obvious message) at the cost of some 40 ms to the system boot time. I suspect C1 (HLT) being broken is not anywhere close to the predominant power management problem in the current day, and as such I'm wondering if this particular test hasn't outlived its usefulness. Thoughts? -hpa --
we could at least hide it behind the "don't run on pentium or newer" config options.. --
I'd be cool skipping it for family 5 or newer. I'm just wondering if we should kill it completely -- IIRC it was only a handful of 386/486 systems which had problems, usually due to marginal power supplies which couldn't handle the noise of a variable load (DOS not having any power management would run at a reliable 100% load) -- that's not exactly the type of systems which would have survived to modern day. -hpa --
Also SMM and hardware bugs on some platforms - Cyrix MediaGX 5510 for example where a hlt at the wrong moment during ATA transfers hung the box until power cycle. But all old old stuff. Alan --
I think family < 5 seems a reasonable cutoff. Note that the ATA transfer bug you describe above would not be caught by the existing check. -hpa --
On Fri, 07 May 2010 15:27:34 -0700 MediaGX5510 would I'm pretty certain be 486 reporting anyway --
but a boot time "does hlt work at all" check won't catch that. --
Ack. That would take care of all relevant machines. Linus --
Sounds like a plan. Jacob, do you want to submit a new patch (bypassing this check if boot_cpu_info.x86 >= 5)? -hpa --
Just sent out the updated patch. I guess you meant boot_cpu_data instead of boot_cpu_info. --
From: Jacob Pan <jacob.jun.pan@intel.com>
lapic timer calibration can be combined with tsc in platform specific
calibration functions. if such calibration result is obtained early,
we can skip the redundent calibration loops.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/kernel/apic/apic.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e5a4a1e..8ef56ac 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -175,7 +175,7 @@ static struct resource lapic_resource = {
.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
};
-static unsigned int calibration_result;
+unsigned int calibration_result;
static int lapic_next_event(unsigned long delta,
struct clock_event_device *evt);
@@ -597,6 +597,25 @@ static int __init calibrate_APIC_clock(void)
long delta, deltatsc;
int pm_referenced = 0;
+ /**
+ * check if lapic timer has already been calibrated by platform
+ * specific routine, such as tsc calibration code. if so, we just fill
+ * in the clockevent structure and return.
+ */
+
+ if (calibration_result) {
+ apic_printk(APIC_VERBOSE, "lapic timer already calibrated %d\n",
+ calibration_result);
+ lapic_clockevent.mult = div_sc(calibration_result/APIC_DIVISOR,
+ TICK_NSEC, lapic_clockevent.shift);
+ lapic_clockevent.max_delta_ns =
+ clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
+ lapic_clockevent.min_delta_ns =
+ clockevent_delta2ns(0xF, &lapic_clockevent);
+ lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+ return 0;
+ }
+
local_irq_disable();
/* Replace the global interrupt handler */
--
1.6.3.3
--
I'd rather move lapic calibration into TSC calibration in general as we do the same thing twice for no good reason. Aside of my general objection it'd be not a good idea to make this global w/o renaming it to something sensible like --
I am trying to avoid the risks of completely remove the current lapic calibration code since there are so many platforms with different timer options. And I don't understand things like why pm timer is preferred. perhaps, the calibration data can directly be assigned to lapic timer clock_event_device.mult? There is no need for the device specific result --
Jacob, We do not have access to the clocksource at that point. But we do a calibration loop for TSC and for lapic timer. There is no reason why No. That needs an accessor function. Thanks, tglx --
I have a simple patch that uses HPET to calibrate both tsc and lapic timers in one loop in native_calibrate_tsc(). Will send out for RFC. The question I have is the reference clock used for calibrating those local timers, PIT, HPET, PM timer how should they be ranked. Can we make those known freq clocksource devices available at this point so that we can use the clocksource Agreed, so you are ok with bypassing the existing calibration_result variable and use .mult to carry the result? --
Personally I'd rank the PMTMR first, then HPET, then PIT, just based on the relative complexity and relative known bugginess of the various implementations. The PMTMRs main defect is that it can't generate an interrupt; it's just a dumb counter. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. --
From: Jacob Pan <jacob.jun.pan@intel.com> Penwell has added always on lapic timers and tsc, we want to treat it as a variant of moorestown so that one binary kernel can boot on both. this patch added clock selction logic so that platform code can set up the optimal configuration for both moorestown and medfield. This patch will also mark Penwell TSC reliable, thus no need for watchdog clocksource to monitor it. Signed-off-by: Alek Du <alek.du@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> --- arch/x86/include/asm/mrst.h | 30 +++++++++++ arch/x86/kernel/mrst.c | 119 ++++++++++++++++++++++++++++++++++++++---- 2 files changed, 137 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h index 451d30e..3054407 100644 --- a/arch/x86/include/asm/mrst.h +++ b/arch/x86/include/asm/mrst.h @@ -11,7 +11,37 @@ #ifndef _ASM_X86_MRST_H #define _ASM_X86_MRST_H extern int pci_mrst_init(void); +extern unsigned int calibration_result; + int __init sfi_parse_mrtc(struct sfi_table_header *table); +extern int mrst_timer_options __cpuinitdata; +extern int mrst_identify_cpu(void); + +/** + * Medfield is the follow-up of Moorestown, it combines two chip solution into + * one. Other than that it also added always-on and constant tsc and lapic + * timers. Medfield is the platform name, and the chip name is called Penwell + * we treat Medfield/Penwell as a variant of Moorestown. Penwell can be + * identified via MSRs. + */ + + +#define MRST_CPU_CHIP_LINCROFT 1 +#define MRST_CPU_CHIP_PENWELL 2 + +#define PENWELL_FAMILY 0x20670 + +/** + * Penwell uses spread spectrum clock, so the freq number is not exactly + * as reported by MSR. + */ +#define PENWELL_FSB_FREQ_83SKU 83200 +#define PENWELL_FSB_FREQ_100SKU 99840 + +#define MRST_TIMER_DEFAULT 0 +#define MRST_TIMER_APBT_ONLY 1 +#define MRST_TIMER_LAPIC_APBT 2 + #define ...
apbt sucks performance wise, so why do you consider it a better This is not Penwell specific at all. The ratio can be read out on all Core based CPUs either from MSR_PLATFORM_ID[12:8] or MSR_PERF_STAT[44:40] depending on XE operation enabled (MSR_PERF_STAT[31] == 1) This should be made general available and not burried into the mrst I guess the 111 is Penwell/MRST specific, right ? According to SDM we have anyway different results for the various CPU families, but we should utilize this in a generic way and have the Yikes. From which Intel cookbook is this ? Aside of that we have that info in boot_cpu_info already, don't we ? So there is neither a requirement for the extra cpuid call nor for Are we going to add one of those for each new family ? This is --
From the driver side we need this value determined and visible to the drivers because some things like the IPC interface need to know rather than getting the cpuid magic replicated around. --
Ok. That makes sense. Is there more what MRST drivers need to know ? Thanks, tglx --
On Tue, 11 May 2010 17:50:28 +0200 (CEST) At some point they will need to know what platform they are booting on, I have a possible patch for that which just exposes the boot loader value but I'm not sure what the final direction on that will be. For CPU they need to know which CPU type as it affects some bits like the message formats in use. Probably that means the mrst_cpu_type enum or function should have an enum value of 0 which means "Not a moorestown device". --
apbt is always-on, I guess depends on the load, it can be better than
having broadcast timers. e.g. if there are frequency transitions
between C0 to deep C states. if we are always in c0, I can easily see
native performance impact with per cpu apbt. I don't have power number
to backup either cases. ftrace shows programming lapic timer is quite
expensive, I don't understand.
1) | clockevents_program_event() {
1) | lapic_next_event() {
1) 2.947 us | native_apic_mem_write();
1) 8.682 us | }
1) + 14.676 us | }
0) | clockevents_program_event() {
0) 4.146 us | apbt_next_event();
it can be more straightforward if we don't allow user cmdline
initially, I thought we need this before boot_cpu_data is initialized.
But we actually don't need it early. I will fix that.
--
Hi tglx,
Please help to review this patch, it is against the latest patches Jacob sent out:
Basically the idea is to put bus_ratio and fsb in cpuinfo_x86 structure, and the
CPU early_init_intel function will fill the info.
From 5ae648b2f18778df4eb3f1916a98971332482544 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Fri, 14 May 2010 13:45:46 -0700
Subject: [PATCH 1/2] x86/mrst: Auto detect freq for local timers
Some Intel CPUs can directly get fsb frequency and bus ratio from
various MSRs. This patch enables this feature and benefit Medfield
platform.
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/processor.h | 3 +++
arch/x86/kernel/cpu/intel.c | 34 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/mrst.c | 32 ++++++++++++++++++++++++--------
3 files changed, 61 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 32428b4..f72107f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -94,6 +94,9 @@ struct cpuinfo_x86 {
int x86_cache_alignment; /* In bytes */
int x86_power;
unsigned long loops_per_jiffy;
+ /* support TSC and LAPIC non-calibartion way */
+ __u32 bus_ratio;
+ __u32 fsb; /* In khz */
#ifdef CONFIG_SMP
/* cpus sharing the last level cache: */
cpumask_var_t llc_shared_map;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 85f69cd..f620abc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -27,6 +27,38 @@
#include <asm/apic.h>
#endif
+/* MSR_FSB_FREQ in Khz */
+const u32 fsb[] = {266667, 133333, 200000, 166667, 333333, 100000, 400000, 0};
+/* detect Intel cpus that can do TSC/LAPIC non-calibration way */
+static void __cpuinit intel_tsc_fsb(struct cpuinfo_x86 *c)
+{
+ u32 lo, hi;
+
+ if (c->x86 != 6)
+ return;
+ if (c->x86_model != ...Sorry, send again, the previous patch should be merged with the patch that
export lapic_timer_frequency and allow lapic use early calibration result.
Otherwise, it will break the build...
From 4103dc6165530a490c12a6177d916f2c2e012436 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Fri, 14 May 2010 13:45:46 -0700
Subject: [PATCH] x86/mrst: Auto detect freq for local timers
Some Intel CPUs can directly get fsb frequency and bus ratio from
various MSRs. This patch enables this feature and benefit Medfield
platform.
lapic timer calibration can be combined with tsc in platform specific
calibration functions. if such calibration result is obtained early,
we can skip the redundent calibration loops.
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/apic.h | 1 +
arch/x86/include/asm/processor.h | 3 +++
arch/x86/kernel/apic/apic.c | 33 ++++++++++++++++++++++++++-------
arch/x86/kernel/cpu/intel.c | 34 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/mrst.c | 32 ++++++++++++++++++++++++--------
5 files changed, 88 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1fa03e0..448c763 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -50,6 +50,7 @@ extern unsigned int apic_verbosity;
extern int local_apic_timer_c2_ok;
extern int disable_apic;
+extern unsigned int lapic_timer_frequency;
#ifdef CONFIG_SMP
extern void __inquire_remote_apic(int apicid);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 32428b4..f72107f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -94,6 +94,9 @@ struct cpuinfo_x86 {
int x86_cache_alignment; /* In bytes */
int x86_power;
unsigned long loops_per_jiffy;
+ /* support TSC and LAPIC non-calibartion way ...From: Feng Tang <feng.tang@intel.com>
Moorestown platform doesn't have a m146818 RTC device like traditional
x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides
some basic RTC functions like get/set time. vrtc serves as the only
wall clock device on Moorestown platform.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/fixmap.h | 4 ++
arch/x86/include/asm/vrtc.h | 27 +++++++++++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/mrst.c | 20 ++++++++
arch/x86/kernel/vrtc.c | 100 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 152 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/include/asm/vrtc.h
create mode 100644 arch/x86/kernel/vrtc.c
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d07b44f..c4ba8d9 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -117,6 +117,10 @@ enum fixed_addresses {
FIX_TEXT_POKE1, /* reserve 2 pages for text_poke() */
FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
__end_of_permanent_fixed_addresses,
+
+#ifdef CONFIG_X86_MRST
+ FIX_LNW_VRTC,
+#endif
/*
* 256 temporary boot-time mappings, used by early_ioremap(),
* before ioremap() is functional.
diff --git a/arch/x86/include/asm/vrtc.h b/arch/x86/include/asm/vrtc.h
new file mode 100644
index 0000000..4e40129
--- /dev/null
+++ b/arch/x86/include/asm/vrtc.h
@@ -0,0 +1,27 @@
+#ifndef _MRST_VRTC_H
+#define _MRST_VRTC_H
+
+#ifdef CONFIG_X86_MRST
+extern unsigned char vrtc_cmos_read(unsigned char reg);
+extern void vrtc_cmos_write(unsigned char val, unsigned char reg);
+extern unsigned long vrtc_get_time(void);
+extern int vrtc_set_mmss(unsigned long nowtime);
+extern void vrtc_set_base(void __iomem *base);
+
+#define MRST_VRTC_PGOFFSET (0xc00)
+
+#else
+static inline unsigned char vrtc_cmos_read(unsigned char reg)
+{
+ return ...Even though many of the rtc drivers print this way, it seems a very backwards way of presenting time to me. I think "yyyy-mm-dd hh:mm:ss" is a better representation, more compact and readable. Also KERN_INFO may not be good, maybe KERN_DEBUG or pr_debug? --
On Fri, 07 May 2010 11:51:06 -0700 Consistency is really more important here IMHO - lots of drivers have set an existing policy. --
(added Alessandro Zummo to cc's) look at drivers/rtc. All of them seem to use a templated copy/paste dev_dbg, which seems to point to a use for a possible common rtc_util.c or some such where this could be standardized. Is there somewhere else this style is used? --
Probably the right thing to do is to (a) move all this printing to common code; (b) change to ISO 8601 format. -hpa --
On Sat, 8 May 2010 03:56:09 +0800 Thank you all for the comments. please review this follow-on patch. - Feng -- diff --git a/arch/x86/kernel/vrtc.c b/arch/x86/kernel/vrtc.c index fa1eb63..0fbd74d 100644 --- a/arch/x86/kernel/vrtc.c +++ b/arch/x86/kernel/vrtc.c @@ -74,8 +74,8 @@ unsigned long vrtc_get_time(void) /* vRTC YEAR reg contains the offset to 1960 */ year += 1960; - printk(KERN_INFO "vRTC: sec: %d min: %d hour: %d day: %d " - "mon: %d year: %d\n", sec, min, hour, mday, mon, year); + pr_debug("vrtc time: %4d-%02d-%02d %02d:%02d:%02d\n", + year, mon, mday, hour, min, sec); return mktime(year, mon, mday, hour, min, sec); } --
That doesn't move it to common code, though. I'd rather see existing common style used, then merged (centralized) and *then* the style corrected. -hpa --
On Tue, 11 May 2010 02:22:15 +0800 Hi Peter, The reason I didn't move it to rtc common code is, this vrtc.c sits in arch/x86/kernel/ and better not to depend on drivers/rtc, as drivers/rtc may not be always enabled in kernel configuration. I also have another general driver for vrtc which will be in drivers/rtc/, just like general x86 kernel which has a rtc.c in arch/ and a rtc-cmos.c in drivers/rtc, I will clean my code up and try to move these funcs to a common code. Thanks, Feng --
Errm. That's a MRST specific header and nothing outside of MRST is using it. So why the #ifdef CONFIG_X86_MRST and the inline functions Why do we need a fixmap for that ? There is no need to setup RTC that early. The first call is from timekeeping_init() This lock_cmos magic should just die. I have no idea why something Please remove the debug noise Thanks, tglx --
Hi Thomas, Thanks for the great comments! On Tue, 11 May 2010 22:57:44 +0800 My bad not mentioning there is another rtc-mrst.c which will sit in drivers/rtc, it will use some of the functions listed here. It will be posted later. vrtc.c/rtc-mrst.c is similar with the rtc.c/rtc-cmos.c in general x86 PCs, as drivers/rtc may not always be enabled in kernel, vrtc.c need sit in arch/x86 to provide the get/set_time service, while rtc-mrst.c will serve general rtc Actually when to init the vrtc register is a big problem for me, vrtc need be inited before timekeeping_init(), and I thought better to put it somewhere in setup_arch(), as it is architecture specific, and ioremap is not working at that time. Also that's the reason I created a new wallclock_init func for x86_platforms, I could not find a better way I agree I should move this init code to vrtc.c, but still think it should I will try to reuse the rtc_lock defined in rtc.c whose get/set_time Will make it a pr_debug. Thanks, Feng --
Feng, [ Cc'ed John ] There is no particular reason why we need to read it in timekeeping_init(). Nothing in the kernel needs the correct wall time at that point. So we can safely move the setting of xtime to rtc wall clock time to a separate timekeeping_late_init() function. Please don't create artifical dependencies. Use a separate vrtc_lock to serialize the access to vrtc. Thanks, tglx --
Hi Thomas, On Mon, 17 May 2010 17:15:55 +0800 Yeah, good suggestion, if xtime init is moved to a later time in kernel I just checked the code, when wall clock's get/set_time service is called, it is always protected by rtc_lock(code in arch/x86/kernel/rtc.c), then no need to add the lock for each individual register read/write operation. I will submit a v2 vrtc patch. Thanks, --
John was away yesterday, so we have to wait for his answer, but I don't Grr. Yes, missed that the code is called under rtc_lock already. So you can drop the locking in your code completely. Thanks, tglx --
No big objections here. Still would like to keep the amount of time that the kernel is up without xtime being initialized to a minimum. However the generic RTC code already have this issue since some of them require interrupts to be enabled to do a read, so pushing it off into a _late_init() function is probably just a short term fix until we figure out how to get the generic RTC code working better with the timekeeping code. Does the delayed init required by vrtc cause any trouble with suspend/resume? thanks -john --
Why? Nothing _IS_ depending on xtime at this point. We just should have it ready when we mount the first file system, but even that is Well, you need to wait until the driver is loaded and the hardware accessible which might take some time when I2C, SPI or similar stuff is involved. No way to get this stuff accesible early. So really the question is, why do we want wall time initialized early? All we care about until we hit user space is clock monotonic and working timekeeping in general. NTP is not an issue either before we No, suspend/resume does not go through that early code. It has everything setup (mappings, etc) Thanks, tglx --
From 2451f205e29be57b30f0c50759c1b05a67bb97d1 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Wed, 19 May 2010 16:57:20 +0800
Subject: [PATCH 1/3] timekeeping: moving xtime's init to a later time
xtime doesn't need to be inited early, move it to a subsys_initcall,
as most of its consumers come from userspace. It will also give enough
time for some MMIO based wallclock devices to init
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
kernel/time/timekeeping.c | 28 +++++++++++++++++++++-------
1 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 39f6177..7bc32de 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -529,26 +529,37 @@ void __attribute__((weak)) read_boot_clock(struct timespec *ts)
}
/*
- * timekeeping_init - Initializes the clocksource and common timekeeping values
+ * timekeeping_init - Initializes the clocksource
*/
void __init timekeeping_init(void)
{
struct clocksource *clock;
unsigned long flags;
- struct timespec now, boot;
-
- read_persistent_clock(&now);
- read_boot_clock(&boot);
write_seqlock_irqsave(&xtime_lock, flags);
- ntp_init();
-
clock = clocksource_default_clock();
if (clock->enable)
clock->enable(clock);
timekeeper_setup_internals(clock);
+ ntp_init();
+ write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+/*
+ * timekeeping_late_init - Initaizes the common timekeeping values
+ */
+static int __init timekeeping_late_init(void)
+{
+ unsigned long flags;
+ struct timespec now, boot;
+
+ read_persistent_clock(&now);
+ read_boot_clock(&boot);
+
+ write_seqlock_irqsave(&xtime_lock, flags);
+
xtime.tv_sec = now.tv_sec;
xtime.tv_nsec = now.tv_nsec;
raw_time.tv_sec = 0;
@@ -563,7 +574,10 @@ void __init timekeeping_init(void)
total_sleep_time.tv_sec = 0;
total_sleep_time.tv_nsec = 0;
...From 9ba7685bfdcaa2a5c40eade388902c478bc46071 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Thu, 20 May 2010 17:40:05 +0800
Subject: [PATCH 2/3] x86: unify current 3 similar ways of saving IRQ info
There are 3 places defining the similar function of saving IRQ
vector info into mp_irqs[] array: mmparse/acpi/sfi. This patch
will reduce the redundant code, and make it only one API:
void mp_save_irq(struct mpc_intsrc *m);
Cc: Len Brown <len.brown@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
arch/x86/include/asm/mpspec.h | 6 ++++++
arch/x86/kernel/acpi/boot.c | 32 +++-----------------------------
arch/x86/kernel/mpparse.c | 14 +++++++-------
arch/x86/kernel/mrst.c | 30 ++----------------------------
4 files changed, 18 insertions(+), 64 deletions(-)
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index c82868e..17f4314 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -42,6 +42,12 @@ extern int quad_local_to_mp_bus_id [NR_CPUS/4][4];
#endif /* CONFIG_X86_64 */
+#ifdef CONFIG_X86_IO_APIC
+void mp_save_irq(struct mpc_intsrc *m);
+#else
+static inline void mp_save_irq(struct mpc_intsrc *m) {}
+#endif
+
#if defined(CONFIG_MCA) || defined(CONFIG_EISA)
extern int mp_bus_id_to_type[MAX_MP_BUSSES];
#endif
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9a5ed58..8f32fe2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -923,32 +923,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
extern int es7000_plat;
#endif
-static void assign_to_mp_irq(struct mpc_intsrc *m,
- struct mpc_intsrc *mp_irq)
-{
- memcpy(mp_irq, m, sizeof(struct mpc_intsrc));
-}
-
-static int mp_irq_cmp(struct mpc_intsrc *mp_irq,
- struct mpc_intsrc *m)
-{
- return memcmp(mp_irq, m, sizeof(struct mpc_intsrc));
-}
-
-static void save_mp_irq(struct ...From 14760d3bf6f77a17400c30258e365c06cbc36661 Mon Sep 17 00:00:00 2001 From: Feng Tang <feng.tang@intel.com> Date: Wed, 24 Mar 2010 10:31:35 +0800 Subject: [PATCH 3/3] x86/mrst: add vrtc driver which serves as a wall clock device Moorestown platform doesn't have a m146818 RTC device like traditional x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides some basic RTC functions like get/set time. vrtc serves as the only wall clock device on Moorestown platform. Currently, vrtc init func will be called as arch_initcall() before xtime's init. Also move the sfi vrtc table parsing from mrst.c to vrtc.c There will be another general vrtc driver for rtc subsystem Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> --- arch/x86/include/asm/mrst.h | 2 - arch/x86/include/asm/vrtc.h | 24 +++++++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/mrst.c | 45 +------------- arch/x86/kernel/vrtc.c | 147 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 173 insertions(+), 47 deletions(-) create mode 100644 arch/x86/include/asm/vrtc.h create mode 100644 arch/x86/kernel/vrtc.c diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h index 451d30e..fa144f2 100644 --- a/arch/x86/include/asm/mrst.h +++ b/arch/x86/include/asm/mrst.h @@ -11,9 +11,7 @@ #ifndef _ASM_X86_MRST_H #define _ASM_X86_MRST_H extern int pci_mrst_init(void); -int __init sfi_parse_mrtc(struct sfi_table_header *table); #define SFI_MTMR_MAX_NUM 8 -#define SFI_MRTC_MAX 8 #endif /* _ASM_X86_MRST_H */ diff --git a/arch/x86/include/asm/vrtc.h b/arch/x86/include/asm/vrtc.h new file mode 100644 index 0000000..fcdfcde --- /dev/null +++ b/arch/x86/include/asm/vrtc.h @@ -0,0 +1,24 @@ +#ifndef _MRST_VRTC_H +#define _MRST_VRTC_H + +#define SFI_MRTC_MAX 8 +#define VRTC_IOLEN 1024 + +#ifdef CONFIG_X86_MRST +extern unsigned char vrtc_reg_read(unsigned char reg); +extern void vrtc_reg_write(unsigned ...
From: Jacob Pan <jacob.jun.pan@intel.com>
With the addition of Medfield as a follow-up of Moorestown, more timer
configurations are available to the kernel. This patch allows the
optimal default configuration to be chosen and overwritten by cmdline
as well.
i.e. For Moorestown, percpu APB timers are default. For Medfield local
always-on local APIC timers are default (w/o any platform timers).
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/apb_timer.h | 2 +-
arch/x86/kernel/apb_timer.c | 18 ++++++++++--------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/apb_timer.h b/arch/x86/include/asm/apb_timer.h
index c74a2ee..4127fd1 100644
--- a/arch/x86/include/asm/apb_timer.h
+++ b/arch/x86/include/asm/apb_timer.h
@@ -55,7 +55,7 @@ extern unsigned long apbt_quick_calibrate(void);
extern int arch_setup_apbt_irqs(int irq, int trigger, int mask, int cpu);
extern void apbt_setup_secondary_clock(void);
extern unsigned int boot_cpu_id;
-extern int disable_apbt_percpu;
+extern int mrst_timer_options;
extern struct sfi_timer_table_entry *sfi_get_mtmr(int hint);
extern void sfi_free_mtmr(struct sfi_timer_table_entry *mtmr);
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index a353475..08dfbf8 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -43,10 +43,11 @@
#include <asm/fixmap.h>
#include <asm/apb_timer.h>
+#include <asm/mrst.h>
#define APBT_MASK CLOCKSOURCE_MASK(32)
#define APBT_SHIFT 22
-#define APBT_CLOCKEVENT_RATING 150
+#define APBT_CLOCKEVENT_RATING 110
#define APBT_CLOCKSOURCE_RATING 250
#define APBT_MIN_DELTA_USEC 200
@@ -83,8 +84,6 @@ struct apbt_dev {
char name[10];
};
-int disable_apbt_percpu __cpuinitdata;
-
static DEFINE_PER_CPU(struct apbt_dev, cpu_apbt_dev);
#ifdef CONFIG_SMP
@@ -204,9 +203,9 @@ static inline int __init setup_x86_mrst_timer(char *arg)
return -EINVAL;
if ...From: Feng Tang <feng.tang@intel.com>
Some wall clock devices use MMIO based HW register, this new function will
give them a chance to do some initialization work before their get/set_time
service get called.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/x86_init.h | 2 ++
arch/x86/kernel/setup.c | 2 ++
arch/x86/kernel/x86_init.c | 2 ++
3 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 519b543..be027a8 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -138,6 +138,7 @@ struct x86_cpuinit_ops {
/**
* struct x86_platform_ops - platform specific runtime functions
* @calibrate_tsc: calibrate TSC
+ * @wallclock_init: init the wallclock device
* @get_wallclock: get time from HW clock like RTC etc.
* @set_wallclock: set time back to HW clock
* @is_untracked_pat_range exclude from PAT logic
@@ -145,6 +146,7 @@ struct x86_cpuinit_ops {
*/
struct x86_platform_ops {
unsigned long (*calibrate_tsc)(void);
+ void (*wallclock_init)(void);
unsigned long (*get_wallclock)(void);
int (*set_wallclock)(unsigned long nowtime);
void (*iommu_shutdown)(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..d001d8c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1050,6 +1050,8 @@ void __init setup_arch(char **cmdline_p)
#endif
x86_init.oem.banner();
+ x86_platform.wallclock_init();
+
mcheck_init();
}
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 61a1e8c..ee00d76 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -24,6 +24,7 @@ void __init x86_init_uint_noop(unsigned int unused) { }
void __init x86_init_pgd_noop(pgd_t *unused) { }
int __init iommu_init_noop(void) { return 0; }
void iommu_shutdown_noop(void) { ...Moorestown PCI code has special handling of devices with fixed BARs. In
case of BAR sizing writes, we need to update the fake PCI MMCFG space with real
size decode value.
When a BAR is not present, we need to return 0 instead of ~0. ~0 will be
treated as device error per bugzilla 12006.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
arch/x86/pci/mrst.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 8bf2fcb..d5c7aef 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -109,7 +109,7 @@ static int pci_device_update_fixed(struct pci_bus *bus, unsigned int devfn,
decode++;
decode = ~(decode - 1);
} else {
- decode = ~0;
+ decode = 0;
}
/*
--
1.6.3.3
--
Moorestown does not have BIOS provided MP tables, we can save some time by avoiding scaning of these tables. e.g. [ 0.000000] Scan SMP from c0000000 for 1024 bytes. [ 0.000000] Scan SMP from c009fc00 for 1024 bytes. [ 0.000000] Scan SMP from c00f0000 for 65536 bytes. [ 0.000000] Scan SMP from c00bfff0 for 1024 bytes. Searching EBDA with the base at 0x40E will also result in random pointer deferencing within 1MB. This can be a problem in Lincroft if the pointer hits VGA area and VGA mode is not enabled. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> --- arch/x86/kernel/mrst.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/mrst.c b/arch/x86/kernel/mrst.c index 0b3ef9f..73c1692 100644 --- a/arch/x86/kernel/mrst.c +++ b/arch/x86/kernel/mrst.c @@ -351,5 +351,9 @@ void __init x86_mrst_early_setup(void) legacy_pic = &null_legacy_pic; + /* Avoid searching for BIOS MP tables */ + x86_init.mpparse.find_smp_config = x86_init_noop; + x86_init.mpparse.get_smp_config = x86_init_uint_noop; + mrst_identify_cpu(); } -- 1.6.3.3 --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | Re: [PATCH] optimized ktime_get[_ts] for GENERIC_TIME=y |
git: | |
| Junio C Hamano | Re: Some advanced index playing |
| Jeff King | Re: confusion over the new branch and merge config |
| Robin Rosenberg | Re: cvs2svn conversion directly to git ready for experimentation |
| Linus Torvalds | git binary size... |
| Ævar Arnfjörð Bjarmason | Re: Challenge with Git-Bash |
| Linux Kernel Mailing List | md: move allocation of ->queue from mddev_find to md_probe |
| Linux Kernel Mailing List | md: raid0: Represent zone->zone_offset in sectors. |
| Linux Kernel Mailing List | [ARM] S3C24XX: Add gpio_to_irq() facility |
| Linux Kernel Mailing List | md: raid0_make_request(): Replace local variable block by sector. |
| Linux Kernel Mailing List |
