Re: [PATCH 7/8] x86/mrst: add vrtc driver which serves as a wall clock device

Previous thread: [PATCH 1/3] pda_power: add support for writeable properties by Daniel Mack on Tuesday, March 23, 2010 - 2:06 am. (3 messages)

Next thread: [PATCH] USB: tty: fix incorrect use of tty_insert_flip_string_fixed_flag by Johan Hovold on Friday, May 7, 2010 - 10:46 am. (1 message)
From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

Hi hpa, ingo, and tglx,

The following changes are based on commit:
d94e93d495514c69d4a7a553c0938cd777267e5d

Changes include addition of vitual RTC driver and many clockevent tweaks in
order to support Medfield (next generation) in the same binary kernel image.

We consolidated tsc and lapic calibration into platform specific routine
under x86_init for MRST. There will be a follow-up RFC patch does the similar
consolidation for standard PC with HPET.

*** BLURB HERE ***

Feng Tang (2):
  x86/platform: add a wallclock_init func to x86_platforms ops
  x86/mrst: add vrtc driver which serves as a wall clock device

Jacob Pan (6):
  x86: avoid check hlt if no timer interrupts
  x86/mrst/pci: return 0 for non-present pci bars
  x86/apic: allow use of lapic timer early calibration result
  x86/mrst: change clock selection logic to support medfield
  x86/apbt: support more timer configurations on mrst
  x86/mrst: Add nop functions to x86_init mpparse functions

 arch/x86/include/asm/apb_timer.h |    2 +-
 arch/x86/include/asm/bugs.h      |    1 +
 arch/x86/include/asm/fixmap.h    |    4 +
 arch/x86/include/asm/mrst.h      |   30 ++++++++
 arch/x86/include/asm/vrtc.h      |   27 +++++++
 arch/x86/include/asm/x86_init.h  |    2 +
 arch/x86/kernel/Makefile         |    2 +-
 arch/x86/kernel/apb_timer.c      |   18 +++--
 arch/x86/kernel/apic/apic.c      |   21 +++++-
 arch/x86/kernel/cpu/bugs.c       |    4 +
 arch/x86/kernel/mrst.c           |  143 ++++++++++++++++++++++++++++++++++---
 arch/x86/kernel/setup.c          |    2 +
 arch/x86/kernel/vrtc.c           |  100 ++++++++++++++++++++++++++
 arch/x86/kernel/x86_init.c       |    2 +
 arch/x86/pci/mrst.c              |    2 +-
 15 files changed, 336 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/vrtc.h
 create mode 100644 arch/x86/kernel/vrtc.c

--

From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Jacob Pan <jacob.jun.pan@intel.com>

check hlt requires external timer interrupt to wake up the
cpu. for platforms equipped with only per cpu timers, we don't
have external interrupts during check hlt.
this patch checks such condition to avoid kernel hang at hlt.
it also saves boot time in that hlt four times in the current code
requires four ticks to break out of them.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/bugs.h |    1 +
 arch/x86/kernel/cpu/bugs.c  |    4 ++++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/bugs.h b/arch/x86/include/asm/bugs.h
index 08abf63..1e0cef8 100644
--- a/arch/x86/include/asm/bugs.h
+++ b/arch/x86/include/asm/bugs.h
@@ -2,6 +2,7 @@
 #define _ASM_X86_BUGS_H
 
 extern void check_bugs(void);
+extern struct clock_event_device *global_clock_event;
 
 #if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_X86_32)
 int ppro_with_ram_bug(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 01a2652..c0d9688 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -90,6 +90,10 @@ static void __init check_hlt(void)
 		return;
 
 	printk(KERN_INFO "Checking 'hlt' instruction... ");
+	if (!global_clock_event) {
+		printk(KERN_CONT "no clockevent to wake up, skipped.\n");
+		return;
+	}
 	if (!boot_cpu_data.hlt_works_ok) {
 		printk("disabled\n");
 		return;
-- 
1.6.3.3

--

From: H. Peter Anvin
Date: Friday, May 7, 2010 - 1:32 pm

I really wish I knew the exact systems affected by the HLT bug.  If I
remember correctly, it was some 386 systems -- or possibly 486 systems
as well -- a very long time ago.  This test just provides a diagnosis if
the system really is bad (it hangs with an obvious message) at the cost
of some 40 ms to the system boot time.  I suspect C1 (HLT) being broken
is not anywhere close to the predominant power management problem in the
current day, and as such I'm wondering if this particular test hasn't
outlived its usefulness.

Thoughts?

	-hpa

--


we could at least hide it behind the "don't run on pentium or newer" config options..
--


I'd be cool skipping it for family 5 or newer.  I'm just wondering if we
should kill it completely -- IIRC it was only a handful of 386/486
systems which had problems, usually due to marginal power supplies which
couldn't handle the noise of a variable load (DOS not having any power
management would run at a reliable 100% load) -- that's not exactly the
type of systems which would have survived to modern day.

	-hpa

--


Also SMM and hardware bugs on some platforms - Cyrix MediaGX 5510 for
example where a hlt at the wrong moment during ATA transfers hung the box
until power cycle. But all old old stuff.

Alan
--


I think family < 5 seems a reasonable cutoff.

Note that the ATA transfer bug you describe above would not be caught by
the existing check.

	-hpa

--


On Fri, 07 May 2010 15:27:34 -0700

MediaGX5510 would I'm pretty certain be 486 reporting anyway
--


but a boot time "does hlt work at all" check won't catch that.
--


Ack. That would take care of all relevant machines.

		Linus
--


Sounds like a plan.  Jacob, do you want to submit a new patch (bypassing
this check if boot_cpu_info.x86 >= 5)?

	-hpa

--


Just sent out the updated patch. I guess you meant boot_cpu_data instead of boot_cpu_info.
--

From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Jacob Pan <jacob.jun.pan@intel.com>

lapic timer calibration can be combined with tsc in platform specific
calibration functions. if such calibration result is obtained early,
we can skip the redundent calibration loops.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/apic/apic.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e5a4a1e..8ef56ac 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -175,7 +175,7 @@ static struct resource lapic_resource = {
 	.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
 };
 
-static unsigned int calibration_result;
+unsigned int calibration_result;
 
 static int lapic_next_event(unsigned long delta,
 			    struct clock_event_device *evt);
@@ -597,6 +597,25 @@ static int __init calibrate_APIC_clock(void)
 	long delta, deltatsc;
 	int pm_referenced = 0;
 
+	/**
+	 * check if lapic timer has already been calibrated by platform
+	 * specific routine, such as tsc calibration code. if so, we just fill
+	 * in the clockevent structure and return.
+	 */
+
+	if (calibration_result) {
+		apic_printk(APIC_VERBOSE, "lapic timer already calibrated %d\n",
+				calibration_result);
+		lapic_clockevent.mult = div_sc(calibration_result/APIC_DIVISOR,
+					TICK_NSEC, lapic_clockevent.shift);
+		lapic_clockevent.max_delta_ns =
+			clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
+		lapic_clockevent.min_delta_ns =
+			clockevent_delta2ns(0xF, &lapic_clockevent);
+		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+		return 0;
+	}
+
 	local_irq_disable();
 
 	/* Replace the global interrupt handler */
-- 
1.6.3.3

--

From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 6:46 am

I'd rather move lapic calibration into TSC calibration in general as
we do the same thing twice for no good reason.



Aside of my general objection it'd be not a good idea to make this
global w/o renaming it to something sensible like

--

From: Pan, Jacob jun
Date: Tuesday, May 11, 2010 - 12:42 pm

I am trying to avoid the risks of completely remove the current lapic 
calibration code since there are so many platforms with different timer
options. And I don't understand things like why pm timer is preferred.
perhaps, the calibration data can directly be assigned to lapic timer
clock_event_device.mult? There is no need for the device specific result
--

From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 12:50 pm

Jacob,


We do not have access to the clocksource at that point. But we do a
calibration loop for TSC and for lapic timer. There is no reason why

No. That needs an accessor function.

Thanks,

	tglx
--

From: Pan, Jacob jun
Date: Tuesday, May 11, 2010 - 1:46 pm

I have a simple patch that uses HPET to calibrate both tsc and lapic timers in one
loop in native_calibrate_tsc(). Will send out for RFC.

The question I have is the reference clock used for calibrating those local timers,
PIT, HPET, PM timer how should they be ranked. Can we make those known freq
clocksource devices available at this point so that we can use the clocksource
Agreed, so you are ok with bypassing the existing calibration_result variable and use .mult
to carry the result?

--

From: H. Peter Anvin
Date: Tuesday, May 11, 2010 - 1:51 pm

Personally I'd rank the PMTMR first, then HPET, then PIT, just based on
the relative complexity and relative known bugginess of the various
implementations.

The PMTMRs main defect is that it can't generate an interrupt; it's just
a dumb counter.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Jacob Pan <jacob.jun.pan@intel.com>

Penwell has added always on lapic timers and tsc, we want to treat
it as a variant of moorestown so that one binary kernel can boot on both.
this patch added clock selction logic so that platform code can set up the
optimal configuration for both moorestown and medfield.

This patch will also mark Penwell TSC reliable, thus no need for
watchdog clocksource to monitor it.

Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/mrst.h |   30 +++++++++++
 arch/x86/kernel/mrst.c      |  119 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 137 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h
index 451d30e..3054407 100644
--- a/arch/x86/include/asm/mrst.h
+++ b/arch/x86/include/asm/mrst.h
@@ -11,7 +11,37 @@
 #ifndef _ASM_X86_MRST_H
 #define _ASM_X86_MRST_H
 extern int pci_mrst_init(void);
+extern unsigned int calibration_result;
+
 int __init sfi_parse_mrtc(struct sfi_table_header *table);
+extern int mrst_timer_options __cpuinitdata;
+extern int mrst_identify_cpu(void);
+
+/**
+ * Medfield is the follow-up of Moorestown, it combines two chip solution into
+ * one. Other than that it also added always-on and constant tsc and lapic
+ * timers. Medfield is the platform name, and the chip name is called Penwell
+ * we treat Medfield/Penwell as a variant of Moorestown. Penwell can be
+ * identified via MSRs.
+ */
+
+
+#define MRST_CPU_CHIP_LINCROFT	1
+#define MRST_CPU_CHIP_PENWELL	2
+
+#define PENWELL_FAMILY		0x20670
+
+/**
+ * Penwell uses spread spectrum clock, so the freq number is not exactly
+ * as reported by MSR.
+ */
+#define PENWELL_FSB_FREQ_83SKU         83200
+#define PENWELL_FSB_FREQ_100SKU        99840
+
+#define MRST_TIMER_DEFAULT	0
+#define MRST_TIMER_APBT_ONLY	1
+#define MRST_TIMER_LAPIC_APBT	2
+
 
 #define ...
From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 7:36 am

apbt sucks performance wise, so why do you consider it a better


 This is not Penwell specific at all. The ratio can be read out on all
 Core based CPUs either from MSR_PLATFORM_ID[12:8] or
 MSR_PERF_STAT[44:40] depending on XE operation enabled
 (MSR_PERF_STAT[31] == 1)

 This should be made general available and not burried into the mrst

 I guess the 111 is Penwell/MRST specific, right ?

 According to SDM we have anyway different results for the various CPU
 families, but we should utilize this in a generic way and have the

  Yikes. From which Intel cookbook is this ?

  Aside of that we have that info in boot_cpu_info already, don't we ?
  So there is neither a requirement for the extra cpuid call nor for

  Are we going to add one of those for each new family ? This is
--

From: Alan Cox
Date: Tuesday, May 11, 2010 - 8:30 am

From the driver side we need this value determined and visible to the
drivers because some things like the IPC interface need to know rather
than getting the cpuid magic replicated around.

--

From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 8:50 am

Ok. That makes sense. Is there more what MRST drivers need to know ?

Thanks,

	tglx

--

From: Alan Cox
Date: Tuesday, May 11, 2010 - 9:03 am

On Tue, 11 May 2010 17:50:28 +0200 (CEST)

At some point they will need to know what platform they are booting on, I
have a possible patch for that which just exposes the boot loader value
but I'm not sure what the final direction on that will be.

For CPU they need to know which CPU type as it affects some bits like the
message formats in use. Probably that means the mrst_cpu_type enum or
function should have an enum value of 0 which means "Not a moorestown
device".

--

From: Pan, Jacob jun
Date: Thursday, May 13, 2010 - 3:16 pm

apbt is always-on, I guess depends on the load, it can be better than
having broadcast timers. e.g. if there are frequency transitions
between C0 to deep C states. if we are always in c0, I can easily see
native performance impact with per cpu apbt. I don't have power number
to backup either cases. ftrace shows programming lapic timer is quite
expensive, I don't understand.


 1)               |          clockevents_program_event() {
 1)               |            lapic_next_event() {
 1)   2.947 us    |              native_apic_mem_write();
 1)   8.682 us    |            }
 1) + 14.676 us   |          }

 0)               |                  clockevents_program_event() {
 0)   4.146 us    |                    apbt_next_event();
it can be more straightforward if we don't allow user cmdline
initially, I thought we need this before boot_cpu_data is initialized.
But we actually don't need it early. I will fix that.

--

From: Du, Alek
Date: Sunday, May 16, 2010 - 7:14 pm

Hi tglx,

Please help to review this patch, it is against the latest patches Jacob sent out:
Basically the idea is to put bus_ratio and fsb in cpuinfo_x86 structure, and the
CPU early_init_intel function will fill the info.

From 5ae648b2f18778df4eb3f1916a98971332482544 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Fri, 14 May 2010 13:45:46 -0700
Subject: [PATCH 1/2] x86/mrst: Auto detect freq for local timers

Some Intel CPUs can directly get fsb frequency and bus ratio from
various MSRs. This patch enables this feature and benefit Medfield
platform.

Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/processor.h |    3 +++
 arch/x86/kernel/cpu/intel.c      |   34 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/mrst.c           |   32 ++++++++++++++++++++++++--------
 3 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 32428b4..f72107f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -94,6 +94,9 @@ struct cpuinfo_x86 {
 	int			x86_cache_alignment;	/* In bytes */
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
+	/* support TSC and LAPIC non-calibartion way */
+	__u32			bus_ratio;
+	__u32			fsb;	/* In khz */
 #ifdef CONFIG_SMP
 	/* cpus sharing the last level cache: */
 	cpumask_var_t		llc_shared_map;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 85f69cd..f620abc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -27,6 +27,38 @@
 #include <asm/apic.h>
 #endif
 
+/* MSR_FSB_FREQ in Khz */
+const u32 fsb[] = {266667, 133333, 200000, 166667, 333333, 100000, 400000, 0};
+/* detect Intel cpus that can do TSC/LAPIC non-calibration way */
+static void __cpuinit intel_tsc_fsb(struct cpuinfo_x86 *c)
+{
+	u32 lo, hi;
+
+	if (c->x86 != 6)
+		return;
+	if (c->x86_model != ...
From: Du, Alek
Date: Sunday, May 16, 2010 - 7:27 pm

Sorry, send again, the previous patch should be merged with the patch that
export lapic_timer_frequency and allow lapic use early calibration result.
Otherwise, it will break the build...


From 4103dc6165530a490c12a6177d916f2c2e012436 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Fri, 14 May 2010 13:45:46 -0700
Subject: [PATCH] x86/mrst: Auto detect freq for local timers

Some Intel CPUs can directly get fsb frequency and bus ratio from
various MSRs. This patch enables this feature and benefit Medfield
platform.

lapic timer calibration can be combined with tsc in platform specific
calibration functions. if such calibration result is obtained early,
we can skip the redundent calibration loops.

Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/apic.h      |    1 +
 arch/x86/include/asm/processor.h |    3 +++
 arch/x86/kernel/apic/apic.c      |   33 ++++++++++++++++++++++++++-------
 arch/x86/kernel/cpu/intel.c      |   34 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/mrst.c           |   32 ++++++++++++++++++++++++--------
 5 files changed, 88 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1fa03e0..448c763 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -50,6 +50,7 @@ extern unsigned int apic_verbosity;
 extern int local_apic_timer_c2_ok;
 
 extern int disable_apic;
+extern unsigned int lapic_timer_frequency;
 
 #ifdef CONFIG_SMP
 extern void __inquire_remote_apic(int apicid);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 32428b4..f72107f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -94,6 +94,9 @@ struct cpuinfo_x86 {
 	int			x86_cache_alignment;	/* In bytes */
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
+	/* support TSC and LAPIC non-calibartion way ...
From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Feng Tang <feng.tang@intel.com>

Moorestown platform doesn't have a m146818 RTC device like traditional
x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides
some basic RTC functions like get/set time. vrtc serves as the only
wall clock device on Moorestown platform.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/fixmap.h |    4 ++
 arch/x86/include/asm/vrtc.h   |   27 +++++++++++
 arch/x86/kernel/Makefile      |    2 +-
 arch/x86/kernel/mrst.c        |   20 ++++++++
 arch/x86/kernel/vrtc.c        |  100 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 152 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/include/asm/vrtc.h
 create mode 100644 arch/x86/kernel/vrtc.c

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d07b44f..c4ba8d9 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -117,6 +117,10 @@ enum fixed_addresses {
 	FIX_TEXT_POKE1,	/* reserve 2 pages for text_poke() */
 	FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
 	__end_of_permanent_fixed_addresses,
+
+#ifdef	CONFIG_X86_MRST
+	FIX_LNW_VRTC,
+#endif
 	/*
 	 * 256 temporary boot-time mappings, used by early_ioremap(),
 	 * before ioremap() is functional.
diff --git a/arch/x86/include/asm/vrtc.h b/arch/x86/include/asm/vrtc.h
new file mode 100644
index 0000000..4e40129
--- /dev/null
+++ b/arch/x86/include/asm/vrtc.h
@@ -0,0 +1,27 @@
+#ifndef _MRST_VRTC_H
+#define _MRST_VRTC_H
+
+#ifdef CONFIG_X86_MRST
+extern unsigned char vrtc_cmos_read(unsigned char reg);
+extern void vrtc_cmos_write(unsigned char val, unsigned char reg);
+extern unsigned long vrtc_get_time(void);
+extern int vrtc_set_mmss(unsigned long nowtime);
+extern void vrtc_set_base(void __iomem *base);
+
+#define MRST_VRTC_PGOFFSET  (0xc00)
+
+#else
+static inline unsigned char vrtc_cmos_read(unsigned char reg)
+{
+	return ...
From: Joe Perches
Date: Friday, May 7, 2010 - 11:51 am

Even though many of the rtc drivers print this way, it seems
a very backwards way of presenting time to me.

I think "yyyy-mm-dd hh:mm:ss" is a better representation, more
compact and readable.

Also KERN_INFO may not be good, maybe KERN_DEBUG or pr_debug?

--

From: Alan Cox
Date: Friday, May 7, 2010 - 12:02 pm

On Fri, 07 May 2010 11:51:06 -0700

Consistency is really more important here IMHO - lots of drivers have set
an existing policy.

--

From: Joe Perches
Date: Friday, May 7, 2010 - 12:06 pm

(added Alessandro Zummo to cc's)

look at drivers/rtc.

All of them seem to use a templated copy/paste dev_dbg,
which seems to point to a use for a possible common rtc_util.c
or some such where this could be standardized.
 
Is there somewhere else this style is used?

--

From: H. Peter Anvin
Date: Friday, May 7, 2010 - 12:56 pm

Probably the right thing to do is to (a) move all this printing to
common code; (b) change to ISO 8601 format.

	-hpa

--

From: Feng Tang
Date: Monday, May 10, 2010 - 2:17 am

On Sat, 8 May 2010 03:56:09 +0800

Thank you all for the comments. please review this follow-on patch.

- Feng
--
diff --git a/arch/x86/kernel/vrtc.c b/arch/x86/kernel/vrtc.c
index fa1eb63..0fbd74d 100644
--- a/arch/x86/kernel/vrtc.c
+++ b/arch/x86/kernel/vrtc.c
@@ -74,8 +74,8 @@ unsigned long vrtc_get_time(void)
 	/* vRTC YEAR reg contains the offset to 1960 */
 	year += 1960;
 
-	printk(KERN_INFO "vRTC: sec: %d min: %d hour: %d day: %d "
-		"mon: %d year: %d\n", sec, min, hour, mday, mon, year);
+	pr_debug("vrtc time: %4d-%02d-%02d %02d:%02d:%02d\n",
+		year, mon, mday, hour, min, sec);
 
 	return mktime(year, mon, mday, hour, min, sec);
 }
--

From: H. Peter Anvin
Date: Monday, May 10, 2010 - 11:22 am

That doesn't move it to common code, though.  I'd rather see existing
common style used, then merged (centralized) and *then* the style corrected.

	-hpa
--

From: Feng Tang
Date: Monday, May 10, 2010 - 7:30 pm

On Tue, 11 May 2010 02:22:15 +0800

Hi Peter,

The reason I didn't move it to rtc common code is, this vrtc.c sits in
arch/x86/kernel/ and better not to depend on drivers/rtc, as drivers/rtc
may not be always enabled in kernel configuration.

I also have another general driver for vrtc which will be in drivers/rtc/,
just like general x86 kernel which has a rtc.c in arch/ and a rtc-cmos.c
in drivers/rtc, I will clean my code up and try to move these funcs to a
common code. 

Thanks,
Feng
--

From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 7:57 am

Errm. That's a MRST specific header and nothing outside of MRST is
  using it. So why the #ifdef CONFIG_X86_MRST and the inline functions

Why do we need a fixmap for that ? There is no need to setup RTC that
early. The first call is from timekeeping_init() 


  This lock_cmos magic should just die. I have no idea why something

  Please remove the debug noise

Thanks,

	tglx
--

From: Feng Tang
Date: Tuesday, May 11, 2010 - 7:34 pm

Hi Thomas,

Thanks for the great comments!


On Tue, 11 May 2010 22:57:44 +0800

My bad not mentioning there is another rtc-mrst.c which will sit in drivers/rtc,
it will use some of the functions listed here. It will be posted later.

vrtc.c/rtc-mrst.c is similar with the rtc.c/rtc-cmos.c in general x86 PCs, as
drivers/rtc may not always be enabled in kernel, vrtc.c need sit in arch/x86
to provide the get/set_time service, while rtc-mrst.c will serve general rtc

Actually when to init the vrtc register is a big problem for me, vrtc
need be inited before timekeeping_init(), and I thought better to put it
somewhere in setup_arch(), as it is architecture specific, and ioremap
is not working at that time. Also that's the reason I created a new
wallclock_init func for x86_platforms, I could not find a better way
I agree I should move this init code to vrtc.c, but still think it should

I will try to reuse the rtc_lock defined in rtc.c whose get/set_time

Will make it a pr_debug.

Thanks,
Feng
--

From: Thomas Gleixner
Date: Monday, May 17, 2010 - 2:15 am

Feng,

[ Cc'ed John ]


There is no particular reason why we need to read it in
timekeeping_init(). Nothing in the kernel needs the correct wall time
at that point. So we can safely move the setting of xtime to rtc wall
clock time to a separate timekeeping_late_init() function.



Please don't create artifical dependencies. Use a separate vrtc_lock
to serialize the access to vrtc.

Thanks,

	tglx
--

From: Feng Tang
Date: Monday, May 17, 2010 - 11:27 pm

Hi Thomas,


On Mon, 17 May 2010 17:15:55 +0800
Yeah, good suggestion, if xtime init is moved to a later time in kernel
I just checked the code, when wall clock's get/set_time service is called,
it is always protected by rtc_lock(code in arch/x86/kernel/rtc.c), then
no need to add the lock for each individual register read/write operation.

I will submit a v2 vrtc patch.

Thanks,
--

From: Thomas Gleixner
Date: Tuesday, May 18, 2010 - 12:38 am

John was away yesterday, so we have to wait for his answer, but I don't

Grr. Yes, missed that the code is called under rtc_lock already. So
you can drop the locking in your code completely.

Thanks,

	tglx
--

From: john stultz
Date: Tuesday, May 18, 2010 - 1:43 pm

No big objections here. Still would like to keep the amount of time that
the kernel is up without xtime being initialized to a minimum. However
the generic RTC code already have this issue since some of them require
interrupts to be enabled to do a read, so pushing it off into a
_late_init() function is probably just a short term fix until we figure
out how to get the generic RTC code working better with the timekeeping
code.

Does the delayed init required by vrtc cause any trouble with
suspend/resume?

thanks
-john 

--

From: Thomas Gleixner
Date: Tuesday, May 18, 2010 - 2:02 pm

Why? Nothing _IS_ depending on xtime at this point. We just should
have it ready when we mount the first file system, but even that is

Well, you need to wait until the driver is loaded and the hardware
accessible which might take some time when I2C, SPI or similar stuff
is involved. No way to get this stuff accesible early.

So really the question is, why do we want wall time initialized early?
All we care about until we hit user space is clock monotonic and
working timekeeping in general. NTP is not an issue either before we

No, suspend/resume does not go through that early code. It has
everything setup (mappings, etc)

Thanks,

	tglx


--

From: Feng Tang
Date: Thursday, May 20, 2010 - 7:15 pm

From 2451f205e29be57b30f0c50759c1b05a67bb97d1 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Wed, 19 May 2010 16:57:20 +0800
Subject: [PATCH 1/3] timekeeping: moving xtime's init to a later time

xtime doesn't need to be inited early, move it to a subsys_initcall,
as most of its consumers come from userspace. It will also give enough
time for some MMIO based wallclock devices to init

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 kernel/time/timekeeping.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 39f6177..7bc32de 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -529,26 +529,37 @@ void __attribute__((weak)) read_boot_clock(struct timespec *ts)
 }
 
 /*
- * timekeeping_init - Initializes the clocksource and common timekeeping values
+ * timekeeping_init - Initializes the clocksource
  */
 void __init timekeeping_init(void)
 {
 	struct clocksource *clock;
 	unsigned long flags;
-	struct timespec now, boot;
-
-	read_persistent_clock(&now);
-	read_boot_clock(&boot);
 
 	write_seqlock_irqsave(&xtime_lock, flags);
 
-	ntp_init();
-
 	clock = clocksource_default_clock();
 	if (clock->enable)
 		clock->enable(clock);
 	timekeeper_setup_internals(clock);
 
+	ntp_init();
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
+/*
+ *  timekeeping_late_init - Initaizes the common timekeeping values
+ */
+static int __init timekeeping_late_init(void)
+{
+	unsigned long flags;
+	struct timespec now, boot;
+
+	read_persistent_clock(&now);
+	read_boot_clock(&boot);
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+
 	xtime.tv_sec = now.tv_sec;
 	xtime.tv_nsec = now.tv_nsec;
 	raw_time.tv_sec = 0;
@@ -563,7 +574,10 @@ void __init timekeeping_init(void)
 	total_sleep_time.tv_sec = 0;
 	total_sleep_time.tv_nsec = 0;
 ...
From: Feng Tang
Date: Thursday, May 20, 2010 - 7:16 pm

From 9ba7685bfdcaa2a5c40eade388902c478bc46071 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Thu, 20 May 2010 17:40:05 +0800
Subject: [PATCH 2/3] x86: unify current 3 similar ways of saving IRQ info

There are 3 places defining the similar function of saving IRQ
vector info into mp_irqs[] array: mmparse/acpi/sfi. This patch
will reduce the redundant code, and make it only one API:
	void mp_save_irq(struct mpc_intsrc *m);

Cc: Len Brown <len.brown@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/mpspec.h |    6 ++++++
 arch/x86/kernel/acpi/boot.c   |   32 +++-----------------------------
 arch/x86/kernel/mpparse.c     |   14 +++++++-------
 arch/x86/kernel/mrst.c        |   30 ++----------------------------
 4 files changed, 18 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index c82868e..17f4314 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -42,6 +42,12 @@ extern int quad_local_to_mp_bus_id [NR_CPUS/4][4];
 
 #endif /* CONFIG_X86_64 */
 
+#ifdef CONFIG_X86_IO_APIC
+void mp_save_irq(struct mpc_intsrc *m);
+#else
+static inline void mp_save_irq(struct mpc_intsrc *m) {}
+#endif
+
 #if defined(CONFIG_MCA) || defined(CONFIG_EISA)
 extern int mp_bus_id_to_type[MAX_MP_BUSSES];
 #endif
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9a5ed58..8f32fe2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -923,32 +923,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
 extern int es7000_plat;
 #endif
 
-static void assign_to_mp_irq(struct mpc_intsrc *m,
-				    struct mpc_intsrc *mp_irq)
-{
-	memcpy(mp_irq, m, sizeof(struct mpc_intsrc));
-}
-
-static int mp_irq_cmp(struct mpc_intsrc *mp_irq,
-				struct mpc_intsrc *m)
-{
-	return memcmp(mp_irq, m, sizeof(struct mpc_intsrc));
-}
-
-static void save_mp_irq(struct ...
From: Feng Tang
Date: Thursday, May 20, 2010 - 7:19 pm

From 14760d3bf6f77a17400c30258e365c06cbc36661 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Wed, 24 Mar 2010 10:31:35 +0800
Subject: [PATCH 3/3] x86/mrst: add vrtc driver which serves as a wall clock device

Moorestown platform doesn't have a m146818 RTC device like traditional
x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides
some basic RTC functions like get/set time. vrtc serves as the only
wall clock device on Moorestown platform.

Currently, vrtc init func will be called as arch_initcall() before
xtime's init. Also move the sfi vrtc table parsing from mrst.c to
vrtc.c

There will be another general vrtc driver for rtc subsystem

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/include/asm/mrst.h |    2 -
 arch/x86/include/asm/vrtc.h |   24 +++++++
 arch/x86/kernel/Makefile    |    2 +-
 arch/x86/kernel/mrst.c      |   45 +-------------
 arch/x86/kernel/vrtc.c      |  147 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 47 deletions(-)
 create mode 100644 arch/x86/include/asm/vrtc.h
 create mode 100644 arch/x86/kernel/vrtc.c

diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h
index 451d30e..fa144f2 100644
--- a/arch/x86/include/asm/mrst.h
+++ b/arch/x86/include/asm/mrst.h
@@ -11,9 +11,7 @@
 #ifndef _ASM_X86_MRST_H
 #define _ASM_X86_MRST_H
 extern int pci_mrst_init(void);
-int __init sfi_parse_mrtc(struct sfi_table_header *table);
 
 #define SFI_MTMR_MAX_NUM 8
-#define SFI_MRTC_MAX	8
 
 #endif /* _ASM_X86_MRST_H */
diff --git a/arch/x86/include/asm/vrtc.h b/arch/x86/include/asm/vrtc.h
new file mode 100644
index 0000000..fcdfcde
--- /dev/null
+++ b/arch/x86/include/asm/vrtc.h
@@ -0,0 +1,24 @@
+#ifndef _MRST_VRTC_H
+#define _MRST_VRTC_H
+
+#define SFI_MRTC_MAX	8
+#define VRTC_IOLEN	1024
+
+#ifdef CONFIG_X86_MRST
+extern unsigned char vrtc_reg_read(unsigned char reg);
+extern void vrtc_reg_write(unsigned ...
From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Jacob Pan <jacob.jun.pan@intel.com>

With the addition of Medfield as a follow-up of Moorestown, more timer
configurations are available to the kernel. This patch allows the
optimal default configuration to be chosen and overwritten by cmdline
as well.
i.e. For Moorestown, percpu APB timers are default. For Medfield local
always-on local APIC timers are default (w/o any platform timers).

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/apb_timer.h |    2 +-
 arch/x86/kernel/apb_timer.c      |   18 ++++++++++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/apb_timer.h b/arch/x86/include/asm/apb_timer.h
index c74a2ee..4127fd1 100644
--- a/arch/x86/include/asm/apb_timer.h
+++ b/arch/x86/include/asm/apb_timer.h
@@ -55,7 +55,7 @@ extern unsigned long apbt_quick_calibrate(void);
 extern int arch_setup_apbt_irqs(int irq, int trigger, int mask, int cpu);
 extern void apbt_setup_secondary_clock(void);
 extern unsigned int boot_cpu_id;
-extern int disable_apbt_percpu;
+extern int mrst_timer_options;
 
 extern struct sfi_timer_table_entry *sfi_get_mtmr(int hint);
 extern void sfi_free_mtmr(struct sfi_timer_table_entry *mtmr);
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index a353475..08dfbf8 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -43,10 +43,11 @@
 
 #include <asm/fixmap.h>
 #include <asm/apb_timer.h>
+#include <asm/mrst.h>
 
 #define APBT_MASK			CLOCKSOURCE_MASK(32)
 #define APBT_SHIFT			22
-#define APBT_CLOCKEVENT_RATING		150
+#define APBT_CLOCKEVENT_RATING		110
 #define APBT_CLOCKSOURCE_RATING		250
 #define APBT_MIN_DELTA_USEC		200
 
@@ -83,8 +84,6 @@ struct apbt_dev {
 	char name[10];
 };
 
-int disable_apbt_percpu __cpuinitdata;
-
 static DEFINE_PER_CPU(struct apbt_dev, cpu_apbt_dev);
 
 #ifdef CONFIG_SMP
@@ -204,9 +203,9 @@ static inline int __init setup_x86_mrst_timer(char *arg)
 		return -EINVAL;
 
 	if ...
From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

From: Feng Tang <feng.tang@intel.com>

Some wall clock devices use MMIO based HW register, this new function will
give them a chance to do some initialization work before their get/set_time
service get called.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/x86_init.h |    2 ++
 arch/x86/kernel/setup.c         |    2 ++
 arch/x86/kernel/x86_init.c      |    2 ++
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 519b543..be027a8 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -138,6 +138,7 @@ struct x86_cpuinit_ops {
 /**
  * struct x86_platform_ops - platform specific runtime functions
  * @calibrate_tsc:		calibrate TSC
+ * @wallclock_init:		init the wallclock device
  * @get_wallclock:		get time from HW clock like RTC etc.
  * @set_wallclock:		set time back to HW clock
  * @is_untracked_pat_range	exclude from PAT logic
@@ -145,6 +146,7 @@ struct x86_cpuinit_ops {
  */
 struct x86_platform_ops {
 	unsigned long (*calibrate_tsc)(void);
+	void (*wallclock_init)(void);
 	unsigned long (*get_wallclock)(void);
 	int (*set_wallclock)(unsigned long nowtime);
 	void (*iommu_shutdown)(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..d001d8c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1050,6 +1050,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 	x86_init.oem.banner();
 
+	x86_platform.wallclock_init();
+
 	mcheck_init();
 }
 
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 61a1e8c..ee00d76 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -24,6 +24,7 @@ void __init x86_init_uint_noop(unsigned int unused) { }
 void __init x86_init_pgd_noop(pgd_t *unused) { }
 int __init iommu_init_noop(void) { return 0; }
 void iommu_shutdown_noop(void) { ...
From: Thomas Gleixner
Date: Tuesday, May 11, 2010 - 7:42 am

No need for that.

Thanks,

	tglx
--

From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

Moorestown PCI code has special handling of devices with fixed BARs. In
case of BAR sizing writes, we need to update the fake PCI MMCFG space with real
size decode value.

When a BAR is not present, we need to return 0 instead of ~0. ~0 will be
treated as device error per bugzilla 12006.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 arch/x86/pci/mrst.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 8bf2fcb..d5c7aef 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -109,7 +109,7 @@ static int pci_device_update_fixed(struct pci_bus *bus, unsigned int devfn,
 			decode++;
 			decode = ~(decode - 1);
 		} else {
-			decode = ~0;
+			decode = 0;
 		}
 
 		/*
-- 
1.6.3.3

--

From: Jacob Pan
Date: Friday, May 7, 2010 - 10:41 am

Moorestown does not have BIOS provided MP tables, we can save some time
by avoiding scaning of these tables. e.g.
[    0.000000] Scan SMP from c0000000 for 1024 bytes.
[    0.000000] Scan SMP from c009fc00 for 1024 bytes.
[    0.000000] Scan SMP from c00f0000 for 65536 bytes.
[    0.000000] Scan SMP from c00bfff0 for 1024 bytes.

Searching EBDA with the base at 0x40E will also result in random pointer
deferencing within 1MB. This can be a problem in Lincroft if the pointer
hits VGA area and VGA mode is not enabled.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/mrst.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/mrst.c b/arch/x86/kernel/mrst.c
index 0b3ef9f..73c1692 100644
--- a/arch/x86/kernel/mrst.c
+++ b/arch/x86/kernel/mrst.c
@@ -351,5 +351,9 @@ void __init x86_mrst_early_setup(void)
 
 	legacy_pic = &null_legacy_pic;
 
+	/* Avoid searching for BIOS MP tables */
+	x86_init.mpparse.find_smp_config = x86_init_noop;
+	x86_init.mpparse.get_smp_config = x86_init_uint_noop;
+
 	mrst_identify_cpu();
 }
-- 
1.6.3.3

--

Previous thread: [PATCH 1/3] pda_power: add support for writeable properties by Daniel Mack on Tuesday, March 23, 2010 - 2:06 am. (3 messages)

Next thread: [PATCH] USB: tty: fix incorrect use of tty_insert_flip_string_fixed_flag by Johan Hovold on Friday, May 7, 2010 - 10:46 am. (1 message)