Re: [git pull?] clocksource: ACPI pmtmr bugfixes [Was: Re: ACPI PM-Timer on K6-3 SiS5591: Houston...]

Previous thread: Re: [PATCH v2 02/13] ARM:DaVinci: base address for dm6467 by Jaswinder Singh on Sunday, August 10, 2008 - 2:34 am. (2 messages)

Next thread: [2.6 patch] fix watchdog/ixp4xx_wdt.c compilation by Adrian Bunk on Sunday, August 10, 2008 - 4:03 am. (2 messages)
From: Andreas Mohr
Date: Sunday, August 10, 2008 - 3:17 am

Hello all,

forgive me for such a blunt description, but:
"bad hardware - PASS, good hardware - FAIL"
That's about what could be used as a summary for my recent observations
about our ACPI PM-Timer usage ;)

With a modified pmtmr_test on my stoneage system, I get the following
sequence (relevant excerpt only):


652419 <==
65241b <===================
65241d <==
65201e
65241f
652020
652421
652022
652423
652024
652425
652429
65202e
652431
652032
652433
652034
652435
652036
652437
652038 <==
65203a <====================
65203c <==
65203e
652040
652042
652044
652046
652048

Result: catastrophic timer behaviour (a large backwards skip is possible),
even in case we do a triple-read workaround, due to a floating bit at
0x0400 (possibly caused by underclocking from 400 to 150, but whatever...).

And my system does pass the bootup PM-Timer check quite often despite
this severe defect (2 in 4 bootups _did_ register my defective
acpi_pm clocksource).

That was the "PASS for broken hardware" part.

Now to the "FAIL for sufficiently-good hardware" part:

I realized that in historic versions (e.g. 2.6.12) read_pmtmr()
encompassed the _entire_ "triple-reading due to latch bug" logic.
Nowadays read_pmtmr() is the raw inline version of a single inl() only!
However despite this large change, the initial hardware check
(at init_acpi_pm_clocksource()) _kept using_ the now single-read read_pmtmr()
as if nothing had happened. Why nobody realized this during review is
beyond me.
The result is that even on latch-bugged systems which are capable to be used
successfully (via the triple-read workaround) the PM-Timer bootup check most
likely can fail quite often now due to now doing buggy single-reads
during initial check
(cannot verify this in real life due to lack of affected P3 hardware here).


Both issues are caused by very weak monotony verification in
init_acpi_pm_clocksource(), thus that init check should get improved
by leaps and bounds instead of stupidly ...
From: Dominik Brodowski
Date: Sunday, August 10, 2008 - 9:29 am

Hi Andreas,


this isn't the bug which is handled by the read-three-times-workaround.
Instead, that handels the following PIIX4 errata:

 * PIIX4 Errata:
 *
 * The power management timer may return improper results when read.
 * Although the timer value settles properly after incrementing,
 * while incrementing there is a 3 ns window every 69.8 ns where the
 * timer value is indeterminate (a 4.2% chance that the data will be
 * incorrect when read). As a result, the ACPI free running count up

No surprise there -- it is the first time I see such an error; and it might


Well, we could do something like this for sure, but I haven't seen any other
=> might do this, but currently I'm not yet convinced whether we really need
it.

Best,
	Dominik


acpi_pm.c: use proper read function also in errata mode.

When acpi_pm is used in errata mode (three reads instead of one), also the
acpi_pm init functions need to use three reads instead of just one.

Thanks to Andreas Mohr for spotting this issue.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.de>

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 5ca1d80..2c00edd 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -151,13 +151,13 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE,
  */
 static int verify_pmtmr_rate(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read()
 	mach_countup(&count);
-	value2 = read_pmtmr();
+	value2 = clocksource_acpi_pm.read()
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -177,7 +177,7 @@ static int verify_pmtmr_rate(void)
 
 static int __init init_acpi_pm_clocksource(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned int i;
 
 	if (!pmtmr_ioport)
@@ -187,9 +187,9 @@ ...
From: Arjan van de Ven
Date: Sunday, August 10, 2008 - 9:40 am

On Sun, 10 Aug 2008 18:29:20 +0200

if it's really only one board that's faulty... a dmi blacklist might be
appropriate.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Andreas Mohr
Date: Sunday, August 10, 2008 - 12:08 pm

OK, right, technically this workaround is not related to this
different bug.
And it's in fact not this triple-read which has any weakness here but rather

Yeah, might be motherboard only, but likely still chipset-global,
since probably not too many people tried this beast with ACPI / acpi_pm even
(we're not even talking about the usual Linux ACPI 2001 blacklist limit
with this board, more like 1999, 1998 or even 1997 stoneage).

OK, dmidecode said:
        Vendor: Award Software International, Inc.
        Version: 4.51 PG
        Release Date: 07/05/99


_DAMN_ you're fast! ;)

Technically it's related to the base type of cycle_t (i.e., u64 and thus
probably "unsigned long long"), thus %llx is the format specifier


Even if it's not a systematic chipset / layout error, then I'm sure there's always
the occasional custom-broken (read: damaged) system which would need a
useful check to avoid counter-related lockups.

IMHO the current init check is too weak, it will catch the very simplest
types of problems only, and that's not a good thing.

About Arjan's suggestion to use DMI blacklisting here: not the right
method here IMHO since one could easily catch such problems generically
and thus much more reliably than maintaining an ever-growing and thus
always-incomplete blacklist collection.
Anyway, he provided important input still ;)

Andreas Mohr
--

From: Dominik Brodowski
Date: Sunday, August 10, 2008 - 1:02 pm

Hi Andreas,



what about this?

Best,
	Dominik


acpi_pm.c: check for monotonicity

Expand the check for monotonicity by doing ten tests instead of one. 

Applies on top of "acpi_pm.c: use proper read function also in errata mode."

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.de>

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 2c00edd..f05c4fb 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -178,7 +178,7 @@ static int verify_pmtmr_rate(void)
 static int __init init_acpi_pm_clocksource(void)
 {
 	cycle_t value1, value2;
-	unsigned int i;
+	unsigned int i, j, good = 0;
 
 	if (!pmtmr_ioport)
 		return -ENODEV;
@@ -187,24 +187,32 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = clocksource_acpi_pm.read();
-	for (i = 0; i < 10000; i++) {
-		value2 = clocksource_acpi_pm.read();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results:"
-			" 0x%#llx, 0x%#llx - aborting.\n", value1, value2);
-		return -EINVAL;
+	for (j = 0; j < 10; j++) {
+		value1 = clocksource_acpi_pm.read();
+		for (i = 0; i < 10000; i++) {
+			value2 = clocksource_acpi_pm.read();
+			if (value2 == value1)
+				continue;
+			if (value2 > value1)
+				good++;
+				break;
+			if ((value2 < value1) && ((value2) < 0xFFF))
+				good++;
+				break;
+			printk(KERN_INFO "PM-Timer had inconsistent results:"
+			       " 0x%#llx, 0x%#llx - aborting.\n",
+			       value1, value2);
+			return -EINVAL;
+		}
+		udelay(300 * i);
+	}
+
+	if (good != 10) {
+		printk(KERN_INFO "PM-Timer had no reasonable result:"
+		       " 0x%#llx - aborting.\n", value1);
+		return -ENODEV;
 	}
-	printk(KERN_INFO "PM-Timer had no reasonable result:"
-			" 0x%#llx - aborting.\n", ...
From: Dominik Brodowski
Date: Monday, August 18, 2008 - 12:03 pm

Hi,

anyone willing to take these two patches? Possibly 2.6.27 material; hasn't
been in -next or -mm.

Best,
	Dominik


The following changes since commit 796aadeb1b2db9b5d463946766c5bbfd7717158c:
  Linus Torvalds (1):
        Merge branch 'fixes' of git://git.kernel.org/.../davej/cpufreq

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git clocksource

Dominik Brodowski (2):
      acpi_pm.c: use proper read function also in errata mode.
      acpi_pm.c: check for monotonicity

 drivers/clocksource/acpi_pm.c |   50 +++++++++++++++++++++++-----------------
 1 files changed, 29 insertions(+), 21 deletions(-)
--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 12:05 pm

Expand the check for monotonicity by doing ten tests instead of one.

Applies on top of "acpi_pm.c: use proper read function also in errata mode."

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   42 ++++++++++++++++++++++++----------------
 1 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 2c00edd..f05c4fb 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -178,7 +178,7 @@ static int verify_pmtmr_rate(void)
 static int __init init_acpi_pm_clocksource(void)
 {
 	cycle_t value1, value2;
-	unsigned int i;
+	unsigned int i, j, good = 0;
 
 	if (!pmtmr_ioport)
 		return -ENODEV;
@@ -187,24 +187,32 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = clocksource_acpi_pm.read();
-	for (i = 0; i < 10000; i++) {
-		value2 = clocksource_acpi_pm.read();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results:"
-			" 0x%#llx, 0x%#llx - aborting.\n", value1, value2);
-		return -EINVAL;
+	for (j = 0; j < 10; j++) {
+		value1 = clocksource_acpi_pm.read();
+		for (i = 0; i < 10000; i++) {
+			value2 = clocksource_acpi_pm.read();
+			if (value2 == value1)
+				continue;
+			if (value2 > value1)
+				good++;
+				break;
+			if ((value2 < value1) && ((value2) < 0xFFF))
+				good++;
+				break;
+			printk(KERN_INFO "PM-Timer had inconsistent results:"
+			       " 0x%#llx, 0x%#llx - aborting.\n",
+			       value1, value2);
+			return -EINVAL;
+		}
+		udelay(300 * i);
+	}
+
+	if (good != 10) {
+		printk(KERN_INFO "PM-Timer had no reasonable result:"
+		       " 0x%#llx ...
From: Dominik Brodowski
Date: Monday, August 18, 2008 - 12:05 pm

When acpi_pm is used in errata mode (three reads instead of one), also the
acpi_pm init functions need to use three reads instead of just one.

Thanks to Andreas Mohr for spotting this issue.

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 5ca1d80..2c00edd 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -151,13 +151,13 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE,
  */
 static int verify_pmtmr_rate(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read()
 	mach_countup(&count);
-	value2 = read_pmtmr();
+	value2 = clocksource_acpi_pm.read()
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -177,7 +177,7 @@ static int verify_pmtmr_rate(void)
 
 static int __init init_acpi_pm_clocksource(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned int i;
 
 	if (!pmtmr_ioport)
@@ -187,9 +187,9 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read();
 	for (i = 0; i < 10000; i++) {
-		value2 = read_pmtmr();
+		value2 = clocksource_acpi_pm.read();
 		if (value2 == value1)
 			continue;
 		if (value2 > value1)
@@ -197,11 +197,11 @@ static int __init init_acpi_pm_clocksource(void)
 		if ((value2 < value1) && ((value2) < 0xFFF))
 			goto pm_good;
 		printk(KERN_INFO "PM-Timer had inconsistent results:"
-			" 0x%#x, 0x%#x - aborting.\n", value1, ...
From: Andrew Morton
Date: Monday, August 18, 2008 - 12:19 pm

On Mon, 18 Aug 2008 21:03:25 +0200

A bare git URL is somewhat user-unfriendly.







I guess this file falls under Thomas's git-hrt tree.  I can queue the
patches up and spam Thomas with them, but I'm at a bit of a loss
regarding their priority due to the above questions.
--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 12:35 pm

Hi Andrew,



Indeed: on all affected hardware (some Intel ICH4, PIIX4 and PIIX4E chipsets)
there's about a 4.2% chance that initialization of the ACPI PMTMR fails. On
those chipsets, we need to read out the timer value at least three times to
get a correct result, for every once in a while (i.e. within a 3 ns window
every 69.8 ns) the read returns a bogus result. During normal operation we
work around this issue, but during initialization reading a bogus value may

Indeed: http://lkml.org/lkml/2008/8/10/77 -- quote:
"Result: catastrophic timer behaviour (a large backwards skip is possible),"

The current check for monotonicity is way too weak. And at least on one

That would be great, thanks.

	Dominik
--

From: Andrew Morton
Date: Monday, August 18, 2008 - 12:47 pm

On Mon, 18 Aug 2008 21:35:17 +0200


yeah, that does look like 2.6.27 material.  And possibly 2.6.26.x and

Could I trouble you to resend them as plain-old-patches, with full
changelogs along with your thoughts about the suitablity for
2.6.2[5678].x please?

I could attempt to put all that together here, but I'd probably end up
with something inappropriate.

Thanks.
--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 1:09 pm

patches will be sent as replies to this message. Thanks for taking care of
this; I should have written more meaningful texts. Also, I failed to
remember the second patch hasn't been tested yet ..


.. but still I think this is stuff for 2.6.27; not (yet) suitable for 2.6.26
and 2.6.25 based on the -stable rules.

Best,
	Dominik
--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 1:11 pm

From 13cc8b6ff0d8198102f60691c38c07151a693c7b Mon Sep 17 00:00:00 2001
From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Sun, 10 Aug 2008 21:34:54 +0200
Subject: [PATCH 2/2] acpi_pm.c: check for monotonicity

The current check for monotonicity is way too weak: Andreas Mohr reports
( http://lkml.org/lkml/2008/8/10/77  ) that on one of his test systems the
current check only triggers in 50% of all cases, leading to catastrophic timer
behaviour. To fix this issue, expand the check for monotonicity by doing ten
consecutive tests instead of one.

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   42 ++++++++++++++++++++++++----------------
 1 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 2c00edd..f05c4fb 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -178,7 +178,7 @@ static int verify_pmtmr_rate(void)
 static int __init init_acpi_pm_clocksource(void)
 {
 	cycle_t value1, value2;
-	unsigned int i;
+	unsigned int i, j, good = 0;
 
 	if (!pmtmr_ioport)
 		return -ENODEV;
@@ -187,24 +187,32 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = clocksource_acpi_pm.read();
-	for (i = 0; i < 10000; i++) {
-		value2 = clocksource_acpi_pm.read();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results:"
-			" 0x%#llx, 0x%#llx - aborting.\n", value1, value2);
-		return -EINVAL;
+	for (j = 0; j < 10; j++) {
+		value1 = clocksource_acpi_pm.read();
+		for (i = 0; i < 10000; i++) {
+			value2 = clocksource_acpi_pm.read();
+			if (value2 == ...
From: Andreas Mohr
Date: Monday, August 18, 2008 - 1:18 pm

Hi,


Technically spoken this log message could now be considered partially
outdated... (we're doing 10 evaluations after all, not one with a
precise end result).


Seeing a define for those several open-coded 10 loops values would be nice.

Andreas Mohr
--

From: Andrew Morton
Date: Monday, August 18, 2008 - 1:28 pm

On Mon, 18 Aug 2008 22:18:44 +0200

Also it's a bit dodgy printing a cycle_t with %llx.  We don't _know_
that cycle_t was implemented with `long long' - if this was always
true, we wouldn't (or shouldn't) have a cycle_t at all.

But it seems that it happens to work for all architectures which
implement acpi.

--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 1:42 pm

Hi,


Well, adding a format modifier just for this isn't necessary IMO, especially
as all acpi-implementing architectures seem to be fine. Regarding the other
issues: well, I don't care all that much about a define for something used
twice and this comment; but if you want to squash it into the other patch
I'm fine with it.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index f05c4fb..09d4650 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -175,6 +175,9 @@ static int verify_pmtmr_rate(void)
 #define verify_pmtmr_rate() (0)
 #endif
 
+/* Number of monotonicity checks to perform during initialization */
+#define ACPI_PM_MONOTONICITY_CHECKS 10
+
 static int __init init_acpi_pm_clocksource(void)
 {
 	cycle_t value1, value2;
@@ -187,7 +190,7 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	for (j = 0; j < 10; j++) {
+	for (j = 0; j < ACPI_PM_MONOTONICITY_CHECKS; j++) {
 		value1 = clocksource_acpi_pm.read();
 		for (i = 0; i < 10000; i++) {
 			value2 = clocksource_acpi_pm.read();
@@ -207,9 +210,9 @@ static int __init init_acpi_pm_clocksource(void)
 		udelay(300 * i);
 	}
 
-	if (good != 10) {
-		printk(KERN_INFO "PM-Timer had no reasonable result:"
-		       " 0x%#llx - aborting.\n", value1);
+	if (good != ACPI_PM_MONOTONICITY_CHECKS) {
+		printk(KERN_INFO "PM-Timer failed consistency check "
+		       " (0x%#llx) - aborting.\n", value1);
 		return -ENODEV;
 	}
 
--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 1:10 pm

On all hardware (some Intel ICH4, PIIX4 and PIIX4E chipsets) affected by a
hardware errata there's about a 4.2% chance that initialization of the ACPI
PMTMR fails. On those chipsets, we need to read out the timer value at least
three times to get a correct result, for every once in a while (i.e. within
a 3 ns window every 69.8 ns) the read returns a bogus result. During normal
operation we work around this issue, but during initialization reading a
bogus value may lead to -EINVAL even though the hardware is usable.

Thanks to Andreas Mohr for spotting this issue.

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 5ca1d80..2c00edd 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -151,13 +151,13 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE,
  */
 static int verify_pmtmr_rate(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read()
 	mach_countup(&count);
-	value2 = read_pmtmr();
+	value2 = clocksource_acpi_pm.read()
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -177,7 +177,7 @@ static int verify_pmtmr_rate(void)
 
 static int __init init_acpi_pm_clocksource(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned int i;
 
 	if (!pmtmr_ioport)
@@ -187,9 +187,9 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read();
 	for (i = 0; i < 10000; i++) ...
From: Andrew Morton
Date: Tuesday, August 19, 2008 - 2:43 am

compiler got upset.  Did you send the wrong version?
--

From: Dominik Brodowski
Date: Tuesday, August 19, 2008 - 2:49 am

gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7) on amd64 worked fine... what did
yours complain about?

Best,
	Dominik
--

From: Andrew Morton
Date: Tuesday, August 19, 2008 - 2:59 am

missing semicolons
--

From: Dominik Brodowski
Date: Friday, August 22, 2008 - 3:22 pm

next try...


From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Fri, 22 Aug 2008 23:49:21 +0200
Subject: [PATCH 1/2] acpi_pm.c: use proper read function also in errata mode.

On all hardware (some Intel ICH4, PIIX4 and PIIX4E chipsets) affected by a
hardware errata there's about a 4.2% chance that initialization of the ACPI
PMTMR fails. On those chipsets, we need to read out the timer value at least
three times to get a correct result, for every once in a while (i.e. within
a 3 ns window every 69.8 ns) the read returns a bogus result. During normal
operation we work around this issue, but during initialization reading a
bogus value may lead to -EINVAL even though the hardware is usable.

Thanks to Andreas Mohr for spotting this issue.

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 5ca1d80..860d033 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -151,13 +151,13 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE,
  */
 static int verify_pmtmr_rate(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = read_pmtmr();
+	value1 = clocksource_acpi_pm.read();
 	mach_countup(&count);
-	value2 = read_pmtmr();
+	value2 = clocksource_acpi_pm.read();
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -177,7 +177,7 @@ static int verify_pmtmr_rate(void)
 
 static int __init init_acpi_pm_clocksource(void)
 {
-	u32 value1, value2;
+	cycle_t value1, value2;
 	unsigned int i;
 
 	if (!pmtmr_ioport)
@@ -187,9 +187,9 @@ static int __init ...
From: Dominik Brodowski
Date: Friday, August 22, 2008 - 3:26 pm

From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Fri, 22 Aug 2008 23:51:23 +0200
Subject: [PATCH 2/2] acpi_pm.c: check for monotonicity

The current check for monotonicity is way too weak: Andreas Mohr reports
( http://lkml.org/lkml/2008/8/10/77  ) that on one of his test systems the
current check only triggers in 50% of all cases, leading to catastrophic timer
behaviour. To fix this issue, expand the check for monotonicity by doing ten
consecutive tests instead of one.

CC: John Stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/clocksource/acpi_pm.c |   46 +++++++++++++++++++++++++---------------
 1 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 860d033..4eee533 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -21,6 +21,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 #include <asm/io.h>
 
 /*
@@ -175,10 +176,13 @@ static int verify_pmtmr_rate(void)
 #define verify_pmtmr_rate() (0)
 #endif
 
+/* Number of monotonicity checks to perform during initialization */
+#define ACPI_PM_MONOTONICITY_CHECKS 10
+
 static int __init init_acpi_pm_clocksource(void)
 {
 	cycle_t value1, value2;
-	unsigned int i;
+	unsigned int i, j, good = 0;
 
 	if (!pmtmr_ioport)
 		return -ENODEV;
@@ -187,24 +191,32 @@ static int __init init_acpi_pm_clocksource(void)
 						clocksource_acpi_pm.shift);
 
 	/* "verify" this timing source: */
-	value1 = clocksource_acpi_pm.read();
-	for (i = 0; i < 10000; i++) {
-		value2 = clocksource_acpi_pm.read();
-		if (value2 == value1)
-			continue;
-		if (value2 > value1)
-			goto pm_good;
-		if ((value2 < value1) && ((value2) < 0xFFF))
-			goto pm_good;
-		printk(KERN_INFO "PM-Timer had inconsistent results:"
-			" 0x%#llx, ...
From: Jochen Voß
Date: Saturday, August 23, 2008 - 1:48 am

Hi,

some minor comments:

300*10000 microseconds seems like a long time to me.  Is this the
If the inner loop runs out once, you alreay know that you will later
abort here.  Maybe move the check directly after the inner loop to
avoid the additional delay (10*10000*300 microseconds = 30 seconds) in
case of failure?

I hope this helps,
Jochen
-- 
http://seehuhn.de/
--

From: Andrew Morton
Date: Monday, August 18, 2008 - 1:25 pm

On Mon, 18 Aug 2008 22:09:16 +0200

(cc stable)

OK, thanks.  I tagged these as

Cc: <stable@kernel.org>         [2.6.26.x, 2.6.25.x but not immediately]

which is a bit rubbery.  What do you think are the criteria for
deciding when these are ready for the backports?  Something like "after
2.6.28-rc1 if nothing blew up"?


--

From: Dominik Brodowski
Date: Monday, August 18, 2008 - 1:29 pm

That sounds perfectly reasonable.

Best,
	Dominik
--

From: Andreas Mohr
Date: Monday, August 18, 2008 - 1:00 pm

Hi,


Just want to say thank you for promptly following up on this issue!

I wanted to test your 2nd patch (improved init), but was unable to do so
(currently too busy due to large changes, plus distance to production system,
downtime NOT appreciated ;).

Andreas Mohr
--

Previous thread: Re: [PATCH v2 02/13] ARM:DaVinci: base address for dm6467 by Jaswinder Singh on Sunday, August 10, 2008 - 2:34 am. (2 messages)

Next thread: [2.6 patch] fix watchdog/ixp4xx_wdt.c compilation by Adrian Bunk on Sunday, August 10, 2008 - 4:03 am. (2 messages)