[PATCH] x86 (64): make calibrate_APIC_clock() smp-safe

Previous thread: [PATCH 0/2][RT] powerpc - fix bug in irq reverse mapping radix tree (Resend) by Sebastien Dugue on Thursday, July 24, 2008 - 3:23 am. (12 messages)

Next thread: Linux Bootup hangs after adding RealTime Premption and HR-Timer by p_ashish on Thursday, July 24, 2008 - 4:10 am. (1 message)
From: Martin Wilck
Date: Thursday, July 24, 2008 - 3:47 am

Hi Thomas and Peter, hi everyone,

Asynchrounous events (e.g.SMIs) which occur during the APIC timer 
calibration can cause timer miscalibrations, sometimes by large amounts.

This patch fixes this by two separate measures:
   a) make sure that no significant interruption occurs between APIC and
      TSC reads
   b) make sure that the measurement loop isn't significantly longer
      than originally intended.

I am sorry, due to a misconfiguration of our SMTP server I need to send 
the patch as attachment.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
From: Cyrill Gorcunov
Date: Thursday, July 24, 2008 - 4:16 am

| Patch: make calibrate_APIC_clock() SMI-safe
| 
| Asynchrounous events (e.g.SMIs) which occur during the APIC timer calibration
| can cause timer miscalibrations, sometimes by large amounts. This patch fixes
| this by two separate measures:
|   a) make sure that no significant interruption occurs between APIC and 
|      TSC reads
|   b) make sure that the measurement loop isn't significantly longer 
|      than originally intended.
| 
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
| 
| --- linux-2.6.26/arch/x86/kernel/apic_64.c	2008-07-13 23:51:29.000000000 +0200
| +++ linux-2.6.26/arch/x86/kernel/apic_64.c.new	2008-07-24 11:41:24.000000000 +0200
| @@ -314,6 +314,19 @@ static void setup_APIC_timer(void)
|  
|  #define TICK_COUNT 100000000
|  
| +#define MAX_DIFFERENCE 1000UL
| +static inline void __read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| +	unsigned long tsc0, tsc1, diff;
| +	do {
| +		rdtscll(tsc0);
| +		*apic = apic_read(APIC_TMCCT);
| +		rdtscll(tsc1);
| +		diff = tsc1 - tsc0;
| +	} while (diff > MAX_DIFFERENCE);
| +	*tsc = tsc0 + (diff >> 1);
| +}
| +
|  static void __init calibrate_APIC_clock(void)
|  {
|  	unsigned apic, apic_start;
| @@ -329,25 +342,37 @@ static void __init calibrate_APIC_clock(
|  	 *
|  	 * No interrupt enable !
|  	 */
| +smi_occured:
|  	__setup_APIC_LVTT(250000000, 0, 0);
|  
| -	apic_start = apic_read(APIC_TMCCT);
|  #ifdef CONFIG_X86_PM_TIMER
|  	if (apic_calibrate_pmtmr && pmtmr_ioport) {
| +		apic_start = apic_read(APIC_TMCCT);
|  		pmtimer_wait(5000);  /* 5ms wait */
|  		apic = apic_read(APIC_TMCCT);
|  		result = (apic_start - apic) * 1000L / 5;
|  	} else
|  #endif
|  	{
| -		rdtscll(tsc_start);
| +		__read_tsc_and_apic(&tsc_start, &apic_start);
|  
|  		do {
| -			apic = apic_read(APIC_TMCCT);
| -			rdtscll(tsc);
| +			__read_tsc_and_apic(&tsc, &apic);
|  		} while ((tsc - tsc_start) < TICK_COUNT &&
|  ...
From: Martin Wilck
Date: Thursday, July 24, 2008 - 4:58 am

Yes, if the SMI flood never ends. But you wouldn't want to work on such 
a system anyway (with the current kernel code, your APIC timer would 
most probably be miscalibrated, which can have the weirdest effects).

Anyway, we can add a maximum iteration count to the patch so that there 
is no risk of getting stuck.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Cyrill Gorcunov
Date: Thursday, July 24, 2008 - 5:05 am

yes, it will issue some effects but it's better then stuck there.
More over in 'case of SMI flood with current patch you don't get
error message printed i think so you better add max iteration
counter so user will see on console (or whatever) that he is got
problems.
		- Cyrill -
--

From: Martin Wilck
Date: Thursday, July 24, 2008 - 6:55 am

I disagree. If you have a system that generates SMIs in this extreme 
frequency, you're better off stuck than running on such an unstable 
system. The user _will_ see messages on the console if this happens. 
Note that apparently there are few people who have trouble with this. We 
did see problems, but never had more than 1 SMI disturbing the 
calibration procedure.

Anyway, here is another patch that defines max iteration counts. I 
haven't added a "Signed-off:" line, because I prefer the original version.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
From: Cyrill Gorcunov
Date: Thursday, July 24, 2008 - 7:31 am

yes, Martin, it'll be written on console (just forgot it's not interrupt
driven). I've Cc'ed Maciej in previous message so we should better wait
for his opinion I think. For me the almost ideal solution could be like -
lets user to choose what he wants. I mean you even could add some boot
param to specify behaviour on a such case like panic on SMI flood during
calibration. yes - if we got smi flood we have serious troubles anyway but
i don't think that being just stuck is good choise. And that is why I do like
much more _this_ patch. Anyway - thanks!

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Thursday, July 24, 2008 - 8:01 am

[Cyrill Gorcunov - Thu, Jul 24, 2008 at 06:31:51PM +0400]
| [Martin Wilck - Thu, Jul 24, 2008 at 03:55:02PM +0200]
| > Cyrill Gorcunov wrote:
| >
| >> yes, it will issue some effects but it's better then stuck there.
| >> More over in 'case of SMI flood with current patch you don't get
| >> error message printed i think so you better add max iteration
| >> counter so user will see on console (or whatever) that he is got
| >> problems.
| >>                 - Cyrill -
| >
| > I disagree. If you have a system that generates SMIs in this extreme  
| > frequency, you're better off stuck than running on such an unstable  
| > system. The user _will_ see messages on the console if this happens.  
| > Note that apparently there are few people who have trouble with this. We  
| > did see problems, but never had more than 1 SMI disturbing the  
| > calibration procedure.
| >
| > Anyway, here is another patch that defines max iteration counts. I  
| > haven't added a "Signed-off:" line, because I prefer the original 
| > version.
| >
| > Martin
| >
| 
| yes, Martin, it'll be written on console (just forgot it's not interrupt
| driven). I've Cc'ed Maciej in previous message so we should better wait
| for his opinion I think. For me the almost ideal solution could be like -
| lets user to choose what he wants. I mean you even could add some boot
| param to specify behaviour on a such case like panic on SMI flood during
| calibration. yes - if we got smi flood we have serious troubles anyway but
| i don't think that being just stuck is good choise. And that is why I do like
| much more _this_ patch. Anyway - thanks!
| 
| 		- Cyrill -

btw, Martin, don't get me wrong please - i'm not just complaining :)
The changes you propose is important enough _but_ it could introduce
regression. Look, with situation of miscalibrated apic timer kernel
was working before but with the patch it could stop to work. So if
user has a such screwed motherboard he could be shocked if it stop
booting with ...
From: Martin Wilck
Date: Thursday, July 24, 2008 - 8:13 am

Hi Cyrill,

 > btw, Martin, don't get me wrong please - i'm not just complaining :)
 > The changes you propose is important enough _but_ it could introduce
 > regression. Look, with situation of miscalibrated apic timer kernel
 > was working before but with the patch it could stop to work. So if
 > user has a such screwed motherboard he could be shocked if it stop
 > booting with message about SMI happened. we defenitely have to provide
 > some workaround for this. And your max iteration counter solution
 > would be fine I think.

Let's see what other people think about this. I am fine with both 
solutions. Currently only the first one has been tested though (testing 
these patches thoroughly needs long-time reboot tests).

One more remark: There are similar calibration routines around in the 
kernel which suffer from similar problems as calibrate_APIC_clock(). 
AFAIK, only calibrate_delay() was made SMI-safe by Venkatesh Pallipadi 
years ago.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Martin Wilck
Date: Friday, July 25, 2008 - 2:02 am

Here is a new, simplified version of our patch that only uses measure a).
We verified that this is sufficient for accurate calibration.

If we fail to determine the start or end time of the calibration 
correctly 10 times in a row, we will print a critical error message and 
go on. One might as well argue that this should cause a kernel panic (it 
is impossible to run on the CPU for only a few cycles without being 
interrupted by an SMI!), but Cyrill probably won't agree.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 3:08 am

| [PATCH] x86 (64): make calibrate_APIC_clock() SMI-safe (take 2)
| 
| Non-maskable asynchronous events (e.g. SMIs) which occur during the APIC 
| timer calibration can cause timer miscalibrations, sometimes by large amounts. 
| This patch fixes this by making sure that no significant interruption occurs 
| between APIC and TSC reads. SMIs may still occur at some stage in the 
| calibration loop, causing the loop to last longer than intended. This 
| doesn't matter though, as long as the start and end values are both 
| taken simultaneously.
| 
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
| 
| --- arch/x86/kernel/apic_64.c	2008-07-25 10:45:09.000000000 +0200
| +++ arch/x86/kernel/apic_64.c.new	2008-07-25 10:45:19.000000000 +0200
| @@ -300,6 +300,31 @@ static void setup_APIC_timer(void)
|  }
|  
|  /*
| + * Helper function for calibrate_APIC_clock(): Make sure that
| + * APIC TMCTT and TSC are read at the same time, to reasonable
| + * accuracy. On any sane system, the retry loop won't need more
| + * than a single retry, given that the rdtsc/apic_read/rdtsc
| + * sequence won't take more than a few cycles.
| + */
| +
| +#define MAX_DIFFERENCE 1000UL
| +#define MAX_ITER 10
| +static inline int
| +__read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| +	unsigned long tsc0, tsc1, diff;
| +	int i = 0;
| +	do {
| +		rdtscll(tsc0);
| +		*apic = apic_read(APIC_TMCCT);
| +		rdtscll(tsc1);
| +		diff = tsc1 - tsc0;
| +	} while (diff > MAX_DIFFERENCE && ++i < MAX_ITER); 
| +	*tsc = tsc0 + (diff >> 1);
| +	return diff > MAX_DIFFERENCE ? -EIO : 0;
| +}
| +
| +/*
|   * In this function we calibrate APIC bus clocks to the external
|   * timer. Unfortunately we cannot use jiffies and the timer irq
|   * to calibrate, since some later bootup code depends on getting
| @@ -318,7 +343,7 @@ static void __init calibrate_APIC_clock(
|  {
|  	unsigned apic, apic_start;
|  	unsigned ...
From: Martin Wilck
Date: Friday, July 25, 2008 - 5:29 am

> +
 > +       return -EIO ;

This is wrong - you need to set *tsc also in the -EIO case, otherwise 
the function can return total bogus.

I have to say that my simplified patch failed to do the calibration 
correctly on our test system (the original patch worked well). Please 
stay tuned, we are investigating this currently.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 5:59 am

indeed, thanks! that is not the only problem - I also initialized 'i'

ok, Martin, I'm leaving for vacation soon - hope someone else could
take a look :)

		- Cyrill -
--

From: Martin Wilck
Date: Friday, July 25, 2008 - 6:38 am

Please forget that. It was observed on an old "enterprise" kernel with 
which we are currently testing the backported patch with, and it was due 
to the fact that the initial counter value on that kernel was divided by 
APIC_DIVISOR (=16). The resulting initial counter value was too low in a 
"SMI flood" case, the counter could overlap. APIC_DIVISOR is no longer 
used in the current kernel.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 6:48 am

Martin, if I understood you right - this means your patch is not
needed? Actually on 64bit mode APIC_DIVISOR is a bit hidden in
__setup_APIC_LVTT - you may see it as APIC_TDR_DIV_16 while setting
up divisor register. I was proposing patch for that but it leaded
to potetntial overflow (thanks Ingo for catching) so we leave it as
is. Maybe I miss something?

		- Cyrill -
--

From: Martin Wilck
Date: Friday, July 25, 2008 - 7:01 am

The patch would still be needed. Just the reported failure of my 
simplified patch on the old kernel would not have occurred in the 
current kernel. IOW, the patch is fine for the current kernel, but not 

The problem was not that the divisor 16 was used for the counter speed 
(APIC_TDR_DIV_16), but that the old code set the counter start value to 
(250000000/16) rather than just 250000000. That means the counter will 
underflow earlier.

I am attaching a "take 3" patch which minimizes the risk of an underflow 
by using the maximum possible initial value for the APIC timer.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 7:15 am

ah, ok, thanks

		- Cyrill -
--

From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 8:01 am

Martin, it seems you forgot to attach the patch :) (since it was
continious tense I thought I would see patch in minute but...
it was not eventually apeeared :)

		- Cyrill -
--

From: Martin Wilck
Date: Friday, July 25, 2008 - 8:13 am

I am sorry, here it is. Thanks :)

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
From: Cyrill Gorcunov
Date: Friday, July 25, 2008 - 8:39 am

| [PATCH] x86 (64): make calibrate_APIC_clock() SMI-safe (take 3)
| 
| Non-maskable asynchronous events (e.g. SMIs) which occur during the APIC 
| timer calibration can cause timer miscalibrations, sometimes by large amounts. 
| This patch fixes this by making sure that no significant interruption occurs 
| between APIC and TSC reads. SMIs may still occur at some stage in the 
| calibration loop, causing the loop to last longer than intended. This 
| doesn't matter though, as long as the start and end values are both 
| taken simultaneously.
| 
| Changed wrt take 2: Use max. possible start value for the APIC timer 
| to avoid underflow.
| 
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
| 
| --- arch/x86/kernel/apic_64.c	2008-07-25 15:39:51.000000000 +0200
| +++ arch/x86/kernel/apic_64.c.new	2008-07-25 15:55:08.000000000 +0200
| @@ -300,6 +300,31 @@ static void setup_APIC_timer(void)
|  }
|  
|  /*
| + * Helper function for calibrate_APIC_clock(): Make sure that
| + * APIC TMCTT and TSC are read at the same time, to reasonable
| + * accuracy. On any sane system, the retry loop won't need more
| + * than a single retry, given that the rdtsc/apic_read/rdtsc
| + * sequence won't take more than a few cycles.
| + */
| +
| +#define MAX_DIFFERENCE 1000UL
| +#define MAX_ITER 10
| +static inline int
| +__read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| +	unsigned long tsc0, tsc1, diff;
| +	int i = 0;
| +	do {
| +		rdtscll(tsc0);
| +		*apic = apic_read(APIC_TMCCT);
| +		rdtscll(tsc1);
| +		diff = tsc1 - tsc0;
| +	} while (diff > MAX_DIFFERENCE && ++i < MAX_ITER); 
| +	*tsc = tsc0 + (diff >> 1);
| +	return diff > MAX_DIFFERENCE ? -EIO : 0;
| +}
| +
| +/*
|   * In this function we calibrate APIC bus clocks to the external
|   * timer. Unfortunately we cannot use jiffies and the timer irq
|   * to calibrate, since some later bootup code depends on getting
| @@ -318,7 ...
From: Ingo Molnar
Date: Saturday, July 26, 2008 - 8:40 am

nice - could you please implement it symmetrically on 32-bit APIC 
calibration as well?

	Ingo
--

From: Jean Delvare
Date: Thursday, March 12, 2009 - 2:41 am

Hi all,


Sorry for replying to such an old thread, but did this patch go
anywhere? I can't seem to find it in git. Was it somehow obsoleted
by a different fix for the same issue (SMI flood during APIC
calibration)? Or is the upstream kernel still affected?

Thanks,
-- 
Jean Delvare
Suse L3
--

From: Martin Wilck
Date: Thursday, March 12, 2009 - 6:38 am

Hi Jean,


We were asked for a solution for all affected architectures (in 
particular, also i386) which was more than we were able to do
at the time, given that the pressure had been reduced by finding a BIOS 
fix for the particular situation on system in question.

The APIC clock calibration code on i386 is more complex than x86_64, we 
saw no way to provide a high-quality, regression-safe, tested patch for 
that to upstream with a reasonable amount of effort.

That aside, I still think the x86_64 patch is fine and won't cause 
regressions. It just doesn't help on 32bit systems.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

From: Olaf Dabrunz
Date: Friday, July 25, 2008 - 9:51 am

Note that the SMIs may be triggered when the APIC is read. This may
change after the first (or the first few) SMIs have been triggered. So
the "too many SMIs" case during the calibration does not necessarily
mean that the system can not run normally after the calibration is done.

This is why I would prefer the solution with the error message.

Regards,

-- 
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

--

From: H. Peter Anvin
Date: Thursday, July 24, 2008 - 6:31 am

The other thing we may want to consider is doing multiple runs of the 
calibration loop.  I suspect we'd get a more accurate result running, 
say, 9 loops of the 1/9 the current duration and looking either for the 
best result or the median (NOT the mean.)

	-hpa

--

From: Martin Wilck
Date: Thursday, July 24, 2008 - 6:42 am

We considered that. However I think if you know what result to expect 
(here: TICK_COUNT + a few cycles) it is both easier and safer to compare 
against the expected value than to do statistics. SMI protection is all 
about outliers (thus, the median is a good suggestion but I'd say it 
would be over-enigineered).

Martin

PS: I am sorry for the stupid, misleading typo (SMi -> smp) in the mail 
subject.

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 8 15113
Fax:			++49 5251 8 20209
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html
--

Previous thread: [PATCH 0/2][RT] powerpc - fix bug in irq reverse mapping radix tree (Resend) by Sebastien Dugue on Thursday, July 24, 2008 - 3:23 am. (12 messages)

Next thread: Linux Bootup hangs after adding RealTime Premption and HR-Timer by p_ashish on Thursday, July 24, 2008 - 4:10 am. (1 message)