Hi Thomas and Peter, hi everyone,
Asynchrounous events (e.g.SMIs) which occur during the APIC timer
calibration can cause timer miscalibrations, sometimes by large amounts.
This patch fixes this by two separate measures:
a) make sure that no significant interruption occurs between APIC and
TSC reads
b) make sure that the measurement loop isn't significantly longer
than originally intended.
I am sorry, due to a misconfiguration of our SMTP server I need to send
the patch as attachment.
Martin
--
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6
Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany
Tel: ++49 5251 8 15113
Fax: ++49 5251 8 20209
Email: mailto:martin.wilck@fujitsu-siemens.com
Internet: http://www.fujitsu-siemens.com
Company Details: http://www.fujitsu-siemens.com/imprint.html
| Patch: make calibrate_APIC_clock() SMI-safe
|
| Asynchrounous events (e.g.SMIs) which occur during the APIC timer calibration
| can cause timer miscalibrations, sometimes by large amounts. This patch fixes
| this by two separate measures:
| a) make sure that no significant interruption occurs between APIC and
| TSC reads
| b) make sure that the measurement loop isn't significantly longer
| than originally intended.
|
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
|
| --- linux-2.6.26/arch/x86/kernel/apic_64.c 2008-07-13 23:51:29.000000000 +0200
| +++ linux-2.6.26/arch/x86/kernel/apic_64.c.new 2008-07-24 11:41:24.000000000 +0200
| @@ -314,6 +314,19 @@ static void setup_APIC_timer(void)
|
| #define TICK_COUNT 100000000
|
| +#define MAX_DIFFERENCE 1000UL
| +static inline void __read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| + unsigned long tsc0, tsc1, diff;
| + do {
| + rdtscll(tsc0);
| + *apic = apic_read(APIC_TMCCT);
| + rdtscll(tsc1);
| + diff = tsc1 - tsc0;
| + } while (diff > MAX_DIFFERENCE);
| + *tsc = tsc0 + (diff >> 1);
| +}
| +
| static void __init calibrate_APIC_clock(void)
| {
| unsigned apic, apic_start;
| @@ -329,25 +342,37 @@ static void __init calibrate_APIC_clock(
| *
| * No interrupt enable !
| */
| +smi_occured:
| __setup_APIC_LVTT(250000000, 0, 0);
|
| - apic_start = apic_read(APIC_TMCCT);
| #ifdef CONFIG_X86_PM_TIMER
| if (apic_calibrate_pmtmr && pmtmr_ioport) {
| + apic_start = apic_read(APIC_TMCCT);
| pmtimer_wait(5000); /* 5ms wait */
| apic = apic_read(APIC_TMCCT);
| result = (apic_start - apic) * 1000L / 5;
| } else
| #endif
| {
| - rdtscll(tsc_start);
| + __read_tsc_and_apic(&tsc_start, &apic_start);
|
| do {
| - apic = apic_read(APIC_TMCCT);
| - rdtscll(tsc);
| + __read_tsc_and_apic(&tsc, &apic);
| } while ((tsc - tsc_start) < TICK_COUNT &&
| ...Yes, if the SMI flood never ends. But you wouldn't want to work on such a system anyway (with the current kernel code, your APIC timer would most probably be miscalibrated, which can have the weirdest effects). Anyway, we can add a maximum iteration count to the patch so that there is no risk of getting stuck. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
yes, it will issue some effects but it's better then stuck there. More over in 'case of SMI flood with current patch you don't get error message printed i think so you better add max iteration counter so user will see on console (or whatever) that he is got problems. - Cyrill - --
I disagree. If you have a system that generates SMIs in this extreme frequency, you're better off stuck than running on such an unstable system. The user _will_ see messages on the console if this happens. Note that apparently there are few people who have trouble with this. We did see problems, but never had more than 1 SMI disturbing the calibration procedure. Anyway, here is another patch that defines max iteration counts. I haven't added a "Signed-off:" line, because I prefer the original version. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html
yes, Martin, it'll be written on console (just forgot it's not interrupt driven). I've Cc'ed Maciej in previous message so we should better wait for his opinion I think. For me the almost ideal solution could be like - lets user to choose what he wants. I mean you even could add some boot param to specify behaviour on a such case like panic on SMI flood during calibration. yes - if we got smi flood we have serious troubles anyway but i don't think that being just stuck is good choise. And that is why I do like much more _this_ patch. Anyway - thanks! - Cyrill - --
[Cyrill Gorcunov - Thu, Jul 24, 2008 at 06:31:51PM +0400] | [Martin Wilck - Thu, Jul 24, 2008 at 03:55:02PM +0200] | > Cyrill Gorcunov wrote: | > | >> yes, it will issue some effects but it's better then stuck there. | >> More over in 'case of SMI flood with current patch you don't get | >> error message printed i think so you better add max iteration | >> counter so user will see on console (or whatever) that he is got | >> problems. | >> - Cyrill - | > | > I disagree. If you have a system that generates SMIs in this extreme | > frequency, you're better off stuck than running on such an unstable | > system. The user _will_ see messages on the console if this happens. | > Note that apparently there are few people who have trouble with this. We | > did see problems, but never had more than 1 SMI disturbing the | > calibration procedure. | > | > Anyway, here is another patch that defines max iteration counts. I | > haven't added a "Signed-off:" line, because I prefer the original | > version. | > | > Martin | > | | yes, Martin, it'll be written on console (just forgot it's not interrupt | driven). I've Cc'ed Maciej in previous message so we should better wait | for his opinion I think. For me the almost ideal solution could be like - | lets user to choose what he wants. I mean you even could add some boot | param to specify behaviour on a such case like panic on SMI flood during | calibration. yes - if we got smi flood we have serious troubles anyway but | i don't think that being just stuck is good choise. And that is why I do like | much more _this_ patch. Anyway - thanks! | | - Cyrill - btw, Martin, don't get me wrong please - i'm not just complaining :) The changes you propose is important enough _but_ it could introduce regression. Look, with situation of miscalibrated apic timer kernel was working before but with the patch it could stop to work. So if user has a such screwed motherboard he could be shocked if it stop booting with ...
Hi Cyrill, > btw, Martin, don't get me wrong please - i'm not just complaining :) > The changes you propose is important enough _but_ it could introduce > regression. Look, with situation of miscalibrated apic timer kernel > was working before but with the patch it could stop to work. So if > user has a such screwed motherboard he could be shocked if it stop > booting with message about SMI happened. we defenitely have to provide > some workaround for this. And your max iteration counter solution > would be fine I think. Let's see what other people think about this. I am fine with both solutions. Currently only the first one has been tested though (testing these patches thoroughly needs long-time reboot tests). One more remark: There are similar calibration routines around in the kernel which suffer from similar problems as calibrate_APIC_clock(). AFAIK, only calibrate_delay() was made SMI-safe by Venkatesh Pallipadi years ago. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
Here is a new, simplified version of our patch that only uses measure a). We verified that this is sufficient for accurate calibration. If we fail to determine the start or end time of the calibration correctly 10 times in a row, we will print a critical error message and go on. One might as well argue that this should cause a kernel panic (it is impossible to run on the CPU for only a few cycles without being interrupted by an SMI!), but Cyrill probably won't agree. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html
| [PATCH] x86 (64): make calibrate_APIC_clock() SMI-safe (take 2)
|
| Non-maskable asynchronous events (e.g. SMIs) which occur during the APIC
| timer calibration can cause timer miscalibrations, sometimes by large amounts.
| This patch fixes this by making sure that no significant interruption occurs
| between APIC and TSC reads. SMIs may still occur at some stage in the
| calibration loop, causing the loop to last longer than intended. This
| doesn't matter though, as long as the start and end values are both
| taken simultaneously.
|
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
|
| --- arch/x86/kernel/apic_64.c 2008-07-25 10:45:09.000000000 +0200
| +++ arch/x86/kernel/apic_64.c.new 2008-07-25 10:45:19.000000000 +0200
| @@ -300,6 +300,31 @@ static void setup_APIC_timer(void)
| }
|
| /*
| + * Helper function for calibrate_APIC_clock(): Make sure that
| + * APIC TMCTT and TSC are read at the same time, to reasonable
| + * accuracy. On any sane system, the retry loop won't need more
| + * than a single retry, given that the rdtsc/apic_read/rdtsc
| + * sequence won't take more than a few cycles.
| + */
| +
| +#define MAX_DIFFERENCE 1000UL
| +#define MAX_ITER 10
| +static inline int
| +__read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| + unsigned long tsc0, tsc1, diff;
| + int i = 0;
| + do {
| + rdtscll(tsc0);
| + *apic = apic_read(APIC_TMCCT);
| + rdtscll(tsc1);
| + diff = tsc1 - tsc0;
| + } while (diff > MAX_DIFFERENCE && ++i < MAX_ITER);
| + *tsc = tsc0 + (diff >> 1);
| + return diff > MAX_DIFFERENCE ? -EIO : 0;
| +}
| +
| +/*
| * In this function we calibrate APIC bus clocks to the external
| * timer. Unfortunately we cannot use jiffies and the timer irq
| * to calibrate, since some later bootup code depends on getting
| @@ -318,7 +343,7 @@ static void __init calibrate_APIC_clock(
| {
| unsigned apic, apic_start;
| unsigned ...> + > + return -EIO ; This is wrong - you need to set *tsc also in the -EIO case, otherwise the function can return total bogus. I have to say that my simplified patch failed to do the calibration correctly on our test system (the original patch worked well). Please stay tuned, we are investigating this currently. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
indeed, thanks! that is not the only problem - I also initialized 'i' ok, Martin, I'm leaving for vacation soon - hope someone else could take a look :) - Cyrill - --
Please forget that. It was observed on an old "enterprise" kernel with which we are currently testing the backported patch with, and it was due to the fact that the initial counter value on that kernel was divided by APIC_DIVISOR (=16). The resulting initial counter value was too low in a "SMI flood" case, the counter could overlap. APIC_DIVISOR is no longer used in the current kernel. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
Martin, if I understood you right - this means your patch is not needed? Actually on 64bit mode APIC_DIVISOR is a bit hidden in __setup_APIC_LVTT - you may see it as APIC_TDR_DIV_16 while setting up divisor register. I was proposing patch for that but it leaded to potetntial overflow (thanks Ingo for catching) so we leave it as is. Maybe I miss something? - Cyrill - --
The patch would still be needed. Just the reported failure of my simplified patch on the old kernel would not have occurred in the current kernel. IOW, the patch is fine for the current kernel, but not The problem was not that the divisor 16 was used for the counter speed (APIC_TDR_DIV_16), but that the old code set the counter start value to (250000000/16) rather than just 250000000. That means the counter will underflow earlier. I am attaching a "take 3" patch which minimizes the risk of an underflow by using the maximum possible initial value for the APIC timer. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
Martin, it seems you forgot to attach the patch :) (since it was continious tense I thought I would see patch in minute but... it was not eventually apeeared :) - Cyrill - --
I am sorry, here it is. Thanks :) Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html
| [PATCH] x86 (64): make calibrate_APIC_clock() SMI-safe (take 3)
|
| Non-maskable asynchronous events (e.g. SMIs) which occur during the APIC
| timer calibration can cause timer miscalibrations, sometimes by large amounts.
| This patch fixes this by making sure that no significant interruption occurs
| between APIC and TSC reads. SMIs may still occur at some stage in the
| calibration loop, causing the loop to last longer than intended. This
| doesn't matter though, as long as the start and end values are both
| taken simultaneously.
|
| Changed wrt take 2: Use max. possible start value for the APIC timer
| to avoid underflow.
|
| Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
| Signed-off-by: Gerhard Wichert <gerhard.wichert@fujitsu-siemens.com>
|
| --- arch/x86/kernel/apic_64.c 2008-07-25 15:39:51.000000000 +0200
| +++ arch/x86/kernel/apic_64.c.new 2008-07-25 15:55:08.000000000 +0200
| @@ -300,6 +300,31 @@ static void setup_APIC_timer(void)
| }
|
| /*
| + * Helper function for calibrate_APIC_clock(): Make sure that
| + * APIC TMCTT and TSC are read at the same time, to reasonable
| + * accuracy. On any sane system, the retry loop won't need more
| + * than a single retry, given that the rdtsc/apic_read/rdtsc
| + * sequence won't take more than a few cycles.
| + */
| +
| +#define MAX_DIFFERENCE 1000UL
| +#define MAX_ITER 10
| +static inline int
| +__read_tsc_and_apic(unsigned long *tsc, unsigned *apic)
| +{
| + unsigned long tsc0, tsc1, diff;
| + int i = 0;
| + do {
| + rdtscll(tsc0);
| + *apic = apic_read(APIC_TMCCT);
| + rdtscll(tsc1);
| + diff = tsc1 - tsc0;
| + } while (diff > MAX_DIFFERENCE && ++i < MAX_ITER);
| + *tsc = tsc0 + (diff >> 1);
| + return diff > MAX_DIFFERENCE ? -EIO : 0;
| +}
| +
| +/*
| * In this function we calibrate APIC bus clocks to the external
| * timer. Unfortunately we cannot use jiffies and the timer irq
| * to calibrate, since some later bootup code depends on getting
| @@ -318,7 ...nice - could you please implement it symmetrically on 32-bit APIC calibration as well? Ingo --
Hi all, Sorry for replying to such an old thread, but did this patch go anywhere? I can't seem to find it in git. Was it somehow obsoleted by a different fix for the same issue (SMI flood during APIC calibration)? Or is the upstream kernel still affected? Thanks, -- Jean Delvare Suse L3 --
Hi Jean, We were asked for a solution for all affected architectures (in particular, also i386) which was more than we were able to do at the time, given that the pressure had been reduced by finding a BIOS fix for the particular situation on system in question. The APIC clock calibration code on i386 is more complex than x86_64, we saw no way to provide a high-quality, regression-safe, tested patch for that to upstream with a reasonable amount of effort. That aside, I still think the x86_64 patch is fine and won't cause regressions. It just doesn't help on 32bit systems. Martin -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 525 2796 Fax: ++49 5251 525 2820 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
Note that the SMIs may be triggered when the APIC is read. This may change after the first (or the first few) SMIs have been triggered. So the "too many SMIs" case during the calibration does not necessarily mean that the system can not run normally after the calibration is done. This is why I would prefer the solution with the error message. Regards, -- Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg --
The other thing we may want to consider is doing multiple runs of the calibration loop. I suspect we'd get a more accurate result running, say, 9 loops of the 1/9 the current duration and looking either for the best result or the median (NOT the mean.) -hpa --
We considered that. However I think if you know what result to expect (here: TICK_COUNT + a few cycles) it is both easier and safer to compare against the expected value than to do statistics. SMI protection is all about outliers (thus, the median is a good suggestion but I'd say it would be over-enigineered). Martin PS: I am sorry for the stupid, misleading typo (SMi -> smp) in the mail subject. -- Martin Wilck PRIMERGY System Software Engineer FSC IP ESP DEV 6 Fujitsu Siemens Computers GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn Germany Tel: ++49 5251 8 15113 Fax: ++49 5251 8 20209 Email: mailto:martin.wilck@fujitsu-siemens.com Internet: http://www.fujitsu-siemens.com Company Details: http://www.fujitsu-siemens.com/imprint.html --
