Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

Previous thread: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer by Aaron Durbin on Wednesday, August 8, 2007 - 4:08 pm. (1 message)

Next thread: [PATCH] lib/sort.c, 2.6.22 kernel by Subbaiah Venkata on Wednesday, August 8, 2007 - 4:19 pm. (3 messages)
From: Aaron Durbin
Date: Wednesday, August 8, 2007 - 4:17 pm

In setup_APIC_timer with the HPET in use, a condition can arise while
waiting for the next irq slice to expire on the HPET which will either
cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
watchdog is disabled.

The HPET comparator and the counter keep incrementing during its normal
operation. When a comparison event fires the comparator will increment
by the designated period. If the HPET trigger occurs right after
the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
counter to wrap around. However, when the NMI watchdog is enabled the
NMI watchdog will detect a lockup and reboot the machine. This
scenario can be exasperated by the presence of an SMI which will
increase the window of opportunity for the condition to occur.

The fix is to wait for the compartor to change which signals the
end of the tick slice.

---

The last patch had a typo in the diff which really would cause the
problem state above. Sorry.

diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 900ff38..06797e2 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
 
 	/* wait for irq slice */
 	if (hpet_address && hpet_use_timer) {
+		/*
+		 * Wait for the comparator value to change which signals that
+		 * the tick slice has expired.
+		 */
 		int trigger = hpet_readl(HPET_T0_CMP);
-		while (hpet_readl(HPET_COUNTER) >= trigger)
-			/* do nothing */ ;
-		while (hpet_readl(HPET_COUNTER) <  trigger)
+		while (trigger == hpet_readl(HPET_T0_CMP))
 			/* do nothing */ ;
 	} else {
 		int c1, c2;

-

From: Andi Kleen
Date: Wednesday, August 8, 2007 - 5:26 pm

Nasty.

Iirc the clockevents code did away with the synchronization
completely because Thomas determined it was useless because long term
it'll drift anyways. So perhaps it could be just removed?

-

From: Aaron Durbin
Date: Wednesday, August 8, 2007 - 5:44 pm

I was thinking along the same lines as well, but I really didn't know how
important all that code was for waiting for the next irq slice. I'm not time
expert, but I would imagine we would resynchronize correctly in the future after
this code path?
-

From: Andi Kleen
Date: Wednesday, August 8, 2007 - 5:50 pm

The idea is to just have the apic timers roughly at the same times + some fixed
offset as the main timer.  Otherwise e.g. add_timer could jitter up to a jiffie
regarding the jiffies counter. But long term drift will destroy the synchronization 
anyways.

I guess i'll send it in anyways for .23; it would be too risky to change anything 
radical in this area that late. Then later it can be revisited.

-Andi 


-

From: Andrew Morton
Date: Thursday, August 9, 2007 - 1:30 pm

On Wed, 8 Aug 2007 16:17:19 -0700

but, but.  Didn't this get fixed by
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm...
?
-

From: Aaron Durbin
Date: Thursday, August 9, 2007 - 2:16 pm

That original patch looks good to me as well. I just checked git head to see if
this was already fixed upstream, and it didn't appear to be. Sorry for the
duplicate patch.

-Aaron
-

From: Andrew Morton
Date: Thursday, August 9, 2007 - 2:56 pm

On Thu, 9 Aug 2007 14:16:16 -0700

Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into
Linus nowadays.  I seem to have 44 patches to go and another 20 which 
should go, but which belong to subsystem maintainers.  We'll get there.
-

Previous thread: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer by Aaron Durbin on Wednesday, August 8, 2007 - 4:08 pm. (1 message)

Next thread: [PATCH] lib/sort.c, 2.6.22 kernel by Subbaiah Venkata on Wednesday, August 8, 2007 - 4:19 pm. (3 messages)