just if you want this lockup to go away. I think you did the hardest bit
already: to detect the situation reliably, without false positives. Now
the 'action' needs to change: instead of 'turning off ftrace' (which is
brutal - ftrace was just the last drop of water that pushed the system
over the edge), we can instead do 'double the minimum clockevent delta
threshold'.
There's already such code in kernel/time/tick-oneshot.c:
	/*
	 * We tried 2 times to program the device with the given
	 * min_delta_ns. If that's not working then we double it
	 * and emit a warning.
	 */
	if (++i > 2) {
		/* Increase the min. delta and try again */
		if (!dev->min_delta_ns)
			dev->min_delta_ns = 5000;
		else
			dev->min_delta_ns += dev->min_delta_ns >> 1;
What would be needed is simply to double ->min_delta_ns each time you
detect such a situation. Once you do that, it takes effect on the next
tick automatically.
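A rough user-space sketch of that 'double it on every detection' action might look like the following. It is not kernel code: struct clockevent_sketch is a hypothetical stand-in for the real clockevent device structure, sketch_double_min_delta() is a made-up helper name, and the 1 ms upper bound is purely an assumption added so the value cannot grow without limit. Only the 5000 ns fallback comes from the excerpt above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's clockevent device structure;
 * only the field discussed here is modeled.
 */
struct clockevent_sketch {
	uint64_t min_delta_ns;
};

/* Same fallback value as in the tick-oneshot.c excerpt. */
#define SKETCH_MIN_DELTA_FLOOR_NS	5000ULL
/* Assumed upper bound (1 ms), not taken from the kernel. */
#define SKETCH_MIN_DELTA_CAP_NS		1000000ULL

/*
 * Call once per detected lockup situation: double min_delta_ns,
 * starting from the floor if it is unset and clamping at the cap.
 * Returns the new value.
 */
static uint64_t sketch_double_min_delta(struct clockevent_sketch *dev)
{
	if (!dev->min_delta_ns)
		dev->min_delta_ns = SKETCH_MIN_DELTA_FLOOR_NS;
	else if (dev->min_delta_ns > SKETCH_MIN_DELTA_CAP_NS / 2)
		dev->min_delta_ns = SKETCH_MIN_DELTA_CAP_NS;
	else
		dev->min_delta_ns *= 2;

	return dev->min_delta_ns;
}
```

The detection path would call the helper in place of turning off ftrace; whether a cap is wanted at all, and where it should sit, is an open choice.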
Or something like that. In theory. :-)
Ingo