I don't think that's necessarily a contradiction, if we keep it to
confronting the problem. A simple patch wouldn't have provided any further
understanding of the problem compared to what I already said. You would
have seen what the patch does (which I described already differently), but
not really why it does that.
In this sense I prefer to force the confrontation of the problem. I'm
afraid a working patch would encourage to simply ignore the problem, as
your problem at hand would be solved without completely understanding it.
The point is that I'd really like you to understand the problem, so I'm
not the only one who understands this code :) and in the end it might
allow better collaboration to further improve this code.
To make it very clear this is just about understanding the problem, I
don't want to force a specific solution (which a patch would practically
do). If we both understand the problem, we can also discuss the solution
and maybe we find something better, but maybe I'm also totally wrong,
which would be a little embarrassing :), but that would be fine too. There
may be better ways to go about this problem, but IMO it would still be
better than just ignoring the problem and force it with a patch.
If we go from your base assumption above "there is no NTP adjustment", I
would actually agree with you and it wouldn't matter much on which side
to correct the equation.
The question is now what happens, if there are NTP adjustments, i.e. when
the time_freq value isn't zero. Based on this initialization we tell the
NTP daemon the base frequency, although not directly but it knows the
length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon
tells the kernel via time_freq how to change the speed so that freq1 ==
1 sec + time_freq.
The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we
don't tell the NTP daemon the real frequency. We define 1 sec as freq2 +
tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec
+ time_freq. Above initialization now calcalutes the base time length for
an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested
time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to
freq2 cycles, so this finally means any adjustment made by the NTP daemon
is slightly off.
To demonstrate this let's look at some real values and let's use the PIT
for it (as this is where this originated and on which CLOCK_TICK_ADJUST is
based on). With freq1=1193182 and HZ=1000 we program the timer with 1193
cycles and the actual update frequency is freq2=1193000. To adjust for
this difference we change the length of a timer tick:
(NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ
or
(10^9 nsec - 152533 nsec) / 1000
This way every 1000 interrupts or 1193000 cycles the clock is advanced by
999847467 nsec, so that we advance time according to freq1, but any
further adjustments aren't applied to an interval of what the NTP daemon
thinks is 1 sec but 999847467 nsec instead.
In consequence this means, if we want to improve timekeeping, we first set
the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to
the real frequency. It doesn't has to be perfect as we usually don't know
the real frequency with 100% certainty anyway. Second, we drop the tick
adjustment if possible and leave the adjustments to the NTP daemon and as
long as the drift is within the 500ppm limit it has no problem to manage
this.
Look at your Part A, with NO_HZ the update interval is much larger, so any
rounding error becomes practically irrelevant. In the above PIT example
the update interval wouldn't be 1193 cycles but 596591 cycles instead.
I hope this helps to understand what I suggested. For NO_HZ we can just
drop this correction without problems. In the other case it's a little
more complicated, but we basically have to check the update interval and
if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't
much we really have to do.
This correction was only introduced to accommodate clocks which exceed
this 500ppm limit for whatever reason, so the NTP daemon isn't confused.
The best solution would be really to limit this correction to these
clocks.
bye, Roman
--