I've re-based my timekeeping patch queue ontop of Roman's recent NTP and div64 patches. I've been a little short on time, so I've not been able to get them all tested (ntp testing takes a number of hours), but they do build and I wanted to get some feedback on the re-base. My apologies to Ingo for sitting on his patches for this long. I'll get a test setup going over the weekend to make sure all is well. The total patch queue is large, so while folks probably won't try it themselves, here's the full series I'm using to build: #start w/ linus' -git #roman's udiv patches 01-udiv.patchd 02-convert.patch 03-remove.patch #roman's ntp changes 1-cleanup.patch 2-ntp4.patch 3-timefreq.patch 4-timeoffset.patch 5-tai.patch 6-scaleshift.patch 7-ticklength.patch 8-leap.patch #my timekeeping changes mult-adj-split.patch monotonic-raw #ntp cleanups from Ingo and me ntp-cleanup-ingo1.patch ntp-cleanup-ingo2.patch ntp_cleanup-john.patch --
The clocksource frequency is represented by
clocksource->mult/2^(clocksource->shift). Currently, when NTP makes
adjustments to the clock frequency, they are made directly to the mult
value.
This has the drawback that once changed, we cannot know what the orignal
mult value was, or how much adjustment has been applied.
This property causes problems in calculating proper ntp intervals when
switching back and forth between clocksources.
This patch separates the current mult value into a mult and mult_adj
pair. The mult value stays constant, while the ntp clocksource
adjustments are done only to the mult_adj value.
This allows for correct ntp interval calculation and additionally lays
the groundwork for a new notion of time, what I'm calling the
monotonic-raw time, which is introduced in a following patch.
Thoughts or comments would be appreciated.
thanks
-john
Signed-off-by: John Stultz <johnstul@us.ibm.com>
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 17fda52..37851e1 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -357,7 +357,7 @@ void update_vsyscall(struct timespec *wall, struct clocksource *c)
/* copy fsyscall clock data */
fsyscall_gtod_data.clk_mask = c->mask;
- fsyscall_gtod_data.clk_mult = c->mult;
+ fsyscall_gtod_data.clk_mult = c->mult + c->mult_adj;
fsyscall_gtod_data.clk_shift = c->shift;
fsyscall_gtod_data.clk_fsys_mmio = c->fsys_mmio;
fsyscall_gtod_data.clk_cycle_last = c->cycle_last;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3b26fbd..36aa8da 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -814,7 +814,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
/* XXX this assumes clock->shift == 22 */
/* 4611686018 ~= 2^(20+64-22) / 1e9 */
- t2x = (u64) clock->mult * 4611686018ULL;
+ t2x = (u64) (clock->mult + cl...Hi, This add an extra memory access and extra instruction to a code path, we should keep as short as possible. I'd rather add a mult_raw or mult_org and use that for your next patch. bye, Roman --
Yea, I played with that at first, but the functoinal duplication (two values, both storing close to the same info) was ugly. I guess I could go back to mult_orig. -john --
In talking with Josip Loncaric, and his work on clock synchronization
(see btime.sf.net), he mentioned that for really close synchronization,
it is useful to have access to "hardware time", that is a notion of time
that is not in any way adjusted by the clock slewing done to keep close
time sync.
Part of the issue is if we are using the kernel's ntp adjusted
representation of time in order to measure how we should correct time,
we can run into what Paul McKenney aptly described as "Painting a road
using the lines we're painting as the guide".
I had been thinking of a similar problem, and was trying to come up with
a way to give users access to a purely hardware based time
representation that avoided users having to know the underlying
frequency and mask values needed to deal with the wide variety of
possible underlying hardware counters.
My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a
nanosecond based time value, that increments starting at bootup and has
no frequency adjustments made to it what so ever.
The time is accessed from userspace via the posix_clock_gettime()
syscall, passing CLOCK_MONOTONIC_RAW as the clock_id.
This patch very much depends on the mult/mult_adj split patch in this
series.
Thoughts comments would be greatly appreciated!
thanks
-john
Signed-off-by: John Stultz <johnstul@us.ibm.com>
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index e917a30..1460803 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -79,6 +79,7 @@ struct clocksource {
/* timekeeping specific data, ignore */
cycle_t cycle_interval;
u64 xtime_interval;
+ u64 raw_interval;
/*
* Second part is written at each timer interrupt
* Keep it in a different cache line to dirty no
diff --git a/include/linux/time.h b/include/linux/time.h
index 2091a19..9548eb4 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -116,6 +116,7 @@ extern int do_setitimer(int which, struct itimerval...Hi, A general comment: the raw clock doesn't need any adjustments, so updates don't have to be done that frequently and you can move most of that logic IMO that's really a clock property, so this belongs in the clock structure. (Some day we may want to have multiple active clocks for various purposes You don't really have to use clock->shift as a scale, thus simplifying the shifting here (e.g. by using NTP_SCALE_SHIFT). I'm thinking about doing this for e.g. xtime_nsec as well. bye, Roman --
You're suggesting adding MONTONIC_RAW code to ntp.c ? That doesn't make much sense to me (the whole point is its not ntp adjusted). Even if you mean just inside the "if (clock->xtime_nsec >= (u64)NSEC_PER_SEC ..." conditional, then that means we don't get to leverage the cycles_interval, and cycle_last, values, so all of that would have to be duplicated and managed. Which I'm not sure it would save us much more I disagree. I think that crufts up the clocksource structure (which is ideally just a simple hw counter abstraction), with timekeeping state. I'm still not sold on the multiple clocks with multiple notions of time idea you keep on bringing up. But if/when we cross that bridge, maybe it would be better to add a timekeeping_clock mid-layer abstraction that keeps the clocksource specific timekeeping state. That way we don't add lots of complexity for the clocksource driver writers to deal with and we allow the clocksources to be better re-purposed (for maybe more sane Could you explain this more? Given the clocksources have different shift values, why should we not keep the extra resolution? How does adding the extra shifting to convert from the clocksource scale to the NTP_SCALE_SHIFT value simplify the shifting? thanks -john --
Hi, I didn't looked at the code, what I meant was moving it close to it, so it's You could use a shift (e.g. SHIFT_HZ for non NO_HZ), to scale these value up a It's not cruft, littering the code with unnecessary static variables is IMO worse. In OO terms this would be an abstract base class, the actual implementation only needs to provide a few functions and otherwise doesn't really have to worry about all the details. Splitting the structure is certainly an option, but hiding related variables Well, here are some ideas where I think it might be useful, it would make it possible to chain clocks with different frequencies: - SMP systems with slighty different clock frequencies. - instead of deactivating an unstable tsc clock, delay activation until we know it's stable. - deactivate a tsc (which stops during sleep) before sleep and resync when needed. - use different clocks for timer purposes (e.g. jiffies and a slow clock). (The last point would make IMO more acceptable to use hrtimer for more trivial The extra resolution is still there, as it doesn't make any sense to use a NTP_SCALE_SHIFT is a constant and a very simple shift, which produces better code. There are two ways to use the nsec value here: 1. nsec_now = (((cycle_new - cycle_last) * mult) + nsec) >> shift 2. nsec_now = ((cycle_new - cycle_last) * mult) >> shift + nsec >> NTP_SCALE_SHIFT Both do the same under the condition that the gettime operation is not significantly shorter than 1nsec, but the second version can generate slightly better code. I intended to use the first for xtime_nsec, but instead the value is still splitted at every timer interrupt. For the raw time please at least get rid of the splitting of raw_nsec in update_wall_time(). From the monotonic_raw value you only need the tv_sec part and the tv_nsec part can be calculated as above. bye, Roman --
Bah. Ok, I've talked myself out of this one. I still think it crufts up the clocksource structure, but its more consistent that we follow the established cruft (such as the pre-calculated cycle_interval/xtime_interval/raw_interval combo) rather then me trying to arbitrarily draw the line in the sand at this I still think pulling out all of the non-counter-abstraction bits out of the clocksource and into a mid-level timekeeping_clock structure would still be ideal here, but I'll save our time/energy on that one for another day. :) thanks -john --
Make this file more readable by applying a consistent coding style. No code changed. Signed-off-by: Ingo Molnar <mingo@elte.hu> Merged on top of Roman's very similar cleanups. (In other words: Blame John for the merge screwups). Signed-off-by: John Stultz <johnstul@us.ibm.com> Index: linux-2.6/kernel/time/ntp.c =================================================================== --- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 19:21:02.000000000 -0700 +++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:10:41.000000000 -0700 @@ -1,13 +1,10 @@ /* - * linux/kernel/time/ntp.c - * * NTP state machine interfaces and logic. * * This code was mainly moved from kernel/timer.c and kernel/time.c * Please see those files for relevant copyright info and historical * changelogs. */ - #include <linux/mm.h> #include <linux/time.h> #include <linux/timer.h> @@ -22,32 +19,35 @@ /* * Timekeeping variables */ -unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */ -unsigned long tick_nsec; /* ACTHZ period (nsec) */ -u64 tick_length; -static u64 tick_length_base; - -static struct hrtimer leap_timer; - -#define MAX_TICKADJ 500 /* microsecs */ -#define MAX_TICKADJ_SCALED (((u64)(MAX_TICKADJ * NSEC_PER_USEC) << \ - NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) +unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */ +unsigned long tick_nsec; /* ACTHZ period (nsec) */ +u64 tick_length; +static u64 tick_length_base; +static const long max_tickadj = 500; /* (usec) */ + +static struct hrtimer leap_timer; /* * phase-lock loop variables */ /* TIME_ERROR prevents overwriting the CMOS clock */ -static int time_state = TIME_OK; /* clock synchronization status */ -int time_status = STA_UNSYNC; /* clock status bits */ -static long time_tai; /* TAI offset (s) */ -static s64 time_offset; /* time adjustment (ns) */ -static long time_constant = 2; /* pll time constant */ -long time_maxerror = ...
<snip> Since you are at it, isn't "time_adj" a poor name since you also have a "time_adjust"? -- Regards, Jörg-Volker. --
Hi, These are two different (and independent) mechanism to adjust time. The=20 names are a little confusing, but I'd be a little reluctant to rename=20 time_adj, as the same name is used in the reference implementation. bye, Roman
Hi, That's a matter of taste, but MAX_TICKADJ just works as well, so why I must say I really hate this kind of formating, it either gets out of sync or requires reformatting. I would very strongly prefer not to The function argument is actually s32 and depending on NTP_INTERVAL_FREQ it might produce a link error on 32bit archs. bye, Roman --
From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] ntp: clean up code flow
clean up the flow of code by splitting the way too large and way too
nested do_adjtimex() function.
no change in the logic.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merged on top of Roman's very similar cleanups.
(In other words: Blame John for the merge screwups).
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Index: linux-2.6/kernel/time/ntp.c
===================================================================
--- linux-2.6.orig/kernel/time/ntp.c 2008-03-14 20:10:41.000000000 -0700
+++ linux-2.6/kernel/time/ntp.c 2008-03-14 20:43:48.000000000 -0700
@@ -275,6 +275,87 @@ static void notify_cmos_timer(void)
static inline void notify_cmos_timer(void) { }
#endif
+static inline void process_adjtimex_adj_offset(struct timex *txc, long sec)
+{
+ if (txc->modes & ADJ_STATUS) {
+ if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
+ time_state = TIME_OK;
+ time_status = STA_UNSYNC;
+ }
+ /* only set allowed bits */
+ time_status &= STA_RONLY;
+ time_status |= txc->status & ~STA_RONLY;
+
+ switch (time_state) {
+ case TIME_OK:
+ start_timer:
+ if (time_status & STA_INS) {
+ time_state = TIME_INS;
+ sec += 86400 - sec % 86400;
+ hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
+ } else if (time_status & STA_DEL) {
+ time_state = TIME_DEL;
+ sec += 86400 - (sec + 1) % 86400;
+ hrtimer_start(&leap_timer, ktime_set(sec, 0), HRTIMER_MODE_ABS);
+ }
+ break;
+ case TIME_INS:
+ case TIME_DEL:
+ time_state = TIME_OK;
+ goto start_timer;
+ break;
+ case TIME_WAIT:
+ if (!(time_status & (STA_INS | STA_DEL)))
+ time_state = TIME_OK;
+ break;
+ case TIME_OOP:
+ hrtimer_restart(&leap_timer);
+ break;
+ }
+ }
+
+ if (txc->modes & ADJ_NANO)
+ time_status |= STA_NANO;
+ if (txc->modes & ADJ_MICRO)
+ time_status &...Hi John, I don't understand, why the goto is required here. If you move these two cases in front of case "TIME_OK" and omit the break, you can remove the goto and the label "start_timer". Don't forget the /* fall through */ comment then. If you want to keep this patch a simple code movement, maybe add a later patch to improve this on top of it. Best Regards Ingo Oeser --
Hi, Nice catch, I'll do this in the original patch. bye, Roman --
Hi, Somewhat ugly name... From an userspace perspective it's the core of ntp_adjtime(), so I'd prefer something close to that. bye, Roman --
Oh, that's a mis-merge. Thanks for catching it. The name should have been: process_adjtimex_modes(). process_adjtimex_adj_offset was basically the same as your ntp_update_offset() function, so I killed it but grabbed the wrong does process_adjtimex_modes(), or ntp_process_adjtimex_modes work for you? thanks -john --
Make time_status, time_maxerror and time_esterror static to ntp.c
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Index: linux-2.6/arch/cris/arch-v32/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/cris/arch-v32/kernel/time.c 2008-02-13 12:07:00.000000000 -0800
+++ linux-2.6/arch/cris/arch-v32/kernel/time.c 2008-03-14 20:13:24.000000000 -0700
@@ -248,8 +248,7 @@ timer_interrupt(int irq, void *dev_id)
* The division here is not time critical since it will run once in
* 11 minutes
*/
- if ((time_status & STA_UNSYNC) == 0 &&
- xtime.tv_sec > last_rtc_update + 660 &&
+ if (ntp_synced() && xtime.tv_sec > last_rtc_update + 660 &&
(xtime.tv_nsec / 1000) >= 500000 - (tick_nsec / 1000) / 2 &&
(xtime.tv_nsec / 1000) <= 500000 + (tick_nsec / 1000) / 2) {
if (set_rtc_mmss(xtime.tv_sec) == 0)
Index: linux-2.6/arch/mn10300/kernel/rtc.c
===================================================================
--- linux-2.6.orig/arch/mn10300/kernel/rtc.c 2008-02-13 12:07:00.000000000 -0800
+++ linux-2.6/arch/mn10300/kernel/rtc.c 2008-03-14 20:13:24.000000000 -0700
@@ -117,8 +117,7 @@ void check_rtc_time(void)
* RTC clock accordingly every ~11 minutes. set_rtc_mmss() has to be
* called as close as possible to 500 ms before the new second starts.
*/
- if ((time_status & STA_UNSYNC) == 0 &&
- xtime.tv_sec > last_rtc_update + 660 &&
+ if (ntp_synced() && xtime.tv_sec > last_rtc_update + 660 &&
xtime.tv_nsec / 1000 >= 500000 - ((unsigned) TICK_SIZE) / 2 &&
xtime.tv_nsec / 1000 <= 500000 + ((unsigned) TICK_SIZE) / 2
) {
Index: linux-2.6/include/linux/timex.h
===================================================================
--- linux-2.6.orig/include/linux/timex.h 2008-03-14 19:21:02.000000000 -0700
+++ linux-2.6/include/linux/timex.h 2008-03-14 20:18:44.000000000...Hi, time_maxerror and time_esterror are ok, but ntp_synced() is such a small function, I'd rather keep it inlined. bye, Roman --
| Adrian Bunk | If you want me to quit I will quit |
| Andi Kleen | [PATCH x86] [6/16] Add a new arch_early_alloc() interface for x86-64 |
| Chuck Ebbert | Why do so many machines need "noapic"? |
| Linus Torvalds | Linux 2.6.27-rc8 |
git: | |
| Sean Brown | git repository size vs. subversion repository size |
| Linus Torvalds | Re: git versus CVS (versus bk) |
| drafnel | [PATCH 2/5] Make mktag a builtin. |
| Andrew Morton | Untracked working tree files |
| Richard Stallman | Real men don't attack straw men |
| Todd Pytel | IDE or SCSI virtual disks for VMWare image? |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Parvinder Bhasin | Disabling IPv6 ? |
| Mark Seger | occasionally corrupted network stats in /proc/net/dev |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Hannes Eder | [PATCH 00/27] drivers/net: fix sparse warnings |
| Gerrit Renker | [PATCH 36/37] dccp: Initialisation and type-checking of feature sysctls |
