This patch introduces a cpu time clock for s390 (only ticking
if the virtual cpu is running) and bases the s390 implementation
of sched_clock() on it.
The times lice length on a virtual cpu can be anything
between the calculated time slice and zero. In reality
this doesn't seem to be problem, since the scheduler is fair
enough to not let a single process starve but the current
implementation can lead to inefficient short time slices.
By providing a 'virtual' sched_clock() we guarantee that a
process can get its time slice regardless of scheduling
decisions from the hypervisor.
Patch applies to 2.6.22 git and works fine with CFS.
Jan
--
arch/s390/kernel/time.c | 18 ++++++++++++------
arch/s390/kernel/vtime.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
include/asm-s390/timer.h | 2 ++
3 files changed, 59 insertions(+), 6 deletions(-)
--- ./include/asm-s390/timer.h.cpu_clock 2007-07-18 13:43:53.000000000 +0200
+++ ./include/asm-s390/timer.h 2007-07-18 20:41:13.000000000 +0200
@@ -48,6 +48,8 @@
extern void init_cpu_vtimer(void);
extern void vtime_init(void);
+extern unsigned long long cpu_clock(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_S390_TIMER_H */
--- ./arch/s390/kernel/time.c.cpu_clock 2007-07-18 13:43:35.000000000 +0200
+++ ./arch/s390/kernel/time.c 2007-07-18 21:01:07.000000000 +0200
@@ -62,21 +62,27 @@
static u64 xtime_cc;
/*
- * Scheduler clock - returns current time in nanosec units.
+ * Monotonic_clock - returns # of nanoseconds passed since time_init()
*/
-unsigned long long sched_clock(void)
+unsigned long long monotonic_clock(void)
{
return ((get_clock() - jiffies_timer_cc) * 125) >> 9;
}
+EXPORT_SYMBOL(monotonic_clock);
/*
- * Monotonic_clock - returns # of nanoseconds passed since time_init()
+ * Scheduler clock - returns current time in nanosec units.
+ * Now based on virtual cpu time to only account time the guest
+ * was actually running.
*/
-unsigned long long ...The Xen sched_clock implementation is very similar, and it seems to work
Runn*ing*? Does it include time the VCPU spends idle/blocked? If not,
then the scheduler won't be able to tell how long a process has been
asleep. Maybe this doesn't matter (I had this problem in a version of
Xen's sched_clock, and I can't say I saw an ill effects from it).
J
-
Good point .. I think we need a measure of both virtual and real time here - virtual for accounting task-execution time and real for I guess it will show up as some corner case behaviour, which people are yet to discover on virtual env. -- Regards, vatsa -
CFS does measure time elapsed across task-sleep periods (and does something similar to what the old scheduler's 'sleep average' interactivity mechanism did), but that mechanism measures "time spent running during sleep", not "time spent idling". still, CFS needs time measurement across idle periods as well, for another purpose: to be able to do precise task statistics for /proc. (for top, ps, etc.) So it's still true that sched_clock() should include idle periods too. Ingo -
No, it does not include idle time, if we're going idle the cpu timer I'm not sure, s390 already has an implemetation for precise accounting in the architecture code, does CFS also improve accounting data? Jan -
what kind of precise accounting does s390 have in the architecture code? CFS changes task (and load) accounting to be sched_clock() driven in essence. Ingo -
s390 has per-process accounting that is aware of virtual cpu time, implemented in arch/s390/kernel/time.c: account_ticks() and arch/s390/kernel/vtime.c: account_*_vtime(). Timestamps are taken in entry.S for system calls, interrupts and other system entries and are accounted later, we don't call update_process_times(). -
PowerPC's sched_clock() currently measures real time. On POWER5 and POWER6 machines we could change it to use a register called the "PURR" (for Processor Utilization of Resources Register), which only measures time spent while the partition is running. But the PURR has another function as well: it measures the distribution of dispatch cycles between the two hardware threads on each core when running in SMT mode. That is, the cpu dispatches instructions from one thread or the other (not both) on each CPU cycle, and each thread's PURR only gets incremented on cycles where the cpu dispatches instructions for that thread. So the sum of the two threads' PURRs adds up to real time. Do you think this makes the PURR more useful for CFS, or less? To me it looks like this would mean that CFS can make a more equitable distribution of CPU time if, for example, you had 3 runnable tasks on a 2-core x dual-threaded machine (4 virtual CPUs). BTW, what does "time spent running during sleep" mean? Does it mean As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. That affects how tsk->utime and tsk->stime are accumulated (we call account_user_time and account_system_time directly rather than calling update_process_times) as well as the system hardirq/softirq time, idle time, and stolen time. When you say "precise task statistics for /proc", where are they accumulated? I don't see any changes to the way that tsk->utime and ctime are computed. Paul. -
Sounds reasonable to me. I've proposed in the past that sched_clock
should be scaled by the cpufreq frequency to achieve the same effect
(ie, measure the actual number of cpu cycles that are really available
to tasks).
But more specifically, what you've described is exactly analogous to
That's how I interpreted it. You're only credited for sleeping if
someone else wanted the CPU in the meantime.
J
-
there's one complication: sched_clock() still needs to increase while the CPU (or thread) is idle, so that we can have a correct measurement of the CPU's utilization, for SMP load-balancing. CFS constructs another clock from sched_clock() [the rq->fair_clock] which does stop while the CPU is idle. So perhaps a combination of the PURR and real-time might work as sched_clock(): when a hardware thread is in cpu_idle(), it should advance its sched clock with _half_ the rate of real-time [so that the sum of advance of all threads if they are all idle is equal to real time], and use the PURR if they are not idle. This would still correctly keep a meaningful load-average if the physical CPU is under-utilized. If you do such a change you'll immediately see whether the approach is right: monitor the cpu_load[] values in /proc/sched_debug, they should match the intuitive 'load average' of that CPU (if divided by 1024), and yeah. It's "the amount of fair runtime i missed out on while others were tsk->utime and tsk->stime is only used for a single purpose: to determine the 'split' factor of how to split up the precise total time we now use p->se.sum_exec_runtime that measures (in nanoseconds) the precise amount of time spent executing (sum of system and user time) - and ->stime and ->utime is used to determine the 'split'. [this allows us to gather ->stime and ->utime via low-resolution sampling, while keeping the 'total' precise. Accounting at every system entry point would be quite expensive on most platforms.] Ingo -
Using se.sum_exec_runtime to generate ->utime and ->stime breaks the process accounting we have on s390 (and probably on PowerPC too). With CONFIG_VIRT_CPU_ACCOUNTING we already have precise values in ->utime and ->stime. Can we make the calculation of the CFS-based time values conditional by CONFIG_VIRT_CPU_ACCOUNTING? -
At least for s390 and powerpc the utime and stime already contain a very precise value how much time was spent in the user and system context. For s390 the granularity is a microsecond. The other values nice, idle, With the exact accounting of utime and stime that would mean that p->se.sum_exec_runtime is utime + stime, no? Precise Accounting at every cpu context switch has some cost, but for s390 it is not as bad as it sounds. We do 2 store-cpu-timer (STPT) instructions, 2 64 bit adds and 2 64 bit subtracts. In terms of cycles Imho, we just have to update utime and stime when the process accounting values are requested and set se.sum_exec_runtime to the sum of utime and stime for CONFIG_VIRT_CPU_ACCOUNTING=y. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. -
