Re: [PATCH 1/5] Add a global synchronization point for pvclock

Previous thread: [PATCH 4/5] export new cpuid KVM_CAP by Glauber Costa on Thursday, April 15, 2010 - 11:37 am. (6 messages)

Next thread: [GIT Pull] x86 fixes for 2.6.34 by Thomas Gleixner on Thursday, April 15, 2010 - 12:11 pm. (1 message)
From: Glauber Costa
Date: Thursday, April 15, 2010 - 11:37 am

Hello folks,

In this series, I present a couple of fixes for kvmclock.
In patch 1, a guest-side fix is proposed for a problem that is biting us
for quite a while now. the tsc inside VMs does not seem to be
that good, (up to now, only single-socket nehalems were stable enough),
and we're seeing small (but nevertheless wrong) time warps inside SMP guests.
I am proposing the fix to reside on common code in pvclock.c, but would
be good to hear Jeremy on this.

On the other 3 patches, I change kvmclock MSR numbers in a compatible
fashion. Both MSR sets will be supported for a while.

Patch 5 adds documentation about kvmclock, which to date, we lacked.

Glauber Costa (5):
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  Try using new kvm clock msrs
  export new cpuid KVM_CAP
  add documentation about kvmclock

 Documentation/kvm/kvmclock.txt  |  138 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_para.h |   12 +++-
 arch/x86/kernel/kvmclock.c      |   68 +++++++++++++------
 arch/x86/kernel/pvclock.c       |   23 +++++++
 arch/x86/kvm/x86.c              |   13 ++++-
 include/linux/kvm.h             |    1 +
 6 files changed, 231 insertions(+), 24 deletions(-)
 create mode 100644 Documentation/kvm/kvmclock.txt

--

From: Glauber Costa
Date: Thursday, April 15, 2010 - 11:37 am

Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    8 ++++++--
 arch/x86/kvm/x86.c              |    7 ++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..0cffb96 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,8 +17,12 @@
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
 
-#define MSR_KVM_WALL_CLOCK  0x11
-#define MSR_KVM_SYSTEM_TIME 0x12
+#define MSR_KVM_WALL_CLOCK_OLD  0x11
+#define MSR_KVM_SYSTEM_TIME_OLD 0x12
+
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME 0x4b564d01
 
 #define KVM_MAX_MMU_OP_BATCH           32
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8824b73..714aae2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,9 +575,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	5
+#define KVM_SAVE_MSRS_BEGIN	7
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+	MSR_KVM_SYSTEM_TIME_OLD, MSR_KVM_WALL_CLOCK_OLD,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1099,10 +1100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
+	case MSR_KVM_WALL_CLOCK_OLD:
 	case MSR_KVM_WALL_CLOCK:
 		vcpu->kvm->arch.wall_clock = data;
 		kvm_write_wall_clock(vcpu->kvm, data);
 		break;
+	case MSR_KVM_SYSTEM_TIME_OLD:
 	case MSR_KVM_SYSTEM_TIME: {
 		if (vcpu->arch.time_page) {
 ...
From: Glauber Costa
Date: Thursday, April 15, 2010 - 11:37 am

We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |    4 ++
 arch/x86/kernel/kvmclock.c      |   68 +++++++++++++++++++++++++++------------
 2 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 0cffb96..a32710a 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+/* We could just try to use new msr values, but they are queried very early,
+ * kernel does not have idt handlers yet, and failures are fatal */
+#define KVM_FEATURE_CLOCKSOURCE2	3
+
 
 #define MSR_KVM_WALL_CLOCK_OLD  0x11
 #define MSR_KVM_SYSTEM_TIME_OLD 0x12
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..6d814ce 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int kvm_use_new_msrs = 0;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -41,6 +42,18 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
+static int kvm_system_time_write_value(int low, int high)
+{
+	if (kvm_use_new_msrs)
+		return native_write_msr_safe(MSR_KVM_SYSTEM_TIME_OLD, low, high);
+	else
+		return ...
From: Avi Kivity
Date: Saturday, April 17, 2010 - 11:55 am

This comment has no place in a header that's oriented to userspace.  

Instead of this and all those if ()s all over the place, you can define 
variables for the MSR indices.  Set them up on initialization, the rest 
of the code just looks them up.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Avi Kivity
Date: Saturday, April 17, 2010 - 11:51 am

This is exposed to userspace.  Userspace that is compiled with the new 
headers, but runs on an old kernel, will break.

So you need to keep the old names, and define a new KVM_FEATURE for the 
new names.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Glauber Costa
Date: Thursday, April 15, 2010 - 11:37 am

In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber ...
From: Marcelo Tosatti
Date: Friday, April 16, 2010 - 1:23 pm

__cacheline_aligned_in_smp to avoid other data from sharing the

Don't you need to handle wrap-around?
--

From: Jeremy Fitzhardinge
Date: Friday, April 16, 2010 - 1:36 pm

Is the problem that the tscs are starting out of sync, or that they're
drifting relative to each other over time?  Do the problems become worse
Does this need a barrier() to prevent the compiler from re-reading
last_value for the subsequent lines?  Otherwise "(ret < last)" and

So if CPU A's tsc is, say, 50us behind CPU B's, then it will return a
static time (from last_time) for 50us until its tsc catched up - or if
CPU A happens to update last_time to give it a kick?

Is it worth trying to update CPU A's tsc offset at the same time?

    J
--

From: Zachary Amsden
Date: Friday, April 16, 2010 - 2:05 pm

This is one source of the problem, but the same thing happens at many 
levels... tsc may start out of sync, drift between sockets, be badly 
re-calibrated by the BIOS, etc... the issue persists even if the TSCs 
are perfectly in sync - the measurement of them is not.

So reading TSC == 100,000 units at time A and then waiting 10 units, one 
may read TSC == 100,010 +/- 5 units because the code stream is not 
perfectly serialized - nor can it be.  There will always be some amount 
of error unless running in perfect lock-step, which only happens in a 
simulator.

This inherent measurement error can cause apparent time to go backwards 
when measured simultaneously across multiple CPUs, or when 
re-calibrating against an external clocksource.  Combined with other 
factors as above, it can be of sufficient magnitude to be noticed.  KVM 
clock is particularly exposed to the problem because the TSC is measured 
and recalibrated for each virtual CPU whenever there is a physical CPU 
switch, so micro-adjustments forwards and backwards may occur during the 
recalibration - and appear as a real backwards time warp to the guest.  
I have some patches to fix that issue, but the SMP problem remains to be 
fixed - and is addressed quite thoroughly by this patch.

Zach
--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:39 am

ACCESS_ONCE() is your friend.

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 4:10 am

Oh yes, just trying to avoid a patch with both atomic64_read() and 
ACCESS_ONCE().



-- 
error compiling committee.c: too many arguments to function

--

From: Glauber Costa
Date: Monday, April 19, 2010 - 7:21 am

you're mixing the private version of the patch you saw with this one.
there isn't any atomic reads in here. I'll use a barrier then

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 7:33 am

This patch writes last_value atomically, but reads it non-atomically.  A 
barrier is insufficient.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 7:46 am

What avi says! :-)

On a 32bit machine a 64bit read are two 32bit reads, so

  last = last_value;

becomes:

  last.high = last_value.high;
  last.low  = last_vlue.low;

(or the reverse of course)

Now imagine a write getting interleaved with that ;-)



--

From: Jeremy Fitzhardinge
Date: Monday, April 19, 2010 - 9:18 am

You could explicitly do:

	do {
		h = last.high;
		barrier();
		l = last.low;
		barrier();
	} while (last.high != h);


This works because we expect last to be always increasing, so the only
worry is low wrapping and incrementing high, and is more efficient than
making the read fully atomic (the write is still cmpxchg64).  But it's
pretty ugly to open code just for 32b architectures; its something that
might be useful to turn into a general abstraction (monotonic_read_64
FTW!).  I already have code like this in the Xen time code, so I could
make immediate use of it.

    J
--

From: Avi Kivity
Date: Tuesday, April 20, 2010 - 2:31 am

I don't think this is worthwhile - the cmpxchg is not that expensive on 
most kvm capable hosts (the exception is the Pentium D).

btw, do you want this code in pvclock.c, or shall we keep it kvmclock 
specific?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Jeremy Fitzhardinge
Date: Tuesday, April 20, 2010 - 11:23 am

I think its a pvclock-level fix.  I'd been hoping to avoid having
something like this, but I think its ultimately necessary.

    J
--

From: Avi Kivity
Date: Tuesday, April 20, 2010 - 11:54 am

Did you observe drift on Xen, or is this "ultimately" pointing at the 
future?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Jeremy Fitzhardinge
Date: Tuesday, April 20, 2010 - 12:42 pm

People are reporting weirdnesses that "clocksource=jiffies" apparently
resolves.  Xen and KVM are faced with the same hardware constraints, and
it wouldn't surprise me if there were small measurable
non-monotonicities in the PV clock under Xen.  May as well be safe.

Of course, it kills any possibility of being able to usefully export
this interface down to usermode.

My main concern about this kind of simple fix is that if there's a long
term systematic drift between different CPU's tscs, then this will
somewhat mask the problem while giving really awful time measurement on
the "slow" CPU(s).  In that case it really needs to adjust the scaling
factor to correct for the drift (*not* update the offset).  But if we're
definitely only talking about fixed, relatively small time offsets then
it is fine.

    J
--

From: Zachary Amsden
Date: Tuesday, April 20, 2010 - 5:07 pm

Does the drift only occur on SMP VMs?

--

From: Glauber Costa
Date: Thursday, April 22, 2010 - 6:11 am

Can you by any chance run ingo's time warp test on those machines?

You need to define TOD to 1, and leave out the TSC test.

For me, warps exists on every machine out there, but the nehalems, so far
--

From: Avi Kivity
Date: Friday, April 23, 2010 - 2:34 am

Plus, replace cpuid by lfence/mfence.  cpuid will trap.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Jeremy Fitzhardinge
Date: Friday, April 23, 2010 - 12:22 pm

I presume the clobbers are needed for cpuid; if you use [lm]fence then
they shouldn't be needed, right?

    J

--

From: Avi Kivity
Date: Friday, April 23, 2010 - 12:25 pm

Right.  But I'm not sure lfence/mfence are sufficient - from what I 
understand rdtsc can pass right through them.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Zachary Amsden
Date: Friday, April 23, 2010 - 2:31 pm

Does lfence / mfence actually serialize?  I thought there was some great 
confusion about that not being the case on all AMD processors, and 
possibly not at all on Intel.

A trap, however is a great way to serialize.

I think, there is no serializing instruction which can be used from 
userspace which does not trap, at least, I don't know one off the top of 
my head.

Zach
--

From: Jeremy Fitzhardinge
Date: Friday, April 23, 2010 - 2:35 pm

rsm is not technically privileged... but not quite usable from usermode ;)

    J
--

From: Zachary Amsden
Date: Friday, April 23, 2010 - 2:41 pm

rsm under hardware virtualization makes my head hurt
--

From: Avi Kivity
Date: Saturday, April 24, 2010 - 2:30 am

Either one independently is sufficient for me.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Avi Kivity
Date: Saturday, April 24, 2010 - 2:29 am

iret.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Jeremy Fitzhardinge
Date: Monday, April 19, 2010 - 9:11 am

Well, on a 32b system, you can explicitly order the updates of low and
high, then do a high-low-checkhigh read.  That would be much more
efficient than atomic64.  If we really care about 32b.

    J

--

From: Glauber Costa
Date: Monday, April 19, 2010 - 7:26 am

Yes it is.

But as I said, this seem to be a very deep worst case scenario. Most of boxes
The offsets usually seem pretty small, under a microsecond. So I don't think
it has anything to do with tscs starting out of sync. Specially because the
delta-based calculation has the exact purpose of handling that case.


--

From: Jeremy Fitzhardinge
Date: Monday, April 19, 2010 - 9:19 am

So you think they're drifting out of sync from an initially synced
state?  If so, what would bound the drift?

    J
--

From: Glauber Costa
Date: Monday, April 19, 2010 - 11:25 am

I think delta calculation introduces errors.

Marcelo can probably confirm it, but he has a nehalem with an appearently
very good tsc source. Even this machine warps.

It stops warping if we only write pvclock data structure once and forget it,
(which only updated tsc_timestamp once), according to him.

Obviously, we can't do that everywhere.
--

From: Marcelo Tosatti
Date: Monday, April 19, 2010 - 6:57 pm

Yes. So its not as if the guest visible TSCs go out of sync (they don't
on this machine Glauber mentioned, or even on a multi-core Core 2 Duo),
but the delta calculation is very hard (if not impossible) to get right.

The timewarps i've seen were in the 0-200ns range, and very rare (once
--

From: Glauber Costa
Date: Tuesday, April 20, 2010 - 5:59 am

For the record, we can only even do that in those machines. If we try to update
time structures only once in machines with the 
X86_FEATURE_TSC_SAYS_IT_IS_OKAY_BUT_IN_REALITY_IS_NOT_OKAY feature flag, guests
won't even boot.

We can detect that, and besides doing calculation only once, also export some
bit indicating that to the guest. Humm... I'm thinking now, that because of
migration, we should check this bit every time, because we might have changed host.
So instead of using an expensive cpuid check, we should probably use some bit in
the vcpu_time_info structure, and use a cpuid bit just to say it is enabled.

Jeremy,

are you okay in turning one of the pad fields in the structure into a flags field?
--

From: Avi Kivity
Date: Tuesday, April 20, 2010 - 8:16 am

Right, we need a bit in the structure itself that says you can trust the 
time not to go backwards, and a bit in cpuid that says you can trust the 
bit in the structure (unless we can be sure no implementations leak 
garbage into the padding field).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Zachary Amsden
Date: Tuesday, April 20, 2010 - 5:01 pm

There's a perfect way to do this and it still fails to stop timewarps.  
You can set the performance counters to overflow if more instructions 
are issued than your code path, run an assembly instruction stream and 
if the performance interrupt hits, restart the calibration.

The calibration happens not just once, but on every migration, and 
currently, I believe, on every VCPU switch.  Even if we reduce the 
number of calibrations to the bare minimum and rule out SMIs and NMIs, 
there will still be variation due to factors beyond our control because 
of the unpredictable nature of cache and instruction issue.

However, X86_FEATURE_NONSTOP_TRULY_RELIABLE_TSC_HONESTLY does imply one 
key feature which the code is missing today:  on SMP VMs, the 
calibration of kvmclock needs to be done only once, and the clock can 
then be used for all VCPUs.  That, I think, stops Glauber's bug from 
appearing on the server side.

I will spin that into my web of patches and send the cocoon out sometime 
this evening.

Zach
--

From: Avi Kivity
Date: Wednesday, April 21, 2010 - 1:06 am

It's completely impractical.   The PMU is a global resource that is 
already shared among users and the host; programming and restoring it is 


That's the plan.


-- 
error compiling committee.c: too many arguments to function

--

From: Avi Kivity
Date: Saturday, April 17, 2010 - 11:48 am

Please define a cpuid bit that makes this optional.  When we eventually 




-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Avi Kivity
Date: Saturday, April 17, 2010 - 11:49 am

On a 32-bit guest, that is.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:43 am

Right, do bear in mind that the x86 implementation of atomic64_read() is
terrifyingly expensive, it is better to not do that read and simply use
the result of the cmpxchg.


--

From: Avi Kivity
Date: Monday, April 19, 2010 - 3:47 am

atomic64_read() _is_ cmpxchg64b.  Are you thinking of some clever 
implementation for smp i386?

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:56 am

No, what I was suggesting was to rewrite that loop no to need the
initial read but use the cmpxchg result of the previous iteration.

Something like:

  u64 last = 0;

  /* more stuff */
 
  do {
    if (ret < last)
      return last;
    last = cmpxchg64(&last_value, last, ret);
  } while (last != ret);

That only has a single cmpxchg8 in there per loop instead of two
(avoiding the atomic64_read() one).

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 4:13 am

Still have two cmpxchgs in the common case.  The first iteration will 
fail, fetching last_value, the second will work.

It will be better when we have contention, though, so it's worthwhile.

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 4:19 am

Right, another option is to put the initial read outside of the loop,
that way you'll have the best of all cases, a single LOCK'ed op in the
loop, and only a single LOCK'ed op for the fast path on sensible
architectures ;-)

    last = atomic64_read(&last_value);
    do {
      if (ret < last)
        return last;
      last = atomic64_cmpxchg(&last_value, last, ret);
    } while (unlikely(last != ret));

Or so.

--

From: Glauber Costa
Date: Monday, April 19, 2010 - 7:32 am

isn't a barrier enough here?


--

From: Avi Kivity
Date: Monday, April 19, 2010 - 7:37 am

No.  On i386, the statement

    last = last_value;

will be split by the compiler into two 32-bit loads.  If a write 
(atomic, using cmpxchg) on another cpu happens between those two loads, 
then the variable last will have a corrupted value.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:46 am

Right, so on x86 we have:

X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.

X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and synced between cores.

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 3:49 am

Sockets and boards too?  (IOW, how reliable is TSC_RELIABLE)?

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:51 am

Not sure, IIRC we clear that when the TSC sync test fails, eg when we
mark the tsc clocksource unusable.

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 3:54 am

Worrying.  By the time we detect this the guest may already have gotten 
confused by clocks going backwards.

-- 
error compiling committee.c: too many arguments to function

--

From: Zachary Amsden
Date: Monday, April 19, 2010 - 11:35 am

Upstream, we are marking the TSC unstable preemptively when hardware 
which will eventually sync test is detected, so this should be fine.

Zach
--

From: Avi Kivity
Date: Tuesday, April 20, 2010 - 2:39 am

ENOPARSE?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--

From: Avi Kivity
Date: Wednesday, April 21, 2010 - 1:08 am

That works within a socket; but multiple socket machines need not feed 
all sockets from the same crystal and reset line (esp. likely for 
multiple board machines, but might even happen with single board 
FSB-less processors).

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:49 am

Fun, we also have:

X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C
states.

--

From: Avi Kivity
Date: Monday, April 19, 2010 - 3:53 am

All of them? I though tsc stops in some mwait deep REM sleep thing.

So what do we need?  test for both TSC_RELIABLE and NONSTOP_TSC?  IMO 
TSC_RELIABLE should imply NONSTOP_TSC.

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Monday, April 19, 2010 - 3:59 am

The idea is that TSC will not stop ever (except s2r like stuff), not

Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP &&
CONSTANT does not make RELIABLE.



--

From: Avi Kivity
Date: Monday, April 19, 2010 - 4:35 am

and this maps to NONSTOP, so I think we're fine.

-- 
error compiling committee.c: too many arguments to function

--

From: Jeremy Fitzhardinge
Date: Monday, October 25, 2010 - 4:30 pm

Unfortunately this is breaking Xen save/restore: if you restore on a
host which was booted more recently than the save host, causing the
system time to be smaller.  The effect is that the domain's time leaps
forward to a fixed point, and stays there until the host catches up to
the source host...

I guess last_time needs to be reset on this type of event.  I guess the
cleanest way would be for pvclock.c to register a sysdev suspend/resume
handler.


--

From: Avi Kivity
Date: Tuesday, October 26, 2010 - 1:14 am

Should be for Xen only; kvm save/restore doesn't involve the guest.

-- 
error compiling committee.c: too many arguments to function

--

From: Glauber Costa
Date: Tuesday, October 26, 2010 - 3:49 am

the suspend/resume path do adjust the time base. Maybe something similar
should be done ?


--

From: Jeremy Fitzhardinge
Date: Tuesday, October 26, 2010 - 10:04 am

Xen doesn't guarantee the system time is monotonic across those kinds of
events.  The domain could maintain its own offset to maintain an
illusion of monotonicity, but I think its simpler to just zero
last_value on resume.

    J
--

Previous thread: [PATCH 4/5] export new cpuid KVM_CAP by Glauber Costa on Thursday, April 15, 2010 - 11:37 am. (6 messages)

Next thread: [GIT Pull] x86 fixes for 2.6.34 by Thomas Gleixner on Thursday, April 15, 2010 - 12:11 pm. (1 message)