Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

Previous thread: [PATCH 5/5] KVM: SVM: Don't allow nested guest to VMMCALL into host by Joerg Roedel on Wednesday, May 5, 2010 - 7:04 am. (1 message)

Next thread: ***IMPORTANT NOTICE*** by Webmail Help Desk on Wednesday, May 5, 2010 - 6:54 am. (1 message)
From: Joerg Roedel
Date: Wednesday, May 5, 2010 - 7:04 am

Hi Avi, Marcelo,

here is a set of patches which fix problems in kvm-amd. Patch 1 fixes a
stupid problem with the event-reinjection introduced by me in my
previous patchset.  Patch 2 was a helper to find the bug patch 3 fixes.
I kept it in the patchset because it may be helpful in the future to
debug other problems too. Patch 3 is the most important fix because it
makes kvm-amd on 32 bit hosts work again.  Without this patch the first
vmrum fails with exit-reason VMEXIT_INVALID. Patch 4 fixes the Xen 4.0
shipped with SLES11 in nested svm. The last patch in this series fixes a
potential l2-guest breakout scenario because it may be possible for the
l2-guest to issue hypercalls directly to the host if the l1-hypervisor
does not intercept VMMCALL.

Thanks,

	Joerg

Diffstat:

 arch/x86/include/asm/msr-index.h |    2 +
 arch/x86/kvm/svm.c               |  108 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c               |    2 +-
 3 files changed, 106 insertions(+), 6 deletions(-)

Shortlog:

Joerg Roedel (5):
      KVM: X86: Fix stupid bug in exception reinjection path
      KVM: SVM: Dump vmcb contents on failed vmrun
      KVM: SVM: Fix wrong intercept masks on 32 bit
      KVM: SVM: Allow EFER.LMSLE to be set with nested svm
      KVM: SVM: Don't allow nested guest to VMMCALL into host


--

From: Joerg Roedel
Date: Wednesday, May 5, 2010 - 7:04 am

This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/msr-index.h |    2 ++
 arch/x86/kvm/svm.c               |    2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..352767d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -20,6 +20,7 @@
 #define _EFER_LMA		10 /* Long mode active (read-only) */
 #define _EFER_NX		11 /* No execute enable */
 #define _EFER_SVME		12 /* Enable virtualization */
+#define _EFER_LMSLE		13 /* Long Mode Segment Limit Enable */
 #define _EFER_FFXSR		14 /* Enable Fast FXSAVE/FXRSTOR */
 
 #define EFER_SCE		(1<<_EFER_SCE)
@@ -27,6 +28,7 @@
 #define EFER_LMA		(1<<_EFER_LMA)
 #define EFER_NX			(1<<_EFER_NX)
 #define EFER_SVME		(1<<_EFER_SVME)
+#define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
 
 /* Intel MSRs. Some also available on other CPUs */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74f7b9d..bc087c7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
 
 	if (nested) {
 		printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
-		kvm_enable_efer_bits(EFER_SVME);
+		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
 	}
 
 	for_each_possible_cpu(cpu) {
-- 
1.7.1


--

From: Avi Kivity
Date: Wednesday, May 5, 2010 - 7:46 am

Interesting, why does it require it?


What if the host doesn't have it?

Why enable it only for the nested case?  It's not svm specific (it's 
useful for running non-hvm Xen in non-nested mode).

Isn't there a cpuid bit for it?  If so, it should be exposed to 
userspace, and the feature should depend on it.

-- 
error compiling committee.c: too many arguments to function

--

From: Joerg Roedel
Date: Wednesday, May 5, 2010 - 8:04 am

I don't know. I traced the Xen crash down and found that is gets a #GP

I have heard inofficial statements that they set this bit to provide the
functionality to their guests. And Xen sets this bit together with the


Because there is no cpuid bit for this feature. You can roughly check
for it using the svm cpuid bit.


	Joerg

--

From: Avi Kivity
Date: Wednesday, May 5, 2010 - 8:06 am

Doesn't it kill cross-vendor migration?

-- 
error compiling committee.c: too many arguments to function

--

From: Avi Kivity
Date: Wednesday, May 5, 2010 - 8:14 am

Oh, the fact that it's in if (nested) protects against that (at least 
until Intel implements EFER.SVME.  So the patch is good.

-- 
error compiling committee.c: too many arguments to function

--

From: Roedel, Joerg
Date: Wednesday, May 5, 2010 - 8:16 am

Enabling Nested SVM kills it anyway, so this is not an issue. AFAIK the
feature is not present on Intel CPUs.

	Joerg


--

From: Andre Przywara
Date: Wednesday, May 5, 2010 - 1:57 pm

It does not. Best is you check the patch, which has just been posted 
yesterday on xen-devel for upstream inclusion:
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00096.html
Basically it _tries_ to set the bit via safe_wrmsr() to detect whether 
its legal (in replacement of the missing CPUID check). Afterwards it 
resets it. If it available, it allows _guests_ to enable it. AFAIK the 
only user of this feature is VMware with binary translation running 
64bit guests, as they rely on segment limits to protect their hypervisor 
code from being read from the guest.
If I understood this correctly, there is a bug somewhere, maybe even in 
KVM's nested SVM implementation. Xen is fine with this bit-set provoking 
VMware's binary translation relies on VMX for running 64bit guests, on 
AMD you don't need SVM if you had a K8RevE (dual core 90nm) with this 
feature. In fact you should not be able to run VMware with 64bit guests 
inside a KVM guest on an Intel box (without nested VMX, that is).
On AMD you can either use nested SVM or this feature.

Regards,

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

--

From: Roedel, Joerg
Date: Thursday, May 6, 2010 - 2:38 am

Ok, I looked at this again and reproduced the traces I already deleted
and fetched the Xen crash message and found something I missed before.
The relevant part of the KVM trace is:

 qemu-system-x86-7364  [012]   790.715351: kvm_exit: reason msr rip 0xffff82c4801b5c93
 qemu-system-x86-7364  [012]   790.715352: kvm_msr: msr_write c0000080 = 0x3d01
 qemu-system-x86-7364  [012]   790.715354: kvm_inj_exception: #GP (0x0)

And the Xen-Crash message is:

(XEN) Xen call trace:
(XEN)    [<ffff82c4801b5c95>] svm_cpu_up+0x135/0x200
(XEN)    [<ffff82c4801b5d9c>] start_svm+0x3c/0xe0
(XEN)    [<ffff82c4801948b2>] identify_cpu+0xd2/0x240
(XEN)    [<ffff82c480252c6b>] __start_xen+0x1dbb/0x3660
(XEN)    [<ffff82c4801000b5>] __high_start+0xa1/0xa3
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

The MSR write happens on rip 0xffff82c4801b5c93 while the #GP is
injected at rip ffff82c4801b5c95 (== right after the wrmsr instruction).
So yes, there is another bug in KVM here. The problem is that the
set_efer function does not report write errors to ist caller and injects
the #GP directly. The svm:wrmsr_interception recognizes a success and
advances the rip.
The attached patch fixes this.

From e0d69cf7a396d35ae9aa4778e87f82c243bfa0ae Mon Sep 17 00:00:00 2001
From: Joerg Roedel <joerg.roedel@amd.com>
Date: Thu, 6 May 2010 11:07:46 +0200
Subject: [PATCH] KVM: X86: Inject #GP with the right rip on efer writes

This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit the set_efer function
injects the #GP directly. The architecture dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.

Signed-off-by: Joerg Roedel ...
From: Avi Kivity
Date: Thursday, May 6, 2010 - 4:42 am

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function

--

From: Joerg Roedel
Date: Wednesday, May 5, 2010 - 7:04 am

This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |   95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 889f660..0201b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2637,6 +2637,99 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_NPF]				= pf_interception,
 };
 
+void dump_vmcb(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	struct vmcb_save_area *save = &svm->vmcb->save;
+
+	pr_err("VMCB Control Area:\n");
+	pr_err("cr_read:            %04x\n", control->intercept_cr_read);
+	pr_err("cr_write:           %04x\n", control->intercept_cr_write);
+	pr_err("dr_read:            %04x\n", control->intercept_dr_read);
+	pr_err("dr_write:           %04x\n", control->intercept_dr_write);
+	pr_err("exceptions:         %08x\n", control->intercept_exceptions);
+	pr_err("intercepts:         %016llx\n", control->intercept);
+	pr_err("pause filter count: %d\n", control->pause_filter_count);
+	pr_err("iopm_base_pa:       %016llx\n", control->iopm_base_pa);
+	pr_err("msrpm_base_pa:      %016llx\n", control->msrpm_base_pa);
+	pr_err("tsc_offset:         %016llx\n", control->tsc_offset);
+	pr_err("asid:               %d\n", control->asid);
+	pr_err("tlb_ctl:            %d\n", control->tlb_ctl);
+	pr_err("int_ctl:            %08x\n", control->int_ctl);
+	pr_err("int_vector:         %08x\n", control->int_vector);
+	pr_err("int_state:          %08x\n", control->int_state);
+	pr_err("exit_code:          %08x\n", control->exit_code);
+	pr_err("exit_info1:         %016llx\n", control->exit_info_1);
+	pr_err("exit_info2:         %016llx\n", control->exit_info_2);
+	pr_err("exit_int_info:      %08x\n", ...
From: Joerg Roedel
Date: Wednesday, May 5, 2010 - 7:04 am

This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.

Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kvm/svm.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0201b06..74f7b9d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2290,7 +2290,7 @@ static int cpuid_interception(struct vcpu_svm *svm)
 static int iret_interception(struct vcpu_svm *svm)
 {
 	++svm->vcpu.stat.nmi_window_exits;
-	svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
 	return 1;
 }
@@ -2824,7 +2824,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 	vcpu->arch.hflags |= HF_NMI_MASK;
-	svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	++vcpu->stat.nmi_injections;
 }
 
@@ -2891,10 +2891,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 	if (masked) {
 		svm->vcpu.arch.hflags |= HF_NMI_MASK;
-		svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	} else {
 		svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
-		svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	}
 }
 
-- 
1.7.1


--

From: Avi Kivity
Date: Thursday, May 6, 2010 - 1:51 am

All applied, thanks.

-- 
error compiling committee.c: too many arguments to function

--

Previous thread: [PATCH 5/5] KVM: SVM: Don't allow nested guest to VMMCALL into host by Joerg Roedel on Wednesday, May 5, 2010 - 7:04 am. (1 message)

Next thread: ***IMPORTANT NOTICE*** by Webmail Help Desk on Wednesday, May 5, 2010 - 6:54 am. (1 message)