Hi Avi, Marcelo,
here is a set of patches which fix problems in kvm-amd. Patch 1 fixes a
stupid problem with the event-reinjection introduced by me in my
previous patchset. Patch 2 was a helper to find the bug patch 3 fixes.
I kept it in the patchset because it may be helpful in the future to
debug other problems too. Patch 3 is the most important fix because it
makes kvm-amd on 32 bit hosts work again. Without this patch the first
vmrum fails with exit-reason VMEXIT_INVALID. Patch 4 fixes the Xen 4.0
shipped with SLES11 in nested svm. The last patch in this series fixes a
potential l2-guest breakout scenario because it may be possible for the
l2-guest to issue hypercalls directly to the host if the l1-hypervisor
does not intercept VMMCALL.
Thanks,
Joerg
Diffstat:
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kvm/svm.c | 108 ++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/x86.c | 2 +-
3 files changed, 106 insertions(+), 6 deletions(-)
Shortlog:
Joerg Roedel (5):
KVM: X86: Fix stupid bug in exception reinjection path
KVM: SVM: Dump vmcb contents on failed vmrun
KVM: SVM: Fix wrong intercept masks on 32 bit
KVM: SVM: Allow EFER.LMSLE to be set with nested svm
KVM: SVM: Don't allow nested guest to VMMCALL into host
--
This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kvm/svm.c | 2 +-
2 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..352767d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -20,6 +20,7 @@
#define _EFER_LMA 10 /* Long mode active (read-only) */
#define _EFER_NX 11 /* No execute enable */
#define _EFER_SVME 12 /* Enable virtualization */
+#define _EFER_LMSLE 13 /* Long Mode Segment Limit Enable */
#define _EFER_FFXSR 14 /* Enable Fast FXSAVE/FXRSTOR */
#define EFER_SCE (1<<_EFER_SCE)
@@ -27,6 +28,7 @@
#define EFER_LMA (1<<_EFER_LMA)
#define EFER_NX (1<<_EFER_NX)
#define EFER_SVME (1<<_EFER_SVME)
+#define EFER_LMSLE (1<<_EFER_LMSLE)
#define EFER_FFXSR (1<<_EFER_FFXSR)
/* Intel MSRs. Some also available on other CPUs */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74f7b9d..bc087c7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
- kvm_enable_efer_bits(EFER_SVME);
+ kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}
for_each_possible_cpu(cpu) {
--
1.7.1
--
Interesting, why does it require it? What if the host doesn't have it? Why enable it only for the nested case? It's not svm specific (it's useful for running non-hvm Xen in non-nested mode). Isn't there a cpuid bit for it? If so, it should be exposed to userspace, and the feature should depend on it. -- error compiling committee.c: too many arguments to function --
I don't know. I traced the Xen crash down and found that is gets a #GP I have heard inofficial statements that they set this bit to provide the functionality to their guests. And Xen sets this bit together with the Because there is no cpuid bit for this feature. You can roughly check for it using the svm cpuid bit. Joerg --
Doesn't it kill cross-vendor migration? -- error compiling committee.c: too many arguments to function --
Oh, the fact that it's in if (nested) protects against that (at least until Intel implements EFER.SVME. So the patch is good. -- error compiling committee.c: too many arguments to function --
Enabling Nested SVM kills it anyway, so this is not an issue. AFAIK the feature is not present on Intel CPUs. Joerg --
It does not. Best is you check the patch, which has just been posted yesterday on xen-devel for upstream inclusion: http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00096.html Basically it _tries_ to set the bit via safe_wrmsr() to detect whether its legal (in replacement of the missing CPUID check). Afterwards it resets it. If it available, it allows _guests_ to enable it. AFAIK the only user of this feature is VMware with binary translation running 64bit guests, as they rely on segment limits to protect their hypervisor code from being read from the guest. If I understood this correctly, there is a bug somewhere, maybe even in KVM's nested SVM implementation. Xen is fine with this bit-set provoking VMware's binary translation relies on VMX for running 64bit guests, on AMD you don't need SVM if you had a K8RevE (dual core 90nm) with this feature. In fact you should not be able to run VMware with 64bit guests inside a KVM guest on an Intel box (without nested VMX, that is). On AMD you can either use nested SVM or this feature. Regards, -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 --
Ok, I looked at this again and reproduced the traces I already deleted and fetched the Xen crash message and found something I missed before. The relevant part of the KVM trace is: qemu-system-x86-7364 [012] 790.715351: kvm_exit: reason msr rip 0xffff82c4801b5c93 qemu-system-x86-7364 [012] 790.715352: kvm_msr: msr_write c0000080 = 0x3d01 qemu-system-x86-7364 [012] 790.715354: kvm_inj_exception: #GP (0x0) And the Xen-Crash message is: (XEN) Xen call trace: (XEN) [<ffff82c4801b5c95>] svm_cpu_up+0x135/0x200 (XEN) [<ffff82c4801b5d9c>] start_svm+0x3c/0xe0 (XEN) [<ffff82c4801948b2>] identify_cpu+0xd2/0x240 (XEN) [<ffff82c480252c6b>] __start_xen+0x1dbb/0x3660 (XEN) [<ffff82c4801000b5>] __high_start+0xa1/0xa3 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=0000] (XEN) **************************************** The MSR write happens on rip 0xffff82c4801b5c93 while the #GP is injected at rip ffff82c4801b5c95 (== right after the wrmsr instruction). So yes, there is another bug in KVM here. The problem is that the set_efer function does not report write errors to ist caller and injects the #GP directly. The svm:wrmsr_interception recognizes a success and advances the rip. The attached patch fixes this. From e0d69cf7a396d35ae9aa4778e87f82c243bfa0ae Mon Sep 17 00:00:00 2001 From: Joerg Roedel <joerg.roedel@amd.com> Date: Thu, 6 May 2010 11:07:46 +0200 Subject: [PATCH] KVM: X86: Inject #GP with the right rip on efer writes This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel ...
Applied, thanks. -- error compiling committee.c: too many arguments to function --
This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/kvm/svm.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 95 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 889f660..0201b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2637,6 +2637,99 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_NPF] = pf_interception,
};
+void dump_vmcb(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ struct vmcb_save_area *save = &svm->vmcb->save;
+
+ pr_err("VMCB Control Area:\n");
+ pr_err("cr_read: %04x\n", control->intercept_cr_read);
+ pr_err("cr_write: %04x\n", control->intercept_cr_write);
+ pr_err("dr_read: %04x\n", control->intercept_dr_read);
+ pr_err("dr_write: %04x\n", control->intercept_dr_write);
+ pr_err("exceptions: %08x\n", control->intercept_exceptions);
+ pr_err("intercepts: %016llx\n", control->intercept);
+ pr_err("pause filter count: %d\n", control->pause_filter_count);
+ pr_err("iopm_base_pa: %016llx\n", control->iopm_base_pa);
+ pr_err("msrpm_base_pa: %016llx\n", control->msrpm_base_pa);
+ pr_err("tsc_offset: %016llx\n", control->tsc_offset);
+ pr_err("asid: %d\n", control->asid);
+ pr_err("tlb_ctl: %d\n", control->tlb_ctl);
+ pr_err("int_ctl: %08x\n", control->int_ctl);
+ pr_err("int_vector: %08x\n", control->int_vector);
+ pr_err("int_state: %08x\n", control->int_state);
+ pr_err("exit_code: %08x\n", control->exit_code);
+ pr_err("exit_info1: %016llx\n", control->exit_info_1);
+ pr_err("exit_info2: %016llx\n", control->exit_info_2);
+ pr_err("exit_int_info: %08x\n", ...This patch makes KVM on 32 bit SVM working again by correcting the masks used for iret interception. With the wrong masks the upper 32 bits of the intercepts are masked out which leaves vmrun unintercepted. This is not legal on svm and the vmrun fails. Bug was introduced by commits 95ba827313 and 3cfc3092. Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> --- arch/x86/kvm/svm.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0201b06..74f7b9d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2290,7 +2290,7 @@ static int cpuid_interception(struct vcpu_svm *svm) static int iret_interception(struct vcpu_svm *svm) { ++svm->vcpu.stat.nmi_window_exits; - svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET); + svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET); svm->vcpu.arch.hflags |= HF_IRET_MASK; return 1; } @@ -2824,7 +2824,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu) svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI; vcpu->arch.hflags |= HF_NMI_MASK; - svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET); + svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET); ++vcpu->stat.nmi_injections; } @@ -2891,10 +2891,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) if (masked) { svm->vcpu.arch.hflags |= HF_NMI_MASK; - svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET); + svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET); } else { svm->vcpu.arch.hflags &= ~HF_NMI_MASK; - svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET); + svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET); } } -- 1.7.1 --
All applied, thanks. -- error compiling committee.c: too many arguments to function --
