Linux: KVM Paravirtualization

Submitted by Jeremy
on January 5, 2007 - 9:57pm

A new feature that will first be available in the upcoming 2.6.20 kernel is KVM, a Kernel-based Virtual Machine. The project's webpage describes KVM as, "a full virtualization solution for Linux on x86 hardware. It consists of a loadable kernel module (kvm.ko) and a userspace component. Using KVM, one can run multiple virtual machines running unmodified Linux or Windows images. Each virtual machine has private virtualized hardware: a network card, disk, graphics adapter, etc." The project's FAQ explains that the functionality requires "an x86 machine running a recent Linux kernel on an Intel processor with VT (virtualization technology) extensions, or an AMD processor with SVM extensions (also called AMD-V)." The userland aspect of KVM is a slightly modified version of qemu, used to instantiate the virtual machine.

Ingo Molnar announced a new patch introducing paravirtualization support for KVM. This makes the KVM FAQ out of date; in comparing KVM to Xen it notes, "Xen supports both full virtualization and a technique called paravirtualization, which allows better performance for modified guests. kvm does not at present support paravirtualization." Ingo described his patch, which is against the 2.6.20-rc3 + KVM trunk kernel, as one that "includes support for the hardware cr3-cache feature of Intel-VMX CPUs. (which speeds up context switches and TLB flushes)". He went on to add, "some aspects of the code are still a bit ad-hoc and incomplete, but the code is stable enough in my testing and i'd like to have some feedback." In a series of benchmarks comparing against KVM trunk, he found 2-task context switch performance to be improved by a factor of four, while "hackbench 1" ran almost twice as fast and "hackbench 5" showed a 30% improvement. His email goes on to detail how the paravirtualization works.
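
The improvement factors quoted above can be re-derived from the timing tables in the announcement that follows. Here is a minimal sketch in C, assuming the comparison is between the "KVM trunk" and "KVM trunk+paravirt/cr3" rows. The function names are illustrative only and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of `after` relative to `before`; both are "lower is better" timings. */
double speedup(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return before / after;
}

/* Print the KVM trunk vs. paravirt/cr3 ratios using the announced figures. */
void print_speedups(void)
{
    printf("2-task context switch: %.2fx\n", speedup(6.36, 1.60));
    printf("hackbench 1:           %.2fx\n", speedup(0.55, 0.36));
    printf("hackbench 5:           %.2fx\n", speedup(2.8, 2.2));
}
```

Calling print_speedups() from a main() gives ratios of roughly 4.0, 1.5 and 1.3, which line up with "a factor of 4", "almost twice as fast" and "a 30% improvement" once rounding is allowed for.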


From: Ingo Molnar [email blocked]
To: kvm-devel [email blocked]
Subject: [announce] [patch] KVM paravirtualization for Linux
Date:	Fri, 5 Jan 2007 22:52:23 +0100


i'm pleased to announce the first release of paravirtualized KVM (Linux 
under Linux), which includes support for the hardware cr3-cache feature 
of Intel-VMX CPUs. (which speeds up context switches and TLB flushes)

the patch is against 2.6.20-rc3 + KVM trunk and can be found at:

   http://redhat.com/~mingo/kvm-paravirt-patches/

Some aspects of the code are still a bit ad-hoc and incomplete, but the 
code is stable enough in my testing and i'd like to have some feedback. 

Firstly, here are some numbers:

2-task context-switch performance (in microseconds, lower is better):

 native:                       1.11
 ----------------------------------
 Qemu:                        61.18
 KVM upstream:                53.01
 KVM trunk:                    6.36
 KVM trunk+paravirt/cr3:       1.60

i.e. 2-task context-switch performance is faster by a factor of 4, and 
is now quite close to native speed!

"hackbench 1" (utilizes 40 tasks, numbers in seconds, lower is better):

 native:                       0.25
 ----------------------------------
 Qemu:                         7.8
 KVM upstream:                 2.8
 KVM trunk:                    0.55
 KVM paravirt/cr3:             0.36

almost twice as fast.

"hackbench 5" (utilizes 200 tasks, numbers in seconds, lower is better):

 native:                       0.9
 ----------------------------------
 Qemu:                        35.2
 KVM upstream:                 9.4
 KVM trunk:                    2.8
 KVM paravirt/cr3:             2.2

still a 30% improvement - which isnt too bad considering that 200 tasks 
are context-switching in this workload and the cr3 cache in current CPUs 
is only 4 entries.

the patchset does the following:

- it provides an ad-hoc paravirtualization hypercall API between a Linux 
  guest and a Linux host. (this will be replaced with a proper
  hypercall later on.)

- using the hypercall API it utilizes the "cr3 target cache" feature in 
  Intel VMX CPUs, and extends KVM to make use of that cache. This 
  feature allows the avoidance of expensive VM exits into hypervisor 
  context. (The guest needs to be 'aware' and the cache has to be
  shared between the guest and the hypervisor. So fully emulated OSs
  wont benefit from this feature.)

- a few simpler paravirtualization changes are done for Linux guests: IO 
  port delays do not cause a VM exit anymore, the i8259A IRQ controller 
  code got simplified (this will be replaced with a proper, hypercall
  based and host-maintained IRQ controller implementation) and TLB 
  flushes are more efficient, because no cr3 reads happen which would 
  otherwise cause a VM exit. These changes have a visible effect
  already: they reduce qemu's CPU usage when a guest idles in HLT, by 
  about 25%. (from ~20% CPU usage to 14% CPU usage if an -rt guest has 
  HZ=1000)
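
[Editorial sketch, not part of the email or the patch.] The effect of a small cr3 target cache can be illustrated with a toy user-space model in C. It assumes a 4-entry cache (the size quoted above), round-robin replacement, and tasks scheduled in strict rotation. All three are simplifying assumptions, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CR3_CACHE_ENTRIES 4   /* cache size quoted in the announcement */

struct cr3_cache {
    unsigned long entry[CR3_CACHE_ENTRIES];
    size_t used;   /* number of valid entries */
    size_t next;   /* next slot to overwrite (round-robin: an assumption) */
};

/* Returns 1 if loading `cr3` would need a VM exit (cache miss), 0 on a hit. */
int cr3_load_needs_exit(struct cr3_cache *c, unsigned long cr3)
{
    for (size_t i = 0; i < c->used; i++)
        if (c->entry[i] == cr3)
            return 0;

    /* Miss: model the hypervisor handling the exit and caching the value. */
    c->entry[c->next] = cr3;
    c->next = (c->next + 1) % CR3_CACHE_ENTRIES;
    if (c->used < CR3_CACHE_ENTRIES)
        c->used++;
    return 1;
}

/* Count exits for `switches` context switches cycling through `ntasks` tasks. */
unsigned long count_exits(size_t ntasks, unsigned long switches)
{
    struct cr3_cache c = { .used = 0, .next = 0 };
    unsigned long exits = 0;

    assert(ntasks > 0);
    for (unsigned long s = 0; s < switches; s++) {
        /* Hypothetical page-table roots: one distinct value per task. */
        unsigned long cr3 = 0x1000UL * (unsigned long)(s % ntasks + 1);
        exits += (unsigned long)cr3_load_needs_exit(&c, cr3);
    }
    return exits;
}
```

In this model two tasks cost only the two initial misses, while 200 tasks in strict rotation miss on every switch. Strict rotation is the worst case for such a cache. Real workloads revisit recently run tasks, which would be consistent with the smaller but still measurable gain reported for "hackbench 5".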

Paravirtualization is triggered via the kvm_paravirt=1 boot option (for 
now, this too is ad-hoc) - if that is passed then the KVM guest will 
probe for paravirtualization availability on the hypervisor side - and 
will use it if found. (If the guest does not find KVM-paravirt support 
on the hypervisor side then it will continue as a fully emulated guest.)
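
[Editorial sketch, not part of the email or the patch.] The probe-and-fall-back behaviour described above might look roughly like the following user-space C model. The structure fields echo the snippet quoted later in the thread, but the constant values, the function-pointer "hypercall" and the version check are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; the real values are defined by the patch. */
#define SKETCH_API_MAGIC   0x87654321u
#define SKETCH_API_VERSION 1

struct para_state_sketch {
    int32_t  guest_version;
    int32_t  host_version;   /* stays -1 if no host answers */
    uint32_t size;
};

/* Stand-in for the hypercall: returns 0 on success, nonzero otherwise. */
typedef int (*hypercall_fn)(uint32_t magic, struct para_state_sketch *st);

/* Models a host that understands the handshake. */
int host_with_paravirt(uint32_t magic, struct para_state_sketch *st)
{
    if (magic != SKETCH_API_MAGIC || st->size != sizeof(*st))
        return -1;
    st->host_version = SKETCH_API_VERSION;
    return 0;
}

/* Models a host without paravirt support: the request is not recognised. */
int host_without_paravirt(uint32_t magic, struct para_state_sketch *st)
{
    (void)magic;
    (void)st;
    return -1;
}

/* Returns 1 if the guest should run paravirtualized, 0 for full emulation. */
int guest_probe(int boot_option_enabled, hypercall_fn hypercall)
{
    struct para_state_sketch st = {
        .guest_version = SKETCH_API_VERSION,
        .host_version  = -1,
        .size          = (uint32_t)sizeof(st),
    };

    assert(hypercall != NULL);
    if (!boot_option_enabled)
        return 0;   /* kvm_paravirt=1 was not passed */
    if (hypercall(SKETCH_API_MAGIC, &st) != 0)
        return 0;   /* no support found: continue fully emulated */
    return st.host_version == st.guest_version;
}
```

The point of the sketch is only the control flow: the guest opts in through a boot option, asks the host, and silently continues as a fully emulated guest when the host does not answer as expected.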

Issues: i only tested this on 32-bit VMX. (64-bit should work with not 
too many changes, the paravirt.c bits can be carried over to 64-bit 
almost as-is. But i didnt want to spread the code too wide.)

Comments, suggestions are welcome!

	Ingo


From: Zachary Amsden [email blocked]
Subject: Re: [announce] [patch] KVM paravirtualization for Linux
Date: Fri, 05 Jan 2007 14:15:19 -0800

Ingo Molnar wrote:
> i'm pleased to announce the first release of paravirtualized KVM (Linux
> under Linux), which includes support for the hardware cr3-cache feature
> of Intel-VMX CPUs. (which speeds up context switches and TLB flushes)
>
> the patch is against 2.6.20-rc3 + KVM trunk and can be found at:
>
>    http://redhat.com/~mingo/kvm-paravirt-patches/
>
> Some aspects of the code are still a bit ad-hoc and incomplete, but the
> code is stable enough in my testing and i'd like to have some feedback.
>

Your code looks generally good. I have some comments. You can't do
this, even though you want to:

-EXPORT_SYMBOL(paravirt_ops);
+EXPORT_SYMBOL_GPL(paravirt_ops);

The problem is it makes all modules GPL - or at least all modules that
use any kind of locking, pull in the basic definitions to enable and
disable interrupts, thus the paravirt_ops symbol, so basically all
modules.

What you really want is more like
EXPORT_SYMBOL_READABLE_GPL(paravirt_ops);

But I'm not sure that is technically feasible yet.

The kvm code should probably go in kvm.c instead of paravirt.c.

Index: linux/drivers/serial/8250.c
===================================================================
--- linux.orig/drivers/serial/8250.c
+++ linux/drivers/serial/8250.c
@@ -1371,7 +1371,7 @@ static irqreturn_t serial8250_interrupt(

    l = l->next;

-   if (l == i->head && pass_counter++ > PASS_LIMIT) {
+   if (!kvm_paravirt

Is this a bug that might happen under other virtualizations as well,
not just kvm? Perhaps it deserves a disable feature instead of a kvm
specific check.

Which also gets rid of the need for this unusually placed extern:

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -1911,6 +1911,11 @@ static inline void set_task_cpu(struct t
 #endif /* CONFIG_SMP */

+/*
+ * Is paravirtualization active?
+ */
+extern int kvm_paravirt;
+

From: Ingo Molnar [email blocked]
Subject: Re: [announce] [patch] KVM paravirtualization for Linux
Date: Fri, 5 Jan 2007 23:30:09 +0100

* Zachary Amsden [email blocked] wrote:

> What you really want is more like
> EXPORT_SYMBOL_READABLE_GPL(paravirt_ops);

yep. Not a big issue - what is important is to put the paravirt ops into
the read-only section so that it's somewhat harder for rootkits to
modify. (Also, it needs to be made clear that this is fundamental,
lowlevel system functionality written by people under the GPLv2, so that
if you utilize it beyond its original purpose, using its internals, you
likely create a work derived from the kernel. Something simple as irq
disabling probably doesnt qualify, and that we exported to modules for a
long time, but lots of other details do. So the existence of
paravirt_ops isnt a free-for all.)

> But I'm not sure that is technically feasible yet.
>
> The kvm code should probably go in kvm.c instead of paravirt.c.

no. This is fundamental architecture boot code, not module code. kvm.c
should eventually go into kernel/ and arch/*/kernel, not the other way
around.

> Index: linux/drivers/serial/8250.c
> ===================================================================
> --- linux.orig/drivers/serial/8250.c
> +++ linux/drivers/serial/8250.c
> @@ -1371,7 +1371,7 @@ static irqreturn_t serial8250_interrupt(
>
>     l = l->next;
>
> -   if (l == i->head && pass_counter++ > PASS_LIMIT) {
> +   if (!kvm_paravirt
>
> Is this a bug that might happen under other virtualizations as well,
> not just kvm? Perhaps it deserves a disable feature instead of a kvm
> specific check.

yes - this limit is easily triggered via the KVM/Qemu virtual serial
drivers. You can think of "kvm_paravirt" as "Linux paravirt", it's just
a flag.

	Ingo

From: Anthony Liguori [email blocked]
Subject: Re: [kvm-devel] [announce] [patch] KVM paravirtualization for Linux
Date: Fri, 05 Jan 2007 17:02:28 -0600

This is pretty cool. I've read the VT spec a number of times and never
understood why they included the CR3 caching :-)

I suspect that you may even be faster than Xen for context switches
because of the hardware assistance here. Any chance you can run your
benchmarks against Xen?

You may already know this, but there isn't a CR3 cache in SVM AFAIK so a
fair bit of the code will probably have to be enabled conditionally.
Otherwise, I don't think SVM support would be that hard.

The only other odd bit I noticed was:

> +	magic_val = KVM_API_MAGIC;
> +	para_state->guest_version = KVM_PARA_API_VERSION;
> +	para_state->host_version = -1;
> +	para_state->size = sizeof(*para_state);
> +
> +	asm volatile ("movl %0, %%cr3"
> +			: "=&r" (ret)
> +			: "a" (para_state),
> +			  "0" (magic_val)
> +	);

If I read this correctly, you're using a CR3 write as a hypercall (and
relying on registers being set/returned that aren't part of the actual
CR3 move)? Any reason for not just using vmcall? It seems a bit less
awkward. Even a PIO operation would be a bit cleaner IMHO. PIO exits
tend to be fast in VT/SVM so that's an added benefit.

Regards,

Anthony Liguori

3D graphics in a guest OS

Anonymous (not verified)
on
January 6, 2007 - 9:13am

Will guest OS's have access to video hardware to provide accelerated 3d graphics using kvm?

no

Anonymous (not verified)
on
January 7, 2007 - 4:04am

no

with xen it is possible to

Anonymous (not verified)
on
January 7, 2007 - 10:13am

with xen it is possible to use a second graphic card for a guest. is this also true for kvm?

Should be possible to

Anonymous (not verified)
on
January 8, 2007 - 6:23am

Should be possible to implement, but would probably require KVM/Xen-aware drivers for guest Windows (while host should ignore it).
IOMMU will allow usage of second card for guest system even without such drivers, but only AMD systems (chipsets) are supposed to get this soon.

I hope big players will tackle this problem with their drivers and try to make this holy grail of virtualisation a reality: the ability to play Windows 3D games under virtualised Windows at near-native performance.

Jail

Anonymous (not verified)
on
January 8, 2007 - 5:27am

Sound like jail in BSD family? But jail can run without need VT extensions.

No

Anonymous (not verified)
on
January 8, 2007 - 11:07am

> Sound like jail in BSD family? But jail can run without need VT extensions.

No, sound nothing like jail in BSD family. Jail can run without need VT extensions because jail don't do virtualisation.

I do hope that clear?

Actually BSD jails do

Anonymous (not verified)
on
August 7, 2007 - 11:27pm

Actually BSD jails do virtualize their network stack (at least) and more work is ongoing to virtualize more of the kernel - but not all, so it's sort of like paravirtualization ;-)

They can do a kernel, so they can do everything?

Anonymous (not verified)
on
January 9, 2007 - 8:13am

What follows is a rant.

Christoph Lameter wrote in <Pine.LNX.4.64.0701081016140.9173@schroedinger.engr.sgi.com>:

"Xen is duplicating basic OS components like the scheduler etc. As
a result its difficult to maintain and not well integrated with Linux.
KVM looks like a better approach."

So, everything has to be done the Linux way, right? Xen doesn't get active support because the Xen guys are too stupid to do it the Linux way. It's ridiculous - they even reinvented the wheel whilst Linux already has five of them! So let's quickly build something similar on our own. Let's put a lot of effort into it just to be able to say "I've done it my way". Yeah. It's computer science masturbation and it's sooo much fun.

Oh, btw, did I tell you? kvm has already won because it - only - runs on future hardware, unlike Xen, which also runs on current hardware.

Ok, let's stop here. I really like Linux a lot and I admire many of the top guys for what they're doing. But I often feel like there's a lot of narcissism that sometimes rules more than technical aspects and the will to cooperate.

What is kvm good for? In my eyes, it's not Xen that's the intruder or alien, it's kvm, as it duplicates already existing functionality.

Christoph Hellwig wrote in <20070107182946.GA8158@infradead.org>:

"After all the Novell Marketing Hype you'll probably have to keep Xen ;-)
Except for that I suspect a paravirt kvm or lhype might be the better
hypervisor choice in the long term.".

I'd like to see Christoph's argument for his guess. But hey, who cares, now that Christoph has made his opinion public? Now there's "Two Thumbs Up" for kvm, one from Christoph and the other from, hey, Christoph again. I hope to find a mentor too that has such a strong belief in me. Even if he can't give any arguments for that.

commodity hardware quirks

Anonymous (not verified)
on
January 9, 2007 - 9:31am

Aside from the duplication of lots of tricky code, and potential performance improvements, there is another simple issue: unlike, say, VM running on an IBM mainframe, where hardware, firmware, and software are designed to work together, operating systems and environments running on "commodity" hardware need to accommodate every broken motherboard, BIOS, firmware, etc. It's all there to see in the Linux kernel source: workarounds for brain-dead interrupt controllers, buggy BIOS and ACPI memory maps and IRQ tables, flaky clock hardware and fixups for controllers, ...

Some of these hardware issues can be worked around from within the privileged domain, while others cannot.

you are wrong

on
January 9, 2007 - 12:52pm

First, KVM doesn't require "future" hardware. It runs on current processors, sold by Intel for nearly a year and by AMD for half a year.

Second, read the news you are replying to. Paravirtualization means that even older processors work. In this regard Xen and KVM are identical.

KVM is pretty and integrated, Xen is fugly, but it a) has more features right now (like live migration); b) is industry standard, supported by Linux, some *BSD and Solaris.

--
:wq

KVM does require recent hardware

MarkWilliamson (not verified)
on
January 9, 2007 - 10:03pm

Disclaimer: I do work on Xen.

I think the point regarding "future" hardware was possibly that the hardware KVM requires currently is not widespread, but it's certainly out there in lots of consumer machines and if people buy a new system to run virtual machines on (which seems reasonable in an enterprise setting) then it'll have the extensions.

KVM does require this hardware at the moment, though. Its paravirtualisation is a performance optimisation but does not allow it to run on older processors. I think that its paravirtualisation is currently a fairly long way from doing this... It's still valid to use it as a performance optimisation though.

I'm not entirely convinced by the argument that KVM is a pretty solution, though I confess I've not looked at the code in detail. It certainly has lots of advantages - seemingly it's particularly handy for desktop use. OTOH, on the server I'd say there are some good arguments for using a hypervisor at the moment (e.g. arguably stronger resource isolation).

Nice thing is, we can use both. A world in which we have enterprise distros running Xen and desktop distros running KVM seems quite reasonable - similar to the VMware ESX server / VMware workstation distinction in the closed source world. If that were to happen, I'd say everybody wins... Certainly it seems like Linux users stand to benefit from all sorts of different ways the virtualisation market should shake out.

Well, there are actually

Anonymous (not verified)
on
February 18, 2007 - 8:07pm

Well, there are actually technical reasons in favor of KVM. And kernel developers usually don't mention them because they are obvious to them.

XEN was once designed as a very thin layer underneath the guest OSes. That worked fairly well for the machines XEN was originally used on. But then crazy people came along and wanted to port XEN to other machines, like those 1024-CPU Altix ones. And they noticed many problems when doing so.

Most of those problems fell into two categories. Either the hardware contained bugs and XEN needed a workaround. In those cases the usual method was to take the workaround from Linux and add it to XEN. Or XEN ran into a scalability problem and was too slow on those machines. In these cases code from Linux was taken, as it already is scalable on all those machines, and added to XEN.

So overall we end up with a formerly thin layer that is accumulating more and more code from Linux as it is getting ported to more and more machines Linux is already running on. Most kernel developers consider that to be plain silly.

KVM versus Xen

Jan Stafford (not verified)
on
March 7, 2007 - 4:47pm

KVM's thinner profile makes it a good match for the kernel, but what about using Xen in other software development schemes? What about XenSource instead of VMware as the platform running on Linux inside an ISV's virtual appliance? I'd like to hear developers' take on that.

when will this patch be merged?

gerr (not verified)
on
January 16, 2007 - 8:29am

I don't know whether this patch will be merged, and if so, when?
