Re: [PATCH 1/2] VMware guest detection for x86 and x86-64

From: Joerg Roedel
Date: Thursday, February 21, 2008 - 4:32 am

When running Linux inside KVM all MTRRs are blank because there is no reason to
set them up. So doing a WARN_ON if all MTRRs are blank is not necessary. It is
sufficient to print the warning message using printk.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/kernel/cpu/mtrr/main.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b6e136f..e7f95a9 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -689,7 +689,6 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 	/* kvm/qemu doesn't have mtrr set right, don't trim them all */
 	if (!highest_pfn) {
 		printk(KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
-		WARN_ON(1);
 		return 0;
 	}
 
-- 
1.5.3.7



--

From: Ingo Molnar
Date: Thursday, February 21, 2008 - 4:54 am

instead of obscuring a possibly useful warning, please instead detect 
that it's a KVM guest and skip both the warning and the backtrace in 
that case.

	Ingo
--

From: Joerg Roedel
Date: Thursday, February 21, 2008 - 5:47 am

How useful is the backtrace in that place? I agree that the printk
warning may be useful, but I don't see why the backtrace from the
WARN_ON is necessary.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy


--

From: Ingo Molnar
Date: Thursday, February 21, 2008 - 6:03 am

it allows us to collect such things on kerneloops.org for example.

	Ingo
--

From: Joerg Roedel
Date: Thursday, February 21, 2008 - 6:27 am

Makes sense. I will send an updated patch.

Joerg



--

From: Yan Li
Date: Sunday, September 7, 2008 - 4:47 pm

This patch depends on VMware detection.

Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
 arch/x86/kernel/cpu/mtrr/main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b117d7f..7e2bd23 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -45,6 +45,7 @@
 #include <asm/processor.h>
 #include <asm/msr.h>
 #include <asm/kvm_para.h>
+#include <asm/vmware.h>
 #include "mtrr.h"
 
 u32 num_var_ranges = 0;
@@ -1496,8 +1497,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 
 	/* kvm/qemu doesn't have mtrr set right, don't trim them all */
 	if (!highest_pfn) {
-		WARN(!kvm_para_available(), KERN_WARNING
-				"WARNING: strange, CPU MTRRs all blank?\n");
+		WARN(!(kvm_para_available() || (is_vmware_guest())),
+		     KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
 		return 0;
 	}
 
-- 
1.5.6.3


-- 
Li, Yan

"Everything that is really great and inspiring is created by the
individual who can labor in freedom."
              - Albert Einstein, in Out of My Later Years (1950)
--

From: Yan Li
Date: Sunday, September 7, 2008 - 4:45 pm

This patch detects whether we are running as a VMware guest or
not. Used 'official' detection code from VMware's Open Virtual Machine
Tools (open-vm-tools/checkvm), with LGPLv2.1 changed to GPLv2.

It provides a function:
int is_vmware_guest(void)
that can be used easily to detect if we are running as a VMware guest.

Currently this can be useful in suppressing false warning from mtrr
module, hope some other modules find this useful too (like adopting
less aggressive strategy on cache using, since the host is already
doing cache for us, etc.)

Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
 arch/x86/lib/Makefile    |    1 +
 arch/x86/lib/vmware.c    |   61 ++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/vmware.h |   30 ++++++++++++++++++++++
 3 files changed, 92 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/lib/vmware.c
 create mode 100644 include/asm-x86/vmware.h

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index aa3fa41..8327a12 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -8,6 +8,7 @@ lib-y := delay.o
 lib-y += thunk_$(BITS).o
 lib-y += usercopy_$(BITS).o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
+lib-y += vmware.o
 
 ifeq ($(CONFIG_X86_32),y)
         lib-y += checksum_32.o
diff --git a/arch/x86/lib/vmware.c b/arch/x86/lib/vmware.c
new file mode 100644
index 0000000..39c9360
--- /dev/null
+++ b/arch/x86/lib/vmware.c
@@ -0,0 +1,61 @@
+/*
+ * Check if we are running as a VMware guest or not
+ *
+ * Copyright (C) 2007 VMware, Inc. All rights reserved.
+ * Adapted to Linux by Yan Li <elliot.li.tech@gmail.com>
+ * from open-vm-tools/checkvm
+ * from VMware's Open Virtual Machine Tools (under LGPLv2.1)
+ *               (open-vm-tools.sourceforge.net)
+ *
+ * The original codes from VMware are licensed under LGPLv2.1, I (Yan
+ * Li) converted the following parts to use GPL license as stated in
+ * COPYING file of Linux.
+ */
+
+#include <linux/kernel.h>
+#include ...
--

From: David Dillow
Date: Sunday, September 7, 2008 - 5:36 pm

If you want this to be used by more callsites, it probably doesn't make
sense to have it print out a message each time.

In fact would it make more sense to have a framework (cpu feature flag?)
to detect that we're in any virtualized environment and make this one of
the detection routines, and perhaps cache the result. Especially if this
detection would be used to manage anything near a hot-path in the page
cache as you suggested.

But maybe that's overkill.
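
A minimal sketch of the "detect once, cache the result" idea, written as
plain userspace C. Every name here (hv_detect, hv_detectors, the stub
probes) is hypothetical and does not exist in the kernel; the stubs only
stand in for real detection routines.

```c
#include <stddef.h>

/* Hypothetical identifiers; none of these names exist in the kernel. */
enum hv_type { HV_UNKNOWN = -1, HV_NONE = 0, HV_KVM, HV_VMWARE };

typedef int (*hv_probe_fn)(void);

struct hv_detector {
	enum hv_type type;
	hv_probe_fn probe;	/* returns nonzero when this hypervisor is present */
};

static enum hv_type hv_cached = HV_UNKNOWN;
static int hv_probe_calls;	/* counts probe invocations, for illustration only */

/* Stub probes standing in for real detection routines. */
static int probe_kvm_stub(void)    { hv_probe_calls++; return 0; }
static int probe_vmware_stub(void) { hv_probe_calls++; return 1; }

static const struct hv_detector hv_detectors[] = {
	{ HV_KVM,    probe_kvm_stub },
	{ HV_VMWARE, probe_vmware_stub },
};

/* Run the probes once, then answer every later call from the cache. */
static enum hv_type hv_detect(void)
{
	size_t i;

	if (hv_cached != HV_UNKNOWN)
		return hv_cached;

	hv_cached = HV_NONE;
	for (i = 0; i < sizeof(hv_detectors) / sizeof(hv_detectors[0]); i++) {
		if (hv_detectors[i].probe()) {
			hv_cached = hv_detectors[i].type;
			break;
		}
	}
	return hv_cached;
}
```

With this shape a caller near a hot path pays for the probes only on the
first call; any message would be printed from the probe, not from each
call site.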

--

From: Yan Li
Date: Sunday, September 7, 2008 - 6:49 pm

Sure.  Another possible solution is to print that message at the first
call only, it's good for debugging to keep that in dmesg.

If it's not possible to do live migration in/out of a running VMware
environment, we can also safely do this detection only at the first

That sounds good too. And I think the current routines for detecting
KVM, Xen and VMware are all ready. I can do that if there is more
positive feedback.

Thanks.

-- 
Li, Yan
--

From: Ingo Molnar
Date: Monday, September 8, 2008 - 7:04 am

hm, i know it's not your fault as you just took this vmware code, but 
this is really not an acceptable method of detection. The above is 
totally unsafe to do on native hardware - we dont know whether there's 
anything on that port.

vmware could have used one of the following methods to communicate to 
the guest kernel:

 - a CPUID and an MSR range - like a good virtual CPU should. That way 
   even bootloaders could detect the presence of vmware.
 - or a PCI ID and a PCI driver like KVM does
 - or a system call hypercall gateway like Xen and KVM does
 - or it could even have used a DMI signature of some sort

but no, vmware had to use 30 year old unsafe ISA port magic...

To add insult to injury that port is named 'backdoor' - very smart and 
confidence raising naming. Plus it does not even use some well-known PC 
port that is harmless to read - it has to be from the middle of the 
generic IO port resource range where a real PCI card could sit: 0x5658. 
Brilliant.

is there really no vmware PCI ID to query? Could you post the lspci -v 
output of your vmware guest? We could add an early-quirk for one of the 
core vmware PCI devices (in case there are any - i bet there are).

	Ingo
--

From: Yan Li
Date: Monday, September 8, 2008 - 5:20 pm

Yeah, I agree with you, it's a bad method. I just took it for granted
that vmware has done the necessary study and they knew what they were
doing.  I have tested it on two boxes so I thought they were OK. Now I

They haven't done this. Per VMware's design, the cpuinfo in virtual
guest is identical to the underlying physical CPU. I guess they want
to send most of the code to run on underlying CPU directly and won't


I think they didn't use this way cause VMware wanted it to be

Some people are using this idea. From dmidecode, the VMware-related
parts are:

Handle 0x0001, DMI type 1, 25 bytes
System Information
        Manufacturer: VMware, Inc.
        Product Name: VMware Virtual Platform
        Version: None
        Serial Number: VMware-56 4d d2 bf 8d ea 6e ec-81 67 6d 50 42 72 07 46
        UUID: 564DD2BF-8DEA-6EEC-8167-6D5042720746
        Wake-up Type: Power Switch
............

Handle 0x001A, DMI type 10, 8 bytes
On Board Device 1 Information
        Type: Video
        Status: Disabled
        Description: VMware SVGA II

I think it's pretty safe to assume all VMware products include

VMware may change the PCI ID at their will so I prefer checking the
DMI since it's easier.

So if we ditched the official method we run the risk of some false
negatives.  But checking the DMI manufacturer would be good enough.
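
As a rough userspace illustration of the manufacturer check (a sketch
only, not the posted patch): the function name vendor_is_vmware is made
up, and the caller is assumed to have obtained the DMI manufacturer
string already.

```c
#include <string.h>

/* Userspace model: returns 1 if the DMI manufacturer string names VMware. */
static int vendor_is_vmware(const char *sys_vendor)
{
	return sys_vendor != NULL && strstr(sys_vendor, "VMware") != NULL;
}
```

A substring match tolerates small changes such as "VMware, Inc." versus
"VMware Inc", but it would miss a renamed vendor string, which is the
false-negative risk mentioned above.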

-- 
Li, Yan
--

From: H. Peter Anvin
Date: Monday, September 8, 2008 - 5:34 pm

If we get false negatives that is quite frankly their problem, not ours. 
  If nothing else, we should be able to look for a host bridge with the 
VMWare vendor ID -- that should arguably be safer than DMI.

	-hpa
--

From: Yan Li
Date: Tuesday, September 9, 2008 - 5:28 am

Yeah, VMware's PCI vendor ID is 0x15AD.

Why is using the PCI vendor ID safer than DMI?

-- 
Li, Yan
--

From: H. Peter Anvin
Date: Tuesday, September 9, 2008 - 1:12 pm

Mostly because DMI is human-readable and therefore more likely to change 
for non-technical reasons.

	-hpa

--

From: Yan Li
Date: Tuesday, September 16, 2008 - 6:32 am

I found that in this situation we can't use PCI info.  My intention to
do this is to fix the false warning from
arch/x86/kernel/cpu/mtrr/main.c (around L695). When booting a VMware
guest we got:
"WARNING: strange, CPU MTRRs all blank?"

For VMware guest this warning is false, just as that for a KVM guest.

This code is from mtrr_trim_uncached_memory(), and used by
setup_arch(), which is used far before PCI is ready.

Therefore I think we can only use DMI here. Any idea?

Thanks!

-- 
Li, Yan
--

From: Ingo Molnar
Date: Wednesday, September 17, 2008 - 3:52 am

PCI quirks can be used almost arbitrarily early stage, see:
arch/x86/kernel/early-quirks.c.

Adding a VM identification callback to early-quirks.c would be fine. But 
if there's a reliable and specific enough DMI string that's fine as 
well. (but PCI is better, since it's a generally more stable enumeration 
interface)

	Ingo
--

From: Yan Li
Date: Wednesday, September 17, 2008 - 7:03 am

The problem here is that mtrr_trim_uncached_memory() is called 108
lines before the invocation of early_quirks(), and 48 lines before
that of dmi_scan_machine(). That's quite early.  The only thing run
before that is the initialization of the CPU, so we have nearly nothing
to use to check the fingerprint of the underlying machine.

I feel it's also unfit to touch the whole PCI or DMI thing before CPU
registers and memory are settled.  A simple solution here is to only
issue a KERN_INFO when we detect that the MTRRs are empty and later, when
we can be sure that the OS is not running as a VM, issue a warning. The
latter part can be done in early_quirks().
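
The two-stage idea could look roughly like this in a userspace model.
The names early_mtrr_check, late_mtrr_report and mtrr_blank_seen are
invented for the sketch, and printf stands in for printk.

```c
#include <stdio.h>

static int mtrr_blank_seen;	/* set early, before any platform detection is possible */

/* Early boot: only record the condition and log at informational level. */
static void early_mtrr_check(unsigned long highest_pfn)
{
	if (!highest_pfn) {
		mtrr_blank_seen = 1;
		printf("info: CPU MTRRs all blank\n");
	}
}

/* Later, once guest detection is available; returns 1 if a warning was issued. */
static int late_mtrr_report(int running_as_guest)
{
	if (!mtrr_blank_seen || running_as_guest)
		return 0;
	printf("WARNING: strange, CPU MTRRs all blank?\n");
	return 1;
}
```

The early stage needs nothing but the value it already computes; the
decision whether to warn is postponed until the guest question can be
answered.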


-- 
Li, Yan
--

From: Ingo Molnar
Date: Wednesday, September 17, 2008 - 7:10 am

that still leaves the CPUID/MSR method for the virtualizer to announce 

ok, we can move the MTRR message further back, to after the early quirks 
phase.

	Ingo
--

From: H. Peter Anvin
Date: Wednesday, September 17, 2008 - 8:38 am

FWIW, it's getting pretty clear with the recent bout of Virtual PC bugs 
that we need virtualizer detection, and that a lot of VMs are doing 
various idiotic things.

Again, with Virtual PC, it seems that DMI is the preferred detection 
method, as disgusting as it is, simply because the alternatives are the 
moral equivalent of ad hoc probing for ISA cards (a random I/O port for 

Makes sense to me.

	-hpa

--

From: Yan Li
Date: Wednesday, September 24, 2008 - 5:22 am

Detects whether we are running as a VMware guest or not. Detection is
based upon DMI vendor string.

It provides a function:
int is_vmware_guest(void)
that can be used easily to detect if we are running as a VMware guest
or not.

I haven't used the PCI vendor ID since that requires copying a chunk of
code from early_quirks(), and I think copying code is not good. And
reusing code from early_quirks() needs a nontrivial change to the present
code structure. Comparatively, checking the "VMware" string against the DMI
manufacturer is a lot simpler (one line of code). Also there's no
evidence indicating that VMware will change their vendor string in the
near future. Therefore I chose the simpler way.

Tested on x86 and x86-64 VMs and machines.

Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
 arch/x86/Kconfig         |   10 ++++++++++
 arch/x86/kernel/Makefile |    1 +
 arch/x86/kernel/vmware.c |   23 +++++++++++++++++++++++
 include/asm-x86/vmware.h |   20 ++++++++++++++++++++
 4 files changed, 54 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/vmware.c
 create mode 100644 include/asm-x86/vmware.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ed92864..85dfebd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -445,6 +445,16 @@ config PARAVIRT_DEBUG
          Enable to debug paravirt_ops internals.  Specifically, BUG if
 	 a paravirt_op is missing when it is called.
 
+config VMWARE_GUEST_DETECT
+	bool "VMware guest detection support"
+	default y
+	depends on DMI && !X86_VOYAGER
+	help
+          This enables detection of running as a full-virtualized
+          VMware guest (as under VMware Workstation or VMware
+          Server). Currently this is used to suppress false warnings
+          from initialization.
+
 config MEMTEST
 	bool "Memtest"
 	help
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3db651f..a3a16a8 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -87,6 +87,7 @@ ...
--

From: Cristi Magherusan
Date: Wednesday, September 24, 2008 - 7:10 am

We can also use this feature to force the HZ value to 100 or 250 at most
when running in a virtual environment, since VirtualBox had some issues
with this by taking a lot of CPU time when the HZ was set to 1000.

Cristi

-- 
Cristi Măgherușan,
System/network engineer
Universitatea Tehnică din Cluj-Napoca
Centrul de Comunicații "Pusztai Kalman"
Tel. 0264/401247  http://cc.utcluj.ro
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 7:23 am

That's good. But this function is used for detecting VMware guest
only.  Do you think VMware also suffers from this problem?

-- 
Li, Yan
--

From: Alok kataria
Date: Wednesday, September 24, 2008 - 9:19 am

Hi Yan,

Thanks for doing this patch.
It would be really beneficial to detect if we are running on a
hypervisor in general. Though i think the approach should be more
generic,  so that we have a common interface for all the hypervisors.
I have some patches which use "cpuid" to detect if we are running on a
hypervisor and use various cpuid leaves to get some hypervisor-specific
info.
This CPUID interface will be available only in the newer (read,
Hardware version 7) version of VMware products. So still for the
products which don't use the newer hardware version, this patch is
helpful.

Btw, are you pushing these patches for the 2.6.27 release ? If this is
for the x86 tree(2.6.28) i think we should hold on, until i post the
proposal for the cpuid patches, so that we can unify this and have a
generic way to detect on which hypervisor are we running .

Thanks,
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 9:21 am

I don't think there is any way in hell this is going into 2.6.27.

For it to make 2.6.28 it will have to be ready very soon.

	-hpa
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 5:19 pm

I'd like to see it in 2.6.28 to fix the false warning here. I'll take
the comments here into account and post an improved patch soon (changing
the code is fast, but testing it on VMs with different configurations is
time-consuming). But I think people could just start to test this
version of the patch, since further changes will be mostly cosmetic, FWIW.

-- 
Li, Yan
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 5:15 pm

Hi Alok,

Thanks for your comments. Sure, it's good to add a common interface

My motivation behind this patch is to serve the MTRR code and fix a
false warning, so I'd like to see it in 2.6.28 as soon as
possible. The latest 2.6.27-rc7 issues a false warning when running
under VMware Server 1.0.7, complaining that the MTRRs are all blank.
So far the false warning has been confirmed under both KVM and
VMware, so detection for these two VMs is added in my [PATCH
2/2]. For this specific purpose (fixing the false warning), a common
interface may not be necessary unless we are sure all VMs leave their
CPU's MTRRs blank (it would be very difficult to confirm this on every
VM ever made). Therefore I'd like to keep this patch as simple
as possible and get it into 2.6.28, since it fixes a false warning (one
can say it's a regression, since at least 2.6.24 doesn't issue a false
warning here).

Also I'd be very happy to work with you to combine this with your
CPUID detection code. I think VMware Server 2.0 is using Hardware
version 7 VM, right? So I can combine your and my code to test it.

But my concern is that for such a simple function (detecting VMware,
not a common interface), is it worth having more code that uses two
different ways of detecting new and older VMware when a simple
dmi_name_in_vendors() might be enough in both situations? I don't think
bloating the kernel is good.

Thanks!

-- 
Li, Yan
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 5:26 pm

It's not a false warning.  It's a true warning.

	-hpa
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 7:34 pm

You can say that the warning is justified because it's true that the
MTRRs are all blank in VMware. But that warning is no good to VMware
users, since the MTRRs are all blank by design. For example, we have
scripts watching the log for lines containing "warning" and "error" on
all production servers. So this warning actually is "false" to VMware
guest users. A KERN_INFO message should be enough here.

We already have a check to suppress the warning on KVM, so I think we
should also suppress the warning for VMware guests and Virtual PC.

-- 
Li, Yan
--

From: Cristi Magherusan
Date: Wednesday, September 24, 2008 - 11:13 am

I don't know for sure about VMware, but someone who has it installed can
try it. I had this issue with a CentOS 5-server virtual machine
downloaded from http://www.thoughtpolice.co.uk/vmware/

The fix consisted in using a kernel compiled with the HZ value set to
100 instead of the default which was 1000.

Cristi

-- 
Cristi Măgherușan,
System/network engineer
Universitatea Tehnică din Cluj-Napoca
Centrul de Comunicații "Pusztai Kalman"
Tel. 0264/401247  http://cc.utcluj.ro
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 11:16 am

HZ is a compile-time constant, though.  Changing that would require 
adding a bunch of general divides, at the very least.
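
To illustrate the point with a simplified userspace sketch (the function
names are made up, and this is not the kernel's actual conversion code):
with a compile-time HZ the divisor is a constant, which compilers can
usually turn into cheaper multiply/shift sequences, while a run-time
tick rate forces a division by a variable.

```c
#define HZ 250			/* compile-time tick rate, chosen for the example */
#define MSEC_PER_SEC 1000

/* Constant HZ: the divisor MSEC_PER_SEC / HZ is a compile-time constant (4 here). */
static unsigned long msecs_to_ticks_const(unsigned long ms)
{
	return (ms + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
}

/* Run-time tick rate: the divisor is only known at run time, so a real divide is needed. */
static unsigned long msecs_to_ticks_runtime(unsigned long ms, unsigned long hz)
{
	unsigned long ms_per_tick = MSEC_PER_SEC / hz;	/* assumes hz divides 1000 evenly */

	return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

Both round up to whole ticks; the second form is what every HZ-dependent
conversion would turn into if the tick rate were chosen at boot.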

	-hpa
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 5:23 pm

Oh I never heard about this. I've been using several VMware VMs
(combined RHEL and SLES and Debian) but haven't seen such
issue. What's the symptom?


-- 
Li, Yan
--

From: Bernd Eckenfels
Date: Wednesday, September 24, 2008 - 6:28 pm

Having a high HZ slows down VMs and also leads to tick loss (time drift). HZ
100 or 250 is recommended by most Vendors.

I think some distributions use the reduced HZ value by default anyway, so
you might never have had a problem with this.

It is also only a problem with system load and timekeeping, which is not
always obvious. When running a lot of nearly idle VM Guests you might see
it.

Gruss
Bernd
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 9:19 am

I'd like to make this a general VM platform detection subsystem.  We 
have similar issues with Virtual PC, and again, DMI appears to be the 
sanest way to detect it -- at least to a primary screen.

	-hpa
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 5:32 pm

I think if Virtual PC has similar problems we should add code to
detect Virtual PC for use by mtrr/main.c. A general interface might
not be good for this specific problem (the false MTRR blank warning), since
we have no way to know whether all VMs leave the MTRRs blank; handling
KVM, VMware and Virtual PC here should be enough for now.

If you can tell me what the manufacturer string is in Virtual PC, I
can make another similar patch using dmi_name_in_vendors().

So at first I'd like to see the false warning for VMware and Virtual
PC get fixed soon.

-- 
Li, Yan
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 5:37 pm

Supposedly "Microsoft Corporation", "Virtual Machine".  No idea what 
pre-MS versions of VPC return.

	-hpa
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 7:48 pm

Thanks, so we can begin with this.

-- 
Li, Yan
--

From: David Sanders
Date: Thursday, September 25, 2008 - 2:56 am

From: i8042-x86ia64io.h

	{
		.ident = "Microsoft Virtual Machine",
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
			DMI_MATCH(DMI_PRODUCT_NAME, "Virtual Machine"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "VS2005R2"),
		},
	},
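
A table in the same spirit can be modelled in userspace as follows. This
is only a sketch: struct dmi_info, struct guest_match and guest_ident are
hypothetical names, and the strings are the ones quoted in this thread.

```c
#include <stddef.h>
#include <string.h>

struct dmi_info {		/* hypothetical stand-in for the fields the firmware reports */
	const char *sys_vendor;
	const char *product_name;
};

struct guest_match {
	const char *ident;
	const char *sys_vendor;		/* substring that must appear in sys_vendor */
	const char *product_name;	/* substring in product_name, or NULL for "any" */
};

static const struct guest_match guest_table[] = {
	{ "VMware",     "VMware",                NULL },
	{ "Virtual PC", "Microsoft Corporation", "Virtual Machine" },
};

/* Return the ident of the first matching entry, or NULL if nothing matches. */
static const char *guest_ident(const struct dmi_info *dmi)
{
	size_t i;

	for (i = 0; i < sizeof(guest_table) / sizeof(guest_table[0]); i++) {
		const struct guest_match *m = &guest_table[i];

		if (!dmi->sys_vendor || !strstr(dmi->sys_vendor, m->sys_vendor))
			continue;
		if (m->product_name &&
		    (!dmi->product_name || !strstr(dmi->product_name, m->product_name)))
			continue;
		return m->ident;
	}
	return NULL;
}
```

Requiring the product name as well keeps a physical machine from the
same vendor from being mistaken for a guest.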
--

From: Yan Li
Date: Thursday, September 25, 2008 - 3:23 am

Oh, cool, thanks!

-- 
Li, Yan
--

From: Greg KH
Date: Wednesday, September 24, 2008 - 7:23 pm

Why do we need to do this within the kernel, what is that going to
achieve?

People can do this easily in userspace if they need to detect this, I
think there's a patch for util-linux-ng adding such a simple utility
that handles almost all of the known virtualization engines right now.

thanks,

greg k-h
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 7:47 pm

Hi Greg,

For me this is used in the next patch (for mtrr/main.c) to suppress an
unnecessary warning when running as a VMware guest:
http://lkml.org/lkml/2008/9/24/144

We already have code to suppress the warning under KVM, so the above patch
suppresses the warning for VMware guests as well.

H. Peter Anvin and Alok kataria are also proposing we may need a more
general approach for detecting hypervisors that can be used for some
other quirks.

Thanks.


-- 
Li, Yan
--

From: Greg KH
Date: Wednesday, September 24, 2008 - 7:55 pm

Well, having a config option like this isn't the way to go as it will be
forced on for all distros and users anyway.

A simple cpuid test is the easier way to do this, that's what the
userspace tools do, if it's really needed in the kernel.  But hopefully,
such things shouldn't be needed within the kernel as it's not Linux's
fault that the hypervisor has bugs in it :)

We wouldn't be wanting to work around bugs in Microsoft's hypervisor,
would we?

thanks,

greg k-h
--

From: Yan Li
Date: Wednesday, September 24, 2008 - 8:29 pm

I think it's a common practice for VM to blank the MTRRs rather than a
bug. Many hypervisors (KVM, VMware, Virtual PC) are doing this since
long before. Therefore I think issuing a warning here complaining

My idea is that this should be included in all general purpose kernels,
or the vendors may have to cope with a flood of questions about boot time
warnings when using under VMware/KVM/Virtual PC.  It's configurable so
good for vendors who wish to provide different kernels for using with

A simple CPUID test is good but can't be used for VMware guest since
they just use underlying CPUID, so nothing special here can be


-- 
Li, Yan
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 9:54 pm

We pretty much have to, just as we have to work around bugs in, say, 
AMD's microcode.  We have avoided it so far, but it's gotten to a 
breaking point, and rather than having ad hoc hacks scattered all over 
the place I want a centralized test site setting a single global variable.

Unfortunately, hypervisor vendors haven't adopted a uniform detection 
scheme (CPUID level 0x40000000 is sometimes mentioned as a 
pseudo-standard, but it's not universal, and not all virtualization 
solutions even can override CPUID.)

	-hpa
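
For hypervisors that do follow the CPUID convention, a userspace query
might look like the sketch below. It assumes GCC or Clang's <cpuid.h>
(__get_cpuid and __cpuid) and the common convention that leaf 1 sets ECX
bit 31 under a hypervisor; hypervisor_signature is a made-up name, and
on hypervisors that do not implement the convention it simply reports
nothing.

```c
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/*
 * Copy the 12-byte signature from CPUID leaf 0x40000000 into sig (at least
 * 13 bytes) and return 1, or return 0 when CPUID does not report a hypervisor.
 */
static int hypervisor_signature(char *sig)
{
	memset(sig, 0, 13);
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 1, ECX bit 31: "hypervisor present" bit. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31)))
		return 0;

	/* __cpuid performs no range check, which suits the 0x40000000 leaf. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	return 1;
#else
	return 0;	/* not an x86 build: nothing to query */
#endif
}
```

Unlike a port probe, CPUID is an architectural instruction with no
device side effects, which is why it keeps coming up as the preferred
interface.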
--

From: Greg KH
Date: Thursday, September 25, 2008 - 5:56 am

Ah, I was hoping they were all doing this, as it seems the most "sane"
manner.  Good luck :)

greg k-h
--

From: Yan Li
Date: Thursday, September 25, 2008 - 7:38 am

That sounds great, but technically this centralized test can only be
done after dmi_scan_machine(), so it can't help the detection code in
mtrr_trim_uncached_memory(), which is run very early, before
dmi_scan_machine(). So I think my patch is still necessary unless we
want to live with the warning message in all VMware guests.

-- 
Li, Yan
--

From: Alok kataria
Date: Wednesday, September 24, 2008 - 7:28 pm

Sorry for joining the discussion this late. But i only noticed this
after somebody pointed me to it.


Even if there is anything on that port on native hardware it would
work perfectly well and is _safe_.
First let me post the code to access this backdoor port (the way it
should really be done )
-------------------------------------------------------------------------------
#define VMWARE_BDOOR_MAGIC     0x564D5868
#define VMWARE_BDOOR_PORT      0x5658

#define VMWARE_BDOOR_CMD_GETVERSION         10
#define VMWARE_BDOOR_CMD_GETHZ              45
#define VMWARE_BDOOR_CMD_LAZYTIMEREMULATION 49

#define VMWARE_BDOOR(cmd, eax, ebx, ecx, edx)                         \
        __asm__("inl (%%dx)" :                                        \
                "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :          \
                "0"(VMWARE_BDOOR_MAGIC), "1"(VMWARE_BDOOR_CMD_##cmd), \
                "2"(VMWARE_BDOOR_PORT), "3"(0) :                      \
                "memory");


static inline int vmware_platform(void)
{
        uint32_t eax, ebx, ecx, edx;
        VMWARE_BDOOR(GETVERSION, eax, ebx, ecx, edx);
        return eax != (uint32_t)-1 && ebx == VMWARE_BDOOR_MAGIC;
}
------------------------------------------------------------------------------------------------------

So whenever we query port  0x5658 , with the GETVERSION command (which
is the first thing we do with this port), we expect that  eax !=
0xFFFFFFFF   and ebx has a  VMWARE specific MAGIC value.  Please note
that ebx has been initialized to zero in the code above.

Now  consider the 2 possible cases on Native hardware
1. Nothing on port 0x5658
In this case the hardware will write a value == 0xFFFFFFFF which will
result in vmware_platform returning zero.
2. Device on port 0x5658
In this case the hardware may return a legitimate value in register
eax, but won't update register ebx. Whereas we check for a MAGIC value
in ebx for this port access. The result is vmware_platform returning
zero.
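
The decision step described in the two cases can be isolated as a pure
function. This sketch performs no port I/O at all; vmware_reply_ok is a
hypothetical name and only models what is done with EAX and EBX after
the access.

```c
#include <stdint.h>

#define VMWARE_BDOOR_MAGIC 0x564D5868u

/* Decide from the register values observed after the IN whether this is VMware. */
static int vmware_reply_ok(uint32_t eax, uint32_t ebx)
{
	return eax != (uint32_t)-1 && ebx == VMWARE_BDOOR_MAGIC;
}
```

Case 1 corresponds to vmware_reply_ok(0xFFFFFFFF, 0) and case 2 to any
EAX value with EBX still 0; both yield zero.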

Also ...
--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 9:38 pm

You have no idea what you just did to a real piece of hardware.

	-hpa
--

From: Alok Kataria
Date: Wednesday, September 24, 2008 - 9:46 pm

Why ? what do you mean ? 
ebx is a local variable in the code above that i posted. 
Only when on hypervisor will we write the magic value over there.
How can this affect native hardware, i fail to understand. 
Please explain.

Thanks,

--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 9:54 pm

You accessed a bloody I/O port!

If you think it's harmless because it was an IN, you're sorely mistaken.

	-hpa

--

From: Alok Kataria
Date: Wednesday, September 24, 2008 - 10:02 pm

Hi Peter, 

It would be really helpful if you could explain to me when this can go
wrong or what kinds of problems it can cause on native hardware.

Thanks,

--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 10:04 pm

You accessed an unknown I/O port.

This means you caused an unknown action in an unknown peripheral device.

This could cause ANYTHING to happen.

	-hpa

--

From: Alok Kataria
Date: Wednesday, September 24, 2008 - 10:23 pm

Hmm...what can an IN on an unknown port cause on native hardware, if a
port is not being used it would return 0xFFFFFFFF in eax, and if you
have a real device there (a sane one), what can IN result in apart from
reading some IO register/counter value in eax ?
If there is anything apart from the above 2 outcomes, please let me know
exactly what you mean.

Thanks,
Alok

--

From: H. Peter Anvin
Date: Wednesday, September 24, 2008 - 10:30 pm

First, you are assuming all devices are "sane".  This is obviously wrong 
-- you're poking in hyperspace, and you don't know if you're going to 
hit someone's ancient controller card that perhaps drives a medical 
accelerator for all you know.

Second, you are assuming that devices you call "sane" don't have I/O 
ports with read side effects.  Many, if not most, devices have some I/O 
ports with read side effects, especially read-clear semantics and/or 
queue drain operations.

Third, in the real world hardware is buggy.  Not just a little, but 
severely so.  Accessing a part of a device which is uninitialized, 
powered down or plain broken can wedge the device or the whole system.

In short, poking at I/O ports when you don't know what they are at best
takes us back to the bad old days of ISA probing (without the protection
of customary address assignments); I think it has to be an absolutely 
last resort and would be reflective of utterly incompetent design.  It 
is significantly *worse* than stealing random opcodes, Virtual PC-style, 
and that is also unacceptable.
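
Read-clear semantics are easy to model. The device below is imaginary
(struct fake_dev and fake_dev_read_status are invented for the sketch);
it only shows that a read can destroy state the real driver still needs.

```c
#include <stdint.h>

struct fake_dev {
	uint8_t irq_status;	/* pending-interrupt bits, cleared by reading */
};

/* Model of a read-to-clear status port: the read itself changes device state. */
static uint8_t fake_dev_read_status(struct fake_dev *dev)
{
	uint8_t v = dev->irq_status;

	dev->irq_status = 0;	/* side effect: pending bits are acknowledged */
	return v;
}
```

If a stray probe performs the first read, the driver that owns the
device later sees 0 and never learns about the pending interrupts.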

	-hpa
--

From: Alan Cox
Date: Thursday, September 25, 2008 - 1:45 am

Changing the status of the device that is actually at the port, losing
IRQs, hanging the bus solid, causing data corruption (eg if you probe the
address of the data port of something like a disk doing a block transfer)

Alan
--

From: Zachary Amsden
Date: Thursday, September 25, 2008 - 1:48 pm

Peter's right.  The hardware device simply sees an I/O request to a port
and a read / write.  The actual internal implementation may not even see
whether it is a R/W, and may do anything.  Some of our virtual hardware
is activated in strange ways, reads where you would expect writes, etc.

For example, a device made by Exploder Technologies has two ports, 3686,
and 3687.  Read access to port 3686 returns the status register and any
access at all to port 3687 moves the robot arm, activating the
thermonuclear self destruct device and destroying the earth.  I have
such a device in my basement, but I have to be careful not to issue any
I/O to port 3687 on it, whether it is writes OR reads.

A less contrived example is an LFSR that returns a new cryptographically
random value on every read.  Reading a register would cause a state
change in the hardware, and this could be fatal to something that
requires exact synchronization of tokens, perhaps SecurID-type
applications.

All that said, we've never encountered an I/O device that uses this ISA
port for anything at all.  However, some old broken hardware might
misdecode bus addresses and try to service the I/O request anyway.  So
while it might be an acceptable way for us to use in VMware tools where
it only ever makes sense to install in a VM anyway, it could be
considered inappropriate for general kernel application.

The whole backdoor thing is also broken because it requires
non-architectural side effects to operate (IN instructions can not
arbitrarily change all GPRs).  This can confuse applications which are
very smart and try to single step over the instruction by emulating it,
logging the port I/O, then restoring GPRs to the state before execution
and writing the 1 register affected by the IN.  Such clever debuggers
and profiling tools have been written.

Zach

--

From: H. Peter Anvin
Date: Thursday, September 25, 2008 - 2:59 pm

To be fair, SMM sometimes also plays these kinds of games -- even though 
it is equally frowned upon there.

However, it is the particular use of this for detection use that is 
utterly damning.  Using random I/O port probes for hardware detection 
should have disappeared in the early 1990s, and it's really disturbing 
that virtualization vendors -- not just VMware -- are, in effect, 
re-making all the mistakes hardware vendors did in the 1980s.

Fortunately, we can usually use DMI to bail us out.  Just like we used 
to look for magic strings in the VGA BIOS so we could figure out what 
exact kind of SuperVGA card we have.

	-hpa


--

From: Zachary Amsden
Date: Thursday, September 25, 2008 - 3:20 pm

It's not disturbing, it's expected.  Re-using old broken solutions
happens all the time; they can be perfectly valid in some contexts.  The
problem is that they tend to live on and evolve into a larger context
where they break again.  Surely we can do better, but how to do that
isn't always clear-cut.  DMI is a pretty good standard for this, but it
still doesn't solve the problem in all contexts (userspace apps).

Zach

--

From: H. Peter Anvin
Date: Thursday, September 25, 2008 - 3:27 pm

This, of course, is what CPUID is for.

	-hpa
--

From: Valdis.Kletnieks
Date: Friday, September 26, 2008 - 5:27 am

I remember from my old mainframe days that IBM actually got this right all the
way back in 1967 for the *first* product called VM - the 'Store CPUID'
instruction would give a specified result when executed on bare iron, but if
you did it inside a virtual machine running under CP-67 it would give a
documented different value that couldn't happen on bare iron.

So this mistake goes back a lot further than the 80s... :)
--

From: Gerd Hoffmann
Date: Friday, September 26, 2008 - 5:47 am

... except that it doesn't always work.  It requires vmx/svm, otherwise
cpuid doesn't trap and thus can't be filled by the hypervisor ...

cheers,
  Gerd

--

From: Valdis.Kletnieks
Date: Friday, September 26, 2008 - 6:22 am

Which would be hardware implementers in the '80s getting wrong the first few
times what IBM did right the first time.  Doesn't *anybody* do literature
searches before doing stuff anymore? ;)
--

From: H. Peter Anvin
Date: Friday, September 26, 2008 - 10:37 am

Well, yes.  There are some pretty strong reasons to believe that Intel 
got that one wrong *deliberately*, until VMware finally forced their hand.

	-hpa
--

From: Pavel Machek
Date: Friday, October 3, 2008 - 7:12 am

Any details? Did Intel try to force people to ia64?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Zachary Amsden
Date: Friday, September 26, 2008 - 1:35 pm

instruction which operates differently when running on bare metal vs. in
a hypervisor is never acceptable unless the instruction is trappable.

There will always be a guest which refuses to operate properly in a
hypervisor, either by defect or by design, so 'sensitive' instructions
should always be trappable.  Do it right and you can nest recursively ;)

Zach

--

From: David Sanders
Date: Thursday, September 25, 2008 - 3:17 pm

Don't give al-Qaeda any ideas.
--
