When running Linux inside KVM all MTRRs are blank because there is no reason to
set them up. So doing a WARN_ON if all MTRRs are blank is not necessary. It is
sufficient to print the warning message using printk.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
arch/x86/kernel/cpu/mtrr/main.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b6e136f..e7f95a9 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -689,7 +689,6 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
/* kvm/qemu doesn't have mtrr set right, don't trim them all */
if (!highest_pfn) {
printk(KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
- WARN_ON(1);
return 0;
}
--
1.5.3.7
--
Instead of obscuring a possibly useful warning, please detect that it's a KVM guest and skip both the warning and the backtrace in that case.
Ingo
--
How useful is the backtrace in that place? I agree that the printk
warning may be useful, but I don't see why the backtrace from the
WARN_ON is necessary.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
--
it allows us to collect such things on kerneloops.org for example. Ingo --
Makes sense. I will send an updated patch.
Joerg
--
This patch depends on VMware detection.
Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
arch/x86/kernel/cpu/mtrr/main.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b117d7f..7e2bd23 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -45,6 +45,7 @@
#include <asm/processor.h>
#include <asm/msr.h>
#include <asm/kvm_para.h>
+#include <asm/vmware.h>
#include "mtrr.h"
u32 num_var_ranges = 0;
@@ -1496,8 +1497,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
/* kvm/qemu doesn't have mtrr set right, don't trim them all */
if (!highest_pfn) {
- WARN(!kvm_para_available(), KERN_WARNING
- "WARNING: strange, CPU MTRRs all blank?\n");
+ WARN(!(kvm_para_available() || (is_vmware_guest())),
+ KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
return 0;
}
--
1.5.6.3
--
Li, Yan
"Everything that is really great and inspiring is created by the
individual who can labor in freedom."
- Albert Einstein, in Out of My Later Years (1950)
--
This patch detects whether we are running as a VMware guest. It uses
the 'official' detection code from VMware's Open Virtual Machine
Tools (open-vm-tools/checkvm), with the license changed from LGPLv2.1
to GPLv2.
It provides a function:
int is_vmware_guest(void)
that can be used to detect whether we are running as a VMware guest.
Currently this is useful for suppressing a false warning from the mtrr
code. Other code may find it useful too (for example, to adopt a less
aggressive caching strategy, since the host already caches for us).
Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
arch/x86/lib/Makefile | 1 +
arch/x86/lib/vmware.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/vmware.h | 30 ++++++++++++++++++++++
3 files changed, 92 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/lib/vmware.c
create mode 100644 include/asm-x86/vmware.h
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index aa3fa41..8327a12 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -8,6 +8,7 @@ lib-y := delay.o
lib-y += thunk_$(BITS).o
lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
+lib-y += vmware.o
ifeq ($(CONFIG_X86_32),y)
lib-y += checksum_32.o
diff --git a/arch/x86/lib/vmware.c b/arch/x86/lib/vmware.c
new file mode 100644
index 0000000..39c9360
--- /dev/null
+++ b/arch/x86/lib/vmware.c
@@ -0,0 +1,61 @@
+/*
+ * Check if we are running as a VMware guest or not
+ *
+ * Copyright (C) 2007 VMware, Inc. All rights reserved.
+ * Adapted to Linux by Yan Li <elliot.li.tech@gmail.com>
+ * from open-vm-tools/checkvm
+ * from VMware's Open Virtual Machine Tools (under LGPLv2.1)
+ * (open-vm-tools.sourceforge.net)
+ *
+ * The original codes from VMware are licensed under LGPLv2.1, I (Yan
+ * Li) converted the following parts to use GPL license as stated in
+ * COPYING file of Linux.
+ */
+
+#include <linux/kernel.h>
+#include ...
If you want this to be used by more callsites, it probably doesn't make sense to have it print out a message each time.
In fact, would it make more sense to have a framework (cpu feature flag?) to detect that we're in any virtualized environment, make this one of the detection routines, and perhaps cache the result? Especially if this detection would be used to manage anything near a hot path in the page cache, as you suggested. But maybe that's overkill.
--
Sure. Another possible solution is to print that message at the first
call only, it's good for debugging to keep that in dmesg.
If it's not possible to do live migration in/out of a running VMware
environment, we can also safely do this detection only at the first
That sounds good too. And I think the current routines for detecting
KVM, Xen and VMware are all ready. I can do that if there is more
positive feedback.
Thanks.
--
Li, Yan
--
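The "detect once, cache the result, print only on the first call" idea from the exchange above can be sketched roughly as follows. This is a user-space illustration, not kernel code: `probe_fn`, `cached_guest_check()` and `fake_probe()` are hypothetical names, and the sketch assumes the answer cannot change while the system is running (the live-migration question raised above).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical probe signature: returns true when running as a guest. */
typedef bool (*probe_fn)(void);

/* Cached state: the probe runs at most once, later calls reuse the result. */
static bool guest_checked;
static bool guest_result;

static bool cached_guest_check(probe_fn probe)
{
    if (!guest_checked) {
        guest_result = probe();
        guest_checked = true;
        /* Report only on the first call so the log is not flooded. */
        printf("guest detection: %s\n", guest_result ? "guest" : "native");
    }
    return guest_result;
}

/* Stand-in probe for illustration only; counts how often it is invoked. */
static int fake_probe_calls;

static bool fake_probe(void)
{
    fake_probe_calls++;
    return true;
}
```

Calling `cached_guest_check(fake_probe)` repeatedly runs the probe and prints the message only once; every later call just returns the stored value, which is what would make it cheap enough for use near a hot path.
--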
hm, i know it's not your fault as you just took this vmware code, but this is really not an acceptable method of detection. The above is totally unsafe to do on native hardware - we dont know whether there's anything on that port.
vmware could have used one of the following methods to communicate to the guest kernel:
- a CPUID and an MSR range - like a good virtual CPU should. That way even bootloaders could detect the presence of vmware.
- or a PCI ID and a PCI driver like KVM does
- or a system call hypercall gateway like Xen and KVM does
- or it could even have used a DMI signature of some sort
but no, vmware had to use 30 year old unsafe ISA port magic... To add insult to injury that port is named 'backdoor' - very smart and confidence raising naming. Plus it does not even use some well-known PC port that is harmless to read - it has to be from the middle of the generic IO port resource range where a real PCI card could sit: 0x5658. Brilliant.
is there really no vmware PCI ID to query? Could you post the lspci -v output of your vmware guest? We could add an early-quirk for one of the core vmware PCI devices (in case there are any - i bet there are).
Ingo
--
Yeah, I agree with you, it's a bad method. I just took it for granted
that vmware has done the necessary study and they knew what they were
doing. I have tested it on two boxes so I thought they were OK. Now I
They haven't done this. Per VMware's design, the cpuinfo in virtual
guest is identical to the underlying physical CPU. I guess they want
to send most of the code to run on underlying CPU directly and won't
I think they didn't use this way cause VMware wanted it to be
Some people are using this idea. From dmidecode, the VMware-related
parts are:
Handle 0x0001, DMI type 1, 25 bytes
System Information
Manufacturer: VMware, Inc.
Product Name: VMware Virtual Platform
Version: None
Serial Number: VMware-56 4d d2 bf 8d ea 6e ec-81 67 6d 50 42 72 07 46
UUID: 564DD2BF-8DEA-6EEC-8167-6D5042720746
Wake-up Type: Power Switch
............
Handle 0x001A, DMI type 10, 8 bytes
On Board Device 1 Information
Type: Video
Status: Disabled
Description: VMware SVGA II
I think it's pretty safe to assume all VMware products include
VMware may change the PCI ID at will, so I prefer checking the
DMI since it's easier.
So if we ditch the official method we run the risk of some false
negatives. But checking the DMI manufacturer would be good enough.
--
Li, Yan
--
If we get false negatives that is quite frankly their problem, not ours. If nothing else, we should be able to look for a host bridge with the VMWare vendor ID -- that should arguably be safer than DMI. -hpa --
Yeah, VMware's PCI vendor ID is 0x15AD.
Why is using the PCI vendor ID safer than DMI?
--
Li, Yan
--
Mostly because DMI is human-readable and therefore more likely to change for non-technical reasons. -hpa --
I found that in this situation we can't use PCI info. My intention in
doing this is to fix the false warning from
arch/x86/kernel/cpu/mtrr/main.c (around L695). When booting a VMware
guest we get:
"WARNING: strange, CPU MTRRs all blank?"
For a VMware guest this warning is false, just as it is for a KVM guest.
This code is in mtrr_trim_uncached_memory(), which is called by
setup_arch() long before PCI is ready.
Therefore I think we can only use DMI here. Any ideas?
Thanks!
--
Li, Yan
--
PCI quirks can be used almost arbitrarily early stage, see: arch/x86/kernel/early-quirks.c. Adding a VM identification callback to early-quirks.c would be fine. But if there's a reliable and specific enough DMI string that's fine as well. (but PCI is better, since it's a generally more stable enumeration interface) Ingo --
The problem here is that mtrr_trim_uncached_memory() is called 108 lines before the invocation of early_quirks(), and 48 lines before that of dmi_scan_machine(). That's quite early. The only thing run before that is the initialization of the CPU, so we have nearly nothing to use to check the fingerprint of the underlying machine. I feel it's also unfit to touch the whole PCI or DMI thing before CPU registers and memory are settled.
A simple solution here is to issue only a KERN_INFO message when we detect that the MTRRs are empty and later, when we can be sure that the OS is not running as a VM, issue a warning. The latter part can be done in early_quirks().
--
Li, Yan
--
that still leaves the CPUID/MSR method for the virtualizer to announce
ok, we can move the MTRR message further back, to after the early quirks phase.
Ingo
--
FWIW, it's getting pretty clear with the recent bout of Virtual PC bugs that we need virtualizer detection, and that a lot of VMs are doing various idiotic things. Again, with Virtual PC, it seems that DMI is the preferred detection method, as disgusting as it is, simply because the alternatives are the moral equivalent of ad hoc probing for ISA cards (a random I/O port for Makes sense to me. -hpa --
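The "note it early, warn later" approach agreed on above could look roughly like the following. This is a user-space sketch under stated assumptions, not the actual kernel change: `mtrr_blank_seen`, `early_note_blank_mtrrs()` and `late_report_blank_mtrrs()` are hypothetical names, `printf` stands in for printk, and the caller is assumed to know by the late phase (after dmi_scan_machine() and early_quirks()) whether the system is a guest.

```c
#include <stdbool.h>
#include <stdio.h>

/* Set early in boot if all MTRRs were found blank (hypothetical flag). */
static bool mtrr_blank_seen;

/* Early phase: nothing is known about the platform yet, so only note it. */
static void early_note_blank_mtrrs(unsigned long highest_pfn)
{
    if (highest_pfn == 0) {
        mtrr_blank_seen = true;
        printf("info: CPU MTRRs all blank\n");
    }
}

/*
 * Later phase (after DMI/PCI quirks have run): escalate to a warning only
 * on what looks like real hardware. Returns true if a warning was issued.
 */
static bool late_report_blank_mtrrs(bool running_as_guest)
{
    if (mtrr_blank_seen && !running_as_guest) {
        printf("WARNING: strange, CPU MTRRs all blank?\n");
        return true;
    }
    return false;
}
```

The early call keeps the informational line in the log in every case; only the late call decides whether the condition deserves a warning.
--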
Detects whether we are running as a VMware guest or not. Detection is
based upon DMI vendor string.
It provides a function:
int is_vmware_guest(void)
that can be used easily to detect if we are running as a VMware guest
or not.
I haven't used the PCI vendor ID since that requires copying a chunk of
code from early_quirks(), and I think copying code is not good. Reusing
code from early_quirks() would need non-trivial changes to the present
code structure. By comparison, checking the "VMware" string against the
DMI manufacturer is a lot simpler (one line of code). Also, there's no
evidence indicating that VMware will change their vendor string in the
near future. Therefore I chose the simpler way.
Tested on x86 and x86-64 VMs and machines.
Signed-off-by: Yan Li <elliot.li.tech@gmail.com>
---
arch/x86/Kconfig | 10 ++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/vmware.c | 23 +++++++++++++++++++++++
include/asm-x86/vmware.h | 20 ++++++++++++++++++++
4 files changed, 54 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kernel/vmware.c
create mode 100644 include/asm-x86/vmware.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ed92864..85dfebd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -445,6 +445,16 @@ config PARAVIRT_DEBUG
Enable to debug paravirt_ops internals. Specifically, BUG if
a paravirt_op is missing when it is called.
+config VMWARE_GUEST_DETECT
+ bool "VMware guest detection support"
+ default y
+ depends on DMI && !X86_VOYAGER
+ help
+ This enables detection of running as a full-virtualized
+ VMware guest (as under VMware Workstation or VMware
+ Server). Currently this is used to suppress false warnings
+ from initialization.
+
config MEMTEST
bool "Memtest"
help
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3db651f..a3a16a8 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -87,6 +87,7 @@ ...
We can also use this feature to force the HZ value to 100 or 250 at most when running in a virtual environment, since VirtualBox had some issues with this by taking a lot of CPU time when the HZ was set to 1000.
Cristi
--
Cristi Măgherușan, systems/network engineer
Universitatea Tehnică din Cluj-Napoca
Centrul de Comunicații "Pusztai Kalman"
Tel. 0264/401247 http://cc.utcluj.ro
--
That's good. But this function is used for detecting VMware guest only. Do you think VMware also suffers from this problem? -- Li, Yan --
Hi Yan,
Thanks for doing this patch. It would be really beneficial to detect if we are running on a hypervisor in general. Though I think the approach should be more generic, so that we have a common interface for all the hypervisors.
I have some patches which use "cpuid" to detect if we are running on a hypervisor and use various cpuid leaves to get some hypervisor-specific info. This CPUID interface will be available only in the newer (read: hardware version 7) versions of VMware products. So for the products which don't use the newer hardware version, this patch is still helpful.
Btw, are you pushing these patches for the 2.6.27 release? If this is for the x86 tree (2.6.28), I think we should hold on until I post the proposal for the cpuid patches, so that we can unify this and have a generic way to detect which hypervisor we are running on.
Thanks,
--
I don't think there is any way in hell this is going into 2.6.27. For it to make 2.6.28 it will have to be ready very soon. -hpa --
I'd like to see it in 2.6.28 to fix the false warning here. I'll take several comments here and post an improved patch soon (changing code is fast, but testing it on VMs here with different configurations is time-consuming). But I think people could just start to test this version of the patch, since further changes will be mostly cosmetic, FWIW. -- Li, Yan --
Hi Alok,
Thanks for your comments. Sure, it's good to add a common interface
My motivation behind this patch is to serve the MTRR code by fixing a false warning, so I'd like to see it in 2.6.28 as soon as possible. The latest 2.6.27-rc7 issues a false warning when running under VMware Server 1.0.7, complaining that the MTRRs are all blank. Currently the false warning has been confirmed under both KVM and VMware, so detection for these two VMs is added in my [PATCH 2/2].
For this specific purpose (fixing the false warning), a common interface may not be necessary unless we are sure all VMs have their CPU's MTRRs blank (it would be very difficult to confirm this on all VMs ever made). Therefore I'd like to make this patch as simple as possible and get it into 2.6.28, since it fixes a false warning (one can say it's a regression, since at least 2.6.24 doesn't issue a false warning here).
Also, I'd be very happy to work with you to combine this with your CPUID detection code. I think VMware Server 2.0 uses hardware version 7 VMs, right? So I can combine your code and mine to test it. But my concern is whether, for such a simple function (detecting VMware, not a common interface), it is worth having more code that uses two different ways to detect new and older VMware, when a simple dmi_name_in_vendors() might be enough in both situations. I don't think bloating the kernel is good.
Thanks!
--
Li, Yan
--
It's not a false warning. It's a true warning. -hpa --
You can say that the warning is justified because it's true that the MTRRs are all blank in VMware. But that warning is no good to VMware users, since the MTRRs are all blank by design. For example, we have scripts watching the log for lines containing "warning" and "error" on all production servers. So this warning actually is "false" to VMware guest users. A KERN_INFO message should be enough here. We already have a check to suppress the warning on KVM, so I think we should also suppress the warning for VMware guests and Virtual PC. -- Li, Yan --
I don't know for sure about VMware, but someone who has it installed can try it. I had this issue with a CentOS 5-server virtual machine downloaded from http://www.thoughtpolice.co.uk/vmware/ The fix consisted in using a kernel compiled with the HZ value set to 100 instead of the default which was 1000.
Cristi
--
HZ is a compile-time constant, though. Changing that would require adding a bunch of general divides, at the very least. -hpa --
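To illustrate the remark about divides: with a compile-time HZ the conversion factor folds into a constant, while a tick rate chosen at run time forces a real division in every conversion. The following is a minimal user-space sketch; the function names are hypothetical, and HZ=250 is just one of the values mentioned in this thread (the constant form shown only works for HZ values that divide 1000 evenly).

```c
#include <stdint.h>

#define HZ 250  /* compile-time tick rate, as in a kernel built with HZ=250 */

/* With a constant HZ the compiler can fold 1000 / HZ into a constant. */
static inline uint64_t ticks_to_msecs_const(uint64_t ticks)
{
    return ticks * (1000 / HZ);
}

/* With a runtime tick rate every conversion needs a real divide. */
static inline uint64_t ticks_to_msecs_runtime(uint64_t ticks, uint32_t hz)
{
    return ticks * 1000 / hz;
}
```

Both functions agree for the same rate; the difference is only in the code the compiler can generate for them.
--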
Oh I never heard about this. I've been using several VMware VMs (combined RHEL and SLES and Debian) but haven't seen such issue. What's the symptom? -- Li, Yan --
Having a high HZ slows down VMs and also leads to tick loss (time drift). HZ 100 or 250 is recommended by most vendors. I think some distributions use the reduced HZ value by default anyway, so you might never have had a problem with this. It is also only a problem with system load and timekeeping, which is not always obvious. When running a lot of nearly idle VM guests you might see it. Regards, Bernd --
I'd like to make this a general VM platform detection subsystem. We have similar issues with Virtual PC, and again, DMI appears to be the sanest way to detect it -- at least to a primary screen. -hpa --
I think if Virtual PC has similar problems we should add code to detect Virtual PC, to be used by mtrr/main.c. A general interface might not be good for this specific problem (the false MTRR blank warning), since we have no way to know that all VMs have their MTRRs set to blank; handling KVM, VMware and Virtual PC here should be enough for now. If you can tell me what the manufacturer vendor string is in Virtual PC, I can make another similar patch using dmi_name_in_vendors(). So first I'd like to see the false warning for VMware and Virtual PC fixed soon. -- Li, Yan --
Supposedly "Microsoft Corporation", "Virtual Machine". No idea what pre-MS versions of VPC return. -hpa --
Thanks, so we can begin with this. -- Li, Yan --
From: i8042-x86ia64io.h
{
.ident = "Microsoft Virtual Machine",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_MATCH(DMI_PRODUCT_NAME, "Virtual Machine"),
DMI_MATCH(DMI_PRODUCT_VERSION, "VS2005R2"),
},
},
--
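A table-driven DMI check along the lines of the i8042 entry above could be sketched as follows. This is a simplified user-space model, not the kernel's DMI API: `struct dmi_info`, `struct vm_dmi_match` and `vm_ident_from_dmi()` are hypothetical, and the strings are the ones quoted in this thread (the VMware manufacturer from the dmidecode output, the Microsoft vendor and product name from the i8042 table). Matching is by substring, and a NULL field means "do not care".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the firmware's DMI strings. */
struct dmi_info {
    const char *sys_vendor;
    const char *product_name;
};

/* One table entry; a NULL field means "do not care". */
struct vm_dmi_match {
    const char *ident;
    const char *sys_vendor;
    const char *product_name;
};

static const struct vm_dmi_match vm_dmi_table[] = {
    { "VMware", "VMware", NULL },
    { "Microsoft Virtual Machine", "Microsoft Corporation", "Virtual Machine" },
};

static bool field_matches(const char *have, const char *want)
{
    if (!want)
        return true;
    return have && strstr(have, want) != NULL;
}

/* Returns the ident of the first matching entry, or NULL on no match. */
static const char *vm_ident_from_dmi(const struct dmi_info *info)
{
    size_t i;

    for (i = 0; i < sizeof(vm_dmi_table) / sizeof(vm_dmi_table[0]); i++) {
        const struct vm_dmi_match *m = &vm_dmi_table[i];

        if (field_matches(info->sys_vendor, m->sys_vendor) &&
            field_matches(info->product_name, m->product_name))
            return m->ident;
    }
    return NULL;
}
```

Requiring the product name as well as the vendor for the Microsoft entry keeps the match specific, since the vendor string alone would also match physical machines from the same manufacturer.
--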
Why do we need to do this within the kernel, what is that going to achieve? People can do this easily in userspace if they need to detect this, I think there's a patch for util-linux-ng adding such a simple utility that handles almost all of the known virtualization engines right now. thanks, greg k-h --
Hi Greg, For me this is used in the next patch (for mtrr/main.c) to suppress an unnecessary warning when running as a VMware guest: http://lkml.org/lkml/2008/9/24/144 We already have code to suppress warning under KVM so the above patch suppress warnings for VMware guest also. H. Peter Anvin and Alok kataria are also proposing we may need a more general approach for detecting hypervisors that can be used for some other quirks. Thanks. -- Li, Yan --
Well, having a config option like this isn't the way to go as it will be forced on for all distros and users anyway. A simple cpuid test is the easier way to do this, that's what the userspace tools do, if it's really needed in the kernel. But hopefully, such things shouldn't be needed within the kernel as it's not Linux's fault that the hypervisor has bugs in it :) We wouldn't be wanting to work around bugs in Microsoft's hypervisor, would we? thanks, greg k-h --
I think it's a common practice for VMs to blank the MTRRs rather than a bug. Many hypervisors (KVM, VMware, Virtual PC) have been doing this for a long time. Therefore I think issuing a warning here complaining
My idea is that this should be included in all general-purpose kernels, or the vendors may have to cope with a flood of questions about boot-time warnings when running under VMware/KVM/Virtual PC. It's configurable, so good for vendors who wish to provide different kernels for using with
A simple CPUID test is good but can't be used for VMware guests since they just use the underlying CPUID, so nothing special here can be
--
Li, Yan
--
We pretty much have to, just as we have to work around bugs in, say, AMD's microcode. We have avoided it so far, but it's gotten to a breaking point, and rather than having ad hoc hacks scattered all over the place I want a centralized test site setting a single global variable. Unfortunately, hypervisor vendors haven't adopted a uniform detection scheme (CPUID level 0x40000000 is sometimes mentioned as a pseudo-standard, but it's not universal, and not all virtualization solutions even can override CPUID.) -hpa --
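For the hypervisors that do implement the CPUID level 0x40000000 convention mentioned above, the vendor signature comes back as 12 bytes spread over EBX, ECX and EDX. The following sketch only decodes register values passed in by the caller; it executes no CPUID itself. The function names are hypothetical, and the two signature strings ("VMwareVMware", "KVMKVMKVM") are not taken from this thread: they are the commonly documented values and are used here as assumptions. As noted above, not every virtualization solution provides this leaf.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Rebuild the 12-byte vendor signature that a hypervisor may return in
 * EBX, ECX, EDX for CPUID leaf 0x40000000. The register values are passed
 * in, so this sketch performs no CPUID itself.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };
    int i, j;

    /* Each register holds four characters, lowest byte first. */
    for (i = 0; i < 3; i++)
        for (j = 0; j < 4; j++)
            out[i * 4 + j] = (char)((regs[i] >> (8 * j)) & 0xff);
    out[12] = '\0';
}

/* True if the signature names a hypervisor this sketch knows about. */
static bool hv_signature_known(const char *sig)
{
    return strcmp(sig, "VMwareVMware") == 0 ||
           strcmp(sig, "KVMKVMKVM") == 0;
}
```

A centralized test site of the kind described above could try such a CPUID check first and fall back to DMI for hypervisors that cannot override CPUID.
--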
Ah, I was hoping they were all doing this, as it seems the most "sane" manner. Good luck :) greg k-h --
That sounds great, but technically this centralized test can only be done after dmi_scan_machine(), so it can't help the detection code in mtrr_trim_uncached_memory(), which is run very early, before dmi_scan_machine(). So I think my patch is still necessary unless we want to live with the warning message in all VMware guests. -- Li, Yan --
Sorry for joining the discussion this late. But i only noticed this
after somebody pointed me to it.
Even if there is anything on that port on native hardware it would
work perfectly well and is _safe_.
First let me post the code to access this backdoor port (the way it
should really be done )
-------------------------------------------------------------------------------
#define VMWARE_BDOOR_MAGIC 0x564D5868
#define VMWARE_BDOOR_PORT 0x5658
#define VMWARE_BDOOR_CMD_GETVERSION 10
#define VMWARE_BDOOR_CMD_GETHZ 45
#define VMWARE_BDOOR_CMD_LAZYTIMEREMULATION 49
#define VMWARE_BDOOR(cmd, eax, ebx, ecx, edx) \
    __asm__("inl (%%dx)" : \
        "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) : \
        "0"(VMWARE_BDOOR_MAGIC), "1"(VMWARE_BDOOR_CMD_##cmd), \
        "2"(VMWARE_BDOOR_PORT), "3"(0) : \
        "memory");
static inline int vmware_platform(void)
{
    uint32_t eax, ebx, ecx, edx;
    VMWARE_BDOOR(GETVERSION, eax, ebx, ecx, edx);
    return eax != (uint32_t)-1 && ebx == VMWARE_BDOOR_MAGIC;
}
------------------------------------------------------------------------------------------------------
So whenever we query port 0x5658 , with the GETVERSION command (which
is the first thing we do with this port), we expect that eax !=
0xFFFFFFFF and ebx has a VMWARE specific MAGIC value. Please note
that ebx has been initialized to zero in the code above.
Now consider the 2 possible cases on Native hardware
1. Nothing on port 0x5658
In this case the hardware will write a value == 0xFFFFFFFF which will
result in vmware_platform returning zero.
2. Device on port 0x5658
In this case the hardware may return a legitimate value in register
eax, but won't update register ebx. Whereas we check for a MAGIC value
in ebx for this port access. The result is vmware_platform returning
zero.
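The decision described in the two cases above depends only on the register values left behind by the IN, so it can be modelled without touching any port. This is a sketch for illustration; `bdoor_reply_is_vmware()` is a hypothetical helper, and it says nothing about whether issuing the IN itself is safe on native hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define VMWARE_BDOOR_MAGIC 0x564D5868u

/*
 * Decide from the register values left behind by the IN whether the reply
 * looks like VMware's. No port I/O happens here; the caller supplies the
 * values, which keeps the check testable in isolation.
 */
static bool bdoor_reply_is_vmware(uint32_t eax, uint32_t ebx)
{
    return eax != UINT32_MAX && ebx == VMWARE_BDOOR_MAGIC;
}
```

With eax = 0xFFFFFFFF (case 1) or with ebx still zero (case 2) the helper returns false; it returns true only when ebx carries the magic value.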
Also ...
--
You have no idea what you just did to a real piece of hardware. -hpa --
Why? What do you mean? ebx is a local variable in the code that I posted above. Only when running on the hypervisor will the magic value be written there. How can this affect native hardware? I fail to understand. Please explain. Thanks, --
You accessed a bloody I/O port! If you think it's harmless because it was an IN, you're sorely mistaken. -hpa --
Hi Peter, It would be really helpful if you could explain to me when this can go wrong or what kinds of problems this can cause on native hardware. Thanks, --
You accessed an unknown I/O port. This means you caused an unknown action in an unknown peripheral device. This could cause ANYTHING to happen. -hpa --
Hmm... what can an IN on an unknown port cause on native hardware? If a port is not being used, it would return 0xFFFFFFFF in eax, and if you have a real device there (a sane one), what can IN result in apart from reading some I/O register/counter value into eax? If there is anything apart from the above 2 outcomes, please let me know exactly what you mean. Thanks, Alok --
First, you are assuming all devices are "sane". This is obviously wrong -- you're poking in hyperspace, and you don't know if you're going to hit someone's ancient controller card that perhaps drives a medical accelerator for all you know.
Second, you are assuming that devices you call "sane" don't have I/O ports with read side effects. Many, if not most, devices have some I/O ports with read side effects, especially read-clear semantics and/or queue drain operations.
Third, in the real world hardware is buggy. Not just a little, but severely so. Accessing a part of a device which is uninitialized, powered down or plain broken can wedge the device or the whole system.
In short, poking at I/O ports which you don't know what they are at best takes us back to the bad old days of ISA probing (without the protection of customary address assignments); I think it has to be an absolutely last resort and would be reflective of utterly incompetent design. It is significantly *worse* than stealing random opcodes, Virtual PC-style, and that is also unacceptable.
-hpa
--
Changing the status of the device that is actually at the port, losing IRQs, hanging the bus solid, causing data corruption (eg if you probe the address of the data port of something like a disk doing a block transfer) Alan --
Peter's right. The hardware device simply sees an I/O request to a port and a read / write. The actual internal implementation may not even see whether it is a R/W, and may do anything. Some of our virtual hardware is activated in strange ways, reads where you would expect writes, etc.
For example, a device made by Exploder Technologies has two ports, 3686, and 3687. Read access to port 3686 returns the status register and any access at all to port 3687 moves the robot arm, activating the thermonuclear self destruct device and destroying the earth. I have such a device in my basement, but I have to be careful not to issue any I/O to port 3687 on it, whether it is writes OR reads.
A less contrived example is a LFSR that returns a new cryptographically random value on every read. Reading a register would cause a state change in the hardware, and this could be fatal to something that requires exact synchronization of tokens, perhaps securID type applications.
All that said, we've never encountered an I/O device that uses this ISA port for anything at all. However, some old broken hardware might misdecode bus addresses and try to service the I/O request anyway. So while it might be an acceptable way for us to use in VMware tools where it only ever makes sense to install in a VM anyway, it could be considered non-appropriate for general kernel application.
The whole backdoor thing is also broken because it requires non-architectural side effects to operate (IN instructions can not arbitrarily change all GPRs). This can confuse applications which are very smart and try to single step over the instruction by emulating it, logging the port I/O, then restoring GPRs to the state before execution and writing the 1 register affected by the IN. Such clever debuggers and profiling tools have been written.
Zach
--
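The LFSR example in the message above can be modelled in a few lines to show why a read is not a passive operation. This is a toy device model for illustration only: `struct lfsr_dev` and `lfsr_dev_read()` are hypothetical, and the 16-bit Fibonacci LFSR used here (taps 16, 14, 13, 11) is a simple pseudo-random generator chosen for brevity, not a cryptographic one.

```c
#include <stdint.h>

/* Hypothetical device model: one 16-bit register backed by an LFSR. */
struct lfsr_dev {
    uint16_t state; /* must be non-zero */
};

/*
 * Model of a register read. Each read advances a 16-bit Fibonacci LFSR
 * (taps 16, 14, 13, 11) and returns the new state, so the read itself
 * changes what the device holds.
 */
static uint16_t lfsr_dev_read(struct lfsr_dev *dev)
{
    uint16_t s = dev->state;
    uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;

    dev->state = (uint16_t)((s >> 1) | (bit << 15));
    return dev->state;
}
```

Two consecutive reads return different values, and a stray probe between two legitimate reads would leave the legitimate reader one step out of sync with whatever it is paired with.
--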
To be fair, SMM sometimes also play these kinds of games -- even though it is equally frowned upon there. However, it is the particular use of this for detection use that is utterly damning. Using random I/O port probes for hardware detect should have disappeared in the early 1990's, and it's really disturbing that virtualization vendors -- not just VMWare -- are, in effect, re-making all the mistakes hardware vendors did in the 1980's. Fortunately, we can usually use DMI to bail us out. Just like we used to look for magic strings in the VGA BIOS so we could figure out what exact kind of SuperVGA card we have. -hpa --
It's not disturbing, it's expected. Re-using old broken solutions happens all the time, they can be perfectly valid in some contexts. The problem is that they tend to live on and evolve into a larger context where they break again. Surely we can do better, but how to do that isn't always clear-cut. DMI is a pretty good standard for this, but it still doesn't solve the problem in all contexts (userspace apps). Zach --
This, of course, is what CPUID is for. -hpa --
I remember from my old mainframe days that IBM actually got this right all the way back in 1967 for the *first* product called VM - the 'Store CPUID' instruction would give a specified result when executed on bare iron, but if you did it inside a virtual machine running under CP-67 it would give a documented different value that couldn't happen on bare iron. So this mistake goes back a lot further than the 80s... :)
... except that it doesn't always work. It requires vmx/svm, otherwise cpuid doesn't trap and thus can't be filled by the hypervisor ... cheers, Gerd --
Which would be hardware implementers in the '80s getting wrong the first few times what IBM did right the first time. Doesn't *anybody* do literature searches before doing stuff anymore? ;)
Well, yes. There are some pretty strong reasons to believe that Intel got that one wrong *deliberately*, until VMware finally forced their hand. -hpa --
Any details? Did intel try to force people to ia64? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
instruction which operates differently when running on bare metal vs. in a hypervisor is never acceptable unless the instruction is trappable. There will always be a guest which refuses to operate properly in a hypervisor, either by defect or by design, so 'sensitive' instructions should always be trappable. Do it right and you can nest recursively ;) Zach --
