Re: [BUG] x86 kenel won't boot under Virtual PC

Previous thread: [RFC][PATCH] Remove cgroup member from struct page by Balbir Singh on Sunday, August 31, 2008 - 10:47 am. (72 messages)

Next thread: [PATCH] kernel/cpu.c: Move the CPU_DYING notifiers by Manfred Spraul on Sunday, August 31, 2008 - 10:58 am. (7 messages)
From: David Sanders
Date: Sunday, August 31, 2008 - 11:22 am

I recently discovered that x86 kernels won't boot under Virtual PC.  In this 
case paravirt was not built into the kernel.  The kernel just "hangs" on 
attempted boot with no error messages.  Git-bisect pinpointed the following 
commit as the problem:

commit 32c464f5d9701db45bc1673288594e664065388e
Author: Jan Beulich <jbeulich@novell.com>
Date:   Wed Oct 17 18:04:41 2007 +0200

    x86: multi-byte single instruction NOPs

    Add support for and use the multi-byte NOPs recently documented to be
    available on all PentiumPro and later processors.

    This patch only applies cleanly on top of the "x86: misc.
    constifications" patch sent earlier.

    [ tglx: arch/x86 adaptation ]

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
    Signed-off-by: Andi Kleen <ak@suse.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

     arch/x86/kernel/alternative.c  |   23 ++++++++++++++++++++++-
     include/asm-x86/processor_32.h |   22 ++++++++++++++++++++++
     include/asm-x86/processor_64.h |   22 ++++++++++++++++++++++
eb662fea6ed7904074c87dd004b9f6d23964c844 M      include

Any suggestion on where to go from here would be appreciated.
--

From: Linus Torvalds
Date: Sunday, August 31, 2008 - 11:47 am

What CPU does Virtual PC emulate? As far as Wikipedia is concerned (not 
that I'd take it on complete faith) it emulates a 32-bit Intel Pentium II.

And that commit makes the kernel use the "P6 nops" for such hardware. 
Maybe Virtual PC doesn't support the newer intel nop things?

Intel docs say that it should be available on any intel CPU that has 
CPUID.01H.EAX[11:8] = 0110B or 1111B. That's the "family ID", and Pentium 
II should have a family ID of 6 (ie that 0110B case).

So it sounds like a Virtual PC bug, but I dunno. And maybe we should just 
use the legcay nops for anything that isn't modern (ie P4+ or Core)?

		Linus
--

From: Arjan van de Ven
Date: Sunday, August 31, 2008 - 12:27 pm

On Sun, 31 Aug 2008 11:47:04 -0700 (PDT)

it's probably even a security bug in that I don't see what would be
stopping a ring 3 user process from executing these instructions...

--

From: Linus Torvalds
Date: Sunday, August 31, 2008 - 12:39 pm

Well, it could be that Virtual PC raises a #UD exception in the virtual 
machine. In user space, that would just cause the kernel to kill the poor 
innocent victim. But when the kernel gets a #UD exception on what it 
expects to be a nop, it just won't work.

		Linus
--

From: David Sanders
Date: Friday, September 5, 2008 - 8:38 am

31 Aug 2008, David Sanders wrote:


Sorry about the confusion. I was just learning git and applied a couple 
patches wrong that lead me to believe I had not found the answer, when in 
fact I had.  The problem is the multibyte NOP instructions introduced into 
the kernel in 2007.  For 2.6.24-rc1 reverting the patch (32c464f5) corrects 
the problem.  But then of course we cannot use these instructions as we would 
like.  I propose the following patch which adds a boot-time parameter to not 
use these multibyte NOPs in systems that are unprepared to deal with them.  
The patch disables a optimization used in one place by commenting out some 
lines in nop.h.  Please comment.

    x86: Fix to multibyte NOPs breaking Virtual PC (win)

    To be applied to linux-2.6.27-rc5.

    Signed-off-by: David Sanders <linux@sandersweb.net>

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2763cb3..6a63e78 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -58,6 +58,15 @@ static int __init setup_noreplace_paravirt(char *str)
 __setup("noreplace-paravirt", setup_noreplace_paravirt);
 #endif
 
+static int intel_legacy_nops = 0;
+
+static int __init setup_intel_legacy_nops(char *str)
+{
+	intel_legacy_nops = 1;
+	return 1;
+}
+__setup("intel-legacy-nops", setup_intel_legacy_nops);
+
 #define DPRINTK(fmt, args...) if (debug_alternative) \
 	printk(KERN_DEBUG fmt, args)
 
@@ -165,13 +174,14 @@ static const struct nop {
 const unsigned char *const *find_nop_table(void)
 {
 	const unsigned char *const *noptable = intel_nops;
-	int i;
+	int i = 0;
 
-	for (i = 0; noptypes[i].cpuid >= 0; i++) {
+	while (noptypes[i].cpuid >= 0 && !intel_legacy_nops) {
 		if (boot_cpu_has(noptypes[i].cpuid)) {
 			noptable = noptypes[i].noptable;
 			break;
 		}
+		i++;
 	}
 	return noptable;
 }
diff --git a/include/asm-x86/nops.h b/include/asm-x86/nops.h
index ad0bedd..a6f0571 100644
--- a/include/asm-x86/nops.h
+++ ...
From: Jan Beulich
Date: Friday, September 5, 2008 - 9:15 am

I disagree here: If I configure a 686+ kernel, I expect these NOPs to be
that way (and to work). If you want to run on something that's not
compliant, you just shouldn't configure your kernel that way.

Jan

--

From: Linus Torvalds
Date: Friday, September 5, 2008 - 9:39 am

Well, if you actually do a

	git grep 'ASM_NOP[0-9]'

you'll find that just the _definitions_ of those things are the bulk of it 
BY FAR, and that there doesn't seem to be a single user that cares even 
remotely about performance.

So I actually think that the whole thing is a waste of time. We should 
probably 

 - pick a single set of NOP's per 32-bit/64-bit (since the good nops in 
   32-bit aren't 64-bit instructions at all, so we do want different nops 
   depending on _that_)

   The whole static choice by microarchitecture is pure garbage.

 - Probably also just declare that those default nops are single 
   instructions, just so that we never even have to think about it from a 
   dynamic replacement angle.

   Look at the uses again, and realize that it really is just pure garbage 
   to have this kind of complex and subtle stuff going on.

 - Move the optimized nop definitions (K7_NOPx etc) to the only place that 
   cares - asm/x86/kernel/alternative.c. When we do things _dynamically_, 
   it can actually make sense to pick a nop more precisely, but for this 
   whole static thing it's just a pain.

IOW, if it actually _worked_ reasonably, I wouldn't care. But clearly it 
doesn't. And once it's not working reasonably, it should be fixed.

		Linus
--

From: Ingo Molnar
Date: Friday, September 5, 2008 - 11:43 am

yes - we had 3-4 tries already and while it looked worthwile initially 
it's clearly not showing signs of getting more robust and whatever 
benefits there might be slightly better NOPs is dwarved by all these 
robustness problems. Peter?

	Ingo
--

From: H. Peter Anvin
Date: Friday, September 5, 2008 - 1:06 pm

There is already a fix for this based on a first-principles test in tip.

	-hpa


--

From: Jeremy Fitzhardinge
Date: Friday, September 5, 2008 - 12:08 pm

Well, the paravirt_ops patching uses multibyte nops to pad out the
unused space in a patch site, and they're generally on hot paths
(otherwise we wouldn't bother with patching).  But even then I don't
think the particular nop chosen matters all that much, and even if it
did using the dumb redundant prefix long nops seems to be as good as or
better than the p6 nops.  The "call mcount" patching ftrace wants to do


Yep.  We could run the p6 nops with an exception handler to see if the
cpu actually supports them or not.  And if not, just fall back to
something simple and good enough.

    J
--

From: H. Peter Anvin
Date: Friday, September 5, 2008 - 1:12 pm

Okay, we do not generate P6 NOPs statically if CONFIG_X86_GENERIC_CPU is 
enabled -- unless, of course gcc/binutils generates them, which they now 
do for certain CPU types.  If you are selecting a specific CPU *and* you 
don't select GENERIC, you should get valid code for the selected CPU only.

As far as using them dynamically, there is code in -tip that determines 
dynamically and explicitly if the long NOPs are available, and only uses 
them if they are.

	-hpa
--

From: Linus Torvalds
Date: Friday, September 5, 2008 - 9:30 am

Hmm.. I'm not a huge fan of the ASM_NOP mess, but you also disable it for 
64-bit x86 too.

On 32-bit, at least the generic nops are fairly reasonable, but the 
default nops for 64-bit really look pretty sad, and the P6 nops really do 
look better.

So I would suggest perhaps moving the static P6 nop selection into the 
CONFIG_X86_64 thing.

The alternative is to just get rid of that static nop selection, and just 
have two cases: 32-bit and 64-bit, and just pick obviously safe cases for 
them. 

So I think that particular part would be better off with changing the 
Kconfig language instead. Ie something like this..

(I removed the 32-bit CPU's from the choices, except for MPENTIUM4 that 
really should be merged with MPSC - the difference between MPENTIUM4 and 
MPSC seems to be just a totally bogus 32-bit vs 64-bit thing. Yes, yes, 
there are PENTIUM4's without 64-bit, but there are also Prescott chips 
that run 32-bit kernels, so the thing is a bit confused.

		Linus

---
 arch/x86/Kconfig.cpu |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2c518fb..b225219 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -382,14 +382,17 @@ config X86_OOSTORE
 # P6_NOPs are a relatively minor optimization that require a family >=
 # 6 processor, except that it is broken on certain VIA chips.
 # Furthermore, AMD chips prefer a totally different sequence of NOPs
-# (which work on all CPUs).  As a result, disallow these if we're
-# compiling X86_GENERIC but not X86_64 (these NOPs do work on all
-# x86-64 capable chips); the list of processors in the right-hand clause
-# are the cores that benefit from this optimization.
+# (which work on all CPUs).  In addition, it looks like Virtual PC
+# does not understand them.
+#
+# As a result, disallow these if we're not compiling for X86_64 (these
+# NOPs do work on all x86-64 capable chips); the list of processors in
+# the right-hand ...
From: Andi Kleen
Date: Friday, September 5, 2008 - 10:55 am

Heh -- I originally only added it because you're asked for it ...

Yes going back to some simple generic nops and avoid all these problems
would be a great thing.

-Andi
--

From: David Sanders
Date: Sunday, August 31, 2008 - 1:03 pm

Virtual PC does not emulate a processor like it does with the motherboard, 
video, NIC ,etc.  What you see in the virtual machine is the actual 
processor, except that Virtual PC looks at all of your instructions and 
modifies ring 0 code and the like.  It may be that Virtual PC was not 
designed with an awareness of these nops that the commit added.

I would suggest an configuration option to select legacy-nops or newer-nops 
and a kernel boot-time parameter so it can be disabled to allow installation 
of a distribution for example.

I would be happy to submit such a patch if you agree (or I'll try to anyway).
--

From: David Sanders
Date: Monday, September 1, 2008 - 1:23 pm

OK, below is my patch.  The problem is I just built with it and the boot 
problem still exists.  Either the patch is flawed or git-bisect gave me the 
wrong commit.  Any help would be appreciated.
David

Signed-off-by: David Sanders <linux@sandersweb.net>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ed92864..a38f362 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -445,6 +445,13 @@ config PARAVIRT_DEBUG
          Enable to debug paravirt_ops internals.  Specifically, BUG if
 	 a paravirt_op is missing when it is called.
 
+config INTEL_NOPS
+	bool "Use legacy x86 NOPS"
+	depends on X86_32
+	help
+	This option tells kernel to use legacy x86 nops
+
+
 config MEMTEST
 	bool "Memtest"
 	help
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2763cb3..10713e4 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -58,6 +58,16 @@ static int __init setup_noreplace_paravirt(char *str)
 __setup("noreplace-paravirt", setup_noreplace_paravirt);
 #endif
 
+static int intel_legacy_nops = 0;
+
+static int __init setup_intel_legacy_nops(char *str)
+{
+	intel_legacy_nops = 1;
+	return 1;
+}
+
+__setup("intel-legacy-nops", setup_intel_legacy_nops);
+
 #define DPRINTK(fmt, args...) if (debug_alternative) \
 	printk(KERN_DEBUG fmt, args)
 
@@ -167,12 +177,17 @@ const unsigned char *const *find_nop_table(void)
 	const unsigned char *const *noptable = intel_nops;
 	int i;
 
-	for (i = 0; noptypes[i].cpuid >= 0; i++) {
-		if (boot_cpu_has(noptypes[i].cpuid)) {
-			noptable = noptypes[i].noptable;
-			break;
+	if (!intel_legacy_nops) {
+#ifndef CONFIG_INTEL_NOPS
+		for (i = 0; noptypes[i].cpuid >= 0; i++) {
+			if (boot_cpu_has(noptypes[i].cpuid)) {
+				noptable = noptypes[i].noptable;
+				break;
+			}
 		}
+#endif /* CONFIG_INTEL_NOPS */
 	}
+
 	return noptable;
 }
--

From: Linus Torvalds
Date: Monday, September 1, 2008 - 3:22 pm

Try just reverting the commit that you bisected to, and double-check. If 
your bisect was flawed, there's no point even staring at the nop code ;)

		Linus
--

From: David Sanders
Date: Tuesday, September 2, 2008 - 5:08 am

Thanks Linus.
I reverted the commit and the problem persits, so nops are definitely not the 
problem.  I started an new bisection but am now in a state that won't compile 
due to errors.  What do I do in that case?
David
--

From: Linus Torvalds
Date: Tuesday, September 2, 2008 - 11:12 am

You can try "git bisect skip", but in general the better choice is to just 
do

	git bisect visualize

to open up gitk with the current set of possible targets, and then just 
pick a likely point by hand, and do

	git reset --hard <sha1-of-the-thing-you-picked>

and try that one instead. There's some talk about this in "man git-bisect"
but maybe it's not very good.

			Linus
--

From: David Sanders
Date: Tuesday, September 2, 2008 - 11:44 am

OK thanks, I had to build a new version of git because the one installed with 
my distribution was too old to have git bisect skip.  I am trying to use the 
gitk method you mentioned but it is going slow and on my screen gitk's fonts 
are very hard to read.
David
--

From: Peter Zijlstra
Date: Wednesday, September 3, 2008 - 4:09 am

Yeah - gitk's default fonts are a nightmare, I get a semi readable
interface by adding this to ~/.gitk

set mainfont {{dejavu sans mono} 10}
set textfont {{dejavu sans mono} 9}
set uifont {{dejavu sans mono} 10 bold}

But it seems the tcl/tk font rendering is pretty horrible all-round. I
generally prefer to use qgit for this reason.

--

From: David Sanders
Date: Wednesday, September 3, 2008 - 4:20 am

Thanks that is a big improvement.
--

Previous thread: [RFC][PATCH] Remove cgroup member from struct page by Balbir Singh on Sunday, August 31, 2008 - 10:47 am. (72 messages)

Next thread: [PATCH] kernel/cpu.c: Move the CPU_DYING notifiers by Manfred Spraul on Sunday, August 31, 2008 - 10:58 am. (7 messages)