Re: 2.6.{26.2,27-rc} oops on virtualbox

From: Luiz Fernando N. Capitulino
Date: Wednesday, August 20, 2008 - 12:29 pm

Hi there,

 Users of different Linux distros (including Ubuntu, Mandriva, ArchLinux and
possibly Fedora) are reporting that kernel 2.6.26.2 is OOPSing in the
VirtualBox emulator[1].

 It is not clear whether this is a kernel or a VirtualBox bug, but as the
kernel is also OOPSing in QEMU (although with different behavior) I have
decided to post my debug results here in case someone is interested in
debugging the kernel part further.

 I have done a bisection by hand among kernel versions and found that
the commit which triggers the oops in _VirtualBox_ was introduced in
2.6.26-rc1, and the problem also happens with the latest Linus tree.

 By using git bisect I found that the commit is this:

"""
commit e587cadd8f47e202a30712e2906a65a0606d5865
Author: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Date:   Thu Mar 6 08:48:49 2008 -0500

    x86: enhance DEBUG_RODATA support - alternatives

[...]

"""

 By reverting this commit I don't get the OOPS anymore. I have
tested with 2.6.26-rc1 and latest Linus tree (2.6.27-rc3).

 What puzzles me though is that a similar problem happens with
QEMU, but there it also OOPSes with kernels before 2.6.26-rc1,
reverting the patch above makes no difference, and it works with
the current Linus tree.

 Does this look like a kernel bug?

 All my tests have been done with vanilla kernels, but I have
built .iso installation images with them and I'm not sure of
what the build script does.

[1] http://en.wikipedia.org/wiki/Virtual_box

 Thanks for reading this.

-- 
Luiz Fernando N. Capitulino
--

From: H. Peter Anvin
Date: Thursday, August 21, 2008 - 2:34 pm

No, it looks like a very common virtualizer bug.  Does the attached 
patch work for you?

	-hpa

From: H. Peter Anvin
Date: Thursday, August 21, 2008 - 11:42 pm

Also, in addition to this, please try tip:master.  There is a patch in 
tip:master which I hope should fix this problem, but the details are 
important.

	-hpa
--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 11:50 pm

access coordinates would be at:

  http://people.redhat.com/mingo/tip.git/README

	Ingo
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 7:39 am

Em Fri, 22 Aug 2008 08:50:12 +0200
Ingo Molnar <mingo@elte.hu> escreveu:

| 
| * H. Peter Anvin <hpa@zytor.com> wrote:
| 
| > H. Peter Anvin wrote:
| >>>
| >>>  Does this look like a kernel bug?
| >>>
| >>
| >> No, it looks like a very common virtualizer bug.  Does the attached  
| >> patch work for you?
| >>
| >
| > Also, in addition to this, please try tip:master.  There is a patch in 
| > tip:master which I hope should fix this problem, but the details are 
| > important.
| 
| access coordinates would be at:
| 
|   http://people.redhat.com/mingo/tip.git/README

 As I already have Linus tree downloaded I have cloned it in
the usual way.

 Got the same results: OOPS in virtualbox but it works on QEMU.

 The OOPS's output follows and I have attached the .config I'm using
to reproduce the problem.

"""
BUG: unable to handle kernel NULL pointer dereference at 00000246
IP: [<c01310f1>] vprintk+0x181/0x440
*pde = 00000000 
Oops: 0002 [#1] SMP 
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.27-rc4-test24-tip #3)
EIP: 0060:[<c01310f1>] EFLAGS: 00010246 CPU: 0
EIP is at vprintk+0x181/0x440
EAX: 00000246 EBX: 00000000 ECX: c0130ca9 EDX: 0000dedd
ESI: c0474ae3 EDI: c04cf6bc EBP: c7435f24 ESP: c7435eb0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper (pid: 1, ti=c7434000 task=c7438000 task.ti=c7434000)
Stack: 0000dedd c0130ca9 c7435f40 00000000 a026104f a026106c c7434000 c7435ee6 
       00000006 00000246 00000000 a0260cf3 0000001c c7434000 00000282 00000046 
       c11a85a0 c7435efc c0135c6f c7435f14 c0115fcb a0296e91 c0104c2c 00000000 
Call Trace:
 [<c0130ca9>] ? release_console_sem+0x199/0x1e0
 [<c0135c6f>] ? irq_exit+0x3f/0x90
 [<c0115fcb>] ? smp_apic_timer_interrupt+0x5b/0x90
 [<c0104c2c>] ? apic_timer_interrupt+0x28/0x30
 [<c0474ae3>] ? net_ns_init+0x0/0x1ad
 [<c0474ae3>] ? net_ns_init+0x0/0x1ad
 [<c0346ed9>] ? printk+0x18/0x1f
 [<c0474b00>] ? net_ns_init+0x1d/0x1ad
 [<c0474ae3>] ? net_ns_init+0x0/0x1ad
 [<c0101116>] ? ...
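
The header of the oops above can be read by hand. As a minimal sketch, assuming the usual x86 page-fault error-code layout (bit 0: protection violation vs. not-present page, bit 1: write, bit 2: user mode) and the EFLAGS interrupt flag at bit 9, the following C helpers decode the two values. `decode_pf_error` and `eflags_interrupts_enabled` are hypothetical names used only for illustration, not kernel functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code, as printed after "Oops:". */
struct pf_error {
    bool protection; /* bit 0: 0 = page not present, 1 = protection violation */
    bool write;      /* bit 1: 0 = read access, 1 = write access */
    bool user;       /* bit 2: 0 = kernel mode, 1 = user mode */
};

/* Hypothetical helper: split the error code into its three low bits. */
static struct pf_error decode_pf_error(uint32_t code)
{
    struct pf_error e;

    e.protection = (code & 0x1u) != 0;
    e.write      = (code & 0x2u) != 0;
    e.user       = (code & 0x4u) != 0;
    return e;
}

/* Hypothetical helper: EFLAGS.IF (bit 9) is set when interrupts are enabled. */
static bool eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & 0x200u) != 0;
}
```

For `Oops: 0002` this reports a kernel-mode write to a not-present page, and for `EFLAGS: 00010246` it reports interrupts enabled. The faulting address 00000246 is also the value shown in EAX.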
From: Mathieu Desnoyers
Date: Friday, August 22, 2008 - 8:34 am

Can you try booting with the kernel argument :
  debug_alternative 

The dmesg of the kernel bootup up to the oops would be helpful.

My guess is that there may be something wrong with irq disabling which
protects text_poke_early in apply_alternatives().




-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 9:29 am

Em Fri, 22 Aug 2008 11:34:52 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| Can you try booting with the kernel argument :
|   debug_alternative 
| 
| The dmesg of the kernel bootup up to the oops would be helpful.
| 
| My guess is that there may be something wrong with irq disabling which
| protects text_poke_early in apply_alternatives().

 I have attached two files:

  - normal.txt: normal boot with no debug options
  - debug-alternative.txt: boot with the ignore_loglevel and
    debug-alternative options

 I had to pass ignore_loglevel otherwise it wouldn't print
anything.

-- 
Luiz Fernando N. Capitulino
From: Mathieu Desnoyers
Date: Friday, August 22, 2008 - 9:35 am

Ok, now can you try booting with either of those args :

noreplace-paravirt
noreplace-smp

And see which one(s) works ?

Thanks,



-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 10:20 am

Em Fri, 22 Aug 2008 12:35:20 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| * Luiz Fernando N. Capitulino (lcapitulino@mandriva.com.br) wrote:
| >  I have attached two files:
| > 
| >   - normal.txt: normal boot with no debug options
| >   - debug-alternative.txt ignore_loglevel and debug-alternative boot
| >     options
| > 
| >  I had to pass ignore_loglevel otherwise it wouldn't print
| > anything.
| > 
| 
| Ok, ...
From: H. Peter Anvin
Date: Friday, August 22, 2008 - 11:11 am

Hi Luiz, two more tests:

1. a small program to run in userspace and tell us what you get;
2. a patch against -linus for testing.

	-hpa

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 12:40 pm

Em Fri, 22 Aug 2008 11:11:25 -0700
"H. Peter Anvin" <hpa@zytor.com> escreveu:

| Hi Luiz, two more tests:
| 
| 1. a small program to run in userspace and tell us what you get;

 88776655:44332211

 It is the same output in the virtualized system and the host
system.

| 2. a patch against -linus for testing.

 I have tried this patch with Linus tree early today, should I try
it with Ingo's tree too?

-- 
Luiz Fernando N. Capitulino
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 1:31 pm

It doesn't apply to tip.  This did not fix the problem?

	-hpa
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 1:55 pm

Em Fri, 22 Aug 2008 13:31:49 -0700
"H. Peter Anvin" <hpa@zytor.com> escreveu:

| Luiz Fernando N. Capitulino wrote:
| > 
| > | 2. a patch against -linus for testing.
| > 
| >  I have tried this patch with Linus tree early today, should I try
| > it with Ingo's tree too?
| > 
| 
| It doesn't apply to tip.  This did not fix the problem?

 No, it did not. :(

-- 
Luiz Fernando N. Capitulino
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 1:57 pm

Em Fri, 22 Aug 2008 14:20:54 -0300
"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> escreveu:

| Em Fri, 22 Aug 2008 12:35:20 -0400
| Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:
| 
| | * Luiz Fernando N. Capitulino (lcapitulino@mandriva.com.br) wrote:
| | >  I have attached two files:
| | > 
| | >   - normal.txt: normal boot with ...
From: H. Peter Anvin
Date: Friday, August 22, 2008 - 2:08 pm

Yes, the big issue is exactly what VirtualBox screws up in this matter, 
how to detect it, and how to work around it.

It's pretty clear it's a VirtualBox f*ckup at this point, but the 
failure mechanism isn't at all obvious and so far the workaround is elusive.

I strongly suspect this is a VirtualBox tcache management failure, but
that doesn't help the situation without knowing how it happens.

	-hpa


--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 7:18 am

On Archlinux we have the same problem. We have a bug report here:
http://bugs.archlinux.org/task/11141

I tested it myself with a LiveCD/install ISO which has 2.6.26 as the
install kernel. We get the guest oops on both virtualbox-ose and
virtualbox-sun, and on both i686 and x86_64 hosts.

Some things I noticed:
- The system always boots when I either enable VT-x in the guest settings
  or disable ACPI and run the guest with acpi=off.
- The oops always occurs on (disk) I/O, no matter which file system I
  use.
- Once the oops has occurred and the guest has to be closed and restarted,
  then, if I don't use VT-x or acpi=off, I always get an oops right when
  initrd/kernel is starting. The last screen message before the oops is
  "Freeing SMP alternatives".

Here is also an archive with the guest dmesg and messages.log from such an
oops, when heavy disk I/O leads to the oops:
http://bugs.archlinux.org/task/11141?getfile=2445

Gerhard

-- 
Standards sind eine tolle Sache.
Ich finde, jeder sollte einen haben.
--

From: Mathieu Desnoyers
Date: Tuesday, August 26, 2008 - 7:53 am

Hrm, can you try this ?

1 - Make sure your kernel is not built with CONFIG_DEBUG_RODATA

2 - Change the whole text_poke implementation in
arch/x86/kernel/alternative.c to this :

void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
  return text_poke_early(addr, opcode, len);
}

If this works, I suspect that the problem comes from a vmap/vunmap
problem. If it still fails, the problem would likely come from a race
with interrupt disabling probably due to missing data/instruction cache
flush.

Then, after having tested (2), try this on top of it :

In arch/x86/kernel/alternative.c, alternatives_smp_switch()

Add   unsigned long flags;
Change 
spin_lock -> spin_lock_irqsave(&smp_alt, flags);
spin_unlock(&smp_alt); -> spin_unlock_irqrestore(&smp_alt, flags);

This will help test whether there is a problem with interrupts arriving
shortly after the modification. If it fixes the problem, my guess is
that we should flush the instruction cache (and maybe the data cache ?)
in text_poke and text_poke_early while interrupts are off.


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Luiz Fernando N. Capitulino
Date: Tuesday, August 26, 2008 - 9:09 am

Em Tue, 26 Aug 2008 10:53:38 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| * Gerhard Brauer (gerhard.brauer@web.de) wrote:
| > On Fri, Aug 22, 2008 at 02:08:13PM -0700, H. Peter Anvin wrote:
| > > Luiz Fernando N. Capitulino wrote:
| > >>
| > >>  I have asked Mandriva and Ubuntu users to test this and all of
| > >> them so far are saying that noreplace-paravirt works.
| > >>
| > >>  It makes the system slower, but it works.
| > >>
| > >
| > > Yes, the big issue is exactly what VirtualBox screws up in this matter,  
| > > how to detect it, and how to work around it.
| > >
| > > It's pretty clear it's a VirtualBox f*ckup at this point, but the failure 
| > > mechanism isn't at all obvious and so far the workaround is elusive.
| > >
| > > I'm strongly suspect this is a VirtualBox tcache management failure, but  
| > > that doesn't help the situation without knowing how it happens.
| > 
| > On Archlinux we have the same problem. We have a bugreport here:
| > http://bugs.archlinux.org/task/11141
| > 
| > Myself test it with a LiveCD/Install-ISO which has 2.6.26 as install
| > kernel. We have the guest oops on virtualbox-ose, virtualbox-sun and both on
| > i686 or x86_64 hosts.
| > 
| > Some things i noticed:
| > - The system boots always when i either enable VT-x in guest settings or
| >   disable acpi and run the guest with acpi=off.
| > - The oops occurs always on (disk)-io, no matter which file system i
| >   use.
| > - When the oops has occured and the guest has to close and restart then,
| >   if i don't use VT-x or acpi=off, i always get an oops directly when
| >   initrd/kernel is starting. Last screen message before the oops then is
| >   "Freeing SMP alternatives".
| > 
| > Here is also an archive with guest dmesg and messages.log from such an
| > oops when heavy disk io leads to the oops:
| > http://bugs.archlinux.org/task/11141?getfile=2445
| > 
| 
| Hrm, can you try this ?
| 
| 1 - Make sure you kernel is not ...
From: Luiz Fernando N. Capitulino
Date: Tuesday, August 26, 2008 - 9:13 am

Em Tue, 26 Aug 2008 10:53:38 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| Then, after having tested (2), try this on top of it :
| 
| In arch/x86/kernel/alternative.c, alternatives_smp_switch()
| 
| Add   unsigned long flags;
| Change 
| spin_lock -> spin_lock_irqsave(&smp_alt, flags);
| spin_unlock(&smp_alt); -> spin_unlock_irqrestore(&smp_alt, flags);

 Hmm, I can't find any spin_lock calls in alternatives_smp_switch();
it looks like the current implementation is now using mutexes.

 What tree are you referring to?

-- 
Luiz Fernando N. Capitulino
--

From: Mathieu Desnoyers
Date: Tuesday, August 26, 2008 - 10:18 am

Sorry, I was looking directly at the commit which caused the problem.
Yes, these modifications should go on top of the text_poke -> text_poke_early
change.

So in current mainline, change, in alternatives_smp_switch() :

mutex_lock(&smp_alt);
...

mutex_unlock(&smp_alt);

to

mutex_lock(&smp_alt);
local_irq_save(flags);
...

local_irq_restore(flags);
mutex_unlock(&smp_alt);

Thanks,


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: H. Peter Anvin
Date: Tuesday, August 26, 2008 - 10:32 am

I have been unable to replicate this on my own hardware mostly because 
my testing machine decided to blow its DVD drive in some very strange 
way, but I did pick apart the data from Luiz, and found it very interesting:

The code sequence before patching looks like:

c012fc69:       51                      push   %ecx
c012fc6a:       52                      push   %edx
c012fc6b:       ff 15 40 b9 41 c0       call   *0xc041b940
c012fc71:       5a                      pop    %edx
c012fc72:       59                      pop    %ecx

After patching:

50 9d 0f 1f 84 00 00 00 <00> 00

... which disassembles to (in Intel notation):

C012FC69  50                push eax
C012FC6A  9D                popfd
C012FC6B  0F1F840000000000  nop dword [eax+eax+0x0]

We do, indeed, have a return point that falls in the *middle* of a
patched instruction, and if the patching happens while that call is
still in progress, then, well, bad things happen.
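
The address arithmetic behind this observation can be checked mechanically. The sketch below is only an illustration: it takes the addresses and instruction lengths from the two listings above, and the names `struct insn`, `return_address` and `lands_mid_instruction` are made up for this example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One instruction of a code sequence: start address and encoded length. */
struct insn {
    uint32_t addr;
    uint32_t len;
};

/* Return address pushed by a call instruction: the byte right after it. */
static uint32_t return_address(struct insn call)
{
    return call.addr + call.len;
}

/*
 * True if addr points inside one of the instructions but past its first
 * byte, i.e. execution resuming there would not start on an instruction
 * boundary.
 */
static bool lands_mid_instruction(const struct insn *seq, size_t n,
                                  uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr > seq[i].addr && addr < seq[i].addr + seq[i].len)
            return true;
    }
    return false;
}

/* Patched layout from the listing: push eax, popfd, 8-byte nop. */
static const struct insn patched[] = {
    { 0xc012fc69u, 1 },
    { 0xc012fc6au, 1 },
    { 0xc012fc6bu, 8 },
};

/* The 6-byte indirect call in the sequence before patching. */
static const struct insn pv_call = { 0xc012fc6bu, 6 };
```

`return_address(pv_call)` is 0xc012fc71, which is offset 8 from the start of the sequence (the byte marked `<00>` above) and lies inside the 8-byte nop that starts at 0xc012fc6b.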

Furthermore, why on Earth is %ecx/%edx pushed and popped in-line here? 
Surely it should be the responsibility of the PV call to present a 
no-clobber interface (using an assembly wrapper if necessary[*]), rather 
than bloating every callsite like this?

	-hpa


[*] One can compile gcc code with -fcall-saved-* to use nonstandard 
register conventions.  Unfortunately stock gcc only lets you do this 
with a file parameter, and doesn't support doing this with attributes.
--

From: Luiz Fernando N. Capitulino
Date: Tuesday, August 26, 2008 - 11:02 am

Em Tue, 26 Aug 2008 13:18:22 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| * Luiz Fernando N. Capitulino (lcapitulino@mandriva.com.br) wrote:
| > Em Tue, 26 Aug 2008 10:53:38 -0400
| > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:
| > 
| > | Then, after having tested (2), try this on top of it :
| > | 
| > | In arch/x86/kernel/alternative.c, alternatives_smp_switch()
| > | 
| > | Add   unsigned long flags;
| > | Change 
| > | spin_lock -> spin_lock_irqsave(&smp_alt, flags);
| > | spin_unlock(&smp_alt); -> spin_unlock_irqrestore(&smp_alt, flags);
| > 
| >  Hmm, I can't find spin_lock functions in alternatives_smp_switch()
| > looks like the current implementation is now using mutexes.
| > 
| 
| Sorry, I was looking directly at the commit which caused the problem.
| Yes, these modif should go on top of the text_poke -> text_poke_early.
| 
| So in current mainline, change, in alternatives_smp_switch() :
| 
| mutex_lock(&smp_alt);
| ...
| 
| mutex_unlock(&smp_alt);
| 
| to
| 
| mutex_lock(&smp_alt);
| local_irq_save(flags);
| ...
| 
| local_irq_restore(flags);
| mutex_unlock(&smp_alt);

 Did not help, same oops here.

-- 
Luiz Fernando N. Capitulino
--

From: Mathieu Desnoyers
Date: Tuesday, August 26, 2008 - 11:15 am

Ok, it might still be caused by paravirt and alternatives instruction
patching. What if you also do :

alternative_instructions()

+        unsigned long flags;
        /* The patching is not fully atomic, so try to avoid local interruptions
           that might execute the to be patched code.
           Other CPUs are not running. */
        stop_nmi();
#ifdef CONFIG_X86_MCE
        stop_mce();
#endif
+        local_irq_save(flags);


...
+        local_irq_restore(flags);
        restart_nmi();
#ifdef CONFIG_X86_MCE
        restart_mce();
#endif

?

Hrm,

Since those local_irq_save/restore occur _before_ the paravirt patching
is done, I wonder if there would be a race in the way cli/sti traps are
handled by Virtualbox wrt incoming interrupt ?

Thanks,


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: H. Peter Anvin
Date: Tuesday, August 26, 2008 - 12:52 pm

One thing that I think really needs to be considered is that the current 
PV stubs are (a) large, and (b) non-atomic.

In the case at hand we have:

c012fc69:       51                      push   %ecx
c012fc6a:       52                      push   %edx
c012fc6b:       ff 15 40 b9 41 c0       call   *0xc041b940
c012fc71:       5a                      pop    %edx
c012fc72:       59                      pop    %ecx

Ten bytes replacing a two-byte native sequence.

If this was done as a call to an out-of-line stub, it would be only five 
bytes, which would reduce native icache overhead from 400% to 150%, but 
perhaps more importantly, it would not be subject to returns inside the 
sequence itself (since the out-of-line stub would still exist.)  As an 
optional bonus, at least on 32 bits the indirect call could be replaced 
with a direct call in the out-of-line stub.
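
The percentages above follow from the instruction lengths. A small sketch of the arithmetic, assuming overhead is measured as extra bytes relative to the two-byte native sequence, and assuming a direct `call rel32` encoding of five bytes for the out-of-line variant. The helper names are hypothetical:

```c
#include <stddef.h>

/* Sum of the encoded lengths of a sequence of instructions. */
static unsigned total_len(const unsigned *lens, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += lens[i];
    return sum;
}

/* Extra bytes relative to the native sequence, in percent. */
static unsigned overhead_percent(unsigned native_len, unsigned site_len)
{
    return (site_len - native_len) * 100u / native_len;
}

/* Inline call site from the listing: push, push, 6-byte call, pop, pop. */
static const unsigned inline_site[] = { 1, 1, 6, 1, 1 };

/* Assumed out-of-line variant: a single 5-byte direct call. */
static const unsigned stub_site[] = { 5 };
```

With a native length of 2 bytes, `overhead_percent(2, total_len(inline_site, 5))` gives 400 and `overhead_percent(2, total_len(stub_site, 1))` gives 150.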

	-hpa
--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 1:34 pm

Hej! These last changes (in addition to the others you mentioned) seem
to be a good shot. I could reboot the guest 8 times and compile several
packages (something which always led to the oops), and currently I am
building two big packages simultaneously. So this is heavy I/O.

I will try heavier build tests tomorrow (to regain the good feeling about
the vbox+guest kernel that I had with 2.6.25), but I think your
changes go in the right direction.

Here is the diff of what I changed following your hints:

,----[ arch/x86/kernel/alternative.c ]
| --- alternative.c.org	2008-07-13 23:51:29.000000000 +0200
| +++ alternative.c	2008-08-26 21:35:20.000000000 +0200
| @@ -343,6 +343,7 @@
|  void alternatives_smp_switch(int smp)
|  {
|  	struct smp_alt_module *mod;
| +	unsigned long flags;
|  
|  #ifdef CONFIG_LOCKDEP
|  	/*
| @@ -359,7 +360,7 @@
|  		return;
|  	BUG_ON(!smp && (num_online_cpus() > 1));
|  
| -	spin_lock(&smp_alt);
| +	spin_lock_irqsave(&smp_alt, flags);
|  
|  	/*
|  	 * Avoid unnecessary switches because it forces JIT based VMs to
| @@ -383,7 +384,7 @@
|  						mod->text, mod->text_end);
|  	}
|  	smp_mode = smp;
| -	spin_unlock(&smp_alt);
| +	spin_unlock_irqrestore(&smp_alt, flags);
|  }
|  
|  #endif
| @@ -420,6 +421,7 @@
|  
|  void __init alternative_instructions(void)
|  {
| +	unsigned long flags;
|  	/* The patching is not fully atomic, so try to avoid local interruptions
|  	   that might execute the to be patched code.
|  	   Other CPUs are not running. */
| @@ -427,6 +429,7 @@
|  #ifdef CONFIG_X86_MCE
|  	stop_mce();
|  #endif
| +	local_irq_save(flags);
|  
|  	apply_alternatives(__alt_instructions, __alt_instructions_end);
|  
| @@ -465,6 +468,7 @@
|  				(unsigned long)__smp_locks,
|  				(unsigned long)__smp_locks_end);
|  
| +	local_irq_restore(flags);
|  	restart_nmi();
|  #ifdef CONFIG_X86_MCE
|  	restart_mce();
| @@ -508,33 +512,5 @@
|   */
|  void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
|  {
| -	unsigned ...
From: Mathieu Desnoyers
Date: Tuesday, August 26, 2008 - 1:48 pm

OK, so we have a problem with interrupts coming while we are doing the
alternatives patching.

First thing, I wonder if Virtualbox expects the OS to patch all its
paravirt instructions in one go ?

Also, could you then try the following:
- revert all those changes
- do this to text_poke_early and text_poke:

- put the sync_core() within the irq-off critical section
(test)
- add a wbinvd(); just after the sync_core() in both functions
(test).
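
The ordering being asked for can be sketched in userspace. This is not kernel code: the `mock_*` functions are stand-ins that only record an event, so the example shows nothing more than the intended sequence (interrupts off, copy, sync_core, wbinvd, interrupts on). All names here are assumptions for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Events recorded in place of the real side effects. */
enum event { IRQ_OFF, COPY, SYNC_CORE, WBINVD, IRQ_ON };

static enum event trace[8];
static size_t trace_len;

static void record(enum event e)
{
    if (trace_len < sizeof(trace) / sizeof(trace[0]))
        trace[trace_len++] = e;
}

/* Userspace stand-ins: they only log and never touch the hardware. */
static void mock_local_irq_save(void)    { record(IRQ_OFF); }
static void mock_local_irq_restore(void) { record(IRQ_ON); }
static void mock_sync_core(void)         { record(SYNC_CORE); }
static void mock_wbinvd(void)            { record(WBINVD); }

/*
 * Sketch of the proposed ordering: the copy, the serializing step and
 * the cache flush all happen before interrupts are turned back on.
 */
static void *poke_sketch(void *addr, const void *opcode, size_t len)
{
    trace_len = 0;
    mock_local_irq_save();
    memcpy(addr, opcode, len);
    record(COPY);
    mock_sync_core();
    mock_wbinvd();
    mock_local_irq_restore();
    return addr;
}
```

Calling `poke_sketch()` on an ordinary byte buffer copies the bytes and leaves the five events in `trace` in the order listed above.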

Thanks,


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 2:25 pm

Could you please explain more what to change? I don't see where to put


Thank you
	Gerhard

--

From: Mathieu Desnoyers
Date: Tuesday, August 26, 2008 - 2:35 pm

Sure,

First patch to test :

x86 alternative text_poke move sync_core

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/x86/kernel/alternative.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2008-08-26 17:26:41.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2008-08-26 17:26:58.000000000 -0400
@@ -488,8 +488,8 @@ void *text_poke_early(void *addr, const 
 	unsigned long flags;
 	local_irq_save(flags);
 	memcpy(addr, opcode, len);
-	local_irq_restore(flags);
 	sync_core();
+	local_irq_restore(flags);
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
 	return addr;
@@ -529,9 +529,9 @@ void *__kprobes text_poke(void *addr, co
 	BUG_ON(!vaddr);
 	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
+	sync_core();
 	local_irq_restore(flags);
 	vunmap(vaddr);
-	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */

Second patch to apply on top of the first one :


x86 alternative text_poke add wbinvd

Add a cache flush instruction before reenabling interrupts in text_poke.

If this works, we could use clflush() (which is sadly buggy on some archs), which
is faster since it only clears a cacheline instead of the entire cache.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/x86/kernel/alternative.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2008-08-26 17:27:33.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2008-08-26 17:27:53.000000000 -0400
@@ -489,6 ...
From: H. Peter Anvin
Date: Tuesday, August 26, 2008 - 2:51 pm

Well, in this case it's VirtualBox we're talking about, a virtual 
architecture.  It's hard to know what it will do under *any* circumstances.

	-hpa
--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 5:13 pm

With this I got the oops again when compiling in the guest. Reboot afterwards

With the second patch I get the early oops after "freeing smp". There seems
to be no way to get the guest booted normally (only with noreplace-paravirt).

So the changes from the other mail had more effect IMHO.

Gerhard

--

From: Luiz Fernando N. Capitulino
Date: Wednesday, August 27, 2008 - 12:13 pm

Em Tue, 26 Aug 2008 22:34:49 +0200
Gerhard Brauer <gerhard.brauer@web.de> escreveu:

| On Tue, Aug 26, 2008 at 02:15:58PM -0400, Mathieu Desnoyers wrote:
| > 
| > Ok, it might still be caused by paravirt and alternatives instruction
| > patching. What if you also do :
| > 
| > alternative_instructions()
| > 
| > +        unsigned long flags;
| >         /* The patching is not fully atomic, so try to avoid local interruptions
| >            that might execute the to be patched code.
| >            Other CPUs are not running. */
| >         stop_nmi();
| > #ifdef CONFIG_X86_MCE
| >         stop_mce();
| > #endif
| > +        local_irq_save(flags);
| > 
| > 
| > ...
| > +        local_irq_restore(flags);
| >         restart_nmi();
| > #ifdef CONFIG_X86_MCE
| >         restart_mce();
| > #endif
| > 
| > ?
| 
| Hej! These last changes (in addition to the others you mentioned) seem
| to be a good shot. I could reboot the guest 8 times and compile several
| packages (something which always led to the oops), and currently I am
| building two big packages simultaneously. So this is heavy I/O.

 Yeah, it works for me too and it's good to know that you are doing
additional tests. I'm doing only boot tests... I was testing lots of
kernels and doing additional tests would take a lot of time.

 Now, what does this mean? Is VirtualBox issuing interrupts when it
shouldn't or should this section of the code be better protected?

-- 
Luiz Fernando N. Capitulino
--

From: Mathieu Desnoyers
Date: Wednesday, August 27, 2008 - 4:33 pm

Since this problem appears while we are using a simple memcpy (the
text_poke_early version), but disappears when we disable interrupts for
a longer period, I suspect a problem with irq disabling in
VirtualBox.

We could try to add some nsleep() or msleep() calls within text_poke and
text_poke_early before and after the code modification to see if the
problem disappears. If it does, then that would somewhat confirm the
racy irq disable thesis.


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Luiz Fernando N. Capitulino
Date: Thursday, August 28, 2008 - 6:30 am

Em Wed, 27 Aug 2008 19:33:28 -0400
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> escreveu:

| Since this problem appears while we are using a simple memcpy (the
| text_poke_early version), but disappears when we disable interrupts for
| a longer period, I suspect a problem with irq disabling in VirtualBox.
| 
| We ...
From: Gerhard Brauer
Date: Sunday, August 31, 2008 - 2:29 am

Ok, some news from the Arch Linux side:
Our distribution kernel was upgraded from 2.6.26.2 to 2.6.26.3. With
this upgrade to patchlevel .3 the "early oops" (freeing smp...) is gone.
My virtual machines always boot fine with this, and I have one
confirmation from a user about this.

The kernel upgrade does not solve the kernel panic while working with the
VM when there is heavy disk IO. I tested and could reproduce this by
untarring 2 big files in separate dirs:
bsdtar -x -f VirtualBox-1.6.2-OSE.tar.bz2.
Doing this simultaneously always crashed the VM.
Screenshot:
http://users.archlinux.de/~gerbra/tmp/2008-08-31-110449_724x456_scrot.png

This heavy IO oops does not occur under 2.6.26.2 when using the
"3-changes-patch" against alternatives.c, which we have tested in the
other mails. There must be something irq related which this
3-changes-patch fixes and which was not fixed in 2.6.26.3.
On the other hand: I had never stressed a VM like this before
researching this problem. So it could also be that the heavy-IO
problem is a totally separate problem from the one we're talking about here.
Doing my "normal" work in the VM now (it's my devel VM for compiling and
testing), I have not had this IO oops so far.

We use a mostly unpatched kernel as distribution kernel.

So a short summary from my side:
a) With the "3-changes-patch" I get a rock-solid VM
b) 2.6.26.2 has the early oops on boot, and the IO oops on the
   occasions when it boots.
c) 2.6.26.3 has only the heavy-IO oops

I'll try a fresh VM, where I will test:
a) Using SATA controller emulation as the bus (now I have IDE (PIIX3))
b) Using different filesystems (with 2.6.26.2 the early oops and heavy-IO
   oops could be reproduced with any filesystem).


Regards
	Gerhard

--

From: Stefan Lippers-Hollmann
Date: Sunday, August 31, 2008 - 6:28 am

Hi

On Sunday, 31 August 2008, Gerhard Brauer wrote:

Sorry, I can't confirm this here on Debian unstable (with virtualbox-ose 
1.6.2 or 1.6.4). Are you sure that other configuration options didn't 
change between the different kernel versions? Preemption and paravirt can
seriously influence the probability of the early boot panic, without really
avoiding it altogether.

Actually I still get the same issues with implanting
ftp://ftp5.gwdg.de/pub/linux/archlinux/core/os/i686/kernel26-2.6.26.3-1-i686.pkg.tar.gz

Regards
	Stefan Lippers-Hollmann
--

From: Gerhard Brauer
Date: Sunday, August 31, 2008 - 7:03 am

The only changes between our 2.6.26.2-1 and 2.6.26.3-1 are some minor
framebuffer changes in the config. If I have a look at the different
patchsets between the two versions I don't see something which could be

Hmm, one user also reports that he has no problem when using a vanilla
2.6.26 as guest kernel. But there must be some reason why different
distributions notice a major problem between 2.6.25 and 2.6.26 with
their stock kernels. Although I don't even know if our few reports here

Gerhard

-- 
Today is the tomorrow you were afraid of yesterday...
--

From: Luiz Fernando N. Capitulino
Date: Sunday, August 31, 2008 - 7:09 am

On Sun, 31 Aug 2008 11:29:23 +0200
Gerhard Brauer <gerhard.brauer@web.de> wrote:

| On Thu, Aug 28, 2008 at 10:30:13AM -0300, Luiz Fernando N. Capitulino wrote:
| > On Wed, 27 Aug 2008 19:33:28 -0400
| > Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
| > | 
| > | Since this problem appears while we are using a simple memcpy (the
| > | text_poke_early version), but disappears when we disable interrupts for
| > | a longer period, I suspect a problem with irq disabling in VirtualBox.
| > | 
| > | We could try to add some nsleep() or msleep() calls within text_poke and
| > | text_poke_early before and after the code modification to see if the
| > | problem disappears. If it does, then that would somewhat confirm the
| > | racy irq disable thesis.
| > 
| >  Well, an Ubuntu kernel guy has reported in the VirtualBox ticket[1]
| > that the oops doesn't happen if he puts a printk() in the crash site.
| > 
| >  The funny thing is that someone (who might be a VirtualBox developer)
| > used the same race argument to say that this is a bug in the kernel.
| > 
| >  What concerns me, though, is how VirtualBox can be worth using
| > in the Linux community if it's probably not working for various distros
| > (currently Fedora, Ubuntu, Mandriva and ArchLinux).
| > 
| >  Thanks for the effort, guys.
| > 
| > [1] http://www.virtualbox.org/ticket/1875
| 
| Ok, some news from the Arch Linux side:
| Our distribution kernel was upgraded from 2.6.26.2 to 2.6.26.3. With
| this upgrade to patchlevel .3 the "early oops" (freeing smp...) is gone.
| My virtual machines always boot fine with this, and I have one
| confirmation from a user about this.
| 
| The kernel upgrade does not solve the kernel panic while working with the
| VM when there is heavy disk IO. I tested and could reproduce this by
| untarring 2 big files in separate dirs:
| bsdtar -x -f VirtualBox-1.6.2-OSE.tar.bz2.
| Doing this simultaneously always crashed the VM.
| Screenshot:
| ...
From: Gerhard Brauer
Date: Sunday, September 21, 2008 - 6:41 am

On Sunday, 31.08.2008, 11:09 -0300, Luiz Fernando N. Capitulino wrote:

I was away for the last few days, but I noticed that the VirtualBox update
to 2.0.2 solves all the problems I mentioned. We also have the same
responses from other Arch users.
I never saw the "early boot oops" again (it had already gone away with our
kernel update, mysteriously...). But the "heavy IO oops" is gone as well.
So it was a VirtualBox problem, and they fixed it in:
http://www.virtualbox.org/ticket/1875

So from my side this is solved.

Gerhard

--

From: Ingo Molnar
Date: Monday, September 22, 2008 - 2:51 am

great - the VirtualBox recompiler didn't notice the paravirt code 
modification sequence. Any Linux kernel side NOP issue (which caused 
that early oops) should be solved in 2.6.27-rc7 as well.

so the combo of 2.0.2 and later VirtualBox plus v2.6.27-rc7 and later 
should have no known bugs.

	Ingo
--

From: Luiz Fernando N. Capitulino
Date: Wednesday, September 24, 2008 - 6:24 am

On Sun, 21 Sep 2008 15:41:39 +0200
Gerhard Brauer <gerhard.brauer@web.de> wrote:

| On Sunday, 31.08.2008, 11:09 -0300, Luiz Fernando N.
| Capitulino wrote:
| 
| >  Mandriva kernel was 2.6.26.3 based at the time I started testing
| > this and all my last tests have been done on 2.6.27-rc4. I think it's
| > very unusual to have a change in a -stable kernel not present in the
| > latest -rc.
| 
| I was away for the last few days, but I noticed that the VirtualBox update
| to 2.0.2 solves all the problems I mentioned. We also have the same
| responses from other Arch users.
| I never saw the "early boot oops" again (it had already gone away with our
| kernel update, mysteriously...). But the "heavy IO oops" is gone as well.
| So it was a VirtualBox problem, and they fixed it in:
| http://www.virtualbox.org/ticket/1875
| 
| So from my side this is solved.

 Yeah, we have run some tests here as well and it is solved.

 Thanks a lot to the people involved in debugging this problem.

-- 
Luiz Fernando N. Capitulino
--

From: Gerhard Brauer
Date: Thursday, August 28, 2008 - 6:50 am

nsleep isn't known here as a function; the only references I found are
maybe in posix-timers.c.

msleep() is known, but each time I add, for example,
msleep(100);
in any place in text_poke and/or text_poke_early I get a kernel panic
on boot. Here's a screenshot:
http://users.archlinux.de/~gerbra/tmp/2008-08-28-132337_724x456_scrot.png

I also tried to work with the isolated changes we made last, but it
seems that only the 3 changes together work.
I also tried to go back to older versions of alternatives.c referenced
in:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.26.y.git;a=history;f=arch/x...

But with my limited knowledge I ran into too many errors.

So, do you have any further ideas, or code that I/we could test?
Or - I'm naive - are the "3 changes" we made ready to go in the kernel

Gerhard


--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 12:27 pm

Sorry for the delay, but I needed to build a complete distribution kernel
and my machine is not the fastest.

My host:
archlinux 2.6.26 P4 2Ghz
VirtualBox: Sun xVM 1.6.4
gcc 4.4.1-3

My guest:
archlinux 2.6.26

My "tests":
I could sometimes boot the guest with the "tricks" (VT-x enabled, acpi
off,...). But I always get an oops if I compile something bigger on this
guest (e.g. virtualbox-modules, where the tarball must be untarred with
bsdtar -> disk IO).
If this happens, the next reboot always leads to the early oops (Freeing
smp....). Each reboot does this. Then I close the VirtualBox application,
unload/reload vboxdrv on the host and start vbox again. Then I can
boot the guest again most of the time. But the next heavy disk IO leads
to the oops again.
If I could boot without oops, and reboot or halt the guest, then the

With our distribution kernel I could change these spin_lock/unlock in
alternatives.c. My first thought was that there was slightly better
behavior (the first boot goes on and I could compile something, but with
the next package I build the oops (heavy IO oops) comes again. And then,
after a reboot, also the early oops (freeing smp...)).
Here is a screenshot of the oops when building something:
http://users.archlinux.de/~gerbra/tmp/2008-08-26-210724_724x456_scrot.png

Sometimes (this could not be reproduced) the VirtualBox app also traps with
an error dialog (Guru message), which offers a log from the VM and a
screenshot. Maybe this could be helpful. The log and screenshot can be
found here:


Regards
	Gerhard

--

From: Luiz Fernando N. Capitulino
Date: Tuesday, August 26, 2008 - 9:02 am

On Tue, 26 Aug 2008 16:18:51 +0200
Gerhard Brauer <gerhard.brauer@web.de> wrote:

| On Fri, Aug 22, 2008 at 02:08:13PM -0700, H. Peter Anvin wrote:
| > Luiz Fernando N. Capitulino wrote:
| >>
| >>  I have asked Mandriva and Ubuntu users to test this and all of
| >> them so far are saying that noreplace-paravirt works.
| >>
| >>  It makes the system slower, but it works.
| >>
| >
| > Yes, the big issue is exactly what VirtualBox screws up in this matter,  
| > how to detect it, and how to work around it.
| >
| > It's pretty clear it's a VirtualBox f*ckup at this point, but the failure 
| > mechanism isn't at all obvious and so far the workaround is elusive.
| >
| > I strongly suspect this is a VirtualBox tcache management failure, but  
| > that doesn't help the situation without knowing how it happens.
| 
| On Archlinux we have the same problem. We have a bugreport here:
| http://bugs.archlinux.org/task/11141
| 
| I tested it myself with a LiveCD/Install-ISO which has 2.6.26 as install
| kernel. We have the guest oops on virtualbox-ose and virtualbox-sun, on both
| i686 and x86_64 hosts.
| 
| Some things i noticed:
| - The system boots always when i either enable VT-x in guest settings or
|   disable acpi and run the guest with acpi=off.

 Yes, lots of ubuntu users have reported the same but another "lots"
of them have reported that the trick didn't work.

 Thanks for joining!

-- 
Luiz Fernando N. Capitulino
--

From: Gerhard Brauer
Date: Tuesday, August 26, 2008 - 9:40 am

I must qualify the above: I have two test environments. One is our
LiveCD/Install-ISO with 2.6.26, which we made specially for a Linux
conference last weekend (our official ISO still comes with 2.6.25). With
this ISO the "trick" with VT-x or noacpi works.
But on an installed archlinux (with distribution kernel 2.6.26) this
doesn't work. Sometimes it works when I restart the VirtualBox
application, but mostly not. So on this installed guest system the only
working solution seems to be adding noreplace-paravirt as a kernel parameter.
But this makes the system terribly slow (mostly on udev things).

I am currently trying Mathieu's hints by building a new distribution kernel
with the changes.
But I think the biggest problem in solving this, from the point of view of
the kernel devs, is that we all have different "test" environments (vbox
versions, architectures, distribution kernels,...) where the oops (I
think) does not appear in the same place for everyone.
On the other hand, the more we try such patches in different
environments, the better the chance to get a real fix - from kernel dev

Gerhard

-- 
www,archlinux.de
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 10:16 am

Was looking at the code stream, and noticed this:

Code: c0 0f 84 0b 01 00 00 b8 d0 bf 41 c0 c7 05 6c c0 41 c0 ff ff ff ff 
e8 7f 82 21 00 e8 1a 03 02 00 8b 45 b0 50 9d 0f 1f 84 00 00 00 <00> 00 
8b 45 bc 83 c4 60 5b 5e 5f 5d c3 66 90 a1 6c c0 41 c0 e8

The EIP is in the *MIDDLE* of a NOPL instruction:

C012FC46  C00F84            ror byte [edi],0x84
C012FC49  0B01              or eax,[ecx]
C012FC4B  0000              add [eax],al
C012FC4D  B8D0BF41C0        mov eax,0xc041bfd0
C012FC52  C7056CC041C0FFFF  mov dword [dword 0xc041c06c],0xffffffff
          -FFFF
C012FC5C  E87F822100        call dword 0xc0347ee0
C012FC61  E81A030200        call dword 0xc014ff80
C012FC66  8B45B0            mov eax,[ebp-0x50]
C012FC69  50                push eax
C012FC6A  9D                popfd
C012FC6B  0F1F840000000000  nop dword [eax+eax+0x0]
C012FC73  8B45BC            mov eax,[ebp-0x44]
C012FC76  83C460            add esp,byte +0x60
C012FC79  5B                pop ebx
C012FC7A  5E                pop esi
C012FC7B  5F                pop edi
C012FC7C  5D                pop ebp
C012FC7D  C3                ret
C012FC7E  6690              xchg ax,ax
C012FC80  A16CC041C0        mov eax,[0xc041c06c]

There are two possibilities: VirtualBox mis-executes (not merely traps, 
which is what tip:master looks for) the NOPL instruction, or something 
is jumping into the middle of the sequence that is then replaced by the 
NOPL.
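
As a rough cross-check of the observation above, the following is a minimal sketch (not part of the original mail) that parses an oops "Code:" line, finds the byte wrapped in <..> (the faulting EIP), and reports how far into an 8-byte NOPL it falls. The names parse_code_line, ip_offset_in_nopl and struct code_dump are hypothetical helpers made up for this illustration, and the only pattern it recognises is the exact byte sequence 0f 1f 84 00 00 00 00 00 seen in this dump.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed oops "Code:" line. */
struct code_dump {
    unsigned char bytes[128];
    size_t len;     /* number of bytes parsed */
    long ip_index;  /* index of the byte wrapped in <..>, or -1 if none */
};

/* Parse space-separated hex bytes; the byte written as <xx> marks the EIP.
 * Returns 0 on success, -1 on malformed input. */
static int parse_code_line(const char *s, struct code_dump *out)
{
    assert(s != NULL && out != NULL);
    out->len = 0;
    out->ip_index = -1;

    while (*s) {
        int marked = 0;
        char *end;
        unsigned long v;

        while (isspace((unsigned char)*s))
            s++;
        if (!*s)
            break;
        if (*s == '<') {
            marked = 1;
            s++;
        }
        if (!isxdigit((unsigned char)*s))
            return -1;
        v = strtoul(s, &end, 16);
        if (v > 0xff || out->len >= sizeof(out->bytes))
            return -1;
        if (marked) {
            if (*end != '>')
                return -1;
            end++;
            out->ip_index = (long)out->len;
        }
        out->bytes[out->len++] = (unsigned char)v;
        s = end;
    }
    return 0;
}

/* The 8-byte NOPL from the dump: nop dword [eax+eax+0x0]. */
static const unsigned char nopl8[8] = {
    0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Return the offset of the EIP inside an 8-byte NOPL that covers it,
 * or -1 if the EIP is not inside such a NOPL. Offset 0 would mean the
 * EIP sits at the start of the instruction, which is the normal case. */
static long ip_offset_in_nopl(const struct code_dump *d)
{
    size_t i;

    if (d->ip_index < 0)
        return -1;
    for (i = 0; i + sizeof(nopl8) <= d->len; i++) {
        if (memcmp(d->bytes + i, nopl8, sizeof(nopl8)) != 0)
            continue;
        if ((size_t)d->ip_index >= i &&
            (size_t)d->ip_index < i + sizeof(nopl8))
            return d->ip_index - (long)i;
    }
    return -1;
}
```

Fed the three "Code:" lines above joined into one string, this should report an offset of 6, i.e. an EIP of C012FC6B + 6 = C012FC71, which is consistent with the EIP being in the middle of the NOPL rather than at its start.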

So, Luiz: the DEBUG_INFO version of vmlinux would be helpful.  It would 
also help to know the exact version of VirtualBox you're running, what 
source you got it from, and what your host system looks like.

	-hpa
--

From: Mathieu Desnoyers
Date: Friday, August 22, 2008 - 10:45 am

The patch which triggers this bug makes this important change to
apply_paravirt: it disables interrupts _near_ the code patching,
_within_ the loop. Before, interrupts were disabled outside of the loop.
It needs to disable interrupts within the loop to be able to use vmap in
text_poke().
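
The difference between the two schemes can be illustrated with a user-space toy model. This is only a sketch under loud assumptions: struct toy_cpu, toy_irq_disable, toy_irq_enable, toy_patch_site, patch_all_outer and patch_all_inner are all made-up names, the "interrupt flag" is a plain integer, and nothing here is the kernel's actual patching code. It merely counts how many times interrupts are re-enabled between two patched sites.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a CPU's interrupt flag; not real kernel state. */
struct toy_cpu {
    int irqs_enabled;   /* 1 while "interrupts" could be delivered */
    unsigned windows;   /* times irqs were on between two patched sites */
};

static void toy_irq_disable(struct toy_cpu *c) { c->irqs_enabled = 0; }
static void toy_irq_enable(struct toy_cpu *c)  { c->irqs_enabled = 1; }

/* Pretend to patch one site; only legal while irqs are off. */
static void toy_patch_site(struct toy_cpu *c, unsigned char *site)
{
    assert(!c->irqs_enabled);
    *site = 0x90; /* single-byte NOP, chosen arbitrarily for the model */
}

/* Older scheme: one irq-off region around the whole patching loop. */
static unsigned patch_all_outer(struct toy_cpu *c, unsigned char *sites,
                                size_t n)
{
    size_t i;

    c->windows = 0;
    toy_irq_disable(c);
    for (i = 0; i < n; i++)
        toy_patch_site(c, &sites[i]);
    toy_irq_enable(c);
    return c->windows;
}

/* Newer scheme: irqs are disabled and re-enabled per site, in the loop. */
static unsigned patch_all_inner(struct toy_cpu *c, unsigned char *sites,
                                size_t n)
{
    size_t i;

    c->windows = 0;
    for (i = 0; i < n; i++) {
        toy_irq_disable(c);
        toy_patch_site(c, &sites[i]);
        toy_irq_enable(c);
        if (i + 1 < n)
            c->windows++; /* irqs are on until the next site is patched */
    }
    return c->windows;
}
```

In this model the outer scheme never exposes a window with interrupts enabled while patching is in progress, whereas the inner scheme exposes one between every pair of sites. On real hardware that is harmless as long as each individual patch is done with interrupts off; the model only shows where a virtualizer that handles interrupt disabling imperfectly would get many more chances to go wrong.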

So I bet VirtualBox has a race in the way it handles interrupt
disabling.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 10:57 am

That seems a bit far-fetched.  The fault is in an initcall, and there 
are no interrupts involved.  Perhaps VirtualBox doesn't manage its 
tcache correctly, but I don't see this as being interrupt-related.

	-hpa
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 12:10 pm

On Fri, 22 Aug 2008 10:16:07 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

| So, Luiz: the DEBUG_INFO version of vmlinux would be helpful.  It would 
| also help to know the exact version of VirtualBox you're running, what 
| source you got it from, and what your host system looks like.

 You will find vmlinux with DEBUG_INFO enabled at:

http://users.mandriva.com.br/~lcapitulino/virtualbox-oops/

 I'm running Mandriva's VirtualBox 1.6.4 OSE, my host kernel is 2.6.26-3mnb
(patched).

 I could try with upstream's VirtualBox just to be sure it's not
something else, but I don't think it is since there are reports for
ArchLinux and Ubuntu as well:

https://bugs.launchpad.net/ubuntu/intrepid/+source/linux/+bug/246067

-- 
Luiz Fernando N. Capitulino
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 12:14 pm

Not necessary, but I wanted to get the information so I can try to 
reproduce locally.

	-hpa
--

From: H. Peter Anvin
Date: Friday, August 22, 2008 - 12:18 pm

What is your host *system* like -- CPU especially, and is your host 
kernel 32 or 64 bits?

	-hpa
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 12:42 pm

On Fri, 22 Aug 2008 12:18:21 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

| Luiz Fernando N. Capitulino wrote:
| > 
| >  I'm running Mandriva's VirtualBox 1.6.4 OSE, my host kernel is 2.6.26-3mnb
| > (patched).
| > 
| 
| What is your host *system* like -- CPU especially, and is your host 
| kernel 32 or 64 bits?

 32 bits, /proc/cpuinfo output:

"""
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping	: 1
cpu MHz		: 2410.462
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni monitor ds_cpl cid xtpr
bogomips	: 4825.33
clflush size	: 64
power management:
"""

 I have 1G of RAM and a VIA mobo.

-- 
Luiz Fernando N. Capitulino
--

From: Luiz Fernando N. Capitulino
Date: Friday, August 22, 2008 - 7:28 am

On Thu, 21 Aug 2008 14:34:07 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

| > 
| >  Does this look like a kernel bug?
| > 
| 
| No, it looks like a very common virtualizer bug.  Does the attached 
| patch work for you?

 Unfortunately it does not.

-- 
Luiz Fernando N. Capitulino
--
