Re: [LKML] [PATCH] Fix NULL pointer for Xen guests

From: Prarit Bhargava
Date: Tuesday, April 27, 2010 - 8:24 am

Upstream PV guests fail to boot because of a NULL pointer.  It is possible that
xen guests have irq_desc->chip_data = NULL.

Test for NULL chip_data pointer before attempting to complete an irq move.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 127b871..eb2789c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2545,6 +2545,9 @@ void irq_force_complete_move(int irq)
 	struct irq_desc *desc = irq_to_desc(irq);
 	struct irq_cfg *cfg = desc->chip_data;
 
+	if (!cfg)
+		return;
+
 	__irq_complete_move(&desc, cfg->vector);
 }
 #else
--
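
The guard added by the patch can be tried outside the kernel. What follows is only a minimal userspace sketch: the struct layouts, complete_move_stub(), last_completed_vector and force_complete_move_sketch() are hypothetical stand-ins invented for illustration, not the kernel's definitions. It shows the one behaviour the patch relies on: a descriptor whose chip_data is NULL is skipped instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types (not the real layouts). */
struct irq_cfg {
	int vector;
};

struct irq_desc {
	void *chip_data;	/* may be NULL in a Xen PV guest, per the changelog */
};

/* Records the last vector handed to the stub; -1 means "never called". */
static int last_completed_vector = -1;

/* Stand-in for __irq_complete_move(): only remembers the vector it was given. */
static void complete_move_stub(struct irq_desc *desc, int vector)
{
	(void)desc;
	last_completed_vector = vector;
}

/* Returns 1 if the move was completed, 0 if it was skipped (NULL chip_data). */
static int force_complete_move_sketch(struct irq_desc *desc)
{
	struct irq_cfg *cfg;

	assert(desc != NULL);
	cfg = desc->chip_data;

	/* Same test as the patch: nothing to complete without a cfg. */
	if (!cfg)
		return 0;

	complete_move_stub(desc, cfg->vector);
	return 1;
}
```

Calling force_complete_move_sketch() on a descriptor with chip_data = NULL returns 0 and leaves last_completed_vector untouched; without the test, the cfg->vector read would fault, which matches the page fault visible in the traces posted in this thread.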

From: Konrad Rzeszutek Wilk
Date: Tuesday, April 27, 2010 - 9:58 am

Can you provide a short example of a test scenario? As in what I should do
--

From: Prarit Bhargava
Date: Tuesday, April 27, 2010 - 10:09 am

Take the latest upstream (well ... to be honest, a bit older than that 
because of some other bugs) -- take 2.6.33 and try to boot it as a PV 
guest.  I'm using a RHEL5 Xen HV fwiw ...

--

From: Andrew Jones
Date: Tuesday, April 27, 2010 - 10:59 am

Another ingredient is to boot the guest with a configuration where its
maxvcpus is greater than its vcpus. If you have RHEL 5.5 userspace then
you can create a config with lines like this

maxvcpus = 4
vcpus = 2

with that you'll crash on boot. Then you can check that
irq_force_complete_move is on the stack if you have "preserve" for
on_crash and use xenctx to look at the state of the vcpus.

If the Xen you're using doesn't support the maxvcpus var, then I believe
you can apply the same principle in a different way, using the
vcpus_avail var. Or, you can boot with > 1 vcpus and then attempt to
remove one with 'xm vcpu-set'.


--
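
Why a vcpu going offline reaches irq_force_complete_move() can be modelled roughly in userspace. This sketch assumes the offline path visits every irq descriptor and force-completes any pending move, which is what the fixup_irqs frame in the trace posted later in this thread suggests. fixup_irqs_sketch(), NR_SKETCH_IRQS and the struct layouts are hypothetical and heavily simplified; the real code also checks affinity and whether the irq has an action.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SKETCH_IRQS 4	/* arbitrary table size for the model */

/* Hypothetical stand-ins; the kernel structures carry far more state. */
struct irq_cfg {
	int vector;
	int move_in_progress;
};

struct irq_desc {
	void *chip_data;	/* assumed NULL for irqs without an irq_cfg */
};

/*
 * Model of the offline walk: visit every descriptor and complete any
 * pending move. Descriptors with NULL chip_data are skipped, as in the
 * patch. Returns the number skipped; *completed counts finished moves.
 */
static int fixup_irqs_sketch(struct irq_desc *descs, size_t n, int *completed)
{
	int skipped = 0;
	size_t i;

	assert(descs != NULL && completed != NULL);

	for (i = 0; i < n; i++) {
		struct irq_cfg *cfg = descs[i].chip_data;

		if (!cfg) {
			skipped++;	/* would have been the NULL dereference */
			continue;
		}
		if (cfg->move_in_progress) {
			cfg->move_in_progress = 0;
			(*completed)++;
		}
	}
	return skipped;
}
```

In this model, a table that mixes descriptors with and without chip_data is walked to the end; every descriptor counted as skipped marks a spot where the unpatched function would have read through a NULL cfg.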

From: Konrad Rzeszutek Wilk
Date: Tuesday, April 27, 2010 - 11:34 am

2.6.34-rc5 PV boots under Xen for me (and has pretty much since 2.6.33 +
Suresh's fix for the CONFIG_RODATA_MARK).

Perhaps I am missing some of the .config options you have set that make it not work?

The irqbalance daemon looks to be running - but I think you are hitting
this during bootup?  How long do you have to wait for this to trigger?

How many CPUs did you assign to your guest?


OK, so your control domain is RHEL5. Mine is Jeremy's xen/next one
(2.6.32). Let me try to compile RHEL5 under FC11 - any tricks necessary
to do that?
--

From: Prarit Bhargava
Date: Tuesday, April 27, 2010 - 11:47 am

It happens during bootup.   I don't have a 2.6.33 vanilla panic handy 
but I do have one from an earlier 2.6.32...

rip: ffffffff81256f45 delay_tsc+0x45
rsp: ffff8800fac95a98
rax: fffffffff6ef46d0   rbx: 00000002   rcx: f6ef46d0   rdx: 0010850c
rsi: 002b3bb6   rdi: 002b3bcc   rbp: ffff8800fac95ab8
 r8: ffffffff    r9: 00000002   r10: 00000002   r11: 00000000
r12: fffffffff6dec1c4   r13: 00000002   r14: 002b3bcc   r15: 00000001
 cs: 0000e033    ds: 00000000    fs: 00000000    gs: 00000000

Stack:
  000000000002ef45 ffff8800fac95c88 0000000000000009 ffff8800fac93540
  ffff8800fac95ac8 ffffffff81256ef6 ffff8800fac95b48 ffffffff814c6341
  0000000000000010 ffff8800fac95b38 ffff880000000008 ffff8800fac95b58
  ffff8800fac95b08 a22d306b065d4a66 0000000000000000 0000000000000000

Code:
f3 90 65 8b 1c 25 d8 e3 00 00 44 39 eb 75 23 66 66 90 0f ae e8 <e8> 46 3d dc ff
66 90 48 98 48 89

Call Trace:
  [<ffffffff81256f45>] delay_tsc+0x45  <--
  [<ffffffff81256ef6>] __const_udelay+0x46
  [<ffffffff814c6341>] panic+0x135
  [<ffffffff814ca23c>] oops_end+0xdc
  [<ffffffff81042272>] no_context+0xf2
  [<ffffffff8125946c>] __bitmap_weight+0x8c
  [<ffffffff81042505>] __bad_area_nosemaphore+0x125
  [<ffffffff8105fad4>] find_busiest_group+0x254
  [<ffffffff810425d3>] bad_area_nosemaphore+0x13
  [<ffffffff814cbccf>] do_page_fault+0x2ef
  [<ffffffff814c9595>] page_fault+0x25
  [<ffffffff810302f2>] irq_force_complete_move+0x12
  [<ffffffff81015214>] fixup_irqs+0xa4
  [<ffffffff8102ce59>] cpu_disable_common+0x1a9
  [<ffffffff8100f9c2>] check_events+0x12
  [<ffffffff810c2550>] __stop_machine+0x120
  [<ffffffff8100ff75>] xen_cpu_disable+0x25
  [<ffffffff814b0427>] take_cpu_down+0x17
  [<ffffffff810c25f9>] stop_cpu+0xa9
  [<ffffffff8108869d>] worker_thread+0x16d
  [<ffffffff8100f19d>] xen_force_evtchn_callback+0xd
  [<ffffffff8108dd00>] wake_up_bit+0x40
  [<ffffffff814c90f6>] ...
--

From: Konrad Rzeszutek Wilk
Date: Monday, May 3, 2010 - 12:16 pm

Yes. No luck reproducing the crash/panic. I am just not seeing the failure you
guys are seeing.

Let me build 2.6.33 vanilla (+ CONFIG_DEBUG_MARK_RODATA=n) once more and check
this. And also install a vanilla RHEL5 dom0, as it looks impossible to
compile a 2.6.18-era kernel under FC11.

The Xen I am using is xen-unstable - so 4.0.1. I know that the IRQ balance
code in the Xen hypervisor was fixed in 4.0 (it used to run out of
context - now it runs in the IRQ context). Maybe this bug you are seeing
(and have the fix for) is just a red herring?
--

From: Prarit Bhargava
Date: Monday, May 3, 2010 - 12:56 pm

Let me try reproducing this on FC11 + 2.6.33.

P.

--

From: Konrad Rzeszutek Wilk
Date: Tuesday, May 4, 2010 - 8:02 am

Rebuilding everything from scratch did it. I am seeing a similar
failure where xenctx reports:

Call Trace:
  [<ffffffff8107f780>] stop_cpu+0xc6  <--
  [<ffffffff8105520e>] worker_thread+0x15d 
  [<ffffffff8107f6ba>] __stop_machine+0x106 
  [<ffffffff81058afb>] wake_up_bit+0x25 
  [<ffffffff81038720>] spin_unlock_irqrestore+0x9 
  [<ffffffff810550b1>] spin_lock_irq+0xb 
  [<ffffffff810586cb>] kthread+0x7a 
  [<ffffffff8100a964>] kernel_thread_helper+0x4 
  [<ffffffff81009d61>] int_ret_from_sys_call+0x7 
  [<ffffffff814033dd>] retint_restore_args+0x5 
  [<ffffffff8100a960>] gs_change+0x13 

With this guest file:

kernel = "/mnt/lab/vs11/vmlinuz"
ramdisk = "/mnt/lab/vs11/initramfs.cpio.gz"
memory = 2048
maxvcpus = 4
vcpus = 2
vif = [ 'mac=00:0F:4B:00:00:71, bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
root = "debug loglevel=10 plymouth:splash=solar plymouth:debug norm console=hvc0 initcall_debug"

This is with the latest linux kernel:
d93ac51c7a129db7a1431d859a3ef45a0b1f3fc5 (Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client)

With your patch the PV guest keeps on going.

So:


Interestingly enough, I couldn't reproduce this on my Intel box, but on
an AMD box with a very whacked TSC (cpu MHz         : 2795681.405) I can
--

From: Prarit Bhargava
Date: Tuesday, May 4, 2010 - 8:21 am

Huh ... that's odd.  I'll grab a dinar based system and see if I can 
reproduce it there.  It would be interesting to know what the 
differences are.

P.
--

From: Andrew Morton
Date: Wednesday, April 28, 2010 - 11:26 am

On Tue, 27 Apr 2010 11:24:42 -0400

I assume this is needed for 2.6.34?

What about 2.6.33.x and earlier?
--

From: Prarit Bhargava
Date: Wednesday, April 28, 2010 - 11:29 am

Hey Andrew,

I actually pinged Chris Wright to see about including this in the 
-stable branches.  I haven't heard anything back so I'll reping him.

P.
--

From: Suresh Siddha
Date: Wednesday, April 28, 2010 - 11:42 am

It will be applicable for 2.6.33 and beyond.

thanks,
suresh

--

From: Andrew Morton
Date: Wednesday, April 28, 2010 - 11:50 am

On Wed, 28 Apr 2010 14:29:06 -0400

Well.  Pinging people offlist isn't very reliable.  Put

Cc: <stable@kernel.org>

at the end of the changelog and cc stable@kernel.org on the original
patch and then the patch will reliably receive consideration for
backporting.

I have added Cc:<stable@kernel.org> to my copy of the patch, so the
-stable guys will at least see it when I drop it after it is merged. 
But if the x86 maintainers were to merge your patch as you sent it, it
would have no Cc: <stable@kernel.org> when it goes into Linus's tree.

I worry that if the -stable maintainers see me drop a patch, but the
patch in Linus's tree doesn't have the stable tag, they might not merge
the fix into -stable.  I bugged them about this scenario recently and
the reply was a bit waffly ;)

By far the safest thing to do is to include the stable tag in your
changelog right at the outset.
--

From: Greg KH
Date: Wednesday, April 28, 2010 - 12:15 pm

It was?

I try my best: if I see you drop a patch, I go dig through Linus's
tree to find out whether it landed there.  If not, I leave it in my queue, and do
that for a few releases.  After a long time (like 6 months) I either
ping someone, or just drop it from my queue, guessing that someone
dropped it for some reason.


Yes, that's the _easiest_ and will not get lost.

thanks,

greg k-h
--

From: H. Peter Anvin
Date: Friday, April 30, 2010 - 1:55 pm

This looks like it should be tagged stable for 2.6.33.  Is that correct?

	-hpa


--

From: H. Peter Anvin
Date: Friday, April 30, 2010 - 2:33 pm

Never mind... I see it has already been discussed.

	-hpa
--

From: Prarit Bhargava
Date: Friday, April 30, 2010 - 3:01 pm

Yes.

--

From: tip-bot for Prarit Bhargava
Date: Friday, April 30, 2010 - 2:36 pm

Commit-ID:  bbd391a15d82e14efe9d69ba64cadb855b061dba
Gitweb:     http://git.kernel.org/tip/bbd391a15d82e14efe9d69ba64cadb855b061dba
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Tue, 27 Apr 2010 11:24:42 -0400
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Fri, 30 Apr 2010 14:31:38 -0700

x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests

Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move().  It is possible that xen guests have
irq_desc->chip_data = NULL.

Test for NULL chip_data pointer before attempting to complete an irq move.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20100427152434.16193.49104.sendpatchset@prarit.bos.redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org> [2.6.33]
---
 arch/x86/kernel/apic/io_apic.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 127b871..eb2789c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2545,6 +2545,9 @@ void irq_force_complete_move(int irq)
 	struct irq_desc *desc = irq_to_desc(irq);
 	struct irq_cfg *cfg = desc->chip_data;
 
+	if (!cfg)
+		return;
+
 	__irq_complete_move(&desc, cfg->vector);
 }
 #else
--
