Re: Xen kernel 2.6.23-rc7 bug at xen_mc_flush (arch/i386/xen/multicalls.c:68)

To: <linux-kernel@...>
Date: Sunday, September 23, 2007 - 5:55 pm

Using kernel 2.6.23-rc7 as a Xen domU guest, I observe a kernel bug
that occurs reproducibly when calling a shell from the Midnight Commander
F2 context menu, or with the testcase given below. (Most other programs
seem to be well behaved and do not trigger this bug.) A kernel compiled
with debug info gives:

Kernel BUG at c01037dc [verbose debug info unavailable]
invalid opcode: 0000 [#5]
PREEMPT SMP
...
Call Trace:
[<c0103de9>] <0> [<c015d1d1>] <0> [<c0190078>] <0> [<c012633e>] <0> [<c016fa54>]
<0> [<c0106547>] <0> [<c01080d2>] <0> =======================
...
(gdb) l *0xc01037dc
0xc01037dc is in xen_mc_flush (arch/i386/xen/multicalls.c:68).
63 } else
64 BUG_ON(b->argidx != 0);
65
66 local_irq_restore(flags);
67
68 BUG_ON(ret);
69 }
0xc0103de9 is in xen_exit_mmap (arch/i386/xen/multicalls.h:42).
0xc015d1d1 is in exit_mmap (include/asm/paravirt.h:722).
0xc0190078 is in load_script (fs/binfmt_script.c:19).
0xc012633e is in mmput (kernel/fork.c:395).
0xc016fa54 is in do_execve (fs/exec.c:1421).
0xc0106547 is in sys_execve (arch/i386/kernel/process.c:793).
No source file for address 0xc01080d2.

/proc/cpuinfo: ...AMD Athlon(tm) X2 Dual Core Processor BE-2350 ...

full info is at http://spblinux.de/xen/20070923/

Same bug if preempt is disabled; same bug if vcpus is reduced to 1 in xen
domU.

Please cc to osth at freesurf.ch because I am not on the list.

Christian Ostheimer

testcase which triggers the bug:

#!/bin/bash
#
# modified configure script: max commandline length test
CONFIG_SHELL=/bin/bash
i=0
export teststring=ABCD
while (test "X"`$CONFIG_SHELL -c "echo X$teststring" 2>/dev/null` \
= "XX$teststring") >/dev/null 2>&1 &&
new_result=`expr "X$teststring" : ".*" 2>&1` &&
lt_cv_sys_max_cmd_len=$new_result &&
test $i != 17 # 1/2 MB s...

To: <osth@...>
Cc: <linux-kernel@...>
Date: Monday, September 24, 2007 - 8:43 pm

Hm, it seems that it's trying to unpin an mm on the error path of
execve, where it has never been pinned. The simplest way to reproduce is:

$ echo foo > foo
$ chmod +x foo
$ ./foo
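
For readers who want to see the failing execve() directly rather than through the shell, here is a minimal user-space sketch in C. It is an illustration and not part of the original report: the function name exec_error_for_bogus_file and the /tmp file template are assumptions. It writes the same four bytes as the shell commands above into an executable file and returns the errno from execve(). On Linux this is normally ENOEXEC, because the file has no recognized binary format and no "#!" line.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create an executable file containing "foo\n", try to execve() it,
 * and return the errno of the failed call.
 * The name and the /tmp template are hypothetical, for illustration only.
 */
static int exec_error_for_bogus_file(void)
{
    char path[] = "/tmp/execfail-XXXXXX";
    char *const argv[] = { path, NULL };
    char *const envp[] = { NULL };
    int fd, err;

    fd = mkstemp(path);
    if (fd < 0)
        return errno;

    if (write(fd, "foo\n", 4) != 4 || fchmod(fd, 0700) != 0) {
        err = errno ? errno : EIO;
        close(fd);
        unlink(path);
        return err;
    }
    /* Close before exec; a file still open for writing gives ETXTBSY. */
    close(fd);

    /* execve() only returns if it failed. */
    execve(path, argv, envp);
    err = errno;
    assert(err != 0);

    unlink(path);
    return err;
}
```

Calling exec_error_for_bogus_file() from main() and printing strerror() of the result is enough to drive it. If /tmp is mounted noexec, the call fails with EACCES instead.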

Anyway, try this patch.

J

---
arch/i386/xen/mmu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/i386/xen/mmu.c
+++ b/arch/i386/xen/mmu.c
@@ -558,6 +558,9 @@ void xen_exit_mmap(struct mm_struct *mm)
 	put_cpu();
 
 	spin_lock(&mm->page_table_lock);
-	xen_pgd_unpin(mm->pgd);
+
+	/* pgd may not be pinned in the error exit path of execve */
+	if (PagePinned(virt_to_page(mm->pgd)))
+		xen_pgd_unpin(mm->pgd);
 	spin_unlock(&mm->page_table_lock);
 }
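
The effect of the guard can be modelled outside the kernel. The following C sketch is a toy model, not kernel code: struct model_pgd, model_unpin() and both exit functions are hypothetical names. It assumes that an unpin request for a pgd that was never pinned is rejected with a nonzero result, which is what the BUG_ON(ret) in xen_mc_flush traps on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page directory; not the kernel's type. */
struct model_pgd {
    bool pinned;    /* models the PagePinned() flag on the pgd page */
};

/* Models the unpin request: assumed to fail if the pgd is not pinned. */
static int model_unpin(struct model_pgd *pgd)
{
    assert(pgd != NULL);
    if (!pgd->pinned)
        return -1;          /* request rejected */
    pgd->pinned = false;
    return 0;
}

/* Pre-patch behaviour: unpin unconditionally and return the result. */
static int exit_mmap_unguarded(struct model_pgd *pgd)
{
    /* A nonzero result here corresponds to the BUG_ON(ret) firing. */
    return model_unpin(pgd);
}

/* Patched behaviour: only unpin when the pgd is actually pinned. */
static int exit_mmap_guarded(struct model_pgd *pgd)
{
    if (pgd->pinned)
        return model_unpin(pgd);
    return 0;               /* nothing to do for a never-pinned pgd */
}
```

With a struct model_pgd whose pinned field is false, exit_mmap_unguarded() returns a nonzero error while exit_mmap_guarded() returns 0, mirroring the before and after of the patch on the execve error path.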

-

To: Jeremy Fitzhardinge <jeremy@...>
Cc: <linux-kernel@...>
Date: Tuesday, September 25, 2007 - 5:27 am

The bug is solved by this patch. Thanks! Maybe this patch can make it into
2.6.23 final?

Christian Ostheimer


-

To: <osth@...>
Cc: <linux-kernel@...>
Date: Tuesday, September 25, 2007 - 12:48 pm

Yes, I'll send it out today.

J
-

To: <osth@...>
Cc: <linux-kernel@...>
Date: Monday, September 24, 2007 - 3:47 am

OK, I think I've seen this before, and need to track it down. Could you
try again with a kernel with debug info? Also, does anything relevant
appear in "xm dmesg"?

Thanks,
J
-
