Re: 2.6.27-rc xen pvops regression?

Previous thread: XFS status update for August 2008 by Christoph Hellwig on Sunday, August 31, 2008 - 10:10 am. (1 message)

Next thread: [PATCH] prevent ia64 from invoking irq handlers on offline CPUs by Paul E. McKenney on Sunday, August 31, 2008 - 10:28 am. (7 messages)
From: Bernhard Schmidt
Date: Sunday, August 31, 2008 - 10:18 am

Hello,

I've been a happy user of vanilla 2.6.26 with pvops in my Xen domU for
a couple of months. I wanted to test 2.6.27-rc (-rc3 a couple of weeks
ago, -rc5 just now) to verify another issue and ran into the following
error

Using config file "/etc/xen/miredo2.cfg".
Started domain miredo2
    0: 0x00000395 -> 0x00010000
Using APIC driver default
Could not enable APIC!
Allocating PCI resources starting at 20000000 (gap: 10000000:f0000000)
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64861
Kernel command line: root=/dev/xvda1 ro 3 console=xvc0 lapic
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 1024 (order: 10, 4096 bytes)
Detected 2800.098 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [hvc0] enabled
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256164k/262144k available (1270k kernel code, 5152k reserved, 496k data, 156k init, 0k highmem)
virtual kernel memory layout:
    fixmap  : 0xf57a9000 - 0xf57fd000   ( 336 kB)
    pkmap   : 0xf5200000 - 0xf5400000   (2048 kB)
    vmalloc : 0xd0800000 - 0xf51fe000   ( 585 MB)
    lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
      .init : 0xc02bc000 - 0xc02e3000   ( 156 kB)
      .data : 0xc023dbba - 0xc02b9fa0   ( 496 kB)
      .text : 0xc0100000 - 0xc023dbba   (1270 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=12, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
installing Xen timer for CPU 0
Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.19 BogoMIPS (lpj=11200392)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Freeing SMP alternatives: 0k freed
initcall spawn_ksoftirqd+0x0/0x30 returned with ...
From: Alex Nixon
Date: Sunday, August 31, 2008 - 3:12 pm

Thanks Bernhard - I'm able to reproduce it with your config.

I've had a look, and it seems the register_vcpu_info hypercall is being 
made twice, failing on the second time.  In fact, 
paravirt_pagetable_setup_{start,done} are both being called twice - once 
from arch/x86/kernel/setup.c, and then again from arch/x86/mm/init_32.c

I suspect one of them should go, but I'm not familiar enough with this 
code to suggest which one.

But still with that crudely resolved, the kernel BUGs (see below).  
Jeremy - any ideas?

- Alex


`xm dmesg':

(XEN) mm.c:2249:d168 Mfn 24384 not pinned

dmesg:

EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 160k freed
------------[ cut here ]------------
kernel BUG at 
/local/scratch/hotplug.linux.trees.git/arch/x86/xen/enlighten.c:850!
invalid opcode: 0000 [#1]

Pid: 1, comm: init Tainted: G        W 
(2.6.27-rc5-tip-00672-ge5c5407-dirty #52)
EIP: 0061:[<c0102b04>] EFLAGS: 00010282 CPU: 0
EIP is at pin_pagetable_pfn+0x3f/0x4b
EAX: ffffffea EBX: e242dd74 ECX: 00000001 EDX: 00000000
ESI: 00007ff0 EDI: 00000000 EBP: e242dd8c ESP: e242dd74
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: e021
Process init (pid: 1, ti=e242c000 task=e242a000 task.ti=e242c000)
Stack: 00000004 00024384 00000000 c144e8e0 00022747 00000000 e242dd9c 
c0102ee8
       c02fe5d4 c144e8e0 e242dda4 c0102f23 e242ddbc c0116722 00000000 
00000000
       00000000 00000000 e242de2c c01467a8 00000000 c1000000 00000000 
22747067
Call Trace:
 [<c0102ee8>] ? xen_release_ptpage+0x52/0x71
 [<c0102f23>] ? xen_release_pte+0xd/0xf
 [<c0116722>] ? __pte_free_tlb+0x35/0x4b
 [<c01467a8>] ? free_pgd_range+0x377/0x38c
 [<c015a4bf>] ? setup_arg_pages+0x1b8/0x22b
 [<c017a80f>] ? load_elf_binary+0x3f1/0x10a4
 [<c0104ffa>] ? check_events+0x8/0xe
 [<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
 [<c0104937>] ? xen_force_evtchn_callback+0xf/0x14
 [<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
 [<c014298a>] ? ...
From: Jeremy Fitzhardinge
Date: Sunday, August 31, 2008 - 4:01 pm

Hm, yes.  Looks like a unifcation side-effect, though I thought I'd run
a 32-bit kernel recently enough to notice this.  Anyway, I think the

Bit hard to know what this means without knowing what mfn 24384 is.


--

From: Bernhard Schmidt
Date: Sunday, August 31, 2008 - 4:47 pm

FWIW, I don't have any message in my "xm dmesg".

Bernhard
--

Previous thread: XFS status update for August 2008 by Christoph Hellwig on Sunday, August 31, 2008 - 10:10 am. (1 message)

Next thread: [PATCH] prevent ia64 from invoking irq handlers on offline CPUs by