[PATCH 08 of 36] x86_64: Add gate_offset() and gate_segment() macros

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:18 am

Hi Ingo,

This series lays the groundwork for 64-bit Xen support.  It follows
the usual pattern: a series of general cleanups and improvements,
followed by additions and modifications needed to slide Xen in.

Most of the 64-bit paravirt-ops work has already been done and
integrated for some time, so the changes are relatively minor.

Interesting and potentially hazardous changes in this series are:

"paravirt/x86_64: move __PAGE_OFFSET to leave a space for hypervisor"

This moves __PAGE_OFFSET up past the first 16 PGD slots, from 0xffff810000000000
to 0xffff880000000000.  I have no general justification for this: the
specific reason is that Xen claims the first 16 kernel PGD slots for
itself, and we must move up the mapping to make room.  In the process
I parameterised the compile-time construction of the initial
pagetables in head_64.S to cope with it.
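As a quick check of the two addresses above: with 4-level paging each PGD (PML4) entry maps 512 GB (bits 47..39 of the address select the entry), and the kernel half of the address space begins at entry 256. The sketch below uses helper names of our own, not the kernel's. It shows that the old offset sat in kernel slot 2, while the new one starts immediately after 16 reserved slots.

```c
#include <assert.h>
#include <stdint.h>

/* 4-level paging: bits 47..39 select one of 512 PGD (PML4) entries,
 * so each entry covers 2^39 bytes = 512 GB. */
#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512u

static unsigned pgd_index(uint64_t vaddr)
{
	return (unsigned)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Slot number counted from the start of the kernel half (entry 256,
 * i.e. 0xffff800000000000). Hypothetical helper for illustration. */
static unsigned kernel_slot(uint64_t vaddr)
{
	unsigned idx = pgd_index(vaddr);

	assert(idx >= PTRS_PER_PGD / 2); /* must be a kernel-half address */
	return idx - PTRS_PER_PGD / 2;
}
```

With these helpers, 0xffff810000000000 is PGD entry 258 (kernel slot 2) and 0xffff880000000000 is entry 272 (kernel slot 16).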

"x86_64: adjust mapping of physical pagetables to work with Xen"
"x86_64: create small vmemmap mappings if PSE not available"

This rearranges the construction of the physical mapping so that it
works with Xen.  This affects three aspects of the code:
 1. It can't rely on PSE (it isn't available under Xen), so it only
    uses PSE if the processor supports it.
 2. It never replaces an existing mapping, so it can just extend the
    early boot-provided mappings (either from head_64.S or the Xen domain
    builder).
 3. It makes sure that any page is iounmapped before attaching it to the 
    pagetable to avoid having writable aliases of pagetable pages.
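Points 1 and 2 above can be modelled with a toy PMD table. This is a hypothetical sketch, not the code in init_64.c. It keeps any entry that is already populated and uses a 2 MB PSE mapping when PSE is available. Otherwise it counts the 4 KB PTE pages that would have to be allocated. Point 3 (unmapping pages before they become pagetables) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one PMD page of 512 entries, 2 MB each.
 * Flag values follow the x86 page-table bit layout. */
#define PMD_ENTRIES 512u
#define PMD_SIZE    (2ULL << 20)
#define F_PRESENT   0x001ULL
#define F_RW        0x002ULL
#define F_PSE       0x080ULL   /* entry maps a 2 MB page directly */

/* Fill entries [first, first + count) of a direct mapping of physical
 * memory starting at 0. Existing entries (e.g. built by head_64.S or a
 * domain builder) are kept, never replaced. Returns how many 4 KB PTE
 * pages would need allocating when PSE is unavailable. */
static unsigned map_pmds(uint64_t pmd[PMD_ENTRIES], unsigned first,
			 unsigned count, bool have_pse)
{
	unsigned pte_pages = 0;

	assert(first + count <= PMD_ENTRIES);
	for (unsigned i = first; i < first + count; i++) {
		if (pmd[i] != 0)
			continue;               /* reuse the existing mapping */
		if (have_pse) {
			pmd[i] = (uint64_t)i * PMD_SIZE | F_PSE | F_RW | F_PRESENT;
		} else {
			/* Would point at a newly allocated PTE page; the model
			 * just marks the slot as populated and counts it. */
			pmd[i] = F_RW | F_PRESENT;
			pte_pages++;
		}
	}
	return pte_pages;
}
```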

The logical structure of the code is more or less unchanged, and still
works fine in the native case.

The vmemmap mapping is likewise changed.

"x86_64: PSE no longer a hard requirement."

Because booting under Xen doesn't set PSE, it's no longer a hard
requirement for the kernel.  PSE will be used wherever possible.
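On x86, CPUID leaf 1 reports PSE in EDX bit 3. The following is a minimal user-space sketch of that check. It assumes a GCC or Clang toolchain for `<cpuid.h>` and `__get_cpuid()`. The kernel itself tests its own CPU feature flags rather than issuing CPUID like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 reports PSE (page size extension) in EDX bit 3. */
#define CPUID1_EDX_PSE (1u << 3)

static bool edx_has_pse(uint32_t edx)
{
	return (edx & CPUID1_EDX_PSE) != 0;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Query the raw CPUID bit on the CPU we are running on. */
static bool cpu_has_pse(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;   /* leaf 1 not supported */
	return edx_has_pse(edx);
}
#endif
```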

Overall diffstat:
 arch/x86/Kconfig                    |    7 +
 arch/x86/ia32/ia32entry.S           |   37 +++--
 arch/x86/kernel/aperture_64.c       |    4 
 arch/x86/kernel/asm-offsets_32.c    |...
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Stephen Tweedie <sct@...>, LKML <linux-kernel@...>
Date: Wednesday, June 25, 2008 - 8:40 am

This will significantly decrease the maximum amount of physical memory supported.

Both sound like cases of "let's hack Linux to work around Xen 
problems"

-Andi
--
To: Andi Kleen <andi@...>
Cc: Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Stephen Tweedie <sct@...>, LKML <linux-kernel@...>
Date: Wednesday, June 25, 2008 - 4:03 pm

A bit, but not "significantly".  We'd already discussed that if the 
amount of physical memory starts approaching 2^48 then we'd hope that the chips 
will grow some more virtual bits.

    J
--
To: Andi Kleen <andi@...>, Jeremy Fitzhardinge <jeremy@...>
Cc: Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, LKML <linux-kernel@...>, Stephen Tweedie <sct@...>
Date: Wednesday, June 25, 2008 - 2:45 pm

What does Linux expect to scale up to? Reserving 16 PML4 entries leaves the
kernel with 120TB of available 'negative' address space. Should be plenty, I
would think.
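Keir's figure follows from the slot size. 16 PML4 entries of 512 GB each make 8 TB, which comes out of the 128 TB upper (kernel) half of a 48-bit address space. Here is a small sketch of that arithmetic; the function name is ours.

```c
#include <assert.h>
#include <stdint.h>

#define TB                (1ULL << 40)
#define PGD_SLOT_BYTES    (1ULL << 39)  /* 512 GB per PML4 entry */
#define KERNEL_HALF_BYTES (1ULL << 47)  /* upper half of 48-bit VA: 128 TB */

/* Kernel ("negative") address space left after reserving
 * reserved_slots PML4 entries for a hypervisor. */
static uint64_t kernel_space_left(unsigned reserved_slots)
{
	assert(reserved_slots <= 256);
	return KERNEL_HALF_BYTES - (uint64_t)reserved_slots * PGD_SLOT_BYTES;
}
```

For example, `kernel_space_left(16)` is 120 TB, and the 16 reserved slots themselves span 8 TB.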

 -- Keir


--
To: Keir Fraser <keir.fraser@...>
Cc: Andi Kleen <andi@...>, Jeremy Fitzhardinge <jeremy@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, LKML <linux-kernel@...>, Stephen Tweedie <sct@...>
Date: Wednesday, June 25, 2008 - 3:13 pm

There are already (ok non x86-64) systems shipping today with 10+TB of 
addressable memory. 100+TB is not that far away with typical
growth rates. Besides, there has to be much more in the negative address
space than just the direct mapping.

So far we have always said that 64-bit Linux can support up to 1/4 of the
maximum VA size as memory. With your change that formula would no longer be true.
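For reference, here is Andi's rule of thumb applied to the current 48-bit virtual address space. A quarter of 2^48 bytes is 2^46 bytes, i.e. 64 TB, which matches the "46 bits" he mentions later in the thread. A trivial sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Rule of thumb from the thread: supportable memory = 1/4 of the
 * virtual address space. */
static uint64_t quarter_of_va(unsigned va_bits)
{
	assert(va_bits < 64);
	return (1ULL << va_bits) / 4;
}
```

Here `quarter_of_va(48)` is 2^46 bytes (64 TB).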

-Andi 


--
To: Andi Kleen <andi@...>
Cc: Jeremy Fitzhardinge <jeremy@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, LKML <linux-kernel@...>, Stephen Tweedie <sct@...>
Date: Wednesday, June 25, 2008 - 3:22 pm

There are obviously no x64 boxes around at the moment with >1TB of regular
shared memory, since no CPUs have more than 40 address lines. 100+TB RAM is
surely years away.

If this is a blocker issue, we could just keep PAGE_OFFSET as it is when Xen
support is not configured into the kernel. Then those who are concerned
about 5% extra headroom at 100TB RAM sizes can configure their kernel accordingly.

Does the formula have any practical significance?

 -- Keir


--
To: Keir Fraser <keir.fraser@...>
Cc: Andi Kleen <andi@...>, Jeremy Fitzhardinge <jeremy@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, LKML <linux-kernel@...>, Stephen Tweedie <sct@...>
Date: Wednesday, June 25, 2008 - 4:14 pm

Yes, but why build something non-scalable now that you have to fix in a few
years?  Especially when it comes with "i have no justification" in its description.

Yes, because getting more than 48 bits of VA will be extremely costly
in terms of infrastructure, and assuming continuing growth rates and very large
machines, 46 bits is not all that much.

-Andi
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>, Arjan van de Ven <arjan@...>, Avi Kivity <avi@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, June 25, 2008 - 4:42 am

This reduces native kernel max memory support from around 127 TB to 
around 120 TB. We also limit the Xen hypervisor to ~7 TB of physical 
memory - is that wise in the long run? Sure, current CPUs support 40 
physical bits [1 TB] for now so it's all theoretical at this moment.

my guess is that CPU makers will first extend the physical lines all the 
way up to 46-47 bits before they are willing to touch the logical model 
and extend the virtual space beyond 48 bits (47 bits of that available 
to kernel-space in practice - i.e. 128 TB).

So eventually, in a few years, we'll feel some sort of crunch when the # 
of physical lines approaches the # of logical bits - just like when 

That should be fine too - and probably useful for 64-bit kmemcheck 
support as well.

To further increase the symmetry between 64-bit and 32-bit, could you 
please also activate the mem=nopentium switch on 64-bit to allow the 
forcing of a non-PSE native 64-bit bootup? (Obviously not a good idea 
normally, as it wastes 0.1% of RAM and increases PTE related CPU cache 
footprint and TLB overhead, but it is useful for debugging.)

a few other risk areas:

- the vmalloc-sync changes. Are you absolutely sure that it does not
  matter for performance?

- "The 32-bit early_ioremap will work equally well for 64-bit, so just
   use it." Famous last words ;-)

Anyway, that's all theory - i'll try out your patchset in -tip to see 
what breaks in practice ;-)

	Ingo
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Wednesday, June 25, 2008 - 11:22 am

i've put the commits (and a good number of dependent commits) into the 
new tip/x86/xen-64bit topic branch.

It quickly broke the build in testing:

 include/asm/pgalloc.h: In function ‘paravirt_pgd_free’:
 include/asm/pgalloc.h:14: error: parameter name omitted
 arch/x86/kernel/entry_64.S: In file included from 
 arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free’:
 include/asm/pgalloc.h:14: error: parameter name omitted
 [...]

with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jun_25_16_37_51_CEST_2008.bad

this could easily be some integration mistake on my part, so please 
double-check the end result.

Merging it into tip/master is a bit tricky, due to various interactions. 
This should work fine if you check out the latest tip/master:

  git-merge tip/x86/xen-64bit
  [ ... fix up the trivial merge conflict ... ]

i've already merged tip/x86/xen-64bit-base topic into master, to make it 
easier. (there were a few preconditions for the 64-bit Xen patches which 
aren't carried in linux-next - such as the nmi-safe changes.)

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Wednesday, June 25, 2008 - 4:12 pm

No, looks like my fault.  The non-PARAVIRT version of 
paravirt_pgd_free() is:

static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *) {}

but C doesn't like missing parameter names, even if unused.

This should fix it:

diff -r 19b73cc5fdf4 include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h	Wed Jun 25 11:24:41 2008 -0400
+++ b/include/asm-x86/pgalloc.h	Wed Jun 25 13:11:56 2008 -0700
@@ -11,7 +11,7 @@
 #include <asm/paravirt.h>
 #else
 #define paravirt_pgd_alloc(mm)	__paravirt_pgd_alloc(mm)
-static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *) {}
+static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd) {}
 static inline void paravirt_alloc_pte(struct mm_struct *mm, unsigned long pfn)	{}
 static inline void paravirt_alloc_pmd(struct mm_struct *mm, unsigned long pfn)	{}
 static inline void paravirt_alloc_pmd_clone(unsigned long pfn, unsigned long clonepfn,
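This is the rule Jeremy refers to. In C before C23, every parameter of a function *definition* must be named, even an unused one; C++ and C23 allow unnamed parameters. Below is a minimal standalone sketch of a no-op stub written that way. It uses stand-in types (ours) in place of the kernel's struct mm_struct and pgd_t.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's struct mm_struct and pgd_t. */
struct mm_struct { int users; };
typedef struct { unsigned long pgd; } pgd_t;

/*
 * Empty stub in the style of the !CONFIG_PARAVIRT hooks. Writing
 * "pgd_t *" without a name here is rejected by pre-C23 compilers
 * ("parameter name omitted"), even though the body never uses it.
 */
static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd)
{
	(void)mm;   /* mark both parameters as intentionally unused */
	(void)pgd;
}
```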



--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 6:57 am

that fixed the build but now we've got a boot crash with this config:

 time.c: Detected 2010.304 MHz processor.
 spurious 8259A interrupt: IRQ7.
 BUG: unable to handle kernel NULL pointer dereference at  0000000000000000
 IP: [<0000000000000000>]
 PGD 0
 Thread overran stack, or stack corrupted
 Oops: 0010 [1] SMP
 CPU 0

with:

  http://redhat.com/~mingo/misc/config-Thu_Jun_26_12_46_46_CEST_2008.bad

i've pushed out the current tip/xen-64bit branch, so that you can see 
how things look at the moment, but i cannot put it into tip/master 
yet.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 3:02 pm

I don't know if this will fix this bug, but it's definitely a bugfix.  
It was trashing random pages by overwriting them with pagetables...

Subject: x86_64: memory mapping: don't trash large pmd mapping

Don't trash a large pmd's data when mapping physical memory.
This is a bugfix for "x86_64: adjust mapping of physical pagetables
to work with Xen".

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/mm/init_64.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -311,7 +311,8 @@
 		}
 
 		if (pmd_val(*pmd)) {
-			phys_pte_update(pmd, address, end);
+			if (!pmd_large(*pmd))
+				phys_pte_update(pmd, address, end);
 			continue;
 		}
 
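A large (PSE) PMD entry maps a 2 MB page directly rather than pointing to a PTE page. Treating its target as a PTE table therefore overwrites ordinary data. Below is a simplified model of the check the patch adds. Bit 7 is the x86 page-size bit; the helper and macro names are ours, not the kernel's pmd_large().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_PG_PSE 0x080ULL   /* bit 7: entry maps a 2 MB page directly */

/* Models the fixed loop: a populated PMD entry has its PTE page updated
 * only if it is not a large page. An empty entry (0) takes a separate
 * path that sets up a new mapping instead. */
static bool should_update_ptes(uint64_t pmd_val)
{
	return pmd_val != 0 && (pmd_val & X86_PG_PSE) == 0;
}
```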


--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 2:25 pm

What stage during boot?  I'm seeing an initrd problem, but that's 
relatively late.

    J
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 10:28 am

Blerg, a contextless NULL rip.  Have you done any bisection on it?  
Could you try again with the same config, but with 
"CONFIG_PARAVIRT_DEBUG" enabled as well?  That will BUG if it turns out 
to be trying to call a NULL paravirt-op.


Yeah, I was expecting things to break somewhere with this lot :/

Could you add this patch?  I don't think it will help this case, but 
it's a bugfix.

    J

Subject: x86_64: use SWAPGS_UNSAFE_STACK in ia32entry.S

Use SWAPGS_UNSAFE_STACK in ia32entry.S in the places where the active
stack is the usermode stack.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/ia32/ia32entry.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

===================================================================
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -98,7 +98,7 @@
 	CFI_SIGNAL_FRAME
 	CFI_DEF_CFA	rsp,0
 	CFI_REGISTER	rsp,rbp
-	SWAPGS
+	SWAPGS_UNSAFE_STACK
 	movq	%gs:pda_kernelstack, %rsp
 	addq	$(PDA_STACKOFFSET),%rsp	
 	/*
@@ -210,7 +210,7 @@
 	CFI_DEF_CFA	rsp,PDA_STACKOFFSET
 	CFI_REGISTER	rip,rcx
 	/*CFI_REGISTER	rflags,r11*/
-	SWAPGS
+	SWAPGS_UNSAFE_STACK
 	movl	%esp,%r8d
 	CFI_REGISTER	rsp,r8
 	movq	%gs:pda_kernelstack,%rsp


--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 6:58 am

plus -tip auto-testing found another build failure with:

 http://redhat.com/~mingo/misc/config-Thu_Jun_26_12_46_46_CEST_2008.bad

arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:1201: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1205: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1209: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1213: Error: invalid character '_' in mnemonic

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, June 26, 2008 - 10:34 am

I'm confused.  How did this config both crash and not build?

    J

--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 12:03 pm

this problem still reproduces.

i've pushed out all fixes into tip/x86/xen-64bit. That branch combined 
with the config above still reproduces the build failure above.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 3:04 pm

Subject: x86_64: fix non-paravirt compilation

Make sure SWAPGS and PARAVIRT_ADJUST_EXCEPTION_FRAME are properly
defined when CONFIG_PARAVIRT is off.

Fixes Ingo's build failure:
arch/x86/kernel/entry_64.S: Assembler messages:
arch/x86/kernel/entry_64.S:1201: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1205: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1209: Error: invalid character '_' in mnemonic
arch/x86/kernel/entry_64.S:1213: Error: invalid character '_' in mnemonic

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/asm-x86/irqflags.h  |   22 +++++++++++++---------
 include/asm-x86/processor.h |    3 ---
 2 files changed, 13 insertions(+), 12 deletions(-)

===================================================================
--- a/include/asm-x86/irqflags.h
+++ b/include/asm-x86/irqflags.h
@@ -167,7 +167,20 @@
 #define INTERRUPT_RETURN_NMI_SAFE	NATIVE_INTERRUPT_RETURN_NMI_SAFE
 
 #ifdef CONFIG_X86_64
+#define SWAPGS	swapgs
+/*
+ * Currently paravirt can't handle swapgs nicely when we
+ * don't have a stack we can rely on (such as a user space
+ * stack).  So we either find a way around these or just fault
+ * and emulate if a guest tries to call swapgs directly.
+ *
+ * Either way, this is a good way to document that we don't
+ * have a reliable stack. x86_64 only.
+ */
 #define SWAPGS_UNSAFE_STACK	swapgs
+
+#define PARAVIRT_ADJUST_EXCEPTION_FRAME	/*  */
+
 #define INTERRUPT_RETURN	iretq
 #define USERGS_SYSRET64				\
 	swapgs;					\
@@ -233,15 +246,6 @@
 #else
 
 #ifdef CONFIG_X86_64
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
 #define ARCH_LOCKDEP_SYS_EXIT		call lock...
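This explains why a missing definition produces this particular assembler error. The C preprocessor also runs over .S files, and it leaves unknown identifiers untouched. Without a !CONFIG_PARAVIRT definition, the literal name (most likely PARAVIRT_ADJUST_EXCEPTION_FRAME, given the underscore) reaches gas, which rejects it as a mnemonic. The small C sketch below uses stringification to show the token before and after an empty definition like the one in the patch.

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Not yet defined: the name survives preprocessing verbatim. In a .S file
 * the assembler would then try to parse it as an instruction. */
static const char *const before_fix = STR(PARAVIRT_ADJUST_EXCEPTION_FRAME);

/* Empty definition, as in the !CONFIG_PARAVIRT branch of the fix. */
#define PARAVIRT_ADJUST_EXCEPTION_FRAME /*  */
static const char *const after_fix = STR(PARAVIRT_ADJUST_EXCEPTION_FRAME);
```

Here `before_fix` is the string "PARAVIRT_ADJUST_EXCEPTION_FRAME" and `after_fix` is the empty string.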
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Sunday, June 29, 2008 - 4:43 am

i've put tip/x86/xen-64bit into tip/master briefly and it quickly 
triggered this crash on 64-bit x86:

Linux version 2.6.26-rc8-tip-00241-gc6c8cb2-dirty (mingo@dione)
 (gcc version 4.2.3) #12303 SMP Sun Jun 29 10:30:01 CEST 2008

Command line: root=/dev/sda6 console=ttyS0,115200 earlyprintk=serial,ttyS0,115200 debug initcall_debug apic=verbose sysrq_always_enabled ignore_loglevel selinux=0
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
console [earlyser0] enabled
debug: ignoring loglevel setting.
Entering add_active_range(0, 0x0, 0x9f) 0 entries of 25600 used
Entering add_active_range(0, 0x100, 0x3fff0) 1 entries of 25600 used
last_pfn = 0x3fff0 max_arch_pfn = 0x3ffffffff
init_memory_mapping
kernel direct mapping tables up to 3fff0000 @ 8000-a000
PANIC: early exception 0e rip 10:ffffffff804b24e2 error 0 cr2 ffffffffff300000
Pid: 0, comm: swapper Not tainted 2.6.26-rc8-tip-00241-gc6c8cb2-dirty #12303

Call Trace:
 [<ffffffff80efe196>] early_idt_handler+0x56/0x6a
 [<ffffffff804b24e2>] ? __memcpy_fromio+0x12/0x30
 [<ffffffff804b24d9>] ? __memcpy_fromio+0x9/0x30
 [<ffffffff80f32f27>] dmi_scan_machine+0x57/0x1b0
 [<ffffffff80f02c15>] setup_arch+0x3f5/0x5e0
 [<ffffffff80efedd5>] start_kernel+0x75/0x350
 [<ffffffff80efe289>] x86_64_start_reservations+0x89/0xa0
 [<ffffffff80efe397>] x86_64_start_kernel+0xf7/0x100

RIP 0x10

with this config:

  http://redhat.com/~mingo/misc/config-Sun_Jun_29_10_2...
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Sunday, June 29, 2008 - 11:02 pm

Looks like the setup.c unification missed the early_ioremap init from 
the early_ioremap unification.  Unconditionally call early_ioremap_init().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff -r 5c26177fdf8c arch/x86/kernel/setup.c
--- a/arch/x86/kernel/setup.c	Sun Jun 29 16:57:52 2008 -0700
+++ b/arch/x86/kernel/setup.c	Sun Jun 29 19:57:00 2008 -0700
@@ -523,11 +523,12 @@
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	pre_setup_arch_hook();
 	early_cpu_init();
-	early_ioremap_init();
 	reserve_setup_data();
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
+
+	early_ioremap_init();
 
 	ROOT_DEV = old_decode_dev(boot_params.hdr.root_dev);
 	screen_info = boot_params.screen_info;


--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 4:21 am

applied to tip/x86/unify-setup - thanks Jeremy.

I've reactivated the x86/xen-64bit branch and i'm testing it currently.

	Ingo
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 5:22 am

-tip auto-testing found pagetable corruption (CPA self-test failure):

[   32.956015] CPA self-test:
[   32.958822]  4k 2048 large 508 gb 0 x 2556[ffff880000000000-ffff88003fe00000] miss 0
[   32.964000] CPA ffff88001d54e000: bad pte 1d4000e3
[   32.968000] CPA ffff88001d54e000: unexpected level 2
[   32.972000] CPA ffff880022c5d000: bad pte 22c000e3
[   32.976000] CPA ffff880022c5d000: unexpected level 2
[   32.980000] CPA ffff8800200ce000: bad pte 200000e3
[   32.984000] CPA ffff8800200ce000: unexpected level 2
[   32.988000] CPA ffff8800210f0000: bad pte 210000e3

config and full log can be found at:

 http://redhat.com/~mingo/misc/config-Mon_Jun_30_11_11_51_CEST_2008.bad
 http://redhat.com/~mingo/misc/log-Mon_Jun_30_11_11_51_CEST_2008.bad

i've pushed that tree out into tip/tmp.xen-64bit.Mon_Jun_30_11_11. The 
only new item in that tree over a well-tested base is x86/xen-64bit, so 
i've taken it out again.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 7:04 pm

Phew.  OK, I've worked this out.  Short version is that it's a false 
alarm, and there was no real failure here.  Long version:

    * I changed the code to create the physical mapping pagetables to
      reuse any existing mapping rather than replace it.   Specifically,
      reusing a PUD pointed to by the PGD caused this symptom to appear.
    * The specific PUD being reused is the one created statically in
      head_64.S, which creates an initial 1GB mapping.
    * That mapping doesn't have _PAGE_GLOBAL set on it, due to the
      inconsistency between __PAGE_* and PAGE_*.
    * The CPA test attempts to clear _PAGE_GLOBAL, and then checks to
      see that the resulting range is 1) shattered into 4k pages, and 2)
      has no _PAGE_GLOBAL.
    * However, since it didn't have _PAGE_GLOBAL on that range to start
      with, change_page_attr_clear() had nothing to do, and didn't
      bother shattering the range,
    * resulting in the reported messages

The simple fix is to set _PAGE_GLOBAL in level2_ident_pgt.
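This explanation fits the values in Ingo's CPA log. For example, "bad pte 1d4000e3" decodes as a 2 MB (PSE) mapping that is present, writable, accessed and dirty, with _PAGE_GLOBAL (bit 8) clear. The macro and helper names below are ours; the bit positions are the standard x86 ones.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-table entry flag bits (low 12 bits of an entry). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_PSE      (1ULL << 7)   /* large page (at PMD level) */
#define PTE_GLOBAL   (1ULL << 8)

/* Physical base of the 2 MB page described by a PMD entry with PSE set
 * (address bits 51..21). */
static uint64_t large_page_base(uint64_t entry)
{
	assert(entry & PTE_PSE);
	return entry & 0x000fffffffe00000ULL;
}
```

The reported address ffff88001d54e000 is the direct mapping of physical 0x1d54e000, which lies inside the 2 MB page starting at 0x1d400000.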

An additional fix would be to make CPA testing more robust by using some other 
pagetable bit (one of the unused available-to-software ones).  This 
would also solve spurious CPA test warnings under Xen, which uses _PAGE_GLOBAL 
for its own purposes (ie, not under guest control).

Also, we should revisit the use of _PAGE_GLOBAL in asm-x86/pgtable.h, 
use it consistently, and drop MAKE_GLOBAL.  The first time I 
proposed it, it caused breakages in the very early CPA code; with luck 
that's all fixed now.

Anyway, the simple fix below.  I'll put together RFC patches for the 
other suggestions.  I also split the originating patch into tiny, tiny 
bisectable pieces.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

---
 arch/x86/kernel/head_64.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

===================================================================
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -374,7 +37...
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Tuesday, July 1, 2008 - 4:52 am

great - i've applied your fix and re-integrated x86/xen-64bit, it's 

cool! :)

	Ingo
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Tuesday, July 1, 2008 - 5:21 am

hm, -tip testing still triggers a 64-bit bootup crash:

[    0.000000] init_memory_mapping
[    0.000000] kernel direct mapping tables up to 3fff0000 @ 8000-a000
PANIC: early exception 0e rip 10:ffffffff80418f81 error 0 cr2 ffffffffff300000
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.26-rc8-tip #13363
[    0.000000]
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff807f088b>] ? init_memory_mapping+0x341/0x56b
[    0.000000]  [<ffffffff80dba19f>] early_idt_handler+0x5f/0x73
[    0.000000]  [<ffffffff80418f81>] ? __memcpy_fromio+0xd/0x1e
[    0.000000]  [<ffffffff80de238a>] dmi_scan_machine+0x41/0x19b
[    0.000000]  [<ffffffff80dbeba8>] setup_arch+0x46d/0x5d8
[    0.000000]  [<ffffffff802896a0>] ? kernel_text_unlock+0x10/0x12
[    0.000000]  [<ffffffff80263b86>] ? raw_notifier_chain_register+0x9/0xb
[    0.000000]  [<ffffffff80dba140>] ? early_idt_handler+0x0/0x73
[    0.000000]  [<ffffffff80dbac5a>] start_kernel+0xf4/0x3b3
[    0.000000]  [<ffffffff80dba140>] ? early_idt_handler+0x0/0x73
[    0.000000]  [<ffffffff80dba2a4>] x86_64_start_reservations+0xa9/0xad
[    0.000000]  [<ffffffff80dba3b8>] x86_64_start_kernel+0x110/0x11f
[    0.000000]

  http://redhat.com/~mingo/misc/crash.log-Tue_Jul__1_10_55_47_CEST_2008.bad
  http://redhat.com/~mingo/misc/config-Tue_Jul__1_10_55_47_CEST_2008.bad

Excluding the x86/xen-64bit topic solves the problem.

It triggered on two 64-bit machines so it seems readily reproducible 
with that config.
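Both boot logs report "kernel direct mapping tables up to 3fff0000 @ 8000-a000". That is 0x2000 bytes (two 4 KB pages) of early pagetables for roughly 1 GB of RAM. A back-of-the-envelope estimate reproduces that figure, assuming one PUD page and one PMD page with 2 MB PSE mappings. It also shows how much a non-PSE boot would add for 4 KB PTE pages. The helper is our own illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ  4096ULL
#define PMD_SZ   (1ULL << 21)   /* 2 MB per PMD entry */
#define PUD_SZ   (1ULL << 30)   /* 1 GB per PUD entry */
#define ENTRY_SZ 8ULL           /* bytes per page-table entry */

static uint64_t round_up_pages(uint64_t bytes)
{
	return (bytes + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
}

/* Rough size of the tables needed to direct-map [0, end): a PUD level
 * and a PMD level, plus a PTE level only when 2 MB (PSE) pages cannot
 * be used. Estimate only; hypothetical helper. */
static uint64_t direct_map_table_bytes(uint64_t end, bool have_pse)
{
	uint64_t puds = (end + PUD_SZ - 1) / PUD_SZ;
	uint64_t pmds = (end + PMD_SZ - 1) / PMD_SZ;
	uint64_t bytes = round_up_pages(puds * ENTRY_SZ) +
			 round_up_pages(pmds * ENTRY_SZ);

	if (!have_pse) {
		uint64_t ptes = (end + PAGE_SZ - 1) / PAGE_SZ;
		bytes += round_up_pages(ptes * ENTRY_SZ);
	}
	return bytes;
}
```

For end = 0x3fff0000 this gives 0x2000 bytes with PSE. Without PSE it gives 0x202000 bytes: the same two pages plus 2 MB of PTE pages.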

i've pushed the failing tree out to tip/tmp.xen-64bit.Tue_Jul__1_10_55

	Ingo

--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Tuesday, July 1, 2008 - 12:14 pm

The patch to fix this is on tip/x86/unify-setup: "x86: setup_arch() && 
early_ioremap_init()".  Logically that patch should probably be in the 
xen64 branch, since it's only meaningful with the early_ioremap unification.

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Tuesday, July 1, 2008 - 4:31 pm

ah, indeed - it was missing from tip/master due to:

| commit ac998c259605741efcfbd215533b379970ba1d9f
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Mon Jun 30 12:01:31 2008 +0200
|
|    Revert "x86: setup_arch() && early_ioremap_init()"
|
|    This reverts commit 181b3601a1a7d2ac3ace6b23cb3204450a4f9a27.

because that change needed the other changes from xen-64bit.

will retry tomorrow.

	Ingo
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, July 3, 2008 - 5:10 am

ok, i've re-added x86/xen-64bit and it's looking good in testing so far.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Jeremy Fitzhardinge <jeremy@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 2:20 pm

got
 [ffffe20000000000-ffffe27fffffffff] PGD ->ffff88000128a000 on node 0
 [ffffe20000000000-ffffe2003fffffff] PUD ->ffff88000128b000 on node 0
 [ffffe20000000000-ffffe200003fffff] PMD -> [ffff880001400000-ffff8800017fffff] on node 0
 [ffffe20000200000-ffffe200005fffff] PMD -> [ffff880001600000-ffff8800019fffff] on node 0
 [ffffe20000400000-ffffe200007fffff] PMD -> [ffff880001800000-ffff880001bfffff] on node 0
 [ffffe20000600000-ffffe200009fffff] PMD -> [ffff880001a00000-ffff880001dfffff] on node 0
 [ffffe20000800000-ffffe20000bfffff] PMD -> [ffff880001c00000-ffff880001ffffff] on node 0
 [ffffe20000a00000-ffffe20000dfffff] PMD -> [ffff880001e00000-ffff8800021fffff] on node 0
 [ffffe20000c00000-ffffe20000ffffff] PMD -> [ffff880002000000-ffff8800023fffff] on node 0
 [ffffe20000e00000-ffffe200011fffff] PMD -> [ffff880002200000-ffff8800025fffff] on node 0
 [ffffe20001000000-ffffe200013fffff] PMD -> [ffff880002400000-ffff8800027fffff] on node 0
 [ffffe20001200000-ffffe200015fffff] PMD -> [ffff880002600000-ffff8800029fffff] on node 0
 [ffffe20001400000-ffffe200017fffff] PMD -> [ffff880002800000-ffff880002bfffff] on node 0
 [ffffe20001600000-ffffe200019fffff] PMD -> [ffff880002a00000-ffff880002dfffff] on node 0
 [ffffe20001800000-ffffe20001bfffff] PMD -> [ffff880002c00000-ffff880002ffffff] on node 0
 [ffffe20001a00000-ffffe20001dfffff] PMD -> [ffff880002e00000-ffff8800031fffff] on node 0
 [ffffe20001c00000-ffffe20001ffffff] PMD -> [ffff880003000000-ffff8800033fffff] on node 0
 [ffffe20001e00000-ffffe200021fffff] PMD -> [ffff880003200000-ffff8800035fffff] on node 0
 [ffffe20002000000-ffffe200023fffff] PMD -> [ffff880003400000-ffff8800037fffff] on node 0
 [ffffe20002200000-ffffe200025fffff] PMD -> [ffff880003600000-ffff8800039fffff] on node 0
 [ffffe20002400000-ffffe200027fffff] PMD -> [ffff880003800000-ffff880003bfffff] on node 0
 [ffffe20002600000-ffffe200029fffff] PMD -> [ffff880003a00000-ffff880003dfffff] on n...
To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 2:25 pm

I haven't seen those messages before.  Can you explain what they mean?

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 2:30 pm

That output is from the SPARSEMEM virtual memmap (vmemmap) code, with:

CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 2:41 pm

I modified the vmemmap code so it would create 4k mappings if PSE isn't 
supported.  Did I get it wrong?  It should have no effect when PSE is 
available (which is any time you're not running under Xen).

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 2:51 pm

It could be that the address-contiguity check for the printout in
vmemmap_populate() has some problem...

YH
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 3:19 pm

You moved "p_end = p + PMD_SIZE" ahead of the contiguity check:

if (p_end != p || node_start != node) {
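A minimal user-space sketch of the coalescing the printout relies on, to show why the ordering matters. It is a hypothetical model, not the kernel's vmemmap_populate(): struct range_printer, record_block(), flush_range() and SK_PMD_SIZE are names made up for illustration. Adjacent 2MB blocks are merged into one line only while p_end still marks the end of the previous block at the moment the check runs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space model of the vmemmap debug printout. */
#define SK_PMD_SIZE 0x200000UL	/* 2MB, one PMD-sized vmemmap block */

struct range_printer {
	unsigned long addr_start, addr_end;	/* vmemmap (virtual) range */
	unsigned long p_start, p_end;		/* backing memory range */
	int node_start;
	int lines;				/* lines printed so far */
};

/* Print the current run, if there is one. */
void flush_range(struct range_printer *rp)
{
	if (rp->p_start) {
		printf(" [%lx-%lx] PMD -> [%lx-%lx] on node %d\n",
		       rp->addr_start, rp->addr_end - 1,
		       rp->p_start, rp->p_end - 1, rp->node_start);
		rp->lines++;
	}
}

/* Record one newly mapped 2MB block at vmemmap address addr, backed by p. */
void record_block(struct range_printer *rp, unsigned long addr,
		  unsigned long p, int node)
{
	assert((addr & (SK_PMD_SIZE - 1)) == 0);

	/* p_end must still describe the PREVIOUS block here; if it had
	 * already been set to p + SK_PMD_SIZE, this test would always be
	 * true and every block would be reported on its own line. */
	if (rp->p_end != p || rp->node_start != node) {
		flush_range(rp);
		rp->addr_start = addr;
		rp->node_start = node;
		rp->p_start = p;
	}

	/* Only now extend the run to cover this block. */
	rp->addr_end = addr + SK_PMD_SIZE;
	rp->p_end = p + SK_PMD_SIZE;
}
```

Feeding four back-to-back blocks on node 0 through record_block() and then calling flush_range() prints a single coalesced line; a node change or a gap in the backing memory starts a new one.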

YH
--
To: Jeremy Fitzhardinge <jeremy@...>, Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Thursday, July 3, 2008 - 3:29 pm

Ingo,

please apply the attached patch after Jeremy's Xen pv64 patches.

YH
To: Yinghai Lu <yhlu.kernel@...>
Cc: Jeremy Fitzhardinge <jeremy@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Wednesday, July 9, 2008 - 3:42 am

applied, thanks.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Thursday, July 3, 2008 - 11:47 am

Great.  I'm hoping this stuff will be OK for the next merge, so I'm 
primed for fast turnaround bugfixes ;)

Also, I have a series of follow-up patches that actually implement 64-bit
Xen, which have much less impact on the non-Xen parts of the tree.  I'll
probably mail them out later today.

Thanks,
    J
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Tuesday, July 1, 2008 - 12:10 pm

Looks like you lost the other patch to put the early_ioremap_init in the 
right place...

    J
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 1:57 pm

This config doesn't have CONFIG_DEBUG_KERNEL enabled, let alone 
CONFIG_CPA_DEBUG.  I've noticed this seems to happen quite a lot: 
there's a disconnect between the log file and the config which is 
supposed to have built the kernel.  Is there a bug in your test 
infrastructure?

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 2:03 pm

Sometimes the kernel preceding the currently built one is the buggy one.
I have them saved away, so the right one should be:

 http://redhat.com/~mingo/misc/config-Mon_Jun_30_11_03_04_CEST_2008.bad

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 1:17 pm

That config doesn't build for me.  When I put it in place and do "make 
oldconfig" it still asks for lots of config options (which I just set to 
default).  But when I build it fails with:

  CC      arch/x86/kernel/asm-offsets.s
In file included from include2/asm/page.h:40,
                 from include2/asm/pda.h:8,
                 from include2/asm/current.h:19,
                 from include2/asm/processor.h:15,
                 from /home/jeremy/hg/xen/paravirt/linux/include/linux/prefetch.h:14,
                 from /home/jeremy/hg/xen/paravirt/linux/include/linux/list.h:6,
                 from /home/jeremy/hg/xen/paravirt/linux/include/linux/module.h:9,
                 from /home/jeremy/hg/xen/paravirt/linux/include/linux/crypto.h:21,
                 from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets_64.c:7,
                 from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets.c:4:
include2/asm/page_64.h:46:2: error: #error "CONFIG_PHYSICAL_START must be a multiple of 2MB"
make[3]: *** [arch/x86/kernel/asm-offsets.s] Error 1

I can fix that, of course, but it doesn't give me confidence I'm testing 
what you are...

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 2:12 pm

The problem there is that the 32-bit config has:

CONFIG_PHYSICAL_START=0x100000

which the 64-bit make oldconfig picked up, but that start address is not 
valid on 64-bit.
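The constraint behind that #error can be sketched as a plain alignment test: a value is a multiple of 2MB exactly when its low 21 bits are zero, and 0x100000 is only 1MB. The names SK_PHYSICAL_START and physical_start_ok_64() are hypothetical, and the exact expression used in page_64.h may differ from this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_2MB (2UL << 20)

/* Compile-time form, shaped like the #error in the build log above. */
#define SK_PHYSICAL_START 0x1000000UL	/* hypothetical 16MB setting */
#if (SK_PHYSICAL_START & (SK_2MB - 1)) != 0
#error "SK_PHYSICAL_START must be a multiple of 2MB"
#endif

/* Runtime form of the same test: true if the low 21 bits are clear. */
bool physical_start_ok_64(unsigned long physical_start)
{
	return (physical_start & (SK_2MB - 1)) == 0;
}
```

With this check, the 32-bit value 0x100000 is rejected, while 0x200000 and 0x1000000 pass.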

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Monday, June 30, 2008 - 2:36 pm

Er, we're talking about 64-bit here, aren't we?  The log messages are 
from a 64-bit kernel.

Well, it was the wrong config anyway, which I guess is the source of 
this confusion.

(I thought using ARCH= to select 32/64-bit was going away now that the
config has a bitsize option?)

    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>, Sam Ravnborg <sam@...>
Date: Monday, June 30, 2008 - 2:44 pm

Yep, correct - but it has to be done carefully: until now people (and
tools) could assume that 'make oldconfig' just creates a config for their
native host architecture. But I agree in principle.

	Ingo
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Monday, June 30, 2008 - 12:35 am

Could it be wrong? Do we need that for 64-bit?

YH
--
To: Yinghai Lu <yhlu.kernel@...>
Cc: Ingo Molnar <mingo@...>, Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>
Date: Monday, June 30, 2008 - 1:32 am

Yes.  I unified the early_ioremap implementations by making 64-bit use 
the 32-bit one.

    J

--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 11:56 am

I'm testing on multiple systems in parallel, each running randconfig
kernels. One 64-bit system found a build bug; the other one found a boot
crash.

This can happen: certain configs build fine (but crash), while certain
configs don't even build. Each system does a random walk of the config
space.

I've applied your two fixes and I'm re-testing.

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 12:02 pm

Yes, but the URL for both the crash and the build failure pointed to the 

Thanks,
    J
--
To: Jeremy Fitzhardinge <jeremy@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 12:06 pm

Yeah, I guess so. Right now I've only run into the build failure, so there's
hope :) Here's a config that fails to build for sure:

  http://redhat.com/~mingo/misc/config-Fri_Jun_27_17_54_32_CEST_2008.bad

Note: on 32-bit there's a yet-unfixed initrd corruption bug that I've
bisected back to:

| 510be56adc4bb9fb229637a8e89268d987263614 is first bad commit
| commit 510be56adc4bb9fb229637a8e89268d987263614
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date:   Tue Jun 24 04:10:47 2008 -0700
|
|    x86: introduce init_memory_mapping for 32bit

So if you see something like that, it's probably not a bug introduced by
your changes. (And maybe you'll see why the above commit is buggy; I
haven't figured it out yet.)

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: Nick Piggin <npiggin@...>, Mark McLoughlin <markmc@...>, xen-devel <xen-devel@...>, Eduardo Habkost <ehabkost@...>, Vegard Nossum <vegard.nossum@...>, Stephen Tweedie <sct@...>, <x86@...>, LKML <linux-kernel@...>, Yinghai Lu <yhlu.kernel@...>
Date: Friday, June 27, 2008 - 12:25 pm

Well, on a non-PSE system find_early_table_space() will not allocate 
enough memory for ptes.  But I posted the fix for that, and it's likely 
you're using PSE anyway.  Nothing pops out from a quick re-read, but it 
could easily be mis-reserving the ramdisk memory or something.
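To illustrate why the non-PSE case needs extra space, here is a rough sketch of an early page-table estimate. It assumes a simplified model (4KB pages, 8-byte entries, PUD and PMD pages covering [0, end), plus PTE pages only when 2MB mappings are unavailable); early_table_bytes() and the SK_ constants are hypothetical names, not the actual find_early_table_space() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SIZE  4096ULL
#define SK_PMD_SIZE   (SK_PAGE_SIZE * 512)	/* 2MB mapped per pmd entry */
#define SK_PUD_SIZE   (SK_PMD_SIZE * 512)	/* 1GB mapped per pud entry */
#define SK_ENTRY_SIZE 8ULL			/* bytes per pud/pmd/pte */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

static uint64_t round_up_page(uint64_t bytes)
{
	return div_round_up(bytes, SK_PAGE_SIZE) * SK_PAGE_SIZE;
}

/* Bytes of page-table memory needed to map physical memory [0, end). */
uint64_t early_table_bytes(uint64_t end, bool have_pse)
{
	uint64_t puds = div_round_up(end, SK_PUD_SIZE);
	uint64_t pmds = div_round_up(end, SK_PMD_SIZE);
	uint64_t tables = round_up_page(puds * SK_ENTRY_SIZE) +
			  round_up_page(pmds * SK_ENTRY_SIZE);

	if (!have_pse) {
		/* No 2MB pages: every 4KB page needs its own pte. */
		uint64_t ptes = div_round_up(end, SK_PAGE_SIZE);
		tables += round_up_page(ptes * SK_ENTRY_SIZE);
	}
	return tables;
}
```

For 4GB of RAM this model gives 20KB of tables with PSE but a little over 8MB without it, which is the kind of shortfall an estimate that ignores ptes would hit.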

    J
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>, Vegard Nossum <vegard.nossum@...>, Nick Piggin <npiggin@...>, Yinghai Lu <yhlu.kernel@...>, Arjan van de Ven <arjan@...>, Avi Kivity <avi@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, June 25, 2008 - 7:46 am

There's no inherent reason why Xen itself needs to be able to have all 
memory mapped at once.  32-bit Xen doesn't and can survive quite 
happily.  It's certainly nice to be able to access anything directly, 
but it's just a performance optimisation.  In practice, the guest 
generally has almost everything interesting mapped anyway, and Xen 
maintains a recursive mapping of the pagetable to make its access to the 
pagetable very efficient, so it's only when a hypercall is doing 
something to an unmapped page that there's an issue.

The main limitation the hole size imposes is the maximum size of the
machine-to-physical map.  That uses 8 bytes/page, and 256GB of space is
reserved for it, meaning that the current limit is 2^47 bytes of machine
memory - but there's another 256GB of reserved and unused space next to it, so that could be easily 
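The 2^47 figure follows directly from those numbers. A small sketch, assuming one 8-byte machine-to-physical entry per 4KB machine frame; m2p_max_machine_bytes() and the SK_ constants are hypothetical names used only for the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the message; one entry per 4KB frame is an assumption. */
#define SK_M2P_ENTRY_BYTES 8ULL
#define SK_M2P_RESERVED    (256ULL << 30)	/* 256GB = 2^38 bytes */
#define SK_FRAME_BYTES     4096ULL		/* 4KB = 2^12 bytes */

/* Machine memory (bytes) an M2P table of reserved_bytes can describe. */
uint64_t m2p_max_machine_bytes(uint64_t reserved_bytes)
{
	/* For 256GB: 2^38 / 2^3 = 2^35 entries, each covering 2^12 bytes. */
	uint64_t entries = reserved_bytes / SK_M2P_ENTRY_BYTES;
	return entries * SK_FRAME_BYTES;
}
```

If the adjacent, currently unused 256GB were also used for the table, the same arithmetic gives 2^48 bytes.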

OK.  Though it might be an idea to add "nopse" and start deprecating 

Oh, I didn't mean to include that one.  I think it's probably safe (from 
both the performance and correctness stands), but it's not necessary for 

Yep, thanks,

    J
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/entry_64.S |    4 ++--
 arch/x86/kernel/paravirt.c |    3 +++
 include/asm-x86/elf.h      |    2 +-
 include/asm-x86/paravirt.h |   10 ++++++++++
 include/asm-x86/system.h   |    3 ++-
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1080,7 +1080,7 @@
 	
        /* Reload gs selector with exception handling */
        /* edi:  new selector */ 
-ENTRY(load_gs_index)
+ENTRY(native_load_gs_index)
 	CFI_STARTPROC
 	pushf
 	CFI_ADJUST_CFA_OFFSET 8
@@ -1094,7 +1094,7 @@
 	CFI_ADJUST_CFA_OFFSET -8
         ret
 	CFI_ENDPROC
-ENDPROC(load_gs_index)
+ENDPROC(native_load_gs_index)
        
         .section __ex_table,"a"
         .align 8
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -331,6 +331,9 @@
 	.store_idt = native_store_idt,
 	.store_tr = native_store_tr,
 	.load_tls = native_load_tls,
+#ifdef CONFIG_X86_64
+	.load_gs_index = native_load_gs_index,
+#endif
 	.write_ldt_entry = native_write_ldt_entry,
 	.write_gdt_entry = native_write_gdt_entry,
 	.write_idt_entry = native_write_idt_entry,
diff --git a/include/asm-x86/elf.h b/include/asm-x86/elf.h
--- a/include/asm-x86/elf.h
+++ b/include/asm-x86/elf.h
@@ -83,9 +83,9 @@
 	(((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
 
 #include <asm/processor.h>
+#include <asm/system.h>
 
 #ifdef CONFIG_X86_32
-#include <asm/system.h>		/* for savesegment */
 #include &lt;asm/desc.h&gt;
 
 #define elf_check_arch(x)	elf_check_arch_ia32(x)
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -115,6 +115,9 @@
 	void (*set_ld...
To: Jeremy Fitzhardinge <jeremy@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 4:47 am

Patch logistics detail: the sign-off order suggests it was authored by
Eduardo, but there's no From line to that effect. Should I change it
accordingly?

	Ingo
--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 7:48 am

Yes, it's Eduardo's.  Huh, I have the From line here; must have got 
stripped off by my script...

    J

--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am

From: Eduardo Habkost <ehabkost@redhat.com>

We will need to set a pte on l3_user_pgt.  Extract set_pte_vaddr_pud()
from set_pte_vaddr(); the new function accepts the l3 page table as a
parameter.

This change should be a no-op for existing code.
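The refactoring only changes where the L3 (PUD) page comes from; the index arithmetic stays the same. Below is a user-space sketch of x86-64 four-level index extraction (9 bits per level above a 12-bit page offset), with a hypothetical sk_pud_entry() that mirrors the new "pud_page + pud_index(vaddr)" lookup. All sk_ names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging: 512 entries (9 bits) per level, 4KB pages. */
#define SK_PTRS_PER_TABLE 512
#define SK_PGDIR_SHIFT    39
#define SK_PUD_SHIFT      30
#define SK_PMD_SHIFT      21
#define SK_PAGE_SHIFT     12

static unsigned sk_index(uint64_t vaddr, unsigned shift)
{
	return (unsigned)((vaddr >> shift) & (SK_PTRS_PER_TABLE - 1));
}

unsigned sk_pgd_index(uint64_t v) { return sk_index(v, SK_PGDIR_SHIFT); }
unsigned sk_pud_index(uint64_t v) { return sk_index(v, SK_PUD_SHIFT); }
unsigned sk_pmd_index(uint64_t v) { return sk_index(v, SK_PMD_SHIFT); }
unsigned sk_pte_index(uint64_t v) { return sk_index(v, SK_PAGE_SHIFT); }

/* Toy PUD entry; a PUD page is an array of 512 of these. */
typedef struct { uint64_t val; } sk_pud_t;

/* The caller hands in the L3 page directly (e.g. a user-side L3 table),
 * instead of deriving it from the kernel pgd as set_pte_vaddr() does. */
sk_pud_t *sk_pud_entry(sk_pud_t *pud_page, uint64_t vaddr)
{
	return pud_page + sk_pud_index(vaddr);
}
```

For example, 0xffff880000000000 (the new __PAGE_OFFSET) falls in PGD slot 272, and 0xffffffff80000000 lands in PGD slot 511, PUD slot 510.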

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/mm/init_64.c        |   31 ++++++++++++++++++++-----------
 include/asm-x86/pgtable_64.h |    3 +++
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -149,22 +149,13 @@
 }
 
 void
-set_pte_vaddr(unsigned long vaddr, pte_t new_pte)
+set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)
 {
-	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
-	pr_debug("set_pte_vaddr %lx to %lx\n", vaddr, native_pte_val(new_pte));
-
-	pgd = pgd_offset_k(vaddr);
-	if (pgd_none(*pgd)) {
-		printk(KERN_ERR
-			"PGD FIXMAP MISSING, it should be setup in head.S!\n");
-		return;
-	}
-	pud = pud_offset(pgd, vaddr);
+	pud = pud_page + pud_index(vaddr);
 	if (pud_none(*pud)) {
 		pmd = (pmd_t *) spp_getpage();
 		pud_populate(&init_mm, pud, pmd);
@@ -195,6 +186,24 @@
 	 * (PGE mappings get flushed as well)
 	 */
 	__flush_tlb_one(vaddr);
+}
+
+void
+set_pte_vaddr(unsigned long vaddr, pte_t pteval)
+{
+	pgd_t *pgd;
+	pud_t *pud_page;
+
+	pr_debug("set_pte_vaddr %lx to %lx\n", vaddr, native_pte_val(pteval));
+
+	pgd = pgd_offset_k(vaddr);
+	if (pgd_none(*pgd)) {
+		printk(KERN_ERR
+			"PGD FIXMAP MISSING, it should be setup in head.S!\n");
+		return;
+	}
+	pud_page = (pud_t*)pgd_page_vaddr(*pgd);
+	set_pte_vaddr_pud(pud_page, vaddr, pteval);
 }
 
 /*
diff --git a/include/asm-x86/pgtable_64.h b/include/asm-x86/pgtable_64.h
--- a/include/asm-x86/pgtable_64.h
+++ b/include/asm-x86/pgtable_64.h
...
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am

64-bit Xen pushes a couple of extra words onto an exception frame.
Add a hook to deal with them.
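The shape of such a hook can be sketched in plain C: an ops table whose default entry does nothing and which a hypervisor port overrides. This is only an analogy using hypothetical sk_ names. The real pv_irq_ops entry is reached from assembly in entry_64.S via PARAVIRT_ADJUST_EXCEPTION_FRAME, and the hypervisor-style variant here simply skips two words rather than modelling what Xen's actual handler does with them.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: sp points at the top word; x86 stacks grow down, so
 * popping one word means sp += 1. */
struct sk_frame {
	uint64_t *sp;
};

/* Hypothetical ops table with one hook, in the style of pv_irq_ops. */
struct sk_pv_ops {
	void (*adjust_exception_frame)(struct sk_frame *f);
};

/* Native default: the hardware-built frame needs no adjustment. */
static void sk_nop(struct sk_frame *f)
{
	(void)f;
}

/* Hypervisor-style override: skip two extra words pushed on top of
 * the hardware frame. */
static void sk_hv_adjust(struct sk_frame *f)
{
	f->sp += 2;
}

const struct sk_pv_ops sk_native_ops = { .adjust_exception_frame = sk_nop };
const struct sk_pv_ops sk_hv_ops = { .adjust_exception_frame = sk_hv_adjust };

/* Entry stub: normalise the frame first, then continue as native code
 * would (push the error code, save registers, ...). Returns the new sp. */
uint64_t *sk_exception_entry(const struct sk_pv_ops *ops, struct sk_frame *f)
{
	ops->adjust_exception_frame(f);
	return f->sp;
}
```

Keeping the native entry a no-op means the common path pays only for the hook site, while the hypervisor port supplies its own fix-up.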

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/asm-offsets_64.c |    1 +
 arch/x86/kernel/entry_64.S       |    2 ++
 arch/x86/kernel/paravirt.c       |    3 +++
 arch/x86/xen/enlighten.c         |    3 +++
 include/asm-x86/paravirt.h       |    9 +++++++++
 include/asm-x86/processor.h      |    2 ++
 6 files changed, 20 insertions(+)

diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -61,6 +61,7 @@
 	OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops);
 	OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
 	OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable);
+	OFFSET(PV_IRQ_adjust_exception_frame, pv_irq_ops, adjust_exception_frame);
 	OFFSET(PV_CPU_iret, pv_cpu_ops, iret);
 	OFFSET(PV_CPU_nmi_return, pv_cpu_ops, nmi_return);
 	OFFSET(PV_CPU_usergs_sysret32, pv_cpu_ops, usergs_sysret32);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -864,6 +864,7 @@
  */ 		
 	.macro zeroentry sym
 	INTR_FRAME
+	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq $0	/* push error code/oldrax */ 
 	CFI_ADJUST_CFA_OFFSET 8
 	pushq %rax	/* push real oldrax to the rdi slot */ 
@@ -876,6 +877,7 @@
 
 	.macro errorentry sym
 	XCPT_FRAME
+	PARAVIRT_ADJUST_EXCEPTION_FRAME
 	pushq %rax
 	CFI_ADJUST_CFA_OFFSET 8
 	CFI_REL_OFFSET rax,0
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -298,6 +298,9 @@
 	.irq_enable = native_irq_enable,
 	.safe_halt = native_safe_halt,
 	.halt = native_halt,
+#ifdef CONFIG_X86_64
+	.adjust_exception_frame = paravirt_nop,
+#endif
 };
 
 struct pv_cpu_ops pv_cpu_ops = {
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/...
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am

This removes a pile of buggy open-coded implementations of savesegment
and loadsegment.

(They are buggy because they don't have memory barriers to prevent
them from being reordered with respect to memory accesses.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/cpu/common_64.c |    3 ++-
 arch/x86/kernel/process_64.c    |   28 +++++++++++++++-------------
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/common_64.c b/arch/x86/kernel/cpu/common_64.c
--- a/arch/x86/kernel/cpu/common_64.c
+++ b/arch/x86/kernel/cpu/common_64.c
@@ -480,7 +480,8 @@
 	struct x8664_pda *pda = cpu_pda(cpu);
 
 	/* Setup up data that may be needed in __get_free_pages early */
-	asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
+	loadsegment(fs, 0);
+	loadsegment(gs, 0);
 	/* Memory clobbers used to order PDA accessed */
 	mb();
 	wrmsrl(MSR_GS_BASE, pda);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -362,10 +362,10 @@
 	p->thread.fs = me->thread.fs;
 	p->thread.gs = me->thread.gs;
 
-	asm("mov %%gs,%0" : "=m" (p->thread.gsindex));
-	asm("mov %%fs,%0" : "=m" (p->thread.fsindex));
-	asm("mov %%es,%0" : "=m" (p->thread.es));
-	asm("mov %%ds,%0" : "=m" (p->thread.ds));
+	savesegment(gs, p->thread.gsindex);
+	savesegment(fs, p->thread.fsindex);
+	savesegment(es, p->thread.es);
+	savesegment(ds, p->thread.ds);
 
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
@@ -404,7 +404,9 @@
 void
 start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 {
-	asm volatile("movl %0, %%fs; movl %0, %%es; movl %0, %%ds" :: "r"(0));
+	loadsegment(fs, 0);
+	loadsegment(es, 0);
+	loadsegment(ds, 0);
 	load_gs_index(0);
 	regs->ip		= new_ip;
 	regs->sp		= new_sp;
@@ -591,11 +593,...
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am

Because Xen doesn't support PSE mappings in guests, all code which
assumed the presence of PSE has been changed to fall back to smaller
mappings if necessary.  As a result, PSE is optional rather than
required (though still used wherever possible).
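Each NEED_* macro contributes one bit to a word of the required-features mask, so setting NEED_PSE to 0 simply drops bit 3 (PSE in CPUID leaf 1 EDX) from word 0. A sketch with hypothetical SK_ names that models only the four word-0 bits visible in this patch's diff; the real mask contains other bits as well.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 EDX bit numbers (word 0 of the kernel's feature words). */
#define SK_FEATURE_PSE   3
#define SK_FEATURE_MSR   5
#define SK_FEATURE_PGE   13
#define SK_FEATURE_FXSR  24

/* Same construction as NEED_*: one bit within a 32-bit word. */
#define SK_BIT(f) (1u << ((f) & 31))

/* Word-0 requirements before and after the patch (other bits omitted). */
#define SK_REQUIRED_OLD (SK_BIT(SK_FEATURE_PSE) | SK_BIT(SK_FEATURE_MSR) | \
			 SK_BIT(SK_FEATURE_PGE) | SK_BIT(SK_FEATURE_FXSR))
#define SK_REQUIRED_NEW (SK_BIT(SK_FEATURE_MSR) | SK_BIT(SK_FEATURE_PGE) | \
			 SK_BIT(SK_FEATURE_FXSR))

/* Boot-time style check: every required bit must be present in edx. */
int sk_cpu_has_required(uint32_t edx, uint32_t required)
{
	return (edx & required) == required;
}
```

A CPU (or Xen guest) that reports MSR, PGE and FXSR but not PSE fails the old check and passes the new one.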

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/asm-x86/required-features.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-x86/required-features.h b/include/asm-x86/required-features.h
--- a/include/asm-x86/required-features.h
+++ b/include/asm-x86/required-features.h
@@ -42,7 +42,7 @@
 #endif
 
 #ifdef CONFIG_X86_64
-#define NEED_PSE	(1<<(X86_FEATURE_PSE & 31))
+#define NEED_PSE	0
 #define NEED_MSR	(1<<(X86_FEATURE_MSR & 31))
 #define NEED_PGE	(1<<(X86_FEATURE_PGE & 31))
 #define NEED_FXSR	(1<<(X86_FEATURE_FXSR & 31))


--
To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, <x86@...>, xen-devel <xen-devel@...>, Stephen Tweedie <sct@...>, Eduardo Habkost <ehabkost@...>, Mark McLoughlin <markmc@...>
Date: Wednesday, June 25, 2008 - 12:19 am