We still need to pin PTEs, even if there are no PTE locks. Otherwise we'll BUG whenever there aren't PTE locks (i.e. whenever NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS), as we try to unpin PTEs which were never pinned in the first place.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/xen/mmu.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f5af913..1239bda 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -819,9 +819,10 @@ static int xen_pin_page(struct page *page, enum pt_level level)
pfn_pte(pfn, PAGE_KERNEL_RO),
level == PT_PGD ? UVMF_TLB_FLUSH : 0);
- if (ptl) {
+ if (level == PT_PTE)
xen_do_pin(MMUEXT_PIN_L1_TABLE, pfn);
+ if (ptl) {
/* Queue a deferred unlock for when this batch
is completed. */
xen_mc_callback(xen_pte_unlock, ptl);
@@ -924,9 +925,7 @@ static int xen_unpin_page(struct page *page, enum pt_level level)
*/
if (level == PT_PTE) {
ptl = xen_pte_lock(page);
-
- if (ptl)
- xen_do_pin(MMUEXT_UNPIN_TABLE, pfn);
+ xen_do_pin(MMUEXT_UNPIN_TABLE, pfn);
}
mcs = __xen_mc_entry(0);
--
1.5.4.3
--
applied to tip/x86/xen, thanks Alex. Ingo --
Where does the unpin happen? xen_unpin_page() also checks to see if it took the lock before trying to unpin, symmetric with xen_pin_page(). --
Here's the backtrace of the BUG() the patch addresses. Now you've
pointed it out - I see the asymmetry - and also suspect some ptes are
being left pinned.
I'm having trouble finding a cleaner solution which solves this but
doesn't incite more BUGs.
Perhaps you have an idea?
- Alex
------------[ cut here ]------------
kernel BUG at
/local/scratch/hotplug.linux.trees.git/arch/x86/xen/enlighten.c:847!
invalid opcode: 0000 [#1]
Modules linked in:
Pid: 1, comm: init Tainted: G W (2.6.27-rc5-tip #352)
EIP: 0061:[<c10038fe>] EFLAGS: 00010282 CPU: 0
EIP is at pin_pagetable_pfn+0x3f/0x4b
EAX: ffffffea EBX: df82bd7c ECX: 00000001 EDX: 00000000
ESI: 00007ff0 EDI: c1a579e0 EBP: df82bd94 ESP: df82bd7c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: e021
Process init (pid: 1, ti=df82a000 task=df82c000 task.ti=df82a000)
Stack: 00000004 0014a0a1 00000000 0001fb4f c1a579e0 c1a579e0 df82bda4
c1003dd3
003f69e0 c157e024 df82bdac c1003e0e df82bdc8 c101b577 00000000
00000000
003f69e0 c1661000 dfb4e000 df82be28 c1062d35 1fb4f067 00000000
dfb4e000
Call Trace:
[<c1003dd3>] ? xen_release_ptpage+0x61/0x80
[<c1003e0e>] ? xen_release_pte+0xd/0xf
[<c101b577>] ? __pte_free_tlb+0x46/0x5f
[<c1062d35>] ? free_pgd_range+0x1dc/0x391
[<c10794e5>] ? setup_arg_pages+0x1b8/0x22b
[<c109b3cb>] ? load_elf_binary+0x3f1/0x10c6
[<c1062206>] ? get_user_pages+0x316/0x394
[<c1078bef>] ? get_arg_page+0x2c/0x7e
[<c1078bc1>] ? put_arg_page+0x8/0xa
[<c1078d97>] ? copy_strings+0x156/0x160
[<c1078e51>] ? search_binary_handler+0x7b/0x1c0
[<c1079e20>] ? do_execve+0x13a/0x1c4
[<c1007164>] ? sys_execve+0x29/0x4b
[<c1008a2e>] ? syscall_call+0x7/0xb
[<c100b11f>] ? kernel_execve+0x17/0x1c
[<c1003140>] ? run_init_process+0x17/0x19
[<c10031e8>] ? init_post+0xa6/0xf6
[<c100965f>] ? kernel_thread_helper+0x7/0x10
=======================
--
Right, I see. We shouldn't be pinning ptes on attachment in xen_alloc_ptpage() if we're not using split pte locks. --
Hmm ok. I'll search further for a solution which fixes things more officially (and deals with the CONFIG_SMP && (NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS) case, which I just realized is also broken, and isn't fixed by this patch). In the meantime, at least UP is now usable. - Alex --
I've got a patch here which fixes it. Will post shortly.
J
--
Define USE_SPLIT_PTLOCKS as a constant expression rather than repeating
"NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS" all over the place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/xen/mmu.c | 2 +-
include/linux/mm.h | 6 +++---
include/linux/mm_types.h | 10 ++++++----
include/linux/sched.h | 6 +++---
4 files changed, 13 insertions(+), 11 deletions(-)
===================================================================
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -805,7 +805,7 @@
{
spinlock_t *ptl = NULL;
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+#if USE_SPLIT_PTLOCKS
ptl = __pte_lockptr(page);
spin_lock_nest_lock(ptl, &mm->page_table_lock);
#endif
===================================================================
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -914,7 +914,7 @@
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+#if USE_SPLIT_PTLOCKS
/*
* We tuck a spinlock to guard each pagetable page into its struct page,
* at page->private, with BUILD_BUG_ON to make sure that this will not
@@ -927,14 +927,14 @@
} while (0)
#define pte_lock_deinit(page) ((page)->mapping = NULL)
#define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
-#else
+#else /* !USE_SPLIT_PTLOCKS */
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
#define pte_lock_init(page) do {} while (0)
#define pte_lock_deinit(page) do {} while (0)
#define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
-#endif /* NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS */
+#endif /* USE_SPLIT_PTLOCKS */
static inline void pgtable_page_ctor(struct page *page)
{
===================================================================
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -21,11 +21,13 @@
struct address_space;
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+#define ...Acked-by: Hugh Dickins <hugh@veritas.com> --
We only pin PTE pages when using split PTE locks, so don't do the
pin/unpin when attaching/detaching pte pages to a pinned pagetable.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/xen/enlighten.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -864,7 +864,7 @@
if (!PageHighMem(page)) {
make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));
- if (level == PT_PTE)
+ if (level == PT_PTE && USE_SPLIT_PTLOCKS)
pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
} else
/* make sure there are no stray mappings of
@@ -932,7 +932,7 @@
if (PagePinned(page)) {
if (!PageHighMem(page)) {
- if (level == PT_PTE)
+ if (level == PT_PTE && USE_SPLIT_PTLOCKS)
pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);
make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
}
--
On Tue, 09 Sep 2008 15:43:25 -0700 What are the effects of the bug which you fixed? Do you consider this to be 2.6.27 material? 2.6.26.x? 2.6.25.x? --
This is a fix for a bug which only exists in tip.git.
J
--
instead of putting it into the x86 tree i've split out the mm patch into
a separate, -git based isolated topic branch (tip/core/xen) - the
coordinates for it are below. The commit ID is:
4ef2d41: mm: define USE_SPLIT_PTLOCKS rather than repeating expression
Andrew, any objection to this approach? If it's fine to you then after
testing this commit will eventually go to linux-next via
tip/auto-core-next. (the x86 tree merges this tree and i've applied the
Xen fix after that merge.)
Ingo
----------------------------------------------->
the latest tip/core/xen topic tree is at:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git core/xen
------------------>
Jeremy Fitzhardinge (1):
mm: define USE_SPLIT_PTLOCKS rather than repeating expression
arch/x86/xen/mmu.c | 2 +-
include/linux/mm.h | 6 +++---
include/linux/mm_types.h | 10 ++++++----
include/linux/sched.h | 6 +++---
4 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index aa37469..2e1b640 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -646,7 +646,7 @@ static spinlock_t *lock_pte(struct page *page)
{
spinlock_t *ptl = NULL;
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+#if USE_SPLIT_PTLOCKS
ptl = __pte_lockptr(page);
spin_lock(ptl);
#endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 72a15dc..4194bf8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -919,7 +919,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
+#if USE_SPLIT_PTLOCKS
/*
* We tuck a spinlock to guard each pagetable page into its struct page,
* at page->private, with BUILD_BUG_ON to make sure that this will not
@@ -932,14 +932,14 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
} while (0)
#define ...Cheers Jeremy - a better solution. Worth mentioning - this doesn't exist peacefully alongside my patch 965fe78fe Ingo - could you revert that please? Thanks, - Alex --
Yes. Also, could you wait for an Ack from me before committing anything
Xen related? And Alex, could you be sure to explicitly cc: me on Xen
patches?
Thanks,
J
--
yes - that looked odd and un-ack-ed anyway - but i applied it [it's at the tail of x86/xen and easy to zap] because it solved a BUG_ON(). Btw., i suggested a different workflow to you in the past: please run Xen patches by Jeremy and ask him for Acks and Cc: him - thanks. Ingo --
