x86: fix endless page faults in mount_block_root for Linux 2.6

From: Linux Kernel Mailing List
Date: Thursday, June 12, 2008 - 1:59 pm

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b29c70...
Commit:     b29c701deacd5d24453127c37ed77ef851c53b8b
Parent:     3703f39965a197ebd91743fc38d0f640606b8da3
Author:     Henry Nestler <henry.nestler@gmail.com>
AuthorDate: Mon May 12 15:44:39 2008 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu Jun 12 21:26:07 2008 +0200

    x86: fix endless page faults in mount_block_root for Linux 2.6
    
    Page faults in the kernel address space between PAGE_OFFSET and
    VMALLOC_START should not be handled as vmalloc faults.
    
    This fixes rare endless page faults inside mount_block_root when
    mounting the root filesystem at boot time.
    
    All 32-bit kernels up to 2.6.25 can fall into this hole.
    I cannot reproduce this on a native Linux kernel. The 64-bit code has
    already fixed the problem, so I copied the same lines into the 32-bit part.
    
    The recorded debug logs are from coLinux kernel 2.6.22.18 (virtualisation):
    http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recu...
    Physical memory was trimmed down to 192MB to catch the bug more easily.
    With more memory the bug occurs more rarely.
    
    Details of how every 32-bit x86 system can fail:
    
    Start from "mount_block_root",
    http://lxr.linux.no/linux/init/do_mounts.c#L297
    There the variable "fs_names" gets one memory page of 4096 bytes.
    The variable "p" walks through the existing file system types. The first
    string is no problem.
    But in the second loop iteration in mount_block_root, "p" no longer
    points to the beginning of the page. The offset is, for example, +9 if
    "reiserfs" is the first in the list.
    It then calls do_mount_root and lands in sys_mount.
    Remember: the variable "type_page" now contains "fs_names+9" and no
    longer points to a full page.
    sys_mount copies 4096 bytes with the function "exact_copy_from_user()":
    http://lxr.linux.no/linux/fs/namespace.c#L1540
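    
    To illustrate the arithmetic above, here is a minimal userspace sketch
    (not kernel code). It assumes 4096-byte pages and a hypothetical buffer
    address 0xc03b3000, chosen only so that the following page starts at
    0xc03b4000. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on 32-bit x86 without large pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Offset of addr inside the page that contains it. */
uint32_t page_offset_of(uint32_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* True if reading len bytes starting at addr touches the following page. */
bool copy_crosses_page(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    return page_offset_of(addr) + len > SKETCH_PAGE_SIZE;
}

/* Address of the first byte that lies beyond the page containing addr. */
uint32_t first_byte_in_next_page(uint32_t addr)
{
    return (addr & ~(SKETCH_PAGE_SIZE - 1)) + SKETCH_PAGE_SIZE;
}
```

    With the assumed base, copy_crosses_page(0xc03b3000, 4096) is false for
    the first entry, while copy_crosses_page(0xc03b3009, 4096) is true for
    the second one, and the first byte read outside the buffer page is at
    0xc03b4000.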
    
    In most cases the page after the buffer is mapped, so reading up to
    "fs_names+4096+9" does not invoke the page fault handler. No problem.
    
    If the page starting at "fs_names+4096" is not mapped, the page
    fault handler is called from http://lxr.linux.no/linux/fs/namespace.c#L1320
    
    do_page_fault gets the address 0xc03b4000.
    It is a kernel address (address >= TASK_SIZE), but not from vmalloc! It
    comes from "__getname()", alias "kmem_cache_alloc".
    The "error_code" is 0, so "vmalloc_fault" is called:
    http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332
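    
    The dispatch decision described here can be modelled roughly as below.
    This is a userspace sketch under assumptions, not the kernel source:
    TASK_SIZE is assumed to be 0xc0000000 (the default 3G/1G split), the
    error-code bits follow the x86 page-fault error code layout, and the
    function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default 3G/1G split, so kernel space starts at 0xc0000000. */
#define SKETCH_TASK_SIZE 0xc0000000u

/* x86 page-fault error code bits (bit 1, write access, is not checked). */
#define SKETCH_PF_PROT 0x1u  /* fault on a present page (protection) */
#define SKETCH_PF_USER 0x4u  /* fault raised in user mode */
#define SKETCH_PF_RSVD 0x8u  /* reserved bit set in a page-table entry */

/*
 * True if a fault with this address and error code would be handed to
 * the vmalloc path: a kernel address, and a not-present fault taken in
 * kernel mode without a reserved-bit violation.
 */
bool reaches_vmalloc_fault(uint32_t address, uint32_t error_code)
{
    if (address < SKETCH_TASK_SIZE)
        return false;
    return (error_code &
            (SKETCH_PF_PROT | SKETCH_PF_USER | SKETCH_PF_RSVD)) == 0;
}
```

    For the fault in this report, reaches_vmalloc_fault(0xc03b4000, 0) is
    true, although the address was never allocated by vmalloc.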
    
    "vmalloc_fault" tried to find the physical page for a non-existent
    virtual memory area. The macro "pte_present" in vmalloc_fault()
    triggered another page fault, for 0xc0000ed0, at:
    http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282
    
    No PTE exists for such a virtual address. The page fault handler was
    trying to sync the physical page for the PTE lookup.
    
    This called vmalloc_fault() again for address 0xc000000, which was also
    not mapped. The endless loop began...
    
    In the normal case the CPU would loop forever with interrupts disabled.
    Under coLinux this was caught by a stack overflow inside the printk
    debug output.
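    
    As a rough illustration of how the added range check stops the chain
    described above, here is a userspace sketch. The bounds are
    hypothetical placeholders (the real VMALLOC_START and VMALLOC_END
    depend on the memory size and kernel configuration), and
    sketch_vmalloc_fault only models the early return, not the page-table
    synchronisation.

```c
#include <stdint.h>

/* Hypothetical bounds for illustration only; not taken from a real boot. */
#define SKETCH_VMALLOC_START 0xcc800000u
#define SKETCH_VMALLOC_END   0xff7fe000u

/*
 * Models the guard added by the patch: returns -1 for any address
 * outside the vmalloc range, so the caller falls through to its normal
 * error handling instead of touching page tables for that address.
 * Returns 0 where the real handler would go on to sync page tables.
 */
int sketch_vmalloc_fault(uint32_t address)
{
    if (!(address >= SKETCH_VMALLOC_START && address < SKETCH_VMALLOC_END))
        return -1;

    /* The real handler would synchronise the page tables here. */
    return 0;
}
```

    With these assumed bounds, sketch_vmalloc_fault(0xc03b4000) and
    sketch_vmalloc_fault(0xc0000ed0) both return -1, so neither address
    can start the chain of nested faults.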
    
    Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/fault.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd7e179..8bcb6f4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -497,6 +497,11 @@ static int vmalloc_fault(unsigned long address)
 	unsigned long pgd_paddr;
 	pmd_t *pmd_k;
 	pte_t *pte_k;
+
+	/* Make sure we are in vmalloc area */
+	if (!(address >= VMALLOC_START && address < VMALLOC_END))
+		return -1;
+
 	/*
 	 * Synchronize this task's top level page-table
 	 * with the 'reference' page table.