* Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:hm, that's one dangerous looking patch. (Cc:-ed more MM folks. I've attached the patch below for reference.) the purpose of the fix itself seems to make some sense - we dont want mprotect() change the PAT bits in the pte from what they got populated with at fault or mmap time. the pte_modify() change looks correct at first sight. The _PAGE_PROT_PRESERVE_BITS solution looks a bit ugly (although we do have a couple of other similar #ifndefs in the MM already). at minimum we should add vm_get_page_prot_preserve() as an inline function to mm.h if _PAGE_PROT_PRESERVE_BITS is not defined, and make it call vm_get_page_prot(). Also, vm_get_page_prot() in mm/mmap.c should probably be marked inline so that we'll have only a single function call [vm_get_page_prot() is trivial]. but i'm wondering why similar issues never came up on other architectures - i thought it would be rather common to have immutable pte details. So maybe i'm missing something here ... Ingo -------------------------------------> Subject: generic, x86, PAT: fix mprotect From: Venki Pallipadi <venkatesh.pallipadi@intel.com> Date: Tue, 6 May 2008 15:42:40 -0700 There is a hole in mprotect, which lets the user to change the page cache type bits by-passing the kernel reserve_memtype and free_memtype wrappers. Fix the hole by not letting mprotect change the PAT bits. Some versions of X used the mprotect hole to change caching type from UC to WB, so that it can then use mtrr to program WC for that region [1]. Change the mmap of pci space through /sys or /proc interfaces from UC to UC_MINUS. With this change, X will not need to use mprotect hole to get WC type. [1] lkml.org/lkml/2008/4/16/369 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/pci/i386.c | 4 +--- include/asm-x86/pgtable.h | 5 ++++- include/linux/mm.h | 1 + mm/mmap.c | 13 +++++++++++++ mm/mprotect.c | 4 +++- 5 files changed, 22 insertions(+), 5 deletions(-) Index: linux-x86.q/arch/x86/pci/i386.c =================================================================== --- linux-x86.q.orig/arch/x86/pci/i386.c +++ linux-x86.q/arch/x86/pci/i386.c @@ -301,15 +301,13 @@ int pci_mmap_page_range(struct pci_dev * prot = pgprot_val(vma->vm_page_prot); if (pat_wc_enabled && write_combine) prot |= _PAGE_CACHE_WC; - else if (pat_wc_enabled) + else if (pat_wc_enabled || boot_cpu_data.x86 > 3) /* * ioremap() and ioremap_nocache() defaults to UC MINUS for now. * To avoid attribute conflicts, request UC MINUS here * aswell. */ prot |= _PAGE_CACHE_UC_MINUS; - else if (boot_cpu_data.x86 > 3) - prot |= _PAGE_CACHE_UC; vma->vm_page_prot = __pgprot(prot); Index: linux-x86.q/include/asm-x86/pgtable.h =================================================================== --- linux-x86.q.orig/include/asm-x86/pgtable.h +++ linux-x86.q/include/asm-x86/pgtable.h @@ -66,6 +66,8 @@ #define _PAGE_CACHE_UC_MINUS (_PAGE_PCD) #define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT) +#define _PAGE_PROT_PRESERVE_BITS (_PAGE_CACHE_MASK) + #define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED) #define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ _PAGE_ACCESSED | _PAGE_NX) @@ -289,7 +291,8 @@ static inline pte_t pte_modify(pte_t pte * Chop off the NX bit (if present), and add the NX portion of * the newprot (if present): */ - val &= _PAGE_CHG_MASK & ~_PAGE_NX; + /* We also preserve PAT bits from existing pte */ + val &= (_PAGE_CHG_MASK | _PAGE_PROT_PRESERVE_BITS) & ~_PAGE_NX; val |= pgprot_val(newprot) & __supported_pte_mask; return __pte(val); Index: linux-x86.q/include/linux/mm.h =================================================================== --- linux-x86.q.orig/include/linux/mm.h +++ linux-x86.q/include/linux/mm.h @@ -1177,6 +1177,7 @@ static inline unsigned long vma_pages(st } pgprot_t vm_get_page_prot(unsigned long vm_flags); +pgprot_t vm_get_page_prot_preserve(unsigned long vm_flags, pgprot_t oldprot); struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); Index: linux-x86.q/mm/mmap.c =================================================================== --- linux-x86.q.orig/mm/mmap.c +++ linux-x86.q/mm/mmap.c @@ -77,6 +77,19 @@ pgprot_t vm_get_page_prot(unsigned long } EXPORT_SYMBOL(vm_get_page_prot); +#ifndef _PAGE_PROT_PRESERVE_BITS +#define _PAGE_PROT_PRESERVE_BITS 0 +#endif + +pgprot_t vm_get_page_prot_preserve(unsigned long vm_flags, pgprot_t oldprot) +{ + pteval_t newprotval = pgprot_val(oldprot); + + newprotval &= _PAGE_PROT_PRESERVE_BITS; + newprotval |= pgprot_val(vm_get_page_prot(vm_flags)); + return __pgprot(newprotval); +} + int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */ int sysctl_overcommit_ratio = 50; /* default is 50% */ int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; Index: linux-x86.q/mm/mprotect.c =================================================================== --- linux-x86.q.orig/mm/mprotect.c +++ linux-x86.q/mm/mprotect.c @@ -192,7 +192,9 @@ success: * held in write mode. */ vma->vm_flags = newflags; - vma->vm_page_prot = vm_get_page_prot(newflags); + vma->vm_page_prot = vm_get_page_prot_preserve(newflags, + vma->vm_page_prot); + if (vma_wants_writenotify(vma)) { vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED); dirty_accountable = 1; --
| Pardo | Re: pthread_create() slow for many threads; also time to revisit 64b context switc... |
| Paul Jackson | Inquiry: Should we remove "isolcpus= kernel boot option? (may have realtime uses) |
| Srivatsa Vaddagiri | Re: [PATCH, RFC] reimplement flush_workqueue() |
| Peter Zijlstra | Re: Btrfs v0.16 released |
git: | |
| Giuseppe Bilotta | Re: gitweb and remote branches |
| Miklos Vajna | [rfc] git submodules howto |
| JD Guzman | C# Git Implementation |
| Junio C Hamano | Re: [PATCH] fix parallel make problem |
| Richard Stallman | Real men don't attack straw men |
| Steve B | SSH brute force attacks no longer being caught by PF rule |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Marius ROMAN | 1440x900 resolution problem |
| Tomasz Grobelny | [PATCH 0/5] [DCCP]: Queuing policies |
| Dushan Tcholich | Re: ksoftirqd high cpu load on kernels 2.6.24 to 2.6.27-rc1-mm1 |
| John Heffner | Re: A Linux TCP SACK Question |
| Denys Fedoryshchenko | Re: Could you make vconfig less stupid? |
