I think that I've seen this too on 2.6.26-rc8 on my laptop. It's 64-bit AMD-64 single core (although I run a SMP-alternatives kernel on it) and I was using the nv driver. The symptoms are the same. Everything was working great until I started X. I SSH'd in and got the dmesg after X locked up. The full dmesg and config are attached because Thunderbird is a very stupid program and won't let me paste without wrapping. Here is a bit of what I got though. [ 269.224276] BUG: unable to handle kernel paging request at ffffe20003480000 [ 269.224291] IP: [<ffffffff80297e30>] copy_page_range+0x520/0x760 [ 269.224306] PGD 1102067 PUD 1103067 PMD 0 [ 269.224315] Oops: 0000 [1] SMP [ 269.224473] Call Trace: [ 269.224495] [<ffffffff80239a9b>] dup_mm+0x26b/0x3c0 [ 269.224507] [<ffffffff8023a84c>] copy_process+0xc2c/0x1210 [ 269.224518] [<ffffffff8023aea3>] do_fork+0x73/0x310 [ 269.224526] [<ffffffff8024719e>] sys_rt_sigaction+0x8e/0xd0 [ 269.224536] [<ffffffff8020c2db>] system_call_after_swapgs+0x7b/0x80 [ 269.224542] [<ffffffff8020c5d7>] ptregscall_common+0x67/0xb0
which is enough to identify the oops as in copy_one_pte's get_page(page). Here's a patch I think we need, which I'm hoping will fix both your crashes - please let us know - thanks a lot. Stop mprotect's pte_modify from wiping out the x86 pte_special bit, which caused oops thereafter when vm_normal_page thought X's abnormal was normal. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- Fix to 2.6.26-rc8-mm1 x86-implement-pte_special.patch Perhaps something similar needed for powerpc? Nick will know. include/asm-x86/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- 2.6.26-rc8-mm1/include/asm-x86/pgtable.h 2008-07-03 11:34:55.000000000 +0100 +++ linux/include/asm-x86/pgtable.h 2008-07-04 20:58:36.000000000 +0100 @@ -57,7 +57,7 @@ /* Set of bits not changed in pte_modify */ #define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_ACCESSED | _PAGE_DIRTY) + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY) #define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) #define _PAGE_CACHE_WB (0) --
[cut patch] I can report that this fixes things for me as well. Thank you. --
(back from vaccation) Great, thanks Hugh. Always nice to read the fix before the bug report :) powerpc should need it too, as well as an equivalent for s390. Thanks, --
Here is the required fix for powerpc-implement-pte_special.patch
I think we need a similar fix for s390 too. If so, then it really should get into 2.6.26, but this late in the release, I hope an s390 maintainer might be able to test and verify the fix? Thanks, Nick
Wow, yes, I hadn't realized s390 is ahead of the game there: glad you're
back to spot that. But yes, we'd prefer maintainer to confirm and push.
[PATCH]] s390: protect _PAGE_SPECIAL bit against mprotect
Stop mprotect's pte_modify from wiping out the s390 pte_special bit, which
caused oops thereafter when vm_normal_page thought X's abnormal was normal.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
---
Index: linux-2.6/include/asm-s390/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-s390/pgtable.h
+++ linux-2.6/include/asm-s390/pgtable.h
@@ -223,6 +223,9 @@ extern char empty_zero_page[PAGE_SIZE];
#define _PAGE_SPECIAL 0x004 /* SW associated with special page */
#define __HAVE_ARCH_PTE_SPECIAL
+/* Set of bits not changed in pte_modify */
+#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_SPECIAL)
+
/* Six different types of pages. */
#define _PAGE_TYPE_EMPTY 0x400
#define _PAGE_TYPE_NONE 0x401
@@ -681,7 +684,7 @@ static inline void pte_clear(struct mm_s
*/
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
- pte_val(pte) &= PAGE_MASK;
+ pte_val(pte) &= _PAGE_CHG_MASK;
pte_val(pte) |= pgprot_val(newprot);
return pte;
}
--
s390 will definitely need this. Loosing the pte special bit is not good. I'll prepate a please pull right away. Thanks of pointing this out. -- blue skies, Martin --
Thanks, I feel silly to take the authorship of this before your x86 version gets in (and will likely not be credited if it is folded before merging) Martin, could you please credit Hugh for the debugging? :) --
Oh, I'm more anxious for your fix to get to Linus than for credit to be fairly apportioned - we don't need an Oscar ceremony here! But if Martin does choose to add credits, then it's Ryan's and Zan's x86 reports that should be credited - they did the hard work. I didn't mention them in the x86 patch because at that stage I was just making a wild guess that this might be the cause of their problems. Hugh --
I've done my best to combine mprotect, mmap and munmap a MAP_PRIVATE mapping on a xip file system. The system runs stable with and without this patch. Could someone please enlighten me on how to reproduce the problem so that I can verify the fix? --
I guess you mean the xserver? See, we don't have PCI and thus don't have any graphics hardware - it's a server platform ;-). --
Well it was only the new xorg-server that caused the bug for me. I can't reproduce it else where.... --
Though it would be more obvious to use a MAP_SHARED mapping, we had that earlier thread in which it emerged that you're not using shared writable xip mappings, IIRC. So, sticking to MAP_PRIVATE, I'd expect the following sequence ptr = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, xip_fd, 0); var = *ptr; mprotect(ptr, PAGE_SIZE, PROT_NONE); munmap(ptr, PAGE_SIZE); to do a put_page on a non-existent struct page derived from the pfn: perhaps corrupting other memory without being noticed? Or if you have CONFIG_DEBUG_VM=y (that would be a good move), to hit vm_normal_page's VM_BUG_ON(!pfn_valid(pte_pfn(pte))) before that. (I think you can just as well use PROT_READ|PROT_WRITE rather than PROT_NONE there, but in principle mprotect could optimize away that pte modification - though I think it goes ahead and does it anyway.) Hugh --
