Re: crashme fault

To: Randy Dunlap <randy.dunlap@...>
Cc: Andi Kleen <andi@...>, lkml <linux-kernel@...>, Andi Kleen <ak@...>
Date: Saturday, September 15, 2007 - 3:44 pm

On Sat, 15 Sep 2007, Randy Dunlap wrote:

At least the original "crashme" would write its random number seeds to a 
logfile each time (and I made it fsync the log in some versions), which 
meant that once a crash happened, you could reproduce it immediately (if 
it was reproducible at all, of course).

Does your crashme have something like that?
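
To make the seed-logging idea concrete, here is a minimal user-space sketch. It is not taken from any crashme source: the function names `log_seed` and `last_seed` and the one-seed-per-line log format are assumptions made for illustration. The point it shows is that the seed has to reach the disk (fflush, then fsync) before the random code runs, otherwise a crash can lose the very seed needed to reproduce it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one seed to the log and force it to disk
 * before the test that uses it is started.
 * Returns 0 on success, -1 on any failure.
 */
int log_seed(const char *path, unsigned int seed)
{
	FILE *f = fopen(path, "a");

	if (!f)
		return -1;
	if (fprintf(f, "%u\n", seed) < 0 ||
	    fflush(f) != 0 ||			/* stdio buffer -> kernel */
	    fsync(fileno(f)) != 0) {		/* kernel buffers -> disk */
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: read back the most recently logged seed, which is
 * the one to re-run after a crash.
 * Returns 0 and stores the seed on success, -1 if none could be read.
 */
int last_seed(const char *path, unsigned int *seed)
{
	FILE *f = fopen(path, "r");
	unsigned int s;
	int found = 0;

	if (!f)
		return -1;
	while (fscanf(f, "%u", &s) == 1) {
		*seed = s;
		found = 1;
	}
	fclose(f);
	return found ? 0 : -1;
}
```

After a reboot, the last line of the log is the seed that was in use when the machine went down, so feeding it back to the test should replay the same random sequence, provided the generator is deterministic for a given seed.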

All your crashes look basically identical - I don't think there is 
anything new in this one, they're all the same issue. What CPU do you have 
- vendor, stepping, version etc - and has anything other than the kernel 
changed in your setup lately?

As mentioned, the crash does look like a user-level crash got reported as 
a kernel page fault, and while a CPU bug sounds incredibly unlikely, this 
does have the smell of something strange like a fault in the middle of an 
"iretq" or "sysretq", where part of the CPU state has already been 
restored - which would explain why rip/cs is user space - but some part of 
the CPU is still in kernel mode - which would explain the incorrect page 
fault error code.
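
The inconsistency described above can be modelled in plain user-space C. This is only a sketch under stated assumptions: `struct fake_regs`, `fake_user_mode` and `fixup_inconsistent_fault` are made-up stand-ins, not kernel code. What it relies on is that bit 2 of the x86 page fault error code says whether the access was made in user mode, and that the low two bits of the saved CS hold the privilege level, so the two should normally agree.

```c
#include <stdbool.h>

/* x86 page fault error code, bit 2: set when the access came from user mode. */
#define PF_USER (1UL << 2)

/* Hypothetical stand-in for the saved register state (not struct pt_regs). */
struct fake_regs {
	unsigned long rip;
	unsigned long cs;
};

/* The low two bits of CS are the privilege level; nonzero means not ring 0. */
static bool fake_user_mode(const struct fake_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/*
 * Detect the "kernel-mode error code, user-mode registers" mismatch.
 * On a mismatch, set PF_USER so the fault is then handled as a user
 * fault, and return true; otherwise leave error_code alone and return
 * false.
 */
bool fixup_inconsistent_fault(const struct fake_regs *regs,
			      unsigned long *error_code)
{
	if (!(*error_code & PF_USER) && fake_user_mode(regs)) {
		*error_code |= PF_USER;
		return true;
	}
	return false;
}
```

With a saved CS of 0x33 (a ring 3 selector) and an error code of 0, the function reports the mismatch and turns the error code into PF_USER; with a ring 0 CS, or with PF_USER already set, it changes nothing.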

Here's a really *stupid* patch (and untested too, btw) to see if it gets 
easier to debug when you don't oops, just print the register state 
instead.

(It might be interesting to also do something like

	force_sig_specific(SIGSTOP, current);

to then be able to more easily attach to the process that had problems, 
and debug it in user space to see what was going on..)

		Linus
---
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 327c9f2..1b81392 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -320,6 +320,11 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 
 	info.si_code = SEGV_MAPERR;
 
+	if (!(error_code & PF_USER) && user_mode(regs)) {
+		printk("kernel mode page fault from user space? Huh?\n");
+		__show_regs(regs);
+		error_code |= PF_USER;
+	}
 
 	/*
 	 * We fault-in kernel-space virtual memory on-demand. The