I run almost-daily kernel testing. I haven't seen 'crashme' cause a
kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
x86_64. After the first fault, I ran 'crashme' about 10 more times
to get the second fault (usually for 10 minutes, one time for 30
minutes).
[This is gjc-crashme, not Dave's scrashme replacement:
http://people.delphiforums.com/gjc/crashme.html]
There is very little helpful info. RIP is strange, e.g.: 000000000051b446
The call stack is not printed. No kernel symbols are printed,
even though I have CONFIG_KALLSYMS{_ALL}=y.
Here are the 2 kernel message logs of the faults.
Any ideas/suggestions? I suppose I will try to log the syscall
parameters that crashme is using.
~~~~~~~~~~~~~~~~~~~~ 1. ~~~~~~~~~~~~~~~~~~~~~~~
[ 7487.208128] Unable to handle kernel paging request at 00000000ff019b53 RIP:
[ 7487.212752] [<0000000000510eea>]
[ 7487.218537] PGD 10c1a2067 PUD 0
[ 7487.221811] Oops: 0000 [1] SMP
[ 7487.224989] CPU 2
[ 7487.227024] Modules linked in: loop
[ 7487.230550] Pid: 19139, comm: crashme Not tainted 2.6.23-rc6-git2 #1
[ 7487.236896] RIP: 0033:[<0000000000510eea>] [<0000000000510eea>]
[ 7487.242925] RSP: 002b:00007fffc9a8ec10 EFLAGS: 00010e83
[ 7487.248234] RAX: 00000000ffff8c4a RBX: 00000000004014f1 RCX: 00002b20e11c8b37
[ 7487.255361] RDX: 0000000000510ee0 RSI: 0000000000000000 RDI: 000000000000000a
[ 7487.262489] RBP: 00007fffc9a8ec10 R08: 00007fffc9a8eb60 R09: 0000000000000000
[ 7487.269616] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
[ 7487.276743] R13: 00007fffc9a8ee00 R14: 0000000000000000 R15: 0000000000000000
[ 7487.283871] FS: 00002b20e13676d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
[ 7487.291952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7487.297693] CR2: 00000000ff019b53 CR3: 000000005be6b000 CR4: 00000000000006e0
[ 7487.304821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7487.311949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: ...Interesting. If this is reproducible for you, can you try to narrow down It looks like it's a page fault as a result of a *user*space* access, and most likely your machine would continue happily, except you have "panic_on_oops" set, so when the oops happens, it shuts the system down. Now, the reason I say it looks like a user space access is that you have RIP: 0033:[<0000000000510eea>] RSP: 002b:00007fffc9a8ec10 which are all user space segments. So the register contents clearly say "page fault in user space". However, what makes the kernel oops (rather than just send a SIGSEGV) is that the page fault "error code" is zero (that's the number that is printed out just after the "Oops" string). For a normal user space access, you should have bit #2 set in the error code. So the kernel thinks it's a kernel page fault, because the page fault error code says so. But everything else seems to indicate that it's really user mode.. It would be very interesting to hear when this started happening. Even if you cannot bisect it down all the way, since you say that you do almost daily kernel testing, is this really new to 2.6.23-rc6-git2, and 2.6.23-rc6-git1 was fine? Andi, anything comes to mind? Linus -
It doesn't happen quite on demand. Only 2 times in several hours --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
I looked thru several hundred test logs and didn't find any other occurrences of it. Doesn't prove much. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
> Andi, anything comes to mind? No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. A version where it started happening would be useful. -Andi -
I'll begin testing older kernels then. Had another on recent last night (probably not helpful): [58169.607164] Unable to handle kernel paging request at 00000000ff018915 RIP: [58169.611784] [<0000000000506986>] [58169.617570] PGD d4380067 PUD 0 [58169.620757] Oops: 0000 [1] SMP [58169.623943] CPU 2 [58169.625970] Modules linked in: loop [58169.629496] Pid: 26735, comm: crashme Not tainted 2.6.23-rc6-git4 #1 [58169.635842] RIP: 0033:[<0000000000506986>] [<0000000000506986>] [58169.641861] RSP: 002b:00007fff14d3f9a0 EFLAGS: 00010247 [58169.647171] RAX: 0000000000506909 RBX: 00002b8695e82c00 RCX: 00002b8695f15b37 [58169.654298] RDX: 0000000000000000 RSI: 0000000000401476 RDI: 000000000000000a [58169.661426] RBP: 00007fff14d3f9a0 R08: 00007fff14d3f8f0 R09: 0000000000000000 [58169.668553] R10: 0000000000000008 R11: 0000000000000212 R12: 0000000000000000 [58169.675680] R13: 00007fff14d3fb90 R14: 0000000000000000 R15: 0000000000000000 [58169.682808] FS: 00002b86960b46d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000 [58169.690889] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [58169.696630] CR2: 00000000ff018915 CR3: 00000000d8820000 CR4: 00000000000006e0 [58169.703757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [58169.710885] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [58169.718014] Process crashme (pid: 26735, threadinfo ffff8100cd176000, task ffff81011fffe040) [58169.726439] [58169.727937] RIP [<0000000000506986>] [58169.731609] RSP <00007fff14d3f9a0> [58169.735100] CR2: 00000000ff018915 [58169.738710] Kernel panic - not syncing: Fatal exception [58169.743953] Rebooting in 30 seconds.. -
At least the original "crashme" would write its random number seeds to a
logfile each time (and I made it fsync it in some versions), which meant
that once a crash happened, you could re-produce it immediately (if it was
reproducible at all, of course).
Does your crashme have something like that?
All your crashes look basically identical - I don't think there is
anything new in this one, they're all the same issue. What CPU do you have
- vendor, stepping, version etc - and has something else than the kernel
changed in your setup lately?
As mentioned, the crash does look like a user-level crash got reported as
a kernel page fault, and while a CPU bug sounds incredibly unlikely, this
does have the smell of something strange like a fault in the middle of an
"iretq" or "sysretq", where part of the CPU state has already been
restored - which would explain why rip/cs is user space - but some part of
the CPU is still in kernel mode - which would explain the incorrect page
fault error code.
Here's a really *stupid* patch (and untested too, btw) to see if it gets
easier to debug when you don't oops, just print the register state
instead.
(It might be interesting to also do something like
force_sig_specific(SIGSTOP, current);
to then be able to more easily attach to the process that had problems,
and debug it in user space to see what was going on..)
Linus
---
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 327c9f2..1b81392 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -320,6 +320,11 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
info.si_code = SEGV_MAPERR;
+ if (!(error_code & PF_USER) && user_mode(regs)) {
+ printk("kernel mode page fault from user space? Huh?\n");
+ __show_regs(regs);
+ error_code |= PF_USER;
+ }
/*
* We fault-in kernel-space virtual memory on-demand. The
-
I tell it the "random" seed to use. I can also sets its debug level, but when I did that yesterday, it never faulted, so I lowered it again, Just kernel changes. CPU is dual Pentium Xeon + HT: processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Xeon(TM) CPU 3.40GHz stepping : 4 cpu MHz : 3400.227 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t m2 cid xtpr bogomips : 6805.96 clflush size : 64 cache_alignment : 128 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Xeon(TM) CPU 3.40GHz stepping : 4 cpu MHz : 3400.227 cache size : 1024 KB physical id : 3 siblings : 2 core id : 0 cpu cores : 1 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t m2 cid xtpr bogomips : 6800.28 clflush size : 64 cache_alignment : 128 address sizes : 36 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Xeon(TM) CPU 3.40GHz stepping : 4 cpu MHz : 3400.227 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : ...
Side note - while thinking about this, I'm wondering whether maybe that "stupid" patch might not actually be the right thing to do. The fact is, the "user access" bit in the error code is in many ways a totally *different* issue than the "user_mode(regs)" test. We may end up having a system page fault for some action that was initiated by user space: eg taking a page fault on a segment load. And it may well be marked as a "system access" in the error code, even though the instruction that triggered it was all user mode, and even though we could just terminate the program rather than oops. That said, since the kernel isn't paging itself out, user mode generally should not be able to generate those kinds of system access page faults. Our segment tables should all be in core, and so I would still in *practice* always expect the register state and the error code to match in the "usermode'ness". So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it started happening". Linus -
Also, can you point to good crashme sources, and give the arguments you used to run it when it crashed? The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by default (nor does it even compile on a modern unix), so it's not going to do anything. I hacked it up, and it appears to work ok for me, but I'm not at all confident that I'm even close to recreating what you are doing. (It probably goes without saying that I've not reproduced the oops on my Core 2 Duo. Lots of #GP and #PF errors, but nothing interesting. Linus -
I use http://people.delphiforums.com/gjc/crashme.tgz from http://people.delphiforums.com/gjc/crashme.html, plus one small patch: --- gjc-crashme.orig/crashme.c +++ gjc-crashme/crashme.c @@ -260,7 +260,8 @@ void my_signal(sig, func) #else struct sigaction act; act.sa_handler = func; - act.sa_mask = 0; + //act.sa_mask = 0; + sigemptyset(&act.sa_mask); #ifdef linux act.sa_restorer = 0; #endif /* linux */ So I'm missing some mprotect() call also? -
Ok, that's close to what I was testing (one of the examples from the
Yeah, by default, it won't do a mprotect(), so as far as I can tell,
you're always going to hit NX on a writable data page. I'm running with
the appended patch.
That said, I didn't actually *check* that the mprotect makes any
difference, and maybe you have one of the early EM64T chips that didn't
have NX support at all. Or maybe it's dodgy, and is actually the whole
*reason* for the page fault problem..
I don't have any P4's capable of 64-bit operation (nor do I really want
any ;)
Linus
---
diff --git a/crashme.c b/crashme.c
index 5e7e663..19a2fbb 100644
--- a/crashme.c
+++ b/crashme.c
@@ -1,3 +1,5 @@
+#define pyr
+#define PAGSIZ 4096
/* crashme: Create a string of random bytes and then jump to it.
crashme [+]<nbytes>[.inc] <srand> <ntrys> [nsub] [verboseness] */
@@ -143,7 +145,6 @@ a script.
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/param.h>
-#include <sys/vmmac.h>
#define strchr index
#endif
@@ -215,7 +216,7 @@ unsigned char *bad_malloc(n)
{unsigned char *data;
data = (unsigned char *) malloc(n);
#ifdef pyr
- if (mprotect(((int)data/PAGSIZ)*PAGSIZ, (n/PAGSIZ+1)*PAGSIZ,
+ if (mprotect(((long)data/PAGSIZ)*PAGSIZ, (n/PAGSIZ+1)*PAGSIZ,
PROT_READ|PROT_WRITE|PROT_EXEC))
perror("mprotect");
#endif
@@ -259,11 +260,8 @@ void my_signal(sig, func)
signal(sig, func);
#else
struct sigaction act;
+ memset(&act, 0, sizeof(act));
act.sa_handler = func;
- act.sa_mask = 0;
-#ifdef linux
- act.sa_restorer = 0;
-#endif /* linux */
act.sa_flags = SA_NOMASK;
#ifdef SA_RESTART
act.sa_flags |= SA_RESTART;
-
I'll apply this patch today, but I haven't done so yet (for the 2 bug reports below). Then I'll back up more, prior to 2.6.22 (which ack that. ~~~~~~~~~~~~~~~~~~~~~~~ 2.6.23-rc3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [21769.161490] Unable to handle kernel paging request at 00000000ff016abe RIP: [21769.166119] [<000000000050522f>] [21769.171912] PGD d5572067 PUD 0 [21769.175100] Oops: 0000 [1] SMP [21769.178278] CPU 3 [21769.180312] Modules linked in: loop [21769.183830] Pid: 8823, comm: crashme Not tainted 2.6.23-rc3 #1 [21769.189657] RIP: 0033:[<000000000050522f>] [<000000000050522f>] [21769.195684] RSP: 002b:00007fffe9528118 EFLAGS: 00010202 [21769.200993] RAX: 00000000c169bc00 RBX: 0000000000000000 RCX: 00002b88c172eb37 [21769.208121] RDX: 00000000005051e0 RSI: 0000000000000000 RDI: 000000000000000a [21769.215249] RBP: 00007fffe9528120 R08: 00007fffe9528070 R09: 0000000000000000 [21769.222377] R10: 0000000000000008 R11: 0000000000000216 R12: 0000000000000000 [21769.229505] R13: 00007fffe9528310 R14: 0000000000000000 R15: 0000000000000000 [21769.236632] FS: 00002b88c18cd6d0(0000) GS:ffff81011fc751c0(0000) knlGS:0000000000000000 [21769.244711] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [21769.250453] CR2: 00000000ff016abe CR3: 00000000d792f000 CR4: 00000000000006e0 [21769.257581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [21769.264709] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [21769.271837] Process crashme (pid: 8823, threadinfo ffff8100d5656000, task ffff81011ffff810) [21769.280175] [21769.281673] RIP [<000000000050522f>] [21769.285355] RSP <00007fffe9528118> [21769.288845] CR2: 00000000ff016abe [21769.292455] Kernel panic - not syncing: Fatal exception [21769.297699] Rebooting in 30 seconds.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.6.22 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sep 15 17:15:03 caor-test8 kernel: [18447.172949] kernel mode page fault from user space? Huh? Sep 15 17:15:03 caor-test8 kernel: [18447.178268] Sep ...
Actually, it's probably better that you don't change your situation unnecessarily, in case the bug goes away. Since you are triggering the problem even *without* the patch, whatever crashme is doing now for you is the interesting case. I suspect NX just isn't active for you, so the mprotect doesn't matter, Ahh, ok, so this is not a new thing, and it happened in 2.6.22 too. I'm really starting to suspect some early EM64T bug, and I also suspect that it's harmless but that we should just do the trivial patch to say "if the register state is in user mode, we don't care if the CPU says it was a kernel access". Linus -
Namely something like this.. The basic idea is that it's pointless to test for "error_code & PF_USER" to decide whether we should oops or kill the process: the *real* issue is whether we *can* kill the process or not. And that depends not on whether the CPU claimed it was a user access or not, but on whether the register state we'd return to is user-mode or not! So anything that decides whether it should send a signal or do to the "no_context" Ooops path should use "user_mode_vm(regs)" (yeah, I realize that the "_vm" part is unnecessary on x86-64, but it doesn't hurt either, and all of the issues are the same on 32/64-bit) which tests the right thing. Now, normally the USER bit in the error code should be the exact same thing, except for - Some CPU bug (microcode issue, whatever) where some complex fault situation sets the wrong error code. - user space accesses that caused a system page fault (ie a page fault while handling another trap - possibly due to lazy page table setup and having the LDT or some other CPU data structure in vmalloc space) Now, the vmalloc space accesses should be handled separately anyway, so I really wonder if it's some subtle CPU bug (I can't reproduce any problems on my Core 2 Duo), but the point is that I think this patch really is conceptually a real fix regardless, even if it _shouldn't_ matter. Comments? Randy, this replaces the hacky patch I sent, but also shuts up about the odd thing you're hitting, so for testing your case further this may not be the right thing. However, it would be nice to hear whether this just makes "crashme" work properly for you without any side effects.. Linus ---- arch/x86_64/mm/fault.c | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c index 327c9f2..75270a0 100644 --- a/arch/x86_64/mm/fault.c +++ b/arch/x86_64/mm/fault.c @@ -87,7 +87,7 @@ static noinline int is_prefetch(struct ...
I'll test this overnight on 2.6.23-rc6-git2 since that was failing. I haven't been able to reproduce the fault on 2.6.21 after several hours of testing. I'll also test a microcode update to see if it helps. --- ~Randy -
Before you do the microcode update, try to see if you can bisect the place between 2.6.21->22 that seems to start it. Even if you don't get all the way, if you are confident enough about the "no error" case to be able to bisect it down by doing a few reboots, it will at least cut down the set of possible commits by roughly a factor of 2^<nr-of-bisect> events, so even "just" a series of 4-5 bisect things might give us more of a clue. Of course, if it's somewhat random and timing-dependent, bisection can be hard (the "2^n" thing is very efficient, but it also means that a *single* wrong answer will totally invalidate the result, so if something isn't entirely reproducible, bisection often fails!) Linus -
OK, I haven't done the microcode update yet. I ran crashme overnight with your newer patch and it crashed: [14254.327676] Unable to handle kernel paging request at 00000000ff021eaf RIP: [14254.332299] [<0000000000504225>] [14254.338084] PGD d8542067 PUD 0 [14254.341271] Oops: 0000 [1] SMP [14254.344449] CPU 3 [14254.346484] Modules linked in: loop [14254.350001] Pid: 28565, comm: crashme Not tainted 2.6.23-rc6-git2 #2 [14254.356349] RIP: 0033:[<0000000000504225>] [<0000000000504225>] [14254.362376] RSP: 002b:00007fff5afccbf8 EFLAGS: 00010656 [14254.367685] RAX: 000000005afccbf8 RBX: 00002abd4fbf5c00 RCX: 00002abd4fc88b37 [14254.374812] RDX: 0000000000504220 RSI: 0000000000000000 RDI: 000000000000000a [14254.381939] RBP: 00007fff5afccc00 R08: 00007fff5afccb50 R09: 0000000000000000 [14254.389068] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [14254.396195] R13: 00007fff5afccdf0 R14: 0000000000000000 R15: 0000000000000000 [14254.403324] FS: 00002abd4fe276d0(0000) GS:ffff81011fc751c0(0000) knlGS:0000000000000000 [14254.411403] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [14254.417144] CR2: 00000000ff021eaf CR3: 00000000d85f6000 CR4: 00000000000006e0 [14254.424273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [14254.431400] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [14254.438528] Process crashme (pid: 28565, threadinfo ffff8100d8970000, task ffff8100d8628820) [14254.446953] [14254.448443] RIP [<0000000000504225>] [14254.452124] RSP <00007fff5afccbf8> [14254.455614] CR2: 00000000ff021eaf -
Well, duh. That's because I forgot to do the "error_code & PF_USER" => "user_mode_vm(regs)" thing in the most common case - the "bad_area_nosemaphore" if-statement. But thinking more about it, it's actually just easier and more straightforward to just take the same approach that my original hack did, namely to just set PF_USER if the register state implies it was in user mode. So ignore that patch. You're better off with my original one that also gave debugging info, and if we decide that this really looks like a CPU buglet, the patch to actually commit would be a simplified version of that (appended, just FYI). Also, I changed my mind on the microcode update - if you have a newer microcode, go ahead and try it, because if that fixes the issue, then we can really just close this as a "unimportant CPU bug" with the patch below. Linus --- arch/x86_64/mm/fault.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c index 327c9f2..54816ad 100644 --- a/arch/x86_64/mm/fault.c +++ b/arch/x86_64/mm/fault.c @@ -374,6 +374,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs, if (unlikely(in_atomic() || !mm)) goto bad_area_nosemaphore; + /* + * User-mode registers count as a user access even for any + * potential system fault or CPU buglet. + */ + if (user_mode_vm(regs)) + error_code |= PF_USER; + again: /* When running in the kernel we expect faults to occur only to * addresses in user space. All other faults represent errors in the -
[switched email address due to some unknown problems] Reverted that patch. Loaded new microcode update file. Ran test for 3 hours without a fault. It all seems rather inconclusive to me, still, the new patch (below) --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
I checked the Prescott Specification Update, but didn't seen anything that could cause such strange faults. I'm trying to reproduce it here. -Andi -
Did the room temperature change in the server room? ;) Those early EM64T P4 core based are fast but they suck a whole lot of power. It seems a bitflip in a cpu register. You can use cpufreq to set low frequency (they should support it, and they shouldn't use more than 100w per core that way) to lower the temp, and see if the problem goes away. If it's a software issue it will hopefully be still reproducible with ~2ghz instead of 3.4ghz. -
I rarely go there, but when I do, it's usually very cool. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
