Peoplez,
I've narrowed down a problem I am having with an old P2 to commit
61c4628b538608c1a85211ed8438136adfeb9a95 with subject "x86, fpu: split
FPU state from task struct - v5" (authored by Suresh and committed by
Ingo on Apr/19).
In the process I learnt how painfully time-consuming and boring a blind
git bisect feast could be (the last time a kernel worked on the P2 was
back in 2.6.23). I literally spent no less than 10 hours tracking this
(OK, I was chewing tobacco in between running git bisect bad/good,
compile, copy over kernel, spit here, reboot, test).
Also, this patch is so huge that, given my lack of knowledge in the area,
I couldn't bisect any further to pin down exactly what is causing this,
i.e. the patch is not bisect-friendly.
So the best I can do is have other people take it from here.
I am able to reproduce the issue consistently on my laptop using qemu
(which helped speed debugging a bit). I have also narrowed it down to
include/asm-x86/i387.h::__save_init_fpu (the 32-bit version) - it dies
somewhere in the following statement:
----
alternative_input(
"fnsave %[fx] ;fwait;" GENERIC_NOP8 GENERIC_NOP4,
"fxsave %[fx]\n"
"bt $7,%[fsw] ; jnc 1f ; fnclex\n1:",
X86_FEATURE_FXSR,
[fx] "m" (tsk->thread.xstate->fxsave),
[fsw] "m" (tsk->thread.xstate->fxsave.swd) : "memory");
----------
The only thing that has changed there compared to the good version is the
last two lines. But that looks sane to me given that the struct naming has
changed. So I suspect the calling path is perhaps not setting
something or other.
------------ boot output paste ----------------------
[....]
Compat vDSO mapped to ffffe000.
CPU: Intel Pentium II (Klamath) stepping 03
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 0k freed
invalid opcode: 0000 [#1]
Modules linked in:
Pid: 0, comm: swapper Not tainted (2.6.25-00000-g61c4628 #22)
EIP: 0060:[<c01012d0>] EFLAGS: ...
--
My guess would be that the jnc 1f is now wrong.
--
Do elucidate. The instruction with jnc 1f has always been there. Is there
something as a result of the compiler or the codepath that makes it now wrong?
cheers,
jamal
--
This looks bad.
0f ae 00 fxsave (%eax)
0f ba 60 02 07 btl $0x7,0x2(%eax)
73 02 jae (skip fnclex)
db e2 fnclex
0f 1f 00 nopl (%eax)
^^^^ This is a P4+ instruction. So it's not surprising that the P2
chokes. The question is where this comes from.
we have:
#define P6_NOP3 ".byte 0x0f,0x1f,0x00\n"
So the alternatives code applies the wrong nop padding for your
CPU. This was probably introduced with commit
32c464f5d9701db45bc1673288594e664065388e.
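To make the failure mode concrete, here is a minimal standalone sketch of a feature-keyed NOP table lookup. It is not the kernel's code: the FEAT_* bits, struct nop_entry and pick_nop3() are hypothetical names made up for illustration, and only the 3-byte entries are modelled. The byte sequences are the P6 NOP3 quoted above and the generic "leal 0x00(%esi),%esi" padding, assumed here as the fallback.

```c
#include <stddef.h>

/* Hypothetical feature bits standing in for the X86_FEATURE_* flags. */
enum cpu_feature {
    FEAT_P4 = 1 << 0,
    FEAT_P3 = 1 << 1
};

/* One table row: which feature selects which 3-byte NOP encoding. */
struct nop_entry {
    unsigned int feature;        /* 0 terminates the table */
    const unsigned char *nop3;
};

/* Generic padding: leal 0x00(%esi),%esi (assumed fallback for this sketch). */
static const unsigned char generic_nop3[3] = { 0x8d, 0x76, 0x00 };
/* P6-style padding: nopl (%eax), the instruction seen in the oops. */
static const unsigned char p6_nop3[3] = { 0x0f, 0x1f, 0x00 };

/* The P3 row models the suspect mapping discussed above. */
static const struct nop_entry nop_types[] = {
    { FEAT_P4, p6_nop3 },
    { FEAT_P3, p6_nop3 },
    { 0, NULL }
};

/* Return the first NOP encoding whose feature bit is set, else the generic one. */
const unsigned char *pick_nop3(unsigned int cpu_features)
{
    for (size_t i = 0; nop_types[i].feature != 0; i++) {
        if (cpu_features & nop_types[i].feature)
            return nop_types[i].nop3;
    }
    return generic_nop3;
}
```

With this table, pick_nop3(FEAT_P3) hands back the 0f 1f 00 sequence, so any CPU (or emulator) that advertises the P3 flag but does not implement that opcode takes an invalid-opcode trap as soon as the patched padding is executed. Dropping the P3 row makes such a CPU fall through to the generic padding instead.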
Jan, are you sure that P3 knows the P6 NOPs? AFAICT it's P4, but I
have to dig up the manuals.
Jamal, does the following patch solve your problem? Please also
provide the output of /proc/cpuinfo.
Thanks,
tglx
---
--- linux-2.6.orig/arch/x86/kernel/alternative.c
+++ linux-2.6/arch/x86/kernel/alternative.c
@@ -158,7 +158,6 @@ static const struct nop {
{ X86_FEATURE_K8, k8_nops },
{ X86_FEATURE_K7, k7_nops },
{ X86_FEATURE_P4, p6_nops },
- { X86_FEATURE_P3, p6_nops },
{ -1, NULL }
};
--
Dang - I feel I should have saved myself all that git bisecting
mambo:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 3
cpu MHz : 1063.771
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu de pse tsc msr pae mce cx8 sep pge cmov mmx fxsr sse sse2
bogomips : 2160.92
clflush size : 32
power management:
cheers,
jamal
--
Your bisection was still very useful - it pinpointed the NOPs - the assembly code around the NOP changed, so a different-length NOP was patched in => which did not work on your CPU. So thanks for that!
Great. So this NOP is indeed not generally known to all "P6 and later" CPUs (the PII).
Ingo
--
Looks like. My analysis was wrong, as I got the P6 vs. PII/PIII confused :) Damn unintuitive numbering, I thought ARM was worse but I'm not so sure anymore. But the oops clearly identified that instruction sequence. So for now we remove the X86_FEATURE_P3 -> P6_NOPS to be on the safe side.
Thanks,
tglx
--
Guess that Intel named it Pentium II either because Hexium ("5"86:Pentium, "6"86:Hexium) would have been a strange name, or the successor to the Pentium/586 was not that great an improvement. Or something else? Always kept me wondering. --
Yeah, "Hexium" didn't quite work, and they thought they'd already gotten a working brand with "Pentium". That it clashed with their previous public prerelease naming scheme of P+number ("P", I believe, for "project" or "processor") didn't matter. The Pentium 4 is properly called the P7, but almost no one calls it that. "Pentium" is also a highly unstable isotope of hydrogen (Hydrogen-5), with a half-life under a zeptosecond.
-hpa
--
Well the brand was well recognized, so they called the P6 "Pentium Pro". Then when they went to make a cheaper version for regular consumers, they made the Pentium II based on the PPro, by making the L2 cache half speed (the PPro ran L2 cache at full speed) and put the L2 cache on a PCB with the core in a cartridge. After a while they added SSE and called it the Pentium !!! (check the logo, they really used exclamation marks) and a while later they managed to shrink their process enough that they could integrate the L2 cache on die, so they made the cache full speed again but on the same die as the cpu core, and soon after started offering plain CPUs again rather than cartridges. The Pentium 4 is of course a totally unrelated design. The Pentium M followed up on the P3 but aiming for lower power consumption rather than maximum performance, and later evolved into the Core and then Core 2 (where again they focused on maximum performance). I think the Pentium 4 may have tarnished the Pentium brand enough that they needed a new name, and hence "Core" came about. -- Len Sorensen --
This is very odd. Could you try running the attached C program on this processor and report the result? (Binary included for convenience.) Arjan: this seems to directly contradict the Intel documentation. Do you have any way to find out what the deal is with this? -hpa
H. Peter Anvin writes:
> jamal wrote:
> >
> > Indeed it does - thanks.
> >
> >> Please provide
> >> also output of /proc/cpuinfo.
> >
> > mambo:~# cat /proc/cpuinfo
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 3
> > model name : Pentium II (Klamath)
> > stepping : 3
> > cpu MHz : 1063.771
> > cache size : 128 KB
> > fdiv_bug : no
> > hlt_bug : no
> > f00f_bug : no
> > coma_bug : no
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 2
> > wp : yes
> > flags : fpu de pse tsc msr pae mce cx8 sep pge cmov mmx fxsr
> > sse sse2
> > bogomips : 2160.92
> > clflush size : 32
> > power management:
> >
>
> This is very odd.
>
> Could you try running the attached C program on this processor and
> report the result? (Binary included for convenience.)
>
> Arjan: this seems to directly contradict the Intel documentation. Do
> you have any way to find out what the deal is with this?
hpa's p6nops test program works fine here on both a PII (family 6 model 3) and a Pentium Pro (family 6 model 1 stepping 9).
I'm noticing another anomaly in jamal's /proc/cpuinfo above: since when can a PII have sse and sse2? As far as I recall, it was the PIII that added sse, and sse2 came with the P4.
/Mikael
--
Pentium III is the P6 core, so it will. Intel explicitly documents "all processors with family 6 or F." -hpa --
From the Intel manual:
0F 1F /0 NOP
The multi-byte form of NOP is available on processors with model encoding:
• CPUID.01H.EAX[Bytes 11:8] = 0110B or 1111B
The multi-byte NOP instruction does not alter the content of a register and will not issue a memory operation. The instruction’s operation is the same in non-64-bit modes and 64-bit mode.
--
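The family test quoted from the Intel manual can be sketched in C as follows. This is only an illustration of the documented rule, not code from the kernel or from hpa's test program: the function names are made up, and the sketch reads the field as bits 11:8 of the EAX value returned by CPUID leaf 01H (the base family ID), which is an interpretation of the manual's wording.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the base family ID, bits 11:8 of CPUID.01H:EAX. */
static unsigned int cpuid_base_family(uint32_t eax)
{
    return (eax >> 8) & 0xf;
}

/*
 * True if the documentation quoted above promises the multi-byte NOP,
 * i.e. the family field is 0110B (6) or 1111B (0xF).
 * This says nothing about what a given chip or emulator really does.
 */
bool documented_multibyte_nop(uint32_t eax)
{
    unsigned int family = cpuid_base_family(eax);

    return family == 0x6 || family == 0xf;
}
```

For a family 6, model 3, stepping 3 part the low 12 bits of EAX are 0x633, so documented_multibyte_nop(0x633) is true, while a family 5 signature such as 0x543 gives false. By the documented rule the reported CPU should therefore accept 0f 1f 00, which is why the oops looked so odd.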
Yeah, I looked it up myself and noticed that I confused those stupid numbers again.
Nevertheless reality seems to tell a different story :)
Thanks,
tglx
--
Thomas Gleixner writes:
> On Sat, 3 May 2008, H. Peter Anvin wrote:
>
> > Thomas Gleixner wrote:
> > >
> > > Jan, are you sure that P3 knows the P6 NOPs ? AFAICT its P4, but I
> > > have to dig up the manuals.
> >
> >
> > Pentium III is the P6 core, so it will.
> >
> > Intel explicitly documents "all processors with family 6 or F."
>
> Yeah, I looked up myself and noticed that I confused those stupid
> numbers again.
>
> Nevertheless reality seems to tell a different story :)
I've tested the "0f 1f 00" 3-byte NOP on two PIIs, a Klamath (family 6 model 3 stepping 4) and a Deschutes (family 6 model 5 stepping 0), and it worked fine on both. Are you absolutely sure it's this 3-byte NOP that oopses?
Also, jamal's /proc/cpuinfo reports "cpu MHz : 1063.771", which is either wrong or indicates serious overclocking. The PIIs maxed out at about 400/450MHz.
/Mikael
--
Good spotting, and yes, the roof for Pentium II was 450 MHz. Pentium II had a TSC, so it shouldn't be wrong unless there was a bigger timekeeping error on this platform. Jamal, can you clarify?
-hpa
--
This might also be caused by a calibration error. We've seen enough of them already. Thanks, tglx --
i think a vital clue can be found in the original report:
| I am able to reproduce the issue consistently on my laptop using qemu
| (which helped speed debugging a bit). I have also narrowed it down to
| include/asm-x86/i387.h::__save_init_fpu in (32 bit version) - it dies
| somewhere in calling the following line:
so it might just be incorrect Qemu emulation of a PII's NOP instruction?
(which btw. probably proves that Linux is the first OS to make intelligent use of those instructions?)
Ingo
--
It depends (as usually with Intel) on what document you are looking at. Based on my short research, the instruction has been retroactively added to the list of supported opcodes. Even my somewhat dated P4 manual does not list it, never mind its predecessors. It could have been accidentally omitted or even buggy in some early members of the P6 family and this could have been the reason for not documenting it from the beginning (the case of FFREEP comes to mind). Maciej --
It has retroactively been added to the documented list for all P6 core chips - that should mean it works on all of them. The most common reason for not documenting something (other than various Pure Evil NDA schemes) is that it hasn't been properly verified. However, verification can be done a posteriori.
-hpa
--
True, but people do make mistakes from time to time. :) If I had a choice between a piece of silicon and a piece of documentation to trust, I would choose the former. Obviously this specific case has turned out to be an issue with Qemu, so it is somewhat irrelevant and given how the opcodes were added to the architecture I would consider the emulator excused. Maciej --
Sorry folks - had to run some errands. Wow, thanks for all the responses.
Yes, the posting I just did (as pointed out in my first email) was on qemu (so was the /proc/cpuinfo). I moved to qemu because it was less painful to do the git bisecting on my laptop; I used the same .config. I should also note that qemu seems to have worked fine in the past.
In any case I will try to rerun on the older hardware, which I can access again next week. I am beginning to doubt whether it is the same issue; I know even there it was pointing to the FPU.
So is the correct fix then to go patch qemu? I should point out I am running a slightly older version of qemu that has a few patches (nothing to do with x86 emulation).
hpa, results from running on qemu (not the hardware) are:
-----
mambo:~# ./hpa
Test 0: ok
Test 1: ok
Test 2: ok
Test 3: err
Test 4: err
Test 5: err
Test 6: err
Test 7: err
Test 8: err
----------------
cheers,
jamal
--
Specifically, I believe the P6 added a whole class of instructions which would act as NOP's if used on older chips. It has been used for prefetch instructions, etc. At some point, the 0F 1F /0 group was declared to be NOP, and nothing else, for all future. -hpa --
