Linus,
Please pull the latest x86-fixes-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus
Thanks,
-hpa
------------------>
H. Peter Anvin (1):
x86: enable CONFIG_X86_GENERIC by default

arch/x86/Kconfig.cpu | 19 ++++++++++---------
1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2c518fb..46d0acf 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -279,17 +279,18 @@ config GENERIC_CPU
endchoice

config X86_GENERIC
- bool "Generic x86 support"
+ bool "Generic x86 support" if EMBEDDED
depends on X86_32
+ default y
help
- Instead of just including optimizations for the selected
- x86 variant (e.g. PII, Crusoe or Athlon), include some more
- generic optimizations as well. This will make the kernel
- perform better on x86 CPUs other than that selected.
-
- This is really intended for distributors who need more
- generic optimizations.
-
+ Instead of just including optimizations and workarounds for
+ the selected x86 variant (e.g. PII, Crusoe or Athlon),
+ include some more generic optimizations and workarounds as
+ well. Without this option, the kernel is not guaranteed to
+ run on anything other than the exact CPU selected.
+
+ Disable this if you want to run the kernel on a specific CPU
+ *only* and want maximum optimizations for that CPU.
endif

config X86_CPU
--
Ok, so after having realized that this seems to be more about a bug with
gcc, I'm really not as convinced any more.

As far as I can tell, there are three issues:

- "-mtune=core/core2/pentium4/.." is buggy in some gas/gcc versions on
x86-32, and makes architectural choices.

Any actual _released_ versions? Maybe it's just a current SVN issue?

Workaround: don't use it. And yes, X86_GENERIC=y will do that, although
quite frankly that seems to be dubious in itself. But quite frankly,
it's a gcc bug, and we should see it as such.

The better workaround may well be "-Wa,-mtune=generic" as you pointed
out.

- We do the CONFIG_P6_NOPL thing ourselves, and we should just stop
doing that on 32-bit. There simply isn't a good enough reason to do so.
I already posted the Kconfig.cpu patch to just stop doing it.

- X86_GENERIC means _other_ things too, like doing a 128-byte cacheline
just so that it won't suck horribly on P4's even if it's otherwise
tuned for a good microarchitecture.

And they really do seem to be _separate_ issues. Do we really want to tie
these things together under X86_GENERIC?

Linus
--
As far as I understood it it's a gas issue, and X86_GENERIC=y would
therefore *not* fix the bug with gcc < 4.2 and affected binutils
since we pass -mtune=i686 for gcc < 4.2 with X86_GENERIC=y.

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
--
Well, for one thing, gcc doesn't actually pass the -mtune= option to
gas, it turns out.

But yes, "-Wa,-march=generic32" is really the proper fix.

-hpa
--
If I understand the binutils changelog correctly -march=generic32
support was added one week before the NOP code in question, so all

cu
Adrian
--
It doesn't, after all, with the current gcc driver. A future gcc driver
may change that. Of course, now when this has popped up on the radar

s/-march/-mtune/, but yes. I suspect it was actually added *in order*
to support the NOP code.

-hpa
--
As far as I can tell, -Wa,-mtune=generic *should* work. It doesn't look
to me as if cc1 will generate the long NOPs. That one we can do

Well, the argument in favour would be that if you want a kernel that can
cross between different microarchitectures, then you want the "don't
suck horribly on any of them". We can, of course, divide them down
further, but is it useful?

The "ideal" way to do any of this would probably be to have checkboxes for
all the CPUs you want to support and then a drop-down box for the CPU to
optimize for. However, the combinatorics of that would be horrible, and
it would be very unlikely we would avoid bugs.

-hpa
--
On Mon, 08 Sep 2008 11:22:24 -0700
the ideal case would be "support them all"
the second-most ideal case would be "support all as of <year>" I suppose
a third one for advanced users not distros would be "support only
<vendor>" since that would be the biggest part of code to drop

between models of the same vendor.. not too much to win there.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
Support all from the last 10 years (ok excluding legacy models that
just shipped forever like 486). I think that's quite reasonable
to do and worked for a long time.

-Andi
--
ak@linux.intel.com
--
Not really. That would include things like the i386, which is a bunch
of really nasty stuff.

-hpa
--
agreed - especially the verify_area() impact makes it a non-starter.
but 486 and higher is certainly quite reasonable, and is still being
tested.

... and _in practice_ 99% of all systems that run Linux today understand
CMOV.

... _and_ in practice 99% of all new Linux systems shipped today are
Core2 or better.

... and so on it goes with this argument. Everyone has a different
target audience and there's no firm limit. Maybe what makes more sense
is to have some sort of time dependency:

support all x86 CPUs released in the last year
support all x86 CPUs released in the past 5 years
support all x86 CPUs released in the past 10 years
support all x86 CPUs released ever
[ ... or configure a specific model ]

and people/distributions would use _those_ switches. That means we could
continuously tweak those targets, as systems become obsolete and new
CPUs arrive.

Ingo
--
That's just *asking* for flame mail if somebody builds a kernel for a system
that's 4 year 9 months old, and he builds a kernel 6 months later, and it fails
to boot because the CPU is now 3 months out and we've deprecated it...

Quick - what year/month was the CPU you're using now released? No peeking. ;)
(For the record, I have no *clue* when Intel actually released the Core2 T7200,
which is a whole *nother* can of worms - the chip release date can be quite
some time before the system vendor ships, and when the consumer actually buys
it - it's quite possible that we can write "released in the past 5 years",
a user looks at it and says "I bought this system 4 years 2 months ago", and
thinks he's OK, but he's not because he bought a system released 4 years 9 months
ago that used a chipset released 5 years 6 months ago...)
--
yeah, in terms of precision of the definition it's certainly more
towards the 'vague' end of the spectrum. OTOH, we do change our defaults
slowly but surely to match the hardware. So this would give a practical
definition. If someone _does_ complain legitimately, it doesn't cost us
much to revert a tweak and delay it some more.

So the idea is to have some sort of independent platform, instead of the
current practice of distros like Debian choosing pretty much random
options. No strong opinion though. We can cover 90% of the real
advantages via dynamic methods, it's quite rare that we have to make
hard .config choices.

Pretty much the only hardcoded aspect that hurts in practice is the
cache alignment parameter - all the rest is either dynamic already or
insignificant. Ever since distros have discovered
CONFIG_CC_OPTIMIZE_FOR_SIZE=y, even the various compiler optimization
parameters have less of a role. We just have to wait a year or two for
P4's to not matter that much anymore, then we can do generic kernels
with 64 byte alignment and cmov, that will just work almost everywhere
rather optimally.

Ingo
--
cmov, cmpxchg and xadd are the noticeable things.
I think there are realistically three classes:
- _really_ old, to the point of being totally useless for SMP.

This is really just 386 and clones. We _need_ a working WP for a
race-free access_ok(), and we need cmpxchg (and lately xadd).

SMP cannot really realistically work reasonably (yes, there were SMP
machines. No, they don't matter), and you'd have to be insane to care
about this as a vendor even on UP. Probably nobody really cares (ie if
you have hardware that old, you are likely much better off with an
older kernel too)

Smaller pains even on UP: bswap doesn't exist. invlpg doesn't exist.

- old. pre-cmov. i486 and pentium, and some clones.

It's workable, but code generation differences are really big enough
that it's worth having a totally separate architecture option for newer
CPUs where the kernel simply won't work.

And most newer distros probably simply don't care, although there may
be individual cases where this makes sense (embedded places still use
pentium clones etc, and there are probably a fair amount of individuals
that want to still use this)

Other pains: TSC doesn't necessarily exist.

- "modern 32-bit": PPro and better. Can take CMOV, MMX and TSC for
granted.

Yes, there are gradations to the above, but reasonably, those three are I
think the "architectural" big versions. The rest should be:

- pure "tuning" options. A Pentium 4 is different from Core 2 in tuning,
and the best code sequences can be very very different, but the binary
should work on both.

- with *dynamic* choices for the differences that are architecturally
visible.

Ie the whole choice of syscall/sysenter/int80 is dynamic, not specified
statically at compile time with a config option. So are things like the
different XMM versions etc.

Hmm? Doesn't that sound like a sane model?

Linus
--
We use 3DNow! for bigger memcpy's if the kernel is configured for a K7.
cu
Adrian
--
It doesn't. I guess I don't care that much, since explicitly asking for
some odd-ball case does indicate that you want a very specific kernel. I
guess that's ok. I'm certainly not violently against it.

Of course, I also suspect that we _could_ fix it so that things like
memcpy really only have two cases:

- the special inlined "rep movs" thing. Although I'm not actually sure
gcc even does this, and I don't think we force it any more.

- If doing a function call, we could just fix things up to be more
dynamic. Of course, the fixups for the SMP cases are scary (ie we'd
probably have to first change it to a one-byte "int $3" instruction,
then change the target, and then write the first byte back - and handle
any race with another CPU by fixing up the trap).

but I dunno.

Linus
--
VIA C3 (Samuel 2/Ezra, 600 - 1000 MHz?, common on VIA EPIA-*: home
theatres etc) can't CMOV.
--
Krzysztof Halasa
--
AFAIK they fixed that in newer BIOS with a microcode update. It's
slow, but it works.

-Andi
--
..
Our firewall here uses a Via C3-600 CPU, and CMOV has never worked on it.
But based upon your posting, I have today upgraded the BIOS to the
latest (2004) version.

Now.. how can I check whether CMOV works or not? It's not listed in /proc/cpuinfo.
Thanks
--
If it's not in cpuinfo it won't work.
-Andi
--
ak@linux.intel.com
--
..
..
Okay, done. And the binary does indeed have a ton of CMOV instructions.
When running it, this appears immediately:

Illegal instruction
So much for the "BIOS upgrade fixes CMOV microcode" theory.
Cheers
--
Compile just about any C program with -march=i686.
-hpa
--
Yes, but if it's slower than jmp+mov then you actively want to avoid it.
-hpa
--
Well, more practically, the C3 simply _isn't_ a "modern 32-bit" one. It
would fall into the other category of "pre-PPro, but at least better
than i386".

Linus
--
On Tue, 09 Sep 2008 01:17:19 +0200
so your cpu does not fall into this bucket......
no big deal.
--
On Mon, 8 Sep 2008 12:30:02 -0700 (PDT)
I'd lump all cpus that don't have cpuid in this bucket too (eg half the
486es) simply because not having cpuid is painful in pretty much the

again makes sense; question is if it makes sense to take PSE and PAE

it does to me; the only question is if we hit a new bucket with the
various fancy string instructions that are in upcoming models; doing
string/copy operations inlined for those guys will make a fourth bucket.
--
Not really. Detecting CPUID is pretty trivial, and we just initialize

Well, PAE implies PSE. Unfortunately Intel released a series of
Pentium-Ms without PAE support. We *should* be able to take PSE for
granted, but there is Xen damage.

-hpa
--
Hmm. The only other thing seems to be X86_INTEL_USERCOPY. Which doesn't
seem to be something we want to force either.

And I have to say, that whole X86_GENERIC -> L1_CACHE_BYTES=128 ->
cache_line_size() -> SLAB/SLUB/SLOB alignment worries me too. Looking at
that, I really don't feel like I want to force 128-byte alignment on
everybody, just because the P4 was a pig in cacheline size.

So NOPL really stands out as being different from the other things that
X86_GENERIC does.

Linus
--
SLAB/SLUB should actually auto detect the cache line at runtime.
Similar feeling here.
-Andi
--
ak@linux.intel.com
--
