x86: merge the simple bitops and move them to bitops.h Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/x86/Kconfig.cpu | 2 +- include/asm-x86/bitops.h | 90 +++++++++++++++++++++++++++++++++++++++++++ include/asm-x86/bitops_32.h | 64 ------------------------------ include/asm-x86/bitops_64.h | 76 ------------------------------------ 4 files changed, 91 insertions(+), 141 deletions(-) Hi Ingo, On a 64-bit machine the cmov instruction is always available. I made that explicit by turning on CONFIG_X86_CMOV for all 64-bit builds. This patch introduces a change in behaviour for 32-bit builds: if the selected cpu has the cmov instruction, it will now be used by the functions ffs and fls. For the i386 defconfig the number of occurrences of cmov increases from 4336 to 4429 and vmlinux text size increases 15 bytes. When building for 486 no cmovs are generated, as expected. Greetings, Alexander diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index 31e92fb..fb7399b 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -399,7 +399,7 @@ config X86_TSC # generates cmov. config X86_CMOV def_bool y - depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7) + depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || X86_64) config X86_MINIMUM_CPU_FAMILY int diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h index 1a23ce1..bbdecee 100644 --- ...
Looks good in general. What's left in bitops_{32,64}.h now?
This comment seems wrong. My "man ffs" says that it returns 1-32 for
non-zero inputs, and 0 for a zero input. This function returns 0-31, or
-1 for a zero input.
DESCRIPTION
The ffs() function returns the position of the first (least significant)
bit set in the word i. The least significant bit is position 1 and the
most significant position is, for example, 32 or 64. The functions
ffsll() and ffsl() do the same but take arguments of possibly different
size.
RETURN VALUE
These functions return the position of the first bit set, or 0 if no
The prevailing style in bitops.h is to use "bsf"/"cmovz" and let the
And this comment is even more wrong, given that ffs and fls are
completely different functions ;)
I know these are from the original, but its worth fixing given that
J
--
On Fri, 14 Mar 2008 11:07:55 -0700, "Jeremy Fitzhardinge"
Thanks for taking a look!
bitops_{32,64}.h are getting pretty empty ;)
Both contain find_first_bit/find_first_zero_bit, i386 has them inlined,
x86_64 has an ugly define to select between small bitmaps (inlined) and
an out-of-line version. I think they should be unified much like how
find_next_bit and find_next_zero_bit work now (in x86#testing).
Both define fls64(), but i386 uses a generic one and x86_64 defines
one all by itself. The generic one is currently not suitable for
use by 64-bit archs... that can change.
x86_64 defines ARCH_HAS_FAST_MULTIPLIER, i386 not. This affects a
choice of generated code in the (generic) hweight function. It would
be nice if that could move to some other file.
x86_64 has a mysterious inline function set_bit_string, which is
only used by pci-calgary_64.c and pci-gart_64.c. Not sure what to
The comment rightly mentions that ffs differs in spirit from the rest
of the functions here. The type to be used here it always a 4-byte one
(int); most other functions operate on longs. I have not checked if the
explicit size is absolutely necessary, but I feel much safer with the
I'll see if I can come up with something better.
Greetings,
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - A no graphics, no pop-ups email service
--
The CMOV define should probably be dependent on what CPU the kernel is tuned for. It was originally written for when x86-64 was only K8 which has fast CMOV, but e.g. on P4 CMOV is actually deprecated It is very unlikely a generic one will ever be able to compete It depends on the CPU, but it can be probably safely set on pretty much It's generic and could really live in linux/bitops.h -Andi --
On Fri, 14 Mar 2008 20:55:20 +0100, "Andi Kleen" <andi@firstfloor.org>
Hi Andi,
I guess you are right. But there is quite a big number of different
types of P4. Let's see what the current situation is... defconfigs
(of current x86#testing+this patch/current linus) with
CONFIG_GENERIC_CPU=n:
Athlon: 4764 / 4667 occurences of cmovxx
Pentium-IV: 4079 / 3982 occurences of cmovxx
Pentium-M: 3939 / 3841 occurences of cmovxx
Core-2: 4335 / 4330 occurences of cmovxx
So it adds a few percent extra cmovxx's. The last one is fishy...
Generic is maybe not the right term: asm-generic/bitops/fls64.h has:
static inline int fls64(__u64 x)
{
__u32 h = x >> 32;
if (h)
return fls(h) + 32;
return fls(x);
}
I just wanted to move the 64-bit version to that header, with some
In fact I just found out that it only had an effect for 64 bit
It could. But it is a trivial (slow?) implementation. Probably fine
for the uses in pci-calgary_64.c and pci-gart_64.c (small ranges?),
but I would worry about people using it, thinking it was a near-
optimal implementation.
Greetings,
Alexander
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - IMAP accessible web-mail
--
It is this way because the callers in 95+% of all cases only set a single bit. For that case it is not slow. -Andi --
On Fri, 14 Mar 2008 22:42:05 +0100, "Andi Kleen" <andi@firstfloor.org>
I agree that it should end up using bsr. It would look like this in
the end, I guess. Might be familiar.
#if BITS_PER_LONG == 32
static inline int fls64(__u64 x)
{
__u32 h = x >> 32;
if (h)
return __fls(h) + 33;
return fls(x);
}
#else
static inline int fls64(__u64 x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}
This is the only reason that this define exists. With another
And my feeling is that this is exactly the reason why this is
not a good version for a generic implementation in bitops.h. But
I don't care much.
Greetings,
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Accessible with your email software
or over the web
--
That would require a polymorphic macro __fls that adapts to 32bit and 64bit AFAIK it only exists because some ancient sparc chips had incredibly I bet most different approaches who might be slightly faster for larger bit strings would make the one bit case slower. -Andi --
On Fri, 14 Mar 2008 23:18:45 +0100, "Andi Kleen" <andi@firstfloor.org>
Hi Andi,
It's unsigned long __fls(unsigned long)... and this is only compiled
if unsigned long is as long as u64. Seems fine to me. Moreover, it
Good to know. And I realized that there is also the machines without
a hardware multiply instruction at all. So you are right. i386/x86_64
That is true, of course. But then the name of the function should
give a hint that it is optimized for short sequences.
Greetings,
Alexander
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Or how I learned to stop worrying and
love email again
--
K8, EFFICEON and CORE2 support the cmovxx instructions.
Instead of listing the cpu's that have support for the
cmovxx instructions, list the cpu's that don't.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
A bit of playing resulted in:
CPUS="M386 M486 M586 M586TSC M586MMX M686 MPENTIUMII MPENTIUMIII"
CPUS=$CPUS" MPENTIUMM MPENTIUM4 MK6 MK7 MK8 MCRUSOE MEFFICEON"
CPUS=$CPUS" MWINCHIPC6 MWINCHIP2 MWINCHIP3D MGEODEGX1 MGEODE_LX"
CPUS=$CPUS" MCYRIXIII MVIAC3_2 MVIAC7 MPSC MCORE2"
for cpu in $CPUS
do
echo "CONFIG_${cpu}=y" > testconfig
make ARCH=i386 allnoconfig KCONFIG_ALLCONFIG=testconfig > /dev/null
echo ${cpu} >> result
grep X86_CMOV .config >> result
echo >> result
done
I'm quite sure that K8, EFFICEON and CORE2 support HAVE_CMOV, but
they did not set X86_CMOV.
Greetings,
Alexander van Heukelum
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 31e92fb..7a3a2d4 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -399,7 +399,7 @@ config X86_TSC
# generates cmov.
config X86_CMOV
def_bool y
- depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7)
+ depends on !(MCYRIXIII || MGEODE_LX || MGEODEGX1 || MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MK6 || M586MMX || M586TSC || M586 || M486 || M386)
config X86_MINIMUM_CPU_FAMILY
int
--
On Sat, 15 Mar 2008 21:18:16 +0100, "H. Peter Anvin" <hpa@zytor.com>
Hi hpa,
I believe you... and the Makefile_32.cpu: it gets a -march=i686. I
also found out that I missed X86_ELAN, which is treated as a i486.
Also, MGEODE_LX does not get any optimization flags. I think that
is unintentional, but what should they be?
And MPSC (Intel P4 / older Netburst based Xeon) depends on X86_64.
I would be very surprised if that one could not run 32-bit code,
though. What flags should that one get?
Thanks for pointing out I missed Crusoe. As a punishment you
can answer my questions ;).
Greetings,
Alexander
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Send your email first class
--
in my experience (on user-space code), optimizing for "i586" gives good results. However, the cache is small, so everything which can reduce code size (especially loop/jump/function alignment) is worth checking. Regards, Willy --
x86: K8, GEODE_LX, CRUSOE, EFFICEON and CORE2 support CMOV. Instead of summing up the cpu's that have support for the cmov instruction, sum up the cpu's that don't. GEODE_LX has no entry in arch/x86/Makefile_32.cpu, so it did not get any -march or -mtune arguments at all. I quite arbitrarily added -march=i686 -mtune=pentium-mmx. This way gcc is allowed to use cmov instructions. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Thanks for letting me know. I chose i686 with pentium-mmx scheduling. This enables cmov, while the scheduling is still pentium-like. If i586 is really better, GEODE_LX should be listed as not using cmov at all (because the compiler does not generate them). Loop/jump/function alignment is automatically turned off if you compile the kernel with CONFIG_CC_OPTIMIZE_FOR_SIZE=y so I think it should not be set explicitly. Qemu does not run GEODE_LX-kernels (no 3dNow). Greetings, Alexander diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index 31e92fb..38af3e1 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -399,7 +399,7 @@ config X86_TSC # generates cmov. config X86_CMOV def_bool y - depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7) + depends on !(MCYRIXIII || MGEODEGX1 || MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MK6 || M586MMX || M586TSC || M586 || M486 || M386 || X86_ELAN) config X86_MINIMUM_CPU_FAMILY int diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu index e372b58..b5e2438 100644 --- a/arch/x86/Makefile_32.cpu +++ b/arch/x86/Makefile_32.cpu @@ -38,8 +38,9 @@ cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2) # AMD Elan support cflags-$(CONFIG_X86_ELAN) += -march=i486 -# Geode GX1 support +# Geode GX1 and GX/LX support cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx +cflags-$(CONFIG_MGEODE_LX) += -march=i686 $(call tune,pentium-mmx) # add at the end to overwrite eventual tuning options from ...
x86: merge the simple bitops and move them to bitops.h. Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Improved comments for ffs and fls. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Hi, Here is a new version with improved (I hope) comments for the functions ffs and fls, suggested by Jeremy. I left the explicit use of the -l forms in the inline assembly of ffs and fls, because I think it is clearer that way. Greetings, Alexander diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index 31e92fb..fb7399b 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -399,7 +399,7 @@ config X86_TSC # generates cmov. config X86_CMOV def_bool y - depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7) + depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || X86_64) config X86_MINIMUM_CPU_FAMILY int diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h index 1a23ce1..363a4ff 100644 --- a/include/asm-x86/bitops.h +++ b/include/asm-x86/bitops.h @@ -66,7 +66,6 @@ static inline void __set_bit(int nr, volatile void *addr) : "Ir" (nr) : "memory"); } - /** * clear_bit - Clears a bit in memory * @nr: Bit to clear @@ -310,6 +309,104 @@ static int test_bit(int nr, const volatile unsigned long *addr); constant_test_bit((nr),(addr)) : \ variable_test_bit((nr),(addr))) +/** + * __ffs - find first bit in word. + * @word: The word to search + * + * Undefined if no bit exists, so code should check ...
find first bit set in word
[The "first bit in word" could have any value in {0, 1}, so the
---
~Randy
--
x86: merge the simple bitops and move them to bitops.h. Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Improved comments for ffs, ffz, fls and variations. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Hi Randy, Ingo, Version 3, for x86#testing. Compared to v1 there are only changes in the comments as suggested by Jeremy and Randy. Thanks to both for their comments. Greetings, Alexander arch/x86/Kconfig.cpu | 2 +- include/asm-x86/bitops.h | 99 ++++++++++++++++++++++++++++++++++++++++++- include/asm-x86/bitops_32.h | 64 ---------------------------- include/asm-x86/bitops_64.h | 76 --------------------------------- 4 files changed, 99 insertions(+), 142 deletions(-) diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu index 31e92fb..fb7399b 100644 --- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -399,7 +399,7 @@ config X86_TSC # generates cmov. config X86_CMOV def_bool y - depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7) + depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || X86_64) config X86_MINIMUM_CPU_FAMILY int diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h index 1a23ce1..923bdc2 100644 --- a/include/asm-x86/bitops.h +++ b/include/asm-x86/bitops.h @@ -66,7 +66,6 @@ static inline void __set_bit(int nr, volatile void *addr) : "Ir" (nr) : "memory"); } - /** * clear_bit - Clears a bit in memory * @nr: Bit to clear @@ -310,6 +309,104 @@ static ...
