This series of patches: [1/3] adds __fls.h to asm-generic [2/3] modifies asm-*/bitops.h for 64-bit archs to implement __fls [3/3] modifies asm-generic/fls64.h to make use of __fls I have compiled i386 and x86_64, and they generate the same code as before the change. The changes to the other archs are a best effort. Please comment. If this patch series is accepted, it will make one tiny bit of the x86-unification a tiny bit cleaner. The patches are against Linus' current tree. Andrew, if no concensus can be reached that this is a bad patch series, would you be willing to add this to your tree? Greetings, Alexander --
I strongly support this. I wish we'd also have a consistent naming convention for all the bitops functions so it will be clearer what data type the function is working on and is the result 0 or 1 based. It seems like what we currently have is: name type first bit# ---- ---- ---------- ffs int 1 fls int 1 __ffs ulong 0 __fls ulong 0 # in your proposal ffz ulong 0 fls64 __u64 1 so it seems like - ffz is misnamed and is rather confusing. Apprently is should be renamed to __ffz. - (new) ffz(x) can be defined to ffs(~(x)) - It'd be nice to have ffs64, and maybe ffz64. --
I think every programmer who thinks in terms of bits realises that ffz(x) == __ffs(~x) and ffz(~x) == __ffs(x) etc... so I would rather get rid of ffz entirely by converting all uses to __ffs. Patch (against current linus) below. After that all implementations of ffz could be removed. ffs64 would be a good addition to complete the set of functions, but that would be the same as glibc's (and gcc-builtin) ffsll. Looking into that... the relevant gcc builtins are __builtin_ffs (find first set bit), __builtin_clz (count leading zeroes), __builtin_ctz (count trailing zeroes), __builtin_popcount, maybe __builtin_parity and their -l and -ll variants. Maybe the kernel should be changed to use those names instead of the current ones? ffs would stay as it is. __ffs would become ctz, __fls would become something like 31-clz, and hweight would become popcount. Greetings, Alexander [RFC] how about getting rid of ffz? The patch is not tested, but the conversion should be completely trivial. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- arch/alpha/kernel/irq_i8259.c | 2 +- arch/alpha/kernel/irq_pyxis.c | 2 +- arch/alpha/kernel/sys_alcor.c | 2 +- arch/alpha/kernel/sys_cabriolet.c | 2 +- arch/alpha/kernel/sys_dp264.c | 2 +- arch/alpha/kernel/sys_eb64p.c | 2 +- arch/alpha/kernel/sys_mikasa.c | 2 +- arch/alpha/kernel/sys_noritake.c | 2 +- arch/alpha/kernel/sys_rx164.c | 2 +- arch/arm/kernel/smp.c | 2 +- arch/ia64/hp/common/sba_iommu.c | 2 +- arch/ia64/kernel/perfmon.c | 2 +- arch/ia64/kernel/smp.c | 2 +- arch/ia64/mm/init.c | 2 +- arch/mips/kernel/irixelf.c | 2 +- arch/parisc/kernel/smp.c | 2 +- arch/sh/kernel/cpu/irq/imask.c | 2 +- block/blk-barrier.c | 2 +- crypto/lrw.c | 2 +- drivers/ieee1394/pcilynx.c | 2 +- drivers/input/keyboar...
Yeah, very few architectures have an optimized version of ffz that will perform noticeably better than __ffs(~x). Interesting idea. ctz much better than __ffs with regards to the return value's first bit number, but unless you expose clz and convert the code how do you get rid of the __fls vs. fls confusion? (BTW for __fls, I'd use BITS_PER_LONG - 1, not 31 :) I think that adopting libc's convention might make more sense, i.e., define ffs, ffsl, ffsll, and fls, flsl, flsll, and have *all* be 1-based. --
On Sun, 06 Apr 2008 18:03:40 +0300, "Benny Halevy" <bhalevy@panasas.com> Yeah, and these implementations seem to be based on a loop over all bits in the word. I don't think adding one extra not-operation Exposing clz/ctz on all architectures will be the harder part. Changing all current uses of ffs/fls (and __fls) will take some time. Mostly because converting code using fls to use clz instead needs to be done a bit carefully, because fls(0) has defined behaviour, while I agree that it makes sense for fls. For clz (and ctz) I would choose clz(unsigned long), clz32(u32), and clz64(u64). Greetings, -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Choose from over 50 domains or use your own --
i've applied #1 and #3 to x86.git/testing, to see how this works out on x86. But it looks good to me in general. Ingo --
Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
include/asm-generic/bitops/fls64.h | 22 ++++++++++++++++++++++
include/asm-x86/bitops_64.h | 15 ++-------------
2 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/include/asm-generic/bitops/fls64.h b/include/asm-generic/bitops/fls64.h
index 1b6b17c..86d403f 100644
--- a/include/asm-generic/bitops/fls64.h
+++ b/include/asm-generic/bitops/fls64.h
@@ -3,6 +3,18 @@
#include <asm/types.h>
+/**
+ * fls64 - find last set bit in a 64-bit word
+ * @x: the word to search
+ *
+ * This is defined in a similar way as the libc and compiler builtin
+ * ffsll, but returns the position of the most significant set bit.
+ *
+ * fls64(value) returns 0 if value is 0 or the position of the last
+ * set bit if value is nonzero. The last (most significant) bit is
+ * at position 64.
+ */
+#if BITS_PER_LONG == 32
static inline int fls64(__u64 x)
{
__u32 h = x >> 32;
@@ -10,5 +22,15 @@ static inline int fls64(__u64 x)
return fls(h) + 32;
return fls(x);
}
+#elif BITS_PER_LONG == 64
+static inline int fls64(__u64 x)
+{
+ if (x == 0)
+ return 0;
+ return __fls(x) + 1;
+}
+#else
+#error BITS_PER_LONG not 32 or 64
+#endif
#endif /* _ASM_GENERIC_BITOPS_FLS64_H_ */
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index aaf1519..1f1f796 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -112,19 +112,6 @@ static inline int ffs(int x)
}
/**
- * fls64 - find last bit set in 64 bit word
- * @x: the word to search
- *
- * This is defined the same way as fls.
- */
-static inline int fls64(__u64 x)
-{
- if (x == 0)
- return 0;
- return __fls(x) + 1;
-}
-
-/**
* fls - find last bit set
* @x: the word to search
*
@@ -146,6 +133,8 @@ static inline int fls(int x)
#endif /* __KERNEL__ */...(Sorry, sending this again as I screwed up the previous mail). Hi, I have a question about fls64() which I hope you or someone else could clarify, please see below. It seems fls64() is implemented on top of __fls(), however the __fls() implementation on the x86-64 architecture states that the result is undefined if the argument does not have any zero bits. So if I understand correctly, the statement "fls64(~0ULL)" would return an undefined result on x64-64 instead of 64 as one would expect. Wouldn't it make sense to check for ~0ULL in fls64()? Thanks, Ricardo --
Ricardo M. Correia spotted that the use of __fls() in fls64() did
not seem to make sense. In fact fls64()'s implementation is fine,
but the description of __fls() was wrong. Fix that.
Reported-by: "Ricardo M. Correia" <Ricardo.M.Correia@Sun.COM>
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
You have found a bug. It's not in fls64, though, but a copy/paste
one in the comment preceding __fls(). __fls() gives an undefined
result if there are no _set_ bits: only __fls(0) gives an undefined
result.
The inconsistency is well-spotted, though, thanks.
Patch is against current -tip.
Greetings,
---
diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 96b1829..cfb2b64 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -356,7 +356,7 @@ static inline unsigned long ffz(unsigned long word)
* __fls: find last set bit in word
* @word: The word to search
*
- * Undefined if no zero exists, so code should check against ~0UL first.
+ * Undefined if no set bit exists, so code should check against 0 first.
*/
static inline unsigned long __fls(unsigned long word)
{
--applied to tip/x86/cleanups, thanks! Ingo --
Implement __fls on all 64-bit archs:
alpha has an implementation of fls64.
Added __fls(x) = fls64(x) - 1.
ia64 has fls, but not __fls.
Added __fls based on code of fls.
mips and powerpc have __ilog2, which is the same as __fls.
Added __fls = __ilog2.
parisc, s390, sh and sparc64:
Include generic __fls.
x86_64 already has __fls.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
include/asm-alpha/bitops.h | 5 +++++
include/asm-ia64/bitops.h | 16 ++++++++++++++++
include/asm-mips/bitops.h | 5 +++++
include/asm-parisc/bitops.h | 1 +
include/asm-powerpc/bitops.h | 5 +++++
include/asm-s390/bitops.h | 1 +
include/asm-sh/bitops.h | 1 +
include/asm-sparc64/bitops.h | 1 +
8 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/include/asm-alpha/bitops.h b/include/asm-alpha/bitops.h
index 9e19a70..15f3ae2 100644
--- a/include/asm-alpha/bitops.h
+++ b/include/asm-alpha/bitops.h
@@ -388,6 +388,11 @@ static inline int fls64(unsigned long x)
}
#endif
+static inline unsigned long __fls(unsigned long x)
+{
+ return fls64(x) - 1;
+}
+
static inline int fls(int x)
{
return fls64((unsigned int) x);
diff --git a/include/asm-ia64/bitops.h b/include/asm-ia64/bitops.h
index 953d3df..e2ca800 100644
--- a/include/asm-ia64/bitops.h
+++ b/include/asm-ia64/bitops.h
@@ -407,6 +407,22 @@ fls (int t)
return ia64_popcnt(x);
}
+/*
+ * Find the last (most significant) bit set. Undefined for x==0.
+ * Bits are numbered from 0..63 (e.g., __fls(9) == 3).
+ */
+static inline unsigned long
+__fls (unsigned long x)
+{
+ x |= x >> 1;
+ x |= x >> 2;
+ x |= x >> 4;
+ x |= x >> 8;
+ x |= x >> 16;
+ x |= x >> 32;
+ return ia64_popcnt(x) - 1;
+}
+
#include <asm-generic/bitops/fls64.h>
/*
diff --git a/include/asm-mips/bitops.h b/include/asm-mips/bitops.h
index ec75ce4..c2bd126 100644
--- a/include/asm-mips/bitops.h
+++ b/include/asm-mips/bito...Add a generic __fls implementation in the same spirit as
the generic __ffs one. It finds the last (most significant)
set bit in the given long value.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
include/asm-generic/bitops/__fls.h | 43 ++++++++++++++++++++++++++++++++++++
1 files changed, 43 insertions(+), 0 deletions(-)
create mode 100644 include/asm-generic/bitops/__fls.h
diff --git a/include/asm-generic/bitops/__fls.h b/include/asm-generic/bitops/__fls.h
new file mode 100644
index 0000000..be24465
--- /dev/null
+++ b/include/asm-generic/bitops/__fls.h
@@ -0,0 +1,43 @@
+#ifndef _ASM_GENERIC_BITOPS___FLS_H_
+#define _ASM_GENERIC_BITOPS___FLS_H_
+
+#include <asm/types.h>
+
+/**
+ * __fls - find last (most-significant) set bit in a long word
+ * @word: the word to search
+ *
+ * Undefined if no set bit exists, so code should check against 0 first.
+ */
+static inline unsigned long __fls(unsigned long word)
+{
+ int num = BITS_PER_LONG - 1;
+
+#if BITS_PER_LONG == 64
+ if (!(word & (~0ul << 32))) {
+ num -= 32;
+ word <<= 32;
+ }
+#endif
+ if (!(word & (~0ul << (BITS_PER_LONG-16)))) {
+ num -= 16;
+ word <<= 16;
+ }
+ if (!(word & (~0ul << (BITS_PER_LONG-8)))) {
+ num -= 8;
+ word <<= 8;
+ }
+ if (!(word & (~0ul << (BITS_PER_LONG-4)))) {
+ num -= 4;
+ word <<= 4;
+ }
+ if (!(word & (~0ul << (BITS_PER_LONG-2)))) {
+ num -= 2;
+ word <<= 2;
+ }
+ if (!(word & (~0ul << (BITS_PER_LONG-1))))
+ num -= 1;
+ return num;
+}
+
+#endif /* _ASM_GENERIC_BITOPS___FLS_H_ */
--
| Adrian Bunk | If you want me to quit I will quit |
| Andi Kleen | [PATCH x86] [6/16] Add a new arch_early_alloc() interface for x86-64 |
| Chuck Ebbert | Why do so many machines need "noapic"? |
| Linus Torvalds | Linux 2.6.27-rc8 |
git: | |
| Sean Brown | git repository size vs. subversion repository size |
| Linus Torvalds | Re: git versus CVS (versus bk) |
| drafnel | [PATCH 2/5] Make mktag a builtin. |
| Andrew Morton | Untracked working tree files |
| Richard Stallman | Real men don't attack straw men |
| Todd Pytel | IDE or SCSI virtual disks for VMWare image? |
| GVG GVG | ssh_exchange_identification: Connection closed by remote host |
| Parvinder Bhasin | Disabling IPv6 ? |
| Mark Seger | occasionally corrupted network stats in /proc/net/dev |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Hannes Eder | [PATCH 00/27] drivers/net: fix sparse warnings |
| Gerrit Renker | [PATCH 36/37] dccp: Initialisation and type-checking of feature sysctls |
