login
Header Space

 
 

[1/3] Introduce a generic __fls implementation

Previous thread: YOUR ATM CARD IS READY!!!!!!!! by mrs linda Hill on Saturday, March 15, 2008 - 11:20 am. (1 message)

Next thread: x86: not showing f00f bug correctly in /proc/cpuinfo by David Martin on Saturday, March 15, 2008 - 3:11 pm. (2 messages)
To: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>
Cc: Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>
Date: Saturday, March 15, 2008 - 1:29 pm

This series of patches:

[1/3] adds __fls.h to asm-generic
[2/3] modifies asm-*/bitops.h for 64-bit archs to implement __fls
[3/3] modifies asm-generic/fls64.h to make use of __fls

I have compiled i386 and x86_64, and they generate the same code as
before the change. The changes to the other archs are a best effort.
Please comment.

If this patch series is accepted, it will make one tiny bit of
the x86-unification a tiny bit cleaner. The patches are against
Linus' current tree.

Andrew, if no concensus can be reached that this is a bad patch
series, would you be willing to add this to your tree?

Greetings,
	Alexander
--
To: Alexander van Heukelum <heukelum@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>
Date: Thursday, April 3, 2008 - 1:19 pm

I strongly support this.

I wish we'd also have a consistent naming convention for all
the bitops functions so it will be clearer what data type the
function is working on and is the result 0 or 1 based.

It seems like what we currently have is:

name	type	first bit#
----	----	----------
ffs	int	1
fls	int	1
__ffs	ulong	0
__fls	ulong	0	# in your proposal
ffz	ulong	0
fls64	__u64	1

so it seems like
- ffz is misnamed and is rather confusing.
  Apprently is should be renamed to __ffz.

- (new) ffz(x) can be defined to ffs(~(x))

- It'd be nice to have ffs64, and maybe ffz64.


--
To: Benny Halevy <bhalevy@...>, Andrew Morton <akpm@...>
Cc: linux-arch <linux-arch@...>, Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Friday, April 4, 2008 - 10:22 am

I think every programmer who thinks in terms of bits realises
that ffz(x) == __ffs(~x) and ffz(~x) == __ffs(x) etc... so I
would rather get rid of ffz entirely by converting all uses
to __ffs. Patch (against current linus) below. After that all
implementations of ffz could be removed.

ffs64 would be a good addition to complete the set of functions,
but that would be the same as glibc's (and gcc-builtin) ffsll.

Looking into that... the relevant gcc builtins are __builtin_ffs
(find first set bit), __builtin_clz (count leading zeroes),
__builtin_ctz (count trailing zeroes), __builtin_popcount, maybe
__builtin_parity and their -l and -ll variants. Maybe the kernel
should be changed to use those names instead of the current
ones? ffs would stay as it is. __ffs would become ctz, __fls
would become something like 31-clz, and hweight would become
popcount.

Greetings,
	Alexander


[RFC] how about getting rid of ffz?

The patch is not tested, but the conversion should be completely
trivial.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;

---

 arch/alpha/kernel/irq_i8259.c     |    2 +-
 arch/alpha/kernel/irq_pyxis.c     |    2 +-
 arch/alpha/kernel/sys_alcor.c     |    2 +-
 arch/alpha/kernel/sys_cabriolet.c |    2 +-
 arch/alpha/kernel/sys_dp264.c     |    2 +-
 arch/alpha/kernel/sys_eb64p.c     |    2 +-
 arch/alpha/kernel/sys_mikasa.c    |    2 +-
 arch/alpha/kernel/sys_noritake.c  |    2 +-
 arch/alpha/kernel/sys_rx164.c     |    2 +-
 arch/arm/kernel/smp.c             |    2 +-
 arch/ia64/hp/common/sba_iommu.c   |    2 +-
 arch/ia64/kernel/perfmon.c        |    2 +-
 arch/ia64/kernel/smp.c            |    2 +-
 arch/ia64/mm/init.c               |    2 +-
 arch/mips/kernel/irixelf.c        |    2 +-
 arch/parisc/kernel/smp.c          |    2 +-
 arch/sh/kernel/cpu/irq/imask.c    |    2 +-
 block/blk-barrier.c               |    2 +-
 crypto/lrw.c                      |    2 +-
 drivers/ieee1394/pcilynx.c        |    2 +-
 drivers/input/keyboar...
To: Alexander van Heukelum <heukelum@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Sunday, April 6, 2008 - 11:03 am

Yeah, very few architectures have an optimized version of ffz
that will perform noticeably better than __ffs(~x).

Interesting idea.  ctz much better than __ffs with regards to the
return value's first bit number, but unless you expose clz
and convert the code how do you get rid of the __fls vs. fls
confusion?
(BTW for __fls, I'd use BITS_PER_LONG - 1, not 31 :)

I think that adopting libc's convention might make more sense,
i.e., define ffs, ffsl, ffsll, and fls, flsl, flsll, and have *all*
be 1-based.


--
To: Benny Halevy <bhalevy@...>, Alexander van Heukelum <heukelum@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>
Date: Sunday, April 6, 2008 - 3:10 pm

On Sun, 06 Apr 2008 18:03:40 +0300, "Benny Halevy" &lt;bhalevy@panasas.com&gt;

Yeah, and these implementations seem to be based on a loop over
all bits in the word. I don't think adding one extra not-operation

Exposing clz/ctz on all architectures will be the harder part. Changing
all current uses of ffs/fls (and __fls) will take some time. Mostly
because converting code using fls to use clz instead needs to be
done a bit carefully, because fls(0) has defined behaviour, while

I agree that it makes sense for fls. For clz (and ctz) I would choose
clz(unsigned long), clz32(u32), and clz64(u64).

Greetings,
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Choose from over 50 domains or use your own

--
To: Alexander van Heukelum <heukelum@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>
Date: Friday, March 21, 2008 - 9:10 am

i've applied #1 and #3 to x86.git/testing, to see how this works out on 
x86. But it looks good to me in general.

	Ingo
--
To: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>
Cc: Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Saturday, March 15, 2008 - 1:32 pm

Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
---
 include/asm-generic/bitops/fls64.h |   22 ++++++++++++++++++++++
 include/asm-x86/bitops_64.h        |   15 ++-------------
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/include/asm-generic/bitops/fls64.h b/include/asm-generic/bitops/fls64.h
index 1b6b17c..86d403f 100644
--- a/include/asm-generic/bitops/fls64.h
+++ b/include/asm-generic/bitops/fls64.h
@@ -3,6 +3,18 @@
 
 #include &lt;asm/types.h&gt;
 
+/**
+ * fls64 - find last set bit in a 64-bit word
+ * @x: the word to search
+ *
+ * This is defined in a similar way as the libc and compiler builtin
+ * ffsll, but returns the position of the most significant set bit.
+ *
+ * fls64(value) returns 0 if value is 0 or the position of the last
+ * set bit if value is nonzero. The last (most significant) bit is
+ * at position 64.
+ */
+#if BITS_PER_LONG == 32
 static inline int fls64(__u64 x)
 {
 	__u32 h = x &gt;&gt; 32;
@@ -10,5 +22,15 @@ static inline int fls64(__u64 x)
 		return fls(h) + 32;
 	return fls(x);
 }
+#elif BITS_PER_LONG == 64
+static inline int fls64(__u64 x)
+{
+	if (x == 0)
+		return 0;
+	return __fls(x) + 1;
+}
+#else
+#error BITS_PER_LONG not 32 or 64
+#endif
 
 #endif /* _ASM_GENERIC_BITOPS_FLS64_H_ */
diff --git a/include/asm-x86/bitops_64.h b/include/asm-x86/bitops_64.h
index aaf1519..1f1f796 100644
--- a/include/asm-x86/bitops_64.h
+++ b/include/asm-x86/bitops_64.h
@@ -112,19 +112,6 @@ static inline int ffs(int x)
 }
 
 /**
- * fls64 - find last bit set in 64 bit word
- * @x: the word to search
- *
- * This is defined the same way as fls.
- */
-static inline int fls64(__u64 x)
-{
-	if (x == 0)
-		return 0;
-	return __fls(x) + 1;
-}
-
-/**
  * fls - find last bit set
  * @x: the word to search
  *
@@ -146,6 +133,8 @@ static inline int fls(int x)
 
 #endif /* __KERNEL__ */...
To: Alexander van Heukelum <heukelum@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Saturday, July 5, 2008 - 12:56 pm

(Sorry, sending this again as I screwed up the previous mail).

Hi,

I have a question about fls64() which I hope you or someone else could
clarify, please see below.


It seems fls64() is implemented on top of __fls(), however the __fls()
implementation on the x86-64 architecture states that the result is
undefined if the argument does not have any zero bits.
So if I understand correctly, the statement "fls64(~0ULL)" would return
an undefined result on x64-64 instead of 64 as one would expect.

Wouldn't it make sense to check for ~0ULL in fls64()?

Thanks,
Ricardo


--
To: Ricardo M. Correia <Ricardo.M.Correia@...>, Ingo Molnar <mingo@...>
Cc: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Saturday, July 5, 2008 - 1:53 pm

Ricardo M. Correia spotted that the use of __fls() in fls64() did
not seem to make sense. In fact fls64()'s implementation is fine,
but the description of __fls() was wrong. Fix that.

Reported-by: "Ricardo M. Correia" &lt;Ricardo.M.Correia@Sun.COM&gt;
Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;

---


You have found a bug. It's not in fls64, though, but a copy/paste
one in the comment preceding __fls(). __fls() gives an undefined
result if there are no _set_ bits: only __fls(0) gives an undefined
result.

The inconsistency is well-spotted, though, thanks.

Patch is against current -tip.

Greetings,

---

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 96b1829..cfb2b64 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -356,7 +356,7 @@ static inline unsigned long ffz(unsigned long word)
  * __fls: find last set bit in word
  * @word: The word to search
  *
- * Undefined if no zero exists, so code should check against ~0UL first.
+ * Undefined if no set bit exists, so code should check against 0 first.
  */
 static inline unsigned long __fls(unsigned long word)
 {
--
To: Alexander van Heukelum <heukelum@...>
Cc: Ricardo M. Correia <Ricardo.M.Correia@...>, Andrew Morton <akpm@...>, linux-arch <linux-arch@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Friday, July 18, 2008 - 8:33 am

applied to tip/x86/cleanups, thanks!

	Ingo
--
To: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>
Cc: Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Saturday, March 15, 2008 - 1:31 pm

Implement __fls on all 64-bit archs:

alpha has an implementation of fls64.
	Added __fls(x) = fls64(x) - 1.

ia64 has fls, but not __fls.
	Added __fls based on code of fls.

mips and powerpc have __ilog2, which is the same as __fls.
	Added __fls = __ilog2.

parisc, s390, sh and sparc64:
	Include generic __fls.

x86_64 already has __fls.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
---
 include/asm-alpha/bitops.h   |    5 +++++
 include/asm-ia64/bitops.h    |   16 ++++++++++++++++
 include/asm-mips/bitops.h    |    5 +++++
 include/asm-parisc/bitops.h  |    1 +
 include/asm-powerpc/bitops.h |    5 +++++
 include/asm-s390/bitops.h    |    1 +
 include/asm-sh/bitops.h      |    1 +
 include/asm-sparc64/bitops.h |    1 +
 8 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/asm-alpha/bitops.h b/include/asm-alpha/bitops.h
index 9e19a70..15f3ae2 100644
--- a/include/asm-alpha/bitops.h
+++ b/include/asm-alpha/bitops.h
@@ -388,6 +388,11 @@ static inline int fls64(unsigned long x)
 }
 #endif
 
+static inline unsigned long __fls(unsigned long x)
+{
+	return fls64(x) - 1;
+}
+
 static inline int fls(int x)
 {
 	return fls64((unsigned int) x);
diff --git a/include/asm-ia64/bitops.h b/include/asm-ia64/bitops.h
index 953d3df..e2ca800 100644
--- a/include/asm-ia64/bitops.h
+++ b/include/asm-ia64/bitops.h
@@ -407,6 +407,22 @@ fls (int t)
 	return ia64_popcnt(x);
 }
 
+/*
+ * Find the last (most significant) bit set.  Undefined for x==0.
+ * Bits are numbered from 0..63 (e.g., __fls(9) == 3).
+ */
+static inline unsigned long
+__fls (unsigned long x)
+{
+	x |= x &gt;&gt; 1;
+	x |= x &gt;&gt; 2;
+	x |= x &gt;&gt; 4;
+	x |= x &gt;&gt; 8;
+	x |= x &gt;&gt; 16;
+	x |= x &gt;&gt; 32;
+	return ia64_popcnt(x) - 1;
+}
+
 #include &lt;asm-generic/bitops/fls64.h&gt;
 
 /*
diff --git a/include/asm-mips/bitops.h b/include/asm-mips/bitops.h
index ec75ce4..c2bd126 100644
--- a/include/asm-mips/bitops.h
+++ b/include/asm-mips/bito...
To: Andrew Morton <akpm@...>, linux-arch <linux-arch@...>
Cc: Ingo Molnar <mingo@...>, Andi Kleen <andi@...>, LKML <linux-kernel@...>, <heukelum@...>
Date: Saturday, March 15, 2008 - 1:30 pm

Add a generic __fls implementation in the same spirit as
the generic __ffs one. It finds the last (most significant)
set bit in the given long value.

Signed-off-by: Alexander van Heukelum &lt;heukelum@fastmail.fm&gt;
---
 include/asm-generic/bitops/__fls.h |   43 ++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/bitops/__fls.h

diff --git a/include/asm-generic/bitops/__fls.h b/include/asm-generic/bitops/__fls.h
new file mode 100644
index 0000000..be24465
--- /dev/null
+++ b/include/asm-generic/bitops/__fls.h
@@ -0,0 +1,43 @@
+#ifndef _ASM_GENERIC_BITOPS___FLS_H_
+#define _ASM_GENERIC_BITOPS___FLS_H_
+
+#include &lt;asm/types.h&gt;
+
+/**
+ * __fls - find last (most-significant) set bit in a long word
+ * @word: the word to search
+ *
+ * Undefined if no set bit exists, so code should check against 0 first.
+ */
+static inline unsigned long __fls(unsigned long word)
+{
+	int num = BITS_PER_LONG - 1;
+
+#if BITS_PER_LONG == 64
+	if (!(word &amp; (~0ul &lt;&lt; 32))) {
+		num -= 32;
+		word &lt;&lt;= 32;
+	}
+#endif
+	if (!(word &amp; (~0ul &lt;&lt; (BITS_PER_LONG-16)))) {
+		num -= 16;
+		word &lt;&lt;= 16;
+	}
+	if (!(word &amp; (~0ul &lt;&lt; (BITS_PER_LONG-8)))) {
+		num -= 8;
+		word &lt;&lt;= 8;
+	}
+	if (!(word &amp; (~0ul &lt;&lt; (BITS_PER_LONG-4)))) {
+		num -= 4;
+		word &lt;&lt;= 4;
+	}
+	if (!(word &amp; (~0ul &lt;&lt; (BITS_PER_LONG-2)))) {
+		num -= 2;
+		word &lt;&lt;= 2;
+	}
+	if (!(word &amp; (~0ul &lt;&lt; (BITS_PER_LONG-1))))
+		num -= 1;
+	return num;
+}
+
+#endif /* _ASM_GENERIC_BITOPS___FLS_H_ */

--
Previous thread: YOUR ATM CARD IS READY!!!!!!!! by mrs linda Hill on Saturday, March 15, 2008 - 11:20 am. (1 message)

Next thread: x86: not showing f00f bug correctly in /proc/cpuinfo by David Martin on Saturday, March 15, 2008 - 3:11 pm. (2 messages)
speck-geostationary