No need for a sentinal if we explicitly catch the no bits/all bits set
cases, make it clear they are special cases returning size/BITS_PER_LONG.Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
---
include/linux/bitops.h | 93 ++++++++++++++++++++----------------------------
1 files changed, 39 insertions(+), 54 deletions(-)diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 48bde60..d9eb58a 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -127,18 +127,17 @@ extern unsigned long __find_first_bit(const unsigned long *addr,
static __always_inline unsigned long
find_first_bit(const unsigned long *addr, unsigned long size)
{
- /* Avoid a function call if the bitmap size is a constant */
- /* and not bigger than BITS_PER_LONG. */
-
- /* insert a sentinel so that __ffs returns size if there */
- /* are no set bits in the bitmap */
- if (__builtin_constant_p(size) && (size < BITS_PER_LONG))
- return __ffs((*addr) | (1ul << size));
-
- /* the result of __ffs(0) is undefined, so it needs to be */
- /* handled separately */
- if (__builtin_constant_p(size) && (size == BITS_PER_LONG))
- return ((*addr) == 0) ? BITS_PER_LONG : __ffs(*addr);
+ /*
+ * Avoid a function call if the bitmap size is a constant and not
+ * bigger than BITS_PER_LONG. Ensure we return size if there are
+ * no set bits.
+ */
+ if (__builtin_constant_p(size) && (size <= BITS_PER_LONG)) {
+ if (*addr == 0)
+ return (size < BITS_PER_LONG) ? size : BITS_PER_LONG;
+ else
+ return __ffs(*addr);
+ }/* size is not constant or too big */
return __find_first_bit(addr, size);
@@ -157,20 +156,17 @@ extern unsigned long __find_first_zero_bit(const unsigned long *addr,
static __always_inline unsigned long
find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
- /* Avoid a function call if the bitmap size is a constant */
- /* and not bigger than BITS_PER_LONG. */
-
- /* insert a sentin...
Hi,
Ingo just pointed me to this thread.
On Sun, 27 Apr 2008 13:19:51 -0700, "Harvey Harrison"
Ehm... assume size=16 and *addr=0x80000000
--
Alexander van Heukelum
heukelum@fastmail.fm--
http://www.fastmail.fm - mmm... Fastmail...--
These things need to be re-done anyway. David already complained that the
thing inlines to something much much too big.Let's just make it not be an inline at all.
Linus
--
On Sun, 27 Apr 2008 13:26:34 -0700 (PDT), "Linus Torvalds"
Hello,
I think that's the wrong way to go about it. The problem that David
Miller pointed out was that the small-bitmap optimization caused an
inliningdisaster. Apparently, the generic version of __ffs is very
big on sparc64. In my opinion the implementation of __ffs should be
made out of line. If you move the wrapper of find_first_bit etc
out of line, all possibilities of optimization will be destroyed.
Better to remove them completely, then.Greetings,
--
Alexander van Heukelum
heukelum@fastmail.fm--
http://www.fastmail.fm - Accessible with your email software
or over the web--
The delta between the inlined version including the constant
optimizations and the original non inlined version is exactly 1400Which makes the whole constant optimization moot.
How about making the optimization controlled by a config switch so
arch maintainers can decide whether they want to enable the constant
optimization or unconditionally call the lib function ?See patch below. It gives back the 1400 bytes on SPARC64 and other
platforms that have no instruction for find bit.Thanks,
tglx---------->
Subject: bitops: optional bitops mapsize optimization on config switch
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 28 Apr 2008 00:22:06 +0200The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2b3ed32b964b0b30b1b98518fde388e increased the
binary size on non x86 architectures.Make the optimization depend on a config switch so architecture
maintainers can decide.Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>---
arch/x86/Kconfig.cpu | 1 +
include/linux/bitops.h | 6 ++++--
2 files changed, 5 insertions(+), 2 deletions(-)Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu
+++ linux-2.6/arch/x86/Kconfig.cpu
@@ -282,6 +282,7 @@ config X86_CPU
def_bool y
select GENERIC_FIND_FIRST_BIT
select GENERIC_FIND_NEXT_BIT
+ select GENERIC_FIND_BIT_MAPSIZE_OPTIMIZEconfig X86_GENERIC
bool "Generic x86 support"
Index: linux-2.6/include/linux/bitops.h
===================================================================
--- linux-2.6.orig/include/linux/bitops.h
+++ linux-2.6/include/linux/bitops.h
@@ -190,6 +190,7 @@ static __always_inline unsigned long
find_next_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
+#ifdef CONFIG_GENERIC_FIND_BIT_MAPSIZE_OPTIMIZE
unsigned long value;/* Avoid a functi...
No.
This is just making that damn header line look worse and worse.
Is there a _reason_ to optimize this stupid function this way?
Linus
--
On architectures which have a find bit instruction we can replace the
call to the library function by a single instruction when the size of
the bitmap is less/equal bits per long and when the bitnr is a
constant.Thanks,
tglx--
That's not what I asked.
I asked whether there is a *reason* to optimize this and cause all these
stupid problems.Is there a real hot path anywhere that actually uses this and depends on
it?Linus
--
A long time ago during profiling I found that in the scheduler back then
for_each_cpu() with find_next_bit() was somewhat hot. But I'm not sure it
still would be in the new scheduler anyways and the test case a little
dumb anyways (overscheduling user space)Still I think for_each_cpu() should be reasonably stream lined code.
-Andi
--
Sorry, misunderstood your question.
I checked the use cases and have not found a single one for
find_next_(zero_)bit() which makes use of this micro optimization. In
fact the .text section of vmlinux of the "optimized" and the straight
function call are identical. So the effect of those micro
optimizations is exactly zero.find_first_(zero_)bit() has a couple of places where the optimization
hits, but the code size reduction is mere 21 bytes and the use cases
are not in real hot pathes AFAICT.I doubt that that is worth the trouble and we should just remove those
inlines alltogether. This was discussed before, but Andi objected to
remove those micro optimizations and nobody had time to actually
verify the real benefits.I'll whip up a patch.
Thanks,
tglx
--
Thanks for checking.
In general, I think we should act the other way: we should *assume* that
inlines and complex macros in header files are not worth it and we'd be
better off with just plain function calls, unless it's really obvious that
the inline function is always worth it.We have tons of good examples of inline functions in headers that really
*really* want to be that way: <linux/list.h> is generally a collection of
things that clearly expand to somthing that is (a) easily critical and (b)
inlining things generally really causes a smaller code footprint than
having a "call" instruction and all the argument setup - regardless of how
many times it is done.So we do have cases where the inlines are obviously worth it. But in
general, I think we should try to move things from the header files into
*.c files unless there is a really clear reason for keeping it that way.(Some inline functions do depend on things like constant arguments goign
away, with kmalloc() being a good example of some really nasty header file
tricks that are likely worth it despite their extreme ugliness)Linus
--
The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2b3ed32b964b0b30b1b98518fde388e increased the
binary size on non x86 architectures.Looking into the real effects of the "optimizations" it turned out
that they are not used in find_next_bit() and find_next_zero_bit().The ones in find_first_bit() and find_first_zero_bit() are used in a
couple of places but none of them is a real hot path.Remove the "optimizations" all together and call the library functions
unconditionally.Boot-tested on x86 and compile tested on every cross compiler I have.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/bitops.h | 115 +++++--------------------------------------------
lib/find_next_bit.c | 22 ++++-----
2 files changed, 22 insertions(+), 115 deletions(-)Index: linux-2.6/include/linux/bitops.h
===================================================================
--- linux-2.6.orig/include/linux/bitops.h
+++ linux-2.6/include/linux/bitops.h
@@ -114,8 +114,6 @@ static inline unsigned fls_long(unsigned#ifdef __KERNEL__
#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
-extern unsigned long __find_first_bit(const unsigned long *addr,
- unsigned long size);/**
* find_first_bit - find the first set bit in a memory region
@@ -124,28 +122,8 @@ extern unsigned long __find_first_bit(co
*
* Returns the bit number of the first set bit.
*/
-static __always_inline unsigned long
-find_first_bit(const unsigned long *addr, unsigned long size)
-{
- /* Avoid a function call if the bitmap size is a constant */
- /* and not bigger than BITS_PER_LONG. */
-
- /* insert a sentinel so that __ffs returns size if there */
- /* are no set bits in the bitmap */
- if (__builtin_constant_p(size) && (size < BITS_PER_LONG))
- return __ffs((*addr) | (1ul << size));
-
- /* the result of __ffs(0) is undefined, so it needs to be */
- /* handled separately */
- if (__builtin_constant_p(size) && (size =...
From: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
--
From: David Miller <davem@davemloft.net>
Ironically, I just bisected sparc64 bootup failures to the following
changeset. It's very late here, and I haven't looked into the
details, but it seems to wedge in free_area_init_nodes() on a non-NUMA
system with CONFIG_NUMA disabled.Isn't it funny that this optimization not only was useless, but also
broke things. :-/64970b68d2b3ed32b964b0b30b1b98518fde388e is first bad commit
commit 64970b68d2b3ed32b964b0b30b1b98518fde388e
Author: Alexander van Heukelum <heukelum@mailshack.com>
Date: Tue Mar 11 16:17:19 2008 +0100x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.On x86_64, 52 locations are optimized with a minimal increase in
code size:Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392637 846592 724424 6963653 6a41c5 vmlinuxAfter removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename
5392358 846592 724424 6963374 6a40ae vmlinuxAfter this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392396 846592 724424 6963412 6a40d4 vmlinux[ tglx@linutronix.de: build fixes ]
--
The question is, whether it broke things or just unearthed some bug
hidden elsewhere.Thanks,
tglx
--
From: Thomas Gleixner <tglx@linutronix.de>
So I did a quick test, just #if 0'ing out the optimization inline
portions of the find_first_bit() code in linux/bitops.h, and forcing
it to always unconditionally call __find_first_bit() fixes the
regression.Given that others who tested could not find one case where the
optimization cases actually applied, and it's breaking things for me,
my theory is that it's triggering for some obscure case on sparc64 and
thus showing a bug in these optimizations since in practice I'm the
only person to actually test this new code.
--
From: David Miller <davem@davemloft.net>
Ok, I think I see the problem.
The core issue is that (X << N) is undefined when N is >= the
word size, but that's exactly what the find_next_bit() inline
optimizations do.This optimization code will trigger on 64-bit if NR_CPUS is set
to 64, and you actually have 64 cpus. It should also occur
on 32-bit if NR_CPUS=32 and you have 32 cpus.The bogus expansion occurs in lib/cpumask.c:__next_cpu()
After processing cpu 63, we'll use offset==64 and thus try
to make the undefined shift I described above, causing the
caller's cpumask iteration loop to run forever.This code was really not tested very well at all.
--
hm. I guess the next thing you plan to check is whether Thomas's patch
yeah and sorry :-/ It would still be nice to understand why this
happened, it looks like a weird (and unexpected) interaction.Ingo
--
From: Ingo Molnar <mingo@elte.hu>
Yes, I just needed some sleep first, which I just obtained :-)
--
there's another benefit, and in asm-x86 we prefer to move inlines to .c
files even in borderline cases because it simplifies the type
dependencies: not having to fully define all types at the function
prototype site avoids include file dependency hell.Putting things like a task struct dereference into a lowlevel inline
file easily causes dependency problems that causes people to use macros
instead - which have their own set of readability and side-effect
problems.a third argument is that inlines seldom get smaller. So if they are
borderline and we move them into a .c, and later on the function gets
larger, no harm is done. But if we keep the inline in a .h in the
borderline case and we grow the inline later on, the whole kernel bloats
in a multiplied way, without any apparent direct feedback to the
developer that something wrong and harmful just happened.Ingo
--
Hi,
I, personally, see this patch as a stopgap measure. The real problem
is that the generic __ffs is inlined and ends up generating a lot
of instructions.Until the generic implementation of __ffs is fixed not to be an
enormous inline function, this seems to be a reasonable thing
to do, though it's a shame that the optimization is now default-
off for architectures with a good implementation of __ffs.Greetings,
--
Alexander van Heukelum
heukelum@fastmail.fm--
http://www.fastmail.fm - mmm... Fastmail...--
Yup, it hasn't been there before and we explicitely enable it on
x86. If there are other archs which have a good one it's a nobrainer
to enable it.Thanks,
tglx--
Oh, I didn't realize, I only did this because sparse started spewing out
lots of:
include/linux/bitops.h:166:32: warning: shift too big (65536) for type unsigned longdue to shift by size there, and again on line 202...I just wanted something
that sparse wouldn't warn about and was a little easier to understand to boot.Cheers,
Harvey
--
Well, that's really just a sparse bug/misfeature that didn't matter
before.It happens because the warning is done as part of constant expression
evaluation, but then the expression isn't actually *used*, so when we
optimize it away - at a later date - it's too late to undo the warning.I don't rightly know how to fix it. We do want to do the constant
evaluation early (since all the optimizations that may then make it a
non-issue depends on constants being constants!), but in order to not
output the warning we'd have to turn the constant into a "constant with
warning 'xyz' if used".Which we have no support for in sparse yet.
Linus
--
It's not even a constant, really... What we need is
* don't fold further if that sucker would be evaluated (i.e. 0 && <...>
is folded, 0 & <...> is not)
* don't consider an integer constant expression with value 0 for
purposes of null pointer constant handling
* emit corresponding insn in linearize_expression() when we run into
such sucker.
* somewhere around the call of vrfy_flow() walk through insns in
remaining bb (we'd done dead code elimination already) and spew postponed
warning on each such insn. Probably. Assuming that I'm not entirely
confused about what's going on in that area - which is quite possible.Folks, could somebody familiar with the backend comment on the last part?
--
That's a sparse problem, really. I wonder if we simply should introduce a
new node type: EXPR_WARN. So that expand would generate those from things
like division by zero/overflow/bad shift *and* emitting an insn for those
would generate a stored warning.Objections?
--
Well, even though it is a sparse problem, I think my revised version was
None here
Harvey
--
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 005/196] Chinese: add translation of SubmittingDrivers |
| Vladislav Bolkhovitin | Re: Integration of SCST in the mainstream Linux kernel |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
git: | |
| David Miller | [GIT]: Networking |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Natalie Protasevich | [BUG] New Kernel Bugs |
