Hi,
The version that is in x86#testing _will_ do this optimization. For
a 32-CPU SMP configuration on x86_64 this results in:
<__first_cpu>:
    mov    $0x20,%edx           (inlined...)
    mov    $0x100000000,%rax
    or     (%rdi),%rax
    bsf    %rax,%rax            (... find_first_bit)
    cmp    $0x20,%eax           (superfluous paranoia...)
    cmovg  %edx,%eax            (... for broken find_first_bit)
    retq
and something similar for __next_cpu.
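
Roughly, the __first_cpu listing corresponds to the C below. This is
only a userspace sketch of my reading of the disassembly, not the
kernel source: first_cpu_sketch and NR_CPUS_SKETCH are made-up names,
and the GCC/Clang builtin __builtin_ctzll stands in for bsf.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 32	/* hypothetical; matches the $0x20 in the listing */

/*
 * OR a sentinel bit in at position NR_CPUS_SKETCH so the bit scan
 * always finds a set bit, then clamp the result.
 */
static inline int first_cpu_sketch(const uint64_t *mask)
{
	/* mov $0x100000000,%rax ; or (%rdi),%rax */
	uint64_t word = *mask | (UINT64_C(1) << NR_CPUS_SKETCH);

	/* bsf %rax,%rax: word is never zero, so the scan is well defined */
	int bit = __builtin_ctzll(word);

	/* cmp/cmovg: cannot trigger here, the sentinel already bounds bit */
	return bit > NR_CPUS_SKETCH ? NR_CPUS_SKETCH : bit;
}
```

An empty mask gives 32 here, which is the "no CPU found" value that
the cmp in the for_each_cpu loop tests against.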
The for_each_cpu code looks fine:
    mov    $cpumapaddress,%rdi
    callq  <__first_cpu>
    jmp    end_of_body
start_of_body:
    ...
end_of_body:
    mov    $cpumapaddress,%edi  ($cpumapaddress often cached in register)
    callq  <__next_cpu>
    cmp    $0x1f,%eax
    jle    start_of_body
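
In C terms the loop above has the usual shape: one call before the
loop, then one call and one compare per iteration. The sketch below
only illustrates that shape. All names ending in _stub or _sketch are
invented for the example, and the stubs scan bit by bit instead of
calling find_first_bit/find_next_bit.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 32	/* hypothetical; matches the cmp $0x1f above */

/* Stand-in for the out-of-line __next_cpu call. */
static int next_cpu_stub(int n, const uint64_t *mask)
{
	for (int cpu = n + 1; cpu < NR_CPUS_SKETCH; cpu++)
		if (*mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS_SKETCH;	/* nothing found */
}

/* Stand-in for the out-of-line __first_cpu call. */
static int first_cpu_stub(const uint64_t *mask)
{
	return next_cpu_stub(-1, mask);
}

/* cpu < NR_CPUS_SKETCH is the "cmp $0x1f,%eax ; jle" pair. */
#define for_each_cpu_sketch(cpu, mask)			\
	for ((cpu) = first_cpu_stub(mask);		\
	     (cpu) < NR_CPUS_SKETCH;			\
	     (cpu) = next_cpu_stub((cpu), (mask)))

/* Example loop body: count the CPUs set in the mask. */
static int count_cpus_sketch(const uint64_t *mask)
{
	int cpu, count = 0;

	for_each_cpu_sketch(cpu, mask)
		count++;
	return count;
}
```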
On the other hand, it would be nice to change __first_cpu and
__next_cpu into inline functions. If all implementations of
find_first_bit and find_next_bit reliably returned max_size
when no bits are found, that would be a good thing to do. The
generic implementation does return max_size.
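
To illustrate what the inline versions could look like once that
guarantee holds, here is a minimal userspace sketch. It is not a
patch: find_next_bit_sketch is a made-up bit-at-a-time stand-in whose
argument order is modeled on find_next_bit, and the other names are
hypothetical as well. The point is only that the wrappers need no
clamp when "size" is returned for "nothing found".

```c
#include <limits.h>

#define NR_CPUS_SKETCH 32	/* hypothetical CPU count */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Assumed contract: return the index of the first set bit at or
 * after offset, or size if there is none.
 */
static unsigned long find_next_bit_sketch(const unsigned long *addr,
					  unsigned long size,
					  unsigned long offset)
{
	for (; offset < size; offset++)
		if (addr[offset / BITS_PER_LONG_SKETCH] &
		    (1UL << (offset % BITS_PER_LONG_SKETCH)))
			return offset;
	return size;
}

/* No cmp/cmov needed: the helper already returns NR_CPUS_SKETCH. */
static inline int first_cpu_inline(const unsigned long *mask)
{
	return (int)find_next_bit_sketch(mask, NR_CPUS_SKETCH, 0);
}

static inline int next_cpu_inline(int n, const unsigned long *mask)
{
	return (int)find_next_bit_sketch(mask, NR_CPUS_SKETCH,
					 (unsigned long)n + 1);
}
```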
Greetings,
Alexander
--
Alexander van Heukelum
heukelum@fastmail.fm