> to be honest, on 64 bit the overhead is quite small (the extra
For the pipeline dependency stalls, yes; for the icache/decode overhead, no.
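To put the icache/decode side in concrete terms, here is a small sketch that tallies the bytes a frame-pointer prologue and epilogue typically add to each x86-64 function. The byte values are the standard encodings GCC commonly emits (push %rbp; mov %rsp,%rbp; pop %rbp); the array and function names are hypothetical, and real functions may use `leave` instead of `pop` or have several return paths, so treat the result as a rough estimate only.

```c
#include <stddef.h>

/* Frame-pointer prologue as commonly emitted on x86-64 (AT&T syntax). */
static const unsigned char fp_prologue[] = {
    0x55,             /* push %rbp      */
    0x48, 0x89, 0xe5, /* mov  %rsp,%rbp */
};

/* Matching epilogue; GCC may emit 'leave' (0xc9, also 1 byte) instead. */
static const unsigned char fp_epilogue[] = {
    0x5d,             /* pop  %rbp      */
};

/* Hypothetical helper: extra instruction bytes per function that keeps
 * a frame pointer, assuming one prologue and one epilogue. */
size_t fp_bytes_per_function(void)
{
    return sizeof fp_prologue + sizeof fp_epilogue;
}

/* Hypothetical helper: rough extra code size for 'nfuncs' such functions. */
size_t fp_bytes_total(size_t nfuncs)
{
    return nfuncs * fp_bytes_per_function();
}
```

For example, fp_bytes_total(1000) comes to 5000 bytes of extra instructions, all of which have to be fetched and decoded.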
Also, CONFIG_FRAME_POINTER currently enables -fno-optimize-sibling-calls,
which generates significantly worse code for a lot of common
kernel constructs.
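As an illustration of what is lost, consider a thin wrapper whose last action is a call to another function, a pattern common in kernel code. This is a hypothetical sketch (the function names are made up), and the code generation described in the comments is what GCC typically does at -O2 on x86-64, not a guarantee; comparing `objdump -d` output with and without -fno-optimize-sibling-calls is the way to check on a given compiler.

```c
/* Hypothetical worker; noinline (a GCC/Clang attribute) keeps it a real
 * call so the sibling call below stays visible in the assembly. */
__attribute__((noinline))
int do_write(int fd, const char *buf, int len)
{
    (void)buf;                 /* unused in this sketch */
    return fd >= 0 ? len : -1;
}

/* Hypothetical wrapper whose last action is a call (a sibling call).
 * With sibling-call optimization (enabled by default at -O2), GCC can
 * usually emit the final call as 'jmp do_write', reusing this frame.
 * With -fno-optimize-sibling-calls it must emit 'call do_write'
 * followed by 'ret', plus whatever stack adjustment or frame setup
 * the call requires. */
int checked_write(int fd, const char *buf, int len)
{
    if (len < 0)
        return -1;
    return do_write(fd, buf, len);
}
```

The behaviour is identical either way; only the generated code differs, which is why the cost shows up in code size and call/return overhead rather than in correctness.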
%rbp is a general-purpose register. In fact, it's even better
than most general-purpose registers: it is just as versatile, but
often has a shorter encoding than %r8-%r15, which need a REX prefix.
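The encoding point can be checked against the instruction set: %rbp is one of the eight original registers, so using it needs no REX prefix, while %r8-%r15 always do. The table below lists a few hand-picked encodings to show the one-byte difference; the struct and helper are hypothetical. The advantage is not universal: for 64-bit operations that already carry a REX.W prefix the difference disappears, and as a memory base with no displacement, (%rbp) needs an explicit zero displacement byte.

```c
#include <stddef.h>
#include <string.h>

/* A few x86-64 encodings (AT&T syntax) for comparison. */
struct insn {
    const char *text;           /* assembly text */
    unsigned char bytes[4];     /* machine code */
    size_t len;                 /* number of valid bytes */
};

static const struct insn examples[] = {
    { "push %rbp",      { 0x55 },             1 },
    { "push %r12",      { 0x41, 0x54 },       2 }, /* 0x41 = REX.B */
    { "mov %eax,%ebp",  { 0x89, 0xc5 },       2 },
    { "mov %eax,%r12d", { 0x41, 0x89, 0xc4 }, 3 }, /* 0x41 = REX.B */
    { "mov %rax,%rbp",  { 0x48, 0x89, 0xc5 }, 3 }, /* 0x48 = REX.W */
    { "mov %rax,%r12",  { 0x49, 0x89, 0xc4 }, 3 }, /* 0x49 = REX.W|REX.B */
};

/* Hypothetical helper: encoded length of 'text', or 0 if not listed. */
size_t insn_len(const char *text)
{
    for (size_t i = 0; i < sizeof examples / sizeof examples[0]; i++)
        if (strcmp(examples[i].text, text) == 0)
            return examples[i].len;
    return 0;
}
```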
Even on modern CPUs you can measure it in macro benchmarks; at
least, that was the state the last time it was investigated.
On older CPUs without the magic hardware, it was even more
significant. And there are even new CPUs, like Atom, that don't
have the magic hardware.
-Andi