[...]
I don't think so. You're making way too many assumptions about the code
generated by gcc.
This kind of stuff absolutely can be done, *BUT* it requires the
cooperation of the compiler. The right way to do this is to negotiate a
set of appropriate builtins with the gcc people, and use them. That
means the optimization will only take effect when the code is compiled
with a gcc new enough to provide those builtins, so there is a
substantial lag, but it's the only sane way to do this kind of stuff.
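
To make that concrete, here is a minimal sketch of gating an
optimization on a compiler builtin, with a portable fallback for
compilers that lack it. __builtin_popcount is a real gcc builtin, but
here it is only a stand-in for whatever builtin would actually be
negotiated; my_popcount and HAVE_FAST_POPCOUNT are hypothetical names
made up for the example.

```c
/*
 * __has_builtin is provided by gcc 10 and later (and by clang).
 * On older compilers it is undefined, so define it to report
 * "not available" and let the fallback path be used.
 */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_popcount)
/* The compiler promises the semantics; no guessing about codegen. */
#define HAVE_FAST_POPCOUNT 1
static inline unsigned int my_popcount(unsigned int x)
{
	return (unsigned int)__builtin_popcount(x);
}
#else
/* Older compiler: plain C, correct but without the optimization. */
#define HAVE_FAST_POPCOUNT 0
static inline unsigned int my_popcount(unsigned int x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
#endif
```

Either branch gives the same results; only the new compiler gets the
fast path, which is exactly the lag mentioned above.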
-hpa
--