On Friday 26 October 2007 09:09, Andi Kleen wrote:
But we have to mark all memory access inside the critical section
as volatile, even after the CPU synchronisation, and often the
common case where there is no contention or cacheline bouncing.
Sure, real code is dominated by cache misses anyway, etc etc.
However volatile prevents a lot of real optimisations that we'd
actually want to do, and increases icache size etc.
sbb as well.
It might just depend on the instruction costs that gcc knows about.
For example, if ld/st is expensive, they might hoist them out of
loops etc. You don't even need to have one of these predicate or
pseudo predicate instructions.
But we don't actually know what it is, and it could change with
different architectures or versions of gcc. I think the sanest thing
is for gcc to help us out here, seeing as there is this very well
defined requirement that we want.
If you really still want the optimisation to occur, I don't think it
is too much to use a local variable for the accumulator (eg. in the
simple example case).
-