How is this even an optimization? It looks SLOWER to me. The
conditional read sometimes wastes memory bandwidth: when the condition
is true and v isn't already in the cache. The unconditional write
wastes memory bandwidth ALL the time and dirties/flushes caches, in
addition to not being thread-safe.
This SHOULD be using a conditional write instead of a conditional read
and an unconditional write.
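To make the comparison concrete, here is a minimal C sketch. It assumes the code in question turns something like `if (cond) v = x;` into a read of `v`, a selection, and an unconditional store back to `v`. Only `v` comes from the comment. The names `update_read_then_store`, `update_conditional_store`, `cond`, `x`, and the `stores` counter are hypothetical, added just to count how often `v` gets written.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: `v` is the variable under discussion and
 * `stores` counts how many times it is written. */
static int v = 0;
static unsigned long stores = 0;

/* Assumed shape of the "optimized" code: v is read, a value is chosen,
 * and v is written back on EVERY call, even when cond is false. Each
 * write dirties v's cache line, and if another thread changes v between
 * the read and the write, that change is silently overwritten. */
void update_read_then_store(bool cond, int x)
{
    int tmp = v;      /* read v */
    if (cond)
        tmp = x;      /* pick the new value */
    v = tmp;          /* unconditional write */
    stores++;
}

/* The alternative argued for above: a conditional write. When cond is
 * false, v is neither read nor written. */
void update_conditional_store(bool cond, int x)
{
    if (cond) {
        v = x;        /* write only when there is something to write */
        stores++;
    }
}
```

Calling each function a thousand times with `cond` false leaves `v` unchanged either way, but the first version performs a thousand stores and the second performs none.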