atomic_read() and atomic_set() don't inherently make changes visible on
other CPUs any sooner than plain ++ and -- would. Sometimes it happens to
work out that way because of how the compiler and the CPU order operations,
but there's no semantic guarantee, and under some circumstances it could
take arbitrarily long. If you want to use atomic ops to close the race, you
also need explicit memory barriers around them.
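
To illustrate the kind of ordering I mean, here is a rough userspace
analogue using C11 atomics and pthreads. It is a sketch, not kernel code:
payload, ready and run_handoff() are names I made up for the example, and
the release/acquire pair stands in for what you would get in the kernel
from something like smp_store_release()/smp_load_acquire() or a matched
smp_wmb()/smp_rmb() pair. The writer fills in the data and then sets the
flag with release ordering; the reader spins on the flag with acquire
ordering and only then touches the data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data, published through the flag */
static atomic_int ready;   /* flag that guards payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload store is ordered before the flag store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;

    /* Acquire: pairs with the release store in producer(). */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;   /* spin until the flag is set */

    /* Safe to read payload only because of the acquire above. */
    *seen = payload;
    return NULL;
}

/* Hypothetical helper: runs one handoff and returns what the reader saw. */
int run_handoff(void)
{
    pthread_t p, c;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With memory_order_relaxed on both sides (which is closer to what bare
atomic_read()/atomic_set() give you), the reader could legally see the
flag set and still read a stale payload.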
-- Chris