I suspect _any_ abstract locking functionality around a data structure
can be implemented via atomic control over just a single user-space bit.
That bit can be used as a lock, and if every access to the protected
state goes through it, arbitrary higher-order atomic state transitions
can be derived from it. The cost would be a few more instructions in the
fastpath, but there would still only be a single atomic op (the acquire
op), as the unlock would be a natural barrier (on x86 at least).
Concurrency (and scheduling) of that lock would still be exactly the
same as with genuine 64-bit (or even larger) atomic ops, and the
fastpath would be very close as well.
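
To make that concrete, here is a rough sketch in C11 atomics, not a
tested implementation. The names (wide_state, state_lock, state_unlock,
state_add) are made up for illustration, and the 128-bit counter is
just a stand-in for "state wider than the native atomic ops". The
spin loop is a placeholder for whatever waiting scheme a real lock
would use.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_BIT 1u

/* Hypothetical example: 128 bits of state guarded by a single lock bit. */
struct wide_state {
    atomic_uint lock;   /* bit 0 is the lock; other bits unused in this sketch */
    uint64_t lo;        /* low 64 bits, only touched with the lock held */
    uint64_t hi;        /* high 64 bits, only touched with the lock held */
};

static void state_lock(struct wide_state *s)
{
    /* The single atomic read-modify-write op: test-and-set of bit 0. */
    while (atomic_fetch_or_explicit(&s->lock, LOCK_BIT,
                                    memory_order_acquire) & LOCK_BIT)
        ;   /* busy-wait; a real lock would back off or sleep here */
}

static void state_unlock(struct wide_state *s)
{
    /* Release store; on x86 this needs no locked instruction. */
    atomic_store_explicit(&s->lock, 0, memory_order_release);
}

/* A 128-bit add built from plain operations inside the critical section. */
static void state_add(struct wide_state *s, uint64_t delta)
{
    state_lock(s);
    uint64_t old = s->lo;
    s->lo = old + delta;
    if (s->lo < old)    /* unsigned wraparound means a carry into hi */
        s->hi++;
    state_unlock(s);
}
```

The fastpath here is one atomic fetch-or, a handful of plain
loads/stores, and a release store, which is the shape described above.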
Ingo
--