It's not cmpxchg, just xchg.
In other words, is:
	lock btr $_PAGE_BIT_RW, (%rbx)
much cheaper than
	mov $0, %rax
	xchg %rax, (%rbx)
	and $~_PAGE_RW, %rax
	mov %rax, (%rbx)
?
It's the same number of locked RMW operations (xchg with a memory operand
is implicitly locked, so it costs the same as the single "lock btr"). Aside
from being a few instructions longer, I think it would cost much the same.
I guess the correct answer is "lmbench".
J
--