No. There is no cmpxchg used in the patches that I tested. The slowdown
seems to come from the need to serialize at barriers. Adding an interrupt
enable/disable in the middle of the hot path creates another serialization
point.

Ummm... That is what I did. See the included patch that you quoted. The
measurements show that such a fallback does not preserve performance
on IA64.