You don't need the "smp_mb__before_clear_bit()" there.
The regular "clear_bit()" needs it, but the "test_and_xxx()" versions are
architecturally defined to be memory barriers, exactly because they are
regularly used for locking.
This is even documented - see Documentation/atomic_ops.txt.
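
To make the distinction concrete, here is a rough userspace sketch using C11 atomics. It is only an analogy, not the kernel implementation: the sketch_* names are hypothetical, a seq_cst read-modify-write stands in for the barrier semantics of the "test_and_xxx()" operations, and atomic_thread_fence() stands in for "smp_mb__before_clear_bit()". The kernel's ordering rules do not map exactly onto the C11 memory model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-ins for the kernel bitops discussed above.
 * None of these are the real kernel functions.
 */

/* Analogue of test_and_set_bit(): returns the old bit value.
 * The default (seq_cst) read-modify-write orders surrounding accesses,
 * so no extra barrier is needed around it. */
static bool sketch_test_and_set_bit(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Analogue of test_and_clear_bit(): returns the old bit value,
 * again with seq_cst ordering built in. */
static bool sketch_test_and_clear_bit(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Analogue of plain clear_bit(): atomic, but with relaxed ordering,
 * so it implies no barrier by itself. */
static void sketch_clear_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_relaxed);
}

/* Releasing a bit "lock" with the plain clear needs an explicit barrier
 * first; the fence plays the role of smp_mb__before_clear_bit(). */
static void sketch_unlock_with_clear_bit(int nr, atomic_ulong *addr)
{
    atomic_thread_fence(memory_order_seq_cst);
    sketch_clear_bit(nr, addr);
}
```

In this sketch only the sketch_clear_bit() path needs the separate fence; the two test-and-modify helpers carry their ordering with them.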
Linus