One really expensive but safe way to do atomic operations is to always
have them done on one CPU only, and provide a mechanism for other CPUs
to ask for an atomic operation using an inter-processor-interrupt.
Why can't you do a hash by memory address for this?
I would guess you can define an instruction that atomically sets and checks
a bit in a shared array of implementation-specific size, taking as its
argument a token that by convention is the memory address you want to lock.
Given two privileged instructions
/* returns one if we got the lock, zero if someone else holds it */
bool hashlock_addr(volatile void *addr);
void hashunlock_addr(volatile void *addr);
you can do
int atomic_add_return(int i, atomic_t *v)
{
	int temp;

	/* spin until we own the lock bit that v hashes to */
	while (!hashlock_addr(v))
		;
	smp_rmb();
	temp = v->counter;
	temp += i;
	v->counter = temp;
	smp_wmb();
	hashunlock_addr(v);

	return temp;
}
static inline unsigned long __cmpxchg(volatile unsigned long *m,
				      unsigned long old, unsigned long new)
{
	unsigned long retval;

	/* spin until we own the lock bit that m hashes to */
	while (!hashlock_addr(m))
		;
	smp_rmb();
	retval = *m;
	if (retval == old) {
		/* only store when the current value matches */
		*m = new;
		smp_wmb();
	}
	hashunlock_addr(m);

	return retval;
}
Everything else can be built on top of these two, including the system calls
that user applications rely on. Since you never hold that bit lock for
more than a few cycles, you could do with much less than 1K bits; in theory
a single global mutex (ignoring the address entirely) would be enough.
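
To make the semantics concrete, here is a rough user-space model of the two
instructions, not a proposal for how the hardware would do it. It cheats by
using C11 atomics, which the CPU in question does not have. The table size
HASHLOCK_BITS, the helper hash_addr() and the hash itself are all made up
for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed table size; the real one would be implementation-specific. */
#define HASHLOCK_BITS 64u

/* One lock bit per hash bucket, all packed into a single 64-bit word. */
static _Atomic uint64_t hashlock_bits;

/* Hypothetical hash: drop the alignment bits, fold, reduce to a bit index. */
static unsigned hash_addr(const volatile void *addr)
{
	uintptr_t p = (uintptr_t)addr;

	p >>= 2;
	p ^= p >> 6;
	return (unsigned)(p % HASHLOCK_BITS);
}

/* returns true if we got the lock, false if someone else holds it */
bool hashlock_addr(volatile void *addr)
{
	uint64_t mask = UINT64_C(1) << hash_addr(addr);
	uint64_t old = atomic_fetch_or_explicit(&hashlock_bits, mask,
						memory_order_acquire);

	/* we own the bit only if it was clear before our fetch_or */
	return (old & mask) == 0;
}

void hashunlock_addr(volatile void *addr)
{
	uint64_t mask = UINT64_C(1) << hash_addr(addr);

	atomic_fetch_and_explicit(&hashlock_bits, ~mask,
				  memory_order_release);
}
```

Two addresses that hash to the same bit simply contend for the same lock,
which is harmless as long as the critical sections stay this short. Setting
HASHLOCK_BITS to 1 gives the single global mutex case.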
That said, a real load-locked/store-conditional would be much more powerful,
in particular because it can also be used from user space, and it is typically
more efficient because it uses the same mechanisms as the cache coherency
protocol.
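
For comparison, a sketch of the same add-and-return written as a retry loop
without any lock, the way it would look with ll/sc available. It uses C11
atomic_compare_exchange_weak as a stand-in, which compilers for ll/sc
architectures typically turn into a load-locked/store-conditional loop. The
name atomic_add_return_cas is invented here.

```c
#include <stdatomic.h>

/* Hypothetical helper: add i to *v and return the new value, lock-free. */
int atomic_add_return_cas(atomic_int *v, int i)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/*
	 * If *v changed under us, the exchange fails, 'old' is reloaded
	 * with the current value, and we retry with a fresh sum.
	 */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_seq_cst,
						      memory_order_relaxed))
		;

	return old + i;
}
```

Nothing here is privileged, so this works unchanged from user space.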
Arnd