I think you could do that locklessly if you used a data structure similar
to netchannels (essentially a fixed-size, single-buffer queue with atomic
exchange of the first/last pointers) instead of a list. That would avoid
at least one cache line bounce for the lock and likely another one for the
list manipulation.
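
As a rough illustration of such a queue, here is a minimal user-space
sketch in C11. It is not the netchannels code or a kernel patch: the names
call_ring, ring_push, ring_pop and RING_SIZE are made up for the example,
and it assumes exactly one producer and one consumer, so plain
acquire/release loads and stores on the two indexes are enough. With
several CPUs queueing to the same target, the producer side would need an
atomic exchange or compare-and-swap on the head index instead.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u /* hypothetical capacity; must be a power of two */

/* Fixed-size queue: no lock and no list nodes, only two indexes. */
struct call_ring {
    _Atomic unsigned int head; /* next slot to fill; written by producer only */
    _Atomic unsigned int tail; /* next slot to drain; written by consumer only */
    void *slots[RING_SIZE];
};

/* Producer side. Returns false if the ring is full. */
static bool ring_push(struct call_ring *r, void *item)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false; /* full */

    r->slots[head & (RING_SIZE - 1)] = item;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns NULL if the ring is empty. */
static void *ring_pop(struct call_ring *r)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *item;

    if (head == tail)
        return NULL; /* empty */

    item = r->slots[tail & (RING_SIZE - 1)];
    /* Release: the slot has been read before the producer may reuse it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return item;
}
```

In this layout the producer only writes head and the consumer only writes
tail, so neither side takes a lock or touches list pointers. Putting the
two indexes on separate cache lines would be a natural next step.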
Also, the right way would be not to add a second mechanism for this,
but to fix the standard smp_call_function_single() to support it.
-Andi