The kernel thread variant won't be any more lockless than the
smp_call_function_single() approach; they both have to grab the
destination queue lock. If you recall, I pushed forward on the kernel
thread variant and even still have it online here:
http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=io-cpu-affinity-kthread
which is pretty much identical to io-cpu-affinity, except it uses kernel
threads for completion.
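To make the locking point concrete, here is a rough userspace model in C
with pthreads. It is not kernel code and not taken from either branch:
struct completion_queue, queue_completion(), drain_completions() and the
two constants are hypothetical names made up for illustration. It only
shows that the submitting side has to take the destination queue's lock
to hand over a completion, no matter whether the destination is then
kicked by an IPI or by a thread wakeup (the kick itself is not modeled).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_QUEUES   2	/* hypothetical: one queue per modeled CPU */
#define QUEUE_DEPTH 8	/* hypothetical fixed capacity */

/* Hypothetical stand-in for a per-CPU completion queue. */
struct completion_queue {
	pthread_mutex_t lock;
	int items[QUEUE_DEPTH];
	int count;
};

struct completion_queue queues[NR_QUEUES];

void queues_init(void)
{
	for (int i = 0; i < NR_QUEUES; i++) {
		pthread_mutex_init(&queues[i].lock, NULL);
		queues[i].count = 0;
	}
}

/*
 * Submitting side: hand a completed request to the destination queue.
 * The destination lock is taken here regardless of how the destination
 * is notified afterwards. Returns 0 on success, -1 if the queue is full.
 */
int queue_completion(int dest, int request_id)
{
	struct completion_queue *q;
	int ret = -1;

	assert(dest >= 0 && dest < NR_QUEUES);
	q = &queues[dest];

	pthread_mutex_lock(&q->lock);
	if (q->count < QUEUE_DEPTH) {
		q->items[q->count++] = request_id;
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);

	/* The kick (IPI or thread wakeup) would go here; not modeled. */
	return ret;
}

/*
 * Destination side: take everything off the local queue under the same
 * lock and return how many completions were pending.
 */
int drain_completions(int cpu)
{
	struct completion_queue *q;
	int n;

	assert(cpu >= 0 && cpu < NR_QUEUES);
	q = &queues[cpu];

	pthread_mutex_lock(&q->lock);
	n = q->count;
	q->count = 0;
	pthread_mutex_unlock(&q->lock);
	return n;
}
```

In this model the only thing that differs between the two variants is
what happens after queue_completion() drops the lock, which is why
neither one ends up more lockless than the other.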
The reason I dropped the kthread approach is that it was slower.
Time from signal to run was about 33% faster with IPI than with
wake_up_process(). In benchmark runs, the IPI approach won hands
down on cache misses as well.
The patchset does not build on smp_call_function(); it merely cleans
that code up so that each arch no longer carries essentially the same
code. As more archs are converted, it'll remove a lot more code.
The block stuff builds on smp_call_function_single(), which doesn't
suffer from any of the badness that smp_call_function() does.
--
Jens Axboe