Well, send-to-all can be special-cased (it already is at the APIC IPI
level, but we could have a broadcast queue as well).
But I wonder how common an operation that is? Most calls to
smp_call_function_mask are sending to mm->cpu_vm_mask. For a small
number of cores, that could well be broadcast, but as the core count
goes up, the likelihood that all cpus have been involved with a given mm
will go down (very workload dependent, of course).
It could be that if we're sending to more than some proportion of the
cpus, it would be more efficient to just broadcast, and let the cpus
work out whether they need to do anything or not. But that's more or
less the scheme we have now...
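
A rough sketch of that threshold check, in plain userspace C rather than
kernel code. mask_weight(), should_broadcast() and the threshold_pct
parameter are made-up names for illustration, and a 64-bit word stands in
for a real cpumask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: count set bits in a 64-bit cpu mask. */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical policy: broadcast when the targets make up more than
 * threshold_pct percent of the online cpus; otherwise send to the
 * targets individually.
 */
static bool should_broadcast(uint64_t targets, uint64_t online,
			     unsigned int threshold_pct)
{
	unsigned int nr_online = mask_weight(online);
	unsigned int nr_targets = mask_weight(targets & online);

	assert(threshold_pct <= 100);
	if (nr_online == 0)
		return false;
	return nr_targets * 100 > nr_online * threshold_pct;
}
```

Where the threshold should sit (if anywhere) would have to be measured;
the 75 used when trying this out is an arbitrary number, not a
recommendation.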
J
--