On Wed, 2010-12-15 at 11:04 -0600, Christoph Lameter wrote:
Well, depends on how often you need that address I'd think. If you'd
have a per-cpu struct and need to frob lots of variables in that struct
it might be cheaper to simply compute the struct address once and then
use relative addresses than to prefix everything with %fs.
I thought you'd only need a single arithmetic op to calculate the
address. Anyway, at some point those 1-byte prefixes will add up to more
than the ops saved.
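
To make the trade-off concrete, here is a rough userspace analogy, not
kernel code. It assumes a C11 compiler and uses a _Thread_local struct
to stand in for a per-cpu struct (on x86-64 Linux, thread-local storage
is addressed relative to %fs). The struct and function names are made up
for illustration, and an optimizing compiler may well generate the same
code for both variants.

```c
/* Hypothetical per-thread counters standing in for a per-cpu struct. */
struct stats {
	long a, b, c, d;
};

static _Thread_local struct stats stats;

/*
 * Variant 1: every access names the thread-local object directly, so
 * each one may be emitted as a segment-prefixed memory operand.
 */
static void bump_direct(void)
{
	stats.a++;
	stats.b++;
	stats.c++;
	stats.d++;
}

/*
 * Variant 2: compute the struct address once, then use plain offsets
 * from that pointer for the remaining accesses.
 */
static void bump_via_pointer(void)
{
	struct stats *s = &stats;

	s->a++;
	s->b++;
	s->c++;
	s->d++;
}

/* Sum of all counters for the calling thread. */
static long stats_total(void)
{
	return stats.a + stats.b + stats.c + stats.d;
}
```

Comparing the disassembly of the two functions at a low optimization
level is one way to see where the prefix bytes and the extra address
computation end up.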
In the current code you add 2 bytes (although you save one by losing
the LOCK prefix, but that could have been achieved by using
cmpxchg_local() as well). These 2 bytes are probably less than the
address computation for head (and not needing the head pointer again
saves on register pressure), so it's probably a win here.
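
For reference, the kind of loop being discussed looks roughly like the
sketch below. This is a userspace approximation using C11 atomics, not
the kernel code: struct work, work_list and work_push() are hypothetical
names, the list head is a plain global rather than a per-cpu variable,
and the C11 compare-exchange has no portable "local" (unlocked) variant,
so it only shows the shape of the retry loop.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; not the kernel's own structure. */
struct work {
	struct work *next;
};

/* List head; in the kernel this would be a per-cpu variable. */
static _Atomic(struct work *) work_list;

/*
 * Push 'entry' onto the list with a compare-and-exchange retry loop.
 * Returns true if the list was empty before the push, which is the
 * point where a caller would need to kick off processing.
 */
static bool work_push(struct work *entry)
{
	struct work *old = atomic_load_explicit(&work_list,
						memory_order_relaxed);

	do {
		entry->next = old;
		/* On failure 'old' is updated to the current head. */
	} while (!atomic_compare_exchange_weak_explicit(&work_list, &old,
							entry,
							memory_order_release,
							memory_order_relaxed));

	return old == NULL;
}
```

Note that the head address is only needed for the exchange itself, which
is why folding the address computation into the instruction can save a
register.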
Still, none of this is really fast-path code, so I really wonder why
we're optimizing this over keeping the code obvious.
Afaik the current callers are all from IRQ/NMI context, but I don't want
to mandate callers be from such contexts.
The problem is that we need to guarantee we raise the self-IPI on the
same cpu we queued the worklet on.