On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote:
It's more expensive than nothing, sure. However in real code, algorithmic
complexity, cache misses and cacheline bouncing tend to be much bigger
issues.
I can't think of a place in the kernel where smp_rmb matters _that_ much.
seqlocks maybe (timers, dcache lookup), vmscan... Obviously removing the
lfence is not going to hurt. Maybe we even gain 0.01% performance in
someone's workload.
Also, remember: if loads are already in-order, then lfence is a noop,
right? (in practice it seems to have to do a _little_ bit of work, but
it's like a dozen cycles).
No. Why are you reading that much into it? I know for a fact that some
non-x86 architectures actual implementations have stronger ordering than
their ISA allows. It's nothing to do with you "believing" how the hardware
works. That's not what the document is describing (directly).
Really? Consider the consequences if, instead of releasing this latest
document tightening consistency, Intel found that out of order loads
were worth 5% more performance and implemented them in their next chip.
The chip could be completely backwards compatible, but all your old code
would break, because it was broken to begin with (because it was outside
the spec).
IMO Intel did exactly the right thing from an engineering perspective,
and so did Linux to always follow the spec.
-