Re: [patch] x86: improved memory barrier implementation

From: Nick Piggin
Date: Saturday, September 29, 2007 - 6:17 am

On Fri, Sep 28, 2007 at 09:15:06AM -0700, Linus Torvalds wrote:

Actually, in a userspace test I have (which does enough work to trigger
out-of-order operations on my Core 2 but is otherwise pretty trivial),
lfence takes 13 cycles and sfence takes 40. Neither of them actually
solves the problem of load vs store ordering, but at least these numbers
come from a slightly utilised memory subsystem rather than from the
stupidest possible microbenchmark.

The dummy lock op takes around 75 cycles. (Of course, the Core 2 would
always use the fences, but older CPUs will not, and they will probably be
worse at the lock op too.)

I suppose these could take significantly longer if there are uncached
memory operations and such in flight (I wasn't doing any significant amount
of IO), but I can't be sure.

So the saving isn't large, but it could help. If code is important enough
to go without locks and use complex barriers instead, it may well be worth
saving these kinds of cycles.
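For illustration, the kind of saving meant here can be sketched as below,
assuming the x86 ordering model this thread relies on: ordinary loads are
not reordered with other loads and ordinary stores are not reordered with
other stores, so only the full barrier needs a real instruction. The demo_*
macros and the publish/try_consume helpers are hypothetical, modelled
loosely on the kernel's smp_rmb()/smp_wmb()/smp_mb(); the plain shared
accesses follow kernel style, not C11 data-race rules.

```c
/* Compiler-only barrier: stops GCC/Clang from moving memory accesses
 * across it, but emits no instruction. */
#define barrier()      __asm__ __volatile__("" ::: "memory")

/* Assumption: on x86 with write-back memory, load->load and store->store
 * order is already kept by the CPU, so these only constrain the compiler. */
#define demo_smp_rmb() barrier()
#define demo_smp_wmb() barrier()

/* A store may still pass a later load, so a full barrier needs a real
 * fence (or a locked instruction on CPUs without mfence). */
#define demo_smp_mb()  __asm__ __volatile__("mfence" ::: "memory")

static int payload;
static volatile int ready;

/* Producer: write the payload, then set the flag. */
static void publish(int value)
{
    payload = value;
    demo_smp_wmb();   /* payload store ordered before the ready store */
    ready = 1;
}

/* Consumer: once the flag is seen, the payload written before it is seen. */
static int try_consume(int *out)
{
    if (!ready)
        return 0;
    demo_smp_rmb();   /* ready load ordered before the payload load */
    *out = payload;
    return 1;
}
```

The point of the design is that the common read/write barriers cost
nothing at run time; only code that needs store->load ordering pays for
demo_smp_mb().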



Will add. I'll ask Alan to specify what he'd like to see there.



Maybe you're thinking of uncached / WC (write-combining) memory? Non-temporal
stores to cacheable RAM apparently can go out of order too, and the kernel
uses them for some things. Likewise for rep stos, apparently. But this means
those users are already at odds with spin_unlock, unless they are enclosed
in mfences everywhere they are used (and I think most are not). So this is
an existing bug in the kernel.
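To make the non-temporal case concrete, here is a minimal sketch, assuming
SSE2 and GCC/Clang builtins, of a hypothetical NT-store user fixed up
locally with an sfence before its release store. fill_then_unlock and
lock_word are made-up names, not kernel code.

```c
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2), _mm_sfence (SSE) */
#include <stddef.h>

/* Hypothetical lock word held by the caller: 1 = locked, 0 = unlocked. */
static int lock_word = 1;

/* Fill a buffer with non-temporal stores, then release the lock.
 * NT stores are weakly ordered even to cacheable RAM, so without the
 * sfence the store that releases the lock could become visible to
 * another CPU before the buffer contents do. */
static void fill_then_unlock(int *buf, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&buf[i], value);   /* movnti: bypasses the cache */
    _mm_sfence();                          /* drain NT stores first */
    /* spin_unlock-style release: compiles to a plain mov on x86 */
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}
```

Note that the C11-style release store alone would not be enough here: it
orders ordinary stores, but it is the sfence that orders the movnti stores
in hardware.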

So again the question comes up: do we promote these kinds of stores to be
regular x86 citizens, keep the strong memory barriers as they are, and eat
40 cycles with an sfence before each spin_unlock store; or do we fix the
few users of non-temporal stores and continue with the model we've always
had, where stores are in order? Or I guess the implicit option is to do
nothing until some poor bastard has the pleasure of debugging some problem.
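The first option can be sketched as a toy test-and-set spinlock, assuming
x86 with SSE and GCC/Clang atomic builtins; the demo_* names and the
UNLOCK_WITH_SFENCE switch are hypothetical. With the switch at 1, every
unlock pays for the sfence; at 0, unlock stays a plain release store, which
is only safe if the non-temporal store users add their own sfence.

```c
#include <immintrin.h>   /* _mm_sfence, _mm_pause */

/* Hypothetical toggle: 1 = unlock tolerates weakly ordered stores in the
 * critical section; 0 = plain-store unlock, NT users must fence themselves. */
#ifndef UNLOCK_WITH_SFENCE
#define UNLOCK_WITH_SFENCE 1
#endif

typedef struct { int locked; } demo_spinlock_t;

static void demo_spin_lock(demo_spinlock_t *l)
{
    /* xchg is implicitly locked on x86, so acquiring is a full barrier */
    while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE))
        _mm_pause();     /* spin-wait hint to the CPU */
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
#if UNLOCK_WITH_SFENCE
    /* Order any NT / string stores before the release (about 40 cycles in
     * the Core 2 measurement quoted in this message). */
    _mm_sfence();
#endif
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```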

Anyway, just keep in mind that this patch doesn't break anything that isn't
already fundamentally broken. Sure, it might cause more individual cases to
break in practice, but if those cases had happened to use real locking
instead of explicit barriers, they would be broken anyway, right? (IOW, any
new breakage is in code that is already conceptually broken, even if it is
OK in practice thanks to the overstrictness of our current barriers.)

Messages in current thread:
[patch] x86: improved memory barrier implementation, Nick Piggin, (Fri Sep 28, 8:48 am)
Re: [patch] x86: improved memory barrier implementation, Linus Torvalds, (Fri Sep 28, 9:15 am)
Re: [patch] x86: improved memory barrier implementation, Nick Piggin, (Sat Sep 29, 6:17 am)
Re: [patch] x86: improved memory barrier implementation, Linus Torvalds, (Sat Sep 29, 9:07 am)