Re: [patch] x86: improved memory barrier implementation

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Nick Piggin <npiggin@...>
Cc: Alan Cox <alan@...>, Linux Kernel Mailing List <linux-kernel@...>, Andi Kleen <ak@...>
Date: Saturday, September 29, 2007 - 12:07 pm

On Sat, 29 Sep 2007, Nick Piggin wrote:

I'm really saying that people to a first approximation should think "NT is 
an IO (DMA) thing". Whether cached or not. Exactly because they do not 
honor the normal memory ordering.

It may be worth noting that "clflush" falls under that heading too - even 
if all the actual *writes* were done with totally normal writes, if 
anybody does a clflush instruction, that breaks the ordering, and that 
turns it to "DMA ordering" again - ie we're not talking about the normal 
SMP ordering rules at all.

So all the spinlocks and all the smp_*mb() barriers have never really done 
*anything* for those things (in particular, "smp_wmb()" has *always* 
ignored them on i386!)


No. As far as I can tell, the fast string operations are unordered 
*within*themselves*, but not wrt the operations around it.

In other words, you cannot depend on the ordering of stores *in* the 
memcpy() or memset() when it is implemented by "rep movs/stos" - but that 
is 100% equivalent to the fact that you cannot depend on the ordering even 
when it isn't - since the "memcpy()" library routine might be copying 
memory backwards for all you know!

The Intel memory ordering paper doesn't talk about the fast string 
instructions (except to say that the rules it *does* speak about do not 
hold), but the regular IA manuals do say (for example):

   "Code dependent upon sequential store ordering should not use the 
    string operations for the entire data structure to be stored. Data and 
    semaphores should be separated. Order dependent code should use a 
    discrete semaphore uniquely stored to after any string operations to 
    allow correctly ordered data to be seen by all processors."

and note how it says you should just store to the semaphore. If you think 
about it, that semahore will be involving all the memory ordering 
requirements that we *already* depend on, so if a semaphore is sufficient 
to order the fast string instruction, then by definition using a spinlock 
around them must be the same thing!

In other words, by Intels architecture manual, fast string instructions 
cannot escape a "semaphore" - but that means that they cannot escape a 
spinlock either (since the two are exactly the same wrt memory ordering 
rules! In other words, whenever the Intel docs say "semaphore", think 
"mutual exclusion lock", not necessarily the kernel kind of "sleeping 
semaphore").

But it might be good to have that explicitly mentioned in the IA memory 
ordering thing, so I'll ask the Intel people about that. However, I'd say 
that given our *current* documentation, string instructions may be 
*internally* out-of-order, but they would not escape a lock.


See above. I do not believe that it's an existing bug, but the basic point 
that the change to "smp_rmb()" doesn't change our existing rules is true.

			Linus
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[patch] x86: improved memory barrier implementation, Nick Piggin, (Fri Sep 28, 11:48 am)
Re: [patch] x86: improved memory barrier implementation, Nick Piggin, (Fri Sep 28, 12:54 pm)
Re: [patch] x86: improved memory barrier implementation, Linus Torvalds, (Fri Sep 28, 12:15 pm)
Re: [patch] x86: improved memory barrier implementation, Linus Torvalds, (Sat Sep 29, 12:07 pm)