Re: [PATCH 0/3] 64-bit futexes: Intro

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Nick Piggin <npiggin@...>
Cc: Ingo Molnar <mingo@...>, David Howells <dhowells@...>, Ulrich Drepper <drepper@...>, Linux Kernel Mailing List <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Wednesday, June 4, 2008 - 4:38 pm

On Wed, 4 Jun 2008, Linus Torvalds wrote:

Put another way: the CPU may internally effectively rewrite the two 
instructions as

	movb reg,tmpreg		(tmp = writebuffer)
	movl (byteptr),reg	(do the 32-bit read of the old cached contents)
	movb tmpreg,reg		(writebuffer snoop by reads)
	movb tmpreg,(byteptr)	(writebuffer actually drains into cacheline)

and *if* your algorithm is robust wrt these kinds of rewrites, you're ok. 
But notice how there are two accesses to the cacheline, and how they are 
actually in the "wrong" order, and how the cacheline could have been 
updated by another CPU in between.

Does this actually happen? Yeah, I do believe it does. Is it a deathknell 
for anybody trying to be clever with overlapping reads/writes? No, you can 
still have things like causality rules that guarantee that your algorithm 
is perfectly stable in the face of these kinds of reordering. But it *is* 
one of the few re-orderings that I think is theoretically archtiecturally 
visible.

For example, let's start out with a 32-bit word that contains zero, and 
three CPU's. One CPU does

	cmpxchgl 0->0x01010101,mem

another one does

	cmpxchlg 0x01010101->0x02020202,mem

and the third one does that

	movb $0x03,mem
	movl mem,reg

and after it all completed, you may have 0x02020203 in memory, but "reg" 
on the third CPU contains 0x01010103.

Note how NO OTHER CPU could _possibly_ have seen that value! That value 
never ever existed in any caches. If the final value was 0x02020203, then 
both the cmpxchgl's must have worked, so the cache coherency contents 
*must* have gone from 0 -> 0x01010101 -> 0x02020202 -> 0x02020203 (with 
the movb actually getting the exclusive cache access last).

So the third CPU saw a value for that load that actually *never* existed 
in any cache-line: 0x01010103.  Exactly because the x86 memory ordering 
allows the store buffer contents to be forwarded within a CPU core.

And this is why atomic locked instructions are special. They bypass the 
store buffer (or at least they _act_ as if they do - they likely actually 
use the store buffer, but they flush it and the instruction pipeline 
before the load and wait for it to drain after, and have a lock on the 
cacheline that they take as part o the load, and release as part of the 
store - all to make sure that the cacheline doesn't go away in between and 
that nobody else can see the store buffer contents while this is going 
on).

This is also why there is so much room for improvement in locked 
instruction performance - you don't _have_ to flush things if you just are 
very careful about tracking how and when you use which elements in the 
store buffer, and track the ordering of cache accesses by all of this.

			Linus
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH 0/3] 64-bit futexes: Intro, Ulrich Drepper, (Fri May 30, 9:27 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Fri May 30, 10:13 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Ulrich Drepper, (Fri May 30, 11:14 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Fri May 30, 11:44 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Ulrich Drepper, (Sat May 31, 12:04 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Sat May 31, 12:16 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Sat May 31, 12:23 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Ulrich Drepper, (Sat May 31, 12:38 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Sat May 31, 6:25 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Ingo Molnar, (Mon Jun 2, 2:54 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Mon Jun 2, 4:22 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Thu Jun 5, 9:27 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Thu Jun 5, 11:37 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Fri Jun 6, 7:53 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Fri Jun 6, 11:01 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Ingo Molnar, (Mon Jun 2, 7:03 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Mon Jun 2, 11:24 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Wed Jun 4, 3:57 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Wed Jun 4, 9:45 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Wed Jun 4, 4:38 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Wed Jun 4, 9:56 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Wed Jun 4, 11:08 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Thu Jun 5, 12:29 am)
Re: [PATCH 0/3] 64-bit futexes: Intro, Nick Piggin, (Wed Jun 4, 9:58 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Sat May 31, 6:32 pm)
Re: [PATCH 0/3] 64-bit futexes: Intro, Linus Torvalds, (Sat May 31, 12:58 am)