On Fri, 12 Oct 2007, Jarek Poplawski wrote:

I think the chip manufacturers really wanted to keep their options open. Having the option to re-order loads in architecturally visible ways was something that they probably felt they really wanted to have.

On the other hand:

 - I bet they had noticed that things break, and some applications depend on fairly strong ordering (not necessarily in Linux-land, but..)

   I suspect hw manufacturers go through life hoping that "software improves". They probably thought that getting rid of the old 16-bit windows would mean that less people depended on undefined behaviour.

   And I suspect that they started noticing that no, with threads and JVM's and things, *more* people started depending on fairly strong memory ordering.

 - I suspect Intel in particular noticed that they can do a lot of very aggressive re-ordering at a microarchitectural level, but can still guarantee that *architecturally* they never show it (dynamic detection of reordered loads being replayed on cache dirty events etc).

IOW, I suspect that both Intel and AMD noticed that while they had wanted to keep their options open, those options weren't really realistic, and not something that the market wanted (aggressive use of threading wants *stricter* memory ordering, not looser), and they could work well enough with a fairly strict memory model.

Quite frankly, even *within* Intel and AMD, there are damn few people who understand exactly what the memory ordering requirements and guarantees are and historically were for the different CPU's. I would bet that had you asked a random (but still competent) Intel/AMD engineer that wasn't really intimately involved with the actual design of the cache protocols and memory pipelines, they would absolutely not have been able to tell you how the CPU actually worked.

So no, there's no way a software person could have afforded to say "it seems to work on my setup even without the barrier".
On a dual-socket setup with a shared bus, that says absolutely *nothing* about the behaviour of the exact same CPU when used with a multi-bus chipset. Not to mention other revisions of the same CPU - much less a whole other microarchitecture.

So yes, I've personally been aware for about a year that the memory ordering was likely going to be documented, but no way was I going to depend on it until Intel and AMD were ready to state so *publicly*. Because before that happens, they may have noticed errata etc that made it not work out.

Also, please note that we didn't even just change the barriers immediately when the docs came out. I want to do it soon - still *early* in the 2.6.24 development cycle - exactly because bugs happen, and if somebody notices something strange, we'll have more time to perhaps decide that "oops, there's something bad going on, let's undo this for the real 2.6.24 release until we can figure out the exact pattern".

		Linus
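For readers who want a concrete picture of the "works on my setup even without the barrier" problem, here is a minimal userspace sketch of the usual message-passing pattern. It is not from the post and not kernel code: it assumes C11 atomics and pthreads, and every name in it (payload, ready, writer, reader, run_message_passing) is made up for illustration. The release/acquire pair plays roughly the role that a write barrier / read barrier pair plays in the kernel.

```c
#include <pthread.h>
#include <stdatomic.h>

/* All names here are hypothetical, chosen only for this sketch. */
static int payload;        /* plain data published by the writer  */
static atomic_int ready;   /* flag saying that payload is valid   */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload store is ordered before the flag store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;

    /* Acquire: the payload load below cannot be moved above this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *seen = payload;
    return NULL;
}

/* Runs one writer and one reader; returns what the reader saw, or -1
 * if a thread could not be created. */
int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        atomic_store(&ready, 1);   /* let the reader leave its loop */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With the release/acquire pair in place the language guarantees that the reader sees 42. Weaken both sides to memory_order_relaxed and that guarantee is gone, even though a particular x86 box may well never show the failure, which is exactly why testing on one setup proves nothing.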
