On Monday, June 02, 2008 9:33 pm Nick Piggin wrote:

I think you mean wrt cacheable memory accesses here (though iirc on ia64 spin_unlock has release semantics, so at least it'll barrier other stores).

Well, given how undefined things have been in the past, each arch has had to figure out what things mean (based on looking at drivers & core code) and then come up with appropriate primitives. On Altix, we went both directions: we made regular PIO reads (readX etc.) *very* expensive to preserve compatibility with what existing drivers expect, and added a readX_relaxed to give a big performance boost to tuned drivers. OTOH, given that posted PCI writes were nothing new to Linux, but the Altix network topology was, we introduced mmiowb() (with lots of discussion, I might add), which has clear and relatively simple usage guidelines. Now, in hindsight, using a PIO write set & test flag approach in writeX/spin_unlock (à la powerpc) might have been a better approach, but iirc that never came up in the discussion, probably because we were focused on PCI posting and not uncached vs. cached ordering.

I agree, but afaik the only change Altix ended up forcing on people was mmiowb(), and that turned out to be necessary on mips64 (and maybe some other platforms?) anyway.

Aside from the obvious performance impact of making all the readX/writeX routines strongly ordered, both in terms of PCI posting and cacheable vs. uncacheable accesses, it also makes things inconsistent. Both core code & drivers will still have to worry about regular, cacheable memory barriers for correctness, but it looks like you're proposing that they not have to think about I/O ordering.

At any rate, I don't think anyone would argue against defining the ordering semantics of all of these routines (btw you should also include ordering wrt DMA & PCI posting); the question is what's the best balance between keeping the driver requirements simple and the performance cost on complex arches.

Jesse
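For readers who haven't seen it, the mmiowb() usage guideline referred to above is roughly: if a driver does MMIO writes while holding a spinlock, it calls mmiowb() before releasing the lock, so that writes from the next lock holder can't reach the device first. Below is a minimal userspace sketch of that call ordering only. Everything prefixed fake_, plus struct fake_dev, ring_doorbell() and the event log, is a hypothetical stand-in invented for illustration; none of it is the kernel's implementation, and the mock does not model real hardware ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Events recorded by the mock primitives, so the call order can be inspected. */
enum ev { EV_LOCK, EV_WRITE, EV_MMIOWB, EV_UNLOCK };

#define MAX_EVENTS 16
static enum ev event_log[MAX_EVENTS];
static size_t event_count;

static void record(enum ev e)
{
    assert(event_count < MAX_EVENTS);
    event_log[event_count++] = e;
}

/* Hypothetical device: a lock flag and one "doorbell" register. */
struct fake_dev {
    int locked;
    uint32_t doorbell;
};

/* Stand-ins for spin_lock(), spin_unlock(), writel() and mmiowb(). */
static void fake_spin_lock(struct fake_dev *d)   { d->locked = 1; record(EV_LOCK); }
static void fake_spin_unlock(struct fake_dev *d) { d->locked = 0; record(EV_UNLOCK); }
static void fake_writel(uint32_t v, uint32_t *reg) { *reg = v; record(EV_WRITE); }
static void fake_mmiowb(void) { record(EV_MMIOWB); }

/* The pattern itself: MMIO write under the lock, mmiowb() before unlock. */
void ring_doorbell(struct fake_dev *d, uint32_t v)
{
    fake_spin_lock(d);
    fake_writel(v, &d->doorbell);
    fake_mmiowb();          /* order the posted write before dropping the lock */
    fake_spin_unlock(d);
}
```

In a real driver the same shape would use the kernel's own spin_lock()/writel()/mmiowb()/spin_unlock(); the point of the sketch is only that the barrier sits between the last MMIO write and the unlock.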
