Cc: Jes Sorensen <jes@...>, Jeremy Higdon <jeremy@...>, Roland Dreier <rdreier@...>, <benh@...>, Arjan van de Ven <arjan@...>, <linux-arch@...>, <linux-kernel@...>, <tpiepho@...>, <linuxppc-dev@...>, <scottwood@...>, <torvalds@...>, David Miller <davem@...>, <alan@...>
On Monday, June 02, 2008 9:33 pm Nick Piggin wrote:
I think you mean wrt cacheable memory accesses here (though iirc on ia64
spin_unlock has release semantics, so at least it'll order prior stores).
Well, given how undefined things have been in the past, each arch has had to
figure out what things mean (based on looking at drivers & core code) and then
come up with appropriate primitives. On Altix, we went in both directions: we
made regular PIO reads (readX etc.) *very* expensive to preserve
compatibility with what existing drivers expect, and added readX_relaxed to
give a big performance boost to tuned drivers.
OTOH, given that posted PCI writes were nothing new to Linux but the Altix
network topology was, we introduced mmiowb() (with lots of discussion, I might
add), which has clear and relatively simple usage guidelines.
Now, in hindsight, having writeX set a flag on each PIO write and spin_unlock
test it (à la powerpc) might have been a better approach, but iirc that never
came up in the discussion, probably because we were focused on PCI posting
rather than uncached vs. cached ordering.
I agree, but afaik the only change Altix ended up forcing on people was
mmiowb(), and that turned out to be necessary on mips64 (and maybe some other
platforms?) anyway.
Aside from the obvious performance impact of making all the readX/writeX
routines strongly ordered, both in terms of PCI posting and cacheable vs.
uncacheable accesses, it also makes things inconsistent. Both core code &
drivers will still have to worry about regular, cacheable memory barriers for
correctness, but it looks like you're proposing that they not have to think
about I/O ordering.
At any rate, I don't think anyone would argue against defining the ordering
semantics of all of these routines (btw you should also include ordering wrt
DMA & PCI posting); the question is what's the best balance between keeping
the driver requirements simple and the performance cost on complex arches.
Jesse