On Wednesday 11 June 2008 14:18, Paul Mackerras wrote:
Yes, this is the one reordering allowed by the ISA on cacheable memory.
WC memory is weaker, and Linus wants to allow an exception for it because
one must explicitly ask for it. UC memory (which presumably is what
we're talking about as "IO access") is, I think, stronger, in that it does
not allow any reordering among UC accesses or against any cacheable access:
AMD says this:
c — A store (wp,wt,wb,uc,wc,wc+) may not pass a previous load
(wp,wt,wb,uc,wc,wc+).
f — A load (uc) does not pass a previous store (wp,wt,wb,uc,wc,wc+).
g — A store (wp,wt,wb,uc) does not pass a previous store (wp,wt,wb,uc).
i — A load (wp,wt,wb,wc,wc+) does not pass a previous store (uc).
AMD does allow WC/WC+ to be weakly ordered WRT WC as well as UC, which
Intel seemingly does not. But we're already treating WC as special.
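To make the four quoted rules concrete, here is a rough sketch that encodes only rules c, f, g and i as a predicate. The names (enum memtype, quoted_rules_forbid_pass) are made up for illustration and come from neither vendor's manual; a "false" result only means none of these four rules applies, not that the hardware permits the reordering.

```c
#include <stdbool.h>

/* Memory types named in the quoted AMD rules (WCP stands for wc+). */
enum memtype { WP, WT, WB, UC, WC, WCP };
enum op { LOAD, STORE };

/* True for the types listed in rule g: wp, wt, wb, uc. */
static bool is_wp_wt_wb_uc(enum memtype t)
{
	return t == WP || t == WT || t == WB || t == UC;
}

/*
 * Returns true if one of the quoted rules (c, f, g, i) forbids the later
 * access from passing the earlier one. Returning false only means that
 * none of these four rules applies.
 */
static bool quoted_rules_forbid_pass(enum op later, enum memtype later_t,
				     enum op earlier, enum memtype earlier_t)
{
	/* c: a store (any type) may not pass a previous load (any type) */
	if (later == STORE && earlier == LOAD)
		return true;

	/* f: a load (uc) does not pass a previous store (any type) */
	if (later == LOAD && later_t == UC && earlier == STORE)
		return true;

	/* g: a store (wp,wt,wb,uc) does not pass a previous store (wp,wt,wb,uc) */
	if (later == STORE && earlier == STORE &&
	    is_wp_wt_wb_uc(later_t) && is_wp_wt_wb_uc(earlier_t))
		return true;

	/* i: a load (wp,wt,wb,wc,wc+) does not pass a previous store (uc) */
	if (later == LOAD && later_t != UC &&
	    earlier == STORE && earlier_t == UC)
		return true;

	return false;
}
```

With this encoding, a WB load passing an earlier WB store is the case no quoted rule forbids, while any load after a UC store, and any UC load after any store, is forbidden. A WC store following a UC or WC store falls outside rule g.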
I can't actually find the definitive statement in the Intel manuals
saying UC is also strongly ordered WRT WB. Linus?
~/usr/src/linux-2.6> git grep test_and_set_bit drivers/ | wc -l
506
How sure are you that none of those forms part of a cobbled-together
locking scheme that hopes to constrain some IO access?
~/usr/src/linux-2.6> git grep test_and_set_bit drivers/ | grep while | wc -l
29
How about those?
~/usr/src/linux-2.6> git grep mutex_lock drivers/ | wc -l
3138
How sure are you that none of them is hoping to constrain IO operations
within the lock?
Also grep for down, down_write, write_lock, and maybe some others I've
forgotten. And then forget completely about locking and imagine some
of the open coded things you see around the place (or parts where drivers
try to get creative and open code their own locking or try lockless
things).
But surely you have to audit the drivers anyway to ensure they are OK
with the current powerpc scheme. In which case, once you have audited
them and know they are safe, you can easily convert them to the even
_faster_ __readl/__writel, and just add the appropriate barriers.
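As a rough illustration of what an audited and converted driver path might look like, here is a user-space sketch. Everything in it is a stand-in: relaxed_writel() plays the role of __writel, io_barrier() plays the role of whatever barrier the architecture actually needs, struct fake_dev is an invented device, and a C11 fence is of course not a real MMIO barrier. It only shows the shape: unordered writes inside an open-coded test-and-set lock, with barriers added only where ordering matters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FIFO_DEPTH 8

/* Invented device: a data FIFO, a doorbell, and an open-coded lock. */
struct fake_dev {
	volatile uint32_t fifo[FIFO_DEPTH];
	volatile uint32_t doorbell;
	atomic_flag busy;
};

/* Hypothetical relaxed accessor: a plain store with no implied barrier. */
static inline void relaxed_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Hypothetical explicit barrier; a real driver would use the arch's own. */
static inline void io_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Returns 0 on success, -1 if somebody else holds the device. */
static int dev_send(struct fake_dev *dev, const uint32_t *buf, unsigned int n)
{
	unsigned int i;

	assert(n <= FIFO_DEPTH);

	if (atomic_flag_test_and_set_explicit(&dev->busy, memory_order_acquire))
		return -1;

	for (i = 0; i < n; i++)
		relaxed_writel(buf[i], &dev->fifo[i]);	/* no per-write barrier */

	io_barrier();		/* order the data before the doorbell */
	relaxed_writel(n, &dev->doorbell);

	io_barrier();		/* keep the IO inside the critical section */
	atomic_flag_clear_explicit(&dev->busy, memory_order_release);
	return 0;
}
```

The point of the sketch is that only two barriers are paid per transfer rather than one per write, which is where the speedup over fully ordered accessors would come from, assuming the audit has shown exactly which orderings the driver depends on.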