Cc: Benjamin Herrenschmidt <benh@...>, <akpm@...>, Linux Kernel Mailing List <linux-kernel@...>, <linuxppc-dev@...>, Ingo Molnar <mingo@...>, Thomas Gleixner <tglx@...>
On Thu, Oct 18, 2007 at 08:26:45PM -0700, Linus Torvalds wrote:
Right. I think if we accept the definition of a spin lock/unlock
that Nick outlined earlier (spin_lock only keeps later accesses
from moving above it, spin_unlock only keeps earlier accesses from
moving below it), then we can see that on the IRQ path there
simply isn't a memory barrier between the update of the status
field and the execution of the action:
	spin_lock
	set IRQ_INPROGRESS
	spin_unlock
	action
	spin_lock
	clear IRQ_INPROGRESS
	spin_unlock
What may happen is that action can either float upwards to give
	spin_lock
	action
	set IRQ_INPROGRESS
	spin_unlock
	spin_lock
	clear IRQ_INPROGRESS
	spin_unlock
or it can float downwards to give
	spin_lock
	set IRQ_INPROGRESS
	spin_unlock
	spin_lock
	clear IRQ_INPROGRESS
	action
	spin_unlock
As a result, we can add as many barriers as we want on the slow
(synchronize_irq) path and it just won't do any good, not unless
we also add barriers on the IRQ path, i.e. the fast path.
What we do have on our side, though, is the fact that in some ways
action behaves as if it were inside the spin lock even though it's
not. In particular, action cannot float up past the first
spin_lock, nor can it float down past the last spin_unlock.
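To make those two bounds concrete, here is a toy model in C. It assumes that spin_lock behaves purely as an acquire (later accesses may not move above it, earlier ones may sink below it) and spin_unlock purely as a release (earlier accesses may not move below it, later ones may rise above it), and that action is independent of the IRQ_INPROGRESS accesses. The names `op_kind`, `irq_seq` and `float_bounds` are hypothetical, made up for this sketch; it is not kernel code.

```c
#include <assert.h>

/* Toy model: each step is an acquire, a release, or a plain access. */
enum op_kind { OP_LOCK, OP_UNLOCK, OP_PLAIN };

struct op {
	enum op_kind kind;
	const char *name;	/* label only, for readability */
};

/* Hypothetical encoding of the IRQ path, in program order. */
static const struct op irq_seq[] = {
	{ OP_LOCK,   "spin_lock" },
	{ OP_PLAIN,  "set IRQ_INPROGRESS" },
	{ OP_UNLOCK, "spin_unlock" },
	{ OP_PLAIN,  "action" },
	{ OP_LOCK,   "spin_lock" },
	{ OP_PLAIN,  "clear IRQ_INPROGRESS" },
	{ OP_UNLOCK, "spin_unlock" },
};
#define IRQ_SEQ_LEN ((int)(sizeof(irq_seq) / sizeof(irq_seq[0])))

/*
 * Compute how far the plain access at index a may migrate, assuming it is
 * independent of every other plain access in the sequence:
 *   - an acquire (OP_LOCK) stops it moving up; a release does not;
 *   - a release (OP_UNLOCK) stops it moving down; an acquire does not.
 * On return, *lo is the index of the acquire bounding it from above
 * (-1 if none) and *hi the index of the release bounding it from below
 * (n if none). The access may land anywhere strictly between the two.
 */
static void float_bounds(const struct op *ops, int n, int a, int *lo, int *hi)
{
	int i;

	assert(a >= 0 && a < n && ops[a].kind == OP_PLAIN);

	for (i = a - 1; i >= 0 && ops[i].kind != OP_LOCK; i--)
		;	/* releases and plain accesses above do not stop it */
	*lo = i;

	for (i = a + 1; i < n && ops[i].kind != OP_UNLOCK; i++)
		;	/* acquires and plain accesses below do not stop it */
	*hi = i;
}
```

For action (index 3), the upward walk passes the first spin_unlock and stops at index 0 (the first spin_lock), and the downward walk passes the second spin_lock and stops at index 6 (the last spin_unlock). Both reorderings shown above fall inside that window.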
That's why I think this patch is in fact the only one that
solves all the races in this thread. The case it solves that
the lock/unlock patch does not is the one where action flows
downwards past the clearing of IRQ_INPROGRESS. I missed this
case earlier.
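One way to exploit the "action cannot sink below the last spin_unlock" property is for the waiter to recheck the flag while holding the lock: if IRQ_INPROGRESS reads clear inside the lock, the handler's final unlock has already happened, so action has finished even if it floated below the clear. The userspace sketch below assumes a pthread mutex in place of desc->lock and a C11 atomic in place of desc->status; `irq_desc_model`, `irq_handler_path`, `synchronize_model` and `run_demo` are hypothetical names, and this is not necessarily what the patch under discussion does.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define IRQ_INPROGRESS 0x1u

/* Hypothetical userspace stand-in for an IRQ descriptor. */
struct irq_desc_model {
	pthread_mutex_t lock;	/* plays the role of desc->lock */
	atomic_uint status;	/* atomic because the waiter also polls it unlocked */
	int started;		/* test scaffolding, protected by lock */
	int data;		/* written by "action" outside the lock */
};

/* IRQ path: set the flag under the lock, run action unlocked, clear under the lock. */
static void *irq_handler_path(void *arg)
{
	struct irq_desc_model *d = arg;

	pthread_mutex_lock(&d->lock);
	atomic_fetch_or_explicit(&d->status, IRQ_INPROGRESS, memory_order_relaxed);
	d->started = 1;
	pthread_mutex_unlock(&d->lock);

	d->data = 1;		/* "action" */

	pthread_mutex_lock(&d->lock);
	atomic_fetch_and_explicit(&d->status, ~IRQ_INPROGRESS, memory_order_relaxed);
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/*
 * Waiter: spin cheaply without the lock, then confirm under the lock.
 * Seeing the flag clear while holding the lock means the handler's final
 * unlock has already happened, so action has completed.
 */
static void synchronize_model(struct irq_desc_model *d)
{
	unsigned int status;

	do {
		/* Relaxed poll: only a hint; it orders nothing by itself. */
		while (atomic_load_explicit(&d->status, memory_order_relaxed) & IRQ_INPROGRESS)
			sched_yield();

		pthread_mutex_lock(&d->lock);
		status = atomic_load_explicit(&d->status, memory_order_relaxed);
		pthread_mutex_unlock(&d->lock);
	} while (status & IRQ_INPROGRESS);
}

/* Runs one handler and returns the value of data seen after synchronizing. */
static int run_demo(void)
{
	struct irq_desc_model d = { .lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t t;
	int started = 0, seen, rc;

	rc = pthread_create(&t, NULL, irq_handler_path, &d);
	assert(rc == 0);
	(void)rc;

	/* Test scaffolding: wait until the handler has entered its first locked section. */
	while (!started) {
		pthread_mutex_lock(&d.lock);
		started = d.started;
		pthread_mutex_unlock(&d.lock);
		if (!started)
			sched_yield();
	}

	synchronize_model(&d);
	seen = d.data;		/* ordered after the handler's last unlock */
	pthread_join(t, NULL);
	return seen;
}
```

The lockless inner loop only keeps the common case cheap; correctness comes entirely from the recheck inside the lock, which pairs with the unlock on the IRQ path rather than adding a barrier on the slow path alone.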
In fact, this bug exists elsewhere too. For example, the network
stack does this in net/sched/sch_generic.c:
	/* Wait for outstanding qdisc_run calls. */
	while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
		yield();
This has the same problem as the current synchronize_irq code.
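In C11 terms, the ordering this loop lacks could be expressed by pairing a release on the side that clears the bit with an acquire on the side that polls it. The sketch below uses C11 atomics and sched_yield() as stand-ins for the kernel's clear_bit/test_bit/yield; `dev_state`, `QDISC_RUNNING_BIT`, `runner` and `run_flag_demo` are hypothetical names, and it illustrates the ordering requirement only, not a proposed change to sch_generic.c.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define QDISC_RUNNING_BIT 0x1ul	/* hypothetical stand-in for __LINK_STATE_QDISC_RUNNING */

static atomic_ulong dev_state;	/* hypothetical stand-in for dev->state */
static int queue_work_done;	/* data produced while the bit is set */

/* Running side: do the work, then clear the bit with release semantics
 * so the work cannot be reordered after the clear. */
static void *runner(void *arg)
{
	(void)arg;
	queue_work_done = 42;
	atomic_fetch_and_explicit(&dev_state, ~QDISC_RUNNING_BIT, memory_order_release);
	return NULL;
}

/* Waiting side: poll with acquire so later accesses cannot be reordered
 * before the load that observes the cleared bit. */
static void wait_for_runner(void)
{
	while (atomic_load_explicit(&dev_state, memory_order_acquire) & QDISC_RUNNING_BIT)
		sched_yield();
}

/* Sets the bit, starts the runner, waits, and returns the data it saw. */
static int run_flag_demo(void)
{
	pthread_t t;
	int rc, seen;

	queue_work_done = 0;
	/* Set before the runner starts; pthread_create makes it visible to it. */
	atomic_store_explicit(&dev_state, QDISC_RUNNING_BIT, memory_order_relaxed);
	rc = pthread_create(&t, NULL, runner, NULL);
	assert(rc == 0);
	(void)rc;

	wait_for_runner();
	seen = queue_work_done;	/* ordered by the release/acquire pair */
	pthread_join(t, NULL);
	return seen;
}
```

Note that the acquire on the waiting side is useless without the release on the running side, which mirrors the point above that barriers on the slow path alone do not help.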
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt