On Thu, Oct 18, 2007 at 04:39:59PM -0700, Linus Torvalds wrote:Thinking about this again and checking code I have come to the conclusion that 1) There is a race (in some drivers) even with the full mb. 2) Linus's patch fixes it. 3) It's difficult to fix it elegantly in the driver. In other words I think this patch is great :) First of all let's agree on some basic assumptions: * A pair of spin lock/unlock subsumes the effect of a full mb. * A spin lock in general only equates to (SS/SL/LL). * A spin unlock in general only equates to (SS/LS). In particular, a load after a spin unlock may pass before the spin unlock. Here is the race (with tg3 as the only example that I know of). The driver attempts to quiesce interrupts such that after the call to synchronize_irq it is intended that no further IRQ handler calls for that device will do any work besides acking the IRQ. Here is how it does it: CPU0 CPU1 spin lock load irq_sync irq_sync = 1 smp_mb synchronize_irq() while (IRQ_INPROGRESS) wait return set IRQ_INPROGRESS spin unlock tg3_msi ack IRQ if (irq_sync) return do work The problem here is that load of irq_sync in the handler has passed above the setting of IRQ_INPROGRESS. Linus's patch fixes it because this becomes: CPU0 CPU1 spin lock load irq_sync irq_sync = 1 smp_mb synchronize_irq set IRQ_INPROGRESS spin unlock spin lock spin unlock tg3_msi ack IRQ if (irq_sync) return do work while (IRQ_INPROGRESS) wait spin lock clear IRQ_INPROGRESS spin unlock return Even though we still do the work on the right we will now notice the INPROGRESS flag on the left and wait. It's hard to fix this in the drivers because they'd either have to access the desc lock or add a full mb to the fast path on the right. Once this goes in we can also remove the smp_mb from tg3.c. BTW, a lot of drivers (including the fusion example Ben quoted) call synchronize_irq before free_irq. This is unnecessary because the latter already calls it anyway. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -
| Greg KH | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Heiko Carstens | Re: -mm merge plans for 2.6.23 -- sys_fallocate |
| Tony Lindgren | [PATCH 37/90] ARM: OMAP: MPUIO wake updates |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | [GIT]: Networking |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Benjamin Herrenschmidt | Re: powerpc allmodconfig |
