On Tue, 24 Jul 2007, Satyam Sharma wrote:

No. CPU memory barriers extend to all CPU's. End of discussion.

It's not about "that cacheline". The whole *point* of a CPU memory barrier is that it's about independent memory accesses.

Yes, for a memory barrier to be effective, all CPU's involved in the transaction have to have the barriers - the same way a lock needs to be taken by everybody in order for it to make sense - but the point is, CPU barriers are about *global* behaviour, not local ones.

So there's a *huge* difference between

	clear_bit(x,y);

and

	clear_bit(x,y);
	smp_mb__after_clear_bit();

and it has absolutely nothing to do with the particular cacheline that "y" is in, it's about the *global* memory ordering.

Any write you do after that "smp_mb__after_clear_bit()" will be guaranteed to be visible to _other_ CPU's *after* they have seen the bit being cleared. Yes, those other CPU's need to have a read barrier between reading the bit and reading some other thing, but the point is, this has *nothing* to do with cache coherency, and the particular cache line that "y" is in.

And no, "smp_mb__before/after_clear_bit()" must *not* be just an empty "do {} while (0)". It needs to be a compiler barrier even when it has no actual CPU meaning, unless clear_bit() itself is guaranteed to be a compiler barrier (which it isn't, although the "volatile" on the asm in practice makes it something *close* to that).

Why? Think of the sequence like this:

	clear_bit(x,y);
	smp_mb__after_clear_bit();
	other_variable = 10;

the whole *point* of this sequence is that if another CPU does

	x = other_variable;
	smp_rmb();
	bit = test_bit(x,y);

then if it sees "x" being 10, then the bit *has* to be clear.
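The writer/reader pairing described above can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions, not kernel code: a relaxed atomic_fetch_and stands in for clear_bit(), a seq_cst fence stands in for smp_mb__after_clear_bit(), and an acquire fence stands in for smp_rmb(). The names READY_BIT, flags, writer(), reader() and reader_check() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define READY_BIT 0UL                 /* hypothetical bit number */

static atomic_ulong flags = 1UL << READY_BIT;  /* the bit starts out set */
static atomic_int other_variable = 0;

/* Writer side: clear the bit, full barrier, then the independent store. */
void writer(void)
{
	/* Stand-in for clear_bit(): atomic, but with no ordering of its own. */
	atomic_fetch_and_explicit(&flags, ~(1UL << READY_BIT),
				  memory_order_relaxed);
	/* Stand-in for smp_mb__after_clear_bit(): orders the clear before
	 * the store below, for the CPU and for the compiler. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&other_variable, 10, memory_order_relaxed);
}

struct observation {
	int value;     /* what the reader saw in other_variable */
	bool bit_set;  /* what the reader then saw for the bit */
};

/* Reader side: load the variable, read barrier, then test the bit. */
struct observation reader(void)
{
	struct observation o;

	o.value = atomic_load_explicit(&other_variable, memory_order_relaxed);
	/* Stand-in for smp_rmb(): the bit is read after the variable. */
	atomic_thread_fence(memory_order_acquire);
	o.bit_set = (atomic_load_explicit(&flags, memory_order_relaxed)
		     >> READY_BIT) & 1UL;
	return o;
}

/* The guarantee from the text: seeing 10 implies the bit is clear. */
void reader_check(void)
{
	struct observation o = reader();

	assert(o.value != 10 || !o.bit_set);
}
```

If writer() and reader_check() run concurrently on different threads, the assertion should hold on any conforming C11 implementation. Drop either fence and that guarantee goes away, even though every individual access is still atomic.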
And this is why "smp_mb__after_clear_bit()" needs to be a compiler barrier:

 - it doesn't matter for the action of the "clear_bit()" itself: that one is locked, and on x86 it thus also happens to be a serializing instruction, and the cache coherency and lock obviously mean that the bit clearing *itself* is safe!

 - but it *does* matter for the compiler scheduling. If the compiler were to decide that "y" and "other_variable" are totally independent, it might otherwise decide to move the "other_variable = 10" assignment to *before* the clear_bit(), which would make the whole code pointless!

See? We have two totally independent issues:

 - the CPU itself can re-order the visibility of accesses. x86 doesn't do this very much, and doesn't do it at all across a locked instruction, but it's still a real issue, even if it tends to be much easier to see on other architectures.

 - the compiler doesn't care about rules of "locked instruction" at all, because it has no clue. It has *different* rules about how it can re-order instructions and accesses, and maybe the "asm volatile" will guarantee that the compiler won't re-order things around the clear_bit(), and maybe it won't. But making it a compiler barrier (by using the "memory clobber" thing) *guarantees* that gcc cannot reorder memory writes or reads.

See? Two different - and _totally_ independent - levels of ordering, and we need to make sure that both are valid.

		Linus
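The compiler-level half can be sketched on its own. The macro below follows the usual gcc idiom for a pure compiler barrier (an empty asm with a "memory" clobber), which is how the kernel's barrier() is written for gcc. Everything else is hypothetical illustration: demo_word, demo_other and the function names are made up, and the plain "&=" is only a stand-in for clear_bit(), not an atomic operation.

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: emits no instruction, but the
 * compiler may not move memory reads or writes across it.
 * This is a GCC/Clang extension, not standard C. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static unsigned long demo_word = 1UL;  /* bit 0 starts out set */
static int demo_other;

void publish_plain(void)
{
	demo_word &= ~1UL;  /* stand-in for clear_bit(); NOT atomic */
	barrier();          /* compiler must emit the store above first */
	demo_other = 10;    /* cannot be hoisted above the barrier */
}

/* Single-threaded sanity check of the end state only. */
void check_published(void)
{
	assert(demo_word == 0UL);
	assert(demo_other == 10);
}
```

Note that barrier() constrains only the compiler. It says nothing about the order in which another CPU observes the two stores, which is the separate CPU-level issue discussed above.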
