On Tue, 19 Feb 2008, Pekka Enberg wrote:No, no. The comment says that it's purely there to serialize an *interrupt*, and as such, a compiler-only barrier is sufficient (or the comment is wrong). Interrupts are "totally ordered" within a cpu (of course, in theory a CPU might have speculative work etc reordering, but the CPU also guarantees that interrupt acts _as_if_ it was exact), so a compiler barrier is sufficient. Of course, if we're talking about interrupts on another CPU, that's a different issue, but the fact is, in that case it's not about interrupts any more (might as well be other code just running normally on another CPU), and a barrier doesn't help, it needs real locking. So that barrier is fine per se. Of course, the whole code (and/or just the comment!) may be buggered, but any CPU SMP-aware barriers shouldn't be relevant. What's much more likely to be an issue is simply the fact that since the fastpath now accesses the per-cpu freelist without any locking, if there is *any* sequence what-so-ever that does it from another CPU and assumes the old locking behaviour, the list will be corrupted. And from a quick look-through, I certainly cannot guarantee that isn't the case. There's still a lot of cases that do direct assignments to "c->freelist" without using a guaranteed atomic sequence. They *should* be safe if it's guaranteed that (a) they always run with interrupts disabled AND (b) 'c' is _always_ the "current CPU" list but I can't quickly see that guarantee for either. I'd happily just revert this thing, but it would be really good to have confirmation that it seems to matter. But Torsten's partial bisection seems to say that the quicklist thing went into -mm before the crash even started. So: - it might be something else entirely - it might still be the local cmpxchg, just Torsten didn't happen to notice it until later. - it might still be the local cmpxchg, but something else changed its patterns to actually make it start triggering. and in general I don't think we should revert it unless we have stronger indications that it really is the problem (eg somebody finds the actual bug, or a reporter can confirm that it goes away when the local cmpxchg optimization is disabled). Linus --
| Andrew Morton | -mm merge plans for 2.6.23 |
| Rafael J. Wysocki | [Bug #11207] VolanoMark regression with 2.6.27-rc1 |
| Zhang, Yanmin | AIM7 40% regression with 2.6.26-rc1 |
| Con Kolivas | [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2 |
git: | |
| Gregory Haskins | [RFC PATCH 03/17] vbus: add connection-client helper infrastructure |
| David Woodhouse | [PATCH 03/30] solos: FPGA and firmware update support. |
| Natalie Protasevich | [BUG] New Kernel Bugs |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
