> Hedi Berriche sent me a simple test case that can
Can you post the test case, please? How long does it typically take
to reproduce the problem?
What is the duplicate ticket number that CPUs 5 & 7 get at this point?
Presumably 0x0, yes? Or do they see a stale 0x7fff?
Is the fault handler using "ld.acq" to look at the spinlock value?
If not, then this might be a red herring. [Though clearly something
bad is going on here].
What CPU model are you running on?
What is the topological connection between CPU 4, 5 and 7 - are any of
them hyper-threaded siblings? Cores on same socket? N.B. topology may
change from boot to boot, so you may need to capture /proc/cpuinfo from
the same boot where this problem is detected. But the variation is
usually limited to which socket gets to own logical CPU 0.
If this is a memory ordering problem (and that seems quite plausible)
then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock
routines would probably make it go away.
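
To make that concrete, here is a rough user-space sketch of the idea using
C11 atomics. It is NOT the ia64 kernel code: the struct and function names
are hypothetical, and atomic_thread_fence(memory_order_seq_cst) is only a
portable stand-in for what ia64_mf() would do in the kernel. The point is
just to show where full fences could be dropped in around the ticket grab,
the wait loop and the release while hunting for an ordering bug.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock for illustration; not the kernel's layout. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket. The atomic add must never hand out duplicates. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* Debug-only full fence (stand-in for ia64_mf()). */
    atomic_thread_fence(memory_order_seq_cst);

    /* Spin with an acquire load until our ticket is being served. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;

    /* Debug-only full fence before entering the critical section. */
    atomic_thread_fence(memory_order_seq_cst);
}

void ticket_lock_release(struct ticket_lock *l)
{
    /* Debug-only full fence before leaving the critical section. */
    atomic_thread_fence(memory_order_seq_cst);

    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    /* Sanity check: releasing a lock that nobody holds is a bug. */
    assert(cur != atomic_load_explicit(&l->next, memory_order_relaxed));

    /* Pass the lock to the next ticket holder with release semantics. */
    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);

    /* Debug-only full fence after the hand-off. */
    atomic_thread_fence(memory_order_seq_cst);
}
```

If the problem disappears with the extra fences in place, removing them one
at a time should narrow down which ordering assumption is being violated.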
-Tony
--