Hi everybody,

SGI has recently experienced failures with the new ticket spinlock implementation. Hedi Berriche sent me a simple test case that can trigger the failure on the siglock.

To debug the issue, I wrote a small module that watches writes to current->sighand->siglock and records the values. I observed that the __ticket_spin_lock() primitive fails when the tail wraps around to zero. I reconstructed the following:

CPU 7 holds the spinlock
CPU 5 wants to acquire the spinlock
Spinlock value is 0xfffcffff (now serving 0x7ffe, next ticket 0x7fff)

CPU 7 executes st2.rel to release the spinlock. At the same time CPU 5 executes a fetchadd4.acq. The resulting lock value is 0xfffe0000 (correct), and CPU 5 has recorded its ticket number (0x7fff). Consequently, the first spinlock loop iteration succeeds, and CPU 5 now holds the spinlock.

Next, CPU 5 releases the spinlock with st2.rel, changing the lock value to 0x0 (correct).

SO FAR SO GOOD.

Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again. Interestingly, CPU 5 and CPU 7 are both granted the same ticket, and the spinlock value (as seen from the debug fault handler) is 0x0 after single-stepping over the fetchadd4.acq, in both cases. CPU 4 correctly sets the spinlock value to 0x1.

I don't know if the simultaneous acquire attempt and release are necessary to trigger the bug, but I noted it here.

I've only seen this happen when the spinlock wraps around to zero, but I don't know whether it cannot happen otherwise.

In any case, there seems to be a serious problem with memory ordering, and I'm not enough of an expert to tell exactly what it is.

Any ideas?

Petr Tesarik
L3 International
Novell, Inc.
--
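For readers following the lock values above, here is a minimal user-space sketch of how the 4-byte lock word decodes and how the reported sequence plays out. It is a plain, non-atomic model, not the kernel code: the helper names (next_ticket, now_serving, model_fetchadd, model_unlock, holds_lock) are made up for illustration, and the unlock arithmetic (add 2 to the upper halfword, clear bit 16) is an assumption about what the st2.rel path stores.

```c
#include <stdint.h>

/* Layout discussed in the thread: bits 0-14 next_ticket,
 * bits 15-16 padding, bits 17-31 now_serving. */
#define TICKET_SHIFT 17
#define TICKET_BITS  15
#define TICKET_MASK  ((1u << TICKET_BITS) - 1)

static inline uint32_t next_ticket(uint32_t lock)
{
    return lock & TICKET_MASK;
}

static inline uint32_t now_serving(uint32_t lock)
{
    return (lock >> TICKET_SHIFT) & TICKET_MASK;
}

/* Non-atomic model of fetchadd4: return the old word, add 1 to all 32 bits. */
static inline uint32_t model_fetchadd(uint32_t *lock)
{
    uint32_t old = *lock;
    *lock = old + 1;
    return old;
}

/* Non-atomic model of the release (assumption): only the upper 16 bits
 * are rewritten; now_serving advances by one and bit 16 is cleared,
 * which discards any carry out of the low halfword. */
static inline void model_unlock(uint32_t *lock)
{
    uint32_t high = *lock >> 16;
    high = (high + 2) & 0xfffeu;
    *lock = (high << 16) | (*lock & 0xffffu);
}

/* The waiter owns the lock once now_serving equals its ticket. */
static inline int holds_lock(uint32_t ticket, uint32_t serve)
{
    return (((serve >> TICKET_SHIFT) ^ ticket) & TICKET_MASK) == 0;
}
```

Starting from 0xfffcffff, one model_fetchadd() followed by one model_unlock() yields 0xfffe0000, and a second model_unlock() yields 0x0, which matches the values reported above.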
On Fri, Aug 27, 2010 at 14:38 Petr Tesarik wrote:
| Hi everybody,
|
| SGI has recently experienced failures with the new ticket spinlock
| implementation. Hedi Berriche sent me a simple test case that can
| trigger the failure on the siglock.
One more fact, the problem was introduced by commit
commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
Author: Tony Luck <tony.luck@intel.com>
Date: Wed Oct 7 10:54:19 2009 -0700
[IA64] Squeeze ticket locks back into 4 bytes.
Reverting the patch makes the problem go away.
IOW, and as far as testing shows, the first incarnation of the ticket locks
implementation on IA64 (commit 2c8696), the one that used 8 bytes, does not
exhibit this problem.
Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
I wouldn't be so sure about it. Given that I have only observed the problem when the spinlock value wraps around, then an 8-byte spinlock might only need much more time to trigger the bug. Just my two cents, Petr Tesarik L3 International Novell, Inc. --
On Fri, Aug 27, 2010 at 15:09 Petr Tesarik wrote:
| On Friday 27 of August 2010 15:48:02 Hedi Berriche wrote:
|
| > One more fact, the problem was introduced by commit
| >
| > commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
| > Author: Tony Luck <tony.luck@intel.com>
| > Date: Wed Oct 7 10:54:19 2009 -0700
| >
| > [IA64] Squeeze ticket locks back into 4 bytes.
| >
| > Reverting the patch makes the problem go away.
| >
| > IOW, and as far as testing shows, the first incarnation of the ticket locks
| > implementation on IA64 (commit 2c8696), the one that used 8 bytes, does not
| > exhibit this problem.
|
| I wouldn't be so sure about it. Given that I have only observed the problem
| when the spinlock value wraps around, then an 8-byte spinlock might only need
| much more time to trigger the bug.

That's a possibility and that's why I said "as far as testing shows".

That said, I'm letting my already over 36 hours run carry on chewing CPU time, and see if it will eventually trip the same problem seen with 4-byte ticket locks.

Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
Hm, this doesn't sound like a viable approach. Since the siglock gets initialized to 0 when a new process is started, it may never actually wrap around. I would rather attach a SystemTap probe somewhere during process fork and add a bias to the siglock. That should work fine. Let me knock up the SystemTap script... Petr --
On Fri, Aug 27, 2010 at 15:40 Petr Tesarik wrote:
| On Friday 27 of August 2010 16:31:35 Hedi Berriche wrote:
|
| > That said, I'm letting my already over 36 hours run carry on chewing CPU
| > time, and see if it will eventually trip the same problem seen with 4-byte
| > ticket locks.
|
| Hm, this doesn't sound like a viable approach. Since the siglock gets
| initialized to 0 when a new process is started, it may never actually wrap
| around.
|
| I would rather attach a SystemTap probe somewhere during process fork and add
| a bias to the siglock. That should work fine. Let me knock up the SystemTap
| script...

That would be nice. Ta!

Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
Here it is. I don't have a system with the old 64-bit ticket spinlocks at hand, so this is completely untested, but it should work fine. Adjust if needed. Petr
Can you post the test case please. How long does it typically take

What is the duplicate ticket number that CPUs 5 & 7 get at this point?

Is the fault handler using "ld.acq" to look at the spinlock value? If not, then this might be a red herring. [Though clearly something

What cpu model are you running on?

What is the topological connection between CPU 4, 5 and 7 - are any of them hyper-threaded siblings? Cores on same socket? N.B. topology may change from boot to boot, so you may need to capture /proc/cpuinfo from the same boot where this problem is detected. But the variation is usually limited to which socket gets to own logical cpu 0.

If this is a memory ordering problem (and that seems quite plausible) then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock routines would probably make it go away.

-Tony
--
I let Hedi send it. It's really easy to reproduce. In fact, I can reproduce it

Right. I also realized I was reading the spinlock value with a plain "ld4". When I changed it to "ld4.acq", this is what happens:

1. We're in _spin_lock_irq, which starts like this:

   0xa0000001008ea000 <_spin_lock_irq>:   [MMI] rsm 0x4000;;
   0xa0000001008ea001 <_spin_lock_irq+1>:       fetchadd4.acq r15=[r32],1
   0xa0000001008ea002 <_spin_lock_irq+2>:       nop.i 0x0;;

   AFAICS the spinlock value should be 0x0 (after having wrapped around from 0xffff0000 at release on the same CPU).

2. fetchadd4.acq generates a debug exception (because it writes to the watched location)

3. ld4.acq inside the debug fault handler reads 0x0 from the location

4. the handler sets PSR.ss on return

5. fetchadd4.acq puts 0x1 (why?) in r15 and generates a Single step fault

6. the fault handler now reads 0x0 (sic!) from the spinlock location (again, using ld4.acq)

7. the resulting kernel crash dump contains ZERO in the spinlock location

Maybe, there's something wrong with my test module, because I'm already getting tired today, but there's definitely something wrong here. I'll try to

There are two Dual-Core Intel(R) Itanium(R) 2 Processor 9150M in the test machine:

physical package 0
  core 0: CPU 0, CPU 4
  core 1: CPU 2, CPU 6
physical package 196611
  core 0: CPU 1, CPU 5
  core 1: CPU 3, CPU 7

/proc/cpuinfo says:

processor : 0
vendor : GenuineIntel
arch : IA-64
family : 32
model : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision : 1
archrev : 0
features : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs : 4
cpu MHz : 1668.672
itc MHz : 416.667500
BogoMIPS : 1662.97
siblings : 4
physical id: 0
core id : 0
thread id : 0

processor : 1
vendor : GenuineIntel
arch : IA-64
family : 32
model : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision : ...
On Fri, Aug 27, 2010 at 18:16 Petr Tesarik wrote:
| On Friday 27 of August 2010 18:08:03 Luck, Tony wrote:
| > > Hedi Berriche sent me a simple test case that can
| > > trigger the failure on the siglock.
| >
| > Can you post the test case please. How long does it typically take
| > to reproduce the problem?
|
| I let Hedi send it. It's really easy to reproduce. In fact, I can reproduce it
| within 5 minutes on an 8-CPU system.

Test case provided via private email.

Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
I have another crash dump which recorded the same values in the debug fault handler, but the resulting crash dump contains 0x1 (not 0x0) in the spinlock. R15 was still 0x1 (even though it should contain the original value, not the incremented one, shouldn't it?). Petr Tesarik --
I think I take this back ... if it were a memory ordering problem, then it could show up any time - not just at wrap-around. -Tony --
Well, I wasn't originally sure if it only happens at wrap-around. OTOH I've now modified my tests, so that they would also catch any other badness, and I still only got another two failures after wrap-around. From looking at the traces, I'm afraid this smells like another Itanium erratum. I'm now trying to write a minimal test case... Petr Tesarik --
One more idea. The wrap-around case is the only one when the high word is modified. This is in fact the only case when the fetchadd.acq competes with the st2.rel about the actual contents of that location. I don't know if it matters... Petr Tesarik --
I pondered that for a while - but I have difficulty believing that fetchadd looks at which bits changed and only writes back the bytes that did. -Tony --
OTOH the counter is only 15-bit, so it also wraps around at 0xfffe7fff, but I have never seen it fail there. It always fails after the wrap-around from 0xfffeffff. Petr Tesarik --
Debugging this in the kernel seemed hard ... so I tried to construct a user level test using the same code from the kernel. See attached. But this fails so catastrophically with any number of threads greater than one, that I suspect I made some mistake cut & pasting the relevant bits of kernel infrastructure. The goal of the program is to have several child processes pounding on a lock while the parent looks in every 5 seconds to see if they are making progress. -Tony
> this fails so catastrophically Interesting ... getting rid of some of the fancy asm bits (replacing both the ld4.c.nc and the ld2.bias with simple "serve = *p" and "tmp = *p") makes this run a lot better. -Tony --
On Sat, Aug 28, 2010 at 00:55 Tony Luck wrote:
| > this fails so catastrophically
|
| Interesting ... getting rid of some of the fancy asm bits
| (replacing both the ld4.c.nc and the ld2.bias with simple
| "serve = *p" and "tmp = *p") makes this run a lot better.

*Hasty* look seems to suggest that keeping the fancy asm bits but compiling with -O2 -frename-registers makes it work equally well.

Don't take my word for it though, double check it, it's been a long week and brain is anything but alert at this time of the night.

Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
Yup - that makes the usermode version run just fine (no problems in the first billion lock/unlock operations ... which is quite a lot of wrap-arounds of the 15-bit ticket lock).

And since we already use -frename-registers when building the kernel, no immediate help for the kernel problem. :-(

I may tinker with this test a bit to include some short random amounts of hold-time for the lock, and delays between attempts to acquire it (to make it look more like a contended kernel lock and less like a continuous queue of processes trading around a lock that is never free). Petr's debug information definitely showed the lock becoming free at the wraparound (lock == 0x0).

-Tony
--
I've been iterating ... adding new bits to try to reproduce the
kernel environment:
1) Added delays (bigger delay not holding the lock than holding
it - so contention is controlled)
2) Check that locking actually works (with a critzone element that
is only modified when the lock is held).
3) Sometimes use trylock (all the odd numbered threads do this).
Compile with -frename-registers ... and add a nop() { } function
in another file (just to make sure the compiler doesn't optimize
the delay loops).
Sadly ... my user mode experiments haven't yet yielded any cases
where the ticket locks fail in the way that Petr saw them mess up
inside the kernel. This latest version has been running for ~90
minutes and has completed 25 million lock/trylock iterations (with
about a third of the ticket lock wraparounds passing through the
uncontested case (lock == 0) and the rest happening with some
processes waiting for the lock).
So now I'm trying to think of other ways that the kernel case
differs from my user mode mock-up.
-Tony
Hi Tony,

I've been also playing with my test case, and I haven't been able to reproduce it in user-space either. One thing I noticed was the apparently incorrect use of ALAT. The generated code for _spin_lock_irq contains:

	invala;;
	ld4.c.nc r11=[r32]
	// Other instructions not affecting r20
	ld4.c.nc r20=[r32]

IIUC, the subsequent compare can use an undefined value (r20 is not modified anywhere in this function, except by the ld4.c.nc, but that happens only on an ALAT miss, right?).

I changed the corresponding code in __ticket_spin_lock to:

	asm volatile ("ld4.c.nc %0=[%1]" : "+r"(serve) : "r"(p) : "memory");

(NB the "+r" constraint instead of "=r")

The generated code now re-uses r15. Unfortunately, Hedi's test case still fails for me. :(

Petr Tesarik
--
I don't see that in my kernel - but you raise a good point. The idea in this code is that invala makes sure that we don't have a matching ALAT entry on the first iteration of the loop, so we will do the load. Then we loop not actually doing the access until the ALAT tells us that the location may have changed, when we load again. But this assumes that the compiler didn't decide to re-use the register for something else in the loop. So things may be a bit fragile if some aggressive optimizations happen after the routine is inlined.

Does Hedi's test fail for you if you drop the fancy ALAT trick? I.e. just use "serve = *p;" in __ticket_spin_lock()? [& similar in __ticket_spin_unlock_wait() if you want - but I don't think that is ever used for siglock]

-Tony
--
Answering my own question ... it failed in 47 minutes with the breakit script on iteration 2812 :-( So it would appear that the problem isn't related to the ALAT, or weird compiler optimizations around the inline asm. -Tony --
More results from other experiments ...

1) It occurred to me that I should check that these test cases weren't hitting some other problem in 2.6.36-rc3. So I ported the 64-bit version of ticket locks to the current kernel and ran the stress test. It was still going strong at 16 hours (where all my other experiments tend to fail at 90 minutes or less).

2) Next I investigated whether wrap-around was related by reducing TICKET_BITS from 15 to 8 (I only have 32 cpus, so this should be plenty). I also moved the bit offset of the "now serving" value to different spots in the high half of the lock to check whether we were hitting some issues with overflow from the fetchadd on the low half into the high half, or some sign problem when bit 31 was set. These tests all failed in 20 minutes to an hour (not significantly different from TICKET_BITS=15) ... so wraparound appears not to be an issue.

3) Then I wondered whether it was a problem that we used fetchadd4 which modifies all 32 bits in an atomic instruction when acquiring the lock, but a simple st2 to write just the upper 16 bits when doing the unlock. So I recoded __ticket_spin_unlock() to spin on a cmpxchg call to update all 32-bits with an atomic instruction. This one failed in 34 minutes.

4) Memory ordering? I added ia64_mf() calls liberally throughout all the __ticket_* routines. Kernel failed in 32 minutes.

Summary: the only change that helps is the 64-bit ticket locks.

-Tony
--
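The idea in experiment 3 (release the lock with a full-width compare-and-exchange instead of a 16-bit store) might look roughly like the sketch below. This is a portable C11 approximation, not the ia64 code that was actually tested: the function name ticket_unlock_cmpxchg is hypothetical, and the arithmetic on the upper halfword (add 2, clear bit 16) is an assumption about what the original unlock stores.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Release the ticket lock by rewriting all 32 bits atomically.
 * The low halfword (next_ticket plus padding) is carried over unchanged;
 * the high halfword advances now_serving by one and drops bit 16. */
static inline void ticket_unlock_cmpxchg(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);
    uint32_t next;

    do {
        uint32_t high = ((old >> 16) + 2) & 0xfffeu;
        next = (high << 16) | (old & 0xffffu);
        /* On failure, 'old' is refreshed with the current lock word,
         * so a concurrent fetchadd on the low half is not lost. */
    } while (!atomic_compare_exchange_weak_explicit(lock, &old, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Unlike the 16-bit store, this retries whenever another CPU has taken a ticket between the load and the exchange, so both halves of the word are always written by the same atomic operation.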
On Thu, Sep 02, 2010 at 00:10 Tony Luck wrote:
| More results from other experiments ...
|
| 1) It occurred to me that I should check that these test cases weren't
| hitting some other problem in 2.6.36-rc3. So I ported the 64-bit
| version of ticket locks to the current kernel and ran the stress test.
| It was still going strong at 16 hours (where all my other experiments
| tend to fail at 90 minutes or less).

This is consistent with the bisection that led me to the 4 bytes ticket locks commit.

| Summary: the only change that helps is the 64-bit ticket locks.

Ditto.

Cheers,
Hedi.
--
Hedi Berriche
Global Product Support
http://www.sgi.com/support
--
Today's experiments were inspired by Petr's comment at the start of this thread:

"Interestingly, CPU 5 and CPU 7 are both granted the same ticket"

I added an "owner" element to every lock - I have 32 cpus, so I made it "unsigned int". Then added to the lock and trylock paths code to check that owner was 0 when the lock was granted, followed by:

	lock->owner |= (1u << cpu);

Then in the unlock path I check that just the (1u << cpu) bit is set before doing:

	lock->owner &= ~(1u << cpu);

In my first test I got a hit. cpu28 had failed to get the lock and was spinning holding ticket "1". When "now serving" hit 1, cpu28 saw that the owner field was set to 0x1, indicating that cpu0 had also claimed the lock. The lockword was 0x20002 at this point ... so cpu28 was correct to believe that the lock had been freed and handed to it. It was unclear why cpu0 had muscled in and set its bit in the owner field. Also can't tell whether that was a newly allocated lock, or one that had recently wrapped around.

Subsequent tests have failed to reproduce that result - system just hangs without complaining about multiple cpus owning the same lock at the same time - perhaps because of the extra tracing I included to capture more details.

-Tony
--
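A minimal sketch of the owner-bitmask check described above, for anyone who wants to reproduce the experiment. The struct and function names (dbg_lock, owner_claim, owner_release) are hypothetical, and the real instrumentation lived inside the kernel lock, trylock and unlock paths rather than in standalone helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debug wrapper: the lock word plus one owner bit per cpu. */
struct dbg_lock {
    uint32_t lockword;
    uint32_t owner;   /* bit N set while cpu N believes it holds the lock */
};

/* Called right after the lock is granted. Returns 0 if some other cpu
 * already claims ownership, which is the failure being hunted. */
static inline int owner_claim(struct dbg_lock *l, unsigned int cpu)
{
    assert(cpu < 32);             /* one bit per cpu in a 32-bit mask */
    int ok = (l->owner == 0);
    l->owner |= (1u << cpu);
    return ok;
}

/* Called right before unlock. Returns 0 unless exactly this cpu's bit
 * is set in the owner mask. */
static inline int owner_release(struct dbg_lock *l, unsigned int cpu)
{
    assert(cpu < 32);
    int ok = (l->owner == (1u << cpu));
    l->owner &= ~(1u << cpu);
    return ok;
}
```

In the hit described above, the equivalent of owner_claim() for cpu28 would have returned 0 because the owner field already held 0x1 (cpu0).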
I did some extensive testing of the issue. I wrote a Kprobe that attaches to copy_process and if the new task is one of the "count" processes, it sets up a pair of DBR registers to watch for all writes to the siglock. (Obviously, I had to limit parallel runs of "count" to 4, because there are only 8 dbr registers.)

When I hit the breakpoint, I record the old value (with ld4.acq), single step one instruction and read the new value (with ld4.acq). The code panics the machine (producing a core-dump) if neither the new head is larger than the old head nor the new tail is larger than the old tail.

What I got is rather disturbing. I got three different traces so far, all of them on the same fetchadd4.acq instruction. The observed values are:

     BEFORE  reg  AFTER  DUMP
A.     0      1     0     0
B.     1      0     1     1
C.     0      1     0     1

BEFORE .. value seen by ld4.acq in the first debug fault
reg    .. value in the target register of fetchadd
AFTER  .. value seen by ld4.acq after single step
DUMP   .. value saved to the crash dump

Interestingly, sometimes there was no write recorded with the new value equal to the BEFORE column. Then it occurred to me that I probably missed some writes from interrupt context, because psr.db gets cleared by the CPU. So I modified ivt.S so that it explicitly re-enabled psr.db. And I got a crash dump with variant C.

I thought that I still missed some writes somehow, but consider that I never got any failures other than after a wrap-around, even though the code would catch any case where the lock does not increment correctly.

Moreover, variant B cannot be explained even if I did miss a fetchadd4. How can we get 1 on the first ld4.acq, and then 0 from the fetchadd4.acq?

I'm now trying to modify the lock primitives:

1. replace the fetchadd4.acq with looping over cmpxchg
2. replace the st2.rel with looping over cmpxchg

I'll write again when I have the results.

Petr Tesarik
--
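The sanity check described above (panic unless one of the two fields moved forward) could be sketched as follows. This is not the actual debug module: the names field_advanced and write_made_progress are made up, and comparing the 15-bit fields modulo 2^15 is an assumption made here so that a legitimate wrap-around to zero still counts as forward progress.

```c
#include <stdint.h>

#define TICKET_SHIFT 17
#define TICKET_BITS  15
#define TICKET_MASK  ((1u << TICKET_BITS) - 1)

/* True if a 15-bit field moved forward, allowing for wrap-around:
 * the modular distance must be non-zero and less than half the range. */
static inline int field_advanced(uint32_t oldf, uint32_t newf)
{
    uint32_t d = (newf - oldf) & TICKET_MASK;
    return d != 0 && d < (1u << (TICKET_BITS - 1));
}

/* A recorded write is plausible if either next_ticket (low field) or
 * now_serving (high field) advanced between the two observed values. */
static inline int write_made_progress(uint32_t oldv, uint32_t newv)
{
    return field_advanced(oldv & TICKET_MASK, newv & TICKET_MASK) ||
           field_advanced((oldv >> TICKET_SHIFT) & TICKET_MASK,
                          (newv >> TICKET_SHIFT) & TICKET_MASK);
}
```

With this check, a release that takes the word from 0xffff0000 to 0x0 is accepted, while an observed pair such as 0x1 followed by 0x0, or 0x0 followed by 0x0, is flagged.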
I did this and I feel dumber than ever. Basically, I replaced this snippet:
ticket = ia64_fetchadd(1, p, acq);
with:
int tmp;
do {
ticket = ACCESS_ONCE(lock->lock);
asm volatile (
"mov ar.ccv=%1\n"
"add %0=1,%1;;\n"
"cmpxchg4.acq %0=[%2],%0,ar.ccv\n"
: "=r" (tmp)
: "r" (ticket), "r" (&lock->lock)
: "ar.ccv");
} while (tmp != ticket);
Just to make sure I didn't miss something, this compiled to:
0xa0000001008dacb0: [MMI] nop.m 0x0
0xa0000001008dacb1: ld4.acq r15=[r32]
0xa0000001008dacb2: nop.i 0x0;;
0xa0000001008dacc0: [MII] mov.m ar.ccv=r15
0xa0000001008dacc1: adds r14=1,r15;;
0xa0000001008dacc2: nop.i 0x0
0xa0000001008dacd0: [MII] cmpxchg4.acq r14=[r32],r14,ar.ccv
0xa0000001008dacd1: nop.i 0x0
0xa0000001008dacd2: nop.i 0x0;;
0xa0000001008dace0: [MIB] nop.m 0x0
0xa0000001008dace1: cmp4.eq p7,p6=r14,r15
0xa0000001008dace2: (p06) br.cond.dptk.few 0xa0000001008dacb0
My test module recorded the following sequence on the failing CPU:
}, {
ip = 0xa00000010012f7b0,
addr = 0xe000000181925c08,
oldvalue = 0xffff0000,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x0,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x0,
newvalue = 0x0,
task = 0xe000000186930000
}, {
ip = 0xa0000001008dacd0,
addr = 0xe000000181925c08,
oldvalue = 0x1,
newvalue = 0x1,
task = 0xe000000186930000
}, {
I didn't see values around zero on any ...

One more thing - the crash dump I got from that run shows that CPU 2 was just going through zap_page_range(), so it probably also did a few global TLB flushes. I'm not sure how this should matter, but any idea is good now, I think.

Anyway, if a global TLB flush is necessary to trigger the bug, it would also explain why we couldn't reproduce it in user-space. OK, I know I'm just wildly guessing (and don't have any explanation for the wrap-around mystery) ... but does anybody have a better idea?

Petr Tesarik
--
Perhaps ... I did explore the TLB in one variant of my user mode test I added a pointer-chasing routine that looked at enough pages to clear out the TLB. Not quite the same as a flush - but close. It didn't help at all. -Tony --
Hi Tony,

I experimented a lot with the code, trying to find a solution, but all in vain.

I also tried to add a "dep %0=0,%0,15,2" instruction in the cmpxchg4 loop in __ticket_spin_lock but it still failed when the lock wrapped around to zero (but now the high word was not even touched). Replacing the "st2.rel" instruction with a similar cmpxchg4 loop in __ticket_spin_unlock did not help either (so we no longer have two accesses with different sizes).

What I've seen quite often lately is that the spinlock value is read as "0" by the ld4.acq in __ticket_spin_lock(), then as "1" by ld4.acq inside the debug fault handler, and then as "0" again by the "cmpxchg4" instruction, i.e. the spin lock was actually acquired correctly, but the debug code triggered a panic.

This made me think that I had an error in my debug code, so I tried running that test kernel without the probe, just waiting whether the kernel hangs. It did hang within 10 minutes (with 6 parallel test case loops and a module load/unload loop on another terminal) and produced a crash dump that was very similar to all the others.

To sum it up:

1. The ld4.acq and fetchadd.acq instructions fail to give us a coherent view of the spinlock memory location.
2. So far, the problem has been observed only after the spinlock value changes to zero.
3. It cannot be a random memory scribble, because I employed the DBR registers to catch all writes to that memory location.
4. We haven't been able to reproduce the problem in user-space.

Frankly, I think that the processor does not follow the IPF specification, hence it is a CPU bug. But let's be extremely cautious here and re-read the specification once more, very carefully. We can still miss some writes to the siglock memory location:

1. if the same physical address is accessible with another virtual address
2. if the siglock location is written by a non-mandatory RSE-spill

Option 2 seems extremely unlikely to me. Option 1 is more plausible, but given that ...
Hi all,
I thought about this point for a while, and then I decided to test this with
brute force. Why not simply skip the zero? If I shift the head position to
the right within the lock, I can iterate over odd numbers only.
Unfortunately, the ia64 platform does not have a fetchadd4 variant with an
increment of 2, so I had to reduce the size of the head/tail to 14 bits, but
that's still sufficient for all today's machines. Anyway, I do NOT propose
this as a solution, rather as a proof of concept.
Anyway, after applying the following patch, the test case provided by Hedi has
been running for a few hours already. Now, I'm confident this is a hardware
bug.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index f0816ac..01be28e 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -26,23 +26,28 @@
* The pad bits in the middle are used to prevent the next_ticket number
* overflowing into the now_serving number.
*
- * 31 17 16 15 14 0
+ * 31 18 17 16 15 2 1 0
* +----------------------------------------------------+
- * | now_serving | padding | next_ticket |
+ * | now_serving | padding | next_ticket | - |
* +----------------------------------------------------+
*/
-#define TICKET_SHIFT 17
-#define TICKET_BITS 15
+#define TICKET_HSHIFT 2
+#define TICKET_TSHIFT 18
+#define TICKET_BITS 14
#define TICKET_MASK ((1 << TICKET_BITS) - 1)
+#define __ticket_spin_is_unlocked(ticket, serve) \
+ (!((((serve) >> (TICKET_TSHIFT - TICKET_HSHIFT)) ^ (ticket)) \
+ & (TICKET_MASK << TICKET_HSHIFT)))
+
static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
int *p = (int *)&lock->lock, ticket, serve;
- ticket = ia64_fetchadd(1, p, acq);
+ ticket = ia64_fetchadd(1 << TICKET_HSHIFT, p, acq);
- if (!(((ticket >> ...

Started up a test with this patch on my 32-way to confirm.

-Tony
--
Still running now (22 hours). So that one works too. -Tony --
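To make the visible part of the proof-of-concept patch easier to follow, here is a small user-space model of its layout and of the __ticket_spin_is_unlocked() test. It is only a sketch: the diff above is truncated, so the initial lock value is not known; the model assumes the two low bits start out as 1, which is one way the word could be kept from ever reading as zero. The names ticket_is_unlocked and model_fetchadd are made up for this example.

```c
#include <stdint.h>

/* Constants as they appear in the patch. */
#define TICKET_HSHIFT 2
#define TICKET_TSHIFT 18
#define TICKET_BITS   14
#define TICKET_MASK   ((1u << TICKET_BITS) - 1)

/* Same expression as __ticket_spin_is_unlocked() in the patch, on
 * unsigned values: move now_serving (bits 18-31) down onto next_ticket
 * (bits 2-15) and compare only those 14 bits. */
#define ticket_is_unlocked(ticket, serve) \
    (!((((serve) >> (TICKET_TSHIFT - TICKET_HSHIFT)) ^ (ticket)) \
       & (TICKET_MASK << TICKET_HSHIFT)))

/* Non-atomic model of ia64_fetchadd(1 << TICKET_HSHIFT, p, acq):
 * return the old word and bump next_ticket by one, leaving bits 0-1. */
static inline uint32_t model_fetchadd(uint32_t *lock)
{
    uint32_t old = *lock;
    *lock = old + (1u << TICKET_HSHIFT);
    return old;
}
```

Starting from an assumed initial word of 0x1, the first caller gets ticket 0 and passes the test at once; the second caller has to wait until now_serving (bit 18 upwards) has been advanced by one.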
On Fri, Sep 03, 2010 at 04:35:23PM +0200, Petr Tesarik wrote:
> I didn't see values around zero on any other CPU in the system. So, either
> there is something seriously broken in hardware, or I made a silly mistake in
> the monitoring code.
...
> memset(dbr, sizeof dbr, 0);
^^^^^^^^^^^^^
swapped arguments. Perhaps unrelated to the problem, but still silly :)
Dave
--
