Ah, I was confused by the comment:
smp_mb(); /* Don't call for memory barriers before we see zero. */
^^^^^^^^^^^^^^^^^^
So, in fact, we need this barrier to make sure that _other_ CPUs see these
changes in order, thanks. Of course, _we_ already saw zero.
But in this particular case it doesn't matter: rcu_try_flip_waitzero() is
the only function that reads the "non-local" per_cpu(rcu_flipctr), so it
doesn't really need the barrier? (Besides, it is always called under
fliplock.)
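To check my reading of the barrier, here is a user-space sketch of the pattern, assuming C11 atomics as stand-ins for the kernel primitives: atomic_thread_fence(memory_order_seq_cst) plays smp_mb(), while flipctr[][] and mb_needed[] are hypothetical stand-ins for per_cpu(rcu_flipctr) and the per-CPU "call for memory barriers". This is not the actual rcupreempt code, just the shape of "sum to zero, full barrier, then call for barriers".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical user-space stand-ins: flipctr[cpu][idx] plays
 * per_cpu(rcu_flipctr), mb_needed[cpu] plays the per-CPU request
 * "please execute a memory barrier". Not the kernel's definitions. */
static atomic_int flipctr[NR_CPUS][2];
static atomic_bool mb_needed[NR_CPUS];

/* Returns true if the "last" counters summed to zero and the
 * memory-barrier requests were posted to every CPU. */
static bool try_flip_waitzero(int lastidx)
{
    int sum = 0;

    assert(lastidx == 0 || lastidx == 1);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += atomic_load_explicit(&flipctr[cpu][lastidx],
                                    memory_order_relaxed);
    if (sum != 0)
        return false;   /* readers still active, try again later */

    /* Stand-in for smp_mb(): orders the loads that saw zero before the
     * stores below. A CPU that reads its mb_needed flag with acquire
     * semantics and finds it set is therefore ordered after our zero
     * check. The barrier is for the other CPUs; we already saw zero. */
    atomic_thread_fence(memory_order_seq_cst);

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        atomic_store_explicit(&mb_needed[cpu], true, memory_order_relaxed);
    return true;
}
```

With only one caller reading the counters, and that caller serialized by a lock, the fence matters only for how the other CPUs observe the mb_needed stores, which is exactly the question above.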
Thanks a lot!!! This fills another gap in my understanding.
OK, the last off-topic question (I promise :). When CPUs 0 and 1 share a
store buffer, the situation is simple: we can replace "CPU 0 stores" with
"CPU 1 stores". But what if CPU 0 is equally "far" from CPUs 1 and 2?
Suppose that CPU 1 does

	wmb();
	B = 0;

Can we assume that CPU 2, doing

	if (B == 0) {
		rmb();
		...
	}

must see all invalidations from CPU 0 that were seen by CPU 1 before the
wmb()?
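For what it's worth, the C11 memory model gives a definite answer for a user-space analogue of this pattern: with a release fence on CPU 1 and an acquire fence on CPU 2, the two fences synchronize through B, and read-read coherence then forces CPU 2 to see A == 1 whenever CPU 1 did. A is a hypothetical variable standing for CPU 0's store. Note that a release fence is stronger than wmb(), because it also orders CPU 1's earlier load, so this sketch says nothing definite about what wmb()/rmb() promise on a particular architecture.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* A is a hypothetical variable written by CPU 0; B is the flag from the
 * question (CPU 1 clears it after its barrier). */
static atomic_int A, B;
static int r1, r2, r3;   /* values observed by CPU 1 and CPU 2 */

static void *cpu0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    return NULL;
}

static void *cpu1(void *arg)
{
    (void)arg;
    r1 = atomic_load_explicit(&A, memory_order_relaxed);
    /* Stand-in for wmb(). A release fence is STRONGER than wmb(): it also
     * orders the load of A above, which wmb() alone does not promise. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&B, 0, memory_order_relaxed);
    return NULL;
}

static void *cpu2(void *arg)
{
    (void)arg;
    r2 = atomic_load_explicit(&B, memory_order_relaxed);
    if (r2 == 0) {
        atomic_thread_fence(memory_order_acquire);   /* stand-in for rmb() */
        r3 = atomic_load_explicit(&A, memory_order_relaxed);
    }
    return NULL;
}

static void spawn(pthread_t *t, void *(*fn)(void *))
{
    int rc = pthread_create(t, NULL, fn, NULL);
    assert(rc == 0);
    (void)rc;
}

/* One trial. Returns 0 only for the outcome C11 forbids: CPU 1 saw A == 1,
 * CPU 2 saw B == 0, yet CPU 2 still read the old A == 0. */
static int run_once(void)
{
    pthread_t t0, t1, t2;

    atomic_store(&A, 0);
    atomic_store(&B, 1);
    r1 = r2 = r3 = -1;
    spawn(&t0, cpu0);
    spawn(&t1, cpu1);
    spawn(&t2, cpu2);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return !(r1 == 1 && r2 == 0 && r3 == 0);
}
```

Running run_once() in a loop can only fail to find the forbidden outcome; it cannot prove the guarantee, which comes from the model itself.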
Yes, but this is _exactly_ what the current code does in the scenario
below, no?

Look, what happens is:
	// call_rcu()
	rcu_flip_flag = rcu_flipped
	insert the new callback

	// timer irq
	move the callbacks (the new one goes to wait[0])
But I still can't understand why this is bad. Before this callback is
flushed, we need 2 further "rdp->completed != rcu_ctrlblk.completed"
events, so we can't miss an rcu_read_lock() section, no?
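To make the "2 further events" argument concrete, here is a deliberately simplified model, assuming two wait stages: each list is just a counter, and callbacks move one stage only when this CPU notices that the global completed counter changed. The names (struct cb_lists, model_call_rcu(), model_advance(), global_completed) are hypothetical and only loosely follow rdp->nextlist/waitlist[]/donelist; this is not the rcupreempt code.

```c
#include <assert.h>

#define GP_STAGES 2   /* assumption: two wait stages */

/* Hypothetical per-CPU callback lists, each reduced to a count. */
struct cb_lists {
    int next;              /* just queued by call_rcu() */
    int wait[GP_STAGES];   /* waiting for grace-period stages */
    int done;              /* safe to invoke */
    long completed;        /* this CPU's snapshot of the global counter */
};

static long global_completed;   /* stand-in for rcu_ctrlblk.completed */

/* call_rcu(): a new callback always enters the "next" list. */
static void model_call_rcu(struct cb_lists *l)
{
    l->next++;
}

/* Timer-tick processing: advance every list by one stage, but only if the
 * global counter moved since this CPU last looked. */
static void model_advance(struct cb_lists *l)
{
    assert(global_completed >= l->completed);   /* counter never goes back */
    if (l->completed == global_completed)
        return;
    l->done += l->wait[GP_STAGES - 1];
    for (int i = GP_STAGES - 1; i > 0; i--)
        l->wait[i] = l->wait[i - 1];
    l->wait[0] = l->next;
    l->next = 0;
    l->completed = global_completed;
}
```

Replaying the scenario above: after call_rcu() and the timer irq, the new callback sits in wait[0], and in this model it reaches done only after two more changes of global_completed, however many ticks happen in between.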
Please :)
Yes, yes. I just wanted to be sure I didn't miss some other subtle reason.
I hope this is OK. Note that migration_call(CPU_DEAD) flushes
->migration_queue, so if we take rq->lock after that, we must see
!cpu_online(cpu). The CPU_UP event is not interesting for us; we can miss it.
Hmm... but wake_up_process() should be moved under spin_lock().
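A minimal user-space model of this reasoning, assuming a pthread mutex in place of rq->lock and a plain flag in place of cpu_online(cpu). struct fake_rq, queue_request() and cpu_dead_flush() are hypothetical; the wakeups counter stands in for wake_up_process() and is bumped before the unlock, as suggested. The model also simplifies by clearing the flag inside the flush's critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: 'lock' plays rq->lock, 'online'
 * plays cpu_online(cpu), 'queued' counts entries on ->migration_queue,
 * 'wakeups' counts calls standing in for wake_up_process(). */
struct fake_rq {
    pthread_mutex_t lock;
    bool online;
    int queued;
    int wakeups;
};

static void fake_rq_init(struct fake_rq *rq)
{
    int rc = pthread_mutex_init(&rq->lock, NULL);
    assert(rc == 0);
    (void)rc;
    rq->online = true;
    rq->queued = 0;
    rq->wakeups = 0;
}

/* Submitter: check "online" under the lock, queue, and do the wakeup before
 * dropping the lock, so it cannot land after the CPU_DEAD path has run. */
static bool queue_request(struct fake_rq *rq)
{
    bool ok;

    pthread_mutex_lock(&rq->lock);
    ok = rq->online;
    if (ok) {
        rq->queued++;
        rq->wakeups++;   /* stand-in for wake_up_process() under the lock */
    }
    pthread_mutex_unlock(&rq->lock);
    return ok;
}

/* CPU_DEAD path: flush the queue under the same lock. Anyone who takes the
 * lock afterwards must observe !online and will not queue anything. */
static int cpu_dead_flush(struct fake_rq *rq)
{
    int flushed;

    pthread_mutex_lock(&rq->lock);
    rq->online = false;   /* simplification: offline no later than the flush */
    flushed = rq->queued;
    rq->queued = 0;
    pthread_mutex_unlock(&rq->lock);
    return flushed;
}
```

Because both the check and the flush run under the same lock, a request either gets queued before the flush (and is flushed) or sees the CPU offline; missing a CPU_UP just means a spurious refusal.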
Oleg.