Impressive work! a couple of random newbie's questions...
Also superfluously declared in rcuclassic.h and in rcupreempt.h
unused?
Just curious, any reason why rcu_flipctr can't live in rcu_data ? Similarly,
rcu_try_flip_state can be a member of rcu_ctrlblk, it is even protected by
rcu_ctrlblk.fliplock
Isn't DEFINE_PER_CPU_SHARED_ALIGNED better for rcu_flip_flag and rcu_mb_flag?
Could you please tell more, why do we need this cli?
It can't "protect" rcu_ctrlblk.completed, and the only change which affects
the state machine is rcu_flipctr[idx]++, so I can't understand the "half-done"
above. (of course, we must disable preemption while changing rcu_flipctr).
Hmm... this was already discussed... from another message:
> Critical portions of the GP protection happen in the scheduler-clock
> interrupt, which is a hardirq. For example, the .completed counter
> is always incremented in hardirq context, and we cannot tolerate a
> .completed increment in this code. Allowing such an increment would
> defeat the counter-acknowledge state in the state machine.
Still can't understand, please help! .completed could be incremented by
another CPU, why rcu_check_callbacks() running on _this_ CPU is so special?
Isn't it "obvious" that this barrier is not needed? rcu_flipctr could be
change only by this CPU, regardless of the actual value of idx, we can't
read the "wrong" value of rcu_flipctr[idx], no?
OTOH, I can't understand how it works. With ot without local_irq_save(),
another CPU can do rcu_try_flip_idle() and increment rcu_ctrlblk.completed
at any time, we can see the old value... rcu_try_flip_waitzero() can miss us?
OK, GP_STAGES > 1, so rcu_try_flip_waitzero() will actually check both
0 and 1 lastidx's before synchronize_rcu() succeeds... I doubt very much
my understanding is correct. Apart from this why GP_STAGES > 1 ???
Well, I think this code is just too tricky for me! Will try to read again
after sleep ;)
Why? Any code sequence which relies on that?
For example, rcu_check_callbacks does
if (rcu_ctrlblk.completed == rdp->completed)
rcu_try_flip();
it is possible that the timer interrupt calls rcu_check_callbacks()
exactly because rcu_pending() sees rcu_flipped, but without rmb() in
between we can see the old value of rcu_ctrlblk.completed.
This is not a problem because rcu_try_flip() needs rcu_ctrlblk.fliplock,
so rcu_try_flip() will see the new state != rcu_try_flip_idle_state, but
I can't understand any mb() in rcu_try_flip_xxx().
Do we really need this fastpath? It is not needed for the correctness,
and this case is very unlikely (in fact I think it is not possible:
rcu_check_callbacks() (triggers RCU_SOFTIRQ) is called with irqs disabled).
This looks a bit strange. Is this optimization? Why not
spin_lock_irqsave(&rdp->lock, flags);
rdp = RCU_DATA_ME();
...
? RCU_DATA_ME() is cheap, but spin_lock() under local_irq_save() spins
without preemption.
Actually, why do we need rcu_data->lock ? Looks like local_irq_save()
should be enough, no? perhaps some -rt reasons ?
Any reason this func can't do rcu_check_mb() as well?
If this is possible, can't we move the code doing "s/rcu_flipped/rcu_flip_seen/"
from __rcu_advance_callbacks() to rcu_check_mb() to unify the "acks" ?
This "schedule()" is not needed, any time sched_setaffinity() returns on another
CPU we already forced preemption of the previously active task on that CPU.
Well, this is not correct... but doesn't matter because of the next patch.
But could you explain how it can deadlock (according to the changelog of
the next patch) ?
Thanks!
Oleg.
-