I'm not sure if a bitmap is the right storage. If I understand the code
correctly, it contains two information:
1) If the bitmap is clear, then all cpus have completed whatever they
need to do.
A counter is more efficient than a bitmap. Especially: It would allow to
choose the optimal fan-out, independent from 32/64 bits.
2) The information if the current cpu must do something to complete the
current period.non
This is a local information, usually (always?) only the current cpu
needs to know if it must do something.
But this doesn't need to be stored in a shared structure, the
information could be stored in a per-cpu structure.
Do you have a description of the events between call_rcu() and the rcu
callback?
Is the following description correct?
__call_rcu() queues in RCU_NEXT_TAIL.
In the middle of the current grace period: rcu_check_quiescent_state()
calls rcu_next_callbacks_are_ready().
Entry now in RCU_NEXT_READY_TAIL
** 0.5 cycles: wait until all cpus have completed the current cycle.
rcu_process_gp_end() moves from NEXT_READY_TAIL to WAIT_TAIL
** full grace period
rcu_process_gp_end() moves from WAIT_TAIL to DONE_TAIL
rcu_do_batch() finds the entries in DONE_TAIL and calls the callback.
Why this mb()? There was a grace period between the last read side
critical section that might have accessed the pointer.
The rcu internal code already does spin_lock()+spin_unlock(). Isn't that
sufficient?
Have you considered merging RCU_DONE_TAIL for rcu_data and rcu_bh_data?
The docbook entry is duplicated: They are in include/linux/rcupdate.h
and here.
What about removing one of them?
I would go one step further:
Even add call_rcu_sched() into rcupdate.h. Add a Kconfig bool
"RCU_NEEDS_SCHED" and automatically define either the extern or the #define.
--
Manfred
--