I've attached an updated patch [now without the initial "From" line.
Either Thunderbird or Dovecot cannot handle that; sorry for the noise
caused by posting everything three times].
Btw, does STP still exist? I'd like to do some testing on real SMP
hardware. http://stp.testing.osdl.org/ appears to be dead.
No, no overlapping at all. But it shouldn't be slower than mainline:
Mainline has two grace periods between call_rcu() and the rcu callback.
My approach means one call and one grace period.
Your code might be a bit faster: if I understand it correctly,
call_rcu() reads rdp->batch and includes everything in the next grace
period.
global state now DESTROY_AND_COLLECT.
DESTROY_AND_COLLECT done for cpu 1. Btw, this operation does not
require a quiescent state.
>>> ok - here is call_rcu(). element in rcs->new.
CPU 2 notices DESTROY_AND_COLLECT. Moves all elements from rcs->new to
rcs->old.
someone notices that DESTROY_AND_COLLECT is completed, moves global
state to GRACE.
No - that's impossible. The grace period starts when the global
state is set to GRACE; all cpus must pass a quiescent state while in GRACE.
What is still missing is:
- all cpus must pass a quiescent state.
- the last cpu moves the global state to DESTROY
- cpu 2 notices that the global state is DESTROY. It moves the elements
from rcs->old to rcd->dead and the softirq will destroy them.
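To make the sequence above easier to follow, here is a minimal single-threaded simulation of the global states, written under several assumptions: the per-cpu lists are plain singly linked lists, a counter of "open" cpus decides when the last cpu advances the state, and after DESTROY the machine returns to DESTROY_AND_COLLECT. All names (`struct cpu_lists`, `call_rcu_sim()`, `cpu_step()`) are hypothetical and not taken from the patch; in GRACE, a call to `cpu_step()` stands for "this cpu passed a quiescent state".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation; NR_CPUS and every name below are illustrative,
 * not taken from the patch. Single-threaded: the real code needs atomics. */
#define NR_CPUS 4

enum rcu_state { DESTROY_AND_COLLECT, GRACE, DESTROY };

struct cb {
    struct cb *next;
};

struct cpu_lists {
    struct cb *new_list;   /* filled by call_rcu(), not yet collected */
    struct cb *old_list;   /* collected, waiting for the grace period */
    struct cb *dead_list;  /* grace period over, safe to destroy */
    bool done;             /* already acted in the current state? */
};

static enum rcu_state global_state = DESTROY_AND_COLLECT;
static int cpus_open = NR_CPUS;    /* cpus that still have to act */
static struct cpu_lists cpus[NR_CPUS];

static void call_rcu_sim(int cpu, struct cb *c)
{
    c->next = cpus[cpu].new_list;  /* always queue on the new list */
    cpus[cpu].new_list = c;
}

/* Move every element from *from to *to (order is not preserved). */
static void move_all(struct cb **from, struct cb **to)
{
    while (*from) {
        struct cb *c = *from;
        *from = c->next;
        c->next = *to;
        *to = c;
    }
}

static void advance(enum rcu_state next)
{
    global_state = next;
    cpus_open = NR_CPUS;
    for (int i = 0; i < NR_CPUS; i++)
        cpus[i].done = false;
}

/* Called when `cpu` notices the global state. In GRACE it stands for
 * "this cpu passed a quiescent state". The last cpu advances the state. */
static void cpu_step(int cpu)
{
    struct cpu_lists *l = &cpus[cpu];

    if (l->done)
        return;
    l->done = true;

    switch (global_state) {
    case DESTROY_AND_COLLECT:
        /* no quiescent state needed: destroy leftovers, collect new */
        move_all(&l->old_list, &l->dead_list);
        move_all(&l->new_list, &l->old_list);
        break;
    case GRACE:
        break;                     /* only the quiescent state matters */
    case DESTROY:
        move_all(&l->old_list, &l->dead_list);
        break;
    }

    assert(cpus_open > 0);
    if (--cpus_open == 0)
        advance(global_state == DESTROY_AND_COLLECT ? GRACE :
                global_state == GRACE ? DESTROY : DESTROY_AND_COLLECT);
}
```

In this model an element queued while the state is GRACE stays on the new list, so it is not destroyed at the following DESTROY step; it is only collected in the next round.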
Oh - I forgot to list one point in the patch summary:
I've merged the list of dead pointers for the _bh and the _normal lists.
rcu_do_batch() operates on a unified list.
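A rough sketch of what a unified dead list could look like, assuming both the normal and the _bh state machines append finished callbacks to the same per-cpu FIFO and a single batch function drains it. `struct rcu_head_sim`, `dead_list_add()` and `do_batch_sim()` are hypothetical names for illustration only, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of one dead list shared by the normal and the _bh
 * callbacks. All names are illustrative, not the patch's. */
struct rcu_head_sim {
    struct rcu_head_sim *next;
    void (*func)(struct rcu_head_sim *head);
};

struct dead_list {
    struct rcu_head_sim *head;
    struct rcu_head_sim **tail;    /* where the next element is linked */
};

static void dead_list_init(struct dead_list *d)
{
    d->head = NULL;
    d->tail = &d->head;
}

/* Both state machines (normal and _bh) append here; FIFO order is kept. */
static void dead_list_add(struct dead_list *d, struct rcu_head_sim *h)
{
    assert(h != NULL);
    h->next = NULL;
    *d->tail = h;
    d->tail = &h->next;
}

/* One batch function for the unified list; returns callbacks invoked. */
static int do_batch_sim(struct dead_list *d)
{
    struct rcu_head_sim *h = d->head;
    int n = 0;

    dead_list_init(d);             /* detach the whole list first */
    while (h) {
        struct rcu_head_sim *next = h->next;  /* func may free h */
        h->func(h);
        h = next;
        n++;
    }
    return n;
}

/* Example callback for the usage below: just counts invocations. */
static int callbacks_run;
static void count_cb(struct rcu_head_sim *h)
{
    (void)h;
    callbacks_run++;
}
```

Reading `next` before invoking the callback matters because a real callback typically frees the structure that contains the list head.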
My approach is similar: first all cpus collect the pointers. Then the
grace period starts. When all cpus have finished, the pointers are
destroyed. New call_rcu() calls during the grace period are queued.
I think it was a NACK on sparc, because sparc used a spinlock inside
atomic_t. I assume it's ok today.
If it's not ok, then I would have to find another solution. I'll wait
for complaints.
The #define has a bad name: above that limit I would use a hierarchy
instead of the flat rcu_cpumask. The hierarchy is not yet implemented.
Here the miracle occurs: "bla bla bla" is replaced by an rcu_cpumask
structure with (probably) an array of atomic_t's instead of the simple
"int cpus_open".
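One possible shape for such a structure, sketched with C11 atomics as a stand-in for the kernel's atomic_t (where atomic_dec_and_test() would play the role of the "previous value was 1" check): each group of cpus has its own counter, and the last cpu of a group decrements a top-level counter. The two-level layout, `CPUS_PER_GROUP`, `NR_GROUPS` and all function names are assumptions for illustration; the actual rcu_cpumask may look quite different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-level "cpumask"; CPUS_PER_GROUP, NR_GROUPS and all
 * names are illustrative, not values or identifiers from the patch. */
#define CPUS_PER_GROUP 4
#define NR_GROUPS 3

struct cpumask_sim {
    atomic_int group_open[NR_GROUPS];  /* open cpus per group */
    atomic_int groups_open;            /* groups with an open cpu */
};

/* Must run before any cpu calls cpumask_cpu_done(). */
static void cpumask_init(struct cpumask_sim *m)
{
    for (int g = 0; g < NR_GROUPS; g++)
        atomic_init(&m->group_open[g], CPUS_PER_GROUP);
    atomic_init(&m->groups_open, NR_GROUPS);
}

/* Returns true for exactly one caller: the cpu that closes the mask and
 * therefore has to advance the global state. */
static bool cpumask_cpu_done(struct cpumask_sim *m, int cpu)
{
    int g = cpu / CPUS_PER_GROUP;

    assert(cpu >= 0 && g < NR_GROUPS);
    /* atomic_fetch_sub returns the value before the decrement */
    if (atomic_fetch_sub(&m->group_open[g], 1) != 1)
        return false;              /* other cpus in this group still open */
    return atomic_fetch_sub(&m->groups_open, 1) == 1;
}
```

The point of the hierarchy is contention: with a single "int cpus_open" every cpu hits the same cache line, while here only the last cpu of each group touches the shared top-level counter.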
My code does the same thing: when "0", the cpu is ignored by the state
machine and is assumed to be outside any read-side critical section.
When switching from "1" to "0", the outstanding work for the current
state is performed.
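A minimal single-threaded model of that per-cpu flag, assuming the state machine only waits for cpus whose flag is "1", and that a cpu switching from "0" back to "1" owes nothing for the state that is already in progress (an assumption of this sketch, not something stated above). `cpu_active[]`, `cpu_go_idle()`, `cpu_wake()` and `state_gen` are hypothetical names; real code would need atomics or locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; NR_CPUS and all names are
 * illustrative. cpu_active[] is the "1"/"0" flag described above. */
#define NR_CPUS 4

static bool cpu_active[NR_CPUS];    /* 0: ignored by the state machine */
static bool cpu_reported[NR_CPUS];  /* did its work for this state */
static int state_gen;               /* bumped on each state transition */

static void maybe_advance(void)
{
    for (int i = 0; i < NR_CPUS; i++)
        if (cpu_active[i] && !cpu_reported[i])
            return;                 /* an active cpu still owes work */
    state_gen++;
    for (int i = 0; i < NR_CPUS; i++)
        cpu_reported[i] = false;
}

/* An active cpu has done the work for the current state. */
static void cpu_report(int cpu)
{
    assert(cpu_active[cpu]);
    cpu_reported[cpu] = true;
    maybe_advance();
}

/* Switching 1 -> 0: finish the outstanding work first, then drop out. */
static void cpu_go_idle(int cpu)
{
    if (!cpu_reported[cpu])
        cpu_reported[cpu] = true;   /* outstanding work would run here */
    cpu_active[cpu] = false;
    maybe_advance();
}

/* Switching 0 -> 1: assumed to owe nothing for the current state. */
static void cpu_wake(int cpu)
{
    cpu_active[cpu] = true;
    cpu_reported[cpu] = true;
}
```

With this model an idle cpu never delays a state transition, and the cpu that goes idle is the one that completes the transition if it was the last one outstanding.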
Thanks for the detailed review!
Attached is an updated patch; NO_HZ and NMI support is now implemented.
--
Manfred