Even though in user mode or idle mode, rcu_check_callbacks() is not
context switch, so we don't call rcu_preempt_note_context_switch()
in rcu_check_callbacks().
Though there is no harm that calls rcu_preempt_note_context_switch()
in rcu_check_callbacks(), but it is waste.
rcu_check_callbacks()
rcu_sched_qs()
rcu_preempt_note_context_switch()
Now, ->rcu_read_lock_nesting == 0, so we just calls
rcu_preempt_qs(), but, rcu_preempt_check_callbacks()
will call it again and set the ->rcu_read_unlock_special
correct again.
So let rcu_preempt_check_callbacks() handle things for us.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3ec8160..c7847ba 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -95,7 +95,7 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
* how many quiescent states passed, just if there was at least
* one since the start of the grace period, this just sets a flag.
*/
-void rcu_sched_qs(int cpu)
+static void __rcu_sched_qs(int cpu)
{
struct rcu_data *rdp;
@@ -103,6 +103,11 @@ void rcu_sched_qs(int cpu)
rdp->passed_quiesc_completed = rdp->gpnum - 1;
barrier();
rdp->passed_quiesc = 1;
+}
+
+void rcu_sched_qs(int cpu)
+{
+ __rcu_sched_qs(cpu);
rcu_preempt_note_context_switch(cpu);
}
@@ -1138,12 +1143,12 @@ void rcu_check_callbacks(int cpu, int user)
* a quiescent state, so note it.
*
* No memory barrier is required here because both
- * rcu_sched_qs() and rcu_bh_qs() reference only CPU-local
+ * __rcu_sched_qs() and rcu_bh_qs() reference only CPU-local
* variables that other CPUs neither access nor modify,
* at least not while the corresponding CPU is online.
*/
- rcu_sched_qs(cpu);
+ __rcu_sched_qs(cpu);
rcu_bh_qs(cpu);
} else if (!in_softirq()) {
--
Nice!!! But how about naming the new function that invokes rcu_preempt_note_context_switch() something like rcu_sched_note_context_switch(), and then leaving the name of rcu_sched_qs() the same (rather than changing it to __rcu_sched_qs(), as below)? This way, the names clearly call out what the function is doing. Or did I miss the point here? --
If I understand right, it will become this:
schedule() / run_ksoftirqd() / rcu_needs_cpu()
rcu_sched_note_context_switch()
rcu_sched_qs()
rcu_preempt_note_context_switch()
Right?
--
Wow!!! That was a scare!!! I misread "run_ksoftirqd()" as "do_softirq(). ;-) And I am not seeing a call to rcu_sched_qs() in rcu_needs_cpu()... Here is how I believe it needs to go: schedule(): rcu_sched_note_context_switch() rcu_sched_qs() rcu_preempt_note_context_switch() run_ksoftirqd(): rcu_sched_qs() rcu_check_callbacks(): rcu_sched_qs() [if idle etc.] rcu_bh_qs() [if not in softirq] The reason we don't need rcu_bh_qs() from run_ksoftirqd() is that __do_softirq() already calls rcu_bh_qs(). Make sense, or am I missing something? Thanx, Paul --
And I was in fact missing something. The rcu_preempt_note_context_switch() function currently combines some work that needs to happen only at context-switch time with work that needs to happen all the time. At first glance, it appears that the big "if" statement in rcu_preempt_note_context_switch() need only happen for context switches. The remaining lines must happen unconditionally for context switches, and should be executed from rcu_check_callbacks() only if the current CPU is not in an RCU read-side critical section. Thanx, Paul --
I think rcu_preempt_check_callbacks() will do this work better in rcu_check_callbacks(). Thanks, Lai --
Possibly by moving the clearing of RCU_READ_UNLOCK_NEED_QS to rcu_preempt_check_callbacks() -- or to rcu_preempt_qs(). The latter is in some sense cleaner, but higher overhead and probably unnecessary. Hmmm... Alternatively, require that all callers to rcu_preempt_qs() disable irqs. This affects only one callsite, which has a local_irq_disable() immediately following anyway. ;-) Thanx, Paul --
current rcu_preempt_check_callbacks() already has code to --
English is failing me. How about the following patch instead? Untested, probably does not even compile. Thanx, Paul ------------------------------------------------------------------------ From 9a1687fcd572ef34ac394a336d6ea79974e1e7e8 Mon Sep 17 00:00:00 2001 From: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Date: Thu, 1 Apr 2010 17:37:01 -0700 Subject: [PATCH tip/core/rcu 1/1] rcu: refactor RCU's context-switch handling The addition of preemptible RCU to treercu resulted in a bit of confusion and inefficiency surrounding the handling of context switches for RCU-sched and for RCU-preempt. For RCU-sched, a context switch is a quiescent state, pure and simple, just like it always has been. For RCU-preempt, a context switch is in no way a quiescent state, but special handling is required when a task blocks in an RCU read-side critical section. However, the callout from the scheduler and the outer loop in ksoftirqd still calls something named rcu_sched_qs(), whose name is no longer accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched quiescent state, it ends up unnecessarily (though harmlessly, aside from the performance hit) enqueuing the current task if it happens to be running in an RCU-preempt read-side critical section. This not only increases the maximum latency of scheduler_tick(), it also needlessly increases the overhead of the next outermost rcu_read_unlock() invocation. This patch addresses this situation by separating the notion of RCU's context-switch handling from that of RCU-sched's quiescent states. The context-switch handling is covered by rcu_note_context_switch() in general and by rcu_preempt_note_context_switch() for preemptible RCU. This permits rcu_sched_qs() to handle quiescent states and only quiescent states. It also reduces the maximum latency of scheduler_tick(), though probably by much less than a microsecond. Finally, it means that tasks within preemptible-RCU read-side critical sections avoid incurring ...
Very nice, thank you. Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> --
Thank you! It passed overnight testing, so I have queued this for 2.6.35 with your Acked-by. Thanx, Paul --
