Re: [PATCH] rcu: don't call rcu_preempt_note_context_switch() in rcu_check_callbacks()

Previous thread: [PATCH 08/31] lmb: Add find_lmb_area_size() by Yinghai Lu on Sunday, March 28, 2010 - 7:43 pm. (1 message)

Next thread: [PATCH 07/31] lmb: Add reserve_lmb/free_lmb by Yinghai Lu on Sunday, March 28, 2010 - 7:43 pm. (21 messages)
From: Lai Jiangshan
Date: Sunday, March 28, 2010 - 7:47 pm

Even though in user mode or idle mode, rcu_check_callbacks() is not
context switch, so we don't call rcu_preempt_note_context_switch()
in rcu_check_callbacks().

Though there is no harm that calls rcu_preempt_note_context_switch()
in rcu_check_callbacks(), but it is waste.

rcu_check_callbacks()
  rcu_sched_qs()
    rcu_preempt_note_context_switch()
       Now, ->rcu_read_lock_nesting == 0, so we just calls
       rcu_preempt_qs(), but, rcu_preempt_check_callbacks()
       will call it again and set the ->rcu_read_unlock_special
       correct again.

So let rcu_preempt_check_callbacks() handle things for us.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3ec8160..c7847ba 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -95,7 +95,7 @@ static int rcu_gp_in_progress(struct rcu_state *rsp)
  * how many quiescent states passed, just if there was at least
  * one since the start of the grace period, this just sets a flag.
  */
-void rcu_sched_qs(int cpu)
+static void __rcu_sched_qs(int cpu)
 {
 	struct rcu_data *rdp;
 
@@ -103,6 +103,11 @@ void rcu_sched_qs(int cpu)
 	rdp->passed_quiesc_completed = rdp->gpnum - 1;
 	barrier();
 	rdp->passed_quiesc = 1;
+}
+
+void rcu_sched_qs(int cpu)
+{
+	__rcu_sched_qs(cpu);
 	rcu_preempt_note_context_switch(cpu);
 }
 
@@ -1138,12 +1143,12 @@ void rcu_check_callbacks(int cpu, int user)
 		 * a quiescent state, so note it.
 		 *
 		 * No memory barrier is required here because both
-		 * rcu_sched_qs() and rcu_bh_qs() reference only CPU-local
+		 * __rcu_sched_qs() and rcu_bh_qs() reference only CPU-local
 		 * variables that other CPUs neither access nor modify,
 		 * at least not while the corresponding CPU is online.
 		 */
 
-		rcu_sched_qs(cpu);
+		__rcu_sched_qs(cpu);
 		rcu_bh_qs(cpu);
 
 	} else if (!in_softirq()) {

--

From: Paul E. McKenney
Date: Sunday, March 28, 2010 - 9:42 pm

Nice!!!

But how about naming the new function that invokes
rcu_preempt_note_context_switch() something like
rcu_sched_note_context_switch(), and then leaving the
name of rcu_sched_qs() the same (rather than changing
it to __rcu_sched_qs(), as below)?

This way, the names clearly call out what the function
is doing.

Or did I miss the point here?

--

From: Lai Jiangshan
Date: Tuesday, March 30, 2010 - 2:43 am

If I understand right, it will become this:

schedule() / run_ksoftirqd() / rcu_needs_cpu()
  rcu_sched_note_context_switch()
    rcu_sched_qs()
    rcu_preempt_note_context_switch()

Right?


--

From: Paul E. McKenney
Date: Tuesday, March 30, 2010 - 9:03 am

Wow!!!  That was a scare!!!  I misread "run_ksoftirqd()" as
"do_softirq().  ;-)

And I am not seeing a call to rcu_sched_qs() in rcu_needs_cpu()...

Here is how I believe it needs to go:

	schedule():
		rcu_sched_note_context_switch()
			rcu_sched_qs()
			rcu_preempt_note_context_switch()

	run_ksoftirqd():
		rcu_sched_qs()

	rcu_check_callbacks():
		rcu_sched_qs() [if idle etc.]
		rcu_bh_qs() [if not in softirq]

The reason we don't need rcu_bh_qs() from run_ksoftirqd() is that
__do_softirq() already calls rcu_bh_qs().

Make sense, or am I missing something?

							Thanx, Paul
--

From: Paul E. McKenney
Date: Wednesday, March 31, 2010 - 8:36 am

And I was in fact missing something.  The rcu_preempt_note_context_switch()
function currently combines some work that needs to happen only at
context-switch time with work that needs to happen all the time.

At first glance, it appears that the big "if" statement in
rcu_preempt_note_context_switch() need only happen for context switches.
The remaining lines must happen unconditionally for context switches,
and should be executed from rcu_check_callbacks() only if the current
CPU is not in an RCU read-side critical section.

							Thanx, Paul
--

From: Lai Jiangshan
Date: Wednesday, March 31, 2010 - 5:56 pm

I think rcu_preempt_check_callbacks() will do this work better
in rcu_check_callbacks().

Thanks, Lai
--

From: Paul E. McKenney
Date: Wednesday, March 31, 2010 - 6:17 pm

Possibly by moving the clearing of RCU_READ_UNLOCK_NEED_QS to
rcu_preempt_check_callbacks() -- or to rcu_preempt_qs().  The latter is in
some sense cleaner, but higher overhead and probably unnecessary.  Hmmm...
Alternatively, require that all callers to rcu_preempt_qs() disable
irqs.  This affects only one callsite, which has a local_irq_disable()
immediately following anyway.  ;-)

							Thanx, Paul
--

From: Lai Jiangshan
Date: Thursday, April 1, 2010 - 12:24 am

current rcu_preempt_check_callbacks() already has code to

--

From: Paul E. McKenney
Date: Thursday, April 1, 2010 - 5:53 pm

English is failing me.  How about the following patch instead?  Untested,
probably does not even compile.

							Thanx, Paul

------------------------------------------------------------------------

From 9a1687fcd572ef34ac394a336d6ea79974e1e7e8 Mon Sep 17 00:00:00 2001
From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Thu, 1 Apr 2010 17:37:01 -0700
Subject: [PATCH tip/core/rcu 1/1] rcu: refactor RCU's context-switch handling

The addition of preemptible RCU to treercu resulted in a bit of
confusion and inefficiency surrounding the handling of context switches
for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
is a quiescent state, pure and simple, just like it always has been.
For RCU-preempt, a context switch is in no way a quiescent state, but
special handling is required when a task blocks in an RCU read-side
critical section.

However, the callout from the scheduler and the outer loop in ksoftirqd
still calls something named rcu_sched_qs(), whose name is no longer
accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
quiescent state, it ends up unnecessarily (though harmlessly, aside
from the performance hit) enqueuing the current task if it happens to
be running in an RCU-preempt read-side critical section.  This not only
increases the maximum latency of scheduler_tick(), it also needlessly
increases the overhead of the next outermost rcu_read_unlock() invocation.

This patch addresses this situation by separating the notion of RCU's
context-switch handling from that of RCU-sched's quiescent states.
The context-switch handling is covered by rcu_note_context_switch() in
general and by rcu_preempt_note_context_switch() for preemptible RCU.
This permits rcu_sched_qs() to handle quiescent states and only quiescent
states.  It also reduces the maximum latency of scheduler_tick(), though
probably by much less than a microsecond.  Finally, it means that tasks
within preemptible-RCU read-side critical sections avoid incurring ...
From: Lai Jiangshan
Date: Friday, April 2, 2010 - 5:27 am

Very nice, thank you.

Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>

--

From: Paul E. McKenney
Date: Friday, April 2, 2010 - 8:25 am

Thank you!  It passed overnight testing, so I have queued this for 2.6.35
with your Acked-by.

							Thanx, Paul
--

Previous thread: