Re: [PATCH tip/core/rcu 09/10] rcu: update obsolete rcu_read_lock() comment.

Previous thread: [GIT PULL] workqueue: fixes for v2.6.36 by Tejun Heo on Monday, August 9, 2010 - 2:54 pm. (1 message)

Next thread: [GIT PULL] please pull infiniband.git by Roland Dreier on Monday, August 9, 2010 - 3:44 pm. (1 message)
From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:14 pm

Hello!

This patchset shows additional patches queued for 2.6.37, over and above
those posted at http://lkml.org/lkml/2010/7/14/334.  These are all minor
fixes, with the exception of patch #8, which adds TINY_PREEMPT_RCU.
The patches are as follows:

1.	Remove the rcu_head initialization macros (from Mathieu Desnoyers).
	This patch can move forward now that all uses of these macros
	have been removed from mainline.
2.	Update documentation to note the demise of the rcu_head
	initialization macros.
3.	Fix kernel-locking.tmpl docbook documentation, which was still
	using the now-ancient three-argument version of call_rcu().
4.	Allow RCU's CPU stall-warning messages to be controlled via sysfs.
5.	Now that TINY_RCU has been in-tree for a few releases, adjust
	the configuration so that TINY_RCU is mandatory for kernels
	built with !SMP and !PREEMPT.  Once TINY_PREEMPT_RCU has gained
	a similar level of experience, !SMP code will be eliminated
	from TREE_RCU.
6.	Allow kernels to be built such that RCU CPU stall warnings are
	suppressed at boot time.  Patch #4 above allows them to be
	manually re-enabled once the system has booted.
7.	Updates the RCU_FANOUT message to take commit cf244dc01bf68 into
	account.  This commit added a fourth level to TREE_RCU.
8.	Add TINY_PREEMPT_RCU, allowing reduced memory footprint for
	UP builds of preemptible RCU.  This is a cleaned-up version of
	the patch posted at http://lkml.org/lkml/2010/7/21/364.
9.	The "It is illegal to block while in an RCU read-side critical
	section" docbook comment was obsoleted long ago by preemptible
	RCU, so this patch brings it up to the present day.
10.	Add comments above the RCU CPU stall-warning printk()s pointing
	people at the Documentation/RCU/stallwarn.txt documentation.

For a testing-only version of this patchset from git, please see:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/testing

							Thanx, Paul

 Documentation/DocBook/kernel-locking.tmpl   |    6 
 ...
From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Commit cf244dc01bf68 added a fourth level to the TREE_RCU hierarchy,
but the RCU_FANOUT help message still said "cube root".  This commit
fixes this to "fourth root" and also emphasizes that production
systems are well-served by the default.  (Stress-testing RCU itself
uses small RCU_FANOUT values in order to test large-system code paths
on small(er) systems.)

Located-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 init/Kconfig |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9366954..7faa5d8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -384,9 +384,12 @@ config RCU_FANOUT
 	help
 	  This option controls the fanout of hierarchical implementations
 	  of RCU, allowing RCU to work efficiently on machines with
-	  large numbers of CPUs.  This value must be at least the cube
-	  root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
-	  systems and up to 262,144 for 64-bit systems.
+	  large numbers of CPUs.  This value must be at least the fourth
+	  root of NR_CPUS, which allows NR_CPUS to be insanely large.
+	  The default value of RCU_FANOUT should be used for production
+	  systems, but if you are stress-testing the RCU implementation
+	  itself, small RCU_FANOUT values allow you to test large-system
+	  code paths on small(er) systems.
 
 	  Select a specific number if testing RCU itself.
 	  Take the default if unsure.
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Currently, if RCU CPU stall warnings are enabled, they are enabled
immediately upon boot.  They can be manually disabled via /sys (and
also re-enabled via /sys), and are automatically disabled upon panic.
However, some users need RCU CPU stalls to be disabled at boot time,
but to be enabled without rebuilding/rebooting.  For example, someone
running a real-time application in production might not want the
additional latency of RCU CPU stall detection in normal operation, but
might need to enable it at any point for fault isolation purposes.

This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel configuration parameter that maintains the current behavior
(enable at boot) by default, but allows a kernel to be configured
with RCU CPU stall detection built into the kernel, but disabled at
boot time.

Requested-by: Clark Williams <williams@redhat.com>
Requested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c  |    2 +-
 kernel/rcutree.h  |    6 ++++++
 lib/Kconfig.debug |   13 +++++++++++++
 3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5d910be..5aab7da 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -144,7 +144,7 @@ module_param(qhimark, int, 0);
 module_param(qlowmark, int, 0);
 
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
-int rcu_cpu_stall_suppress __read_mostly;
+int rcu_cpu_stall_suppress __read_mostly = RCU_CPU_STALL_SUPPRESS_INIT;
 module_param(rcu_cpu_stall_suppress, int, 0644);
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 183ebf4..bb4d086 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -264,6 +264,12 @@ struct rcu_data {
 						/*  scheduling clock irq */
 						/*  before ratting on them. */
 
+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
+#define RCU_CPU_STALL_SUPPRESS_INIT 0
+#else
+#define RCU_CPU_STALL_SUPPRESS_INIT ...
From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Set the permissions of the rcu_cpu_stall_suppress to 644 to enable RCU
CPU stall warnings to be enabled and disabled at runtime via sysfs.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f3d5906..5d910be 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -145,7 +145,7 @@ module_param(qlowmark, int, 0);
 
 #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
 int rcu_cpu_stall_suppress __read_mostly;
-module_param(rcu_cpu_stall_suppress, int, 0);
+module_param(rcu_cpu_stall_suppress, int, 0644);
 #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
 
 static void force_quiescent_state(struct rcu_state *rsp, int relaxed);
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU.  This commit provides a fix.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 24b8966..d7af96e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
  * will be deferred until the outermost RCU read-side critical section
  * completes.
  *
- * It is illegal to block while in an RCU read-side critical section.
+ * You can avoid reading and understanding the next paragraph by
+ * following this rule: don't put anything in an rcu_read_lock() RCU
+ * read-side critical section that would block in a !PREEMPT kernel.
+ * But if you want the full story, read on!
+ *
+ * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
+ * is illegal to block while in an RCU read-side critical section.  In
+ * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
+ * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
+ * be preempted, but explicit blocking is illegal.  Finally, in preemptible
+ * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
+ * RCU read-side critical sections may be preempted and they may also
+ * block, but only when acquiring spinlocks that are subject to priority
+ * inheritance.
  */
 static inline void rcu_read_lock(void)
 {
-- 
1.7.0.6

--

From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 7:45 am

It might be good to add a note about locking chain dependency that is
created in the RT case, e.g., the lock we are sharing with another
context in preempt RT is subject to the same rules as the RCU C.S.. It
should never call synchronize_rcu(); this would cause a RCU+lock-induced
deadlock.

I must admit, however, that because calling synchronize_rcu() from
spinlocks is already forbidden, this is already implied.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 16, 2010 - 10:55 am

Thank you for looking this over!

I am updating the srcu_read_lock() docbook comments to call out the
potential for this problem, given that SRCU read-side critical sections
can acquire mutexes, which can be held across both synchronize_srcu()
and synchronize_srcu_expedited().

Seem reasonable?

							Thanx, Paul

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 6f456a7..58971e8 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -139,7 +139,12 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
  * @sp: srcu_struct in which to register the new reader.
  *
  * Enter an SRCU read-side critical section.  Note that SRCU read-side
- * critical sections may be nested.
+ * critical sections may be nested.  However, it is illegal to
+ * call anything that waits on an SRCU grace period for the same
+ * srcu_struct, whether directly or indirectly.  Please note that
+ * one way to indirectly wait on an SRCU grace period is to acquire
+ * a mutex that is held elsewhere while calling synchronize_srcu() or
+ * synchronize_srcu_expedited().
  */
 static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
--

From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 11:24 am

Yep, thanks!


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

There is some documentation on RCU CPU stall warnings contained in
Documentation/RCU/stallwarn.txt, but it will not be apparent to someone
who runs into such a warning while under time pressure.  This commit
therefore adds comments preceding the printk()s pointing out the
location of this documentation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5aab7da..ff21411 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -487,8 +487,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	rcu_print_task_stall(rnp);
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
-	/* OK, time to rat on our buddy... */
-
+	/*
+	 * OK, time to rat on our buddy...
+	 * See Documentation/RCU/stallwarn.txt for info on how to debug
+	 * RCU CPU stall warnings.
+	 */
 	printk(KERN_ERR "INFO: %s detected stalls on CPUs/tasks: {",
 	       rsp->name);
 	rcu_for_each_leaf_node(rsp, rnp) {
@@ -517,6 +520,11 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root(rsp);
 
+	/*
+	 * OK, time to rat on ourselves...
+	 * See Documentation/RCU/stallwarn.txt for info on how to debug
+	 * RCU CPU stall warnings.
+	 */
 	printk(KERN_ERR "INFO: %s detected stall on CPU %d (t=%lu jiffies)\n",
 	       rsp->name, smp_processor_id(), jiffies - rsp->gp_start);
 	trigger_all_cpu_backtrace();
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for
several releases, it is time to restrict the use of TREE_RCU to SMP
non-preemptible systems.  This reduces testing/validation effort.  This
commit is a first step towards driving the selection of RCU implementation
directly off of the SMP and PREEMPT configuration parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 init/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 5cff9a9..9366954 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -336,6 +336,7 @@ choice
 
 config TREE_RCU
 	bool "Tree-based hierarchical RCU"
+	depends on !PREEMPT && SMP
 	help
 	  This option selects the RCU implementation that is
 	  designed for very large SMP system with hundreds or
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU.  This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing.  This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.

The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.

Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/hardirq.h   |    2 +-
 include/linux/init_task.h |   10 +-
 include/linux/rcupdate.h  |    3 +-
 include/linux/rcutiny.h   |  126 +++++++---
 include/linux/rcutree.h   |    2 +
 include/linux/sched.h     |   10 +-
 init/Kconfig              |   16 ++-
 kernel/Makefile           |    1 +
 kernel/rcutiny.c          |   33 +--
 kernel/rcutiny_plugin.h   |  580 ++++++++++++++++++++++++++++++++++++++++++++-
 10 files changed, 715 insertions(+), 68 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1f4517d 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
 #endif
 
 #if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
 extern void rcu_enter_nohz(void);
 extern void rcu_exit_nohz(void);
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6460fc6..2fea6c8 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -82,11 +82,17 @@ extern struct group_info init_groups;
 # define CAP_INIT_BSET  ...
From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 8:07 am

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:

Hrm I think we discussed this in a past life, but would the following
sequence be possible and correct ?

CPU 0

read t->rcu_read_unlock_special
  interrupt comes in, preempts. sets t->rcu_read_unlock_special
  <preempted>
  <scheduled back>
  iret
decrement and read t->rcu_read_lock_nesting
test both old "special" value (which we have locally on the stack) and
detect that rcu_read_lock_nesting is 0.

We actually missed a reschedule.

I think we might need a barrier() between the t->rcu_read_lock_nesting
and t->rcu_read_unlock_special reads. We might need to audit
TREE PREEMPT RCU for the same kind of behavior.

But I might be (again ?) missing something. I've got the feeling you
already convinced me that this was OK for some reason, but I trip on
this every time I read the code.


The interaction with preemption is unclear here. exit.c disables
preemption around the call to exit_rcu(), but if, for some reason,
rcu_read_unlock_special was set earlier by preemption, then the
rcu_read_unlock() code might block and cause problems.

Maybe we should consider clearing rcu_read_unlock_special here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 16, 2010 - 11:33 am

You are correct -- I got too aggressive in eliminating synchronization.

Good catch!!!

I added an ACCESS_ONCE() to the second term of the "if" condition so
that it now reads:

	if (--t->rcu_read_lock_nesting == 0 &&
	    unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))

This prevents the compiler from reordering because the ACCESS_ONCE()
prohibits accessing t->rcu_read_unlock_special unless the value of

The version of __rcu_read_unlock() in kernel/rcutree_plugin.h is as
follows:

void __rcu_read_unlock(void)
{
	struct task_struct *t = current;

	barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
	if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
	    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
		rcu_read_unlock_special(t);
#ifdef CONFIG_PROVE_LOCKING
	WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
#endif /* #ifdef CONFIG_PROVE_LOCKING */
}

The ACCESS_ONCE() calls should cover this.  I believe that the first
ACCESS_ONCE() is redundant, and have checking this more closely on my

But rcu_read_unlock_special() does not block.  In fact, it disables
interrupts over almost all of its execution.  Or am I missing some

If the task blocked in an RCU read-side critical section just before
exit_rcu() was called, we need to remove the task from the ->blkd_tasks
list.  If we fail to do so, we might get a segfault later on.  Also,
we do need to handle any RCU_READ_UNLOCK_NEED_QS requests from the RCU
core.

So I really do like the current approach of calling rcu_read_unlock()
to do this sort of cleanup.

--

From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 12:19 pm

Hrm, --t->rcu_read_lock_nesting does not have any globally visible
side-effect, so the compiler is free to reorder the memory access across
the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()

This seem to work because we have:

volatile access (read/update t->rcu_read_lock_nesting)
&& (sequence point)
volatile access (t->rcu_read_unlock_special)

The C standard seems to forbid reordering of volatile accesses across
sequence points, so this should be fine. But it would probably be good


I am probably the one who was missing a subtlety about how

It looks good then, I just wanted to ensure that the side-effects of
calling rcu_read_unlock() in this code path were well-thought.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 16, 2010 - 2:32 pm

Indeed, it is free to reorder that access.  This has the effect of
extending the scope of the RCU read-side critical section, which is


I should probably review commenting globally, and this might be one


Long ago on the first RCU priority-boosting implementation I tried doing
the rcu_read_unlock() by hand.  The unhappy lessons learned caused me
to just use rcu_read_unlock() when I encountered similar situations
later on.  ;-)

							Thanx, Paul
--

From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 2:41 pm

So what happens if we get:

CPU 0

read t->rcu_read_lock_nesting
check if equals to 1
read t->rcu_read_unlock_special
  interrupt comes in, preempts. sets t->rcu_read_unlock_special
  <preempted>
  <scheduled back>
  iret
decrement t->rcu_read_lock_nesting
test rcu_read_unlock_special value (read prior to interrupt)
 -> fails to notice the preemption that came in after the
    rcu_read_unlock_special read.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 16, 2010 - 2:55 pm

Moving this down past the check of t->rcu_read_lock_special (which is
now covered by ACCESS_ONCE()) would violate the C standard, as it would
be equivalent to moving a volatile up past a sequence point.

--

From: Mathieu Desnoyers
Date: Monday, August 16, 2010 - 3:07 pm

Hrm, I'm not quite convinced yet. I am not concerned about gcc moving
the volatile access prior to the sequence point (as you say, this is
forbidden by the C standard), but rather that:

--(t->rcu_read_lock_nesting)

could be split in two distinct operations:

read t->rcu_read_lock_nesting
decrement t->rcu_read_lock_nesting

Note that in order to know the result required to pass the sequence
point "&&" (the test), we only need to perform the read, not the
decrement. AFAIU, gcc would be in its rights to move the
t->rcu_read_lock_nesting update after the volatile access.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Monday, August 16, 2010 - 3:24 pm

I will run this by some compiler experts.

							Thanx, Paul
--

From: Lai Jiangshan
Date: Tuesday, August 17, 2010 - 2:36 am

We can just use "read and decrement statements" instead of "--" to
avoid dependency from compilers.
--

From: Paul E. McKenney
Date: Tuesday, August 17, 2010 - 7:35 am

You mean something like local_add_return()?  This turns into
atomic_add_return() on many platforms, including x86 as it turns out,
so I am reluctant to use it.

If you had something else in mind, please don't keep it a secret!

							Thanx, Paul
--

From: Steven Rostedt
Date: Tuesday, August 17, 2010 - 6:27 am

If we are this concerned, what about just doing:

	--t->rcu_read_lock_nesting;
        if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
             unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))

-- Steve


--

From: Mathieu Desnoyers
Date: Tuesday, August 17, 2010 - 7:16 am

I'd be concerned by the fact that there is no strong ordering guarantee
that the non-volatile --t->rcu_read_lock_nesting is done before
ACCESS_ONCE(t->rcu_read_unlock_special).

My concern is that the compiler might be allowed to turn your code into:

        if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 1 &&
             unlikely((ACCESS_ONCE(t->rcu_read_unlock_special))) {
 		--t->rcu_read_lock_nesting;
		do_something();
	} else
	 	--t->rcu_read_lock_nesting;

So whether or not this could be done by the compiler depending on the
various definitions of volatile, I strongly recommend against using
volatile accesses to provide compiler ordering guarantees. It is bad in
terms of code documentation (we don't document _what_ is ordered) and it
is also bad because the volatile ordering guarantees seems to be
very easy to misinterpret.

ACCESS_ONCE() should be only that: a macro that tells the access should
be performed only once. Why are we suddenly presuming it should have any
ordering semantic ?

It should be totally valid to create arch-specific ACCESS_ONCE() macros
that only perform the "read once", without the ordering guarantees
provided by the current ACCESS_ONCE() "volatile" implementation. The
following code is only for unsigned long, but you get the idea: there is
no volatile at all, and I ensure that "val" is only read once by using
the "+m" (val) constraint, telling the compiler (falsely) that the
assembler is modifying the value (it therefore has a side-effect), so
gcc won't be tempted to re-issue the assembly statement.

static inline unsigned long arch_access_once(unsigned long val)
{
	unsigned long ret;

#if (__BITS_PER_LONG == 32)
	asm ("movl %1,%0": "=r" (ret), "+m" (val));
#else
	asm ("movq %1,%0": "=r" (ret), "+m" (val));
#endif
}

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Steven Rostedt
Date: Tuesday, August 17, 2010 - 7:54 am

Yes, volatile does not guarantee ordering of other accesses, but it
should at least guarantee ordering of access to the thing that is
volatile.

	b++;
	a++;
	c = ACCESS_ONCE(a);

'b++' can be moved to anywhere. But I'm pretty sure the compiler is not
allowed to move the 'a++' after the ACCESS_ONCE(a) because it is the
thing that is volatile. We are telling the compiler that 'a' can change
outside our scope, which to me is the same as doing:

	a++;
	c = some_global_function(&a);

Where, the compiler does not know the result of 'a' and can not move the
'a++'.


Maybe I'm wrong, and need to verify this with a compiler expert. But
what's the use of volatile if it can't protect the ordering of what is

Only ordering with the variable that is volatile. It has no ordering to

Heck, this is too much micro optimization. We could just be safe and do
the:
 	--t->rcu_read_lock_nesting;
	barrier();
         if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
              unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
 
And be done with it.

-- Steve


--

From: Mathieu Desnoyers
Date: Tuesday, August 17, 2010 - 8:55 am

Is there a rule stating that a non-volatile access is ensured to be in
issued in program order wrt other accesses to that same variable ?

The stardard discourages compilers to do any kind of optimization when
volatile is detected on a variable
(http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html), but that's a very
weak guarantee I would not like to rely on.

The only strong guarantee it provides is:
"The minimum either standard specifies is that at a sequence point all
previous accesses to volatile objects have stabilized and no subsequent
accesses have occurred."

So the question here is: given that the "--t->rcu_read_lock_nesting"
access is not marked volatile, but the following
"ACCESS_ONCE(t->rcu_read_lock_nesting)" read is volatile, should we
consider than "--t->rcu_read_lock_nesting" apply to a volatile object or
not ?


AFAIU, the standard only requires ordering between volatile "objects".
So when we cast non-volatile objects to volatile, I assume the
non-volatile accesses apply only to the non-volatile version of the

The code here is different: calling an external function is equivalent
to put a full compiler memory barrier ("memory" clobber). Volatile is
quite different from that; the compiler is only told to keep ordering
between volatile accesses, and to try not to optimize the volatile

I'm concerned about the fact that volatile seems to have been initially
designed to apply to a variable declaration (object) rather than a
specific access through a cast. So I really wonder if the non-volatile
object accesses has the same guarantees as the accesses performed on its

This is not quite correct. Volatile orders with respect to _all_ other
volatile accesses.

What I am pointing out here is that the macro name "ACCESS_ONCE()" does

Then we could go for the simpler:

  	--t->rcu_read_lock_nesting;
 	barrier();
        if (t->rcu_read_lock_nesting == 0 &&
             unlikely((t->rcu_read_unlock_special))

Which puts a constraint across all memory accesses. ...
From: Steven Rostedt
Date: Tuesday, August 17, 2010 - 9:04 am

Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()

Not afraid, but just too much code for a simple solution.

-- Steve


--

From: Steven Rostedt
Date: Tuesday, August 17, 2010 - 9:06 am

IOW,

I think just pulling out the '--' and adding the barrier() is the proper
solution here. Compiler barriers are rather cheap.

Can we all agree on this solution?

-- Steve


--

From: Mathieu Desnoyers
Date: Tuesday, August 17, 2010 - 9:25 am

Given that we already have a barrier() at the beginning of
rcu_read_unlock(), adding a second one will not have much more global
optimisation impact than what is already there. I'm personally fine with
this solution. Let's see what others have to say about this.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Tuesday, August 17, 2010 - 12:33 pm

Thank you both for the optimization work -- the read-side primitives do
need to be fast.  And the barrier() approach generates decent code, on
some systems better than the original.  So the second barrier wins.  ;-)

							Thanx, Paul
--

From: Paul E. McKenney
Date: Tuesday, August 17, 2010 - 1:00 pm

And here is the updated version of this patch.

							Thanx, Paul

------------------------------------------------------------------------

commit e4fb476c3a41e0c1dde1f8e303a328623c9eaa92
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jun 29 16:49:16 2010 -0700

    rcu: Add a TINY_PREEMPT_RCU
    
    Implement a small-memory-footprint uniprocessor-only implementation of
    preemptible RCU.  This implementation uses but a single blocked-tasks
    list rather than the combinatorial number used per leaf rcu_node by
    TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
    processing.  This version also takes advantage of uniprocessor execution
    to accelerate grace periods in the case where there are no readers.
    
    The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
    
    This implementation is a step towards having RCU implementation driven
    off of the SMP and PREEMPT kernel configuration variables, which can
    happen once this implementation has accumulated sufficient experience.
    
    Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
    suggested by Steve Rostedt in order to avoid the compiler-reordering
    issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).
    
    Requested-by: Loïc Minier <loic.minier@canonical.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1f4517d 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
 #endif
 
 #if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
 extern void rcu_enter_nohz(void);
 extern void rcu_exit_nohz(void);
 
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6460fc6..2fea6c8 100644
--- ...
From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 Documentation/DocBook/kernel-locking.tmpl |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 084f6ad..e6cc574 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1725,14 +1725,6 @@ the amount of locking which needs to be done.
          if (++cache_num > MAX_CACHE_SIZE) {
                  struct object *i, *outcast = NULL;
                  list_for_each_entry(i, &cache, list) {
-@@ -85,6 +94,7 @@
-         obj->popularity = 0;
-         atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
-         spin_lock_init(&obj->lock);
-+        INIT_RCU_HEAD(&obj->rcu);
-
-         spin_lock_irqsave(&cache_lock, flags);
-         __cache_add(obj);
 @@ -104,12 +114,11 @@
  struct object *cache_find(int id)
  {
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

RCU heads really don't need to be initialized. Their state before call_rcu()
really does not matter.

We need to keep init/destroy_rcu_head_on_stack() though, since we want
debugobjects to be able to keep track of these objects.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: David S. Miller <davem@davemloft.net>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 include/linux/rcupdate.h |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3e1b662..27b44b3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -75,12 +75,6 @@ extern void rcu_init(void);
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
 
-#define RCU_HEAD_INIT	{ .next = NULL, .func = NULL }
-#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
-#define INIT_RCU_HEAD(ptr) do { \
-       (ptr)->next = NULL; (ptr)->func = NULL; \
-} while (0)
-
 /*
  * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
  * initialization and destruction of rcu_head on the stack. rcu_head structures
-- 
1.7.0.6

--

From: Paul E. McKenney
Date: Monday, August 9, 2010 - 3:15 pm

Reported-by: Kyle Hubert <khubert@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
 Documentation/DocBook/kernel-locking.tmpl |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index e6cc574..ed64d22 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
       all the readers who were traversing the list when we deleted the
       element are finished.  We use <function>call_rcu()</function> to
       register a callback which will actually destroy the object once
-      the readers are finished.
+      all pre-existing readers are finished.  Alternatively,
+      <function>synchronize_rcu()</function> may be used to block until
+      all pre-existing are finished.
     </para>
     <para>
       But how does Read Copy Update know when the readers are
@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
 -        object_put(obj);
 +        list_del_rcu(&obj->list);
          cache_num--;
-+        call_rcu(&obj->rcu, cache_delete_rcu, obj);
++        call_rcu(&obj->rcu, cache_delete_rcu);
  }
 
  /* Must be holding cache_lock */
-- 
1.7.0.6

--

Previous thread: [GIT PULL] workqueue: fixes for v2.6.36 by Tejun Heo on Monday, August 9, 2010 - 2:54 pm. (1 message)

Next thread: [GIT PULL] please pull infiniband.git by Roland Dreier on Monday, August 9, 2010 - 3:44 pm. (1 message)