Hello! This patchset shows additional patches queued for 2.6.37, over and above those posted at http://lkml.org/lkml/2010/7/14/334. These are all minor fixes, with the exception of patch #8, which adds TINY_PREEMPT_RCU. The patches are as follows: 1. Remove the rcu_head initialization macros (from Mathieu Desnoyers). This patch can move forward now that all uses of these macros have been removed from mainline. 2. Update documentation to note the demise of the rcu_head initialization macros. 3. Fix kernel-locking.tmpl docbook documentation, which was still using the now-ancient three-argument version of call_rcu(). 4. Allow RCU's CPU stall-warning messages to be controlled via sysfs. 5. Now that TINY_RCU has been in-tree for a few releases, adjust the configuration so that TINY_RCU is mandatory for kernels built with !SMP and !PREEMPT. Once TINY_PREEMPT_RCU has gained a similar level of experience, !SMP code will be eliminated from TREE_RCU. 6. Allow kernels to be built such that RCU CPU stall warnings are suppressed at boot time. Patch #4 above allows them to be manually re-enabled once the system has booted. 7. Updates the RCU_FANOUT message to take commit cf244dc01bf68 into account. This commit added a fourth level to TREE_RCU. 8. Add TINY_PREEMPT_RCU, allowing reduced memory footprint for UP builds of preemptible RCU. This is a cleaned-up version of the patch posted at http://lkml.org/lkml/2010/7/21/364. 9. The "It is illegal to block while in an RCU read-side critical section" docbook comment was obsoleted long ago by preemptible RCU, so this patch brings it up to the present day. 10. Add comments above the RCU CPU stall-warning printk()s pointing people at the Documentation/RCU/stallwarn.txt documentation. For a testing-only version of this patchset from git, please see: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/testing Thanx, Paul Documentation/DocBook/kernel-locking.tmpl | 6 ...
Commit cf244dc01bf68 added a fourth level to the TREE_RCU hierarchy, but the RCU_FANOUT help message still said "cube root". This commit fixes this to "fourth root" and also emphasizes that production systems are well-served by the default. (Stress-testing RCU itself uses small RCU_FANOUT values in order to test large-system code paths on small(er) systems.) Located-by: John Kacur <jkacur@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- init/Kconfig | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 9366954..7faa5d8 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -384,9 +384,12 @@ config RCU_FANOUT help This option controls the fanout of hierarchical implementations of RCU, allowing RCU to work efficiently on machines with - large numbers of CPUs. This value must be at least the cube - root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit - systems and up to 262,144 for 64-bit systems. + large numbers of CPUs. This value must be at least the fourth + root of NR_CPUS, which allows NR_CPUS to be insanely large. + The default value of RCU_FANOUT should be used for production + systems, but if you are stress-testing the RCU implementation + itself, small RCU_FANOUT values allow you to test large-system + code paths on small(er) systems. Select a specific number if testing RCU itself. Take the default if unsure. -- 1.7.0.6 --
Currently, if RCU CPU stall warnings are enabled, they are enabled
immediately upon boot. They can be manually disabled via /sys (and
also re-enabled via /sys), and are automatically disabled upon panic.
However, some users need RCU CPU stalls to be disabled at boot time,
but to be enabled without rebuilding/rebooting. For example, someone
running a real-time application in production might not want the
additional latency of RCU CPU stall detection in normal operation, but
might need to enable it at any point for fault isolation purposes.
This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel configuration parameter that maintains the current behavior
(enable at boot) by default, but allows a kernel to be configured
with RCU CPU stall detection built into the kernel, but disabled at
boot time.
Requested-by: Clark Williams <williams@redhat.com>
Requested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcutree.c | 2 +-
kernel/rcutree.h | 6 ++++++
lib/Kconfig.debug | 13 +++++++++++++
3 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5d910be..5aab7da 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -144,7 +144,7 @@ module_param(qhimark, int, 0);
module_param(qlowmark, int, 0);
#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
-int rcu_cpu_stall_suppress __read_mostly;
+int rcu_cpu_stall_suppress __read_mostly = RCU_CPU_STALL_SUPPRESS_INIT;
module_param(rcu_cpu_stall_suppress, int, 0644);
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 183ebf4..bb4d086 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -264,6 +264,12 @@ struct rcu_data {
/* scheduling clock irq */
/* before ratting on them. */
+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
+#define RCU_CPU_STALL_SUPPRESS_INIT 0
+#else
+#define RCU_CPU_STALL_SUPPRESS_INIT ...Set the permissions of the rcu_cpu_stall_suppress to 644 to enable RCU CPU stall warnings to be enabled and disabled at runtime via sysfs. Suggested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- kernel/rcutree.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index f3d5906..5d910be 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -145,7 +145,7 @@ module_param(qlowmark, int, 0); #ifdef CONFIG_RCU_CPU_STALL_DETECTOR int rcu_cpu_stall_suppress __read_mostly; -module_param(rcu_cpu_stall_suppress, int, 0); +module_param(rcu_cpu_stall_suppress, int, 0644); #endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */ static void force_quiescent_state(struct rcu_state *rsp, int relaxed); -- 1.7.0.6 --
The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU. This commit provides a fix.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 24b8966..d7af96e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
* will be deferred until the outermost RCU read-side critical section
* completes.
*
- * It is illegal to block while in an RCU read-side critical section.
+ * You can avoid reading and understanding the next paragraph by
+ * following this rule: don't put anything in an rcu_read_lock() RCU
+ * read-side critical section that would block in a !PREEMPT kernel.
+ * But if you want the full story, read on!
+ *
+ * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
+ * is illegal to block while in an RCU read-side critical section. In
+ * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
+ * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
+ * be preempted, but explicit blocking is illegal. Finally, in preemptible
+ * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
+ * RCU read-side critical sections may be preempted and they may also
+ * block, but only when acquiring spinlocks that are subject to priority
+ * inheritance.
*/
static inline void rcu_read_lock(void)
{
--
1.7.0.6
--
It might be good to add a note about locking chain dependency that is created in the RT case, e.g., the lock we are sharing with another context in preempt RT is subject to the same rules as the RCU C.S.. It should never call synchronize_rcu(); this would cause a RCU+lock-induced deadlock. I must admit, however, that because calling synchronize_rcu() from spinlocks is already forbidden, this is already implied. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
Thank you for looking this over! I am updating the srcu_read_lock() docbook comments to call out the potential for this problem, given that SRCU read-side critical sections can acquire mutexes, which can be held across both synchronize_srcu() and synchronize_srcu_expedited(). Seem reasonable? Thanx, Paul diff --git a/include/linux/srcu.h b/include/linux/srcu.h index 6f456a7..58971e8 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -139,7 +139,12 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp) * @sp: srcu_struct in which to register the new reader. * * Enter an SRCU read-side critical section. Note that SRCU read-side - * critical sections may be nested. + * critical sections may be nested. However, it is illegal to + * call anything that waits on an SRCU grace period for the same + * srcu_struct, whether directly or indirectly. Please note that + * one way to indirectly wait on an SRCU grace period is to acquire + * a mutex that is held elsewhere while calling synchronize_srcu() or + * synchronize_srcu_expedited(). */ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp) --
Yep, thanks! -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
There is some documentation on RCU CPU stall warnings contained in
Documentation/RCU/stallwarn.txt, but it will not be apparent to someone
who runs into such a warning while under time pressure. This commit
therefore adds comments preceding the printk()s pointing out the
location of this documentation.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/rcutree.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5aab7da..ff21411 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -487,8 +487,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
rcu_print_task_stall(rnp);
raw_spin_unlock_irqrestore(&rnp->lock, flags);
- /* OK, time to rat on our buddy... */
-
+ /*
+ * OK, time to rat on our buddy...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stalls on CPUs/tasks: {",
rsp->name);
rcu_for_each_leaf_node(rsp, rnp) {
@@ -517,6 +520,11 @@ static void print_cpu_stall(struct rcu_state *rsp)
unsigned long flags;
struct rcu_node *rnp = rcu_get_root(rsp);
+ /*
+ * OK, time to rat on ourselves...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stall on CPU %d (t=%lu jiffies)\n",
rsp->name, smp_processor_id(), jiffies - rsp->gp_start);
trigger_all_cpu_backtrace();
--
1.7.0.6
--
Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for several releases, it is time to restrict the use of TREE_RCU to SMP non-preemptible systems. This reduces testing/validation effort. This commit is a first step towards driving the selection of RCU implementation directly off of the SMP and PREEMPT configuration parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- init/Kconfig | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 5cff9a9..9366954 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -336,6 +336,7 @@ choice config TREE_RCU bool "Tree-based hierarchical RCU" + depends on !PREEMPT && SMP help This option selects the RCU implementation that is designed for very large SMP system with hundreds or -- 1.7.0.6 --
Implement a small-memory-footprint uniprocessor-only implementation of preemptible RCU. This implementation uses but a single blocked-tasks list rather than the combinatorial number used per leaf rcu_node by TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies processing. This version also takes advantage of uniprocessor execution to accelerate grace periods in the case where there are no readers. The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU. This implementation is a step towards having RCU implementation driven off of the SMP and PREEMPT kernel configuration variables, which can happen once this implementation has accumulated sufficient experience. Requested-by: Loïc Minier <loic.minier@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- include/linux/hardirq.h | 2 +- include/linux/init_task.h | 10 +- include/linux/rcupdate.h | 3 +- include/linux/rcutiny.h | 126 +++++++--- include/linux/rcutree.h | 2 + include/linux/sched.h | 10 +- init/Kconfig | 16 ++- kernel/Makefile | 1 + kernel/rcutiny.c | 33 +-- kernel/rcutiny_plugin.h | 580 ++++++++++++++++++++++++++++++++++++++++++++- 10 files changed, 715 insertions(+), 68 deletions(-) diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index d5b3876..1f4517d 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk) #endif #if defined(CONFIG_NO_HZ) -#if defined(CONFIG_TINY_RCU) +#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU) extern void rcu_enter_nohz(void); extern void rcu_exit_nohz(void); diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 6460fc6..2fea6c8 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -82,11 +82,17 @@ extern struct group_info init_groups; # define CAP_INIT_BSET ...
* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote: Hrm I think we discussed this in a past life, but would the following sequence be possible and correct ? CPU 0 read t->rcu_read_unlock_special interrupt comes in, preempts. sets t->rcu_read_unlock_special <preempted> <scheduled back> iret decrement and read t->rcu_read_lock_nesting test both old "special" value (which we have locally on the stack) and detect that rcu_read_lock_nesting is 0. We actually missed a reschedule. I think we might need a barrier() between the t->rcu_read_lock_nesting and t->rcu_read_unlock_special reads. We might need to audit TREE PREEMPT RCU for the same kind of behavior. But I might be (again ?) missing something. I've got the feeling you already convinced me that this was OK for some reason, but I trip on this every time I read the code. The interaction with preemption is unclear here. exit.c disables preemption around the call to exit_rcu(), but if, for some reason, rcu_read_unlock_special was set earlier by preemption, then the rcu_read_unlock() code might block and cause problems. Maybe we should consider clearing rcu_read_unlock_special here ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
You are correct -- I got too aggressive in eliminating synchronization.
Good catch!!!
I added an ACCESS_ONCE() to the second term of the "if" condition so
that it now reads:
if (--t->rcu_read_lock_nesting == 0 &&
unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
This prevents the compiler from reordering because the ACCESS_ONCE()
prohibits accessing t->rcu_read_unlock_special unless the value of
The version of __rcu_read_unlock() in kernel/rcutree_plugin.h is as
follows:
void __rcu_read_unlock(void)
{
struct task_struct *t = current;
barrier(); /* needed if we ever invoke rcu_read_unlock in rcutree.c */
if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
rcu_read_unlock_special(t);
#ifdef CONFIG_PROVE_LOCKING
WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
#endif /* #ifdef CONFIG_PROVE_LOCKING */
}
The ACCESS_ONCE() calls should cover this. I believe that the first
ACCESS_ONCE() is redundant, and have checking this more closely on my
But rcu_read_unlock_special() does not block. In fact, it disables
interrupts over almost all of its execution. Or am I missing some
If the task blocked in an RCU read-side critical section just before
exit_rcu() was called, we need to remove the task from the ->blkd_tasks
list. If we fail to do so, we might get a segfault later on. Also,
we do need to handle any RCU_READ_UNLOCK_NEED_QS requests from the RCU
core.
So I really do like the current approach of calling rcu_read_unlock()
to do this sort of cleanup.
--
Hrm, --t->rcu_read_lock_nesting does not have any globally visible side-effect, so the compiler is free to reorder the memory access across the rcu_read_unlock_special access. I think we need the ACCESS_ONCE() This seem to work because we have: volatile access (read/update t->rcu_read_lock_nesting) && (sequence point) volatile access (t->rcu_read_unlock_special) The C standard seems to forbid reordering of volatile accesses across sequence points, so this should be fine. But it would probably be good I am probably the one who was missing a subtlety about how It looks good then, I just wanted to ensure that the side-effects of calling rcu_read_unlock() in this code path were well-thought. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
Indeed, it is free to reorder that access. This has the effect of extending the scope of the RCU read-side critical section, which is I should probably review commenting globally, and this might be one Long ago on the first RCU priority-boosting implementation I tried doing the rcu_read_unlock() by hand. The unhappy lessons learned caused me to just use rcu_read_unlock() when I encountered similar situations later on. ;-) Thanx, Paul --
So what happens if we get:
CPU 0
read t->rcu_read_lock_nesting
check if equals to 1
read t->rcu_read_unlock_special
interrupt comes in, preempts. sets t->rcu_read_unlock_special
<preempted>
<scheduled back>
iret
decrement t->rcu_read_lock_nesting
test rcu_read_unlock_special value (read prior to interrupt)
-> fails to notice the preemption that came in after the
rcu_read_unlock_special read.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
Moving this down past the check of t->rcu_read_lock_special (which is now covered by ACCESS_ONCE()) would violate the C standard, as it would be equivalent to moving a volatile up past a sequence point. --
Hrm, I'm not quite convinced yet. I am not concerned about gcc moving the volatile access prior to the sequence point (as you say, this is forbidden by the C standard), but rather that: --(t->rcu_read_lock_nesting) could be split in two distinct operations: read t->rcu_read_lock_nesting decrement t->rcu_read_lock_nesting Note that in order to know the result required to pass the sequence point "&&" (the test), we only need to perform the read, not the decrement. AFAIU, gcc would be in its rights to move the t->rcu_read_lock_nesting update after the volatile access. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
I will run this by some compiler experts. Thanx, Paul --
We can just use "read and decrement statements" instead of "--" to avoid dependency from compilers. --
You mean something like local_add_return()? This turns into atomic_add_return() on many platforms, including x86 as it turns out, so I am reluctant to use it. If you had something else in mind, please don't keep it a secret! Thanx, Paul --
If we are this concerned, what about just doing:
--t->rcu_read_lock_nesting;
if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
-- Steve
--
I'd be concerned by the fact that there is no strong ordering guarantee
that the non-volatile --t->rcu_read_lock_nesting is done before
ACCESS_ONCE(t->rcu_read_unlock_special).
My concern is that the compiler might be allowed to turn your code into:
if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 1 &&
unlikely((ACCESS_ONCE(t->rcu_read_unlock_special))) {
--t->rcu_read_lock_nesting;
do_something();
} else
--t->rcu_read_lock_nesting;
So whether or not this could be done by the compiler depending on the
various definitions of volatile, I strongly recommend against using
volatile accesses to provide compiler ordering guarantees. It is bad in
terms of code documentation (we don't document _what_ is ordered) and it
is also bad because the volatile ordering guarantees seems to be
very easy to misinterpret.
ACCESS_ONCE() should be only that: a macro that tells the access should
be performed only once. Why are we suddenly presuming it should have any
ordering semantic ?
It should be totally valid to create arch-specific ACCESS_ONCE() macros
that only perform the "read once", without the ordering guarantees
provided by the current ACCESS_ONCE() "volatile" implementation. The
following code is only for unsigned long, but you get the idea: there is
no volatile at all, and I ensure that "val" is only read once by using
the "+m" (val) constraint, telling the compiler (falsely) that the
assembler is modifying the value (it therefore has a side-effect), so
gcc won't be tempted to re-issue the assembly statement.
static inline unsigned long arch_access_once(unsigned long val)
{
unsigned long ret;
#if (__BITS_PER_LONG == 32)
asm ("movl %1,%0": "=r" (ret), "+m" (val));
#else
asm ("movq %1,%0": "=r" (ret), "+m" (val));
#endif
}
Thanks,
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
Yes, volatile does not guarantee ordering of other accesses, but it
should at least guarantee ordering of access to the thing that is
volatile.
b++;
a++;
c = ACCESS_ONCE(a);
'b++' can be moved to anywhere. But I'm pretty sure the compiler is not
allowed to move the 'a++' after the ACCESS_ONCE(a) because it is the
thing that is volatile. We are telling the compiler that 'a' can change
outside our scope, which to me is the same as doing:
a++;
c = some_global_function(&a);
Where, the compiler does not know the result of 'a' and can not move the
'a++'.
Maybe I'm wrong, and need to verify this with a compiler expert. But
what's the use of volatile if it can't protect the ordering of what is
Only ordering with the variable that is volatile. It has no ordering to
Heck, this is too much micro optimization. We could just be safe and do
the:
--t->rcu_read_lock_nesting;
barrier();
if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
And be done with it.
-- Steve
--
Is there a rule stating that a non-volatile access is ensured to be in issued in program order wrt other accesses to that same variable ? The stardard discourages compilers to do any kind of optimization when volatile is detected on a variable (http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html), but that's a very weak guarantee I would not like to rely on. The only strong guarantee it provides is: "The minimum either standard specifies is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred." So the question here is: given that the "--t->rcu_read_lock_nesting" access is not marked volatile, but the following "ACCESS_ONCE(t->rcu_read_lock_nesting)" read is volatile, should we consider than "--t->rcu_read_lock_nesting" apply to a volatile object or not ? AFAIU, the standard only requires ordering between volatile "objects". So when we cast non-volatile objects to volatile, I assume the non-volatile accesses apply only to the non-volatile version of the The code here is different: calling an external function is equivalent to put a full compiler memory barrier ("memory" clobber). Volatile is quite different from that; the compiler is only told to keep ordering between volatile accesses, and to try not to optimize the volatile I'm concerned about the fact that volatile seems to have been initially designed to apply to a variable declaration (object) rather than a specific access through a cast. So I really wonder if the non-volatile object accesses has the same guarantees as the accesses performed on its This is not quite correct. Volatile orders with respect to _all_ other volatile accesses. What I am pointing out here is that the macro name "ACCESS_ONCE()" does Then we could go for the simpler: --t->rcu_read_lock_nesting; barrier(); if (t->rcu_read_lock_nesting == 0 && unlikely((t->rcu_read_unlock_special)) Which puts a constraint across all memory accesses. ...
Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE() Not afraid, but just too much code for a simple solution. -- Steve --
IOW, I think just pulling out the '--' and adding the barrier() is the proper solution here. Compiler barriers are rather cheap. Can we all agree on this solution? -- Steve --
Given that we already have a barrier() at the beginning of rcu_read_unlock(), adding a second one will not have much more global optimisation impact than what is already there. I'm personally fine with this solution. Let's see what others have to say about this. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
Thank you both for the optimization work -- the read-side primitives do need to be fast. And the barrier() approach generates decent code, on some systems better than the original. So the second barrier wins. ;-) Thanx, Paul --
And here is the updated version of this patch.
Thanx, Paul
------------------------------------------------------------------------
commit e4fb476c3a41e0c1dde1f8e303a328623c9eaa92
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Tue Jun 29 16:49:16 2010 -0700
rcu: Add a TINY_PREEMPT_RCU
Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.
The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.
Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).
Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1f4517d 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
#endif
#if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
extern void rcu_enter_nohz(void);
extern void rcu_exit_nohz(void);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6460fc6..2fea6c8 100644
--- ...Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
Documentation/DocBook/kernel-locking.tmpl | 8 --------
1 files changed, 0 insertions(+), 8 deletions(-)
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 084f6ad..e6cc574 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1725,14 +1725,6 @@ the amount of locking which needs to be done.
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &amp;cache, list) {
-@@ -85,6 +94,7 @@
- obj-&gt;popularity = 0;
- atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
- spin_lock_init(&amp;obj-&gt;lock);
-+ INIT_RCU_HEAD(&amp;obj-&gt;rcu);
-
- spin_lock_irqsave(&amp;cache_lock, flags);
- __cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
--
1.7.0.6
--
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> RCU heads really don't need to be initialized. Their state before call_rcu() really does not matter. We need to keep init/destroy_rcu_head_on_stack() though, since we want debugobjects to be able to keep track of these objects. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: David S. Miller <davem@davemloft.net> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: akpm@linux-foundation.org CC: mingo@elte.hu CC: laijs@cn.fujitsu.com CC: dipankar@in.ibm.com CC: josh@joshtriplett.org CC: dvhltc@us.ibm.com CC: niv@us.ibm.com CC: tglx@linutronix.de CC: peterz@infradead.org CC: rostedt@goodmis.org CC: Valdis.Kletnieks@vt.edu CC: dhowells@redhat.com CC: eric.dumazet@gmail.com CC: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> --- include/linux/rcupdate.h | 6 ------ 1 files changed, 0 insertions(+), 6 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3e1b662..27b44b3 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -75,12 +75,6 @@ extern void rcu_init(void); #error "Unknown RCU implementation specified to kernel configuration" #endif -#define RCU_HEAD_INIT { .next = NULL, .func = NULL } -#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT -#define INIT_RCU_HEAD(ptr) do { \ - (ptr)->next = NULL; (ptr)->func = NULL; \ -} while (0) - /* * init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic * initialization and destruction of rcu_head on the stack. rcu_head structures -- 1.7.0.6 --
Reported-by: Kyle Hubert <khubert@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
Documentation/DocBook/kernel-locking.tmpl | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index e6cc574..ed64d22 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
- the readers are finished.
+ all pre-existing readers are finished. Alternatively,
+ <function>synchronize_rcu()</function> may be used to block until
+ all pre-existing are finished.
</para>
<para>
But how does Read Copy Update know when the readers are
@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
- object_put(obj);
+ list_del_rcu(&amp;obj-&gt;list);
cache_num--;
-+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu, obj);
++ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
}
/* Must be holding cache_lock */
--
1.7.0.6
--
