Re: [RFC patch 2/3] tree/tiny rcu: Add debug RCU head objects (v3)

Previous thread: [RFC patch 3/3] kernel call_rcu usage: initialize rcu_head structures by Mathieu Desnoyers on Friday, March 19, 2010 - 1:47 pm. (1 message)

Next thread: debug registers, write breakpoints, and x86 by Chris Friesen on Friday, March 19, 2010 - 2:33 pm. (1 message)
From: Mathieu Desnoyers
Date: Friday, March 19, 2010 - 1:47 pm

Hi,

This patchset introduces a debugobjects-based rcu list head debugging
infrastructure. The first patch adds the ability to keep track of a "state
machine" into debugobjects. The state machine logic is all contained in the
type-aware caller. Debugobjects only keep track of the current state (a single
integer). A state validation/transition API is provided to debugobjects users.

This patchset is based on 2.6.33.1.

Comments are welcome,

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--

From: Mathieu Desnoyers
Date: Friday, March 19, 2010 - 1:47 pm

Helps finding racy users of call_rcu(), which results in hangs because list
entries are overwritten and/or skipped.

This new patch version is based on the debugobjects with the newly introduced
"active state" tracker.

Non-initialized entries are all considered as "statically initialized". An
activation fixup (triggered by call_rcu()) takes care of performing the debug
object initialization without issuing any warning. Since we cannot increase the
size of struct rcu_head, I don't see much room to put an identifier for
statically initialized rcu_head structures. So for now, we have to live without
"activation without explicit init" detection. But the main purpose of this debug
option is to detect double-activations (double call_rcu() use of a rcu_head
before the callback is executed), which is correctly addressed here.

This also detects potential internal RCU callback corruption, which would cause
the callbacks to be executed twice.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
 include/linux/rcupdate.h |   61 ++++++++++++++++++-
 kernel/rcupdate.c        |  148 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/rcutiny.c         |    2 
 kernel/rcutree.c         |    2 
 lib/Kconfig.debug        |    6 +
 5 files changed, 215 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/include/linux/rcupdate.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/rcupdate.h	2010-03-19 15:26:02.000000000 -0400
+++ linux-2.6-lttng/include/linux/rcupdate.h	2010-03-19 15:26:06.000000000 ...
From: Alexey Dobriyan
Date: Friday, March 19, 2010 - 3:10 pm

Is this useful?

Basic usage is so there no double call_rcu():

	if (atomic_dec_and_test())
		call_rcu()
--

From: Paul E. McKenney
Date: Friday, March 19, 2010 - 3:49 pm

I believe that it is.  There have been a few cases of call_rcu() being
invoked twice without a grace period between the two invocations.
Mathieu's patch would catch this sort of misbehavior.

That said, I do agree that if everyone followed the rules, there would
be no need for Mathieu's patch -- and there would be no need for much
else, besides.  ;-)

							Thanx, Paul
--

From: Lai Jiangshan
Date: Sunday, March 21, 2010 - 8:33 pm

Very nice you do not touch struct rcu_head.


I'm a little foolish. I don't know why we need
STATE_RCU_HEAD_READY/QUEUED.

1) obj->astate does not be set to STATE_RCU_HEAD_READY in
debug_object_activate/init().

Maybe you need to add a function:
debug_object_activate_with_astate(head, descr, init_astate);
(the same as debug_object_activate(), but init obj->astate also)

Or you need:
@debugobjects.h
# define DEBUG_OBJECT_ASTATE_INIT 0
@rcupdate.c
# define STATE_RCU_HEAD_READY	DEBUG_OBJECT_ASTATE_INIT
# define STATE_RCU_HEAD_QUEUED	(STATE_RCU_HEAD_READY + 1)

2) In debug_rcu_head_queue(), debug_object_active_state()
   is always success when debug_object_activate() is success.

   In debug_rcu_head_unqueue(), debug_object_active_state()
   is always success if RCU subsystem is running correctly.

(Correct me if I'm wrong) So we don't need debug_object_active_state()


We may need call rcu_barrier(), rcu_barrier_sched(), rcu_barrier_bh() together?

--

From: Mathieu Desnoyers
Date: Monday, March 22, 2010 - 7:22 am

Yes it is. astate is set to 0 upon activation, and STATE_RCU_HEAD_READY is 0.
This is the initial state of the state machine (and also the accepting state).
Maybe I could document it a little bit more. Will write:

+/*
+ * Active state:
+ * - Set at 0 upon initialization.
+ * - Must return to 0 before deactivation.
+ */
+extern void
+debug_object_active_state(void *addr, struct debug_obj_descr *descr,
+                         unsigned int expect, unsigned int next);

Yes, but this debug infrastructure also detects problems that may arise in the
RCU subsystem itself: e.g. a rcu head that would be executed on two CPUs, which
could be caused by races between cpu hotplug and the RCU subsystem. I'm not
saying that we have these races currently, but that the RCU subsystems is now so
tied to complex parts of the kernel (scheduler, cpu hotplug) that it's bound to
have nasty bugs eventually.

So without the state machine, what we don't have in debugobjects is the ability
to detect a second "deactivation": debugobjects considers that an object can be
deactivated more than once without error. But in this case, just one
deactivation should be allowed. I thought that this state machine scheme would
allow following an object life more closely than just the
init/activate/deactivate/destroy/free phases. Maybe there could be ways to
combine the calls ? But.. given that it's for debugging purposes, do we care
that much about performance that it's worth creating new API members, making the

Will change for:

#ifndef CONFIG_PREEMPT
                WARN_ON(1);
                return 0;
#else
                if (rcu_preempt_depth() != 0 || preempt_count() != 0 ||
                    irqs_disabled()) {
                        WARN_ON(1);
                        return 0;
                }
                rcu_barrier();
                rcu_barrier_sched();
                rcu_barrier_bh();
                debug_object_activate(head, &rcuhead_debug_descr);
                return ...
From: Mathieu Desnoyers
Date: Friday, March 19, 2010 - 1:47 pm

Implement a basic state machine checker in the debugobjects.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
 include/linux/debugobjects.h |    9 ++++++
 lib/debugobjects.c           |   59 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/include/linux/debugobjects.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/debugobjects.h	2010-03-19 15:26:02.000000000 -0400
+++ linux-2.6-lttng/include/linux/debugobjects.h	2010-03-19 15:26:06.000000000 -0400
@@ -20,12 +20,14 @@ struct debug_obj_descr;
  * struct debug_obj - representaion of an tracked object
  * @node:	hlist node to link the object into the tracker list
  * @state:	tracked object state
+ * @astate:	current active state
  * @object:	pointer to the real object
  * @descr:	pointer to an object type specific debug description structure
  */
 struct debug_obj {
 	struct hlist_node	node;
 	enum debug_obj_state	state;
+	unsigned int		astate;
 	void			*object;
 	struct debug_obj_descr	*descr;
 };
@@ -60,6 +62,13 @@ extern void debug_object_deactivate(void
 extern void debug_object_destroy   (void *addr, struct debug_obj_descr *descr);
 extern void debug_object_free      (void *addr, struct debug_obj_descr *descr);
 
+/*
+ * Active state must return to 0 before deactivation.
+ */
+extern void
+debug_object_active_state(void *addr, struct debug_obj_descr *descr,
+			  unsigned int expect, unsigned int next);
+
 extern void debug_objects_early_init(void);
 ...
Previous thread: [RFC patch 3/3] kernel call_rcu usage: initialize rcu_head structures by Mathieu Desnoyers on Friday, March 19, 2010 - 1:47 pm. (1 message)

Next thread: debug registers, write breakpoints, and x86 by Chris Friesen on Friday, March 19, 2010 - 2:33 pm. (1 message)