[PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

Previous thread: [PATCH -tip 1/4] [BUGFIX] perf tools: Initialize dso->node member in dso__new by Masami Hiramatsu on Wednesday, April 21, 2010 - 12:56 pm. (4 messages)

Next thread: Another BUG_ON() in blkio controller by Vivek Goyal on Wednesday, April 21, 2010 - 1:12 pm. (2 messages)
From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:01 pm

Hello!

This patchset contains four RCU lockdep splat fixes, courtesy of David
Howells, Peter Zijlstra, and Trond Myklebust, as well as an enhancement
by Lai Jiangshan that permits collecting more than one RCU lockdep splat
per boot.

							Thanx, Paul

------------------------------------------------------------------------

 b/fs/nfs/delegation.c      |   42 ++++++++++++++++++++++++++++--------------
 b/include/linux/rcupdate.h |   15 +++++++++++----
 b/kernel/cgroup_freezer.c  |    5 ++++-
 b/kernel/lockdep.c         |    2 --
 b/kernel/sched.c           |   10 ++++++++++
 fs/nfs/delegation.c        |   44 +++++++++++++++++++++++---------------------
 6 files changed, 76 insertions(+), 42 deletions(-)
--

From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:02 pm

From: David Howells <dhowells@redhat.com>

Fix a number of RCU issues in the NFSv4 delegation code.

 (1) delegation->cred doesn't need to be RCU protected as it's essentially an
     invariant refcounted structure.

     By the time we get to nfs_free_delegation(), the delegation is being
     released, so no one else should be attempting to use the saved
     credentials, and they can be cleared.

     However, since the list of delegations could still be under traversal at
     this point by such as nfs_client_return_marked_delegations(), the cred
     should be released in nfs_do_free_delegation() rather than in
     nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
     insufficient as that doesn't stop the cred from being destroyed, and nor
     does calling put_rpccred() after call_rcu(), given that the latter is
     asynchronous.

 (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
     rcu_derefence_protected() because they can only be called if
     nfs_client::cl_lock is held, and that guards against anyone changing
     nfsi->delegation under it.  Furthermore, the barrier imposed by
     rcu_dereference() is superfluous, given that the spin_lock() is also a
     barrier.

 (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
     struct so that it can issue lockdep advice based on clp->cl_lock for (2).

 (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
     should use rcu_access_pointer() outside the spinlocked region as they
     merely examine the pointer and don't follow it, thus rendering unnecessary
     the need to impose a partial ordering over the one item of interest.

     These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug ...
From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:02 pm

From: Lai Jiangshan <laijs@cn.fujitsu.com>

There is no need to disable lockdep after an RCU lockdep splat, so
remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the
inlined rcu_dereference_check() and rcu_dereference_protected()
macros so that a given instance splats only once, but so that
multiple instances can be detected per boot.

Requested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   15 +++++++++++----
 kernel/lockdep.c         |    2 --
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 07db2fe..ec9ab49 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -190,6 +190,15 @@ static inline int rcu_read_lock_sched_held(void)
 
 #ifdef CONFIG_PROVE_RCU
 
+#define __do_rcu_dereference_check(c)					\
+	do {								\
+		static bool __warned;					\
+		if (debug_lockdep_rcu_enabled() && !__warned && !(c)) {	\
+			__warned = true;				\
+			lockdep_rcu_dereference(__FILE__, __LINE__);	\
+		}							\
+	} while (0)
+
 /**
  * rcu_dereference_check - rcu_dereference with debug checking
  * @p: The pointer to read, prior to dereferencing
@@ -219,8 +228,7 @@ static inline int rcu_read_lock_sched_held(void)
  */
 #define rcu_dereference_check(p, c) \
 	({ \
-		if (debug_lockdep_rcu_enabled() && !(c)) \
-			lockdep_rcu_dereference(__FILE__, __LINE__); \
+		__do_rcu_dereference_check(c); \
 		rcu_dereference_raw(p); \
 	})
 
@@ -237,8 +245,7 @@ static inline int rcu_read_lock_sched_held(void)
  */
 #define rcu_dereference_protected(p, c) \
 	({ \
-		if (debug_lockdep_rcu_enabled() && !(c)) \
-			lockdep_rcu_dereference(__FILE__, __LINE__); \
+		__do_rcu_dereference_check(c); \
 		(p); \
 	})
 
diff --git a/kernel/lockdep.c ...
From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:02 pm

From: Peter Zijlstra <peterz@infradead.org>

Add an RCU read-side critical section to suppress this false positive.

Located-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/sched.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..14c44ec 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p)
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 {
+	/*
+	 * Strictly speaking this rcu_read_lock() is not needed since the
+	 * task_group is tied to the cgroup, which in turn can never go away
+	 * as long as there are tasks attached to it.
+	 *
+	 * However since task_group() uses task_subsys_state() which is an
+	 * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
+	 */
+	rcu_read_lock();
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
 	p->se.parent = task_group(p)->se[cpu];
@@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 	p->rt.rt_rq  = task_group(p)->rt_rq[cpu];
 	p->rt.parent = task_group(p)->rt_se[cpu];
 #endif
+	rcu_read_unlock();
 }
 
 #else
-- 
1.7.0

--

From: tip-bot for Peter Zijlstra
Date: Friday, April 30, 2010 - 3:51 am

Commit-ID:  8b08ca52f5942c21564bbb90ccfb61053f2c26a1
Gitweb:     http://git.kernel.org/tip/8b08ca52f5942c21564bbb90ccfb61053f2c26a1
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 21 Apr 2010 13:02:07 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 30 Apr 2010 12:03:17 +0200

rcu: Fix RCU lockdep splat in set_task_cpu on fork path

Add an RCU read-side critical section to suppress this false
positive.

Located-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <1271880131-3951-1-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index de0bd26..3c2a54f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p)
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
 static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 {
+	/*
+	 * Strictly speaking this rcu_read_lock() is not needed since the
+	 * task_group is tied to the cgroup, which in turn can never go away
+	 * as long as there are tasks attached to it.
+	 *
+	 * However since task_group() uses task_subsys_state() which is an
+	 * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
+	 */
+	rcu_read_lock();
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
 	p->se.parent = task_group(p)->se[cpu];
@@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, ...
From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:02 pm

Add an RCU read-side critical section to suppress this false positive.

Located-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup_freezer.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index da5e139..e5c0244 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	 * No lock is needed, since the task isn't on tasklist yet,
 	 * so it can't be moved to another cgroup, which means the
 	 * freezer won't be removed and will be valid during this
-	 * function call.
+	 * function call.  Nevertheless, apply RCU read-side critical
+	 * section to suppress RCU lockdep false positives.
 	 */
+	rcu_read_lock();
 	freezer = task_freezer(task);
+	rcu_read_unlock();
 
 	/*
 	 * The root cgroup is non-freezable, so we can skip the
-- 
1.7.0

--

From: tip-bot for Paul E. McKenney
Date: Friday, April 30, 2010 - 3:51 am

Commit-ID:  8b46f880841aac821af8efa6581bb0e46b8b9845
Gitweb:     http://git.kernel.org/tip/8b46f880841aac821af8efa6581bb0e46b8b9845
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 21 Apr 2010 13:02:08 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 30 Apr 2010 12:03:17 +0200

rcu: Fix RCU lockdep splat on freezer_fork path

Add an RCU read-side critical section to suppress this false
positive.

Located-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <1271880131-3951-2-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/cgroup_freezer.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index da5e139..e5c0244 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	 * No lock is needed, since the task isn't on tasklist yet,
 	 * so it can't be moved to another cgroup, which means the
 	 * freezer won't be removed and will be valid during this
-	 * function call.
+	 * function call.  Nevertheless, apply RCU read-side critical
+	 * section to suppress RCU lockdep false positives.
 	 */
+	rcu_read_lock();
 	freezer = task_freezer(task);
+	rcu_read_unlock();
 
 	/*
 	 * The root cgroup is non-freezable, so we can skip the
--

From: Paul E. McKenney
Date: Wednesday, April 21, 2010 - 1:02 pm

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 fs/nfs/delegation.c |   42 ++++++++++++++++++++++++++++--------------
 1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 1567124..8d9ec49 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -129,21 +129,35 @@ again:
  */
 void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res)
 {
-	struct nfs_delegation *delegation = NFS_I(inode)->delegation;
-	struct rpc_cred *oldcred;
+	struct nfs_delegation *delegation;
+	struct rpc_cred *oldcred = NULL;
 
-	if (delegation == NULL)
-		return;
-	memcpy(delegation->stateid.data, res->delegation.data,
-			sizeof(delegation->stateid.data));
-	delegation->type = res->delegation_type;
-	delegation->maxsize = res->maxsize;
-	oldcred = delegation->cred;
-	delegation->cred = get_rpccred(cred);
-	clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
-	NFS_I(inode)->delegation_state = delegation->type;
-	smp_wmb();
-	put_rpccred(oldcred);
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation != NULL) {
+		spin_lock(&delegation->lock);
+		if (delegation->inode != NULL) {
+			memcpy(delegation->stateid.data, res->delegation.data,
+			       sizeof(delegation->stateid.data));
+			delegation->type = res->delegation_type;
+			delegation->maxsize = res->maxsize;
+			oldcred = delegation->cred;
+			delegation->cred = get_rpccred(cred);
+			clear_bit(NFS_DELEGATION_NEED_RECLAIM,
+				  &delegation->flags);
+			NFS_I(inode)->delegation_state = ...
From: Mathieu Desnoyers
Date: Friday, April 30, 2010 - 8:33 am

I recommend creating a kernel command line parameter that would tweak
the number of messages printed by lockdep. The default would indeed by 1
message, but people in a debugging marathon can specify a larger value
so they won't have to reboot between each individual lockdep error.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Friday, April 30, 2010 - 11:02 am

The RCU-lockdep splats are a bit different in nature than the
deadlock-related splats that lockdep normally prints.  The RCU-lockdep
splats are transient in nature, and it is easy to apply WARN_ON_ONCE().
In contrast, if you permit multiple deadlock-related lockdep splats,
you tend to get lots of warnings about the same deadlock cycle.

So how about an additional kernel configuration variable, default
disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a
single boot to see multiple messages?  Unlike the dyntick-idle
WARN_ON()s that generated multi-gigabyte console logs in a great
hurry, I haven't yet seen excessive quantities of RCU-lockdep splats,
so I don't see the need for an integer limit.

Thoughts?

							Thanx, Paul
--

From: Mathieu Desnoyers
Date: Friday, April 30, 2010 - 11:12 am

Ideally we don't want to flood the console with thousands of instances
of the same RCU-lockdep splat (think of a missing read lock on a common
code path). Therefore I think keeping an integer limit is relevant here.
I agree that this integer limit could be selected by a CONFIG_ option
rather than by a kernel parameter, as it will typically only be used on
development kernels with "kernel hacking" enabled anyway. There is not
much point in bloating the kernel code with an extra debug-only kernel
parameter parsing.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Paul E. McKenney
Date: Friday, April 30, 2010 - 11:32 am

We already limit via WARN_ON_ONCE(), and there are fewer than 500 lines
of code in the kernel that can give RCU lockdep splats, so I really believe
that we are OK without an overall limit for the foreseeable future.

							Thanx, Paul
--

From: Mathieu Desnoyers
Date: Friday, April 30, 2010 - 12:09 pm

Your idea makes sense then.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Valdis.Kletnieks
Date: Friday, April 30, 2010 - 9:16 am

Yeah, that would rock for development kernels - playing whack-a-mole with
a half-dozen new lockdep whinges can easily stretch out for quite some time.
From: Ingo Molnar
Date: Friday, April 30, 2010 - 3:07 am

Hm, this #3 patch i disagree with quite fundamentally: one of the big virtues 
of lockdep is that it complains only once and then shuts up and lets the 
system work. It allows distro debug kernels to have lockdep enabled, etc.

One bugreport per bootup per user is the most we can expect really. Not 
disabling it risks getting a stream of repeat messages, annoyed testers and 
gives us _less_ bugreports in the end.

Also, often the _first_ warning is the most reliable one - sometimes there's 
interactions, and the first bug causing a second warning as well, etc. So 
reporting just the highest-quality (i.e. first) issue we detect is the best 
approach.

Thanks,

	Ingo
--

From: Paul E. McKenney
Date: Friday, April 30, 2010 - 4:47 pm

Good point -- I will forward them on to Trond.

							Thanx, Paul
--

Previous thread: [PATCH -tip 1/4] [BUGFIX] perf tools: Initialize dso->node member in dso__new by Masami Hiramatsu on Wednesday, April 21, 2010 - 12:56 pm. (4 messages)

Next thread: Another BUG_ON() in blkio controller by Vivek Goyal on Wednesday, April 21, 2010 - 1:12 pm. (2 messages)