Hello! This patchset contains four RCU lockdep splat fixes, courtesy of David Howells, Peter Zijlstra, and Trond Myklebust, as well as an enhancement by Lai Jiangshan that permits collecting more than one RCU lockdep splat per boot. Thanx, Paul ------------------------------------------------------------------------ b/fs/nfs/delegation.c | 42 ++++++++++++++++++++++++++++-------------- b/include/linux/rcupdate.h | 15 +++++++++++---- b/kernel/cgroup_freezer.c | 5 ++++- b/kernel/lockdep.c | 2 -- b/kernel/sched.c | 10 ++++++++++ fs/nfs/delegation.c | 44 +++++++++++++++++++++++--------------------- 6 files changed, 76 insertions(+), 42 deletions(-) --
From: David Howells <dhowells@redhat.com>
Fix a number of RCU issues in the NFSv4 delegation code.
(1) delegation->cred doesn't need to be RCU protected as it's essentially an
invariant refcounted structure.
By the time we get to nfs_free_delegation(), the delegation is being
released, so no one else should be attempting to use the saved
credentials, and they can be cleared.
However, since the list of delegations could still be under traversal at
this point by such as nfs_client_return_marked_delegations(), the cred
should be released in nfs_do_free_delegation() rather than in
nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is
insufficient as that doesn't stop the cred from being destroyed, and nor
does calling put_rpccred() after call_rcu(), given that the latter is
asynchronous.
(2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
rcu_derefence_protected() because they can only be called if
nfs_client::cl_lock is held, and that guards against anyone changing
nfsi->delegation under it. Furthermore, the barrier imposed by
rcu_dereference() is superfluous, given that the spin_lock() is also a
barrier.
(3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
struct so that it can issue lockdep advice based on clp->cl_lock for (2).
(4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
should use rcu_access_pointer() outside the spinlocked region as they
merely examine the pointer and don't follow it, thus rendering unnecessary
the need to impose a partial ordering over the one item of interest.
These result in an RCU warning like the following:
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!
other info that might help us debug ...From: Lai Jiangshan <laijs@cn.fujitsu.com>
There is no need to disable lockdep after an RCU lockdep splat, so
remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the
inlined rcu_dereference_check() and rcu_dereference_protected()
macros so that a given instance splats only once, but so that
multiple instances can be detected per boot.
Requested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/rcupdate.h | 15 +++++++++++----
kernel/lockdep.c | 2 --
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 07db2fe..ec9ab49 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -190,6 +190,15 @@ static inline int rcu_read_lock_sched_held(void)
#ifdef CONFIG_PROVE_RCU
+#define __do_rcu_dereference_check(c) \
+ do { \
+ static bool __warned; \
+ if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
+ __warned = true; \
+ lockdep_rcu_dereference(__FILE__, __LINE__); \
+ } \
+ } while (0)
+
/**
* rcu_dereference_check - rcu_dereference with debug checking
* @p: The pointer to read, prior to dereferencing
@@ -219,8 +228,7 @@ static inline int rcu_read_lock_sched_held(void)
*/
#define rcu_dereference_check(p, c) \
({ \
- if (debug_lockdep_rcu_enabled() && !(c)) \
- lockdep_rcu_dereference(__FILE__, __LINE__); \
+ __do_rcu_dereference_check(c); \
rcu_dereference_raw(p); \
})
@@ -237,8 +245,7 @@ static inline int rcu_read_lock_sched_held(void)
*/
#define rcu_dereference_protected(p, c) \
({ \
- if (debug_lockdep_rcu_enabled() && !(c)) \
- lockdep_rcu_dereference(__FILE__, __LINE__); \
+ __do_rcu_dereference_check(c); \
(p); \
})
diff --git a/kernel/lockdep.c ...From: Peter Zijlstra <peterz@infradead.org>
Add an RCU read-side critical section to suppress this false positive.
Located-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
kernel/sched.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..14c44ec 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p)
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
{
+ /*
+ * Strictly speaking this rcu_read_lock() is not needed since the
+ * task_group is tied to the cgroup, which in turn can never go away
+ * as long as there are tasks attached to it.
+ *
+ * However since task_group() uses task_subsys_state() which is an
+ * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
+ */
+ rcu_read_lock();
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
p->se.parent = task_group(p)->se[cpu];
@@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
p->rt.rt_rq = task_group(p)->rt_rq[cpu];
p->rt.parent = task_group(p)->rt_se[cpu];
#endif
+ rcu_read_unlock();
}
#else
--
1.7.0
--
Commit-ID: 8b08ca52f5942c21564bbb90ccfb61053f2c26a1 Gitweb: http://git.kernel.org/tip/8b08ca52f5942c21564bbb90ccfb61053f2c26a1 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Wed, 21 Apr 2010 13:02:07 -0700 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 30 Apr 2010 12:03:17 +0200 rcu: Fix RCU lockdep splat in set_task_cpu on fork path Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index de0bd26..3c2a54f 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p) /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */ static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { + /* + * Strictly speaking this rcu_read_lock() is not needed since the + * task_group is tied to the cgroup, which in turn can never go away + * as long as there are tasks attached to it. + * + * However since task_group() uses task_subsys_state() which is an + * rcu_dereference() user, this quiets CONFIG_PROVE_RCU. + */ + rcu_read_lock(); #ifdef CONFIG_FAIR_GROUP_SCHED p->se.cfs_rq = task_group(p)->cfs_rq[cpu]; p->se.parent = task_group(p)->se[cpu]; @@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, ...
Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> --- kernel/cgroup_freezer.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c index da5e139..e5c0244 100644 --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task) * No lock is needed, since the task isn't on tasklist yet, * so it can't be moved to another cgroup, which means the * freezer won't be removed and will be valid during this - * function call. + * function call. Nevertheless, apply RCU read-side critical + * section to suppress RCU lockdep false positives. */ + rcu_read_lock(); freezer = task_freezer(task); + rcu_read_unlock(); /* * The root cgroup is non-freezable, so we can skip the -- 1.7.0 --
Commit-ID: 8b46f880841aac821af8efa6581bb0e46b8b9845 Gitweb: http://git.kernel.org/tip/8b46f880841aac821af8efa6581bb0e46b8b9845 Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> AuthorDate: Wed, 21 Apr 2010 13:02:08 -0700 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 30 Apr 2010 12:03:17 +0200 rcu: Fix RCU lockdep splat on freezer_fork path Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/cgroup_freezer.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c index da5e139..e5c0244 100644 --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task) * No lock is needed, since the task isn't on tasklist yet, * so it can't be moved to another cgroup, which means the * freezer won't be removed and will be valid during this - * function call. + * function call. Nevertheless, apply RCU read-side critical + * section to suppress RCU lockdep false positives. */ + rcu_read_lock(); freezer = task_freezer(task); + rcu_read_unlock(); /* * The root cgroup is non-freezable, so we can skip the --
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
fs/nfs/delegation.c | 42 ++++++++++++++++++++++++++++--------------
1 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 1567124..8d9ec49 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -129,21 +129,35 @@ again:
*/
void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res)
{
- struct nfs_delegation *delegation = NFS_I(inode)->delegation;
- struct rpc_cred *oldcred;
+ struct nfs_delegation *delegation;
+ struct rpc_cred *oldcred = NULL;
- if (delegation == NULL)
- return;
- memcpy(delegation->stateid.data, res->delegation.data,
- sizeof(delegation->stateid.data));
- delegation->type = res->delegation_type;
- delegation->maxsize = res->maxsize;
- oldcred = delegation->cred;
- delegation->cred = get_rpccred(cred);
- clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
- NFS_I(inode)->delegation_state = delegation->type;
- smp_wmb();
- put_rpccred(oldcred);
+ rcu_read_lock();
+ delegation = rcu_dereference(NFS_I(inode)->delegation);
+ if (delegation != NULL) {
+ spin_lock(&delegation->lock);
+ if (delegation->inode != NULL) {
+ memcpy(delegation->stateid.data, res->delegation.data,
+ sizeof(delegation->stateid.data));
+ delegation->type = res->delegation_type;
+ delegation->maxsize = res->maxsize;
+ oldcred = delegation->cred;
+ delegation->cred = get_rpccred(cred);
+ clear_bit(NFS_DELEGATION_NEED_RECLAIM,
+ &delegation->flags);
+ NFS_I(inode)->delegation_state = ...I recommend creating a kernel command line parameter that would tweak the number of messages printed by lockdep. The default would indeed by 1 message, but people in a debugging marathon can specify a larger value so they won't have to reboot between each individual lockdep error. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
The RCU-lockdep splats are a bit different in nature than the deadlock-related splats that lockdep normally prints. The RCU-lockdep splats are transient in nature, and it is easy to apply WARN_ON_ONCE(). In contrast, if you permit multiple deadlock-related lockdep splats, you tend to get lots of warnings about the same deadlock cycle. So how about an additional kernel configuration variable, default disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a single boot to see multiple messages? Unlike the dyntick-idle WARN_ON()s that generated multi-gigabyte console logs in a great hurry, I haven't yet seen excessive quantities of RCU-lockdep splats, so I don't see the need for an integer limit. Thoughts? Thanx, Paul --
Ideally we don't want to flood the console with thousands of instances of the same RCU-lockdep splat (think of a missing read lock on a common code path). Therefore I think keeping an integer limit is relevant here. I agree that this integer limit could be selected by a CONFIG_ option rather than by a kernel parameter, as it will typically only be used on development kernels with "kernel hacking" enabled anyway. There is not much point in bloating the kernel code with an extra debug-only kernel parameter parsing. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
We already limit via WARN_ON_ONCE(), and there are fewer than 500 lines of code in the kernel that can give RCU lockdep splats, so I really believe that we are OK without an overall limit for the foreseeable future. Thanx, Paul --
Your idea makes sense then. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
Yeah, that would rock for development kernels - playing whack-a-mole with a half-dozen new lockdep whinges can easily stretch out for quite some time.
Hm, this #3 patch i disagree with quite fundamentally: one of the big virtues of lockdep is that it complains only once and then shuts up and lets the system work. It allows distro debug kernels to have lockdep enabled, etc. One bugreport per bootup per user is the most we can expect really. Not disabling it risks getting a stream of repeat messages, annoyed testers and gives us _less_ bugreports in the end. Also, often the _first_ warning is the most reliable one - sometimes there's interactions, and the first bug causing a second warning as well, etc. So reporting just the highest-quality (i.e. first) issue we detect is the best approach. Thanks, Ingo --
Good point -- I will forward them on to Trond. Thanx, Paul --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo TX accounting. |
| Linux Kernel Mailing List | ixgbe: fix several counter register errata |
