[PATCH tip/core/urgent] rcu: protect fork-time cgroup access

Previous thread: [PATCH -tip 0/7] perf-probe updates - data-structure support improvements, etc. by Masami Hiramatsu on Monday, March 29, 2010 - 1:37 pm. (13 messages)

Next thread: [Ocfs2-devel] [GIT PULL] ocfs2 fixes for 2.6.34-rc2 by Joel Becker on Monday, March 29, 2010 - 2:40 pm. (1 message)
From: Paul E. McKenney
Date: Monday, March 29, 2010 - 2:15 pm

Add an rcu_read_lock() / rcu_read_unlock() pair to protect a fork-time
cgroup access.  This seems likely to be a false positive.

Located by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 sched.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched.c b/kernel/sched.c
index 9ab3cd7..d4bb5e0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2621,7 +2621,9 @@ void sched_fork(struct task_struct *p, int clone_flags)
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 
+	rcu_read_lock();
 	set_task_cpu(p, cpu);
+	rcu_read_unlock();
 
 #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
 	if (likely(sched_info_on()))
--

From: Peter Zijlstra
Date: Monday, March 29, 2010 - 2:19 pm

What got accessed? This patch just looks wrong.

--

From: Peter Zijlstra
Date: Monday, March 29, 2010 - 2:26 pm

So the only cgroup thing I can find is set_task_rq()'s task_group()
usage, which does indeed look like it wants RCU, but then, why would
only the sched_fork() usage of set_task_cpu() need this.

If it's needed (possible) then set_task_rq() needs it unconditionally.

That said, the whole task_group stuff is tied to the cgroup muck, so it
shouldn't be possible for the task_group to disappear on us, but then, I
always sorta glaze over when I get near the cgroup core.

--

From: Paul E. McKenney
Date: Monday, March 29, 2010 - 3:24 pm

Hey, if it was easy, I might have gotten it right on the first try!  ;-)

							Thanx, Paul
--

From: Paul E. McKenney
Date: Monday, March 29, 2010 - 2:29 pm

It might well be wrong.  The lockdep RCU splat triggered in
task_subsys_state():

static inline struct cgroup_subsys_state *task_subsys_state(
	struct task_struct *task, int subsys_id)
{
	return rcu_dereference_check(task->cgroups->subsys[subsys_id],
				     rcu_read_lock_held() ||
				     cgroup_lock_is_held());
}

My thought was that this access was safe due to the fact that we were
in the middle of creating a task, so that no other CPU could access it.
But I am not certain of this, given the fact that this is digging through
cgroup state.

The lockdep splat is here: http://pastebin.ubuntu.com/406131/

Suggestions?

							Thanx, Paul
--

From: Paul E. McKenney
Date: Monday, March 29, 2010 - 2:34 pm

And it appears that my patch is at best insufficient:
http://paste.ubuntu.com/406189/

Left to myself, I would wrap copy_process() with rcu_read_lock(),
but I would rather hear your thoughts before doing too much more
semi-random hacking.  ;-)

						Thanx, Paul
--

From: Peter Zijlstra
Date: Monday, March 29, 2010 - 2:42 pm

Well, I don't think you can get away with that, copy_process() wants to
sleep on quite a few places ;-) Also, locks should be taken at the
smallest possible scope, unless we want to go back to BKL style
locking :-)

As to that freezer splat, you'd have to chase down the cgroup folks, I'm
fully ignorant on that.

For the set_task_rq() one, I'm afraid someone (which again I'm afraid
will be me) will have to look into how the task_group muck ties into the
cgroup muck as I think the original authors of that ran off :/



--

From: Paul E. McKenney
Date: Monday, March 29, 2010 - 2:51 pm

K, adding them to CC.  The two splats are:

	http://pastebin.ubuntu.com/406131/
	http://paste.ubuntu.com/406189/

Some additional RCU protection is required, or perhaps some suppression

Sigh!

							Thanx, Paul
--

From: Paul Menage
Date: Monday, March 29, 2010 - 3:43 pm

On Mon, Mar 29, 2010 at 2:15 PM, Paul E. McKenney

I think you're right that this is a false positive - it would only be
a problem if it were possible for the task to be moved to a different
cgroup, and I think that shouldn't be the case at this point in the
fork path since the new process isn't visible on the tasklist yet,
right?

Paul
--

From: Paul E. McKenney
Date: Monday, March 29, 2010 - 4:05 pm

You are correct, it is not yet on the tasklist.

So I have to ask...  What happens if the underlying cgroup is removed
between the time sched_fork() calls set_task_cpu() and the time that
copy_process() puts the new task on the tasklist?  Or is the initial
cgroup guaranteed to be immortal?

							Thanx, Paul
--

From: Paul Menage
Date: Tuesday, March 30, 2010 - 11:57 am

On Mon, Mar 29, 2010 at 4:05 PM, Paul E. McKenney

As long as your code is running after cgroup_fork() - which
sched_fork() is - then it should be OK. cgroup_fork() takes a
reference count on the parent's cgroups set, which implicitly keeps
all of those cgroups alive until the task either exits or is moved.
But it can't be moved until it's visible on the task list.

Possibly dup_task_struct should do tsk->cgroups = NULL so that
(currently) unsafe references to the un-refcounted tsk->cgroups before
cgroup_fork() get caught.

Paul
--

From: Peter Zijlstra
Date: Tuesday, March 30, 2010 - 1:50 am

Well the thing is, this fork time invocation of
set_task_cpu()->set_task_rq() is in no way special, there's multiple
places like that.


--

From: Paul E. McKenney
Date: Tuesday, March 30, 2010 - 10:28 am

Certainly the lack of clarity about why this access is safe indicates
that recording some of the locking design in rcu_dereferenceI() would
be quite helpful.  ;-)

							Thanx, Paul
--

Previous thread: [PATCH -tip 0/7] perf-probe updates - data-structure support improvements, etc. by Masami Hiramatsu on Monday, March 29, 2010 - 1:37 pm. (13 messages)

Next thread: [Ocfs2-devel] [GIT PULL] ocfs2 fixes for 2.6.34-rc2 by Joel Becker on Monday, March 29, 2010 - 2:40 pm. (1 message)