Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

Previous thread: [RFC PATCHSET block#for-2.6.36-post] block: convert to REQ_FLUSH/FUA by Tejun Heo on Monday, August 16, 2010 - 9:51 am. (27 messages)

Next thread: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? by Steven Rostedt on Monday, August 16, 2010 - 10:07 am. (13 messages)
From: Vegard Nossum
Date: Monday, August 16, 2010 - 9:57 am

Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
 #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
 #1:  (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
 [<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
 [<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
 [<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
 [<ffffffff8103a29d>] do_wait+0xdb/0x200
 [<ffffffff8103a45b>] sys_wait4+0x99/0xd3
 [<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
 [<ffffffff8109be66>] ? might_fault+0x96/0xbc
 [<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
 [<ffffffff8109be83>] ? might_fault+0xb3/0xbc
 [<ffffffff8109be38>] ? might_fault+0x68/0xbc
 [<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8102862c>] sys32_waitpid+0xb/0xd
 [<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
 [<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f

I don't think it's related to the VFS patches, however, so I added
some other (maybe relevant) CCs too.

At the time I was simply invoking gcc from bash.


Vegard

* git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
branch vfs-scale
--

From: Vegard Nossum
Date: Monday, August 16, 2010 - 10:01 am

Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.

Vegard


---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>


Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
 #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
 #1:  (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
 [<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
 [<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
 [<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
 [<ffffffff8103a29d>] do_wait+0xdb/0x200
 [<ffffffff8103a45b>] sys_wait4+0x99/0xd3
 [<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
 [<ffffffff8109be66>] ? might_fault+0x96/0xbc
 [<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
 [<ffffffff8109be83>] ? might_fault+0xb3/0xbc
 [<ffffffff8109be38>] ? might_fault+0x68/0xbc
 [<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8102862c>] sys32_waitpid+0xb/0xd
 [<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
 [<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f

I don't think it's ...
From: Vegard Nossum
Date: Monday, August 16, 2010 - 10:03 am

On the third try it will hopefully work.


Vegard

---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 19:01
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious
rcu_dereference_check() usage (kernel/exit.c:1387)
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>


Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.

Vegard


---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>


Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
 #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
 #1:  (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
 [<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
 [<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
 [<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
 [<ffffffff8103a29d>] do_wait+0xdb/0x200
 [<ffffffff8103a45b>] ...
From: Paul E. McKenney
Date: Monday, August 16, 2010 - 1:21 pm

Thank you for locating this!

OK, so we aren't holding rcu_read_lock(), nor is this task dead, so
something needs some help.

We get from wait_consider_task() as follows: wait_consider_task()
calls security_task_wait(), which calls either selinux_task_wait() or
smack_task_wait().  Let's start with selinux_task_wait(), which calls
task_has_perm(), which does a couple of __task_cred(), but does so
under rcu_read_lock() protection, so presumably we didn't come in
that way.

Dave, am I missing a code path somewhere here?

--

From: David Howells
Date: Monday, August 16, 2010 - 1:01 pm

The attached patch should fix this.

David
---
From: Daniel J Blueman <daniel.blueman@gmail.com>
Subject: [PATCH] Fix unprotected access to task credentials in waitid()

Using a program like the following:

	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [<ffffffff81061ce5>] do_wait+0xc5/0x310
	 #1:  (&(&sighand->siglock)->rlock){-.-...}, at: [<ffffffff810611da>]
	wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [<ffffffff81095da4>] lockdep_rcu_dereference+0xa4/0xc0
	 [<ffffffff81061b31>] wait_consider_task+0xaf1/0xbe0
	 [<ffffffff81061d15>] do_wait+0xf5/0x310
	 [<ffffffff810620b6>] sys_waitid+0x86/0x1f0
	 [<ffffffff8105fce0>] ? child_wait_callback+0x0/0x70
	 [<ffffffff81003282>] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID ...
From: Oleg Nesterov
Date: Tuesday, August 17, 2010 - 11:09 am

Acked-by: Oleg Nesterov <oleg@redhat.com>


Vegard, perhaps you can take care of wait_task_zombie() too?
Of course, if you wish.

Like wait_task_stopped(), it doesn't really need RCU lock. But if
we change it to use task_uid() to shut up rcu_dereference_check(),
then we can kill task_is_dead().

Oleg.

--

Previous thread: [RFC PATCHSET block#for-2.6.36-post] block: convert to REQ_FLUSH/FUA by Tejun Heo on Monday, August 16, 2010 - 9:51 am. (27 messages)

Next thread: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? by Steven Rostedt on Monday, August 16, 2010 - 10:07 am. (13 messages)