Hi,
I was testing your VFS scaling tree*, but I get the following in my dmesg:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
#0: (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
#1: (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3
stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
[<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
[<ffffffff8103a29d>] do_wait+0xdb/0x200
[<ffffffff8103a45b>] sys_wait4+0x99/0xd3
[<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
[<ffffffff8109be66>] ? might_fault+0x96/0xbc
[<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
[<ffffffff8109be83>] ? might_fault+0xb3/0xbc
[<ffffffff8109be38>] ? might_fault+0x68/0xbc
[<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff8102862c>] sys32_waitpid+0xb/0xd
[<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
[<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f
I don't think it's related to the VFS patches, however, so I added
some other (maybe relevant) CCs too.
At the time I was simply invoking gcc from bash.
Vegard
* git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
branch vfs-scale
--
Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.
Vegard
---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>
Hi,
I was testing your VFS scaling tree*, but I get the following in my dmesg:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
#0: (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
#1: (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3
stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
[<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
[<ffffffff8103a29d>] do_wait+0xdb/0x200
[<ffffffff8103a45b>] sys_wait4+0x99/0xd3
[<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
[<ffffffff8109be66>] ? might_fault+0x96/0xbc
[<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
[<ffffffff8109be83>] ? might_fault+0xb3/0xbc
[<ffffffff8109be38>] ? might_fault+0x68/0xbc
[<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff8102862c>] sys32_waitpid+0xb/0xd
[<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
[<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f
I don't think it's ...On the third try it will hopefully work.
Vegard
---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 19:01
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious
rcu_dereference_check() usage (kernel/exit.c:1387)
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>
Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.
Vegard
---------- Forwarded message ----------
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, David Howells
<dhowells@redhat.com>, "Paul E. McKenney"
<paulmck@linux.vnet.ibm.com>, LKML <linux-kernel@vger.kernel.org>
Hi,
I was testing your VFS scaling tree*, but I get the following in my dmesg:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
#0: (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
#1: (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3
stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
[<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
[<ffffffff8103a29d>] do_wait+0xdb/0x200
[<ffffffff8103a45b>] ...Thank you for locating this! OK, so we aren't holding rcu_read_lock(), nor is this task dead, so something needs some help. We get from wait_consider_task() as follows: wait_consider_task() calls security_task_wait(), which calls either selinux_task_wait() or smack_task_wait(). Let's start with selinux_task_wait(), which calls task_has_perm(), which does a couple of __task_cred(), but does so under rcu_read_lock() protection, so presumably we didn't come in that way. Dave, am I missing a code path somewhere here? --
The attached patch should fix this.
David
---
From: Daniel J Blueman <daniel.blueman@gmail.com>
Subject: [PATCH] Fix unprotected access to task credentials in waitid()
Using a program like the following:
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main() {
id_t id;
siginfo_t infop;
pid_t res;
id = fork();
if (id == 0) { sleep(1); exit(0); }
kill(id, SIGSTOP);
alarm(1);
waitid(P_PID, id, &infop, WCONTINUED);
return 0;
}
to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1460 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by waitid02/22252:
#0: (tasklist_lock){.?.?..}, at: [<ffffffff81061ce5>] do_wait+0xc5/0x310
#1: (&(&sighand->siglock)->rlock){-.-...}, at: [<ffffffff810611da>]
wait_consider_task+0x19a/0xbe0
stack backtrace:
Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
Call Trace:
[<ffffffff81095da4>] lockdep_rcu_dereference+0xa4/0xc0
[<ffffffff81061b31>] wait_consider_task+0xaf1/0xbe0
[<ffffffff81061d15>] do_wait+0xf5/0x310
[<ffffffff810620b6>] sys_waitid+0x86/0x1f0
[<ffffffff8105fce0>] ? child_wait_callback+0x0/0x70
[<ffffffff81003282>] system_call_fastpath+0x16/0x1b
This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.
Furthermore, protect wait_task_stopped() in the same way.
We don't need to keep holding the RCU read lock once we've read the UID ...Acked-by: Oleg Nesterov <oleg@redhat.com> Vegard, perhaps you can take care of wait_task_zombie() too? Of course, if you wish. Like wait_task_stopped(), it doesn't really need RCU lock. But if we change it to use task_uid() to shut up rcu_dereference_check(), then we can kill task_is_dead(). Oleg. --
