Hi,
I just ran the following on -rc4 with the latest pending sparc pull
request manually merged (no -tip bits):
perf record -a -f -g sleep 5
And I got this warning:
[ 2235.846071] ===================================================
[ 2235.857419] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 2235.863127] ---------------------------------------------------
[ 2235.868734] kernel/perf_event.c:2232 invoked rcu_dereference_check() without protection!
[ 2235.879843]
[ 2235.879848] other info that might help us debug this:
[ 2235.879855]
[ 2235.895659]
[ 2235.895663] rcu_scheduler_active = 1, debug_locks = 0
[ 2235.905505] 1 lock held by swapper/0:
[ 2235.910333] #0: (&ctx->lock){-.....}, at: [<00000000004d9e00>] __perf_event_enable+0x60/0x1e8
[ 2235.920430]
[ 2235.920434] stack backtrace:
[ 2235.929691] Call Trace:
[ 2235.934143] [000000000049fca0] lockdep_rcu_dereference+0x88/0x98
[ 2235.938700] [00000000004d9304] perf_event_update_userpage+0x74/0x16c
[ 2235.943221] [000000000044c148] sparc_perf_event_set_period+0xb4/0xc4
[ 2235.947593] [000000000044c238] perf_event_nmi_handler+0xe0/0x190
[ 2235.951801] [0000000000494924] notifier_call_chain+0x2c/0x84
[ 2235.956112] [0000000000495008] __atomic_notifier_call_chain+0x54/0x9c
[ 2235.960493] [0000000000495070] atomic_notifier_call_chain+0x20/0x34
[ 2235.964830] [00000000004950a8] notify_die+0x24/0x38
[ 2235.969175] [000000000043a650] perfctr_irq+0xa4/0x3f4
[ 2235.973493] [00000000004209f4] tl0_irq15+0x14/0x20
[ 2235.977744] [0000000000406920] sun4v_niagara2_setperf+0x8/0x28
[ 2235.982265] [000000000044c448] hw_perf_enable+0x160/0x170
[ 2235.986494] [00000000004d89b4] perf_enable+0x3c/0x4c
[ 2235.990513] [00000000004d9f2c] __perf_event_enable+0x18c/0x1e8
[ 2235.994665] [00000000004aad5c] generic_smp_call_function_single_interrupt+0x114/0x16c
[ 2236.002975] [0000000000764770] smp_call_function_single_client+0x20/0x30
This is weird as everything looks fine there, it seems we have
a well ...As far as lockdep is concerned, we are not in an RCU read-side critical Very strange indeed. One question -- is it possible that your kernel does not match your source code? I freely admit that it is extremely unlikely that such a mismatch would happen to land on an rcu_dereference(), but... Another unlikely possibility is that an interrupt occurred between the rcu_read_lock() and the rcu_dereference(), and that this interrupt Color me confused!!! --
No, for example I just found the same problem in x86 in -tip:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/perf_event.c:2236 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by perf/3466:
#0: (&ctx->mutex){+.+...}, at: [<c10bc567>] sys_perf_event_open+0x2a7/0x420
#1: (&ctx->lock){-.....}, at: [<c10b940f>] __perf_install_in_context+0x6f/0x160
stack backtrace:
Pid: 3466, comm: perf Not tainted 2.6.34-rc3-atom #411
Call Trace:
[<c150f95f>] ? printk+0x1d/0x1f
[<c1075f8a>] lockdep_rcu_dereference+0xaa/0xb0
[<c10b8c01>] perf_event_update_userpage+0x151/0x190
[<c10b8ab0>] ? perf_event_update_userpage+0x0/0x190
[<c1010931>] x86_perf_event_set_period+0x101/0x1d0
[<c1010cf2>] intel_pmu_save_and_restart+0x12/0x20
[<c1013743>] intel_pmu_handle_irq+0x1d3/0x4e0
[<c1069b08>] ? sched_clock_cpu+0x128/0x170
[<c1074e8b>] ? trace_hardirqs_off+0xb/0x10
[<c1069b9f>] ? cpu_clock+0x4f/0x60
[<c1074e8b>] ? trace_hardirqs_off+0xb/0x10
[<c1069b9f>] ? cpu_clock+0x4f/0x60
[<c1078105>] ? __lock_acquire+0x1c5/0x1900
[<c1069942>] ? sched_clock_local+0xd2/0x170
[<c100f180>] perf_event_nmi_handler+0x40/0x50
[<c1068885>] notifier_call_chain+0x35/0x70
[<c1068eec>] __atomic_notifier_call_chain+0x6c/0xb0
[<c1068e80>] ? __atomic_notifier_call_chain+0x0/0xb0
[<c1068f4f>] atomic_notifier_call_chain+0x1f/0x30
[<c1068f8d>] notify_die+0x2d/0x30
[<c100428c>] do_nmi+0x16c/0x350
[<c1074f36>] ? lock_release_holdtime+0xa6/0x1a0
[<c151458d>] nmi_stack_correct+0x28/0x2d
[<c10104cc>] ? intel_pmu_enable_all+0x8c/0x110
[<c1010c5a>] hw_perf_enable+0x1ba/0x240
[<c10b7df5>] perf_enable+0x25/0x30
[<c10b94b7>] __perf_install_in_context+0x117/0x160
[<c10807f6>] smp_call_function_single+0x76/0x170
[<c10b93a0>] ? __perf_install_in_context+0x0/0x160
[<c10bb34d>] ...From: Frederic Weisbecker <fweisbec@gmail.com> Yeah, I bet that's it too. lock_is_held() can't return anything meaningful while lockdep is disabled, which it is during NMIs. --
Ah! So I just need to add a "current->lockdep_recursion"
check to debug_lockdep_rcu_enabled(). And move the function to
kernel/rcutree_plugin.h to avoid #include hell.
See below for (untested) patch.
Thanx, Paul
------------------------------------------------------------------------
include/linux/rcupdate.h | 5 +----
kernel/rcutree_plugin.h | 11 +++++++++++
2 files changed, 12 insertions(+), 4 deletions(-)
commit 304d8da6cd791a81ce3164f867e1b3ef4f9af1d1
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Tue Apr 13 18:45:51 2010 -0700
rcu: Make RCU lockdep check the lockdep_recursion variable
The lockdep facility temporarily disables lockdep checking by incrementing
the current->lockdep_recursion variable. Such disabling happens in NMIs
and in other situations where lockdep might expect to recurse on itself.
This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9f1ddfe..07db2fe 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
# define rcu_read_release_sched() \
lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
-static inline int debug_lockdep_rcu_enabled(void)
-{
- return likely(rcu_scheduler_active && debug_locks);
-}
+extern int debug_lockdep_rcu_enabled(void);
/**
* rcu_read_lock_held - might we be in RCU read-side critical section?
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 79b53bd..2169abe 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1067,3 +1067,14 @@ static void rcu_needs_cpu_flush(void)
}
#endif /* #else #if ...From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Looks good to me, Frederic please test :-) --
Good point. And to add injury to insult, I forgot EXPORT_SYMBOL_GPL(). Updated patch in the works. Thanx, Paul --
Note I just tested the patch the previous one and it looks fine now. You can then safely consider the "general idea" fixes the problem :) Thanks. --
Thank you, Frederic!!! Thanx, Paul --
And here is what I hope is the official fix.
Could you please test it?
Thanx, Paul
------------------------------------------------------------------------
commit 9be39c445a41e458d53cf9a57d25dbfa4b74c970
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Tue Apr 13 18:45:51 2010 -0700
rcu: Make RCU lockdep check the lockdep_recursion variable
The lockdep facility temporarily disables lockdep checking by incrementing
the current->lockdep_recursion variable. Such disabling happens in NMIs
and in other situations where lockdep might expect to recurse on itself.
This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero. In addition, this patch
removes the "likely()", as suggested by Lai Jiangshan.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9f1ddfe..07db2fe 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -101,10 +101,7 @@ extern struct lockdep_map rcu_sched_lock_map;
# define rcu_read_release_sched() \
lock_release(&rcu_sched_lock_map, 1, _THIS_IP_)
-static inline int debug_lockdep_rcu_enabled(void)
-{
- return likely(rcu_scheduler_active && debug_locks);
-}
+extern int debug_lockdep_rcu_enabled(void);
/**
* rcu_read_lock_held - might we be in RCU read-side critical section?
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 63fe254..03a7ea1 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -69,6 +69,13 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
+int debug_lockdep_rcu_enabled(void)
+{
+ return rcu_scheduler_active && debug_locks &&
+ current->lockdep_recursion == 0;
+}
+EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
+
/**
* ...Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Thanks! --
Thank you, Frederic! Thanx, Paul --
