The mm-of-the-moment snapshot 2010-08-11-16-10 has been uploaded to http://userweb.kernel.org/~akpm/mmotm/ and will soon be available at git://zen-kernel.org/kernel/mmotm.git It contains the following patches against ...
Throws a RCU complaint. Hopefully somebody on the cc: list knows what it is about...
[ 0.026136] CPU0: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz stepping 0a
[ 0.028399] NMI watchdog enabled, takes one hw-pmu counter.
[ 0.030019] lockdep: fixing up alternatives.
[ 0.031178]
[ 0.031179] ===================================================
[ 0.031182] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 0.031184] ---------------------------------------------------
[ 0.031187] kernel/sched.c:618 invoked rcu_dereference_check() without protection!
[ 0.031189]
[ 0.031189] other info that might help us debug this:
[ 0.031190]
[ 0.031192]
[ 0.031193] rcu_scheduler_active = 1, debug_locks = 1
[ 0.031195] 3 locks held by kworker/0:0/4:
[ 0.031197] #0: (events){+.+.+.}, at: [<ffffffff810504ca>] process_one_work+0x1b6/0x37d
[ 0.031210] #1: ((&c_idle.work)){+.+.+.}, at: [<ffffffff810504ca>] process_one_work+0x1b6/0x37d
[ 0.031217] #2: (&rq->lock){-.-...}, at: [<ffffffff81b5f9b8>] init_idle+0x2b/0x114
[ 0.031225]
[ 0.031226] stack backtrace:
[ 0.031229] Pid: 4, comm: kworker/0:0 Not tainted 2.6.35-mmotm0811 #1
[ 0.031232] Call Trace:
[ 0.031237] [<ffffffff810661eb>] lockdep_rcu_dereference+0x9d/0xa5
[ 0.031242] [<ffffffff8102b751>] task_group+0x7b/0x8a
[ 0.031246] [<ffffffff81b5f9b8>] ? init_idle+0x2b/0x114
[ 0.031250] [<ffffffff8102b775>] set_task_rq+0x15/0x6e
[ 0.031253] [<ffffffff81b5fa5e>] init_idle+0xd1/0x114
[ 0.031257] [<ffffffff81b5fb44>] fork_idle+0x8e/0x9d
[ 0.031261] [<ffffffff81b5de6f>] do_fork_idle+0x17/0x28
[ 0.031265] [<ffffffff8105052b>] process_one_work+0x217/0x37d
[ 0.031269] [<ffffffff810504ca>] ? process_one_work+0x1b6/0x37d
[ 0.031273] [<ffffffff81b5de58>] ? do_fork_idle+0x0/0x28
[ 0.031277] [<ffffffff81051775>] worker_thread+0x17e/0x251
[ 0.031281] [<ffffffff810515f7>] ? worker_thread+0x0/0x251
[ 0.031285] [<ffffffff8105544a>] ...Hello, Valdis!
Interesting! My first thought was that this is a false positive, given
that lockdep_is_held(&task_rq(p)->lock) is one of the arguments to
task_subsys_state_check() and thus to rcu_dereference_check(). However...
Given the "lockdep: fixing up alternatives" above, we know that cpu==1,
and that the code is running on CPU 0.
So init_idle() acquires the specified CPU's runqueue lock:
struct rq *rq = cpu_rq(cpu);
...
raw_spin_lock_irqsave(&rq->lock, flags);
Then init_idle() goes on to initialize a number of fields in the
new idle task's task structure, then calls __set_task_cpu() to set
up the new idle task on the specified CPU.
Now, __set_task_cpu() invokes set_task_rq(), which invokes task_group(),
which as mentioned before specifies lockdep_is_held(&task_rq(p)->lock)
as one of the splat-avoiding conditions. But the new idle task does
not yet have its current CPU set to CPU 1 -- that doesn't happen until
the end of __set_task_cpu(). Therefore, task_rq(p) will return 0.
So, if I am reading the code correctly, task_group() will be checking
for CPU 0's runqueue, when we are instead holding CPU 1's runqueue lock.
The patch below fixes this by acquiring both locks, as is done during
task migration. Untested, probably does not even compile.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
sched.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 70fa78d..81a6a0a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5314,9 +5314,11 @@ void __cpuinit init_idle_bootup_task(struct task_struct *idle)
void __cpuinit init_idle(struct task_struct *idle, int cpu)
{
struct rq *rq = cpu_rq(cpu);
+ struct rq *oldrq = task_rq(idle);
unsigned long flags;
- raw_spin_lock_irqsave(&rq->lock, flags);
+ local_irq_save(flags);
+ double_rq_lock(oldrq, rq);
__sched_fork(idle);
idle->state = TASK_RUNNING;
@@ -5329,7 +5331,8 @@ void __cpuinit init_idle(struct ...I'm still seeing this INFO message on my vanilla 2.6.36-rc kernel.
----------------------
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 16045 entries in 63 pages
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping 0a
NMI watchdog enabled, takes one hw-pmu counter.
lockdep: fixing up alternatives.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/sched.c:618 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by kworker/0:0/4:
#0: (events){+.+.+.}, at: [<ffffffff8106e78e>] process_one_work+0x12e/0x560
#1: ((&c_idle.work)){+.+.+.}, at: [<ffffffff8106e78e>]
process_one_work+0x12e/0x560
#2: (&rq->lock){......}, at: [<ffffffff814772c3>] init_idle+0x30/0x12c
stack backtrace:
Pid: 4, comm: kworker/0:0 Not tainted 2.6.36-rc6-00085-g6e34025 #1
Call Trace:
[<ffffffff81089c1b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff8103e7d5>] set_task_rq+0x2f5/0x300
[<ffffffff81477374>] init_idle+0xe1/0x12c
[<ffffffff81477769>] fork_idle+0x90/0x9f
[<ffffffff81048dfa>] ? enqueue_entity+0x13a/0x430
[<ffffffff81482789>] ? sub_preempt_count+0x59/0x60
[<ffffffff814752f5>] do_fork_idle+0x1c/0x2d
[<ffffffff8106e7fa>] process_one_work+0x19a/0x560
[<ffffffff8106e78e>] ? process_one_work+0x12e/0x560
[<ffffffff814752d9>] ? do_fork_idle+0x0/0x2d
[<ffffffff81070189>] worker_thread+0x169/0x340
[<ffffffff81070020>] ? worker_thread+0x0/0x340
[<ffffffff810752e6>] kthread+0xa6/0xb0
[<ffffffff81004014>] kernel_thread_helper+0x4/0x10
[<ffffffff8147efcb>] ? _raw_spin_unlock_irq+0x3b/0x60
[<ffffffff8147f600>] ? restore_args+0x0/0x30
[<ffffffff81075240>] ? kthread+0x0/0xb0
[<ffffffff81004010>] ? kernel_thread_helper+0x0/0x10
Booting Node 0, Processors ...Hello, Zdenek,
I believe that the following patch from Peter Z. should address this.
Thanx, Paul
------------------------------------------------------------------------
commit e3dd67d97b3c2aad366b845c797745a78efaf90d
Author: Peter Zijlstra <peterz@infradead.org>
Date: Thu Sep 16 17:50:31 2010 +0200
sched: fix RCU lockdep splat from task_group()
This addresses the following RCU lockdep splat:
[0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
[0.052999] lockdep: fixing up alternatives.
[0.054105]
[0.054106] ===================================================
[0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
[0.054999] ---------------------------------------------------
[0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
[0.054999]
[0.054999] other info that might help us debug this:
[0.054999]
[0.054999]
[0.054999] rcu_scheduler_active = 1, debug_locks = 1
[0.054999] 3 locks held by swapper/1:
[0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
[0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
[0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
[0.054999]
[0.054999] stack backtrace:
[0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
[0.054999] Call Trace:
[0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
[0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a
[0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40
[0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113
[0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7
[0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
[0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
[0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
[0.054999] [<ffffffff814bcea6>] ? ...I get a similar lockdep splat, even with that patch applied, so I think it is not completely fixed. I'm using wireless-testing, which is based on 2.6.36-rc6. See my previous email: http://groups.google.com/group/linux.kernel/browse_thread/thread/fcef23494cfda353 Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com --
I'm using kernel patched with this patch - but I still get this error
- though at different place:
(not really sure how it is related - but of course the RCU complain
disappeared during boot).
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/sched.c:618 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by make/6137:
#0: (&rq->lock){......}, at: [<ffffffff810487d7>] task_fork_fair+0x67/0x180
stack backtrace:
Pid: 6137, comm: make Not tainted 2.6.36-rc8-00024-ga7ac73b #6
Call Trace:
[<ffffffff81089b7b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff8103e605>] set_task_rq+0x2f5/0x300
[<ffffffff810488db>] task_fork_fair+0x16b/0x180
[<ffffffff8104b634>] sched_fork+0xe4/0x280
[<ffffffff8104fa55>] copy_process+0x6e5/0x13d0
[<ffffffff81119809>] ? __do_fault+0x3b9/0x4b0
[<ffffffff810507fb>] do_fork+0x8b/0x490
[<ffffffff8111d6b6>] ? handle_mm_fault+0x196/0xa90
[<ffffffff8147dc2d>] ? retint_swapgs+0xe/0x13
[<ffffffff8147dc2d>] ? retint_swapgs+0xe/0x13
[<ffffffff8100c395>] sys_vfork+0x25/0x30
[<ffffffff81003583>] stub_vfork+0x13/0x20
[<ffffffff810031db>] ? system_call_fastpath+0x16/0x1b
loop: module loaded
Zdenek
--
OK, so this one requires that we either be in an rcu_read_lock() critical section, that we hold the task's alloc_lock(), that the cgroup_mutex is held, or that the runqueue lock for the CPU that the task last ran on is held. Unfortunately, we are creating the task, so there is no last CPU that it ran on, thus confusing the check. But this looks -really- familiar... Could you please apply commit b0a0f667, which hit mainline after 2.6.36? Thanx, Paul --
From: Randy Dunlap <randy.dunlap@oracle.com> Minimal patch to fix mmc to build when CONFIG_PM is not enabled: (.text+0x128fcd): undefined reference to `mmc_pm_notify' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> --- drivers/mmc/core/host.c | 2 ++ 1 file changed, 2 insertions(+) Seems to be needed in mainline also. --- mmotm-2010-0811-1610.orig/drivers/mmc/core/host.c +++ mmotm-2010-0811-1610/drivers/mmc/core/host.c @@ -86,7 +86,9 @@ struct mmc_host *mmc_alloc_host(int extr init_waitqueue_head(&host->wq); INIT_DELAYED_WORK(&host->detect, mmc_rescan); INIT_DELAYED_WORK_DEFERRABLE(&host->disable, mmc_host_deeper_disable); +#ifdef CONFIG_PM host->pm_notify.notifier_call = mmc_pm_notify; +#endif /* * By default, hosts do not support SGIO or large requests. --
I sent the same patch[1] a few hours later than Randy, mine got Acked-by: Kukjin Kim <kgene.kim@samsung.com> Acked-by: Maxim Levitsky <maximlevitsky@gmail.com> . I think it's fine to add these to Randy's patch, too, together with my Ack. Maybe add a reference to the breaking commit as my patch did? Thanks Uwe [1] http://mid.gmane.org/1281691473-15481-1-git-send-email-u.kleine-koenig@pengutronix.de -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | --
Not sure if it's an e1000e bug, or an iptables bug that happened to trip on
like the first few packets accepted after the interface came up, so I'll cc:
everybody and let ya'll fight over it. :)
[ 431.011194] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X
[ 431.062183] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X
[ 431.064607] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 432.691161] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 432.691177] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[ 432.695461] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 432.697278]
[ 432.697279] =================================
[ 432.697477] [ INFO: inconsistent lock state ]
[ 432.697581] 2.6.35-mmotm0811 #1
[ 432.697682] ---------------------------------
[ 432.697785] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 432.697890] kworker/0:0/0 [HC0[0]:SC1[2]:HE1:SE0] takes:
[ 432.697994] (&(&lock->lock)->rlock){+.?...}, at: [<ffffffff814d39c8>] ip6t_do_table+0xc1/0x646
[ 432.698028] {SOFTIRQ-ON-W} state was registered at:
[ 432.698028] [<ffffffff8106762f>] __lock_acquire+0x3a3/0xd6a
[ 432.698028] [<ffffffff81068514>] lock_acquire+0x10a/0x130
[ 432.698028] [<ffffffff81557cd9>] _raw_spin_lock+0x36/0x45
[ 432.698028] [<ffffffff814d3375>] xt_info_wrlock+0x1c/0x1e
[ 432.698028] [<ffffffff814d48df>] get_counters+0x93/0x14a
[ 432.698028] [<ffffffff814d49c3>] alloc_counters.clone.3+0x2d/0x41
[ 432.698028] [<ffffffff814d4f98>] do_ip6t_get_ctl+0x110/0x367
[ 432.698028] [<ffffffff814402a7>] nf_sockopt+0x5c/0x88
[ 432.698028] [<ffffffff814402e6>] nf_getsockopt+0x13/0x15
[ 432.698028] [<ffffffff814ba05e>] ipv6_getsockopt+0x94/0xc3
[ 432.698028] [<ffffffff814c1175>] rawv6_getsockopt+0x48/0x54
[ 432.698028] [<ffffffff8141533a>] sock_common_getsockopt+0xf/0x11
[ 432.698028] [<ffffffff814147dd>] sys_getsockopt+0x75/0x93
[ 432.698028] [<ffffffff8100272b>] system_call_fastpath+0x16/0x1b
[ 432.698028] irq event stamp: ...Something appears to be borked in the ALSA arena. There's no actual volume coming out of the system, and 'alsamixer' is insisting that the volume slider only goes from 0 to 10% or so, no further. However, experimentation shows that the volume slider in 'xine' *does* affect the 'Amp-Out vals' lines, and alsamixer has *no* effect on what 'Amp-Out vals' lists. A diff of alsa-info.sh for the two kernels shows them being identical, so I'm only attaching one copy. It may be the weekend before I find time to do a bisection of this.
At Thu, 12 Aug 2010 14:59:32 -0400,
Hm, there shouldn't be a change regarding the volume control.
There can be an issue about PCM stream, e.g. in commit
eb541337b7a43822fce7d0c9d967ee149b2d9a96
ALSA: hda - Make converter setups sticky
To be sure, could you try to revert it? If it doesn't help but still
you get strange volume behavior, please get alsa-info.sh output at
different volume levels for comparison.
thanks,
Takashi
--
Didn't you (like some other people) get into the state where pulseaudio doesn't work? It chooses as an output a dummy driver automatically, then you can change volume, play sound, but actually it all goes to /dev/null. It took me a while before I figured out that it's a "dummy" driver I have in pulseaudio. regards, -- js --
At Thu, 12 Aug 2010 23:06:22 +0200, Looks like there is a breakage regarding open/close due to fs/notify/* changes. I guess you can hear still sounds like: % aplay -Dplughw foo.wav Takashi --
Confirming that Linus's patch fixes it for me:
commit 2069601b3f0ea38170d4b509b89f3ca0a373bdc1
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Aug 12 14:23:04 2010 -0700
Revert "fsnotify: store struct file not struct path"
This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).
