cgroup_mm_owner_callbacks() can be called with task_lock() held in
mm_update_next_owner(), and all the .mm_owner_changed callbacks seem to
be *always* called with task_lock() held.
However, memrlimit takes task_lock() again via get_task_mm() in
memrlimit_cgroup_mm_owner_changed(), which triggers the following recursive
locking report:
[ 5346.421365] =============================================
[ 5346.421374] [ INFO: possible recursive locking detected ]
[ 5346.421381] 2.6.27-rc5-mm1 #20
[ 5346.421385] ---------------------------------------------
[ 5346.421391] interbench/10530 is trying to acquire lock:
[ 5346.421396] (&p->alloc_lock){--..}, at: [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421417]
[ 5346.421418] but task is already holding lock:
[ 5346.421423] (&p->alloc_lock){--..}, at: [<ffffffff8023db98>] mm_update_next_owner+0x148/0x230
[ 5346.421438]
[ 5346.421440] other info that might help us debug this:
[ 5346.421446] 2 locks held by interbench/10530:
[ 5346.421450] #0: (&mm->mmap_sem){----}, at: [<ffffffff8023db90>] mm_update_next_owner+0x140/0x230
[ 5346.421467] #1: (&p->alloc_lock){--..}, at: [<ffffffff8023db98>] mm_update_next_owner+0x148/0x230
[ 5346.421483]
[ 5346.421485] stack backtrace:
[ 5346.421491] Pid: 10530, comm: interbench Not tainted 2.6.27-rc5-mm1 #20
[ 5346.421496] Call Trace:
[ 5346.421507] [<ffffffff80263383>] validate_chain+0xb03/0x10d0
[ 5346.421515] [<ffffffff80263c05>] __lock_acquire+0x2b5/0x9c0
[ 5346.421522] [<ffffffff80262cc2>] validate_chain+0x442/0x10d0
[ 5346.421530] [<ffffffff802643aa>] lock_acquire+0x9a/0xe0
[ 5346.421537] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421546] [<ffffffff804757c7>] _spin_lock+0x37/0x70
[ 5346.421553] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421560] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421569] [<ffffffff802b91f8>] memrlimit_cgroup_mm_owner_changed+0x18/0x90
[ 5346.421579] [<ffffffff80278b03>] cgroup_mm_owner_callbacks+0x93/0xc0
[ 5346.421587] [<ffffffff8023dc36>] ...[snip]

Thanks for the BUG report!

Since we hold task_lock(), we know that p->mm cannot change and we don't
have to worry about incrementing mm_users. I think using just p->mm will
work; we do have checks to make sure we don't pick a kernel thread. I vote
for going down that road.

--
Balbir

Hi Balbir,
Sounds good. What about this?
---
cgroup_mm_owner_callbacks() can be called with task_lock() held in
mm_update_next_owner(), and all the .mm_owner_changed callbacks seem to
be *always* called with task_lock() held.
However, memrlimit takes task_lock() again via get_task_mm() in
memrlimit_cgroup_mm_owner_changed(), which triggers the following recursive
locking report:
[ 5346.421365] =============================================
[ 5346.421374] [ INFO: possible recursive locking detected ]
[ 5346.421381] 2.6.27-rc5-mm1 #20
[ 5346.421385] ---------------------------------------------
[ 5346.421391] interbench/10530 is trying to acquire lock:
[ 5346.421396] (&p->alloc_lock){--..}, at: [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421417]
[ 5346.421418] but task is already holding lock:
[ 5346.421423] (&p->alloc_lock){--..}, at: [<ffffffff8023db98>] mm_update_next_owner+0x148/0x230
[ 5346.421438]
[ 5346.421440] other info that might help us debug this:
[ 5346.421446] 2 locks held by interbench/10530:
[ 5346.421450] #0: (&mm->mmap_sem){----}, at: [<ffffffff8023db90>] mm_update_next_owner+0x140/0x230
[ 5346.421467] #1: (&p->alloc_lock){--..}, at: [<ffffffff8023db98>] mm_update_next_owner+0x148/0x230
[ 5346.421483]
[ 5346.421485] stack backtrace:
[ 5346.421491] Pid: 10530, comm: interbench Not tainted 2.6.27-rc5-mm1 #20
[ 5346.421496] Call Trace:
[ 5346.421507] [<ffffffff80263383>] validate_chain+0xb03/0x10d0
[ 5346.421515] [<ffffffff80263c05>] __lock_acquire+0x2b5/0x9c0
[ 5346.421522] [<ffffffff80262cc2>] validate_chain+0x442/0x10d0
[ 5346.421530] [<ffffffff802643aa>] lock_acquire+0x9a/0xe0
[ 5346.421537] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421546] [<ffffffff804757c7>] _spin_lock+0x37/0x70
[ 5346.421553] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421560] [<ffffffff8023b034>] get_task_mm+0x24/0x70
[ 5346.421569] [<ffffffff802b91f8>] memrlimit_cgroup_mm_owner_changed+0x18/0x90
[ 5346.421579] [<ffffffff80278b03>] ...

My bad! mmput() must be removed at the end of this function (I just hit
another bug). Ignore this one, and sorry for the noise. I'll send a new
patch.

-Andrea
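
For readers following the thread, the direction discussed above (read p->mm
directly while the caller already holds task_lock(), and take no extra
reference, so there is nothing to mmput() afterwards) can be illustrated with
a small userspace model. This is only a sketch, not the actual patch: every
*_model name below is a hypothetical stand-in, the lock is modelled as a plain
flag that reports a recursive acquisition instead of deadlocking, and the real
memrlimit accounting is left out.

```c
/* Userspace model only: every *_model name is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model {
    int mm_users;               /* models mm->mm_users */
};

struct task_model {
    bool lock_held;             /* models p->alloc_lock (non-recursive) */
    struct mm_model *mm;        /* models p->mm */
};

/* Returns false where a real spinlock would spin forever on itself. */
static bool task_lock_model(struct task_model *p)
{
    if (p->lock_held)
        return false;
    p->lock_held = true;
    return true;
}

static void task_unlock_model(struct task_model *p)
{
    p->lock_held = false;
}

/* Shape of the reported problem: the callback re-takes the task lock. */
static bool owner_changed_relock(struct task_model *p)
{
    if (!task_lock_model(p))    /* recursive acquisition detected */
        return false;
    if (p->mm)
        p->mm->mm_users++;      /* reference that would need a put later */
    task_unlock_model(p);
    return true;
}

/* Suggested shape: rely on the caller's lock and read p->mm directly. */
static bool owner_changed_direct(struct task_model *p)
{
    struct mm_model *mm = p->mm;

    if (!p->lock_held || mm == NULL)
        return false;
    /* accounting for mm would go here; no reference taken, so no put */
    return true;
}

/* Models the caller: invokes the callback with the task lock held. */
static bool update_next_owner_model(struct task_model *p,
                                    bool (*cb)(struct task_model *))
{
    bool ok;

    if (!task_lock_model(p))
        return false;
    ok = cb(p);
    task_unlock_model(p);
    return ok;
}

/* Small self-check tying the pieces together. */
static void demo_model(void)
{
    struct mm_model mm = { .mm_users = 1 };
    struct task_model p = { .lock_held = false, .mm = &mm };

    assert(!update_next_owner_model(&p, owner_changed_relock));
    assert(update_next_owner_model(&p, owner_changed_direct));
    assert(mm.mm_users == 1);   /* the direct variant pinned nothing */
}
```

In this model the re-locking callback fails as soon as it is invoked under the
caller's lock, while the direct variant succeeds and leaves mm_users untouched,
which is why no matching put is needed at the end of the function.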
