With CONFIG_PROVE_RCU=y, I saw this warning. It's because
css_id() is not called under rcu_read_lock().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by kswapd0/31:
#0: (swap_lock){+.+.-.}, at: [<c05058bb>] swap_info_get+0x4b/0xd0
stack backtrace:
Pid: 31, comm: kswapd0 Not tainted 2.6.34-rc5-tip+ #13
Call Trace:
[<c083c5d6>] ? printk+0x1d/0x1f
[<c0480744>] lockdep_rcu_dereference+0x94/0xb0
[<c049d6ed>] css_id+0x5d/0x60
[<c05165a5>] mem_cgroup_uncharge_swapcache+0x45/0xa0
[<c0505e4f>] swapcache_free+0x3f/0x60
[<c04e79e2>] __remove_mapping+0xb2/0xf0
[<c04e7cbb>] shrink_page_list+0x26b/0x490
[<c047f85d>] ? put_lock_stats+0xd/0x30
[<c083fd67>] ? _raw_spin_unlock_irq+0x27/0x50
[<c0482566>] ? trace_hardirqs_on_caller+0xb6/0x220
[<c04e8158>] shrink_inactive_list+0x278/0x620
[<c04729e1>] ? sched_clock_cpu+0x121/0x180
[<c047e9b8>] ? trace_hardirqs_off_caller+0x18/0x130
[<c047eadb>] ? trace_hardirqs_off+0xb/0x10
[<c0843438>] ? sub_preempt_count+0x8/0x90
[<c047f85d>] ? put_lock_stats+0xd/0x30
[<c04e8704>] shrink_zone+0x204/0x3c0
[<c083fcac>] ? _raw_spin_unlock+0x2c/0x50
[<c04e951e>] kswapd+0x61e/0x7c0
[<c04e6ed0>] ? isolate_pages_global+0x0/0x1f0
[<c046bae0>] ? autoremove_wake_function+0x0/0x50
[<c04e8f00>] ? kswapd+0x0/0x7c0
[<c046b5e4>] kthread+0x74/0x80
[<c046b570>] ? kthread+0x0/0x80
[<c04035ba>] kernel_thread_helper+0x6/0x10
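For readers unfamiliar with the check: with CONFIG_PROVE_RCU=y, rcu_dereference_check(p, c) accepts the access if the caller is inside an RCU read-side critical section or if the extra condition c is true. Otherwise it prints the "suspicious rcu_dereference_check() usage" splat shown above. The sketch below is a minimal userspace model of that rule, not kernel code. The mock_* functions, the single nesting counter and the mock_css/mock_css_id structs are hypothetical stand-ins, and the extra condition is modeled as a plain flag. It shows that the same css_id()-style lookup is flagged when called bare and accepted once it is bracketed by a read lock/unlock pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the CONFIG_PROVE_RCU check; NOT kernel code.
 * Single-threaded: one global nesting counter stands in for per-task state. */
static int mock_rcu_depth;   /* current read-side nesting depth */
static int mock_splats;      /* "suspicious usage" reports recorded so far */

static void mock_rcu_read_lock(void) { mock_rcu_depth++; }

static void mock_rcu_read_unlock(void)
{
    assert(mock_rcu_depth > 0);
    mock_rcu_depth--;
}

/* Models rcu_dereference_check(p, c): accepted inside a read-side section
 * or when c is true; otherwise a report is recorded and printed. */
static const void *mock_rcu_dereference_check(const void *p, bool c)
{
    if (mock_rcu_depth == 0 && !c) {
        mock_splats++;
        fprintf(stderr, "suspicious rcu_dereference_check() usage\n");
    }
    return p;
}

/* Hypothetical stand-ins for struct css_id and struct cgroup_subsys_state. */
struct mock_css_id { unsigned short id; };
struct mock_css {
    const struct mock_css_id *id;  /* RCU-protected in the real kernel */
    bool extra_cond;               /* models the check's extra condition */
};

/* css_id()-like lookup: dereferences css->id through the checker. */
static unsigned short mock_css_id(const struct mock_css *css)
{
    const struct mock_css_id *cssid =
        mock_rcu_dereference_check(css->id, css->extra_cond);
    return cssid ? cssid->id : 0;
}
```

Calling mock_css_id() bare records one report; calling it between mock_rcu_read_lock() and mock_rcu_read_unlock() records none, which mirrors the kind of change the patch below describes for the real callers.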
--
On Fri, 23 Apr 2010 11:00:41 +0800
OK. Thank you for reporting.
Is this OK?
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
css_id() should be called under rcu_read_lock().
Following is a report from Li Zefan.
==
[ lockdep splat identical to the report quoted above snipped ]
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki ...

Excellent Catch!

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
Three Cheers,
Balbir
--
Yes, and I did some more simple tests on memcg; no more warnings.
--
Oops, after triggering OOM, I saw 2 more warnings:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4459 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by firefox/2258:
#0: (&mm->mmap_sem){++++++}, at: [<c0843090>] do_page_fault+0x100/0x500
#1: (tasklist_lock){.?.?.-}, at: [<c04df1ac>] mem_cgroup_out_of_memory+0x2c/0x90
stack backtrace:
Pid: 2258, comm: firefox Not tainted 2.6.34-rc5-tip+ #14
Call Trace:
[<c083c636>] ? printk+0x1d/0x1f
[<c0480744>] lockdep_rcu_dereference+0x94/0xb0
[<c049d61e>] css_is_ancestor+0xce/0xe0
[<c0517c41>] task_in_mem_cgroup+0xd1/0xf0
[<c0517b70>] ? task_in_mem_cgroup+0x0/0xf0
[<c04def10>] select_bad_process+0x70/0xe0
[<c04df1c1>] mem_cgroup_out_of_memory+0x41/0x90
[<c04826db>] ? trace_hardirqs_on+0xb/0x10
[<c05159e3>] mem_cgroup_handle_oom+0xf3/0x130
[<c046bae0>] ? autoremove_wake_function+0x0/0x50
[<c0516e01>] __mem_cgroup_try_charge+0x391/0x3d0
[<c047eadb>] ? trace_hardirqs_off+0xb/0x10
[<c05174c0>] mem_cgroup_charge_common+0x40/0x70
[<c0517620>] mem_cgroup_cache_charge+0x130/0x150
[<c04db6e7>] add_to_page_cache_locked+0x37/0x130
[<c04e5719>] ? __lru_cache_add+0x69/0xb0
[<c04db811>] add_to_page_cache_lru+0x31/0x80
[<c0549084>] mpage_readpages+0x84/0xf0
[<c057e4d0>] ? ext3_get_block+0x0/0x110
[<c057c760>] ? ext3_readpages+0x0/0x20
[<c057c77e>] ext3_readpages+0x1e/0x20
[<c057e4d0>] ? ext3_get_block+0x0/0x110
[<c04e4889>] __do_page_cache_readahead+0x219/0x2b0
...
On Fri, 23 Apr 2010 11:55:16 +0800
OK, I will update. Thank you.
--
One more:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
3 locks held by bash/2270:
#0: (cgroup_mutex){+.+.+.}, at: [<c049ab37>] cgroup_lock_live_group+0x17/0x30
#1: (&mm->mmap_sem){++++++}, at: [<c0517302>] mem_cgroup_can_attach+0xb2/0x130
#2: (&(&mm->page_table_lock)->rlock){+.+.-.}, at: [<c0513c23>] mem_cgroup_count_precharge_pte_range+0x93/0x130
stack backtrace:
Pid: 2270, comm: bash Not tainted 2.6.34-rc5-tip+ #14
Call Trace:
[<c083c636>] ? printk+0x1d/0x1f
[<c0480744>] lockdep_rcu_dereference+0x94/0xb0
[<c049d6ed>] css_id+0x5d/0x60
[<c051373f>] is_target_pte_for_mc+0x16f/0x1c0
[<c083f46b>] ? _raw_spin_lock+0x6b/0x80
[<c0513c4d>] mem_cgroup_count_precharge_pte_range+0xbd/0x130
[<c0513b90>] ? mem_cgroup_count_precharge_pte_range+0x0/0x130
[<c05030bd>] walk_page_range+0x25d/0x3f0
[<c0517344>] mem_cgroup_can_attach+0xf4/0x130
[<c0513b90>] ? mem_cgroup_count_precharge_pte_range+0x0/0x130
[<c0517250>] ? mem_cgroup_can_attach+0x0/0x130
[<c049e000>] cgroup_attach_task+0x70/0x280
[<c049e633>] cgroup_tasks_write+0x63/0x1c0
[<c049e660>] ? cgroup_tasks_write+0x90/0x1c0
[<c049d515>] cgroup_file_write+0x1f5/0x230
[<c0842f90>] ? do_page_fault+0x0/0x500
[<c047107b>] ? up_read+0x1b/0x30
[<c0843195>] ? do_page_fault+0x205/0x500
[<c051a8c4>] vfs_write+0xa4/0x1a0
[<c049d320>] ? cgroup_file_write+0x0/0x230
[<c051b3f6>] sys_write+0x46/0x70
[<c0403090>] sysenter_do_call+0x12/0x36
--
On Fri, 23 Apr 2010 11:55:16 +0800
Thank you for the good testing.
=
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
css_id() should be called under rcu_read_lock().
And css_is_ancestor() should be called under rcu_read_lock().
Following is a report from Li Zefan.
==
[ lockdep splat identical to the report quoted above snipped ]
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: ...

On Fri, 23 Apr 2010 12:58:14 +0900
v3 here... sorry for posting too rapidly.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
css_id() should be called under rcu_read_lock().
Following is a report from Li Zefan.
==
[ lockdep splat identical to the report quoted above snipped ]
And css_is_ancestor() should be called under rcu_read_lock().
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh ...

Thank you for your report and patch (and I'm sorry that I haven't been active these days ;( ).
This patch looks good to me and, IIUC, should be enough to fix this bug.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

BTW, although it wouldn't cause any problem, I think the earlier rcu_read_lock()/rcu_read_unlock() pair in task_in_mem_cgroup() is unnecessary, because try_get_mem_cgroup_from_mm() takes the RCU read lock itself.

Thanks,
--
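Daisuke's point relies on RCU read-side critical sections being allowed to nest: an outer rcu_read_lock()/rcu_read_unlock() pair around a helper that already takes the lock itself is redundant but harmless. A minimal userspace model of that, assuming a simple nesting counter; all mock_* names are hypothetical, and mock_try_get_helper() only stands in for try_get_mem_cgroup_from_mm() as the review describes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of RCU read-side nesting; NOT kernel code. */
static int mock_rcu_depth;      /* current nesting depth */
static int mock_max_depth;      /* deepest nesting observed */

static void mock_rcu_read_lock(void)
{
    if (++mock_rcu_depth > mock_max_depth)
        mock_max_depth = mock_rcu_depth;
}

static void mock_rcu_read_unlock(void)
{
    assert(mock_rcu_depth > 0);
    mock_rcu_depth--;
}

/* Stand-in for a helper that takes the RCU read lock itself.
 * Returns whether its internal dereference ran inside a read-side section. */
static bool mock_try_get_helper(void)
{
    bool protected_access;

    mock_rcu_read_lock();
    protected_access = mock_rcu_depth > 0;   /* always true here */
    mock_rcu_read_unlock();
    return protected_access;
}

/* Caller that wraps the helper in a redundant outer read-side section. */
static bool mock_caller_with_outer_lock(void)
{
    bool ok;

    mock_rcu_read_lock();
    ok = mock_try_get_helper();
    mock_rcu_read_unlock();
    return ok;
}

/* Same caller with the outer pair dropped: still protected inside. */
static bool mock_caller_without_outer_lock(void)
{
    return mock_try_get_helper();
}
```

Both callers see a protected access and leave the depth at zero; the only difference is one extra level of nesting, which is why dropping the outer pair is a cleanup rather than a fix.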
With this patch applied, I did some more tests, and no warnings were triggered.

Tested-by: Li Zefan <lizf@cn.fujitsu.com>
--
On Fri, 23 Apr 2010 14:10:32 +0800
Thank you!
-Kame
--
Looking at the patch, we seem to be protecting only the use of css_*(). I wonder whether we should push the rcu_read_*lock() semantics down into the css routines, or whether that is too intrusive.

-- 
Three Cheers,
Balbir
--
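One way to picture Balbir's suggestion, sketched as a userspace model with hypothetical names (mock_css_id_self_locking() is not a real kernel function): a variant of the lookup that takes the read lock internally. This only works cleanly because the result is a scalar copied out before the unlock; a routine that hands back an RCU-protected pointer cannot be wrapped this way, since the caller would keep using the pointer after the critical section ends. Because read-side sections nest, callers that already hold the lock keep working.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT kernel code. */
static int mock_rcu_depth;     /* read-side nesting depth */
static int mock_splats;        /* unprotected dereferences observed */

static void mock_rcu_read_lock(void) { mock_rcu_depth++; }

static void mock_rcu_read_unlock(void)
{
    assert(mock_rcu_depth > 0);
    mock_rcu_depth--;
}

struct mock_css_id { unsigned short id; };
struct mock_css { const struct mock_css_id *id; };

/* Current convention: the caller must hold the RCU read lock. */
static unsigned short mock_css_id(const struct mock_css *css)
{
    if (mock_rcu_depth == 0)
        mock_splats++;              /* what CONFIG_PROVE_RCU would flag */
    return css->id ? css->id->id : 0;
}

/* Pushed-down variant: the routine takes the lock itself. The scalar id
 * is copied out before unlocking, so nothing RCU-protected escapes. */
static unsigned short mock_css_id_self_locking(const struct mock_css *css)
{
    unsigned short id;

    mock_rcu_read_lock();
    id = mock_css_id(css);
    mock_rcu_read_unlock();
    return id;
}
```

The trade-off is an extra lock/unlock on every call and a split API (pointer-returning css routines would still need caller-held locks), which may be part of why it reads as a candidate for later cleanup rather than this fix.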
On Fri, 23 Apr 2010 12:30:11 +0530
Maybe worth considering in future cleanup patches.
Thanks,
-Kame
--
I have queued this, thank you all!

However, memcg_oom_wake_function() does not yet exist in the tree I am using, and is_target_pte_for_mc() has changed. I omitted the hunk for memcg_oom_wake_function() and edited the hunk for is_target_pte_for_mc(). I have queued this for others' testing, but if you would rather carry this patch up the memcg path, please let me know and I will drop it.
--
On Fri, 23 Apr 2010 12:34:06 -0700
I think it's OK to fix this in your tree. I'll look at memcg later and fix the remaining things.
Thanks,
-Kame
--
