Re: [BUGFIX][PATCH] memcg rcu lock fix v3

From: Li Zefan
Date: Thursday, April 22, 2010 - 8:00 pm

With CONFIG_PROVE_RCU=y, I saw the warning below. It is triggered because
css_id() is not called under rcu_read_lock().


===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by kswapd0/31:
 #0:  (swap_lock){+.+.-.}, at: [<c05058bb>] swap_info_get+0x4b/0xd0

stack backtrace:
Pid: 31, comm: kswapd0 Not tainted 2.6.34-rc5-tip+ #13
Call Trace:
 [<c083c5d6>] ? printk+0x1d/0x1f
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0
 [<c049d6ed>] css_id+0x5d/0x60
 [<c05165a5>] mem_cgroup_uncharge_swapcache+0x45/0xa0
 [<c0505e4f>] swapcache_free+0x3f/0x60
 [<c04e79e2>] __remove_mapping+0xb2/0xf0
 [<c04e7cbb>] shrink_page_list+0x26b/0x490
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c083fd67>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c0482566>] ? trace_hardirqs_on_caller+0xb6/0x220
 [<c04e8158>] shrink_inactive_list+0x278/0x620
 [<c04729e1>] ? sched_clock_cpu+0x121/0x180
 [<c047e9b8>] ? trace_hardirqs_off_caller+0x18/0x130
 [<c047eadb>] ? trace_hardirqs_off+0xb/0x10
 [<c0843438>] ? sub_preempt_count+0x8/0x90
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c04e8704>] shrink_zone+0x204/0x3c0
 [<c083fcac>] ? _raw_spin_unlock+0x2c/0x50
 [<c04e951e>] kswapd+0x61e/0x7c0
 [<c04e6ed0>] ? isolate_pages_global+0x0/0x1f0
 [<c046bae0>] ? autoremove_wake_function+0x0/0x50
 [<c04e8f00>] ? kswapd+0x0/0x7c0
 [<c046b5e4>] kthread+0x74/0x80
 [<c046b570>] ? kthread+0x0/0x80
 [<c04035ba>] kernel_thread_helper+0x6/0x10
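
Editor's sketch: the report above can be illustrated with a small userspace model. This is not kernel code and not the patch from this thread. Every `model_*` name, the `rcu_depth` counter, and the two `uncharge_*` callers are hypothetical stand-ins, assumed here only to show why an accessor that checks for an active RCU read-side section complains when its caller has not entered one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int rcu_depth;     /* models the read-side nesting depth */
static int rcu_warnings;  /* counts "suspicious usage" reports */

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }
static bool model_rcu_read_lock_held(void) { return rcu_depth > 0; }

struct model_css_id { unsigned short id; };
struct model_css    { struct model_css_id *id; };

/* Models an accessor like css_id(): it dereferences css->id and
 * records a warning if no read-side section is active. */
static unsigned short model_css_id(const struct model_css *css)
{
    if (!model_rcu_read_lock_held())
        rcu_warnings++;
    return css->id ? css->id->id : 0;
}

/* Caller shaped like the reported path: no read-side section. */
static unsigned short uncharge_unprotected(const struct model_css *css)
{
    return model_css_id(css);
}

/* Caller shaped like the proposed fix: the call is bracketed by
 * lock/unlock, and the plain integer is copied out before unlock. */
static unsigned short uncharge_protected(const struct model_css *css)
{
    unsigned short id;

    model_rcu_read_lock();
    id = model_css_id(css);
    model_rcu_read_unlock();
    return id;
}
```

In this model both callers return the same id, but only the unprotected one increments `rcu_warnings`, which mirrors the single lockdep report above.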
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 8:14 pm

On Fri, 23 Apr 2010 11:00:41 +0800

OK. Thank you for reporting.
Is this OK?
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

css_id() should be called under rcu_read_lock().
Following is a report from Li Zefan.
==
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by kswapd0/31:
 #0:  (swap_lock){+.+.-.}, at: [<c05058bb>] swap_info_get+0x4b/0xd0

stack backtrace:
Pid: 31, comm: kswapd0 Not tainted 2.6.34-rc5-tip+ #13
Call Trace:
 [<c083c5d6>] ? printk+0x1d/0x1f
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0
 [<c049d6ed>] css_id+0x5d/0x60
 [<c05165a5>] mem_cgroup_uncharge_swapcache+0x45/0xa0
 [<c0505e4f>] swapcache_free+0x3f/0x60
 [<c04e79e2>] __remove_mapping+0xb2/0xf0
 [<c04e7cbb>] shrink_page_list+0x26b/0x490
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c083fd67>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c0482566>] ? trace_hardirqs_on_caller+0xb6/0x220
 [<c04e8158>] shrink_inactive_list+0x278/0x620
 [<c04729e1>] ? sched_clock_cpu+0x121/0x180
 [<c047e9b8>] ? trace_hardirqs_off_caller+0x18/0x130
 [<c047eadb>] ? trace_hardirqs_off+0xb/0x10
 [<c0843438>] ? sub_preempt_count+0x8/0x90
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c04e8704>] shrink_zone+0x204/0x3c0
 [<c083fcac>] ? _raw_spin_unlock+0x2c/0x50
 [<c04e951e>] kswapd+0x61e/0x7c0
 [<c04e6ed0>] ? isolate_pages_global+0x0/0x1f0
 [<c046bae0>] ? autoremove_wake_function+0x0/0x50
 [<c04e8f00>] ? kswapd+0x0/0x7c0
 [<c046b5e4>] kthread+0x74/0x80
 [<c046b570>] ? kthread+0x0/0x80
 [<c04035ba>] kernel_thread_helper+0x6/0x10

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki ...
--

From: Balbir Singh
Date: Thursday, April 22, 2010 - 8:32 pm

Excellent Catch!

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Three Cheers,
	Balbir
--

From: Li Zefan
Date: Thursday, April 22, 2010 - 8:49 pm

Yes, and I ran some more simple tests on memcg and saw no more warnings.
--

From: Li Zefan
Date: Thursday, April 22, 2010 - 8:55 pm

Oops, after triggering OOM, I saw 2 more warnings:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4459 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by firefox/2258:            
 #0:  (&mm->mmap_sem){++++++}, at: [<c0843090>] do_page_fault+0x100/0x500
 #1:  (tasklist_lock){.?.?.-}, at: [<c04df1ac>] mem_cgroup_out_of_memory+0x2c/0x90

stack backtrace:
Pid: 2258, comm: firefox Not tainted 2.6.34-rc5-tip+ #14
Call Trace:                                             
 [<c083c636>] ? printk+0x1d/0x1f                        
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0         
 [<c049d61e>] css_is_ancestor+0xce/0xe0                 
 [<c0517c41>] task_in_mem_cgroup+0xd1/0xf0              
 [<c0517b70>] ? task_in_mem_cgroup+0x0/0xf0             
 [<c04def10>] select_bad_process+0x70/0xe0              
 [<c04df1c1>] mem_cgroup_out_of_memory+0x41/0x90        
 [<c04826db>] ? trace_hardirqs_on+0xb/0x10              
 [<c05159e3>] mem_cgroup_handle_oom+0xf3/0x130          
 [<c046bae0>] ? autoremove_wake_function+0x0/0x50       
 [<c0516e01>] __mem_cgroup_try_charge+0x391/0x3d0       
 [<c047eadb>] ? trace_hardirqs_off+0xb/0x10             
 [<c05174c0>] mem_cgroup_charge_common+0x40/0x70        
 [<c0517620>] mem_cgroup_cache_charge+0x130/0x150       
 [<c04db6e7>] add_to_page_cache_locked+0x37/0x130       
 [<c04e5719>] ? __lru_cache_add+0x69/0xb0               
 [<c04db811>] add_to_page_cache_lru+0x31/0x80           
 [<c0549084>] mpage_readpages+0x84/0xf0                 
 [<c057e4d0>] ? ext3_get_block+0x0/0x110                
 [<c057c760>] ? ext3_readpages+0x0/0x20                 
 [<c057c77e>] ext3_readpages+0x1e/0x20                  
 [<c057e4d0>] ? ext3_get_block+0x0/0x110                
 [<c04e4889>] __do_page_cache_readahead+0x219/0x2b0 ...
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 8:50 pm

On Fri, 23 Apr 2010 11:55:16 +0800

OK, I will update. Thank you.


--

From: Li Zefan
Date: Thursday, April 22, 2010 - 9:02 pm

one more:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
3 locks held by bash/2270:
 #0:  (cgroup_mutex){+.+.+.}, at: [<c049ab37>] cgroup_lock_live_group+0x17/0x30
 #1:  (&mm->mmap_sem){++++++}, at: [<c0517302>] mem_cgroup_can_attach+0xb2/0x130
 #2:  (&(&mm->page_table_lock)->rlock){+.+.-.}, at: [<c0513c23>] mem_cgroup_count_precharge_pte_range+0x93/0x130

stack backtrace:
Pid: 2270, comm: bash Not tainted 2.6.34-rc5-tip+ #14
Call Trace:
 [<c083c636>] ? printk+0x1d/0x1f
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0
 [<c049d6ed>] css_id+0x5d/0x60
 [<c051373f>] is_target_pte_for_mc+0x16f/0x1c0
 [<c083f46b>] ? _raw_spin_lock+0x6b/0x80
 [<c0513c4d>] mem_cgroup_count_precharge_pte_range+0xbd/0x130
 [<c0513b90>] ? mem_cgroup_count_precharge_pte_range+0x0/0x130
 [<c05030bd>] walk_page_range+0x25d/0x3f0
 [<c0517344>] mem_cgroup_can_attach+0xf4/0x130
 [<c0513b90>] ? mem_cgroup_count_precharge_pte_range+0x0/0x130
 [<c0517250>] ? mem_cgroup_can_attach+0x0/0x130
 [<c049e000>] cgroup_attach_task+0x70/0x280
 [<c049e633>] cgroup_tasks_write+0x63/0x1c0
 [<c049e660>] ? cgroup_tasks_write+0x90/0x1c0
 [<c049d515>] cgroup_file_write+0x1f5/0x230
 [<c0842f90>] ? do_page_fault+0x0/0x500
 [<c047107b>] ? up_read+0x1b/0x30
 [<c0843195>] ? do_page_fault+0x205/0x500
 [<c051a8c4>] vfs_write+0xa4/0x1a0
 [<c049d320>] ? cgroup_file_write+0x0/0x230
 [<c051b3f6>] sys_write+0x46/0x70
 [<c0403090>] sysenter_do_call+0x12/0x36
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 8:58 pm

On Fri, 23 Apr 2010 11:55:16 +0800

Thank you for the thorough testing.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

css_id() should be called under rcu_read_lock().
And css_is_ancestor() should be called under rcu_read_lock().

Following is a report from Li Zefan.
==
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by kswapd0/31:
 #0:  (swap_lock){+.+.-.}, at: [<c05058bb>] swap_info_get+0x4b/0xd0

stack backtrace:
Pid: 31, comm: kswapd0 Not tainted 2.6.34-rc5-tip+ #13
Call Trace:
 [<c083c5d6>] ? printk+0x1d/0x1f
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0
 [<c049d6ed>] css_id+0x5d/0x60
 [<c05165a5>] mem_cgroup_uncharge_swapcache+0x45/0xa0
 [<c0505e4f>] swapcache_free+0x3f/0x60
 [<c04e79e2>] __remove_mapping+0xb2/0xf0
 [<c04e7cbb>] shrink_page_list+0x26b/0x490
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c083fd67>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c0482566>] ? trace_hardirqs_on_caller+0xb6/0x220
 [<c04e8158>] shrink_inactive_list+0x278/0x620
 [<c04729e1>] ? sched_clock_cpu+0x121/0x180
 [<c047e9b8>] ? trace_hardirqs_off_caller+0x18/0x130
 [<c047eadb>] ? trace_hardirqs_off+0xb/0x10
 [<c0843438>] ? sub_preempt_count+0x8/0x90
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c04e8704>] shrink_zone+0x204/0x3c0
 [<c083fcac>] ? _raw_spin_unlock+0x2c/0x50
 [<c04e951e>] kswapd+0x61e/0x7c0
 [<c04e6ed0>] ? isolate_pages_global+0x0/0x1f0
 [<c046bae0>] ? autoremove_wake_function+0x0/0x50
 [<c04e8f00>] ? kswapd+0x0/0x7c0
 [<c046b5e4>] kthread+0x74/0x80
 [<c046b570>] ? kthread+0x0/0x80
 [<c04035ba>] kernel_thread_helper+0x6/0x10



Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: ...
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 9:03 pm

On Fri, 23 Apr 2010 12:58:14 +0900
v3 here. Sorry for posting so rapidly.

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

css_id() should be called under rcu_read_lock().
Following is a report from Li Zefan.
==
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by kswapd0/31:
 #0:  (swap_lock){+.+.-.}, at: [<c05058bb>] swap_info_get+0x4b/0xd0

stack backtrace:
Pid: 31, comm: kswapd0 Not tainted 2.6.34-rc5-tip+ #13
Call Trace:
 [<c083c5d6>] ? printk+0x1d/0x1f
 [<c0480744>] lockdep_rcu_dereference+0x94/0xb0
 [<c049d6ed>] css_id+0x5d/0x60
 [<c05165a5>] mem_cgroup_uncharge_swapcache+0x45/0xa0
 [<c0505e4f>] swapcache_free+0x3f/0x60
 [<c04e79e2>] __remove_mapping+0xb2/0xf0
 [<c04e7cbb>] shrink_page_list+0x26b/0x490
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c083fd67>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c0482566>] ? trace_hardirqs_on_caller+0xb6/0x220
 [<c04e8158>] shrink_inactive_list+0x278/0x620
 [<c04729e1>] ? sched_clock_cpu+0x121/0x180
 [<c047e9b8>] ? trace_hardirqs_off_caller+0x18/0x130
 [<c047eadb>] ? trace_hardirqs_off+0xb/0x10
 [<c0843438>] ? sub_preempt_count+0x8/0x90
 [<c047f85d>] ? put_lock_stats+0xd/0x30
 [<c04e8704>] shrink_zone+0x204/0x3c0
 [<c083fcac>] ? _raw_spin_unlock+0x2c/0x50
 [<c04e951e>] kswapd+0x61e/0x7c0
 [<c04e6ed0>] ? isolate_pages_global+0x0/0x1f0
 [<c046bae0>] ? autoremove_wake_function+0x0/0x50
 [<c04e8f00>] ? kswapd+0x0/0x7c0
 [<c046b5e4>] kthread+0x74/0x80
 [<c046b570>] ? kthread+0x0/0x80
 [<c04035ba>] kernel_thread_helper+0x6/0x10

And css_is_ancestor() should be called under rcu_read_lock().


Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh ...
--

From: Daisuke Nishimura
Date: Thursday, April 22, 2010 - 9:41 pm

Thank you for your report & patch.
(and I'm sorry that I've not been active these days ;( )

This patch looks good to me and, IIUC, would be enough to fix this bug.

	Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

BTW, although it wouldn't cause any problem, I think the former
rcu_read_lock()/rcu_read_unlock() pair in task_in_mem_cgroup() is unnecessary,
because try_get_mem_cgroup_from_mm() calls them itself.


Thanks,
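
Editor's sketch: the remark about the redundant lock pair relies on the fact that RCU read-side critical sections may nest. The userspace model below is hypothetical throughout (the counters, `helper_with_own_lock()`, and both callers are invented for illustration and are not the memcg functions). It only shows that an outer lock/unlock pair around a helper that already protects itself is harmless but adds nothing.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int rcu_depth;   /* models the current read-side nesting depth */
static int max_depth;   /* deepest nesting observed so far */

static void model_rcu_read_lock(void)
{
    if (++rcu_depth > max_depth)
        max_depth = rcu_depth;
}

static void model_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* A helper that enters its own read-side section, in the way the
 * callee is described as doing in the message above. */
static int helper_with_own_lock(int value)
{
    int seen;

    model_rcu_read_lock();
    seen = value;            /* stands in for reading protected data */
    model_rcu_read_unlock();
    return seen;
}

/* Outer pair plus the helper's own pair: nesting depth reaches 2. */
static int caller_with_outer_lock(int value)
{
    int seen;

    model_rcu_read_lock();
    seen = helper_with_own_lock(value);
    model_rcu_read_unlock();
    return seen;
}

/* Without the outer pair the helper is still protected. */
static int caller_without_outer_lock(int value)
{
    return helper_with_own_lock(value);
}
```

Both callers return the same value and leave `rcu_depth` at zero. The only observable difference in the model is the deeper nesting in the first caller.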
--

From: Li Zefan
Date: Thursday, April 22, 2010 - 11:10 pm

With this patch applied, I ran some more tests, and no warning was triggered.

Tested-by: Li Zefan <lizf@cn.fujitsu.com>

--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 11:05 pm

On Fri, 23 Apr 2010 14:10:32 +0800
Thank you!

-Kame

--

From: Balbir Singh
Date: Friday, April 23, 2010 - 12:00 am

Looking at the patch, we seem to be protecting only the use of css_*().
I wonder if we should push the rcu_read_*lock() semantics down into the
css routines, or is it just too intrusive to do it that way?

-- 
	Three Cheers,
	Balbir
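
Editor's sketch: one way to picture the "push down" idea is an accessor that enters the read-side section itself, so callers need no locking of their own. The model below is a userspace illustration under stated assumptions. The `model_*` names are hypothetical, and this is not how the css routines in the tree are written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int rcu_depth;     /* models the read-side nesting depth */
static int rcu_warnings;  /* counts unprotected dereferences */

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }
static bool model_rcu_read_lock_held(void) { return rcu_depth > 0; }

struct model_css_id { unsigned short id; };
struct model_css    { struct model_css_id *id; };

/* Raw accessor: the caller is responsible for the read-side section. */
static unsigned short model_css_id_raw(const struct model_css *css)
{
    if (!model_rcu_read_lock_held())
        rcu_warnings++;
    return css->id ? css->id->id : 0;
}

/* Pushed-down variant: the accessor protects its own dereference.
 * Returning the value is fine here only because it is a plain integer
 * copied inside the section, not a pointer to protected data. */
static unsigned short model_css_id_locked(const struct model_css *css)
{
    unsigned short id;

    model_rcu_read_lock();
    id = model_css_id_raw(css);
    model_rcu_read_unlock();
    return id;
}
```

The trade-off in this model is that the pushed-down form suits accessors that return a copied value. A routine that hands back a pointer to RCU-protected data would still need the caller to hold the read-side section for as long as the pointer is used.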
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 22, 2010 - 11:57 pm

On Fri, 23 Apr 2010 12:30:11 +0530

It may be worth considering for a future cleanup patch.

Thanks,
-Kame


--

From: Paul E. McKenney
Date: Friday, April 23, 2010 - 12:34 pm

I have queued this, thank you all!

However, memcg_oom_wake_function() does not yet exist in the tree
I am using, and is_target_pte_for_mc() has changed.  I omitted the
hunk for memcg_oom_wake_function() and edited the hunk for
is_target_pte_for_mc().

I have queued this for others' testing, but if you would rather carry
this patch up the memcg path, please let me know and I will drop it.

--

From: KAMEZAWA Hiroyuki
Date: Friday, April 23, 2010 - 7:08 pm

On Fri, 23 Apr 2010 12:34:06 -0700
I think it's OK for this to be fixed via your tree. I'll look at memcg later
and fix the remaining issues.

Thanks,
-Kame

--

From: Paul E. McKenney
Date: Friday, April 23, 2010 - 9:27 pm

Sounds good!

							Thanx, Paul
--
