migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
The point of the RCU protection there is part of getting a stable reference
to anon_vma and is only held for anon pages as file pages are locked
which is sufficient protection against freeing.
However, while a file page's mapping is being migrated, the radix
tree is double checked to ensure it is the expected page. This uses
radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
triggering the following warning under CONFIG_PROVE_RCU.
[ 173.674290] ===================================================
[ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 173.676016] ---------------------------------------------------
[ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
[ 173.676016]
[ 173.676016] other info that might help us debug this:
[ 173.676016]
[ 173.676016]
[ 173.676016] rcu_scheduler_active = 1, debug_locks = 0
[ 173.676016] 1 lock held by hugeadm/2899:
[ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
[ 173.676016]
[ 173.676016] stack backtrace:
[ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
[ 173.676016] Call Trace:
[ 173.676016] [<c128cc01>] ? printk+0x14/0x1b
[ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
[ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
[ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39
[ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107
[ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
[ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae
[ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa
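
For readers following along, the condition behind this warning can be modelled in userspace. This is only a sketch: every model_* name below is hypothetical and stands in for lockdep state, not for a real kernel API. It assumes radix_tree_deref_slot() is checked against the RCU read lock, while a tree-lock variant would be checked against mapping->tree_lock instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for lockdep state; none of these are kernel APIs. */
static int  model_rcu_depth;    /* nesting depth of rcu_read_lock() */
static bool model_tree_locked;  /* is mapping->tree_lock held? */
static int  model_warnings;     /* count of "suspicious usage" reports */

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

static void model_tree_lock(void)
{
	assert(!model_tree_locked);
	model_tree_locked = true;
}

static void model_tree_unlock(void)
{
	assert(model_tree_locked);
	model_tree_locked = false;
}

/* Models radix_tree_deref_slot(): reports unless inside an RCU read-side section. */
static void *model_deref_slot(void **pslot)
{
	if (model_rcu_depth == 0)
		model_warnings++;
	return *pslot;
}

/*
 * Models a tree-lock variant: reports unless the tree lock is held.
 * Whether the RCU read lock is also held does not matter here.
 */
static void *model_deref_slot_protected(void **pslot)
{
	if (!model_tree_locked)
		model_warnings++;
	return *pslot;
}
```

In this model, calling model_deref_slot() with only the tree lock held bumps model_warnings, which mirrors the file page path in the trace above. model_deref_slot_protected() stays quiet in the same situation.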
This patch introduces radix_tree_deref_slot_protected() which calls
rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock
that is protecting ...

This was a bad idea. After some extended testing, it was obvious that this
function can be called for swapcache pages with the RCU lock held. Paul, is
it still permissible to use rcu_dereference_protected() or must

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

Hmm.. Why did you add the check?

--
Kind regards,
Minchan Kim

Because our earlier discussions assumed that RCU read lock was not held in
this path. The check was added to ensure that assumption was correct,

--
Mel Gorman

[ Add Paul back to the CC list, and also Dipankar. Hopefully I killed the
mime encodings correctly ]

I'm not Paul but I can read the code in include/linux/rcupdate.h.

Holding rcu_read_lock_held isn't a problem, but using protected with

No this is a problem .. because __rcu_dereference_protected doesn't
include the smp_read_barrier_depends() that is needed in the rcu only
reference path.

Either we need two helpers, one for when the tree is write locked and one
when the tree is only rcu read locked, or we use __rcu_dereference_check

milton
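
The two-helper idea can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel implementation: the model_* names are hypothetical, an atomic_flag spinlock stands in for mapping->tree_lock, and memory_order_acquire stands in for the dependency ordering that smp_read_barrier_depends() provides (acquire is stronger than strictly required).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of one radix tree slot plus the lock guarding updates. */
struct model_slot {
	atomic_flag lock;      /* stands in for mapping->tree_lock */
	_Atomic(void *) item;  /* stands in for the radix tree slot */
};

static void model_lock(struct model_slot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;  /* spin until the flag is released */
}

static void model_unlock(struct model_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Helper for the lockless (RCU-only) reader: the load must be ordered
 * before any dereference of the returned pointer.
 */
static void *model_deref_rcu(struct model_slot *s)
{
	return atomic_load_explicit(&s->item, memory_order_acquire);
}

/*
 * Helper for a caller that holds s->lock: no updater can run concurrently,
 * so a plain (relaxed) load is enough and no barrier is needed.
 */
static void *model_deref_locked(struct model_slot *s)
{
	return atomic_load_explicit(&s->item, memory_order_relaxed);
}

/* Updater: publishes the new item while holding the lock. */
static void model_replace(struct model_slot *s, void *new_item)
{
	model_lock(s);
	atomic_store_explicit(&s->item, new_item, memory_order_release);
	model_unlock(s);
}
```

The point of keeping two helpers is that the locked variant is only safe because the lock excludes updaters. A reader that relies on RCU alone has to use the ordered variant.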

Bah, this was extremely careless of me as it's even written in the
documentation. In this specific case, it's simply allowed to ignore whether
the RCU read lock is held or not and the BUG_ON check was unnecessary. The
tree lock protects against parallel updaters which is what we really care
about for using _protected.
In a later cycle, I should look at reducing the RCU read lock hold time
in migration. The main thing it's protecting is getting a stable
reference to anon_vma and it's held longer than is necessary for that.
In the meantime, can anyone spot a problem with this patch?
==== CUT HERE ====
mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
The point of the RCU protection there is part of getting a stable reference
to anon_vma and is only held for anon pages as file pages are locked
which is sufficient protection against freeing.
However, while a file page's mapping is being migrated, the radix tree
is double checked to ensure it is the expected page. This uses
radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
triggering the following warning.
[ 173.674290] ===================================================
[ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 173.676016] ---------------------------------------------------
[ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
[ 173.676016]
[ 173.676016] other info that might help us debug this:
[ 173.676016]
[ 173.676016]
[ 173.676016] rcu_scheduler_active = 1, debug_locks = 0
[ 173.676016] 1 lock held by hugeadm/2899:
[ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
[ 173.676016]
[ 173.676016] stack backtrace:
[ 173.676016] Pid: 2899, ...

Yes. I think if we want to reduce the RCU read lock hold time, we should look
at unmap_and_move() in the anon page case. Once we hold a reference through
anon_vma->external_refcount, the anon_vma is stable, so we can call
rcu_read_unlock().

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

This is what I want. Thanks.

--
Kind regards,
Minchan Kim
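
The suggestion about shortening the RCU hold time might look roughly like the userspace sketch below. It is an assumption-laden model, not kernel code: the model_* names are hypothetical, struct model_anon_vma only carries the counter, and a plain depth counter stands in for the RCU read-side critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: nesting depth of the RCU read-side critical section. */
static int model_rcu_depth;

/* Hypothetical model of anon_vma; only the pin counter is represented. */
struct model_anon_vma {
	atomic_int external_refcount;
};

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/*
 * Take the reference inside the read-side section, then leave the section
 * immediately: from here on the reference keeps the object stable.
 */
static void model_pin_anon_vma(struct model_anon_vma *av)
{
	model_rcu_read_lock();
	atomic_fetch_add(&av->external_refcount, 1);
	model_rcu_read_unlock();
}

/* Drop the reference; returns true if this was the last one. */
static bool model_unpin_anon_vma(struct model_anon_vma *av)
{
	return atomic_fetch_sub(&av->external_refcount, 1) == 1;
}
```

With this shape, the migration work between pin and unpin runs outside the read-side section, which is the reduction in hold time being discussed.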

On Mon, 20 Dec 2010 15:23:36 +0000

Thank you for fixing.
