PageAnon pages that are unmapped may or may not have an anon_vma so
are not currently migrated. However, a swap cache page can be migrated
and fits this description. This patch identifies page swap caches and
allows them to be migrated.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/migrate.c | 15 ++++++++++-----
mm/rmap.c | 6 ++++--
2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 35aad2a..f9bf37e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -203,6 +203,9 @@ static int migrate_page_move_mapping(struct address_space *mapping,
void **pslot;
if (!mapping) {
+ if (PageSwapCache(page))
+ SetPageSwapCache(newpage);
+
/* Anonymous page without mapping */
if (page_count(page) != 1)
return -EAGAIN;
@@ -607,11 +610,13 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
* the page was isolated and when we reached here while
* the RCU lock was not held
*/
- if (!page_mapped(page))
- goto rcu_unlock;
-
- anon_vma = page_anon_vma(page);
- atomic_inc(&anon_vma->external_refcount);
+ if (!page_mapped(page)) {
+ if (!PageSwapCache(page))
+ goto rcu_unlock;
+ } else {
+ anon_vma = page_anon_vma(page);
+ atomic_inc(&anon_vma->external_refcount);
+ }
}
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index af35b75..d5ea1f2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
if (unlikely(PageKsm(page)))
return rmap_walk_ksm(page, rmap_one, arg);
- else if (PageAnon(page))
+ else if (PageAnon(page)) {
+ if (PageSwapCache(page))
+ return SWAP_AGAIN;
return rmap_walk_anon(page, rmap_one, arg);
- else
+ } else
return rmap_walk_file(page, rmap_one, arg);
}
#endif /* CONFIG_MIGRATION */
--
1.6.5
--
On Tue, 30 Mar 2010 10:14:49 +0100
Migration of SwapCache requires radix-tree replacement, IOW,
mapping == NULL && PageSwapCache is BUG.
SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.
Please see do_swap_page(), PageSwapCache bit is cleared only when
do_swap_page()...
swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
try_to_free_swap(page);
Then, PageSwapCache is cleared only when swap is freeable even if mapped.
rmap_walk_anon() should be called and the check is not necessary.
Thanks,
-Kame
--
Correct. In the initial attempt to allow PageSwapCache pages to move, I
encountered a number of bugs. This was a "it can't be happening but something
I follow your reasoning. What caught me is the following comment;
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
* and treated as swapcache but it has no rmap yet.
* Calling try_to_unmap() against a page->mapping==NULL page will
* trigger a BUG. So handle it here.
and the fact that without the check the following bug is triggered;
[ 476.541321] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 476.590328] IP: [<ffffffff81072162>] __bfs+0x1a1/0x1d7
[ 476.590328] PGD 3781c067 PUD 371b2067 PMD 0
[ 476.590328] Oops: 0000 [#1] PREEMPT SMP
[ 476.590328] last sysfs file: /sys/block/sr0/capability
[ 476.590328] CPU 0
[ 476.590328] Modules linked in: highalloc trace_allocmap buddyinfo vmregress_core oprofile dm_crypt loop i2c_piix4 evdev i2c_core processor serio_raw tpm_tis tpm tpm_bios shpchp pci_hotplug button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod cdrom sd_mod ata_generic ahci libahci ide_pci_generic libata ide_core scsi_mod ohci_hcd r8169 mii ehci_hcd floppy thermal fan thermal_sys
[ 477.296405]
[ 477.296405] Pid: 4343, comm: bench-stresshig Not tainted 2.6.34-rc2-mm1-compaction-v7r3 #1 GA-MA790GP-UD4H/GA-MA790GP-UD4H
[ 477.296405] RIP: 0010:[<ffffffff81072162>] [<ffffffff81072162>] __bfs+0x1a1/0x1d7
[ 477.296405] RSP: 0000:ffff88007817d4c8 EFLAGS: 00010046
[ 477.296405] RAX: ffffffff81c80170 RBX: ffff88007817d528 RCX: 0000000000000000
[ 477.296405] RDX: ffff88007817d528 RSI: 0000000000000000 RDI: ffff88007817d528
[ 477.296405] RBP: ffff88007817d518 R08: 0000000000000000 R09: 0000000000000000
[ 477.296405] R10: ffffffff816a1d08 R11: 0000000000000046 R12: 0000000000000000
[ 477.922839] R13: ffffffff81513887 R14: ffffffff81c80170 R15: 0000000000000000
[ 477.922839] ...On Wed, 31 Mar 2010 12:27:30 +0100 rmap_walk_anon() is called against unmapped pages. Then, !page_mapped() is always true. So, the behavior will not be different from the last one. Thanks, -Kame --
On Thu, Apr 1, 2010 at 8:57 AM, KAMEZAWA Hiroyuki rmap_walk_anon can be also called in case of failing try_to_unmap. Then, In unmap_and_move, page_mapped is true and remove_migration_ptes can be called. -- Kind regards, Minchan Kim --
On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki Frankly speaking, I don't understand what is Mel's problem, why he added Swapcache check in rmap_walk, and why do you said we don't need it. -- Kind regards, Minchan Kim --
On Thu, 1 Apr 2010 11:43:18 +0900
I may miss something.
unmap_and_move()
1. try_to_unmap(TTU_MIGRATION)
2. move_to_newpage
3. remove_migration_ptes
-> rmap_walk()
Then, to map a page back we unmapped we call rmap_walk().
Assume a SwapCache which is mapped, then, PageAnon(page) == true.
At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
mapcount goes to 0.
At 2. SwapCache is copied to a new page.
At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
Before patch, the new page is mapped back to all ptes.
After patch, the new page is not mapped back because its mapcount is 0.
I don't think shared SwapCache of anon is not an usual behavior, so, the logic
before patch is more attractive.
If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
because page->mapping is NULL.
Thanks,
-Kame
--
On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki Thanks. I agree. We don't need the check. Then, my question is why Mel added the check in rmap_walk. He mentioned some BUG trigger and fixed things after this patch. What's it? Is it really related to this logic? I don't think so or we are missing something. -- Kind regards, Minchan Kim --
On Thu, 1 Apr 2010 13:44:29 +0900
Hmm. Consiering again.
Now.
if (PageAnon(page)) {
rcu_locked = 1;
rcu_read_lock();
if (!page_mapped(page)) {
if (!PageSwapCache(page))
goto rcu_unlock;
} else {
anon_vma = page_anon_vma(page);
atomic_inc(&anon_vma->external_refcount);
}
Maybe this is a fix.
==
skip_remap = 0;
if (PageAnon(page)) {
rcu_read_lock();
if (!page_mapped(page)) {
if (!PageSwapCache(page))
goto rcu_unlock;
/*
* We can't convice this anon_vma is valid or not because
* !page_mapped(page). Then, we do migration(radix-tree replacement)
* but don't remap it which touches anon_vma in page->mapping.
*/
skip_remap = 1;
goto skip_unmap;
} else {
anon_vma = page_anon_vma(page);
atomic_inc(&anon_vma->external_refcount);
}
}
.....copy page, radix-tree replacement,....
if (!rc && !skip_remap)
remove_migration_ptes(page, page);
==
Thanks,
-Kame
--
On Thu, Apr 1, 2010 at 2:42 PM, KAMEZAWA Hiroyuki
It's not enough.
we uses remove_migration_ptes in move_to_new_page, too.
We have to prevent it.
We can check PageSwapCache(page) in move_to_new_page and then
skip remove_migration_ptes.
ex)
static int move_to_new_page(....)
{
int swapcache = PageSwapCache(page);
...
if (!swapcache)
if(!rc)
remove_migration_ptes
else
newpage->mapping = NULL;
}
And we have to close race between PageAnon(page) and rcu_read_lock.
If we don't do it, anon_vma could be free in the middle of operation.
I means
* of migration. File cache pages are no problem because of page_lock()
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*/
if (PageAnon(page)) {
!!! RACE !!!!
rcu_read_lock();
rcu_locked = 1;
+
+ /*
+ * If the page has no mappings any more, just bail. An
+ * unmapped anon page is likely to be freed soon but worse,
--
Kind regards,
Minchan Kim
--
Not so sure on this. The page is locked at this point and that should
I am not sure this race exists because the page is locked but a key
observation has been made - A page that is unmapped can be migrated if
it's PageSwapCache but it may not have a valid anon_vma. Hence, in the
!page_mapped case, the key is to not use anon_vma. How about the
following patch?
==== CUT HERE ====
mm,migration: Allow the migration of PageSwapCache pages
PageAnon pages that are unmapped may or may not have an anon_vma so are
not currently migrated. However, a swap cache page can be migrated and
fits this description. This patch identifies page swap caches and allows
them to be migrated but ensures that no attempt to made to remap the pages
would would potentially try to access an already freed anon_vma.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff --git a/mm/migrate.c b/mm/migrate.c
index 35aad2a..5d0218b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -484,7 +484,8 @@ static int fallback_migrate_page(struct address_space *mapping,
* < 0 - error code
* == 0 - success
*/
-static int move_to_new_page(struct page *newpage, struct page *page)
+static int move_to_new_page(struct page *newpage, struct page *page,
+ int safe_to_remap)
{
struct address_space *mapping;
int rc;
@@ -519,10 +520,12 @@ static int move_to_new_page(struct page *newpage, struct page *page)
else
rc = fallback_migrate_page(mapping, newpage, page);
- if (!rc)
- remove_migration_ptes(page, newpage);
- else
- newpage->mapping = NULL;
+ if (safe_to_remap) {
+ if (!rc)
+ remove_migration_ptes(page, newpage);
+ else
+ newpage->mapping = NULL;
+ }
unlock_page(newpage);
@@ -539,6 +542,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, &result);
+ int safe_to_remap = 1;
int rcu_locked = 0;
int charge = 0;
struct mem_cgroup *mem = NULL;
@@ ...page lock can't prevent anon_vma free. I like this. Kame. How about your opinion? How about changing comment? "In this case, swapcache page still " Also, I want to change "safe_to_remap" to "remap_swapcache". I think it's just problem related to swapcache page. So I want to represent it explicitly although we can know it's swapcache by code. -- Kind regards, Minchan Kim --
True, it can't in itself but it is a bug to free a locked page. As PageAnon is cleared by the page allocator (see comments in page_remove_rmap) and we have taken a reference to this page when isolating for migration, I still Sure. Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
On Thu, 1 Apr 2010 18:36:41 +0100 if (rc) newpage->mapping = NULL; else if (safe_to_remap) remove_migrateion_ptes(page, newpage); Is better. Old code cleared newpage->mapping if rc!=0. Thanks, -Kame --
True, done. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
If I remove the check for (PageSwapCache(page) && !page_mapped(page)) in rmap_walk(), then the bug below occurs. The first one is lockdep going bad because it's accessing a bad lock implying that anon_vma->lock is already invalid. The bug that triggers after it is the list walk. [ 373.951347] INFO: trying to register non-static key. [ 373.984314] the code is fine but needs lockdep annotation. [ 374.020512] turning off the locking correctness validator. [ 374.020512] Pid: 4272, comm: bench-stresshig Not tainted 2.6.34-rc2-mm1-compaction-v7r5 #2 [ 374.020512] Call Trace: [ 374.020512] [<ffffffff810758f2>] __lock_acquire+0xf99/0x1776 [ 374.020512] [<ffffffff810761c5>] lock_acquire+0xf6/0x122 [ 374.020512] [<ffffffff810ef121>] ? rmap_walk+0x5c/0x16d [ 374.020512] [<ffffffff812fcfeb>] _raw_spin_lock+0x3b/0x47 [ 374.020512] [<ffffffff810ef121>] ? rmap_walk+0x5c/0x16d [ 374.020512] [<ffffffff810ef121>] rmap_walk+0x5c/0x16d [ 374.020512] [<ffffffff81106396>] ? remove_migration_pte+0x0/0x234 [ 374.677618] [<ffffffff81300dc1>] ? sub_preempt_count+0x9/0x83 [ 374.677618] [<ffffffff81106914>] ? migrate_page_copy+0xa0/0x1ed [ 374.677618] [<ffffffff81106ea4>] migrate_pages+0x3fc/0x5d3 [ 374.880569] [<ffffffff81106c56>] ? migrate_pages+0x1ae/0x5d3 [ 374.994700] [<ffffffff81073a24>] ? trace_hardirqs_on_caller+0x110/0x134 [ 375.018405] [<ffffffff81107e11>] ? compaction_alloc+0x0/0x283 [ 375.097256] [<ffffffff811079b0>] ? compact_zone+0x14e/0x4bd [ 375.097256] [<ffffffff812fd851>] ? _raw_spin_unlock_irq+0x30/0x5d [ 375.097256] [<ffffffff81073a24>] ? trace_hardirqs_on_caller+0x110/0x134 [ 375.097256] [<ffffffff81107b43>] compact_zone+0x2e1/0x4bd [ 375.097256] [<ffffffff811082f2>] try_to_compact_pages+0x1de/0x248 [ 375.516928] [<ffffffff810d3cd2>] __alloc_pages_nodemask+0x45a/0x81c [ 375.516928] [<ffffffff812fde14>] ? restore_args+0x0/0x30 [ 375.620035] [<ffffffff8103995e>] ? finish_task_switch+0x0/0xe3 [ 375.684491] [<ffffffff810fe297>] ...
Thanks. I think it's possible. It's subtle problem. Assume !page_mapped && PageAnon(page) && PageSwapCache 0. PageAnon check 1. race window <---- anon_vma free!!!! 2. rcu_read_lock() 3. skip_unmap 4. move_to_new_page 5. newpage->mapping = page->mapping <--- !!!! It's invalid 6. mapping->a_ops->migratepage 7. radix tree change, copy page (still new page anon is NULL) 8. remove_migrate_ptes 9. rmap_walk 10. PageAnon is true --> we are deceived. 11. rmap_walk_anon -> go bomb! Does it make sense? -- Kind regards, Minchan Kim --
