[PATCH 14/14] mm,migration: Allow the migration of PageSwapCache pages

Previous thread: [PATCH 08/14] Memory compaction core by Mel Gorman on Tuesday, March 30, 2010 - 2:14 am. (1 message)

Next thread: [PATCH 13/14] Do not compact within a preferred zone after a compaction failure by Mel Gorman on Tuesday, March 30, 2010 - 2:14 am. (1 message)
From: Mel Gorman
Date: Tuesday, March 30, 2010 - 2:14 am

PageAnon pages that are unmapped may or may not have an anon_vma so
are not currently migrated. However, a swap cache page can be migrated
and fits this description. This patch identifies page swap caches and
allows them to be migrated.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/migrate.c |   15 ++++++++++-----
 mm/rmap.c    |    6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 35aad2a..f9bf37e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -203,6 +203,9 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	void **pslot;
 
 	if (!mapping) {
+		if (PageSwapCache(page))
+			SetPageSwapCache(newpage);
+
 		/* Anonymous page without mapping */
 		if (page_count(page) != 1)
 			return -EAGAIN;
@@ -607,11 +610,13 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 		 * the page was isolated and when we reached here while
 		 * the RCU lock was not held
 		 */
-		if (!page_mapped(page))
-			goto rcu_unlock;
-
-		anon_vma = page_anon_vma(page);
-		atomic_inc(&anon_vma->external_refcount);
+		if (!page_mapped(page)) {
+			if (!PageSwapCache(page))
+				goto rcu_unlock;
+		} else {
+			anon_vma = page_anon_vma(page);
+			atomic_inc(&anon_vma->external_refcount);
+		}
 	}
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index af35b75..d5ea1f2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1394,9 +1394,11 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 
 	if (unlikely(PageKsm(page)))
 		return rmap_walk_ksm(page, rmap_one, arg);
-	else if (PageAnon(page))
+	else if (PageAnon(page)) {
+		if (PageSwapCache(page))
+			return SWAP_AGAIN;
 		return rmap_walk_anon(page, rmap_one, arg);
-	else
+	} else
 		return rmap_walk_file(page, rmap_one, arg);
 }
 #endif /* CONFIG_MIGRATION */
-- 
1.6.5

--

From: KAMEZAWA Hiroyuki
Date: Tuesday, March 30, 2010 - 10:26 pm

On Tue, 30 Mar 2010 10:14:49 +0100


Migration of SwapCache requires radix-tree replacement, IOW, 
 mapping == NULL && PageSwapCache is BUG.


SwapCache has a condition as (PageSwapCache(page) && page_mapped(page) == true.

Please see do_swap_page(), PageSwapCache bit is cleared only when

do_swap_page()...
       swap_free(entry);
        if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
                try_to_free_swap(page);

Then, PageSwapCache is cleared only when swap is freeable even if mapped.

rmap_walk_anon() should be called and the check is not necessary.

Thanks,
-Kame

--

From: Mel Gorman
Date: Wednesday, March 31, 2010 - 4:27 am

Correct. In the initial attempt to allow PageSwapCache pages to move, I
encountered a number of bugs. This was a "it can't be happening but something

I follow your reasoning. What caught me is the following comment;

         * Corner case handling:
         * 1. When a new swap-cache page is read into, it is added to the LRU
         * and treated as swapcache but it has no rmap yet.
         * Calling try_to_unmap() against a page->mapping==NULL page will
         * trigger a BUG.  So handle it here.

and the fact that without the check the following bug is triggered;

[  476.541321] BUG: unable to handle kernel NULL pointer dereference at (null)
[  476.590328] IP: [<ffffffff81072162>] __bfs+0x1a1/0x1d7
[  476.590328] PGD 3781c067 PUD 371b2067 PMD 0 
[  476.590328] Oops: 0000 [#1] PREEMPT SMP 
[  476.590328] last sysfs file: /sys/block/sr0/capability
[  476.590328] CPU 0 
[  476.590328] Modules linked in: highalloc trace_allocmap buddyinfo vmregress_core oprofile dm_crypt loop i2c_piix4 evdev i2c_core processor serio_raw tpm_tis tpm tpm_bios shpchp pci_hotplug button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod cdrom sd_mod ata_generic ahci libahci ide_pci_generic libata ide_core scsi_mod ohci_hcd r8169 mii ehci_hcd floppy thermal fan thermal_sys
[  477.296405] 
[  477.296405] Pid: 4343, comm: bench-stresshig Not tainted 2.6.34-rc2-mm1-compaction-v7r3 #1 GA-MA790GP-UD4H/GA-MA790GP-UD4H
[  477.296405] RIP: 0010:[<ffffffff81072162>]  [<ffffffff81072162>] __bfs+0x1a1/0x1d7
[  477.296405] RSP: 0000:ffff88007817d4c8  EFLAGS: 00010046
[  477.296405] RAX: ffffffff81c80170 RBX: ffff88007817d528 RCX: 0000000000000000
[  477.296405] RDX: ffff88007817d528 RSI: 0000000000000000 RDI: ffff88007817d528
[  477.296405] RBP: ffff88007817d518 R08: 0000000000000000 R09: 0000000000000000
[  477.296405] R10: ffffffff816a1d08 R11: 0000000000000046 R12: 0000000000000000
[  477.922839] R13: ffffffff81513887 R14: ffffffff81c80170 R15: 0000000000000000
[  477.922839] ...
From: KAMEZAWA Hiroyuki
Date: Wednesday, March 31, 2010 - 4:57 pm

On Wed, 31 Mar 2010 12:27:30 +0100

rmap_walk_anon() is called against unmapped pages.
Then, !page_mapped() is always true. So, the behavior will not be different from
the last one.

Thanks,
-Kame

--

From: Minchan Kim
Date: Wednesday, March 31, 2010 - 7:39 pm

On Thu, Apr 1, 2010 at 8:57 AM, KAMEZAWA Hiroyuki

rmap_walk_anon can be also called in case of failing try_to_unmap.
Then, In unmap_and_move, page_mapped is true and
remove_migration_ptes can be called.




-- 
Kind regards,
Minchan Kim
--

From: Minchan Kim
Date: Wednesday, March 31, 2010 - 7:43 pm

On Wed, Mar 31, 2010 at 2:26 PM, KAMEZAWA Hiroyuki


Frankly speaking, I don't understand what is Mel's problem, why he added
Swapcache check in rmap_walk, and why do you said we don't need it.




-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Wednesday, March 31, 2010 - 8:01 pm

On Thu, 1 Apr 2010 11:43:18 +0900
I may miss something.

unmap_and_move()
 1. try_to_unmap(TTU_MIGRATION)
 2. move_to_newpage
 3. remove_migration_ptes
	-> rmap_walk()

Then, to map a page back we unmapped we call rmap_walk().

Assume a SwapCache which is mapped, then, PageAnon(page) == true.

 At 1. try_to_unmap() will rewrite pte with swp_entry of SwapCache.
       mapcount goes to 0.
 At 2. SwapCache is copied to a new page.
 At 3. The new page is mapped back to the place. Now, newpage's mapcount is 0.
       Before patch, the new page is mapped back to all ptes.
       After patch, the new page is not mapped back because its mapcount is 0.

I don't think shared SwapCache of anon is not an usual behavior, so, the logic
before patch is more attractive.

If SwapCache is not mapped before "1", we skip "1" and rmap_walk will do nothing
because page->mapping is NULL.

Thanks,
-Kame
















--

From: Minchan Kim
Date: Wednesday, March 31, 2010 - 9:44 pm

On Thu, Apr 1, 2010 at 12:01 PM, KAMEZAWA Hiroyuki

Thanks. I agree. We don't need the check.
Then, my question is why Mel added the check in rmap_walk.
He mentioned some BUG trigger and fixed things after this patch.
What's it?
Is it really related to this logic?
I don't think so or we are missing something.

-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Wednesday, March 31, 2010 - 10:42 pm

On Thu, 1 Apr 2010 13:44:29 +0900
Hmm. Consiering again.

Now.
	if (PageAnon(page)) {
		rcu_locked = 1;
		rcu_read_lock();
		if (!page_mapped(page)) {
			if (!PageSwapCache(page))
				goto rcu_unlock;
		} else {
			anon_vma = page_anon_vma(page);
			atomic_inc(&anon_vma->external_refcount);
		}


Maybe this is a fix.

==
	skip_remap = 0;
	if (PageAnon(page)) {
		rcu_read_lock();
		if (!page_mapped(page)) {
			if (!PageSwapCache(page))
				goto rcu_unlock;
			/*
			 * We can't convice this anon_vma is valid or not because
			 * !page_mapped(page). Then, we do migration(radix-tree replacement)
			 * but don't remap it which touches anon_vma in page->mapping.
			 */
			skip_remap = 1;
			goto skip_unmap;
		} else {
			anon_vma = page_anon_vma(page);
			atomic_inc(&anon_vma->external_refcount);
		}
	}	
	.....copy page, radix-tree replacement,....

	if (!rc && !skip_remap)
		 remove_migration_ptes(page, page);
==

Thanks,
-Kame











--

From: Minchan Kim
Date: Thursday, April 1, 2010 - 3:51 am

On Thu, Apr 1, 2010 at 2:42 PM, KAMEZAWA Hiroyuki

It's not enough.
we uses remove_migration_ptes in  move_to_new_page, too.
We have to prevent it.
We can check PageSwapCache(page) in move_to_new_page and then
skip remove_migration_ptes.

ex)
static int move_to_new_page(....)
{
     int swapcache = PageSwapCache(page);
     ...
     if (!swapcache)
         if(!rc)
             remove_migration_ptes
         else
             newpage->mapping = NULL;
}

And we have to close race between PageAnon(page) and rcu_read_lock.
If we don't do it, anon_vma could be free in the middle of operation.
I means

         * of migration. File cache pages are no problem because of page_lock()
         * File Caches may use write_page() or lock_page() in migration, then,
         * just care Anon page here.
         */
        if (PageAnon(page)) {
                !!! RACE !!!!
                rcu_read_lock();
                rcu_locked = 1;

+
+               /*
+                * If the page has no mappings any more, just bail. An
+                * unmapped anon page is likely to be freed soon but worse,


-- 
Kind regards,
Minchan Kim
--

From: Mel Gorman
Date: Thursday, April 1, 2010 - 10:36 am

Not so sure on this. The page is locked at this point and that should

I am not sure this race exists because the page is locked but a key
observation has been made - A page that is unmapped can be migrated if
it's PageSwapCache but it may not have a valid anon_vma. Hence, in the
!page_mapped case, the key is to not use anon_vma. How about the
following patch?

==== CUT HERE ====

mm,migration: Allow the migration of PageSwapCache pages

PageAnon pages that are unmapped may or may not have an anon_vma so are
not currently migrated. However, a swap cache page can be migrated and
fits this description. This patch identifies page swap caches and allows
them to be migrated but ensures that no attempt to made to remap the pages
would would potentially try to access an already freed anon_vma.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>

diff --git a/mm/migrate.c b/mm/migrate.c
index 35aad2a..5d0218b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -484,7 +484,8 @@ static int fallback_migrate_page(struct address_space *mapping,
  *   < 0 - error code
  *  == 0 - success
  */
-static int move_to_new_page(struct page *newpage, struct page *page)
+static int move_to_new_page(struct page *newpage, struct page *page,
+						int safe_to_remap)
 {
 	struct address_space *mapping;
 	int rc;
@@ -519,10 +520,12 @@ static int move_to_new_page(struct page *newpage, struct page *page)
 	else
 		rc = fallback_migrate_page(mapping, newpage, page);
 
-	if (!rc)
-		remove_migration_ptes(page, newpage);
-	else
-		newpage->mapping = NULL;
+	if (safe_to_remap) {
+		if (!rc)
+			remove_migration_ptes(page, newpage);
+		else
+			newpage->mapping = NULL;
+	}
 
 	unlock_page(newpage);
 
@@ -539,6 +542,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	int rc = 0;
 	int *result = NULL;
 	struct page *newpage = get_new_page(page, private, &result);
+	int safe_to_remap = 1;
 	int rcu_locked = 0;
 	int charge = 0;
 	struct mem_cgroup *mem = NULL;
@@ ...
From: Minchan Kim
Date: Thursday, April 1, 2010 - 5:20 pm

page lock can't prevent anon_vma free.

I like this. Kame. How about your opinion?

How about changing comment?
"In this case, swapcache page still "
Also, I want to change "safe_to_remap" to "remap_swapcache".
I think it's just problem related to swapcache page.
So I want to represent it explicitly although we can know it's swapcache
by code.


-- 
Kind regards,
Minchan Kim
--

From: Mel Gorman
Date: Friday, April 2, 2010 - 1:51 am

True, it can't in itself but it is a bug to free a locked page. As PageAnon
is cleared by the page allocator (see comments in page_remove_rmap) and we
have taken a reference to this page when isolating for migration, I still


Sure. Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: KAMEZAWA Hiroyuki
Date: Thursday, April 1, 2010 - 5:21 pm

On Thu, 1 Apr 2010 18:36:41 +0100

	if (rc)
		newpage->mapping = NULL;
	else if (safe_to_remap)
		remove_migrateion_ptes(page, newpage);

Is better. Old code cleared newpage->mapping if rc!=0.

Thanks,
-Kame




--

From: Mel Gorman
Date: Friday, April 2, 2010 - 1:52 am

True, done.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Mel Gorman
Date: Thursday, April 1, 2010 - 2:30 am

If I remove the check for (PageSwapCache(page) && !page_mapped(page))
in rmap_walk(), then the bug below occurs. The first one is lockdep going
bad because it's accessing a bad lock implying that anon_vma->lock is
already invalid. The bug that triggers after it is the list walk.

[  373.951347] INFO: trying to register non-static key.
[  373.984314] the code is fine but needs lockdep annotation.
[  374.020512] turning off the locking correctness validator.
[  374.020512] Pid: 4272, comm: bench-stresshig Not tainted 2.6.34-rc2-mm1-compaction-v7r5 #2
[  374.020512] Call Trace:
[  374.020512]  [<ffffffff810758f2>] __lock_acquire+0xf99/0x1776
[  374.020512]  [<ffffffff810761c5>] lock_acquire+0xf6/0x122
[  374.020512]  [<ffffffff810ef121>] ? rmap_walk+0x5c/0x16d
[  374.020512]  [<ffffffff812fcfeb>] _raw_spin_lock+0x3b/0x47
[  374.020512]  [<ffffffff810ef121>] ? rmap_walk+0x5c/0x16d
[  374.020512]  [<ffffffff810ef121>] rmap_walk+0x5c/0x16d
[  374.020512]  [<ffffffff81106396>] ? remove_migration_pte+0x0/0x234
[  374.677618]  [<ffffffff81300dc1>] ? sub_preempt_count+0x9/0x83
[  374.677618]  [<ffffffff81106914>] ? migrate_page_copy+0xa0/0x1ed
[  374.677618]  [<ffffffff81106ea4>] migrate_pages+0x3fc/0x5d3
[  374.880569]  [<ffffffff81106c56>] ? migrate_pages+0x1ae/0x5d3
[  374.994700]  [<ffffffff81073a24>] ? trace_hardirqs_on_caller+0x110/0x134
[  375.018405]  [<ffffffff81107e11>] ? compaction_alloc+0x0/0x283
[  375.097256]  [<ffffffff811079b0>] ? compact_zone+0x14e/0x4bd
[  375.097256]  [<ffffffff812fd851>] ? _raw_spin_unlock_irq+0x30/0x5d
[  375.097256]  [<ffffffff81073a24>] ? trace_hardirqs_on_caller+0x110/0x134
[  375.097256]  [<ffffffff81107b43>] compact_zone+0x2e1/0x4bd
[  375.097256]  [<ffffffff811082f2>] try_to_compact_pages+0x1de/0x248
[  375.516928]  [<ffffffff810d3cd2>] __alloc_pages_nodemask+0x45a/0x81c
[  375.516928]  [<ffffffff812fde14>] ? restore_args+0x0/0x30
[  375.620035]  [<ffffffff8103995e>] ? finish_task_switch+0x0/0xe3
[  375.684491]  [<ffffffff810fe297>] ...
From: Minchan Kim
Date: Thursday, April 1, 2010 - 3:42 am

Thanks. I think it's possible. It's subtle problem.
Assume !page_mapped  && PageAnon(page)  && PageSwapCache

0. PageAnon check
1. race window <---- anon_vma free!!!!
2. rcu_read_lock()
3. skip_unmap
4. move_to_new_page
5. newpage->mapping = page->mapping <--- !!!! It's invalid
6.     mapping->a_ops->migratepage
7.         radix tree change, copy page (still new page anon is NULL)
8.     remove_migrate_ptes
9.     rmap_walk
10.         PageAnon is true --> we are deceived.
11.         rmap_walk_anon -> go bomb!

Does it make sense?
-- 
Kind regards,
Minchan Kim
--

Previous thread: [PATCH 08/14] Memory compaction core by Mel Gorman on Tuesday, March 30, 2010 - 2:14 am. (1 message)

Next thread: [PATCH 13/14] Do not compact within a preferred zone after a compaction failure by Mel Gorman on Tuesday, March 30, 2010 - 2:14 am. (1 message)