[PATCH][RFC] fix kernel BUG at mm/migrate.c:719! in 2.6.26-rc5-mm3

To: Andrew Morton <akpm@...>
Cc: Rik van Riel <riel@...>, Lee Schermerhorn <lee.schermerhorn@...>, Kosaki Motohiro <kosaki.motohiro@...>, Nick Piggin <npiggin@...>, <linux-mm@...>, <linux-kernel@...>, <kernel-testers@...>
Date: Tuesday, June 17, 2008 - 3:35 am

Hi.

I hit this bug after migrating pages only a few times
via memory_migrate of cpuset.

Unfortunately, even with this patch applied,
I hit a bad_page problem after several hundred page migrations
(I'll report it in another mail).
But I believe something like this patch is needed anyway.

------------[ cut here ]------------
kernel BUG at mm/migrate.c:719!
invalid opcode: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
CPU 0
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_log dm_multipath dm_mod sbs sbshc button battery acpi_memhotplug ac parport_pc lp parport floppy serio_raw rtc_cmos rtc_core rtc_lib 8139too pcspkr 8139cp mii ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 3096, comm: switch.sh Not tainted 2.6.26-rc5-mm3 #1
RIP: 0010:[<ffffffff8029bb85>]  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
RSP: 0018:ffff81002f463bb8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffe20000c17500 RCX: 0000000000000034
RDX: ffffe20000c17500 RSI: ffffe200010003c0 RDI: ffffe20000c17528
RBP: ffffe200010003c0 R08: 8000000000000000 R09: 304605894800282f
R10: 282f87058b480028 R11: 0028304005894800 R12: ffff81003f90a5d8
R13: 0000000000000000 R14: ffffe20000bf4cc0 R15: ffff81002f463c88
FS:  00007ff9386576f0(0000) GS:ffffffff8061d800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff938669000 CR3: 000000002f458000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process switch.sh (pid: 3096, threadinfo ffff81002f462000, task ffff81003e99cf10)
Stack:  0000000000000001 ffffffff80290777 0000000000000000 0000000000000000
 ffff81002f463c88 ffff81000000ea18 ffff81002f463c88 000000000000000c
 ffff81002f463ca8 00007ffffffff000 00007fff649f6000 0000000000000004
Call Trace:
 [<ffffffff80290777>] ? new_node_page+0x0/0x2f
 [<ffffffff80291611>] ? do_migrate_pages+0x19b/0x1e7
 [<ffffffff802315c7>] ? set_cpus_allowed_ptr+0xe6/0xf3
 [<ffffffff8025c827>] ? cpuset_migrate_mm+0x58/0x8f
 [<ffffffff8025d0fd>] ? cpuset_attach+0x8b/0x9e
 [<ffffffff8025a3e1>] ? cgroup_attach_task+0x3a3/0x3f5
 [<ffffffff80276cb5>] ? __alloc_pages_internal+0xe2/0x3d1
 [<ffffffff8025af06>] ? cgroup_common_file_write+0x150/0x1dd
 [<ffffffff8025aaf4>] ? cgroup_file_write+0x54/0x150
 [<ffffffff8029f839>] ? vfs_write+0xad/0x136
 [<ffffffff8029fd76>] ? sys_write+0x45/0x6e
 [<ffffffff8020bef2>] ? tracesys+0xd5/0xda


Code: 4c 48 8d 7b 28 e8 cc 87 09 00 48 83 7b 18 00 75 30 48 8b 03 48 89 da 25 00 40 00 00 48 85 c0 74 04 48 8b 53 10 83 7a 08 01 74 04 <0f> 0b eb fe 48 89 df e8 5e 50 fd ff 48 89 df e8 7d d6 fd ff eb
RIP  [<ffffffff8029bb85>] migrate_pages+0x33e/0x49f
 RSP <ffff81002f463bb8>
Clocksource tsc unstable (delta = 438246251 ns)
---[ end trace ce4e6053f7b9bba1 ]---


This bug is caused by VM_BUG_ON() in unmap_and_move().

unmap_and_move()
    710         if (rc != -EAGAIN) {
    711                 /*
    712                  * A page that has been migrated has all references
    713                  * removed and will be freed. A page that has not been
    714                  * migrated will have kepts its references and be
    715                  * restored.
    716                  */
    717                 list_del(&page->lru);
    718                 if (!page->mapping) {
    719                         VM_BUG_ON(page_count(page) != 1);
    720                         unlock_page(page);
    721                         put_page(page);         /* just free the old page */
    722                         goto end_migration;
    723                 } else
    724                         unlock = putback_lru_page(page);
    725         }

I think the page count is not necessarily 1 here, because
migration_entry_wait() increases the page count and waits for the
page to be unlocked.
So, if the old page is accessed between migrate_page_move_mapping(),
which checks the page count, and remove_migration_ptes(), the page
count will not be 1 here.
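
The interleaving above can be sketched in userspace as a toy model.
This is only an illustration under stated assumptions: toy_page,
toy_get_page(), toy_put_page() and toy_page_count() are hypothetical
stand-ins, not kernel APIs, and the two paths run sequentially here
rather than on two CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct page; only the refcount is modeled. */
struct toy_page {
	atomic_int count;
};

static void toy_get_page(struct toy_page *p) { atomic_fetch_add(&p->count, 1); }
static void toy_put_page(struct toy_page *p) { atomic_fetch_sub(&p->count, 1); }
static int toy_page_count(struct toy_page *p) { return atomic_load(&p->count); }

/*
 * Returns the count observed at the point where the
 * VM_BUG_ON(page_count(page) != 1) check would run.
 */
static int toy_count_seen_at_check(void)
{
	struct toy_page old;
	int seen;

	/* Migration path: mapping moved, only its own reference is left. */
	atomic_init(&old.count, 1);
	assert(toy_page_count(&old) == 1);

	/* Faulting task: finds the migration entry and pins the old page. */
	toy_get_page(&old);

	/* Migration path: evaluates the check while the waiter holds its pin. */
	seen = toy_page_count(&old);

	/* Faulting task: page unlocked, waiter drops its reference. */
	toy_put_page(&old);

	return seen;
}
```

In this model the check sees a count of 2, so a "count must be 1"
assertion at that point would fire even though nothing is leaked.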

Actually, just commenting out get/put_page from migration_entry_wait
works well in my environment (it survived hundreds of page migrations),
but I don't think modifying migration_entry_wait this way is a good idea.


This patch depends on Lee Schermerhorn's fix for double unlock_page.

This patch also fixes a race between migration_entry_wait and
page_freeze_refs in migrate_page_move_mapping.
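
To illustrate why a plain get_page() is unsafe against a frozen
refcount, here is a minimal userspace sketch using C11 atomics.
It is not the kernel implementation: the toy_* names are hypothetical,
and the helpers only model the behaviour I assume for
page_freeze_refs(), page_unfreeze_refs() and get_page_unless_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modeled. */
struct toy_page {
	atomic_int count;
};

/* Freeze: set the count to 0 only if it currently equals 'expected'. */
static bool toy_freeze_refs(struct toy_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->count, &expected, 0);
}

/* Unfreeze: restore the count once the mapping has been switched. */
static void toy_unfreeze_refs(struct toy_page *p, int count)
{
	atomic_store(&p->count, count);
}

/* Take a reference unless the count is zero (frozen or being freed). */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Walks through freeze, a refused get, unfreeze, and a successful get. */
static void toy_demo(void)
{
	struct toy_page page;

	atomic_init(&page.count, 2);
	assert(toy_freeze_refs(&page, 2));     /* migration freezes the page */
	assert(!toy_get_unless_zero(&page));   /* waiter must back off and retry */
	toy_unfreeze_refs(&page, 2);
	assert(toy_get_unless_zero(&page));    /* retry succeeds after unfreeze */
}
```

A waiter that sees the refusal drops the pte lock and looks the entry
up again, which is what the retry label in the patch below does.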


Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

---
diff -uprN linux-2.6.26-rc5-mm3/mm/migrate.c linux-2.6.26-rc5-mm3-test/mm/migrate.c
--- linux-2.6.26-rc5-mm3/mm/migrate.c	2008-06-17 15:31:23.000000000 +0900
+++ linux-2.6.26-rc5-mm3-test/mm/migrate.c	2008-06-17 13:59:15.000000000 +0900
@@ -232,6 +232,7 @@ void migration_entry_wait(struct mm_stru
 	swp_entry_t entry;
 	struct page *page;
 
+retry:
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
 	if (!is_swap_pte(pte))
@@ -243,11 +244,20 @@ void migration_entry_wait(struct mm_stru
 
 	page = migration_entry_to_page(entry);
 
-	get_page(page);
-	pte_unmap_unlock(ptep, ptl);
-	wait_on_page_locked(page);
-	put_page(page);
-	return;
+	/*
+	 * page count might be set to zero by page_freeze_refs()
+	 * in migrate_page_move_mapping().
+	 */
+	if (get_page_unless_zero(page)) {
+		pte_unmap_unlock(ptep, ptl);
+		wait_on_page_locked(page);
+		put_page(page);
+		return;
+	} else {
+		pte_unmap_unlock(ptep, ptl);
+		goto retry;
+	}
+
 out:
 	pte_unmap_unlock(ptep, ptl);
 }
@@ -715,13 +725,7 @@ unlock:
  		 * restored.
  		 */
  		list_del(&page->lru);
-		if (!page->mapping) {
-			VM_BUG_ON(page_count(page) != 1);
-			unlock_page(page);
-			put_page(page);		/* just free the old page */
-			goto end_migration;
-		} else
-			unlock = putback_lru_page(page);
+		unlock = putback_lru_page(page);
 	}
 
 	if (unlock)
--