[bad page] memcg: another bad page at page migration (2.6.26-rc5-mm3 + patch collection)

To: <linux-mm@...>
Cc: <kamezawa.hiroyu@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 1:53 am

Hi.

The current -mm seems to be gradually stabilizing,
but I have hit another bad page problem in my test(*1)
on 2.6.26-rc5-mm3 + patch collection(*2).

Compared to the previous problems fixed by the patch collection,
the frequency is low.

- 1 time in a 1-hour run (the first one was seen after 30 minutes)
- 3 times in a 16-hour run (the first one was seen after 4 hours)
- 10 times in a 70-hour run (the first one was seen after 8 hours)

All bad pages show a message similar to the one below:

---
Bad page state in process 'switch.sh'
page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
cgroup:ffff81062a817050
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 14980, comm: switch.sh Not tainted 2.6.26-rc5-mm3-memfix #1
Jun 19 20:10:23 opteron kernel:
Call Trace:
[<ffffffff802747b0>] bad_page+0x97/0x131
[<ffffffff80275ae6>] free_hot_cold_page+0xd4/0x19c
[<ffffffff80275bcf>] __pagevec_free+0x21/0x2e
[<ffffffff80278d51>] release_pages+0x18d/0x19f
[<ffffffff80278e58>] ____pagevec_lru_add+0xf5/0x106
[<ffffffff8027a5ea>] putback_lru_page+0x52/0xe9
[<ffffffff8029baec>] migrate_pages+0x331/0x42a
[<ffffffff8029070f>] new_node_page+0x0/0x2f
[<ffffffff802915a9>] do_migrate_pages+0x19b/0x1e7
[<ffffffff8025c827>] cpuset_migrate_mm+0x58/0x8f
[<ffffffff8025d0fd>] cpuset_attach+0x8b/0x9e
[<ffffffff8025a3e1>] cgroup_attach_task+0x3a3/0x3f5
[<ffffffff8029db71>] __dentry_open+0x154/0x238
[<ffffffff8025af06>] cgroup_common_file_write+0x150/0x1dd
[<ffffffff8025aaf4>] cgroup_file_write+0x54/0x150
[<ffffffff8030a335>] selinux_file_permission+0x56/0x117
[<ffffffff8029f74d>] vfs_write+0xad/0x136
[<ffffffff8029fc8a>] sys_write+0x45/0x6e
[<ffffffff8020bef2>] tracesys+0xd5/0xda
Jun 19 20:10:23 opteron kernel:
Hexdump:
000: 28 00 08 00 00 00 00 02 01 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 a1 f1 08 25 03 81 ff ff
020: 6e...

To: Daisuke Nishimura <nishimura@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 1:51 am

Hi, Nishimura-san. Thank you for all your help.

I think this one is the fix... hopefully.

==

In general, mem_cgroup's charge on an ANON page is removed when
page_remove_rmap() is called.

At migration, the newpage is remapped by remove_migration_ptes(). But
the pte may already have changed (because the task exited).
The newpage is charged at page allocation but has no chance to be uncharged
in that case because it is never added to rmap.

Handle that corner case in mem_cgroup_end_migration().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

Index: test2-2.6.26-rc5-mm3/mm/memcontrol.c
===================================================================
--- test2-2.6.26-rc5-mm3.orig/mm/memcontrol.c
+++ test2-2.6.26-rc5-mm3/mm/memcontrol.c
@@ -747,10 +747,22 @@ int mem_cgroup_prepare_migration(struct
/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct page *newpage)
{
- /* At success, page->mapping is not NULL and nothing to do. */
+ /*
+ * At success, page->mapping is not NULL.
+ * special rollback care is necessary when
+ * 1. at migration failure. (newpage->mapping is cleared in this case)
+ * 2. the newpage was moved but not remapped again because the task
+ * exits and the newpage is obsolete. In this case, the new page
+ * may be a swapcache. So, we just call mem_cgroup_uncharge_page()
+ * always for avoiding mess. The page_cgroup will be removed if
+ * unnecessary. File cache pages is still on radix-tree. Don't
+ * care it.
+ */
if (!newpage->mapping)
__mem_cgroup_uncharge_common(newpage,
MEM_CGROUP_CHARGE_TYPE_FORCE);
+ else if (PageAnon(newpage))
+ mem_cgroup_uncharge_page(newpage);
}

/*

--
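
The corner case described in the patch above can be sketched with a toy
model. This is not kernel code: struct toy_page and every toy_* function are
hypothetical names invented for illustration. The model assumes, following the
patch comment ("The page_cgroup will be removed if unnecessary"), that
uncharging a page that is still mapped leaves the charge in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these types and functions are NOT the kernel's. */
struct toy_page {
    bool has_mapping; /* stands in for page->mapping != NULL */
    bool is_anon;     /* stands in for PageAnon(page) */
    int  mapcount;    /* number of ptes that map the page */
    bool charged;     /* stands in for an attached page_cgroup */
};

/* A new anonymous page as it looks after a successful move:
 * charged at allocation, mapping set, not yet remapped. */
static struct toy_page toy_moved_anon_page(void)
{
    struct toy_page p = { .has_mapping = true, .is_anon = true,
                          .mapcount = 0, .charged = true };
    return p;
}

/* Remap step: the page gains a mapper only if the pte still exists. */
static void toy_remap(struct toy_page *p, bool pte_still_there)
{
    assert(p);
    if (pte_still_there)
        p->mapcount++;
}

/* Assumed behaviour: the charge is dropped only when nobody maps the page. */
static void toy_uncharge_page(struct toy_page *p)
{
    if (p->mapcount == 0)
        p->charged = false;
}

/* Logic before the patch: roll back only on migration failure. */
static void toy_end_migration_old(struct toy_page *p)
{
    if (!p->has_mapping)
        p->charged = false;
}

/* Logic after the patch: also handle the "moved but never remapped" case. */
static void toy_end_migration_new(struct toy_page *p)
{
    if (!p->has_mapping)
        p->charged = false;
    else if (p->is_anon)
        toy_uncharge_page(p);
}
```

In this model, calling toy_remap() with pte_still_there set to false and then
toy_end_migration_old() leaves the page charged with no mapper, while
toy_end_migration_new() drops the charge. A page that was remapped keeps its
charge under both versions.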

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: Daisuke Nishimura <nishimura@...>, <linux-mm@...>, <xemul@...>, <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 3:27 am

Definitely makes sense to me!

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 3:19 am

I hope so too :)

I think the corner case that this patch fixes is likely what happens
in my case (there may be other cases, though).

Thanks,
Daisuke Nishimura.
--

To: Daisuke Nishimura <nishimura@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Tuesday, June 24, 2008 - 3:30 am

On Tue, 24 Jun 2008 16:19:03 +0900
Thanks, will rewrite.

Regards,

--

To: Daisuke Nishimura <nishimura@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 2:08 am

On Mon, 23 Jun 2008 14:53:41 +0900
Thank you. I'll dig into this.

--

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: Daisuke Nishimura <nishimura@...>, <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 7:21 am

On Mon, 23 Jun 2008 15:08:17 +0900
Here is one possibility. But if your test doesn't migrate any shmem,
I'll have to dig more ;)
Anyway, I'll schedule this patch.

-Kame
=
mem_cgroup_uncharge() against the old page is done after radix-tree replacement.
There was special handling to ignore swap-cache pages. But shmem can
be swap-cache and file-cache at the same time, so checking PageSwapCache() is
not correct here. Check PageAnon() instead.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
mm/migrate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

Index: test2-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test2-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test2-2.6.26-rc5-mm3/mm/migrate.c
@@ -330,7 +330,13 @@ static int migrate_page_move_mapping(str
__inc_zone_page_state(newpage, NR_FILE_PAGES);

spin_unlock_irq(&mapping->tree_lock);
- if (!PageSwapCache(newpage))
+
+ /*
+ * The page is removed from radix-tree implicitly.
+ * We uncharge it here but swap cache of anonymous page should be
+ * uncharged by mem_cgroup_uncharge_page().
+ */
+ if (!PageAnon(newpage))
mem_cgroup_uncharge_cache_page(page);

return 0;
@@ -379,7 +385,8 @@ static void migrate_page_copy(struct pag
/*
* SwapCache is removed implicitly. Uncharge against swapcache
* should be called after ClearPageSwapCache() because
- * mem_cgroup_uncharge_page checks the flag.
+ * mem_cgroup_uncharge_page checks the flag. shmem's swap cache
+ * is uncharged before here.
*/
mem_cgroup_uncharge_page(page);
}

--
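
The difference between the two checks in the migrate.c patch above can be
sketched as a pair of predicates. This is a toy model, not kernel code:
struct toy_page, the TOY_* constants and both functions are hypothetical names.
It models only the condition that guards the cache uncharge, and it takes the
statement that a shmem page can be in swap cache without being anonymous
directly from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are NOT the kernel's. */
struct toy_page {
    bool anon;      /* stands in for PageAnon(page) */
    bool swapcache; /* stands in for PageSwapCache(page) */
};

/* Three page kinds, as described in the patch text. */
static const struct toy_page TOY_ANON_SWAPCACHE  = { .anon = true,  .swapcache = true  };
static const struct toy_page TOY_FILE_CACHE      = { .anon = false, .swapcache = false };
static const struct toy_page TOY_SHMEM_SWAPCACHE = { .anon = false, .swapcache = true  };

/* Removed line: uncharge the cache charge unless the page is swap cache. */
static bool toy_old_uncharges_cache(struct toy_page p)
{
    return !p.swapcache;
}

/* Added line: uncharge the cache charge unless the page is anonymous. */
static bool toy_new_uncharges_cache(struct toy_page p)
{
    return !p.anon;
}
```

The two predicates agree for anonymous swap cache and for ordinary file cache.
They differ only for TOY_SHMEM_SWAPCACHE, where the old check skips the
uncharge and the new one performs it.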

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 7:44 am

Thank you for your investigation and a patch!

I don't use shmem explicitly, but I'll test this patch anyway
and report the result.

Considering the frequency of the problem, it will take a long time
to tell whether this patch fixes the problem, so please wait :)

Thanks,
Daisuke Nishimura.

--

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 9:37 pm

Unfortunately, this patch doesn't solve my problem, hmm...
I'll dig more, too.
My test doesn't use a large amount of memory, so I think
no swap activity happens, perhaps.

Anyway, I agree that this patch itself is needed for shmem migration.

Thanks,
Daisuke Nishimura.
--

To: Daisuke Nishimura <nishimura@...>
Cc: <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 11:22 pm

On Tue, 24 Jun 2008 10:37:09 +0900
Sigh, one hint in the log is
==
Bad page state in process 'switch.sh'
page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
cgroup:ffff81062a817050
==

- the page was a mapped one.
- the page is swapbacked ... Anon or Shmem/tmpfs.
- mapping is NULL
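
As a side note, the flags word from the log can be split into its set bit
positions with a few lines of C. The helper name toy_set_bits is made up for
this sketch. Which flag name each position has depends on the enum pageflags
of the kernel tree in use, which is not shown in this thread, so the sketch
only lists positions and does not name them.

```c
#include <assert.h>
#include <stdint.h>

/* Store the indices of all set bits of 'flags' in 'out', lowest bit first.
 * Returns how many bits are set. 'out' must have room for 64 entries. */
static int toy_set_bits(uint64_t flags, int out[64])
{
    int n = 0;

    assert(out);
    for (int bit = 0; bit < 64; bit++) {
        if (flags & (UINT64_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}
```

For flags:0x0200000000080018 this gives four set positions: 3, 4, 19 and 57.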

When it was a *source* page:
.. if it was Anon, page->mapping was cleared by migrate_page_copy()
.. if not, replacement in the radix-tree succeeded.

When it was a destination page:
.. page flags are copied, then migrate_page_copy() was called.
.. newpage->mapping is cleared only at migration failure.

Hmm.. I think the troublesome page is the *source* page now.

Anyway, thanks.
-Kame

--

To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...>
Cc: Daisuke Nishimura <nishimura@...>, <linux-mm@...>, <balbir@...>, <xemul@...>, <linux-kernel@...>
Date: Monday, June 23, 2008 - 11:26 pm

On Tue, 24 Jun 2008 12:22:57 +0900
Ignore this... free_hot_cold_page() clears page->mapping. (--;

-Kame

--
