Re: [BUGFIX][PATCH] fix race in file_mapped accounting in memcg

Previous thread: Need to intercept open&write calls for vdi file when VirtualBox is working. by Sergey Malykhin on Tuesday, March 23, 2010 - 11:04 pm. (2 messages)

Next thread: [GIT PULL] sound fixes by Takashi Iwai on Wednesday, March 24, 2010 - 12:07 am. (1 message)
From: KAMEZAWA Hiroyuki
Date: Tuesday, March 23, 2010 - 11:43 pm

A fix for race in file_mapped statistics. I noticed this race while discussing
Andrea's dirty accounting patch series. 
At the end of discusstion, I said "please don't touch file mapped". So, this bugfix
should be posted as an independent patch.
Tested on the latest mmotm.

Thanks,
-Kame

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, memcg's FILE_MAPPED accounting has following race with
move_account (happens at rmdir()).

    increment page->mapcount (rmap.c)
    mem_cgroup_update_file_mapped()           move_account()
					      lock_page_cgroup()
					      check page_mapped() if
					      page_mapped(page)>1 {
						FILE_MAPPED -1 from old memcg
						FILE_MAPPED +1 to old memcg
					      }
					      .....
					      overwrite pc->mem_cgroup
					      unlock_page_cgroup()
    lock_page_cgroup()
    FILE_MAPPED + 1 to pc->mem_cgroup
    unlock_page_cgroup()

Then,
	old memcg (-1 file mapped)
	new memcg (+2 file mapped)


This happens because move_account see page_mapped() which is not guarded by
lock_page_cgroup(). This patch adds FILE_MAPPED flag to page_cgroup and
move account information based on it. Now, all checks are synchronous with
lock_page_cgroup().


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    6 ++++++
 mm/memcontrol.c             |   18 +++++++++---------
 2 files changed, 15 insertions(+), 9 deletions(-)

Index: mmotm-2.6.34-Mar23/include/linux/page_cgroup.h
===================================================================
--- mmotm-2.6.34-Mar23.orig/include/linux/page_cgroup.h
+++ mmotm-2.6.34-Mar23/include/linux/page_cgroup.h
@@ -39,6 +39,7 @@ enum {
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
 	PCG_ACCT_LRU, /* page has been accounted for */
+	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -73,6 +74,11 @@ CLEARPCGFLAG(AcctLRU, ACCT_LRU)
 TESTPCGFLAG(AcctLRU, ...
From: Balbir Singh
Date: Wednesday, March 24, 2010 - 12:05 am

Good catch!


Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 
 
-- 
	Three Cheers,
	Balbir
--

From: Daisuke Nishimura
Date: Wednesday, March 24, 2010 - 12:17 am

Yes, I've also noticed this race.
IMHO, one cause of this race is that we check "page_mapped(page)" while we might
not take the pte lock referring the page.
So, I suggested Andrea to introduce PCG_FILE_MAPPED and protect the bit by
Looks good to me.

	Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Thanks,
--

Previous thread: Need to intercept open&write calls for vdi file when VirtualBox is working. by Sergey Malykhin on Tuesday, March 23, 2010 - 11:04 pm. (2 messages)

Next thread: [GIT PULL] sound fixes by Takashi Iwai on Wednesday, March 24, 2010 - 12:07 am. (1 message)