[RFC/PATCH 0/6] memcg: peformance improvement at el. v3

Previous thread: linux-next: Tree for May 14 by Stephen Rothwell on Wednesday, May 14, 2008 - 12:01 am. (10 messages)

Next thread: 2.6.26-rc2-mm1 by Andrew Morton on Wednesday, May 14, 2008 - 1:01 am. (61 messages)
From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:02 am

This set is for memory resource controller, reviewer and testers. 

Updated against 2.6.26-rc2 and added fixes.

This set does
 - remove refcnt from page_cgroup. By this, codes can be simplified and
   we can avoid tons of unnecessary calls just for maintain refcnt.
 - handle swap-cache, which is now ignored by memory resource controller.
 - small optimization.
 - make force_empty to drop pages. (NEW)
 
major changes : v2 -> v3
 - fixed shared memory handling.
 - added a call to request recalaim memory from specified memcg (NEW) for shmem.
 - added drop_all_pages_in_mem_cgroup before calling force_empty()
 - dropped 3 patches because it's already sent to -mm queue.

 1/6 -- make force_empty to drop pages. (NEW)
 2/6 -- remove refcnt from page_cgroup (shmem handling is fixed.)
 3/6 -- handle swap cache
 4/6 -- add an interface to reclaim memory from memcg (NEW) (for shmem)
 5/6 -- small optimzation with likely()/unlikely()
 6/6 -- remove redundant check.

If positive feedback, I'd like to send some of them agaisnt -mm queue.

This is based on
  - 2.6.26-rc2 
  - memcg-avoid-unnecessary-initialization.patch (in -mm queue)
  - memcg-make-global-var-read_mostly.patch (in -mm queue)
  - memcg-better-migration-handling.patch (in -mm queue)
tested on x86-64 box. Seems to work very well.

Any comments are welcome.

Thanks,
-Kame

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:04 am

This is NEW one ;)
==
Now, when we remove memcg, we call force_empty().
This call drops all page_cgroup accounting in this mem_cgroup but doesn't
drop pages. So, some page caches can be remaind as "not accounted" memory
while they are alive. (because it's accounted only when add_to_page_cache())
If they are not used by other memcg, global LRU will drop them.

This patch tries to drop pages at removing memcg. Other memcg will
reload and re-account page caches.

Consideration: should we add knob to drop pages or not?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/res_counter.h |   11 +++++++++++
 mm/memcontrol.c             |   21 ++++++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

Index: linux-2.6.26-rc2/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/memcontrol.c
+++ linux-2.6.26-rc2/mm/memcontrol.c
@@ -763,6 +763,20 @@ void mem_cgroup_end_migration(struct pag
 	mem_cgroup_uncharge_page(newpage);
 }
 
+
+static void mem_cgroup_drop_all_page(struct mem_cgroup *mem)
+{
+	int progress;
+	while (res_counter_check_empty(&mem->res)) {
+		progress = try_to_free_mem_cgroup_pages(mem,
+					GFP_HIGHUSER_MOVABLE);
+		if (!progress) /* we did as much as possible */
+			break;
+		cond_resched();
+	}
+	return;
+}
+
 /*
  * This routine traverse page_cgroup in given list and drop them all.
  * This routine ignores page_cgroup->ref_cnt.
@@ -820,7 +834,12 @@ static int mem_cgroup_force_empty(struct
 	if (mem_cgroup_subsys.disabled)
 		return 0;
 
+	if (atomic_read(&mem->css.cgroup->count) > 0)
+		goto out;
+
 	css_get(&mem->css);
+	/* drop pages as much as possible */
+	mem_cgroup_drop_all_pages(mem);
 	/*
 	 * page reclaim code (kswapd etc..) will move pages between
 	 * active_list <-> inactive_list while we don't take a lock.
Index: ...
From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:07 am

major changes in shmem handling.

==
This patch removes refcnt from page_cgroup().

After this,

 * A page is charged only when !page_mapped() && no page_cgroup is assigned.
	* Anon page is newly mapped.
	* File page is added to mapping->tree.

 * A page is uncharged only when
	* Anon page is fully unmapped.
	* File page is removed from LRU.

There is no change in behavior from user's view.

This patch also removes unnecessary calls in rmap.c which was used only for
refcnt mangement.

Changelog: v2->v3
  - adjusted to 2.6.26-rc2
  - Fixed shmem's page_cgroup refcnt handling. (but it's still complicated...)
  - added detect-already-accounted-file-cache check to mem_cgroup_charge().

Changelog: v1->v2
  adjusted to 2.6.25-mm1.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/memcontrol.h |    9 +---
 mm/filemap.c               |    6 +-
 mm/memcontrol.c            |   94 ++++++++++++++++++++++-----------------------
 mm/migrate.c               |    3 -
 mm/rmap.c                  |   14 ------
 mm/shmem.c                 |    8 +--
 6 files changed, 61 insertions(+), 73 deletions(-)

Index: linux-2.6.26-rc2/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/memcontrol.c
+++ linux-2.6.26-rc2/mm/memcontrol.c
@@ -166,7 +166,6 @@ struct page_cgroup {
 	struct list_head lru;		/* per cgroup LRU list */
 	struct page *page;
 	struct mem_cgroup *mem_cgroup;
-	int ref_cnt;			/* cached, mapped, migrating */
 	int flags;
 };
 #define PAGE_CGROUP_FLAG_CACHE	(0x1)	/* charged as cache */
@@ -185,6 +184,7 @@ static enum zone_type page_cgroup_zid(st
 enum charge_type {
 	MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
 	MEM_CGROUP_CHARGE_TYPE_MAPPED,
+	MEM_CGROUP_CHARGE_TYPE_FORCE,	/* used by force_empty */
 };
 
 /*
@@ -552,9 +552,7 @@ retry:
 	 */
 	if (pc) {
 		VM_BUG_ON(pc->page != page);
-		VM_BUG_ON(pc->ref_cnt <= 0);
-
-		pc->ref_cnt++;
+		VM_BUG_ON(!pc->mem_cgroup);
 ...
From: Li Zefan
Date: Wednesday, May 14, 2008 - 6:42 pm

This function is defined and used in the 4th patch, so the declaration



Seems the logic is changed here. is it intended ?
--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 6:57 pm

On Thu, 15 May 2008 09:42:36 +0800
intended. (if success, uncharge is not necessary because there is no refcnt.
I'll add comment.

Thanks,
-Kame

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 8:34 pm

On Thu, 15 May 2008 10:57:40 +0900
But, it seems patch 6/6 doesn't seem to be optimal in this case.
and have some troubles...I'll find a workaround.
It seems that shmem's memcg is complicated...

Thanks,
-Kame


--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:08 am

Including changes in memory.stat file.

=
Now swapcache is not accounted. (because it had some troubles.)

This is retrying account swap cache, based on remove-refcnt patch.

 * If a page is swap-cache,  mem_cgroup_uncharge_page() will *not*
   uncharge a page even if page->mapcount == 0.
 * If a page is removed from swap-cache, mem_cgroup_uncharge_page()
   is called.
 * A new swapcache page is not charged until when it's mapped. By this
   we can avoid complicated read-ahead troubles.

 A file, memory.stat,"rss" member is changed to "anon/swapcache".
 (rss is not precise name here...)
 When all processes in cgroup exits, rss/swapcache counter can have some
 numbers because of lazy behavior of LRU. So the word "rss" is confusing.
 I can easily imagine a user says "Oh, there may be memory leak..."
 Precise counting of swapcache is too costly to be handled in memcg.

Change log: v2->v3
 - adjusted to 2.6.26-rc2+x
 - changed "rss" in stat to "rss/swapcache". (stat value includes swapcache)
Change log: v1->v2
 - adjusted to 2.6.25-mm1.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |    9 +++++----
 mm/migrate.c    |    3 ++-
 mm/swap_state.c |    1 +
 3 files changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6.26-rc2/mm/migrate.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/migrate.c
+++ linux-2.6.26-rc2/mm/migrate.c
@@ -359,7 +359,8 @@ static int migrate_page_move_mapping(str
 	write_unlock_irq(&mapping->tree_lock);
 	if (!PageSwapCache(newpage)) {
 		mem_cgroup_uncharge_cache_page(page);
-	}
+	} else
+		mem_cgroup_uncharge_page(page);
 
 	return 0;
 }
Index: linux-2.6.26-rc2/mm/swap_state.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/swap_state.c
+++ linux-2.6.26-rc2/mm/swap_state.c
@@ -110,6 +110,7 @@ void __delete_from_swap_cache(struct pag
 	total_swapcache_pages--;
 	__dec_zone_page_state(page, ...
From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:10 am

A new call, mem_cgroup_shrink_usage() is added for shmem handling
and removing not usual usage of mem_cgroup_charge/uncharge.

Now, shmem calls mem_cgroup_charge() just for reclaim some pages from
mem_cgroup. In general, shmem is used by some process group and not for
global resource (like file caches). So, it's reasonable to reclaim pages from
mem_cgroup where shmem is mainly used.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6.26-rc2/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/memcontrol.c
+++ linux-2.6.26-rc2/mm/memcontrol.c
@@ -783,6 +783,30 @@ static void mem_cgroup_drop_all_pages(st
 }
 
 /*
+ * A call to try to shrink memory usage under specified resource controller.
+ * This is typically used for page reclaiming for shmem for reducing side
+ * effect of page allocation from shmem, which is used by some mem_cgroup.
+ */
+int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask)
+{
+	struct mem_cgroup *mem;
+	int progress = 0;
+	int retry = MEM_CGROUP_RECLAIM_RETRIES;
+
+	rcu_read_lock();
+	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
+	css_get(&mem->css);
+	rcu_read_unlock();
+
+	while(!progress && --retry) {
+		progress = try_to_free_mem_cgroup_pages(mem, gfp_mask);
+	}
+	if (!retry)
+		return -ENOMEM;
+	return 0;
+}
+
+/*
  * This routine traverse page_cgroup in given list and drop them all.
  * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
  */
Index: linux-2.6.26-rc2/mm/shmem.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/shmem.c
+++ linux-2.6.26-rc2/mm/shmem.c
@@ -1314,13 +1314,12 @@ repeat:
 			unlock_page(swappage);
 			if (error == -ENOMEM) {
 				/* allow reclaim from this memory cgroup */
-				error = mem_cgroup_cache_charge(swappage,
-					current->mm, ...
From: Li Zefan
Date: Wednesday, May 14, 2008 - 1:15 am

This is wrong. How about:
	do {
		...
--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:25 am

On Wed, 14 May 2008 16:15:49 +0800
Ouch, or retry--....

Thank you for catching.

Thanks
-Kame

--

From: Li Zefan
Date: Wednesday, May 14, 2008 - 1:23 am

retry-- is still wrong, because then after while, retry will be -1, and then:

+	if (!retry)

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:32 am

On Wed, 14 May 2008 16:23:09 +0800
ok, i'm now rewriting.

Thanks,
-Kame

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:11 am

Showing brach direction for obvious conditions.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.26-rc2/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/memcontrol.c
+++ linux-2.6.26-rc2/mm/memcontrol.c
@@ -550,7 +550,7 @@ retry:
 	 * The page_cgroup exists and
 	 * the page has already been accounted.
 	 */
-	if (pc) {
+	if (unlikely(pc)) {
 		VM_BUG_ON(pc->page != page);
 		VM_BUG_ON(!pc->mem_cgroup);
 		unlock_page_cgroup(page);
@@ -559,7 +559,7 @@ retry:
 	unlock_page_cgroup(page);
 
 	pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
-	if (pc == NULL)
+	if (unlikely(pc == NULL))
 		goto err;
 
 	/*
@@ -616,7 +616,7 @@ retry:
 		pc->flags = PAGE_CGROUP_FLAG_ACTIVE;
 
 	lock_page_cgroup(page);
-	if (page_get_page_cgroup(page)) {
+	if (unlikely(page_get_page_cgroup(page))) {
 		unlock_page_cgroup(page);
 		/*
 		 * Another charge has been added to this page already.
@@ -689,7 +689,7 @@ void __mem_cgroup_uncharge_common(struct
 	 */
 	lock_page_cgroup(page);
 	pc = page_get_page_cgroup(page);
-	if (!pc)
+	if (unlikely(!pc))
 		goto unlock;
 
 	VM_BUG_ON(pc->page != page);

--

From: KAMEZAWA Hiroyuki
Date: Wednesday, May 14, 2008 - 1:13 am

Because of remove refcnt patch, it's very rare case to that
mem_cgroup_charge_common() is called against a page which is accounted.

mem_cgroup_charge_common() is called when.
 1. a page is added into file cache.
 2. an anon page is _newly_ mapped.

A racy case is that a newly-swapped-in anonymous page is referred from
prural threads in do_swap_page() at the same time.
(a page is not Locked when mem_cgroup_charge() is called from do_swap_page.)

Signed-off-by : KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/memcontrol.c |   34 ++++------------------------------
 1 file changed, 4 insertions(+), 30 deletions(-)

Index: linux-2.6.26-rc2/mm/memcontrol.c
===================================================================
--- linux-2.6.26-rc2.orig/mm/memcontrol.c
+++ linux-2.6.26-rc2/mm/memcontrol.c
@@ -536,28 +536,6 @@ static int mem_cgroup_charge_common(stru
 	if (mem_cgroup_subsys.disabled)
 		return 0;
 
-	/*
-	 * Should page_cgroup's go to their own slab?
-	 * One could optimize the performance of the charging routine
-	 * by saving a bit in the page_flags and using it as a lock
-	 * to see if the cgroup page already has a page_cgroup associated
-	 * with it
-	 */
-retry:
-	lock_page_cgroup(page);
-	pc = page_get_page_cgroup(page);
-	/*
-	 * The page_cgroup exists and
-	 * the page has already been accounted.
-	 */
-	if (unlikely(pc)) {
-		VM_BUG_ON(pc->page != page);
-		VM_BUG_ON(!pc->mem_cgroup);
-		unlock_page_cgroup(page);
-		goto done;
-	}
-	unlock_page_cgroup(page);
-
 	pc = kmem_cache_alloc(page_cgroup_cache, gfp_mask);
 	if (unlikely(pc == NULL))
 		goto err;
@@ -618,15 +596,10 @@ retry:
 	lock_page_cgroup(page);
 	if (unlikely(page_get_page_cgroup(page))) {
 		unlock_page_cgroup(page);
-		/*
-		 * Another charge has been added to this page already.
-		 * We take lock_page_cgroup(page) again and read
-		 * page->cgroup, increment refcnt.... just retry is OK.
-		 */
 		res_counter_uncharge(&mem->res, PAGE_SIZE);
 ...
Previous thread: linux-next: Tree for May 14 by Stephen Rothwell on Wednesday, May 14, 2008 - 12:01 am. (10 messages)

Next thread: 2.6.26-rc2-mm1 by Andrew Morton on Wednesday, May 14, 2008 - 1:01 am. (61 messages)