Re: [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync

Previous thread: [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity by Mel Gorman on Monday, November 22, 2010 - 8:43 am. (1 message)

Next thread: [RFC][PATCH 0/4] security label of the file descriptor structure by Roberto Sassu on Monday, November 22, 2010 - 8:56 am. (1 message)
From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

Changelog since V1
  o Drop patch that takes a scanning hint from LRU
  o Loop in reclaim until it is known that enough pages are reclaimed for
    compaction to make forward progress or that progress is no longer
    possible
  o Do not call compaction from within reclaim. Instead have the allocator
    or kswapd call it as necessary
  o Obeying sync in migration now means just avoiding wait_on_page_writeback

Huge page allocations are not expected to be cheap but lumpy reclaim
is still very disruptive. While it is far better than reclaiming random
order-0 pages, it ignores the reference bit of pages near the reference
page selected from the LRU. Memory compaction was merged in 2.6.35 to use
less lumpy reclaim by moving pages around instead of reclaiming when there
were enough pages free. It has been tested fairly heavily at this point.
This is a prototype series to use compaction more aggressively.

When CONFIG_COMPACTION is set, lumpy reclaim is no longer used. Instead,
a mechanism called reclaim/compaction is used where a number of order-0
pages are reclaimed and later the caller uses compaction to satisfy the
allocation. This keeps a larger number of active pages in memory at the cost
of increased use of migration and compaction scanning. With the full series
applied, latencies when allocating huge pages are significantly reduced. By
the end of the series, hints are taken from the LRU on where the best place
to start migrating from might be.

Andrea, this version calls compaction from the callers instead of within
reclaim. Your main concern before was that compaction was being called after
a blind reclaim without checking if enough reclaim work had occurred. This
version is better at checking if enough work has been done but the callers
of compaction are a little awkward. I'm wondering if it really does make
more sense to call compact_zone_order() if should_continue_reclaim() returns
false and indications are that compaction would have a successful outcome.

Four kernels ...
From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
syncronous or asyncronous. In preparation for using compaction instead of
lumpy reclaim, this patch converts the flags into a bitmap.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   46 +++++++++++++++++++++++++----------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c255fcc..be76429 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d31d7ce..e5eda92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,11 +51,20 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-enum lumpy_mode {
-	LUMPY_MODE_NONE,
-	LUMPY_MODE_ASYNC,
-	LUMPY_MODE_SYNC,
-};
+/*
+ * lumpy_mode determines how the inactive list is shrunk
+ * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
+ * LUMPY_MODE_ASYNC:  Do not block
+ * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ *			page from the LRU and reclaim all pages within a
+ *			naturally aligned range
+ */
+typedef unsigned __bitwise__ lumpy_mode;
+#define ...
From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 3:27 am

I find those names terribly undescriptive.  It also strikes me as an
odd set of flags.  Can't this be represented with less?

	LUMPY_MODE_ENABLED
	LUMPY_MODE_SYNC

or, after the rename,

	RECLAIM_MODE_HIGHER	= 1
	RECLAIM_MODE_SYNC	= 2
	RECLAIM_MODE_LUMPY	= 4

where compaction mode is default if RECLAIM_MODE_HIGHER, and
RECLAIM_MODE_LUMPY will go away eventually.

Also, if you have a flag name for 'reclaim with extra efforts for

lumpy_mode_t / reclaim_mode_t?
--

From: Mel Gorman
Date: Wednesday, December 1, 2010 - 3:50 am

My problem with that is you have to infer what the behaviour is from what the
flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC
implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC. It also
looks very odd when trying to distinguish between order-0 standard reclaim,

It can't hurt!

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 4:21 am

Sounds like a boolean value to me.  And it shows: you never actually
check for RECLAIM_MODE_ASYNC in the code, you just always set it to

That is true, because this is still an actual tristate.  It's probably
better to defer until lumpy reclaim is gone and there is only one flag

Thanks :)
--

From: Mel Gorman
Date: Wednesday, December 1, 2010 - 4:56 am

If you insist, the ASYNC flag can be dropped. I found it easier to flag
what behaviour was expected than infer it. In retrospect, I should have
passed the flag into set_reclaim_mode() instead of a boolean and it

Sure.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Johannes Weiner
Date: Thursday, December 2, 2010 - 4:04 am

It seems to be a matter of taste and nobody else seems to care, so I
am not insisting.  Let's just keep it as it is.
--

From: Mel Gorman
Date: Thursday, December 2, 2010 - 5:03 am

As suggested by Johannes, rename reclaim_mode to reclaim_mode_t. This is
a fix to the mmotm patch
broken-out/mm-vmscan-rename-lumpy_mode-to-reclaim_mode.patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/vmscan.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..a9390fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,12 +63,12 @@
  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
-typedef unsigned __bitwise__ reclaim_mode;
-#define RECLAIM_MODE_SINGLE		((__force reclaim_mode)0x01u)
-#define RECLAIM_MODE_ASYNC		((__force reclaim_mode)0x02u)
-#define RECLAIM_MODE_SYNC		((__force reclaim_mode)0x04u)
-#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode)0x08u)
-#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode)0x10u)
+typedef unsigned __bitwise__ reclaim_mode_t;
+#define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
+#define RECLAIM_MODE_ASYNC		((__force reclaim_mode_t)0x02u)
+#define RECLAIM_MODE_SYNC		((__force reclaim_mode_t)0x04u)
+#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode_t)0x08u)
+#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -101,7 +101,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	reclaim_mode reclaim_mode;
+	reclaim_mode_t reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -287,7 +287,7 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	reclaim_mode syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
+	reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
 ...

Migration synchronously waits for writeback if the initial passes fails.
Callers of memory compaction do not necessarily want this behaviour if the
caller is latency sensitive or expects that synchronous migration is not
going to have a significantly better success rate.

This patch adds a sync parameter to migrate_pages() allowing the caller to
indicate if wait_on_page_writeback() is allowed within migration or not. For
reclaim/compaction, try_to_compact_pages() is first called asynchronously,
direct reclaim runs and then try_to_compact_pages() is called synchronously
as there is a greater expectation that it'll succeed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |   10 ++++++----
 include/linux/migrate.h    |   12 ++++++++----
 mm/compaction.c            |   14 ++++++++++----
 mm/memory-failure.c        |    3 ++-
 mm/memory_hotplug.c        |    3 ++-
 mm/mempolicy.c             |    4 ++--
 mm/migrate.c               |   22 +++++++++++++---------
 mm/page_alloc.c            |   21 +++++++++++++++------
 mm/vmscan.c                |    2 +-
 9 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e082cf9..d0aeffd 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -21,10 +21,11 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *mask);
+			int order, gfp_t gfp_mask, nodemask_t *mask,
+			bool sync);
 extern unsigned long compaction_suitable(struct zone *zone, int order);
 extern unsigned long compact_zone_order(struct zone *zone, int order,
-						gfp_t gfp_mask);
+						gfp_t gfp_mask, bool sync);
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
@@ -57,7 +58,8 @@ static inline bool ...

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--

From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/migrate.h |    8 ++++----
 mm/compaction.c         |    2 +-
 mm/memory_hotplug.c     |    2 +-
 mm/mempolicy.c          |    6 ++++--
 mm/migrate.c            |    8 ++++----
 5 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index fa31902..e39aeec 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,10 +13,10 @@ extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 
 extern int fail_migrate_page(struct address_space *,
@@ -35,10 +35,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 03bd8f9..b6e589d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -457,7 +457,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = ...
From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 3:28 am

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--

From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

With compaction being used instead of lumpy reclaim, the name lumpy_mode
and associated variables is a bit misleading. Rename lumpy_mode to
reclaim_mode which is a better fit. There is no functional change.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   70 ++++++++++++++++++++--------------------
 2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index be76429..ea422aa 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a6aa7d..92af572 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,22 +53,22 @@
 #include <trace/events/vmscan.h>
 
 /*
- * lumpy_mode determines how the inactive list is shrunk
- * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
- * LUMPY_MODE_ASYNC:  Do not block
- * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
- * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ * reclaim_mode determines how the inactive list is shrunk
+ * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
+ * RECLAIM_MODE_ASYNC:  Do not block
+ * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, ...
From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 3:34 am

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--

From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

try_to_compact_pages() is initially called to only migrate pages asychronously
and kswapd always compacts asynchronously. Both are being optimistic so it
is important to complete the work as quickly as possible to minimise stalls.

This patch alters the scanner when asynchronous to only consider
MIGRATE_MOVABLE pageblocks as migration candidates. This reduces stalls
when allocating huge pages while not impairing allocation success rates as
a full scan will be performed if necessary after direct reclaim.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/compaction.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b6e589d..50b0a90 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -240,6 +240,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
+	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 
@@ -280,6 +281,20 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		if (PageBuddy(page))
 			continue;
 
+		/*
+		 * For async migration, also only scan in MOVABLE blocks. Async
+		 * migration is optimistic to see if the minimum amount of work
+		 * satisfies the allocation
+		 */
+		pageblock_nr = low_pfn >> pageblock_order;
+		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+				get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+			low_pfn += pageblock_nr_pages;
+			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+			last_pageblock_nr = pageblock_nr;
+			continue;
+		}
+
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
 			continue;
-- 
1.7.1

--

From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 3:31 am

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--

From: Mel Gorman
Date: Monday, November 22, 2010 - 8:43 am

Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
the age of the pages it reclaims. This can incur significant stalls and
potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch introduces an alternative to lumpy
reclaim whe compaction is available called reclaim/compaction. The basic
operation is very simple - instead of selecting a contiguous range of pages
to reclaim, a number of order-0 pages are reclaimed and then compaction is
later by either kswapd (compact_zone_order()) or direct compaction
(__alloc_pages_direct_compact()).

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |   14 ++++++
 include/linux/kernel.h     |    7 +++
 mm/compaction.c            |   89 +++++++++++++++++++++++--------------
 mm/page_alloc.c            |   13 ++++++
 mm/vmscan.c                |  103 +++++++++++++++++++++++++++++++++++++------
 5 files changed, 177 insertions(+), 49 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 5ac5155..e082cf9 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,6 +22,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask);
+extern unsigned long compaction_suitable(struct zone *zone, int order);
+extern unsigned long compact_zone_order(struct zone *zone, int order,
+						gfp_t gfp_mask);
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
@@ -59,6 +62,17 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	return COMPACT_CONTINUE;
 }
 
+static inline unsigned long compaction_suitable(struct ...
From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 3:27 am

Isn't this a regression for !COMPACTION_BUILD in that earlier kernels
would not do sync lumpy reclaim when somebody disabled it during the
async run?

If so, it should be trivial to fix.  Aside from that

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--

From: Mel Gorman
Date: Wednesday, December 1, 2010 - 3:56 am

You'll need to clarify your question I'm afraid. In 2.6.36 for example,
if lumpy reclaim gets disabled then sync reclaim does not happen at all.
This was due to large stalls being observed when copying large amounts

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--

From: Johannes Weiner
Date: Wednesday, December 1, 2010 - 4:32 am

Sorry for the noise, I just verified that it really was dead code.  We
have

	if (should_reclaim_stall())
		set_lumpy_reclaim_mode(.sync=true)

but because the branch is never taken if lumpy is disabled, the
conditional in set_lumpy_reclaim_mode() is dead.
--

From: Andrea Arcangeli
Date: Monday, November 22, 2010 - 9:01 am

Hi Mel,

this looks great to me. I'll replace my patch 1/66 with this patchset
to test with THP. It shall work fine.

Thanks,
Andrea
--

Previous thread: [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity by Mel Gorman on Monday, November 22, 2010 - 8:43 am. (1 message)

Next thread: [RFC][PATCH 0/4] security label of the file descriptor structure by Roberto Sassu on Monday, November 22, 2010 - 8:56 am. (1 message)