Changelog since V1
o Drop patch that takes a scanning hint from LRU
o Loop in reclaim until it is known that enough pages are reclaimed for
compaction to make forward progress or that progress is no longer
possible
o Do not call compaction from within reclaim. Instead have the allocator
or kswapd call it as necessary
o Obeying sync in migration now means just avoiding wait_on_page_writeback
Huge page allocations are not expected to be cheap but lumpy reclaim
is still very disruptive. While it is far better than reclaiming random
order-0 pages, it ignores the reference bit of pages near the reference
page selected from the LRU. Memory compaction was merged in 2.6.35 to use
less lumpy reclaim by moving pages around instead of reclaiming when there
were enough pages free. It has been tested fairly heavily at this point.
This is a prototype series to use compaction more aggressively.
When CONFIG_COMPACTION is set, lumpy reclaim is no longer used. Instead,
a mechanism called reclaim/compaction is used where a number of order-0
pages are reclaimed and later the caller uses compaction to satisfy the
allocation. This keeps a larger number of active pages in memory at the cost
of increased use of migration and compaction scanning. With the full series
applied, latencies when allocating huge pages are significantly reduced. By
the end of the series, hints are taken from the LRU on where the best place
to start migrating from might be.
Andrea, this version calls compaction from the callers instead of within
reclaim. Your main concern before was that compaction was being called after
a blind reclaim without checking if enough reclaim work had occurred. This
version is better at checking if enough work has been done but the callers
of compaction are a little awkward. I'm wondering if it really does make
more sense to call compact_zone_order() if should_continue_reclaim() returns
false and indications are that compaction would have a successful outcome.
Four kernels ...Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
syncronous or asyncronous. In preparation for using compaction instead of
lumpy reclaim, this patch converts the flags into a bitmap.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/trace/events/vmscan.h | 6 ++--
mm/vmscan.c | 46 +++++++++++++++++++++++++----------------
2 files changed, 31 insertions(+), 21 deletions(-)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c255fcc..be76429 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
#define trace_reclaim_flags(page, sync) ( \
(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
- (sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+ (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
)
#define trace_shrink_flags(file, sync) ( \
- (sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+ (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) | \
- (sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+ (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
)
TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d31d7ce..e5eda92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,11 +51,20 @@
#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>
-enum lumpy_mode {
- LUMPY_MODE_NONE,
- LUMPY_MODE_ASYNC,
- LUMPY_MODE_SYNC,
-};
+/*
+ * lumpy_mode determines how the inactive list is shrunk
+ * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
+ * LUMPY_MODE_ASYNC: Do not block
+ * LUMPY_MODE_SYNC: Allow blocking e.g. call wait_on_page_writeback
+ * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ * page from the LRU and reclaim all pages within a
+ * naturally aligned range
+ */
+typedef unsigned __bitwise__ lumpy_mode;
+#define ...I find those names terribly undescriptive. It also strikes me as an odd set of flags. Can't this be represented with less? LUMPY_MODE_ENABLED LUMPY_MODE_SYNC or, after the rename, RECLAIM_MODE_HIGHER = 1 RECLAIM_MODE_SYNC = 2 RECLAIM_MODE_LUMPY = 4 where compaction mode is default if RECLAIM_MODE_HIGHER, and RECLAIM_MODE_LUMPY will go away eventually. Also, if you have a flag name for 'reclaim with extra efforts for lumpy_mode_t / reclaim_mode_t? --
My problem with that is you have to infer what the behaviour is from what the flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC. It also looks very odd when trying to distinguish between order-0 standard reclaim, It can't hurt! -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Sounds like a boolean value to me. And it shows: you never actually check for RECLAIM_MODE_ASYNC in the code, you just always set it to That is true, because this is still an actual tristate. It's probably better to defer until lumpy reclaim is gone and there is only one flag Thanks :) --
If you insist, the ASYNC flag can be dropped. I found it easier to flag what behaviour was expected than infer it. In retrospect, I should have passed the flag into set_reclaim_mode() instead of a boolean and it Sure. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
It seems to be a matter of taste and nobody else seems to care, so I am not insisting. Let's just keep it as it is. --
As suggested by Johannes, rename reclaim_mode to reclaim_mode_t. This is
a fix to the mmotm patch
broken-out/mm-vmscan-rename-lumpy_mode-to-reclaim_mode.patch.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..a9390fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,12 +63,12 @@
* RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
* order-0 pages and then compact the zone
*/
-typedef unsigned __bitwise__ reclaim_mode;
-#define RECLAIM_MODE_SINGLE ((__force reclaim_mode)0x01u)
-#define RECLAIM_MODE_ASYNC ((__force reclaim_mode)0x02u)
-#define RECLAIM_MODE_SYNC ((__force reclaim_mode)0x04u)
-#define RECLAIM_MODE_LUMPYRECLAIM ((__force reclaim_mode)0x08u)
-#define RECLAIM_MODE_COMPACTION ((__force reclaim_mode)0x10u)
+typedef unsigned __bitwise__ reclaim_mode_t;
+#define RECLAIM_MODE_SINGLE ((__force reclaim_mode_t)0x01u)
+#define RECLAIM_MODE_ASYNC ((__force reclaim_mode_t)0x02u)
+#define RECLAIM_MODE_SYNC ((__force reclaim_mode_t)0x04u)
+#define RECLAIM_MODE_LUMPYRECLAIM ((__force reclaim_mode_t)0x08u)
+#define RECLAIM_MODE_COMPACTION ((__force reclaim_mode_t)0x10u)
struct scan_control {
/* Incremented by the number of inactive pages that were scanned */
@@ -101,7 +101,7 @@ struct scan_control {
* Intend to reclaim enough continuous memory rather than reclaim
* enough amount of memory. i.e, mode for high order allocation.
*/
- reclaim_mode reclaim_mode;
+ reclaim_mode_t reclaim_mode;
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
@@ -287,7 +287,7 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
static void set_reclaim_mode(int priority, struct scan_control *sc,
bool sync)
{
- reclaim_mode syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
+ reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
...Migration synchronously waits for writeback if the initial passes fails. Callers of memory compaction do not necessarily want this behaviour if the caller is latency sensitive or expects that synchronous migration is not going to have a significantly better success rate. This patch adds a sync parameter to migrate_pages() allowing the caller to indicate if wait_on_page_writeback() is allowed within migration or not. For reclaim/compaction, try_to_compact_pages() is first called asynchronously, direct reclaim runs and then try_to_compact_pages() is called synchronously as there is a greater expectation that it'll succeed. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/compaction.h | 10 ++++++---- include/linux/migrate.h | 12 ++++++++---- mm/compaction.c | 14 ++++++++++---- mm/memory-failure.c | 3 ++- mm/memory_hotplug.c | 3 ++- mm/mempolicy.c | 4 ++-- mm/migrate.c | 22 +++++++++++++--------- mm/page_alloc.c | 21 +++++++++++++++------ mm/vmscan.c | 2 +- 9 files changed, 59 insertions(+), 32 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index e082cf9..d0aeffd 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -21,10 +21,11 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, extern int fragmentation_index(struct zone *zone, unsigned int order); extern unsigned long try_to_compact_pages(struct zonelist *zonelist, - int order, gfp_t gfp_mask, nodemask_t *mask); + int order, gfp_t gfp_mask, nodemask_t *mask, + bool sync); extern unsigned long compaction_suitable(struct zone *zone, int order); extern unsigned long compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask); + gfp_t gfp_mask, bool sync); /* Do not skip compaction more than 64 times */ #define COMPACT_MAX_DEFER_SHIFT 6 @@ -57,7 +58,8 @@ static inline bool ...
Acked-by: Johannes Weiner <hannes@cmpxchg.org> --
With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/migrate.h | 8 ++++----
mm/compaction.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 6 ++++--
mm/migrate.c | 8 ++++----
5 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index fa31902..e39aeec 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,10 +13,10 @@ extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x,
- unsigned long private, int offlining,
+ unsigned long private, bool offlining,
bool sync);
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
- unsigned long private, int offlining,
+ unsigned long private, bool offlining,
bool sync);
extern int fail_migrate_page(struct address_space *,
@@ -35,10 +35,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
- unsigned long private, int offlining,
+ unsigned long private, bool offlining,
bool sync) { return -ENOSYS; }
static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
- unsigned long private, int offlining,
+ unsigned long private, bool offlining,
bool sync) { return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 03bd8f9..b6e589d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -457,7 +457,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = ...Acked-by: Johannes Weiner <hannes@cmpxchg.org> --
With compaction being used instead of lumpy reclaim, the name lumpy_mode and associated variables is a bit misleading. Rename lumpy_mode to reclaim_mode which is a better fit. There is no functional change. Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/trace/events/vmscan.h | 6 ++-- mm/vmscan.c | 70 ++++++++++++++++++++-------------------- 2 files changed, 38 insertions(+), 38 deletions(-) diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index be76429..ea422aa 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -25,13 +25,13 @@ #define trace_reclaim_flags(page, sync) ( \ (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ - (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ + (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ ) #define trace_shrink_flags(file, sync) ( \ - (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \ + (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \ (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) | \ - (sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ + (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \ ) TRACE_EVENT(mm_vmscan_kswapd_sleep, diff --git a/mm/vmscan.c b/mm/vmscan.c index 6a6aa7d..92af572 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -53,22 +53,22 @@ #include <trace/events/vmscan.h> /* - * lumpy_mode determines how the inactive list is shrunk - * LUMPY_MODE_SINGLE: Reclaim only order-0 pages - * LUMPY_MODE_ASYNC: Do not block - * LUMPY_MODE_SYNC: Allow blocking e.g. call wait_on_page_writeback - * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference + * reclaim_mode determines how the inactive list is shrunk + * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages + * RECLAIM_MODE_ASYNC: Do not block + * RECLAIM_MODE_SYNC: Allow blocking e.g. call wait_on_page_writeback + * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, ...
Acked-by: Johannes Weiner <hannes@cmpxchg.org> --
try_to_compact_pages() is initially called to only migrate pages asychronously
and kswapd always compacts asynchronously. Both are being optimistic so it
is important to complete the work as quickly as possible to minimise stalls.
This patch alters the scanner when asynchronous to only consider
MIGRATE_MOVABLE pageblocks as migration candidates. This reduces stalls
when allocating huge pages while not impairing allocation success rates as
a full scan will be performed if necessary after direct reclaim.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/compaction.c | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index b6e589d..50b0a90 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -240,6 +240,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
struct compact_control *cc)
{
unsigned long low_pfn, end_pfn;
+ unsigned long last_pageblock_nr = 0, pageblock_nr;
unsigned long nr_scanned = 0, nr_isolated = 0;
struct list_head *migratelist = &cc->migratepages;
@@ -280,6 +281,20 @@ static unsigned long isolate_migratepages(struct zone *zone,
if (PageBuddy(page))
continue;
+ /*
+ * For async migration, also only scan in MOVABLE blocks. Async
+ * migration is optimistic to see if the minimum amount of work
+ * satisfies the allocation
+ */
+ pageblock_nr = low_pfn >> pageblock_order;
+ if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+ get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+ low_pfn += pageblock_nr_pages;
+ low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+ last_pageblock_nr = pageblock_nr;
+ continue;
+ }
+
/* Try isolate the page */
if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
continue;
--
1.7.1
--
Acked-by: Johannes Weiner <hannes@cmpxchg.org> --
Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores the age of the pages it reclaims. This can incur significant stalls and potentially increase the number of major faults. Compaction has reached the point where it is considered reasonably stable (meaning it has passed a lot of testing) and is a potential candidate for displacing lumpy reclaim. This patch introduces an alternative to lumpy reclaim whe compaction is available called reclaim/compaction. The basic operation is very simple - instead of selecting a contiguous range of pages to reclaim, a number of order-0 pages are reclaimed and then compaction is later by either kswapd (compact_zone_order()) or direct compaction (__alloc_pages_direct_compact()). Signed-off-by: Mel Gorman <mel@csn.ul.ie> --- include/linux/compaction.h | 14 ++++++ include/linux/kernel.h | 7 +++ mm/compaction.c | 89 +++++++++++++++++++++++-------------- mm/page_alloc.c | 13 ++++++ mm/vmscan.c | 103 +++++++++++++++++++++++++++++++++++++------ 5 files changed, 177 insertions(+), 49 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 5ac5155..e082cf9 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -22,6 +22,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write, extern int fragmentation_index(struct zone *zone, unsigned int order); extern unsigned long try_to_compact_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); +extern unsigned long compaction_suitable(struct zone *zone, int order); +extern unsigned long compact_zone_order(struct zone *zone, int order, + gfp_t gfp_mask); /* Do not skip compaction more than 64 times */ #define COMPACT_MAX_DEFER_SHIFT 6 @@ -59,6 +62,17 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist, return COMPACT_CONTINUE; } +static inline unsigned long compaction_suitable(struct ...
Isn't this a regression for !COMPACTION_BUILD in that earlier kernels would not do sync lumpy reclaim when somebody disabled it during the async run? If so, it should be trivial to fix. Aside from that Acked-by: Johannes Weiner <hannes@cmpxchg.org> --
You'll need to clarify your question I'm afraid. In 2.6.36 for example, if lumpy reclaim gets disabled then sync reclaim does not happen at all. This was due to large stalls being observed when copying large amounts Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Sorry for the noise, I just verified that it really was dead code. We have if (should_reclaim_stall()) set_lumpy_reclaim_mode(.sync=true) but because the branch is never taken if lumpy is disabled, the conditional in set_lumpy_reclaim_mode() is dead. --
Hi Mel, this looks great to me. I'll replace my patch 1/66 with this patchset to test with THP. It shall work fine. Thanks, Andrea --
