This is just an RFC to reduce some of the more obvious stack usage in page
reclaim. It's a bit rushed and I haven't tested this yet but am sending
it out as there may be others working on similar material and would rather
avoid overlap. I built on some of Kosaki Motohiro's work.
On X86 bit, stack usage figures (generated using a modified bloat-o-meter
that uses checkstack.pl as its input) change in the following ways after
the series of patches.
add/remove: 2/0 grow/shrink: 0/4 up/down: 804/-1688 (-884)
function old new delta
putback_lru_pages - 676 +676
update_isolated_counts - 128 +128
do_try_to_free_pages 172 128 -44
kswapd 1324 1168 -156
shrink_page_list 1616 1224 -392
shrink_zone 2320 1224 -1096
There are some growths there but critically they are no longer in the path
that would call writepages. In the main path, there is about 1K of stack
lopped off giving a small amount of breathing room.
KOSAKI Motohiro (3):
vmscan: kill prev_priority completely
vmscan: move priority variable into scan_control
vmscan: simplify shrink_inactive_list()
Mel Gorman (7):
vmscan: Remove useless loop at end of do_try_to_free_pages
vmscan: Remove unnecessary temporary vars in do_try_to_free_pages
vmscan: Split shrink_zone to reduce stack usage
vmscan: Remove unnecessary temporary variables in shrink_zone()
vmscan: Setup pagevec as late as possible in shrink_inactive_list()
vmscan: Setup pagevec as late as possible in shrink_page_list()
vmscan: Update isolated page counters outside of main path in
shrink_inactive_list()
include/linux/mmzone.h | 15 --
mm/page_alloc.c | 2 -
mm/vmscan.c | 447 +++++++++++++++++++++++-------------------------
mm/vmstat.c | 2 -
4 files ...When shrink_inactive_list() isolates pages, it updates a number of
counters using temporary variables to gather them. These consume stack
and it's in the main path that calls ->writepage(). This patch moves the
accounting updates outside of the main path to reduce stack usage.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 63 +++++++++++++++++++++++++++++++++++-----------------------
1 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2c22c83..4225319 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1061,7 +1061,8 @@ static unsigned long clear_active_flags(struct list_head *page_list,
ClearPageActive(page);
nr_active++;
}
- count[lru]++;
+ if (count)
+ count[lru]++;
}
return nr_active;
@@ -1141,12 +1142,13 @@ static int too_many_isolated(struct zone *zone, int file,
* TODO: Try merging with migrations version of putback_lru_pages
*/
static noinline void putback_lru_pages(struct zone *zone,
- struct zone_reclaim_stat *reclaim_stat,
+ struct scan_control *sc,
unsigned long nr_anon, unsigned long nr_file,
struct list_head *page_list)
{
struct page *page;
struct pagevec pvec;
+ struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
pagevec_init(&pvec, 1);
@@ -1185,6 +1187,37 @@ static noinline void putback_lru_pages(struct zone *zone,
pagevec_release(&pvec);
}
+static noinline void update_isolated_counts(struct zone *zone,
+ struct scan_control *sc,
+ unsigned long *nr_anon,
+ unsigned long *nr_file,
+ struct list_head *isolated_list)
+{
+ unsigned long nr_active;
+ unsigned int count[NR_LRU_LISTS] = { 0, };
+ struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+
+ nr_active = clear_active_flags(isolated_list, count);
+ __count_vm_events(PGDEACTIVATE, nr_active);
+
+ __mod_zone_page_state(zone, NR_ACTIVE_FILE,
+ -count[LRU_ACTIVE_FILE]);
+ __mod_zone_page_state(zone, ...Seems unrelated change here. Otherwise looks good. --
It was needed somewhat otherwise the split in accounting looked odd. It -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> --
shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
uses significant amounts of stack doing this. This patch splits
shrink_inactive_list() to take the stack usage out of the main path so
that callers to writepage() do not contain an unused pagevec on the
stack.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 93 +++++++++++++++++++++++++++++++++-------------------------
1 files changed, 53 insertions(+), 40 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a232ad6..9bc1ede 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1120,6 +1120,54 @@ static int too_many_isolated(struct zone *zone, int file,
}
/*
+ * TODO: Try merging with migrations version of putback_lru_pages
+ */
+static noinline void putback_lru_pages(struct zone *zone,
+ struct zone_reclaim_stat *reclaim_stat,
+ unsigned long nr_anon, unsigned long nr_file,
+ struct list_head *page_list)
+{
+ struct page *page;
+ struct pagevec pvec;
+
+ pagevec_init(&pvec, 1);
+
+ /*
+ * Put back any unfreeable pages.
+ */
+ spin_lock(&zone->lru_lock);
+ while (!list_empty(page_list)) {
+ int lru;
+ page = lru_to_page(page_list);
+ VM_BUG_ON(PageLRU(page));
+ list_del(&page->lru);
+ if (unlikely(!page_evictable(page, NULL))) {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ spin_lock_irq(&zone->lru_lock);
+ continue;
+ }
+ SetPageLRU(page);
+ lru = page_lru(page);
+ add_page_to_lru_list(zone, page, lru);
+ if (is_active_lru(lru)) {
+ int file = is_file_lru(lru);
+ reclaim_stat->recent_rotated[file]++;
+ }
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
+
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+/*
* shrink_inactive_list() is a helper for shrink_zone(). It ...noinline_for_stack Cheers, Dave. -- Dave Chinner david@fromorbit.com --
I also think this stuff need more cleanups. but this patch works and no downside. So, Let's merge this at first. Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --
Agreed. I was keeping the first-pass straight-forward and didn't want to Done. Thanks Dave, I hadn't spotted noinline_for_stack but it's exactly -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> --
shrink_page_list() sets up a pagevec to release pages as according as they
are free. It uses significant amounts of stack on the pagevec. This
patch adds pages to be freed via pagevec to a linked list which is then
freed en-masse at the end. This avoids using stack in the main path that
potentially calls writepage().
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 34 ++++++++++++++++++++++++++--------
1 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9bc1ede..2c22c83 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -619,6 +619,22 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_RECLAIM;
}
+static void free_page_list(struct list_head *free_list)
+{
+ struct pagevec freed_pvec;
+ struct page *page, *tmp;
+
+ pagevec_init(&freed_pvec, 1);
+
+ list_for_each_entry_safe(page, tmp, free_list, lru) {
+ list_del(&page->lru);
+ if (!pagevec_add(&freed_pvec, page)) {
+ __pagevec_free(&freed_pvec);
+ pagevec_reinit(&freed_pvec);
+ }
+ }
+}
+
/*
* shrink_page_list() returns the number of reclaimed pages
*/
@@ -627,13 +643,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
enum pageout_io sync_writeback)
{
LIST_HEAD(ret_pages);
- struct pagevec freed_pvec;
+ LIST_HEAD(free_list);
int pgactivate = 0;
unsigned long nr_reclaimed = 0;
cond_resched();
- pagevec_init(&freed_pvec, 1);
while (!list_empty(page_list)) {
enum page_references references;
struct address_space *mapping;
@@ -808,10 +823,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
__clear_page_locked(page);
free_it:
nr_reclaimed++;
- if (!pagevec_add(&freed_pvec, page)) {
- __pagevec_free(&freed_pvec);
- pagevec_reinit(&freed_pvec);
- }
+
+ /*
+ * Is there need to periodically free_page_list? It would
+ * appear not as the counts should be low
+ */
+ list_add(&page->lru, &free_list);
...Need this two line at this? because we need consider number of list element are not 14xN. if (pagevec_count(&freed_pvec)) --
Whoops, yes indeed. Otherwise this potentially leaks and as -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Two variables are declared that are unnecessary in shrink_zone() as they
already exist int the scan_control. Remove them
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a374879..a232ad6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1633,8 +1633,6 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
{
unsigned long nr[NR_LRU_LISTS];
unsigned long nr_to_scan;
- unsigned long nr_reclaimed = sc->nr_reclaimed;
- unsigned long nr_to_reclaim = sc->nr_to_reclaim;
enum lru_list l;
calc_scan_trybatch(zone, sc, nr);
@@ -1647,7 +1645,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
nr[l], SWAP_CLUSTER_MAX);
nr[l] -= nr_to_scan;
- nr_reclaimed += shrink_list(l, nr_to_scan,
+ sc->nr_reclaimed += shrink_list(l, nr_to_scan,
zone, sc);
}
}
@@ -1659,13 +1657,11 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
* with multiple processes reclaiming pages, the total
* freeing target can get unreasonably large.
*/
- if (nr_reclaimed >= nr_to_reclaim &&
+ if (sc->nr_reclaimed >= sc->nr_to_reclaim &&
sc->priority < DEF_PRIORITY)
break;
}
- sc->nr_reclaimed = nr_reclaimed;
-
/*
* Even if we did not try to evict anon pages at all, we want to
* rebalance the anon lru active/inactive ratio.
--
1.6.5
--
You confuse me, you added the local variables yourself in 01dbe5c9 for performance reasons. Doesn't that still hold? --
To avoid a potential regression, I've dropped the patch. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
I'm ok either. Commit 01dbe5c9 issue was only observed on ia64. so it's not important. But at making 01dbe5c9 time, I didn't realized this stack overflow issue. So, I thought "Oh, It's no risk. should go!". but if stack reducing is important, I'm ok to drop this one. --
shrink_zone() calculculates how many pages it needs to shrink from each
LRU list in a given pass. It uses a number of temporary variables to
work this out that then remain on the stack. This patch splits the
function so that some of the stack variables can be discarded.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 29 +++++++++++++++++++----------
1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1ace7c6..a374879 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1595,19 +1595,14 @@ static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
return nr;
}
-/*
- * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
- */
-static void shrink_zone(struct zone *zone, struct scan_control *sc)
+/* Calculate how many pages from each LRU list should be scanned */
+static void calc_scan_trybatch(struct zone *zone,
+ struct scan_control *sc, unsigned long *nr)
{
- unsigned long nr[NR_LRU_LISTS];
- unsigned long nr_to_scan;
- unsigned long percent[2]; /* anon @ 0; file @ 1 */
enum lru_list l;
- unsigned long nr_reclaimed = sc->nr_reclaimed;
- unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+ unsigned long percent[2]; /* anon @ 0; file @ 1 */
+ int noswap = 0 ;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
- int noswap = 0;
/* If we have no swap space, do not bother scanning anon pages. */
if (!sc->may_swap || (nr_swap_pages <= 0)) {
@@ -1629,6 +1624,20 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
nr[l] = nr_scan_try_batch(scan,
&reclaim_stat->nr_saved_scan[l]);
}
+}
+
+/*
+ * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
+ */
+static void shrink_zone(struct zone *zone, struct scan_control *sc)
+{
+ unsigned long nr[NR_LRU_LISTS];
+ unsigned long nr_to_scan;
+ unsigned long nr_reclaimed = sc->nr_reclaimed;
+ unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+ enum ...Needs "noinline_for_stack" to stop the compiler re-inlining it. Cheers, Dave. -- Dave Chinner david@fromorbit.com --
Agreed. I was using noinline where I spotted the compiler automatically inlined but noinline_for_stack both handles future compiler changes and is self-documenting. Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
Uh, that does not sound very nice! How about calculate_scan_work() or something like that? Might as well use the function to abstract detail :) Other than that (and with the noinline_for_stack): Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> --
Remove temporary variable that is only used once and does not help
clarify code.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/vmscan.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 838ac8b..1ace7c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1685,13 +1685,12 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
*/
static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
{
- enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
struct zoneref *z;
struct zone *zone;
sc->all_unreclaimable = 1;
- for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
- sc->nodemask) {
+ for_each_zone_zonelist_nodemask(zone, z, zonelist,
+ gfp_zone(sc->gfp_mask), sc->nodemask) {
if (!populated_zone(zone))
continue;
/*
@@ -1741,7 +1740,6 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
unsigned long lru_pages = 0;
struct zoneref *z;
struct zone *zone;
- enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
unsigned long writeback_threshold;
delayacct_freepages_start();
@@ -1752,7 +1750,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
* mem_cgroup will not do shrink_slab.
*/
if (scanning_global_lru(sc)) {
- for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+ for_each_zone_zonelist(zone, z, zonelist, gfp_zone(sc->gfp_mask)) {
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
--
1.6.5
--
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Now very lots function in vmscan have `priority' argument. It consume
stack slightly. To move it on struct scan_control reduce stack.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/vmscan.c | 83 ++++++++++++++++++++++++++--------------------------------
1 files changed, 37 insertions(+), 46 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1db19f8..5c276f0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -77,6 +77,8 @@ struct scan_control {
int order;
+ int priority;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
@@ -1123,7 +1125,7 @@ static int too_many_isolated(struct zone *zone, int file,
*/
static unsigned long shrink_inactive_list(unsigned long max_scan,
struct zone *zone, struct scan_control *sc,
- int priority, int file)
+ int file)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -1149,7 +1151,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
*/
if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
lumpy_reclaim = 1;
- else if (sc->order && priority < DEF_PRIORITY - 2)
+ else if (sc->order && sc->priority < DEF_PRIORITY - 2)
lumpy_reclaim = 1;
pagevec_init(&pvec, 1);
@@ -1328,7 +1330,7 @@ static void move_active_pages_to_lru(struct zone *zone,
}
static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc, int priority, int file)
+ struct scan_control *sc, int file)
{
unsigned long nr_taken;
unsigned long pgscanned;
@@ -1491,17 +1493,17 @@ static int inactive_list_is_low(struct zone *zone, struct scan_control *sc,
}
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
- struct zone *zone, struct scan_control *sc, int priority)
+ struct zone *zone, struct scan_control *sc)
{
int file = is_file_lru(lru);
if (is_active_lru(lru)) {
if (inactive_list_is_low(zone, sc, file))
- ...I don't like this much because it obfuscates value communication. Functions no longer have obvious arguments and return values, as it's all passed hidden in that struct. Do you think it's worth it? I would much rather see that thing die than expand on it... --
Sorry for the long delay on this. I got distracted by the anon_vma and page migration stuff. I don't feel strongly enough to fight about it and reducing stack usage here isn't the "fix" anyway. I'll drop this patch for now. That aside, the page reclaim algorithm maintains a lot of state and the "priority" is part of that state. While the struct means that functions might not have obvious arguments, passing the state around as arguments gets very unwieldly very quickly. I don't think killing scan_control would be as nice as you imagine although I do think it should be as small as possible. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
So the first pass seems to have been reasonably well received. Kosaki, Rik and Johannes, how you do typically test reclaim-related patches for regressions? My initial sniff-tests look ok with the page leak sorted out but I typically am not searching for vmscan regressions other than lumpy This should have been X86-64. The stack shrinkage is less on X86 -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab --
