Re: [PATCH -mm 05/15] free swap space on swap-in/activation

Previous thread: [PATCH -mm 00/15] VM pageout scalability improvements (V6) by Rik van Riel on Monday, April 28, 2008 - 2:18 pm. (3 messages)

Next thread: [PATCH -mm 01/15] FYI: vmstats are "off-by-one" by Rik van Riel on Monday, April 28, 2008 - 2:18 pm. (1 message)
To: <linux-kernel@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>
Date: Monday, April 28, 2008 - 2:18 pm

Free swap cache entries when swapping in pages if vm_swap_full()
[swap space > 1/2 used]. Uses new pagevec to reduce pressure
on locks.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Index: linux-2.6.25-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.25-mm1.orig/mm/vmscan.c 2008-04-24 10:46:43.000000000 -0400
+++ linux-2.6.25-mm1/mm/vmscan.c 2008-04-24 11:59:56.000000000 -0400
@@ -619,6 +619,9 @@ free_it:
continue;

activate_locked:
+ /* Not a candidate for swapping, so reclaim swap space. */
+ if (PageSwapCache(page) && vm_swap_full())
+ remove_exclusive_swap_page(page);
SetPageActive(page);
pgactivate++;
keep_locked:
@@ -1203,6 +1206,8 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
@@ -1212,6 +1217,8 @@ static void shrink_active_list(unsigned
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);

pagevec_release(&pvec);
}
Index: linux-2.6.25-mm1/mm/swap.c
===================================================================
--- linux-2.6.25-mm1.orig/mm/swap.c 2008-04-24 11:59:51.000000000 -0400
+++ linux-2.6.25-mm1/mm/swap.c 2008-04-24 11:59:56.000000000 -0400
@@ -443,6 +443,24 @@ void pagevec_strip(struct pagevec *pvec)
}
}

+/*
+ * Try to free swap space from the pages in a pagevec
+ */
+void pagevec_swap_free(struct pagevec *pvec)
+{
+ int i;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+
+ if (PageSwapCache(page) && !TestSetPageLocked(page)) {
+ ...

To: Rik van Riel <riel@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Monday, May 12, 2008 - 7:21 am

Hi.

I have a question about the intention of this patch.

I think all the remove_exclusive_swap_page() you have added
are called while the _count of the page is incremented by
__isolate_lru_page().

So, IMHO, swap caches that will be freed by this patch should have
page counts from __isolate_lru_page() and the swap cache itself.

They are different from, for example, those are freed by do_swap_page()
-> remove_exclusive_swap_page(), that is, swap caches
that have been just mapped and have page counts from
the process(only user of the page) and the swap cache itself.

So, the intention of this patch is
not to free a swap space of swap cache that has already swapped in
and only used one process as do_swap_page() does,
but to free a swap space that is only used as a swap cache
(not used by any processes), right?

Regards,
Daisuke Nishimura.

--

To: Daisuke Nishimura <nishimura@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Monday, May 12, 2008 - 9:33 am

On Mon, 12 May 2008 20:21:32 +0900

No. The code should also free the swap space of pages that are
mapped by processes, in order to free up swap space for other
uses. This only happens if vm_swap_full() is true.

--
All rights reversed.
--

To: Rik van Riel <riel@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 9:00 am

OK.

I thought that current code cannot free the swap space
of pages that are mapped by processes(and I think it cannot),
so I asked the intention of this patch for clarification.

Thanks,
Daisuke Nishimura.

--

To: Daisuke Nishimura <nishimura@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 12:04 pm

The pageout code takes a reference count to the page, which means
that remove_exclusive_swap_page (when called from the pageout code)
needs to take that extra refcount into account for mapped pages.

Introduces a remove_exclusive_swap_page_ref function to avoid
exposing too much magic to the rest of the VM.

Signed-off-by: Rik van Riel <riel@redhat.com>
Debugged-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

---
Daisuke: thank you for pointing out the problem. Does this patch look like
a reasonable fix to you?

Andrew: this patch is incremental to my patch
"[PATCH -mm 05/15] free swap space on swap-in/activation"

Index: linux-2.6.25-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.25-mm1.orig/mm/vmscan.c 2008-05-13 11:59:34.000000000 -0400
+++ linux-2.6.25-mm1/mm/vmscan.c 2008-05-13 11:54:03.000000000 -0400
@@ -621,7 +621,7 @@ free_it:
activate_locked:
/* Not a candidate for swapping, so reclaim swap space. */
if (PageSwapCache(page) && vm_swap_full())
- remove_exclusive_swap_page(page);
+ remove_exclusive_swap_page_ref(page);
SetPageActive(page);
pgactivate++;
keep_locked:
Index: linux-2.6.25-mm1/mm/swap.c
===================================================================
--- linux-2.6.25-mm1.orig/mm/swap.c 2008-05-13 11:59:34.000000000 -0400
+++ linux-2.6.25-mm1/mm/swap.c 2008-05-13 11:53:47.000000000 -0400
@@ -455,7 +455,7 @@ void pagevec_swap_free(struct pagevec *p

if (PageSwapCache(page) && !TestSetPageLocked(page)) {
if (PageSwapCache(page))
- remove_exclusive_swap_page(page);
+ remove_exclusive_swap_page_ref(page);
unlock_page(page);
}
}
Index: linux-2.6.25-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/swap.h 2008-05-13 11:20:46.000000000 -0400
+++ linux-2.6.25-mm1/include/linux/swap.h 2008-05-13 11:47:31.000000000 -0400
@@ -266,6 +266,7 @@ exter...

To: Rik van Riel <riel@...>
Cc: Daisuke Nishimura <nishimura@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 1:43 pm

Or, more generally, 2 + N, for an anon page that is mapped [must be
read-only, right?] by N processes. This can happen after, e.g., fork().
Looks like this patch handles the condition just fine, but you might
want to reflect this in the comment.

Now, I think I can use this to try to remove anon pages from the swap

--

To: Lee Schermerhorn <Lee.Schermerhorn@...>
Cc: Daisuke Nishimura <nishimura@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 2:09 pm

On Tue, 13 May 2008 13:43:55 -0400

No, this patch only removes a page from the swap cache that is mapped
by one process. The function page_mapped() returns either 1 or 0, not
the same as page_mapcount().

I am not sure if trying to handle swap cache pages that are mapped by
multiple processes could get us into other corner cases and think that
we should probably try to stick to the safe thing for now.

--
All Rights Reversed
--

To: Rik van Riel <riel@...>
Cc: Daisuke Nishimura <nishimura@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 3:02 pm

Duh! I was reading "page_mapcount()", 'cause that's what I've been

Well, all anon pages are shared right after fork(), right? They only
become private once they've been written to. I don't have a feel for
the relative numbers of shared anon vs COWed anon--either in general or

I suppose I can just go ahead and use this version with my stress load
and count the times when I could have freed the swap cache entry, but

--

To: Lee Schermerhorn <Lee.Schermerhorn@...>
Cc: Rik van Riel <riel@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Wednesday, May 14, 2008 - 10:15 pm

I think it would be better to add a comment of
remove_exclusive_swap_page_ref() that it doesn't handle

Lee, Rik's version looks good to me except mlocked case, but
--

To: Daisuke Nishimura <nishimura@...>
Cc: Rik van Riel <riel@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Thursday, May 15, 2008 - 1:55 pm

Well, it's really just an optimization--freeing swap cache entry, if
possible, when mlocking a page. I'll patch Rik's version to use
page_mapcount() and test heavily before proposing to do this. I suppose
we can also discuss whether we want a separate version that frees swap
mapped by multiple tasks for the mlock case, and keep Rik's version for
vmscan. However, I think that if it's safe in the mlock case, it should
be safe for vmscan. We'll see.

--

To: Daisuke Nishimura <nishimura@...>
Cc: <lee.schermerhorn@...>, <akpm@...>, <kosaki.motohiro@...>, <linux-kernel@...>
Date: Tuesday, May 13, 2008 - 9:11 am

On Tue, 13 May 2008 22:00:44 +0900

You may be right, due to the refcounting magic that is going
on. I will verify the code and send in a fix if it turns out
to be needed.

--
All rights reversed.
--

Previous thread: [PATCH -mm 00/15] VM pageout scalability improvements (V6) by Rik van Riel on Monday, April 28, 2008 - 2:18 pm. (3 messages)

Next thread: [PATCH -mm 01/15] FYI: vmstats are "off-by-one" by Rik van Riel on Monday, April 28, 2008 - 2:18 pm. (1 message)