From: Rik van Riel <riel@redhat.com>
Free swap cache entries when swapping in pages if vm_swap_full()
[swap space > 1/2 used]. Uses new pagevec to reduce pressure
on locks.Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: MinChan Kim <minchan.kim@gmail.com>---
include/linux/pagevec.h | 1 +
include/linux/swap.h | 6 ++++++
mm/swap.c | 18 ++++++++++++++++++
mm/swapfile.c | 25 ++++++++++++++++++++++---
mm/vmscan.c | 7 +++++++
5 files changed, 54 insertions(+), 3 deletions(-)Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c 2008-05-23 14:21:33.000000000 -0400
@@ -619,6 +619,9 @@ free_it:
continue;activate_locked:
+ /* Not a candidate for swapping, so reclaim swap space. */
+ if (PageSwapCache(page) && vm_swap_full())
+ remove_exclusive_swap_page_ref(page);
SetPageActive(page);
pgactivate++;
keep_locked:
@@ -1203,6 +1206,8 @@ static void shrink_active_list(unsigned
__mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
@@ -1212,6 +1217,8 @@ static void shrink_active_list(unsigned
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
spin_unlock_irq(&zone->lru_lock);
+ if (vm_swap_full())
+ pagevec_swap_free(&pvec);pagevec_release(&pvec);
}
Index: linux-2.6.26-rc2-mm1/mm/swap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/swap.c 2008-05-23 14:21:33.000000000 -0400
+++ linux-2.6.26-rc2-mm1/...
On Fri, 06 Jun 2008 16:28:42 -0400
The patch puts rather a lot of pressure onto vm_swap_full(). We might
want to look into optimising that.- Is the 50% thing optimum? Could go higher and perhaps should be
based on amount-of-memory.- Can precalculate the fraction rather than doing it inline all the time.
- Can make total_swap_pages __read_mostly and have a think about
nr_swap_pages too.- Can completely optimise the thing away if !CONFIG_SWAP.
What's going on here.
Normally we'll bump a page's refcount to account for its presence in a
pagevec. This code doesn't do that.Is it safe? If so, how come?
Suitable code comments should be added which explain this unusual and
dangerous optimisation. Or fix the bug :)--
I think we can optimize more and more in case of if !CONFIG_SWAP.
If system are !CONFIG_SWAP, we can never swap out anonymous pages.Don't we manage anonymous pages with list ?
Such system don't need to insert anonymous pages into lru list when fault occur.
It is just needless overhead.Also, If we can't reclaim anonymous pages, don't we need anon rmap facility ?
I don't know well which subsystems used rmap.If pageout only use anonymous rmap for pageout,
we can remove anonymous rmapping code in case of !CONFIG_SWAPHow about your opinion ?
--
Kinds regards,
MinChan Kim
--
You are right.
but I think rmap isn't bottle neck in embedded system.
Why do you want remove that?
--
On Mon, Jun 9, 2008 at 10:38 PM, KOSAKI Motohiro
I think unnecessary code execution may increase power consumption and
cache footprint.
But I am not sure pageout just only one user in rmap.
If another subsystem (ex, xen, kvm) use rmap for managing memory, we
cannot remove rmap.--
Kinds regards,
MinChan Kim
--
On Mon, 9 Jun 2008 11:14:55 +0900
You are right that it can be done. However, doing all of those
!CONFIG_SWAP optimizations are not my priority right now. Most
of them would also apply separately to the current upstream VM.Patches are welcome, though.
--
All rights reversed.
--
On Fri, 6 Jun 2008 18:04:30 -0700
I do not know if 50% is optimum. It is just what the upstream kernel
has had since 2.4.10 or so. Before that it used to be 75%. This sameI wonder if we wouldn't be off best simply placing the two on their
own cache line. After all, they are often handled together.I believe that should be a separate patch though, since I am not
changing that situation from what is already upstream and thisWith CONFIG_SWAP=n the macro PageSwapCache(page) will always be
declared false due to the declarations in page-flags.h. That
means that vm_swap_full() will be evaluated and the compiler
should leave it out.#ifdef CONFIG_SWAP
PAGEFLAG(SwapCache, swapcache)
#else
PAGEFLAG_FALSE(SwapCache)It is safe because the callers already hold an extra reference to
each page. I have added a full kerneldoc comment to this function
to explain that.--
All rights reversed.
--
| Tarkan Erimer | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
| Heiko Carstens | Re: -mm merge plans for 2.6.23 -- sys_fallocate |
git: | |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | Re: [GIT]: Networking |
| Jarek Poplawski | [PATCH iproute2] Re: HTB accuracy for high speed |
