> This patch is based on mmotm-11-23.
>
> Recently, a thrashing problem was reported.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens with backup workloads (e.g., nightly rsync).
> That's because the workload creates use-once pages
> but touches each page twice. That promotes the pages to the
> active list, which results in eviction of working set pages.
>
> Some application developers want support for POSIX_FADV_NOREUSE,
> but other OSes don't support it either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> As another approach, application developers use POSIX_FADV_DONTNEED.
> But it has a problem: if the kernel encounters a page that is being
> written during invalidate_mapping_pages, it can't invalidate that page.
> That makes it very hard for application programmers to use,
> because they always have to sync data before calling
> fadvise(..POSIX_FADV_DONTNEED) to make sure the pages are
> discardable. As a result, they can't use the kernel's deferred writes
> and may see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
>
> In fact, invalidation is a very big hint to the reclaimer.
> It means we don't use the page any more. So let's move
> the page that is still being written to the inactive list's head.
>
> Why move the page to the head? A dirty/writeback page will be flushed
> sooner or later. This patch uses the PG_reclaim trick so that the page is
> moved to the tail of the inactive list when its writeout completes.
>
> This avoids writeout by pageout, which is less efficient than
> the flusher's writeout.
>
> This patch considers page_mapped(page) pages to be part of the working set.
> So the page could leave head of inactive to get a chance to activate.
>
> Originally, I reused Peter's lru_demote with some changes, so I added
> his Signed-off-by.
>
> Note:
> The PG_reclaim trick for writeback pages could race with end_page_writeback,
> so this patch checks PageWriteback once more. That makes the race window
> really small. In theory there is still a race, but it is trivial.
>
> Quoted from fe3cba17 with some modifications:
> "If some page has PG_reclaim set unintentionally, it will confuse readahead and
> make it restart the size rampup process. But it's a trivial problem, and
> can mostly be avoided by checking PageWriteback(page) first in readahead"
>
> The PG_reclaim trick for dirty pages doesn't work now since clear_page_dirty_for_io
> always clears PG_reclaim. The next patch will fix it.
>
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
>
> Changelog since v1:
> - modify description
> - correct typo
> - add some comment
> - change deactivation policy
> ---
> mm/swap.c | 84 +++++++++++++++++++++++++++++++++++++++++++++---------------
> 1 files changed, 63 insertions(+), 21 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
> spin_unlock_irq(&zone->lru_lock);
> }
>
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inactive list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, the page moves to inactive's head because the VM expects it to be
> + * written out by the flusher. The flusher's writeout is much more
> + * effective than the reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
> {
> - int i, lru, file;
> + int lru, file;
> + int active = 0;
> +
> + if (!PageLRU(page))
> + return;
> +
> + if (PageActive(page))
> + active = 1;
> + /* Some processes are using the page */
> + if (page_mapped(page) && !active)
> + return;
> +