Re: [RFC 5/5] truncate: Remove unnecessary page release

Previous thread: [GIT PULL for v2.6.37-rc6] V4L/DVB BKL fixes by Mauro Carvalho Chehab on Friday, December 17, 2010 - 9:50 am. (1 message)

Next thread: reproducer for DM on MD flush deadlock? (was: Re: [PULL REQUEST] md bug fixes) by Mike Snitzer on Friday, December 17, 2010 - 11:13 am. (2 messages)
From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

I copy description of 1/5.

Now we add page reference on add_to_page_cache but doesn't drop it
in remove_from_page_cache. Such asymmetric makes confusing about
page reference so that caller should notice it and comment why they
release page reference. It's not good API.

Long time ago, Hugh tried it[1] but gave up of reason which
reiser4's drop_page had to unlock the page between removing it from
page cache and doing the page_cache_release. But now the situation is
changed. I think at least things in current mainline doesn't have any
obstacles. The problem is fs or somethings out of mainline.
If it has done such thing like reiser4, this patch could be a problem.

Do anyone know the such things? Do we care about things out of mainline?

[1] http://lkml.org/lkml/2004/10/24/140

Minchan Kim (5):
  drop page reference on remove_from_page_cache
  fuse: Remove unnecessary page release
  tlbfs: Remove unnecessary page release
  swap: Remove unnecessary page release
  truncate: Remove unnecessary page release

 fs/fuse/dev.c        |    1 -
 fs/hugetlbfs/inode.c |    1 -
 mm/filemap.c         |   12 ++++++++++++
 mm/shmem.c           |    1 -
 mm/truncate.c        |    1 -
 5 files changed, 12 insertions(+), 4 deletions(-)

--

From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

Now we add page reference on add_to_page_cache but doesn't drop it
in remove_from_page_cache. Such asymmetric makes confusing about
page reference so that caller should notice it and comment why they
release page reference. It's not good API.

Long time ago, Hugh tried it[1] but gave up of reason which
reiser4's drop_page had to unlock the page between removing it from
page cache and doing the page_cache_release. But now the situation is
changed. I think at least things in current mainline doesn't have any
obstacles. The problem is fs or somethings out of mainline.
If it has done such thing like reiser4, this patch could be a problem.

Do anyone know the such things? Do we care about things out of mainline?

Note :
The comment of remove_from_page_cache make by copy & paste & s/swap/page/
from delete_from_swap_cache.

[1] http://lkml.org/lkml/2004/10/24/140

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/filemap.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 095c393..fb9db36 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -148,6 +148,16 @@ void __remove_from_page_cache(struct page *page)
 	}
 }
 
+/**
+ * remove_from_page_cache - remove page from page cache
+ *
+ * @page: the page which the kernel is trying to remove from page cache
+ *
+ * This must be called only on pages that have
+ * been verified to be in the page cache and locked.
+ * It will never put the page into the free list,
+ * the caller has a reference on the page.
+ */
 void remove_from_page_cache(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
@@ -163,6 +173,8 @@ void remove_from_page_cache(struct page *page)
 
 	if (freepage)
 		freepage(page);
+
+	page_cache_release(page);
 }
 EXPORT_SYMBOL(remove_from_page_cache);
 
-- 
1.7.0.4

--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 6:53 pm

On Sat, 18 Dec 2010 02:13:36 +0900

I like this.
Reviewd-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

This patch series changes remove_from_page_cache's page ref counting
rule. page cache ref count is decreased in remove_from_page_cache.
So we don't need call again in caller context.

Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 fs/hugetlbfs/inode.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9885082..4f32fb6 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -333,7 +333,6 @@ static void truncate_huge_page(struct page *page)
 	cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
-	put_page(page);
 }
 
 static void truncate_hugepages(struct inode *inode, loff_t lstart)
-- 
1.7.0.4

--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 6:54 pm

On Sat, 18 Dec 2010 02:13:38 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

This patch series changes remove_from_page_cache's page ref counting
rule. page cache ref count is decreased in remove_from_page_cache.
So we don't need call again in caller context.

Cc:Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/shmem.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index fa9acc9..16800f2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1091,7 +1091,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		spin_unlock(&info->lock);
 		swap_shmem_alloc(swap);
 		BUG_ON(page_mapped(page));
-		page_cache_release(page);	/* pagecache ref */
 		swap_writepage(page, wbc);
 		if (inode) {
 			mutex_lock(&shmem_swaplist_mutex);
-- 
1.7.0.4

--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 6:55 pm

On Sat, 18 Dec 2010 02:13:39 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

This patch series changes remove_from_page_cache's page ref counting
rule. page cache ref count is decreased in remove_from_page_cache.
So we don't need call again in caller context.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/truncate.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 9ee5673..8decb93 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -114,7 +114,6 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
 	 * calls cleancache_put_page (and note page->mapping is now NULL)
 	 */
 	cleancache_flush_page(mapping, page);
-	page_cache_release(page);	/* pagecache ref */
 	return 0;
 }
 
-- 
1.7.0.4

--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 6:55 pm

On Sat, 18 Dec 2010 02:13:40 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: KOSAKI Motohiro
Date: Sunday, December 19, 2010 - 7:21 pm

Do we _always_ have stable page reference here? IOW, I can assume 
cleancache_flush_page() doesn't cause NULL deref?



--

From: Minchan Kim
Date: Sunday, December 19, 2010 - 7:27 pm

On Mon, Dec 20, 2010 at 11:21 AM, KOSAKI Motohiro

I think so.



-- 
Kind regards,
Minchan Kim
--

From: KOSAKI Motohiro
Date: Sunday, December 19, 2010 - 7:32 pm

Hmm...

Perhaps, I'm missing something. But I think  __memory_failure() only lock 



--

From: Minchan Kim
Date: Sunday, December 19, 2010 - 7:49 pm

On Mon, Dec 20, 2010 at 11:32 AM, KOSAKI Motohiro

The page passed truncate_complete_page is only head page?
Is it possible to pass the page which isn't head of compound in



-- 
Kind regards,
Minchan Kim
--

From: KOSAKI Motohiro
Date: Sunday, December 19, 2010 - 8:03 pm

I dunno, really. My five miniture grep found following logic. therefore I asked you.



__memory_failure()
{
        p = pfn_to_page(pfn);
        hpage = compound_head(p);
(snip)
        res = -EBUSY;
        for (ps = error_states;; ps++) {
                if ((p->flags & ps->mask) == ps->res) {
                        res = page_action(ps, p, pfn);  // call truncate here
                        break;
                }
        }
out:
        unlock_page(hpage);
}



--

From: Minchan Kim
Date: Sunday, December 19, 2010 - 8:31 pm

On Mon, Dec 20, 2010 at 12:03 PM, KOSAKI Motohiro

AFAIK, We have to handle head page when we handle compound page.
Internal page handling logic about tail pages is hidden by compound
page internal.

So I think memory_failure also don't have a problem.
For needing double check, Cced Andi.

Thanks for the review, KOSAKI.


-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 7:27 pm

On Mon, 20 Dec 2010 11:21:52 +0900 (JST)
Hmm, my review was bad.

I think cleancache_flush_page() here should eat (mapping, index) as argument
rather than "page".

BTW,  I can't understand
==
void __cleancache_flush_page(struct address_space *mapping, struct page *page)
{
        /* careful... page->mapping is NULL sometimes when this is called */
        int pool_id = mapping->host->i_sb->cleancache_poolid;
        struct cleancache_filekey key = { .u.key = { 0 } };
==

Why above is safe...
I think (mapping,index) should be passed instead of page.


-Kame

--

From: Minchan Kim
Date: Sunday, December 19, 2010 - 7:58 pm

On Mon, Dec 20, 2010 at 11:27 AM, KAMEZAWA Hiroyuki

I don't think current code isn't safe.

void __cleancache_flush_page(struct address_space *mapping, struct page *page)
{
        /* careful... page->mapping is NULL sometimes when this is called */
        int pool_id = mapping->host->i_sb->cleancache_poolid;
        struct cleancache_filekey key = { .u.key = { 0 } };

        if (pool_id >= 0) {
                VM_BUG_ON(!PageLocked(page));

it does check PageLocked. So caller should hold a page reference to
prevent freeing ramined PG_locked
If the caller doesn't hold a ref of page, I think it's BUG of caller.




-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 9:35 pm

On Mon, 20 Dec 2010 11:58:50 +0900

Ah, my point is that this function trust page->index even if page->mapping is
reset to NULL. And I'm not sure that there are any race that an other thread
add a replacement page for (mapping, index) while a thread call this function.

Thanks,
-Kame



--

From: Minchan Kim
Date: Monday, December 20, 2010 - 1:09 am

On Mon, Dec 20, 2010 at 1:35 PM, KAMEZAWA Hiroyuki

Because the page is locked and is detached from page cache, I guess
it's no problem.



-- 
Kind regards,
Minchan Kim
--

From: KAMEZAWA Hiroyuki
Date: Monday, December 20, 2010 - 1:54 am

On Mon, 20 Dec 2010 17:09:11 +0900

yes.

Thanks,
-Kame

--

From: Minchan Kim
Date: Friday, December 17, 2010 - 10:13 am

This patch series changes remove_from_page_cache's page ref counting
rule. page cache ref count is decreased in remove_from_page_cache.
So we don't need call again in caller context.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: fuse-devel@lists.sourceforge.net
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 fs/fuse/dev.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cf8d28d..4adaf4b 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -738,7 +738,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 		goto out_fallback_unlock;
 
 	remove_from_page_cache(oldpage);
-	page_cache_release(oldpage);
 
 	err = add_to_page_cache_locked(newpage, mapping, index, GFP_KERNEL);
 	if (err) {
-- 
1.7.0.4

--

From: KAMEZAWA Hiroyuki
Date: Sunday, December 19, 2010 - 6:54 pm

On Sat, 18 Dec 2010 02:13:37 +0900

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

From: Christoph Hellwig
Date: Monday, December 20, 2010 - 3:33 am

You'll need to merge all patches into one, otherwise you create really
nasty memory leaks when bisecting between them.

--

From: Minchan Kim
Date: Monday, December 20, 2010 - 4:36 pm

Okay. I will resend.

Thanks for the notice, Christoph.



-- 
Kind regards,
Minchan Kim
--

From: Hugh Dickins
Date: Monday, December 20, 2010 - 10:07 pm

Good point from hch, but I feel even more strongly: if you're going to
do this now, please rename remove_from_page_cache (delete_from_page_cache
was what I chose back when I misdid it) - you're changing an EXPORTed
function in a subtle (well, subtlish) confusing way, which could easily
waste people's time down the line, whether in not-yet-in-tree filesystems
or backports of fixes.  I'd much rather you break someone's build,
forcing them to look at what changed, than crash or leak at runtime.

If you do rename, you can keep your patch structure, introducing the
new function as a wrapper to the old at the beginning, then removing
the old function at the end.

(As you know, I do agree that it's right to decrement the reference
count at the point of removing from page cache.)

Hugh
--

From: Minchan Kim
Date: Tuesday, December 21, 2010 - 12:32 am

It is very good idea!!



-- 
Kind regards,
Minchan Kim
--

Previous thread: [GIT PULL for v2.6.37-rc6] V4L/DVB BKL fixes by Mauro Carvalho Chehab on Friday, December 17, 2010 - 9:50 am. (1 message)

Next thread: reproducer for DM on MD flush deadlock? (was: Re: [PULL REQUEST] md bug fixes) by Mike Snitzer on Friday, December 17, 2010 - 11:13 am. (2 messages)