I copy description of 1/5. Now we add page reference on add_to_page_cache but doesn't drop it in remove_from_page_cache. Such asymmetric makes confusing about page reference so that caller should notice it and comment why they release page reference. It's not good API. Long time ago, Hugh tried it[1] but gave up of reason which reiser4's drop_page had to unlock the page between removing it from page cache and doing the page_cache_release. But now the situation is changed. I think at least things in current mainline doesn't have any obstacles. The problem is fs or somethings out of mainline. If it has done such thing like reiser4, this patch could be a problem. Do anyone know the such things? Do we care about things out of mainline? [1] http://lkml.org/lkml/2004/10/24/140 Minchan Kim (5): drop page reference on remove_from_page_cache fuse: Remove unnecessary page release tlbfs: Remove unnecessary page release swap: Remove unnecessary page release truncate: Remove unnecessary page release fs/fuse/dev.c | 1 - fs/hugetlbfs/inode.c | 1 - mm/filemap.c | 12 ++++++++++++ mm/shmem.c | 1 - mm/truncate.c | 1 - 5 files changed, 12 insertions(+), 4 deletions(-) --
Now we add page reference on add_to_page_cache but doesn't drop it in remove_from_page_cache. Such asymmetric makes confusing about page reference so that caller should notice it and comment why they release page reference. It's not good API. Long time ago, Hugh tried it[1] but gave up of reason which reiser4's drop_page had to unlock the page between removing it from page cache and doing the page_cache_release. But now the situation is changed. I think at least things in current mainline doesn't have any obstacles. The problem is fs or somethings out of mainline. If it has done such thing like reiser4, this patch could be a problem. Do anyone know the such things? Do we care about things out of mainline? Note : The comment of remove_from_page_cache make by copy & paste & s/swap/page/ from delete_from_swap_cache. [1] http://lkml.org/lkml/2004/10/24/140 Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Minchan Kim <minchan.kim@gmail.com> --- mm/filemap.c | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 095c393..fb9db36 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -148,6 +148,16 @@ void __remove_from_page_cache(struct page *page) } } +/** + * remove_from_page_cache - remove page from page cache + * + * @page: the page which the kernel is trying to remove from page cache + * + * This must be called only on pages that have + * been verified to be in the page cache and locked. + * It will never put the page into the free list, + * the caller has a reference on the page. + */ void remove_from_page_cache(struct page *page) { struct address_space *mapping = page->mapping; @@ -163,6 +173,8 @@ void remove_from_page_cache(struct page *page) if (freepage) freepage(page); + + page_cache_release(page); } EXPORT_SYMBOL(remove_from_page_cache); -- 1.7.0.4 --
On Sat, 18 Dec 2010 02:13:36 +0900 I like this. Reviewd-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --
This patch series changes remove_from_page_cache's page ref counting rule. page cache ref count is decreased in remove_from_page_cache. So we don't need call again in caller context. Cc: William Irwin <wli@holomorphy.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> --- fs/hugetlbfs/inode.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 9885082..4f32fb6 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -333,7 +333,6 @@ static void truncate_huge_page(struct page *page) cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); - put_page(page); } static void truncate_hugepages(struct inode *inode, loff_t lstart) -- 1.7.0.4 --
On Sat, 18 Dec 2010 02:13:38 +0900 Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --
This patch series changes remove_from_page_cache's page ref counting rule. page cache ref count is decreased in remove_from_page_cache. So we don't need call again in caller context. Cc:Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Signed-off-by: Minchan Kim <minchan.kim@gmail.com> --- mm/shmem.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index fa9acc9..16800f2 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1091,7 +1091,6 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) spin_unlock(&info->lock); swap_shmem_alloc(swap); BUG_ON(page_mapped(page)); - page_cache_release(page); /* pagecache ref */ swap_writepage(page, wbc); if (inode) { mutex_lock(&shmem_swaplist_mutex); -- 1.7.0.4 --
On Sat, 18 Dec 2010 02:13:39 +0900 Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --
This patch series changes remove_from_page_cache's page ref counting rule. page cache ref count is decreased in remove_from_page_cache. So we don't need call again in caller context. Cc: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-mm@kvack.org Signed-off-by: Minchan Kim <minchan.kim@gmail.com> --- mm/truncate.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index 9ee5673..8decb93 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -114,7 +114,6 @@ truncate_complete_page(struct address_space *mapping, struct page *page) * calls cleancache_put_page (and note page->mapping is now NULL) */ cleancache_flush_page(mapping, page); - page_cache_release(page); /* pagecache ref */ return 0; } -- 1.7.0.4 --
On Sat, 18 Dec 2010 02:13:40 +0900 Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --
Do we _always_ have stable page reference here? IOW, I can assume cleancache_flush_page() doesn't cause NULL deref? --
On Mon, Dec 20, 2010 at 11:21 AM, KOSAKI Motohiro I think so. -- Kind regards, Minchan Kim --
Hmm... Perhaps, I'm missing something. But I think __memory_failure() only lock --
On Mon, Dec 20, 2010 at 11:32 AM, KOSAKI Motohiro The page passed truncate_complete_page is only head page? Is it possible to pass the page which isn't head of compound in -- Kind regards, Minchan Kim --
I dunno, really. My five miniture grep found following logic. therefore I asked you.
__memory_failure()
{
p = pfn_to_page(pfn);
hpage = compound_head(p);
(snip)
res = -EBUSY;
for (ps = error_states;; ps++) {
if ((p->flags & ps->mask) == ps->res) {
res = page_action(ps, p, pfn); // call truncate here
break;
}
}
out:
unlock_page(hpage);
}
--
On Mon, Dec 20, 2010 at 12:03 PM, KOSAKI Motohiro AFAIK, We have to handle head page when we handle compound page. Internal page handling logic about tail pages is hidden by compound page internal. So I think memory_failure also don't have a problem. For needing double check, Cced Andi. Thanks for the review, KOSAKI. -- Kind regards, Minchan Kim --
On Mon, 20 Dec 2010 11:21:52 +0900 (JST)
Hmm, my review was bad.
I think cleancache_flush_page() here should eat (mapping, index) as argument
rather than "page".
BTW, I can't understand
==
void __cleancache_flush_page(struct address_space *mapping, struct page *page)
{
/* careful... page->mapping is NULL sometimes when this is called */
int pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };
==
Why above is safe...
I think (mapping,index) should be passed instead of page.
-Kame
--
On Mon, Dec 20, 2010 at 11:27 AM, KAMEZAWA Hiroyuki
I don't think current code isn't safe.
void __cleancache_flush_page(struct address_space *mapping, struct page *page)
{
/* careful... page->mapping is NULL sometimes when this is called */
int pool_id = mapping->host->i_sb->cleancache_poolid;
struct cleancache_filekey key = { .u.key = { 0 } };
if (pool_id >= 0) {
VM_BUG_ON(!PageLocked(page));
it does check PageLocked. So caller should hold a page reference to
prevent freeing ramined PG_locked
If the caller doesn't hold a ref of page, I think it's BUG of caller.
--
Kind regards,
Minchan Kim
--
On Mon, 20 Dec 2010 11:58:50 +0900 Ah, my point is that this function trust page->index even if page->mapping is reset to NULL. And I'm not sure that there are any race that an other thread add a replacement page for (mapping, index) while a thread call this function. Thanks, -Kame --
On Mon, Dec 20, 2010 at 1:35 PM, KAMEZAWA Hiroyuki Because the page is locked and is detached from page cache, I guess it's no problem. -- Kind regards, Minchan Kim --
On Mon, 20 Dec 2010 17:09:11 +0900 yes. Thanks, -Kame --
This patch series changes remove_from_page_cache's page ref counting rule. page cache ref count is decreased in remove_from_page_cache. So we don't need call again in caller context. Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: fuse-devel@lists.sourceforge.net Signed-off-by: Minchan Kim <minchan.kim@gmail.com> --- fs/fuse/dev.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index cf8d28d..4adaf4b 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -738,7 +738,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) goto out_fallback_unlock; remove_from_page_cache(oldpage); - page_cache_release(oldpage); err = add_to_page_cache_locked(newpage, mapping, index, GFP_KERNEL); if (err) { -- 1.7.0.4 --
On Sat, 18 Dec 2010 02:13:37 +0900 Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --
You'll need to merge all patches into one, otherwise you create really nasty memory leaks when bisecting between them. --
Okay. I will resend. Thanks for the notice, Christoph. -- Kind regards, Minchan Kim --
Good point from hch, but I feel even more strongly: if you're going to do this now, please rename remove_from_page_cache (delete_from_page_cache was what I chose back when I misdid it) - you're changing an EXPORTed function in a subtle (well, subtlish) confusing way, which could easily waste people's time down the line, whether in not-yet-in-tree filesystems or backports of fixes. I'd much rather you break someone's build, forcing them to look at what changed, than crash or leak at runtime. If you do rename, you can keep your patch structure, introducing the new function as a wrapper to the old at the beginning, then removing the old function at the end. (As you know, I do agree that it's right to decrement the reference count at the point of removing from page cache.) Hugh --
It is very good idea!! -- Kind regards, Minchan Kim --
