Re: [RFC][PATCH] page->mapping clarification [1/3] base functions

Previous thread: Problem: one driver and 4 instances with different parameters by Andrey Kamchatnikov on Tuesday, September 18, 2007 - 11:54 pm. (5 messages)

Next thread: sys_chroot+sys_fchdir Fix by majkls on Wednesday, September 19, 2007 - 12:19 am. (34 messages)
From: KAMEZAWA Hiroyuki
Date: Wednesday, September 19, 2007 - 12:43 am

Rebased to 2.6.23-rc6-mm1 and reflected comments.
Not so aggresive as previous version.
(I'm not in a hurry if -mm is busy.)

Any comments are welcome.

Thanks,
-Kame
==
A clarification of page <-> fs interface (page cache).

At first, each FS has to access to struct page->mapping directly.
But it's not just pointer. (we use special 1bit enconding for anon.)

Although there is historical consensus that page->mapping points to its inode's
address space, I think adding some neat helper functon is not bad.

This patch adds page-cache.h which containes page<->address_space<->inode
function which is required (used) by subsystems.

Following functions are added

 * page_mapping_cache() ... returns address space if a page is page cache
 * page_mapping_anon()  ... returns anon_vma if a page is anonymous page.
 * page_is_pagecache()  ... returns true if a page is page-cache.
 * page_inode()         ... returns inode which a page-cache belongs to.
 * is_page_consistent() ... returns true if a page is still valid page cache 

Followings are moved 
 * page_mapping()       ... returns swapper_space or address_space a page is on.
			    (from mm.h)
 * page_index()         ... returns position of a page in its inode
			    (from mm.h)
 * remove_mapping()     ... a safe routine to remove page->mapping from page.
			    (from swap.h)

Changelog V1 -> V2:
 - for 2.6.23-rc6-mm1.
 - use bool type.
 - moved related functions to page-cache.h
 - renamed some functions.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/fs.h         |    6 +-
 include/linux/mm.h         |   40 ---------------
 include/linux/page-cache.h |  118 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/swap.h       |    1 
 4 files changed, 123 insertions(+), 42 deletions(-)

Index: linux-2.6.23-rc6-mm1/include/linux/page-cache.h
===================================================================
--- /dev/null
+++ ...
From: KAMEZAWA Hiroyuki
Date: Wednesday, September 19, 2007 - 12:44 am

Make use of page-cache.h functions in /mm layer.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/filemap.c        |   19 ++++++++++---------
 mm/memory.c         |    5 +++--
 mm/migrate.c        |    8 ++------
 mm/page-writeback.c |    4 ++--
 mm/rmap.c           |   36 ++++++++++++++----------------------
 mm/shmem.c          |    4 ++--
 mm/truncate.c       |   15 ++++++++-------
 7 files changed, 41 insertions(+), 50 deletions(-)

Index: linux-2.6.23-rc6-mm1/mm/filemap.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.23-rc6-mm1/mm/filemap.c
@@ -115,7 +115,7 @@ generic_file_direct_IO(int rw, struct ki
  */
 void __remove_from_page_cache(struct page *page)
 {
-	struct address_space *mapping = page->mapping;
+	struct address_space *mapping = page_mapping_cache(page);
 
 	mem_container_uncharge_page(page);
 	radix_tree_delete(&mapping->page_tree, page->index);
@@ -127,7 +127,7 @@ void __remove_from_page_cache(struct pag
 
 void remove_from_page_cache(struct page *page)
 {
-	struct address_space *mapping = page->mapping;
+	struct address_space *mapping = page_mapping_cache(page);
 
 	BUG_ON(!PageLocked(page));
 
@@ -641,7 +641,7 @@ repeat:
 			__lock_page(page);
 
 			/* Has the page been truncated while we slept? */
-			if (unlikely(page->mapping != mapping)) {
+			if (unlikely(!is_page_consistent(page, mapping))) {
 				unlock_page(page);
 				page_cache_release(page);
 				goto repeat;
@@ -750,7 +750,7 @@ unsigned find_get_pages_contig(struct ad
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, index, nr_pages);
 	for (i = 0; i < ret; i++) {
-		if (pages[i]->mapping == NULL || pages[i]->index != index)
+		if (!page_is_pagecache(pages[i]) || pages[i]->index != index)
 			break;
 
 		page_cache_get(pages[i]);
@@ -979,7 +979,7 @@ page_not_up_to_date:
 		lock_page(page);
 
 		/* Did it get truncated before we got the lock? ...
From: KAMEZAWA Hiroyuki
Date: Wednesday, September 19, 2007 - 12:46 am

Make use of page-cache.h in fs-generic layer.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


---
 fs/buffer.c       |   43 ++++++++++++++++++++++---------------------
 fs/fs-writeback.c |    2 +-
 fs/libfs.c        |    2 +-
 fs/mpage.c        |   13 +++++++------
 4 files changed, 31 insertions(+), 29 deletions(-)

Index: linux-2.6.23-rc6-mm1/fs/buffer.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/fs/buffer.c
+++ linux-2.6.23-rc6-mm1/fs/buffer.c
@@ -467,7 +467,7 @@ static void end_buffer_async_write(struc
 					"I/O error on %s\n",
 			       bdevname(bh->b_bdev, b));
 		}
-		set_bit(AS_EIO, &page->mapping->flags);
+		set_bit(AS_EIO, &page_mapping_cache(page)->flags);
 		set_buffer_write_io_error(bh);
 		clear_buffer_uptodate(bh);
 		SetPageError(page);
@@ -678,7 +678,7 @@ void write_boundary_block(struct block_d
 void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
 {
 	struct address_space *mapping = inode->i_mapping;
-	struct address_space *buffer_mapping = bh->b_page->mapping;
+	struct address_space *buffer_mapping = page_mapping_cache(bh->b_page);
 
 	mark_buffer_dirty(bh);
 	if (!mapping->assoc_mapping) {
@@ -713,7 +713,7 @@ static int __set_page_dirty(struct page 
 		return 0;
 
 	write_lock_irq(&mapping->tree_lock);
-	if (page->mapping) {	/* Race with truncate? */
+	if (is_page_consistent(page, mapping)) {/* Race with truncate? */
 		WARN_ON_ONCE(warn && !PageUptodate(page));
 
 		if (mapping_cap_account_dirty(mapping)) {
@@ -1204,7 +1204,8 @@ void __bforget(struct buffer_head *bh)
 {
 	clear_buffer_dirty(bh);
 	if (!list_empty(&bh->b_assoc_buffers)) {
-		struct address_space *buffer_mapping = bh->b_page->mapping;
+		struct address_space *buffer_mapping;
+		buffer_mapping = page_mapping_cache(bh->b_page);
 
 		spin_lock(&buffer_mapping->private_lock);
 		list_del_init(&bh->b_assoc_buffers);
@@ -1544,7 +1545,7 @@ void create_empty_buffers(struct page ...
From: Christoph Lameter
Date: Thursday, September 20, 2007 - 11:26 am

That is confusing.

if (PageAnon(page))
	return NULL;


Why do we need a special function? Why is it safer?

-

From: KAMEZAWA Hiroyuki
Date: Thursday, September 20, 2007 - 5:50 pm

On Thu, 20 Sep 2007 11:26:47 -0700 (PDT)
Honestly, I have 3 purposes, 2 for readability/clarificaton and 1 for my trial.

1. Clarify page cache <-> inode relationship before *new concept of page cache*,
   yours or someone else's is introduced.

2. There are some places using PAGE_MAPPING_ANON directly. I don't want to see
   following line in .c file. 
   ==
   anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
   ==

3. I want to *try* page->mapping overriding... store  memory resource controller's   
   information in page->mapping. By this, memory controller doesn't enlarge sizeof
   struct page. (works well in my small test.)

This is my first experience of using bool in Linux kernel.. :)

I know bool is not very widely used in Linux now but I tried it because 
this function obviously returns yes or no, and C language supports bool as
For clarify meaning of compareing page_mapping_cache() with mapping.
Does this reduce readability ?

Thank you for comments.

Regards,
-Kame





-

From: Hugh Dickins
Date: Friday, September 21, 2007 - 10:02 am

My own vote (nothing more) would be for you to set this aside until
some future time when there aren't a dozen developers all trampling
over each other in this area.

They're invasive little changes affecting all filesystems, whereas what
we've done so far with page->mapping hasn't affected filesystems at all.

Purposes 1 and 2 don't score very high in my book (though I too regret
how mm/migrate.c copied that PAGE_MAPPING_ANON stuff from it's rightful
home in mm/rmap.c: maybe we should wrap that).  There's no end to the
wrappers we can add, but they're not always helpful.

3: well, saving memory is good, but I think it could wait until some
other time, particularly since the memory controller isn't in yet.

Wouldn't it be easier to do something with page->lru than page->mapping?
Everybody is interested in page->mapping, not so many in page->lru.
(Though perhaps it wouldn't work out so well, since you don't need to
get uniquely from mapping to page, whereas you do from lru to page.)

If we were to attack page->mapping to save memory from struct page,
then we should consider Magnus Damm's idea too: he suggested it could
be replaced by a pointer to the radixtree slot (something else needed
in the anon case), from which "index" could be deduced via alignment
instead of keeping it in struct page (details to be filled in ...)

Of course, my particular prejudice is that I promised months ago to
free up the PG_swapcache bit by using a PAGE_MAPPING_SWAP bit instead.
That patch got buried while I tried to think up a suitable name for
a further page_mapping() variant that turned out to be needed - guess
I should look through your collection to see if I can steal one ;)
Beyond the unsatisfactory naming, that work has been long done
(and like PAGE_MAPPING_ANON, doesn't touch filesystems at all).

Or should I now leave PG_swapcache as is,
given your designs on page->mapping?

Hugh

p.s. Sorry to niggle, but next time, please say [PATCH 1/3] etc.
rather than [PATCH] Long Description ...
From: KAMEZAWA Hiroyuki
Date: Friday, September 21, 2007 - 11:42 am

On Fri, 21 Sep 2007 18:02:47 +0100 (BST)
I found that each FS doesn't touch page->mapping so much as I expected.
(except for ReiserFS)
Yes, if extra field in page struct is not hazard to push memory controller,
I don't have much motivation. 
Because extra 8 bytes makes page struct to be 64 bytes(in 64bit), extra 8 bytes
There is a bit difference. My purpose is "avoid making struct page larger",
 will conflict with my idea ?
==
http://marc.info/?l=linux-mm&m=118956492926821&w=2
==

Anyway, I'm not in hurry about this patch-set. I'll see what memory controller
will go. Other people seems to have an idea to implement 
pfn <-> container_info_per_page function.
(But this kind of function is not welcomed always.)

sorry, I'll consider well next time.

Thanks,
-Kame
-

From: Hugh Dickins
Date: Wednesday, September 26, 2007 - 12:31 pm

I asked because I had thought it would be a serious conflict: obviously
the patches as such would conflict quite a bit, but that's not serious,
one or the other just gets fixed up.

But now I don't see it - we both want to grab a further bit from the
low bits of the page->mapping pointer, you PAGE_MAPPING_INFO and me
PAGE_MAPPING_SWAP; but that's okay, so long as whoever is left using
bit (1<<2) is careful about the 32-bit case and remembers to put
__attribute__((aligned(sizeof(long long))))
on the declarations of struct address_space and struct anon_vma
and your struct page_mapping_info.

Would that waste a little memory?  I think not with SLUB,
but perhaps with SLOB, which packs a little tighter.

Hugh
-

From: KAMEZAWA Hiroyuki
Date: Wednesday, September 26, 2007 - 2:50 pm

On Wed, 26 Sep 2007 20:31:02 +0100 (BST)

maybe just depends on the amount of used anon_vma and page_mapping_info etc...
I don't think a system which uses SLOB consumes such structs so much
as that memory-for-alignment is considered as "waste" of memory.

Anyway, I decided to go ahead with current container-info-per-page
implementation. If the size of page struct is problem at mainline inclusion
discussion, I'll be back.

Thanks,
-Kame
-

From: Peter Zijlstra
Date: Friday, September 21, 2007 - 4:48 am

On Wed, 19 Sep 2007 16:43:08 +0900 KAMEZAWA Hiroyuki

I have two other functions that might want integration with this scheme:

  page_file_mapping()     ... returns backing address space
  page_file_index()       ... returns the index therein

They are identical to page_mapping_cache() and page_index() for
page cache pages, but they also work on swap cache pages.

That is, for swapcache pages they return:

page_file_mapping:
  page_swap_info(page)->swap_file->f_mapping

page_file_index:
  swp_offset((swp_offset_t)page_private(page))

When a filesystem uses these functions instead of page->mapping and
page->index, it allows passing swap cache pages into the regular
filesystem read/write paths.

This is useful for things like swap over NFS, where swap is backed by a
swapfile on a 'regular' filesystem.

-

From: KAMEZAWA Hiroyuki
Date: Friday, September 21, 2007 - 8:06 am

On Fri, 21 Sep 2007 13:48:28 +0200
Okay, I'll try to add them in the next set.

Thanks,
-Kame
-

Previous thread: Problem: one driver and 4 instances with different parameters by Andrey Kamchatnikov on Tuesday, September 18, 2007 - 11:54 pm. (5 messages)

Next thread: sys_chroot+sys_fchdir Fix by majkls on Wednesday, September 19, 2007 - 12:19 am. (34 messages)