Rebased to 2.6.23-rc6-mm1 and reflected comments. Not so aggresive as previous version. (I'm not in a hurry if -mm is busy.) Any comments are welcome. Thanks, -Kame == A clarification of page <-> fs interface (page cache). At first, each FS has to access to struct page->mapping directly. But it's not just pointer. (we use special 1bit enconding for anon.) Although there is historical consensus that page->mapping points to its inode's address space, I think adding some neat helper functon is not bad. This patch adds page-cache.h which containes page<->address_space<->inode function which is required (used) by subsystems. Following functions are added * page_mapping_cache() ... returns address space if a page is page cache * page_mapping_anon() ... returns anon_vma if a page is anonymous page. * page_is_pagecache() ... returns true if a page is page-cache. * page_inode() ... returns inode which a page-cache belongs to. * is_page_consistent() ... returns true if a page is still valid page cache Followings are moved * page_mapping() ... returns swapper_space or address_space a page is on. (from mm.h) * page_index() ... returns position of a page in its inode (from mm.h) * remove_mapping() ... a safe routine to remove page->mapping from page. (from swap.h) Changelog V1 -> V2: - for 2.6.23-rc6-mm1. - use bool type. - moved related functions to page-cache.h - renamed some functions. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- include/linux/fs.h | 6 +- include/linux/mm.h | 40 --------------- include/linux/page-cache.h | 118 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/swap.h | 1 4 files changed, 123 insertions(+), 42 deletions(-) Index: linux-2.6.23-rc6-mm1/include/linux/page-cache.h =================================================================== --- /dev/null +++ ...
Make use of page-cache.h functions in /mm layer.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/filemap.c | 19 ++++++++++---------
mm/memory.c | 5 +++--
mm/migrate.c | 8 ++------
mm/page-writeback.c | 4 ++--
mm/rmap.c | 36 ++++++++++++++----------------------
mm/shmem.c | 4 ++--
mm/truncate.c | 15 ++++++++-------
7 files changed, 41 insertions(+), 50 deletions(-)
Index: linux-2.6.23-rc6-mm1/mm/filemap.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.23-rc6-mm1/mm/filemap.c
@@ -115,7 +115,7 @@ generic_file_direct_IO(int rw, struct ki
*/
void __remove_from_page_cache(struct page *page)
{
- struct address_space *mapping = page->mapping;
+ struct address_space *mapping = page_mapping_cache(page);
mem_container_uncharge_page(page);
radix_tree_delete(&mapping->page_tree, page->index);
@@ -127,7 +127,7 @@ void __remove_from_page_cache(struct pag
void remove_from_page_cache(struct page *page)
{
- struct address_space *mapping = page->mapping;
+ struct address_space *mapping = page_mapping_cache(page);
BUG_ON(!PageLocked(page));
@@ -641,7 +641,7 @@ repeat:
__lock_page(page);
/* Has the page been truncated while we slept? */
- if (unlikely(page->mapping != mapping)) {
+ if (unlikely(!is_page_consistent(page, mapping))) {
unlock_page(page);
page_cache_release(page);
goto repeat;
@@ -750,7 +750,7 @@ unsigned find_get_pages_contig(struct ad
ret = radix_tree_gang_lookup(&mapping->page_tree,
(void **)pages, index, nr_pages);
for (i = 0; i < ret; i++) {
- if (pages[i]->mapping == NULL || pages[i]->index != index)
+ if (!page_is_pagecache(pages[i]) || pages[i]->index != index)
break;
page_cache_get(pages[i]);
@@ -979,7 +979,7 @@ page_not_up_to_date:
lock_page(page);
/* Did it get truncated before we got the lock? ...Make use of page-cache.h in fs-generic layer.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
fs/buffer.c | 43 ++++++++++++++++++++++---------------------
fs/fs-writeback.c | 2 +-
fs/libfs.c | 2 +-
fs/mpage.c | 13 +++++++------
4 files changed, 31 insertions(+), 29 deletions(-)
Index: linux-2.6.23-rc6-mm1/fs/buffer.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/fs/buffer.c
+++ linux-2.6.23-rc6-mm1/fs/buffer.c
@@ -467,7 +467,7 @@ static void end_buffer_async_write(struc
"I/O error on %s\n",
bdevname(bh->b_bdev, b));
}
- set_bit(AS_EIO, &page->mapping->flags);
+ set_bit(AS_EIO, &page_mapping_cache(page)->flags);
set_buffer_write_io_error(bh);
clear_buffer_uptodate(bh);
SetPageError(page);
@@ -678,7 +678,7 @@ void write_boundary_block(struct block_d
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
struct address_space *mapping = inode->i_mapping;
- struct address_space *buffer_mapping = bh->b_page->mapping;
+ struct address_space *buffer_mapping = page_mapping_cache(bh->b_page);
mark_buffer_dirty(bh);
if (!mapping->assoc_mapping) {
@@ -713,7 +713,7 @@ static int __set_page_dirty(struct page
return 0;
write_lock_irq(&mapping->tree_lock);
- if (page->mapping) { /* Race with truncate? */
+ if (is_page_consistent(page, mapping)) {/* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
if (mapping_cap_account_dirty(mapping)) {
@@ -1204,7 +1204,8 @@ void __bforget(struct buffer_head *bh)
{
clear_buffer_dirty(bh);
if (!list_empty(&bh->b_assoc_buffers)) {
- struct address_space *buffer_mapping = bh->b_page->mapping;
+ struct address_space *buffer_mapping;
+ buffer_mapping = page_mapping_cache(bh->b_page);
spin_lock(&buffer_mapping->private_lock);
list_del_init(&bh->b_assoc_buffers);
@@ -1544,7 +1545,7 @@ void create_empty_buffers(struct page ...That is confusing. if (PageAnon(page)) return NULL; Why do we need a special function? Why is it safer? -
On Thu, 20 Sep 2007 11:26:47 -0700 (PDT) Honestly, I have 3 purposes, 2 for readability/clarificaton and 1 for my trial. 1. Clarify page cache <-> inode relationship before *new concept of page cache*, yours or someone else's is introduced. 2. There are some places using PAGE_MAPPING_ANON directly. I don't want to see following line in .c file. == anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON); == 3. I want to *try* page->mapping overriding... store memory resource controller's information in page->mapping. By this, memory controller doesn't enlarge sizeof struct page. (works well in my small test.) This is my first experience of using bool in Linux kernel.. :) I know bool is not very widely used in Linux now but I tried it because this function obviously returns yes or no, and C language supports bool as For clarify meaning of compareing page_mapping_cache() with mapping. Does this reduce readability ? Thank you for comments. Regards, -Kame -
My own vote (nothing more) would be for you to set this aside until some future time when there aren't a dozen developers all trampling over each other in this area. They're invasive little changes affecting all filesystems, whereas what we've done so far with page->mapping hasn't affected filesystems at all. Purposes 1 and 2 don't score very high in my book (though I too regret how mm/migrate.c copied that PAGE_MAPPING_ANON stuff from it's rightful home in mm/rmap.c: maybe we should wrap that). There's no end to the wrappers we can add, but they're not always helpful. 3: well, saving memory is good, but I think it could wait until some other time, particularly since the memory controller isn't in yet. Wouldn't it be easier to do something with page->lru than page->mapping? Everybody is interested in page->mapping, not so many in page->lru. (Though perhaps it wouldn't work out so well, since you don't need to get uniquely from mapping to page, whereas you do from lru to page.) If we were to attack page->mapping to save memory from struct page, then we should consider Magnus Damm's idea too: he suggested it could be replaced by a pointer to the radixtree slot (something else needed in the anon case), from which "index" could be deduced via alignment instead of keeping it in struct page (details to be filled in ...) Of course, my particular prejudice is that I promised months ago to free up the PG_swapcache bit by using a PAGE_MAPPING_SWAP bit instead. That patch got buried while I tried to think up a suitable name for a further page_mapping() variant that turned out to be needed - guess I should look through your collection to see if I can steal one ;) Beyond the unsatisfactory naming, that work has been long done (and like PAGE_MAPPING_ANON, doesn't touch filesystems at all). Or should I now leave PG_swapcache as is, given your designs on page->mapping? Hugh p.s. Sorry to niggle, but next time, please say [PATCH 1/3] etc. rather than [PATCH] Long Description ...
On Fri, 21 Sep 2007 18:02:47 +0100 (BST) I found that each FS doesn't touch page->mapping so much as I expected. (except for ReiserFS) Yes, if extra field in page struct is not hazard to push memory controller, I don't have much motivation. Because extra 8 bytes makes page struct to be 64 bytes(in 64bit), extra 8 bytes There is a bit difference. My purpose is "avoid making struct page larger", will conflict with my idea ? == http://marc.info/?l=linux-mm&m=118956492926821&w=2 == Anyway, I'm not in hurry about this patch-set. I'll see what memory controller will go. Other people seems to have an idea to implement pfn <-> container_info_per_page function. (But this kind of function is not welcomed always.) sorry, I'll consider well next time. Thanks, -Kame -
I asked because I had thought it would be a serious conflict: obviously the patches as such would conflict quite a bit, but that's not serious, one or the other just gets fixed up. But now I don't see it - we both want to grab a further bit from the low bits of the page->mapping pointer, you PAGE_MAPPING_INFO and me PAGE_MAPPING_SWAP; but that's okay, so long as whoever is left using bit (1<<2) is careful about the 32-bit case and remembers to put __attribute__((aligned(sizeof(long long)))) on the declarations of struct address_space and struct anon_vma and your struct page_mapping_info. Would that waste a little memory? I think not with SLUB, but perhaps with SLOB, which packs a little tighter. Hugh -
On Wed, 26 Sep 2007 20:31:02 +0100 (BST) maybe just depends on the amount of used anon_vma and page_mapping_info etc... I don't think a system which uses SLOB consumes such structs so much as that memory-for-alignment is considered as "waste" of memory. Anyway, I decided to go ahead with current container-info-per-page implementation. If the size of page struct is problem at mainline inclusion discussion, I'll be back. Thanks, -Kame -
On Wed, 19 Sep 2007 16:43:08 +0900 KAMEZAWA Hiroyuki I have two other functions that might want integration with this scheme: page_file_mapping() ... returns backing address space page_file_index() ... returns the index therein They are identical to page_mapping_cache() and page_index() for page cache pages, but they also work on swap cache pages. That is, for swapcache pages they return: page_file_mapping: page_swap_info(page)->swap_file->f_mapping page_file_index: swp_offset((swp_offset_t)page_private(page)) When a filesystem uses these functions instead of page->mapping and page->index, it allows passing swap cache pages into the regular filesystem read/write paths. This is useful for things like swap over NFS, where swap is backed by a swapfile on a 'regular' filesystem. -
On Fri, 21 Sep 2007 13:48:28 +0200 Okay, I'll try to add them in the next set. Thanks, -Kame -
