Re: Should PAGE_CACHE_SIZE be discarded?

Previous thread: Beagle and logging inotify events by Jon Smirl on Tuesday, November 13, 2007 - 8:04 pm. (16 messages)

Next thread: [PATCH 3/3] nfs: use ->mmap_prepare() to avoid an AB-BA deadlock by Peter Zijlstra on Wednesday, November 14, 2007 - 4:01 pm. (8 messages)
To: <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Cc: <dhowells@...>, <npiggin@...>
Date: Wednesday, November 14, 2007 - 9:56 am

Are we ever going to have PAGE_CACHE_SIZE != PAGE_SIZE? If not, why not
discard PAGE_CACHE_SIZE as it's then redundant.

PAGE_CACHE_SIZE is also an ill-defined concept. Just grep in Documentation/
and it only comes up in a couple of places, neither particularly informative.

If there's a possibility that it will be used, then someone who knows how it's
supposed to work needs to sit down and document what it is, what it represents,
where it applies, how it interacts with PG_compound and how the page flags
distribute over a page cache slot.

One further thing to consider: actually making PAGE_CACHE_SIZE > PAGE_SIZE
work will be an interesting problem, as I'm certain most filesystems will
break horribly without a lot of work (ramfs might be only exception).

David
-

To: David Howells <dhowells@...>
Cc: <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>, <npiggin@...>
Date: Thursday, November 15, 2007 - 1:09 pm

I did some digging. The code was merged between 28.4.1999 and
11.5.1999, as it first appears in 2.2.8 and 2.3.0:

/*
* The page cache can done in larger chunks than
* one page, because it allows for more efficient
* throughput (it can then be mapped into user
* space in smaller chunks for same flexibility).
*
* Or rather, it _will_ be done in larger chunks.
*/
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE PAGE_SIZE
#define PAGE_CACHE_MASK PAGE_MASK

#define page_cache_alloc() __get_free_page(GFP_USER)
#define page_cache_free(x) free_page(x)
#define page_cache_release(x) __free_page(x)

Looks like PAGE_CACHE_SIZE>PAGE_SIZE was planned before 2.3 opened, but
never went very far. So judging by the last eight years of history,
PAGE_CACHE_SIZE is dead.

Completely discarding it might be a bad idea, as it serves as code
annotation. Christoph Lameters "Large Blocksize Support" patches seem
to globally replace PAGE_CACHE_SIZE with a different macro. Having
PAGE_CACHE_SIZE makes such an effort easier. Dropping existing
annotation seems useless, when maintaining them is zero effort.

Large code audits for the PAGE_SIZE/PAGE_CACHE_SIZE distinction seem
completely useless, though. For as long as the macro existed, it never
ever mattered. My take is to keep it and let it bitrot until someone
actually cares.

Jörn

--
Joern's library part 6:
http://www.gzip.org/zlib/feldspar.html
-

To: David Howells <dhowells@...>
Cc: <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Wednesday, November 14, 2007 - 11:23 am

Christoph Lameter has patches exactly to make PAGE_CACHE_SIZE larger than
PAGE_SIZE, and they seem to work without much effort. I happen to hate the
patches ;) but that doesn't change the fact that PAGE_CACHE_SIZE is

No, this was an *example*. It has nothing to do with PG_compound upsream.
That was just an example. Basically, anything that goes in the page cache
is in units of PAGE_CACHE_SIZE, and nothing else. For filesystems it

I think most filesystems actually don't have much problem with it. mm
code has bigger problems, eg. due to ptes <-> pagecache no longer being
equal size.... but why would filesystems care?

-

To: Nick Piggin <npiggin@...>
Cc: <dhowells@...>, <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Wednesday, November 14, 2007 - 11:59 am

That depends on what the coverage of struct page is. I don't actually know
whether this is PAGE_SIZE or PAGE_CACHE_SIZE; I assumed it to be the former,
but from what you've said, I'm not actually sure.

David
-

To: David Howells <dhowells@...>
Cc: <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Wednesday, November 14, 2007 - 5:35 pm

Floating around. I'm not saying it will even get upstream. It's just

It can be pretty well any power of 2 from PAGE_SIZE upwards, with
compound pages. None of the filesystems should really care at all.
It's not even a new concept, hugetlbfs uses HPAGE_SIZE...
-

To: Nick Piggin <npiggin@...>
Cc: <dhowells@...>, <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Thursday, November 15, 2007 - 8:05 am

Ummm... The filesystem has to care. If the VFS/VM says 'fill this page' you
do need to know how big the page is or whether it's even actually several
pages.

David
-

To: David Howells <dhowells@...>
Cc: Nick Piggin <npiggin@...>, <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Thursday, November 15, 2007 - 10:15 am

I think that what Nick was trying to say is that PAGE_CACHE_SIZE should always
be used properly as the size of the memory struct Page covers (while PAGE_SIZE
is the hardware page size and the constraint is that PAGE_CACHE_SIZE == (PAGE_SIZE << k)
for some k >= 0). If everybody does that then "None of the filesystems should really
care at all". That said, it doesn't seem like the current usage in fs/ and drivers/

-

To: Benny Halevy <bhalevy@...>
Cc: <dhowells@...>, Nick Piggin <npiggin@...>, <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Thursday, November 15, 2007 - 10:46 am

Indeed. One thing you have to consider is kmap(). I would expect it to
present an area of PAGE_SIZE for access. However, if the filesystem gets an
area of PAGE_CACHE_SIZE to fill, then I would have to do multiple kmap() calls
in the process of filling that 'pagecache page' in AFS.

Furthermore, if a page struct covers a PAGE_CACHE_SIZE chunk of memory, then I
suspect the page allocator is also wrong, as it I believe it deals with
PAGE_SIZE chunks of memory, assuming a struct page for each.

David
-

To: David Howells <dhowells@...>
Cc: Benny Halevy <bhalevy@...>, <torvalds@...>, <linux-arch@...>, <linux-fsdevel@...>
Date: Thursday, November 15, 2007 - 5:30 pm

That's because you're thinking about all these difficulties outside the
about the least problematic here.
-

Previous thread: Beagle and logging inotify events by Jon Smirl on Tuesday, November 13, 2007 - 8:04 pm. (16 messages)

Next thread: [PATCH 3/3] nfs: use ->mmap_prepare() to avoid an AB-BA deadlock by Peter Zijlstra on Wednesday, November 14, 2007 - 4:01 pm. (8 messages)