David Chinner <dgc@sgi.com> writes:

So the practical question is: was it a high-level design problem, or simply an implementation choice? Until we code review an implementation that does page aggregation for Linux, we can't say how nasty it would be.

Of course, what gets confusing is that you refer to the previous implementation as a buffer cache, because that isn't at all what Linux had for a buffer cache. The Linux buffer cache was the same as the current page cache, except that it was indexed by block number rather than by offset into a file.

The suggestion seems to be to always aggregate pages (to handle PAGE_SIZE < block size), and not even to worry about whether the pages being aggregated happen to be physically contiguous. The memory allocator and the block layer can worry about that; it isn't something the page cache or filesystems need to pay attention to.

I suspect the implementation in Linux would be sufficiently different that it would not be prone to the same problems. Among other things, we already do most things on a range of page addresses, so we seem to have most of the infrastructure already.

It looks like we would need to extend the current batching a little further so it covers all of the interesting cases (reads included):
- Set the dirty bit on all pages in the group when we set it on one page.
- Re-read the group when we dirty it, if we don't have all of it present.
- Round the range we operate on outward so it cleanly hits the beginning and end of a group.
- Only issue the mapping operations on the first page in the group.

That is about what we would have to do to handle multiple pages in one block in the page cache. There are clearly more details, but as a first approximation I don't see this being fundamentally more complex than what we are currently doing; it just takes a few more details into account.
The whole physical contiguity issue seems to fall cleanly out of a speculative page allocator, and that would seem to work and to provide improvements on smaller block size filesystems too, so it looks like a larger, more general improvement. Likewise, Jens's increase of the Linux scatter-gather list size seems like a more general, independent improvement. So if we can also handle groups of pages that make up a single block as an independent change, we get all of the benefits of large block sizes, with most of them applying to small sector size filesystems as well.

Small block sizes give us better storage efficiency, which means less disk bandwidth used, which in turn means less time to get the data off a slow disk (especially if you can pack multiple files you want at the same time into that same space). Given that, I'm not convinced that large block sizes are a clear disk performance advantage, so we should not neglect small files.

Eric
