Hi Mel,

On Tue, Sep 11, 2007 at 04:36:07PM +0100, Mel Gorman wrote:

config_page_shift guarantees that kernel stacks, or whatever other non-defragmentable allocations, go into the same 64k "not defragmentable" page. That is unlike the SGI design, where an 8k kernel stack could be allocated in the first 64k page, and then another 8k stack could be allocated in the next 64k page, effectively pinning all 64k pages until Nick's worst case scenario triggers.

What I said at the VM summit is that your reclaim-defrag patch in the slub isn't necessarily entirely useless with config_page_shift: the larger the software page_size, the more partial pages we could find in the slab, so if there are tons of very partially used pages, we could free some of them to save some memory. But the whole point is that with config_page_shift, Nick's worst case scenario can't happen by design, regardless of defrag or no defrag, while it can _definitely_ happen with the SGI design (regardless of any defrag thing). We can still try to save some memory by defragging the slab a bit, but it's by far *not* required with config_page_shift. No defrag at all is required, in fact.

Plus there's a cost in defragging and freeing cache... the more you need defrag, the slower the kernel will be.

Well, it wasn't my fault if we didn't discuss it in depth though. I tried to discuss it on all possible occasions where it was suggested that I talk about it and where it was somewhat on topic. Given I wasn't even invited to the KS, I felt it would not be appropriate for me to try to monopolize the VM summit according to my agenda. So I happily listened to what the top kernel developers are planning ;), while giving some hints on what I think the right direction is instead.

Frankly I don't care what the end conclusion was. Let's see how well the mmap support for variable order page size will work after the 2 weeks...
Yes, but perhaps you missed that such a printk is needed exactly to provide proof that the SGI design is the wrong way and that it needs to be dumped. If that printk ever triggers, it means you were totally wrong.

fsblock should simply stack on top of config_page_shift. Both are needed. You don't want to use 64k pages on a laptop, but you may want a larger blocksize for the btrees etc... if you have a large harddisk and not much ram.

Do you agree this worst case can't happen with config_page_shift?

Except you don't get all the full benefits of it... Even if I could end up mapping 4k kmalloced entries in userland for the tail packing, that IMHO would still be preferable to keeping the base page small and making a hard effort to create large pages out of small pages. The approach I advocate keeps the base page big and the fast path fast, and it rather does some work to split the base pages outside the buddy for the small files.

All your defrag work is still good to have, like I said at the VM summit if you remember, to grow the hugetlbfs at runtime etc... I'd just rather avoid depending on it to avoid I/O failure in the presence of mlocked pagecache, for example. That's pretty much unnecessary logic if the order0 pages become larger.
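
The pinning argument in this message (8k kernel stacks spread one per 64k region versus packed into the same 64k page) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it is not kernel code, the function names are hypothetical, and "spread" and "pack" are simplified stand-ins for the worst-case placement with a small base page and the slab-style placement with a 64k base page.

```python
BLOCK = 64 * 1024   # size of one 64k region (the large page we want to keep usable)
STACK = 8 * 1024    # size of one unmovable kernel stack


def pinned_when_spread(num_blocks, num_stacks):
    """Assumed worst case with a small base page: every unmovable 8k stack
    lands in the least-used 64k region, so the stacks spread out."""
    used = [0] * num_blocks
    for _ in range(num_stacks):
        target = min(range(num_blocks), key=lambda b: used[b])
        if used[target] + STACK > BLOCK:
            raise MemoryError("no room left for another stack")
        used[target] += STACK
    # A 64k region holding even one unmovable stack cannot be handed out whole.
    return sum(1 for u in used if u > 0)


def pinned_when_packed(num_blocks, num_stacks):
    """Assumed behaviour with a 64k base page: stacks are carved out of the
    same 64k page until it is full, then the next page is opened."""
    used = [0] * num_blocks
    for _ in range(num_stacks):
        target = next(
            (b for b in range(num_blocks) if used[b] + STACK <= BLOCK), None
        )
        if target is None:
            raise MemoryError("no room left for another stack")
        used[target] += STACK
    return sum(1 for u in used if u > 0)


if __name__ == "__main__":
    blocks, stacks = 16, 16  # 1 MB of memory, 16 kernel stacks
    print("spread: pinned 64k regions =", pinned_when_spread(blocks, stacks))
    print("packed: pinned 64k regions =", pinned_when_packed(blocks, stacks))
```

In this model, 16 stacks in 1 MB pin all 16 regions when spread, but only 2 when packed, which leaves 14 whole 64k pages free without any defrag pass.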
