On Wednesday 12 September 2007 01:36, Mel Gorman wrote:

Well Christoph seems to still be spinning them as a solution for VM scalability and first class support for making contiguous IOs, large filesystem block sizes etc.

At the VM summit I think the conclusion was that grouping by mobility could be merged. I'm still not thrilled by that, but I was going to get steamrolled[*] anyway... and seeing as the userspace hugepages is a relatively demanded workload and can be implemented in this way with basically no other changes to the kernel and already must have fallbacks.... then that's actually a reasonable case for it.

The higher order pagecache, again I'm just going to get steamrolled on, and it actually isn't so intrusive minus the mmap changes, so I didn't have much to reasonably say there.

And I would have kept quiet this time too, except for the worrying idea to use higher order pages to fix the SLUB vs SLAB regression, and if the rationale for this patchset was more realistic.

[*] And I don't say steamrolled because I'm bitter and twisted :) I personally want the kernel to be perfect. But I realise it already isn't and for practical purposes people want these things, so I accept being overruled, no problem. The fact simply is -- I would have been steamrolled I think :P

Sure. And some people run workloads where fragmentation is likely never going to be a problem, they are shipping this poorly configured hardware now or soon, so they don't have too much interest in doing it right at this point, rather than doing it *now*. OK, that's a valid reason which is why I don't use the argument that we should do it correctly or never at all.

In theory (and again for the filesystem guys who don't have to worry about it). In practice after seeing the patch it's not a nice thing for the VM to have to do.

I guess it is still in the air.
I personally think a vmapping approach and/or teaching filesystems to do some nonlinear block metadata access is the way to go (strangely, this happens to be one of the fsblock paradigms!). OTOH, I'm not sure how much buy-in there was from the filesystems guys. Particularly Christoph H and XFS (which is strange because they already do vmapping in places).

That's understandable though. It is a lot of work for filesystems. But the reason I think it is the correct approach for larger block than soft-page size is that it doesn't have fundamental issues (assuming that virtually mapping the entire kernel is off the table).

That's what I expected, but it seems from the descriptions in the patches that it is also supposed to cure cancer :)

No, you have been good about that aspect. I wasn't trying to point to you at all here.

It would be interesting to craft an attack. If you knew roughly the layout and size of your dentry slab for example... maybe you could stat a whole lot of files, then open one and keep it open (maybe post the fd to a unix socket or something crazy!) when you think you have filled up a couple of MB worth of them. Repeat the process until your movable zone is gone. Or do the same things with pagetables, or task structs, or radix tree nodes, etc.. these are the kinds of things I worry about (as well as just the gradual natural degradation).

Yeah, it might be reasonably possible to make an attack that would deplete most of higher order allocations while pinning somewhat close to just the theoretical minimum required.

[snip]

Thanks Mel. Fairly good summary I think.

I guess that was my hope. The only problem I have with a 2nd class higher order pagecache on a *practical* technical issue is introducing more complexity in the VM for mmap.
Andrea and Hugh are probably more guardians of that area of code than I, so if they're happy with the mmap stuff then again I can accept being overruled on this ;)

Then I would love to say #2 will go ahead (and I hope it would), but I can't force it down the throat of the filesystem maintainers just like I feel they can't force vm devs (me) to do a virtually mapped and defrag-able kernel :) Basically I'm trying to practice what I preach and I don't want to force fsblock onto anyone. Maybe when ext2 is converted and if I can show it isn't a performance problem / too much complexity then I'll have another leg to stand on here... I don't know.

Definitely. Also, aops capable of spanning multiple pages, batching of large write(2) pagecache insertion, etc all are things we must go after, regardless of the large page and/or block size work.
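
To make the pinning scenario from earlier in this mail a bit more concrete (fill a couple of MB of slab, keep one object open, repeat), here is a toy userspace model. It is only a sketch and not kernel code: it assumes objects are allocated back to back into consecutive pages, that a pinned object can never be moved or reclaimed, and that nothing else shares the memory. The function name, the 20 objects per page and the other parameters are made up for illustration.

```python
def free_high_order_blocks(total_pages, order, objs_per_page, pin_stride):
    """Toy model of pinning one small object per high-order block.

    Memory is total_pages pages, split into aligned blocks of 2**order
    pages. Objects are allocated sequentially, objs_per_page to a page.
    Every pin_stride-th object stays pinned; all others are freed.
    pin_stride == 0 means nothing is pinned (hypothetical baseline).

    Returns (pages_still_pinned, fully_free_blocks).
    """
    block_pages = 1 << order
    n_blocks = total_pages // block_pages
    total_objs = total_pages * objs_per_page

    # Objects that survive the "free everything else" step.
    pinned_objs = range(0, total_objs, pin_stride) if pin_stride else range(0)

    # A page with at least one pinned object cannot be freed.
    pinned_pages = {obj // objs_per_page for obj in pinned_objs}

    # A block with at least one pinned page cannot satisfy a
    # 2**order allocation.
    dirty_blocks = {page // block_pages for page in pinned_pages}

    return len(pinned_pages), n_blocks - len(dirty_blocks)


if __name__ == "__main__":
    # 16 MiB of 4 KiB pages, 2 MiB blocks (order 9), 20 objects per page.
    pages, order, per_page = 4096, 9, 20
    stride = (1 << order) * per_page  # one pinned object per 2 MiB block

    print(free_high_order_blocks(pages, order, per_page, 0))       # (0, 8)
    print(free_high_order_blocks(pages, order, per_page, stride))  # (8, 0)
```

In this model, pinning 8 pages out of 4096 leaves no free order-9 block at all, which is the "close to the theoretical minimum" case. A real slab allocator, reclaim and grouping by mobility would all change the numbers.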
