On Wednesday 12 September 2007 07:52, Christoph Lameter wrote:OK, I'll describe how it works and what the actual problem with it is. I haven't looked at the patches for a fair while so you can forgive my inaccuracies in terminology or exact details. So anti-frag groups memory into (say) 2MB chunks. Top priority heuristic is that allocations which are movable all go into groups with other movable memory and allocations which are not movable do not go into these groups. This is flexible though, so if a workload wants to use more non movable memory, it is allowed to eat into first free, then movable groups after filling all non-movable groups. This is important because it is what makes anti-frag flexible (otherwise it would just be a memory reserve in another form). In my attack, I cause the kernel to allocate lots of unmovable allocations and deplete movable groups. I theoretically then only need to keep a small number (1/2^N) of these allocations around in order to DoS a page allocation of order N. And it doesn't even have to be a DoS. The natural fragmentation that occurs today in a kernel today has the possibility to slowly push out the movable groups and give you the same situation. Now there are lots of other little heuristics, *including lumpy reclaim and various slab reclaim improvements*, that improve the effectiveness or speed of this thing, but at the end of the day, it has the same basic issues. Unless you can move practically any currently unmovable allocation (which will either be a lot of intrusive code or require a vmapped kernel), then you can't get around the fundamental problem. And if you do get around the fundamental problem, you don't really need to group pages by mobility any more because they are all movable[*]. So lumpy reclaim does not change my formula nor significantly help against a fragmentation attack. AFAIKS. [*] ok, this isn't quite true because if you can actually put a hard limit on unmovable allocations then anti-frag will fundamentally help -- get back to me on that when you get patches to move most of the obvious ones. Like pinned dentries, inodes, buffer heads, page tables, task structs, mm structs, vmas, anon_vmas, radix-tree nodes, etc. Sure, and I pointed out the theoretical figure for 64K pages as well. Is that figure not problematic to you? Where do you draw the limit for what is acceptable? Why? What happens with tiny memory machines where a reserve or even the anti-frag patches may not be acceptable and/or work very well? When do you require reserve pools? Why are reserve pools acceptable for first-class support of filesystems when it has been very loudly been made a known policy decision by Linus in the past (and for some valid reasons) that we should not put limits on the sizes of caches in the kernel. -
| Natalie Protasevich | [BUG] New Kernel Bugs |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
| Greg KH | [GIT PATCH] driver core patches against 2.6.24 |
git: | |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
| Patrick McHardy | [NET_SCHED 00/04]: External SFQ classifiers/flow classifier |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
