On Fri, 7 Dec 2007, Linus Torvalds wrote:So I checked hackbench on my machine with oprofile, and while I'm not seeing anything that says "ten times slower", I do see it hitting the slow path all the time, and __slab_alloc() is 15% of the profile. With __slab_free() being another 8-9%. I assume that with many CPU's, those get horrendously worse, with cache trashing. The biggest cost of __slab_alloc() in my profile is the "slab_lock()", but that may not be the one that causes problems in a 64-cpu setup, so it would be good to have that verified. Anyway, it's unclear whether the reason it goes into the slow-path because the freelist is just always empty, or whether it hits the ... || !node_match(c, node) case which can trigger on NUMA. That's another potential explanation of why you'd see such a *huge* slowdown (ie if the whole node-match thing doesn't work out, slub just never gets the fast-case at all). That said, the number of places that actually pass a specific node to slub is very limited, so I suspect it's not the node matching. But just disabling that test in slab_alloc() might be one thing to test. [ The whole node match thing is likely totally bogus. I suspect we pay *more* in trying to match nodes than we'd ever likely pay in just returning the wrong node for an allocation, but that's neither here nor there ] But yeah, I'm not entirely sure SLUB is working out. Linus --
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
| Greg Kroah-Hartman | [PATCH 007/196] Chinese: add translation of stable_kernel_rules.txt |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Jan Engelhardt | intel iommu (Re: -mm merge plans for 2.6.23) |
git: | |
| Alexey Dobriyan | Re: [GIT]: Networking |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| David Miller | Re: [BUG] New Kernel Bugs |
