Peter Zijlstra has recently demonstrated that we can have order-1 allocation failures under memory pressure with small memory configurations. The x86_64 stack has a size of 8k and thus requires an order-1 allocation.

This patch adds a virtual fallback capability for the stack. The system may then continue even in extreme situations, and we may be able to increase the stack size if necessary (see next patch).

Cc: ak@suse.de
Cc: travis@sgi.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86_64/thread_info.h |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Index: linux-2.6/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/thread_info.h	2007-10-03 14:49:48.000000000 -0700
+++ linux-2.6/include/asm-x86_64/thread_info.h	2007-10-03 14:51:00.000000000 -0700
@@ -74,20 +74,14 @@ static inline struct thread_info *stack_
 
 /* thread information allocation */
 #ifdef CONFIG_DEBUG_STACK_USAGE
-#define alloc_thread_info(tsk)					\
-    ({								\
-	struct thread_info *ret;				\
-								\
-	ret = ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)); \
-	if (ret)						\
-		memset(ret, 0, THREAD_SIZE);			\
-	ret;							\
-    })
+#define THREAD_FLAGS (GFP_VFALLBACK | __GFP_ZERO)
 #else
-#define alloc_thread_info(tsk) \
-	((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
+#define THREAD_FLAGS GFP_VFALLBACK
 #endif
 
+#define alloc_thread_info(tsk) \
+	((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+
 #define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
 
 #else /* !__ASSEMBLY__ */
-- -
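The diff only changes the allocation flags. GFP_VFALLBACK itself is not defined in this diff. The sketch below is a rough userspace model of the general idea: try a physically contiguous allocation first, and fall back to a virtually contiguous one. Every name in it (`alloc_stack`, `FLAG_VFALLBACK`, `FLAG_ZERO`, `simulate_contig_failure`) is hypothetical, and both allocation paths are plain `malloc()`. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flags, loosely mirroring the patch:
 * FLAG_ZERO ~ __GFP_ZERO, FLAG_VFALLBACK ~ GFP_VFALLBACK. */
enum {
    FLAG_ZERO      = 1u << 0,
    FLAG_VFALLBACK = 1u << 1,
};

/* Test hook: when true, the "contiguous" allocator fails, modelling an
 * order-1 allocation failure under fragmentation. */
static bool simulate_contig_failure;

/* Stand-in for the page allocator (physically contiguous pages). */
static void *contig_alloc(size_t size)
{
    return simulate_contig_failure ? NULL : malloc(size);
}

static void contig_free(void *p) { free(p); }   /* free_pages()-like */

/* Stand-in for a vmalloc-style allocator (virtually contiguous only). */
static void *virt_alloc(size_t size) { return malloc(size); }

static void virt_free(void *p) { free(p); }     /* vfree()-like */

struct stack_block {
    void *base;
    bool  used_fallback;   /* records which allocator must free it */
};

static struct stack_block alloc_stack(size_t size, unsigned flags)
{
    struct stack_block blk = { NULL, false };

    assert(size > 0);
    blk.base = contig_alloc(size);
    if (!blk.base && (flags & FLAG_VFALLBACK)) {
        /* Contiguous request failed: retry virtually contiguous. */
        blk.base = virt_alloc(size);
        blk.used_fallback = (blk.base != NULL);
    }
    /* Zeroing via a flag replaces the open-coded memset in the old macro. */
    if (blk.base && (flags & FLAG_ZERO))
        memset(blk.base, 0, size);
    return blk;
}

static void free_stack(struct stack_block blk)
{
    if (blk.used_fallback)
        virt_free(blk.base);
    else
        contig_free(blk.base);
}
```

The sketch tracks which path served the request because, in a real kernel, pages from the two allocators are released by different routines.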
We've known for ages that it is possible. But it has always been so rare that it was ignored. Is there any evidence this is more common now than it used to be? -Andi -
The order-1 allocation failures were GFP_ATOMIC, because SLUB uses non-zero orders for everything. Kernel stack allocation is GFP_KERNEL, I presume. Also, I use 4k stacks on all my machines. Maybe the cpumask thing needs an extended API, one that falls back to kmalloc if NR_CPUS >> sane. That way it cannot be used as an argument for inflating stacks. -
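The suggested cpumask API can be sketched as a type that is an inline array when NR_CPUS is small and a heap pointer when it is large. Callers are then written once, and large masks never land on the stack. Everything here is hypothetical: `cpumask_var`, `alloc_mask`, `ONSTACK_CPUS_MAX`, the 512-CPU cutoff, and `calloc()` standing in for kmalloc. Later kernels adopted a comparable scheme (`cpumask_var_t` with `CONFIG_CPUMASK_OFFSTACK`), but this code is not that API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU count; 4096 models a very large NUMA configuration. */
#ifndef NR_CPUS
#define NR_CPUS 4096
#endif
/* Hypothetical cutoff above which a mask is too big for the stack. */
#define ONSTACK_CPUS_MAX 512

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct cpu_mask { unsigned long bits[MASK_WORDS]; };

#if NR_CPUS > ONSTACK_CPUS_MAX
/* Large config: the variable is a pointer, storage comes from the heap. */
typedef struct cpu_mask *cpumask_var;
static bool alloc_mask(cpumask_var *m)
{
    *m = calloc(1, sizeof(**m));
    return *m != NULL;
}
static void free_mask(cpumask_var m) { free(m); }
#else
/* Small config: a one-element array lives on the stack and decays to a
 * pointer, so callers use it exactly like the heap variant. */
typedef struct cpu_mask cpumask_var[1];
static bool alloc_mask(cpumask_var *m)
{
    memset(*m, 0, sizeof(**m));
    return true;
}
static void free_mask(cpumask_var m) { (void)m; }
#endif

static void mask_set(struct cpu_mask *m, unsigned cpu)
{
    assert(cpu < NR_CPUS);
    m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool mask_test(const struct cpu_mask *m, unsigned cpu)
{
    assert(cpu < NR_CPUS);
    return (m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/* Example caller: identical source whichever variant was compiled. */
static int count_even_cpus(void)
{
    cpumask_var m;
    int n = 0;

    if (!alloc_mask(&m))
        return -1;
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu += 2)
        mask_set(m, cpu);
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
        n += mask_test(m, cpu);
    free_mask(m);
    return n;
}
```

The one-element array typedef lets the on-stack variant decay to a pointer. This keeps the calling code free of `#ifdef`s.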
You don't have any x86-64 machines? -Andi -
I think mainline SLUB doesn't do this, just -mm. Ah, my bad, yes I do, but I (wrongly) thought they had that option too. -
SLUB in -mm kernels has been using higher-order allocations for some slabs for the last 6 months or so. That is not true upstream. -
Well, we can now address the rarity. That is the whole point of the [...] It will be more common if the stack size is increased beyond 8k. -
On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)
Introducing complexity to fight a very rare problem that already has a good fallback (refusing to fork more tasks, as well as lumpy reclaim)? Why would we want to do such a thing? 8kB stacks are large enough...
--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan -
The problem can become non-rare on special low-memory machines doing wild [...] Because NUMA requires more stack space. In particular, support for very [...] For many things, yes. I just want to have the compile-time option to increase it. -
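A compile-time option of this kind rests on the relation the patch already uses: a stack is a block of 2^THREAD_ORDER pages. With 4 KiB pages, an 8 KiB stack is order 1 and a 16 KiB stack is order 2. The minimal sketch below assumes a hypothetical `CONFIG_THREAD_ORDER` build knob and a hypothetical helper `order_for_size()`. It shows how a requested size maps to an allocation order.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* 4 KiB pages, as on x86_64 */

/* Hypothetical build-time knob; 1 gives the stock 8 KiB stack. */
#ifndef CONFIG_THREAD_ORDER
#define CONFIG_THREAD_ORDER 1
#endif

#define THREAD_ORDER CONFIG_THREAD_ORDER
#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)

/* Smallest order whose block (PAGE_SIZE << order) holds `size` bytes. */
static unsigned order_for_size(size_t size)
{
    unsigned order = 0;

    assert(size > 0);
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Each step up in order doubles the contiguous block the allocator must find. That is why larger stacks make the allocation failures discussed here more likely.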
But only your huge systems will be using huge stacks? -
I have no idea who else would be using such a feature. Relaxing the tight memory restrictions on stack use may allow placing larger structures on the stack in general. I have some concerns about medium-sized NUMA systems (a few dozen nodes) also running out of stack, since more data is placed on the stack through the policy layer and since we may end up with a couple of stacked filesystems. Most current NUMA systems on x86_64 are basically two nodes on one motherboard. The use of NUMA controls there is likely limited, and their filesystem setups are not very complex either. -
The tight memory restrictions on stack usage do not exist because it is difficult to increase the stack size :) They exist because we want to keep stack sizes small! Increasing the stack size by 4K uses another 4MB of memory for every 1000 threads you have, right? It would take very good reasons to move away from the general direction we've been taking over the past years, that 4/8K stacks are a good idea for [...] The solution has until now always been to fix the problems so they don't use so much stack. Maybe a bigger stack is OK for you on 1024+ CPU systems, but I don't think you'd be able to make that assumption for most normal systems. -
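The memory estimate is a simple multiplication, because every thread has its own kernel stack. 1000 threads × 4 KiB = 4,096,000 bytes, about 3.9 MiB, so "another 4MB" is a fair round figure. A trivial helper (the name is hypothetical) makes the arithmetic explicit:

```c
#include <assert.h>

/* Extra memory, in bytes, when each of `threads` kernel stacks grows by
 * `extra_per_thread` bytes (stacks are per thread, never shared). */
static unsigned long long extra_stack_bytes(unsigned long long threads,
                                            unsigned long long extra_per_thread)
{
    assert(extra_per_thread > 0);
    return threads * extra_per_thread;
}
```

For example, `extra_stack_bytes(1000, 4096)` is 4096000, or exactly 4000 KiB.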
Yes, that is why I made the stack size configurable. -
Fine. I just don't see why you need this fallback. -
So you would be OK with submitting the configurable stack size patches separately, without the fallback? -
Generic code must assume a 4K stack on 32-bit, in general (modulo [...]). Sure. It's already configurable on other architectures. -
Why would anyone need more than 640k... In addition to NUMA, who can tell what some future hardware might do, given that the size of memory is expanding as if it were governed by Moore's Law? As memory sizes increase, someone will bump the page size again. Better to [...] Let people make it as large as they feel they need, and warn at build time that performance may suck.
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot -
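A build-time warning could be done with the preprocessor. A `static_assert` enforces a hard limit, and a `#warning` fires above some softer threshold. The knob name (`CONFIG_THREAD_ORDER`), the order-2 warning threshold and the order-6 hard limit are all hypothetical. `#warning` is standard only since C23, but GCC and Clang have long supported it.

```c
#include <assert.h>   /* provides static_assert in C11 */

#define PAGE_SIZE 4096UL

/* Hypothetical knob, as it might come from a Kconfig-style option. */
#ifndef CONFIG_THREAD_ORDER
#define CONFIG_THREAD_ORDER 1
#endif
#define THREAD_ORDER CONFIG_THREAD_ORDER
#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)

/* Hard limit: reject obviously unreasonable values at compile time. */
static_assert(THREAD_ORDER >= 0 && THREAD_ORDER <= 6,
              "THREAD_ORDER out of range");

/* Soft limit: allow large stacks, but tell the builder about the cost. */
#if THREAD_ORDER > 2
#warning "kernel stacks above 16 KiB: higher-order allocations and more memory per thread"
#endif

static unsigned long thread_size(void)
{
    return THREAD_SIZE;
}
```

With the default order of 1, the sketch builds silently. Setting `-DCONFIG_THREAD_ORDER=3` produces the warning, and 7 stops the build.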
