Re: [13/18] x86_64: Allow fallback for the stack

Previous thread: [08/18] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings by Christoph Lameter on Wednesday, October 3, 2007 - 8:59 pm. (1 message)

Next thread: [14/18] Configure stack size by Christoph Lameter on Wednesday, October 3, 2007 - 8:59 pm. (6 messages)
From: Christoph Lameter
Date: Wednesday, October 3, 2007 - 8:59 pm

Peter Zijlstra has recently demonstrated that we can have order-1 allocation
failures under memory pressure with small memory configurations. The
x86_64 stack has a size of 8k and thus requires an order-1 allocation.

This patch adds a virtual fallback capability for the stack. The system may
continue even in extreme situations and we may be able to increase the stack
size if necessary (see next patch).

Cc: ak@suse.de
Cc: travis@sgi.com
Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/asm-x86_64/thread_info.h |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Index: linux-2.6/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/thread_info.h	2007-10-03 14:49:48.000000000 -0700
+++ linux-2.6/include/asm-x86_64/thread_info.h	2007-10-03 14:51:00.000000000 -0700
@@ -74,20 +74,14 @@ static inline struct thread_info *stack_
 
 /* thread information allocation */
 #ifdef CONFIG_DEBUG_STACK_USAGE
-#define alloc_thread_info(tsk)					\
-    ({								\
-	struct thread_info *ret;				\
-								\
-	ret = ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)); \
-	if (ret)						\
-		memset(ret, 0, THREAD_SIZE);			\
-	ret;							\
-    })
+#define THREAD_FLAGS (GFP_VFALLBACK | __GFP_ZERO)
 #else
-#define alloc_thread_info(tsk) \
-	((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
+#define THREAD_FLAGS GFP_VFALLBACK
 #endif
 
+#define alloc_thread_info(tsk) \
+	((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+
 #define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)
 
 #else /* !__ASSEMBLY__ */

-- 
-

From: Andi Kleen
Date: Thursday, October 4, 2007 - 4:56 am

We've known for ages that it is possible. But it has always been so rare
that it was ignored.

Is there any evidence this is more common now than it used to be?

-Andi
-

From: Peter Zijlstra
Date: Thursday, October 4, 2007 - 5:08 am

The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
order for everything. Kernel stack allocation is GFP_KERNEL I presume.
Also, I use 4k stacks on all my machines.

Maybe the cpumask thing needs an extended api, one that falls back to
kmalloc if NR_CPUS >> sane.

That way that cannot be an argument to inflate stacks.

-

From: Andi Kleen
Date: Thursday, October 4, 2007 - 5:25 am

You don't have any x86-64 machines?

-Andi
-

From: Peter Zijlstra
Date: Thursday, October 4, 2007 - 5:30 am

I think mainline slub doesn't do this, just -mm.


Ah, my bad, yes I do, but I (wrongly) thought they had that option too.

-

From: Christoph Lameter
Date: Thursday, October 4, 2007 - 10:40 am

SLUB in mm kernels was using higher order allocations for some slabs 
for the last 6 months or so. Not true for upstream.

-

From: Christoph Lameter
Date: Thursday, October 4, 2007 - 12:20 pm

Well we can now address the rarity. That is the whole point of the 

It will be more common if the stack size is increased beyond 8k.


-

From: Rik van Riel
Date: Thursday, October 4, 2007 - 12:39 pm

On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)

Introducing complexity to fight a very rare problem with a good
fallback (refusing to fork more tasks, as well as lumpy reclaim)

Why would we want to do such a thing?

8kB stacks are large enough...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
-

From: Christoph Lameter
Date: Thursday, October 4, 2007 - 2:20 pm

The problem can become non-rare on special low memory machines doing wild 

Because NUMA requires more stack space. In particular support for very 

For many things yes. I just want to have the compile time option to 
increase it.
-

From: Nick Piggin
Date: Sunday, October 7, 2007 - 12:35 am

But only your huge systems will be using huge stacks?
-

From: Christoph Lameter
Date: Monday, October 8, 2007 - 10:36 am

I have no idea who else would be using such a feature. Relaxing the tight 
memory restrictions on stack use may allow placing larger structures on 
the stack in general.

I have some concerns about the medium NUMA systems (a few dozen nodes) 
also running out of stack since more data is placed on the stack through 
the policy layer and since we may end up with a couple of stacked 
filesystems. Most of the current NUMA systems on x86_64 are basically 
two nodes on one motherboard. The use of NUMA controls is likely 
limited there and the complexity of the filesystems is also not high.


-

From: Nick Piggin
Date: Monday, October 8, 2007 - 5:55 am

The tight memory restrictions on stack usage do not come about because
of the difficulty in increasing the stack size :) It is because we want to
keep stack sizes small!

Increasing the stack size by 4K uses another 4MB of memory for every 1000
threads you have, right?

It would take a lot of good reason to move away from the general direction
we've been taking over the past years that 4/8K stacks are a good idea for

The solution has until now always been to fix the problems so they don't
use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
systems, but I don't think you'd be able to make that assumption for most
normal systems.
-

From: Christoph Lameter
Date: Tuesday, October 9, 2007 - 11:39 am

Yes that is why I made the stack size configurable.

-

From: Nick Piggin
Date: Tuesday, October 9, 2007 - 1:46 am

Fine. I just don't see why you need this fallback.
-

From: Christoph Lameter
Date: Tuesday, October 9, 2007 - 6:26 pm

So you would be ok with submitting the configurable stacksize patches 
separately without the fallback? 
-

From: Nick Piggin
Date: Tuesday, October 9, 2007 - 2:56 am

Generic code must assume a 4K stack on 32-bit, in general (modulo

Sure. It's already configurable on other architectures.
-

From: Bill Davidsen
Date: Saturday, October 6, 2007 - 11:53 am

Why would anyone need more than 640k... In addition to NUMA, who can 
tell what some future hardware might do, given that the size of memory 
is expanding as if it were covered in Moore's Law. As memory sizes 
increase someone will bump the page size again. Better to let people 
make it as large as they feel they need and warn at build time that 
performance may suck.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
