4k stacks will never fly on an SGI x86_64 NUMA configuration given the
additional data that may be kept on the stack. We are currently
considering going from 8k to 16k (or even 32k) to make things work. So
having the ability to put the stacks in vmalloc space may be something to
look at.