Defaulting To 4K Stacks

Submitted by Jeremy on April 22, 2008 - 9:42pm.
Linux news

Andrew Morton replied to a commit message making 4k stacks the default, saying, "this patch will cause kernels to crash." Ingo Molnar replied, "what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years." He added, "we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks." During the lengthy discussion it was suggested that nfs+xfs+raid kernel configurations and the use of ndiswrapper are the most common reasons for overflowing a 4K stack.

Andi Kleen questioned the usefulness of 4k stacks, "as far as I can figure out they are not [a worthy goal]. They might have been a worthy goal on crappy 2.4 VMs, but these times are long gone." Arjan van de Ven suggested that though the 2.6 VM is much improved over the 2.4 VM, fragmentation with 8K stacks remains an unsolvable problem, "it's basic math; the Linux VM gets to deal with both short and long lasting allocations; no matter how hard you try to get some degree of fragmentation; especially due to the 15:1 acceleration you get due to the lowmem issue. And before you say 'you should use 64 bit on such machines'; I would love it if more people used 64 bit linux. Sadly the adoption rate of that is not very good still.... by far ;(" In another email, Arjan listed two advantages to 4K stacks, "1) less memory consumption in the lowmem zone (critical for enterprise use, also good for general performance), and 2) kernel stacks at 8K are one of the most prominent order-1 allocations in the kernel; again with big-memory systems the fragmentation of the lowmem zone is a problem (and the distros that ship 4K stacks went there because of customer complaints)".
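
The "order-1 allocation" point can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the usual 4 KiB page size on 32-bit x86 and the kernel convention that an order-n allocation is 2^n physically contiguous pages; the function names and the thread count of 10,000 are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size on 32-bit x86


def allocation_order(stack_bytes, page_size=PAGE_SIZE):
    """Return the smallest order n such that 2**n pages hold stack_bytes."""
    pages = -(-stack_bytes // page_size)  # ceiling division
    return (pages - 1).bit_length()


def stack_memory_mib(threads, stack_bytes):
    """Total memory taken by kernel stacks alone, in MiB."""
    return threads * stack_bytes / (1024 * 1024)


# Hypothetical thread count, for illustration only.
THREADS = 10_000

for size in (4096, 8192):
    print(
        f"{size // 1024}K stacks: order-{allocation_order(size)} allocation, "
        f"{stack_memory_mib(THREADS, size):.1f} MiB for {THREADS} threads"
    )
```

A 4K stack fits in a single page (order 0), so it can be satisfied from any free page. An 8K stack needs two adjacent free pages (order 1), which is the allocation that becomes harder to satisfy as lowmem fragments.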


From: Andrew Morton <akpm@...>
Subject: Re: x86: 4kstacks default
Date: Apr 18, 5:29 pm 2008

On Fri, 18 Apr 2008 17:37:36 GMT
Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:

> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d61ecf...
> Commit:     d61ecf0b53131564949bc4196e70f676000a845a
> Parent:     f408b43ceedce49f26c01cd4a68dbbdbe2743e51
> Author:     Ingo Molnar <mingo@elte.hu>
> AuthorDate: Fri Apr 4 17:11:09 2008 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Thu Apr 17 17:41:34 2008 +0200
> 
>     x86: 4kstacks default
>     
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/Kconfig.debug |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index f4413c0..610aaec 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -106,8 +106,8 @@ config DEBUG_NX_TEST
>  
>  config 4KSTACKS
>  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> -	depends on DEBUG_KERNEL
>  	depends on X86_32
> +	default y

This patch will cause kernels to crash.

It has no changelog which explains or justifies the alteration.

afaict the patch was not posted to the mailing list and was not
discussed or reviewed.
--

From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:23 am 2008

* Andrew Morton <akpm@linux-foundation.org> wrote:

> >  config 4KSTACKS
> >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > -	depends on DEBUG_KERNEL
> >  	depends on X86_32
> > +	default y
> 
> This patch will cause kernels to crash.

what mainline kernels crash and how will they crash? Fedora and other
distros have had 4K stacks enabled for years:

  $ grep 4K /boot/config-2.6.24-9.fc9
  CONFIG_4KSTACKS=y

and we've conducted tens of thousands of bootup tests with all sorts of
drivers and kernel options enabled and have yet to see a single crash
due to 4K stacks. So basically the kernel default just follows the
common distro default now. (distros and users can still disable it)

	Ingo
--

From: Andrew Morton <akpm@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 1:49 pm 2008

> On Sat, 19 Apr 2008 16:23:29 +0200 Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> > 
> > This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash?

There has been a dribble of reports - I don't have the links handy, nor did
I search for them.

> Fedora and other
> distros have had 4K stacks enabled for years:
> 
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
> 
> and we've conducted tens of thousands of bootup tests with all sorts of
> drivers and kernel options enabled and have yet to see a single crash
> due to 4K stacks.

I doubt if you're testing things like nfsd-on-xfs-on-md-on-porky-scsi-driver.

Enable CONFIG_DEBUG_STACK_USAGE. Monitor the results. It's so scary that I
wonder if the feature is busted.

> So basically the kernel default just follows the
> common distro default now. (distros and users can still disable it)

Apparently not. I wouldn't enable it if I had a distro.

Anyway. We should be having this sort of discussion _before_ a patch gets
merged, no?
--

From: Shawn Bohrer <shawn.bohrer@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:59 am 2008

On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> > 
> > This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash? Fedora and other 
> distros have had 4K stacks enabled for years:

If by other distros you mean RHEL then yes.  However, openSUSE,
Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
distributions that default to 4K.

--
Shawn
--

From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 2:00 pm 2008

On Sat, 19 Apr 2008 09:59:48 -0500
Shawn Bohrer <shawn.bohrer@gmail.com> wrote:

> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash? Fedora and
> > other distros have had 4K stacks enabled for years:
> 
> If by other distros you mean RHEL then yes. However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks. I know of no other
> distributions that default to 4K.

centos, oracle and redflag tend to follow the RHEL/fedora settings.

To be honest, at this point we're at a situation where
* Several very popular distributions have this enabled for 5+ years,
  apparently without any real issues (otherwise the enterprise releases
  would have turned this off)
* The early "hot known issues" have been resolved afaik, things like
  block device stacking, and symlink recursion lookups are either no longer
  recursive, or a lot less recursive than they used to be.

There are clear benefits to 4K stacks (no need to reiterate the flamewar,
but worth mentioning)
* Less memory consumption in the lowmem zone (critical for enterprise use,
  also good for general performance)
* Kernel stacks at 8K are one of the most prominent order-1 allocations in the
  kernel; again with big-memory systems the fragmentation of the lowmem zone
  is a problem (and the distros that ship 4K stacks went there because of customer
  complaints)

On the flipside the arguments tend to be
1) certain stackings of components still runs the risk of overflowing
2) I want to run ndiswrapper
3) general, unspecified uneasyness.

For 1), we need to know which they are, and then solve them, because
even on x86-64 with 8k stacks they can be a problem (just because the
stack frames are bigger, although not quite double, there). I've not
seen any recent reports, I'll try to extend the kerneloops.org client
to collect the "stack is getting low" warning to be able to see how
much this really happens.

for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"

for 3), this is hard to deal with but also generally unfounded... you
can use this argument against any change in the kernel.
--

From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 2:33 pm 2008

* Arjan van de Ven <arjan@infradead.org> wrote:

> On Sat, 19 Apr 2008 09:59:48 -0500
> Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
> 
> > On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > >  config 4KSTACKS
> > > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > > -	depends on DEBUG_KERNEL
> > > > >  	depends on X86_32
> > > > > +	default y
> > > > 
> > > > This patch will cause kernels to crash.
> > > 
> > > what mainline kernels crash and how will they crash? Fedora and
> > > other distros have had 4K stacks enabled for years:
> > 
> > If by other distros you mean RHEL then yes. However, openSUSE,
> > Ubuntu, and Mandriva all still have 8K stacks. I know of no other
> > distributions that default to 4K.
> 
> centos, oracle and redflag tend to follow the RHEL/fedora settings.
> 
> To be honest, at this point we're at a situation where
> * Several very popular distributions have this enabled for 5+ years,
>   apparently without any real issues (otherwise the enterprise releases
>   would have turned this off)
> * The early "hot known issues" have been resolved afaik, things like
>   block device stacking, and symlink recursion lookups are either no longer
>   recursive, or a lot less recursive than they used to be.
> 
> There are clear benefits to 4K stacks (no need to reiterate the flamewar,
> but worth mentioning)
> * Less memory consumption in the lowmem zone (critical for enterprise use,
>   also good for general performance)
> * Kernel stacks at 8K are one of the most prominent order-1 allocations in the
>   kernel; again with big-memory systems the fragmentation of the lowmem zone
>   is a problem (and the distros that ship 4K stacks went there because of customer
>   complaints)
> 
> On the flipside the arguments tend to be
> 1) certain stackings of components still runs the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasyness.
> 
> For 1), we need to know which they are, and then solve them, because
> even on x86-64 with 8k stacks they can be a problem (just because the
> stack frames are bigger, although not quite double, there). I've not
> seen any recent reports, I'll try to extend the kerneloops.org client
> to collect the "stack is getting low" warning to be able to see how
> much this really happens.
> 
> for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"
> 
> for 3), this is hard to deal with but also generally unfounded... you
> can use this argument against any change in the kernel.

and lets observe it that 8K stacks are of course still offered, so if
anyone disables 4K stacks in the .config, it will stay disabled.

	Ingo
--

From: Eric Sandeen <sandeen@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:36 pm 2008

Arjan van de Ven wrote:

> On the flipside the arguments tend to be
> 1) certain stackings of components still runs the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasyness.
> 
> For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
> they can be a problem (just because the stack frames are bigger, although not quite double, there).

Except, apparently, not, at least in my experience.

Ask the xfs guys if they see stack overflows on x86_64, or on x86.

I've personally never seen common stack problems with xfs on x86_64, but
it's very common on x86. I don't have a great answer for why, but
that's my anecdotal evidence.

> I've not seen any recent reports, I'll try to extend the kerneloops.org client to collect the
> "stack is getting low" warning to be able to see how much this really happens.

That sounds like a very good thing to collect, and maybe if I re-send a
"clearly state stack overflows at oops time" patch you can easily keep
tabs.

Thanks,

-Eric
--

From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:11 am 2008

On Sat, 19 Apr 2008 21:36:16 -0500
Eric Sandeen <sandeen@sandeen.net> wrote:

> > For 1), we need to know which they are, and then solve them,
> > because even on x86-64 with 8k stacks they can be a problem (just
> > because the stack frames are bigger, although not quite double,
> > there).
> 
> Except, apparently, not, at least in my experience.

if you actually go over on x86, it's not unlikely that you're getting
close to the edge on 64 bit. At minimum we really do want to fix these
things...

> I've personally never seen common stack problems with xfs on x86_64,
> but it's very common on x86. I don't have a great answer for why, but
> that's my anecdotal evidence.

One thing I've learned with the kerneloops.org work is that people don't
read their dmesg.....

> 
> > I've not seen any recent reports, I'll try to extend the
> > kerneloops.org client to collect the "stack is getting low" warning
> > to be able to see how much this really happens.
> 
> That sounds like a very good thing to collect, and maybe if I re-send
> a "clearly state stack overflows at oops time" patch you can easily
> keep tabs.

... which makes me think we need to strengthen this part of the kernel.
(and then have kerneloops.org collect the issues)
If there's a clear pattern in the backtraces we will find it. And then
we can fix it... which is absolutely the right thing, I don't think
anyone disagrees with that.

So yes if you can dig up your patch, yes please!

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

From: Eric Sandeen <sandeen@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 11:29 pm 2008

Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>>>  config 4KSTACKS
>>>  	bool "Use 4Kb for kernel stacks instead of 8Kb"
>>> -	depends on DEBUG_KERNEL
>>>  	depends on X86_32
>>> +	default y
>> This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash? Fedora and other 
> distros have had 4K stacks enabled for years:
> 
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
> 
> and we've conducted tens of thousands of bootup tests with all sorts of 
> drivers and kernel options enabled and have yet to see a single crash 
> due to 4K stacks. 

Really, not one?

https://bugzilla.redhat.com/show_bug.cgi?id=247158
https://bugzilla.redhat.com/show_bug.cgi?id=227331
https://bugzilla.redhat.com/show_bug.cgi?id=240077

(hehe, ok, xfs is a common component there...)

and it's not always obvious that you've overflowed the stack.

CONFIG_DEBUG_STACKOVERFLOW isn't ery useful because the warning printk
it generates uses the remaining amount of stack, and tips the box.

> So basically the kernel default just follows the 
> common distro default now. (distros and users can still disable it)

If Fedora is the common distro, ok. :)

Fedora is a pretty narrow sample in terms of IO stacks at least.  I have
plenty of fondness for Fedora, but it's almost 100% ext3[1].  I spent a
fair amount of time getting xfs+lvm to survive 4k on F8; gcc caused
stack usage to grow in general from F7 to F8, and F9 seems to have
gotten tight again but I haven't gotten to the bottom of yet.

Heck my ext3-root-on-sda1 pre-beta F9 box, no nfs or lvm or xfs or
anything gets within 744 bytes of the end of the 4k stack simply by
*booting* (it was a modprobe process... maybe some module needs help)

How many other distros use 4K stacks on x86, really?

-Eric

[1] http://www.smolts.org/static/stats/stats.html shows 24588 ext3
filesystems, compared to 366 xfs, 248 reiserfs, 76 jfs ...
--

From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 21, 10:31 am 2008

* Eric Sandeen <sandeen@sandeen.net> wrote:

> > and we've conducted tens of thousands of bootup tests with all sorts
> > of drivers and kernel options enabled and have yet to see a single
> > crash due to 4K stacks.
> 
> Really, not one?
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=247158
> https://bugzilla.redhat.com/show_bug.cgi?id=227331
> https://bugzilla.redhat.com/show_bug.cgi?id=240077
> 
> (hehe, ok, xfs is a common component there...)
> 
> and it's not always obvious that you've overflowed the stack.
> 
> CONFIG_DEBUG_STACKOVERFLOW isn't ery useful because the warning printk
> it generates uses the remaining amount of stack, and tips the box.

note that in -rt we have an ftrace plugin that measures _precise_ stack
footprint, when it happens. so it's possible to measure exact stack
footprint and save a stack trace when that happens.

	Ingo
--

From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:09 am 2008

On Sat, Apr 19, 2008 at 09:59:48AM -0500, Shawn Bohrer wrote:
> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash? Fedora and other 
> > distros have had 4K stacks enabled for years:
> 
> If by other distros you mean RHEL then yes.  However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
> distributions that default to 4K.

MontaVista offers 4k stacks for arm (currently an external patch) and 
markets that as a feature to customers, so many of them might use it.

In-kernel the sh and m68knommu ports also offer 4k stacks (for both 
archs there's also a defconfig using it), and the mn10300 port contains 
an #ifdef but no config option.

The stack problems in the kernel tend to not be in arch code, and if 
we don't get i386 to always run with 4k stacks there's no chance that 
it will ever work reliably on other architectures.

> Shawn

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Alan Cox <alan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:06 am 2008

> The stack problems in the kernel tend to not be in arch code, and if
> we don't get i386 to always run with 4k stacks there's no chance that
> it will ever work reliably on other architectures.

Not really the case - embedded tends not to use deep stacks of drivers.

Alan
--

From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:51 am 2008

On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > The stack problems in the kernel tend to not be in arch code, and if
> > we don't get i386 to always run with 4k stacks there's no chance that
> > it will ever work reliably on other architectures.
> 
> Not really the case - embedded tends not to use deep stacks of drivers.

Something like nfsd-over-xfs-over-raid is (or was) the most common
problem - and this or similar stackings might be used in NAS devices.

> Alan

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Alan Cox <alan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 5:36 am 2008

On Sun, 20 Apr 2008 11:51:04 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > The stack problems in the kernel tend to not be in arch code, and if
> > > we don't get i386 to always run with 4k stacks there's no chance that
> > > it will ever work reliably on other architectures.
> > 
> > Not really the case - embedded tends not to use deep stacks of drivers.
> 
> Something like nfsd-over-xfs-over-raid is (or was) the most common
> problem - and this or similar stackings might be used in NAS devices.

Specific cases yes, but such NAS devices have big processors and are not
little emdedded CPUs. On an embedded box you know at build time what it
will be doing.
--

From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 6:44 am 2008

On Sun, Apr 20, 2008 at 10:36:11AM +0100, Alan Cox wrote:
> On Sun, 20 Apr 2008 11:51:04 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > > The stack problems in the kernel tend to not be in arch code, and if
> > > > we don't get i386 to always run with 4k stacks there's no chance that
> > > > it will ever work reliably on other architectures.
> > > 
> > > Not really the case - embedded tends not to use deep stacks of drivers.
> > 
> > Something like nfsd-over-xfs-over-raid is (or was) the most common
> > problem - and this or similar stackings might be used in NAS devices.
> 
> Specific cases yes, but such NAS devices have big processors and are not
> little emdedded CPUs. On an embedded box you know at build time what it
> will be doing.

The code in the kernel that gets the fewest coverage at all are our
error paths, and some vendor might try 4k stacks, validate it works in
all use cases - and then it will blow up in some error condition he
didn't test.

6k is known to work, and there aren't many problems known with 4k.

And from a QA point of view the only way of getting 4k thoroughly tested
by users, and well also tested in -rc kernels for catching regressions
before they get into stable kernels, is if we get 4k stacks enabled
unconditionally on i386.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 8:27 am 2008

Adrian Bunk <bunk@kernel.org> writes:
> 
> 6k is known to work, and there aren't many problems known with 4k.
> 
> And from a QA point of view the only way of getting 4k thoroughly tested

But you have to first ask why do you want 4k tested? Does it serve
any useful purpose in itself? I don't think so. Or you're saying
it's important to support 50k kernel threads on 32bit kernels?

-Andi
--

From: Daniel Hazelton <dhazelton@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 11:44 am 2008

On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> Adrian Bunk <bunk@kernel.org> writes:
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested
>
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?
>
> -Andi

Andi, you're the only one I've seen seriously pounding the "50k threads"
thing - I don't think anyone is really fooled by the straw-man, so I'd
suggest you drop it.

The real issue is that you think (and are correct in thinking) that
people are idiots. Yes, there will be breakages if the default is
changed to 4k stacks - but if people are running new kernels on boxes
that'll hit stack use problems (that *AREN'T* related to ndiswrapper)
and haven't made sure that they've configured the kernel properly, then
they deserve the outcome. It isn't the job of the Linux Kernel to
protect the incompetent - nor is it the job of linux kernel developers
to do such.

If people are doing a "zcat /proc/kconfig.gz > .config && make
oldconfig" (or similar) the problem shouldn't even appear, really.
They'll get whatever setting was in their old config for the stack size.
And until the problems with deep-stack setups - like nfs+xfs+raid - get
resolved I'd think that the option to configure the stack size would
remain.

Since the second-most-common reason for stack overages is ndiswrapper...
Well, with there being so much more hardware now supported directly by
the linux kernel... I'm stunned every time someone tells me "I can't run
Linux on my laptop, there is hardware that isn't supported without me
having to get ndiswrapper". The last time someone said that to me I
pointed to the fact that their hardware is supported by the latest
kernel and even offered to build&install it for them.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:26 pm 2008

Daniel Hazelton wrote:

> Andi, you're the only one I've seen seriously pounding the "50k threads"
> thing. I don't think anyone is really fooled by the straw-man, so I'd
> suggest you drop it.

Ok, perhaps we can settle this properly. Like historicans. We study the
original sources.

The primary resource is the original commit adding the 4k stack code.
You cannot find this in latest git because it predates 2.6.12, but it
is available in one of the historic trees imported from BitKeeper like
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Here's the log:
>>
commit 95f238eac82907c4ccbc301cd5788e67db0715ce
Author: Andrew Morton <akpm@osdl.org>
Date:   Sun Apr 11 23:18:43 2004 -0700

    [PATCH] ia32: 4Kb stacks (and irqstacks) patch

    From: Arjan van de Ven <arjanv@redhat.com>

    Below is a patch to enable 4Kb stacks for x86. The goal of this is to

    1) Reduce footprint per thread so that systems can run many more
       threads (for the java people)

    2) Reduce the pressure on the VM for order > 0 allocations. We see real life
       workloads (granted with 2.4 but the fundamental fragmentation issue isn't
       solved in 2.6 and isn't solvable in theory) where this can be a problem.
       In addition order > 0 allocations can make the VM "stutter" and give more
       latency due to having to do much much more work trying to defragment

    ...
<<

This gives us two reasons as you can see, one of them many threads
and another mostly only relevant to 2.4

Now I was also assuming that nobody took (1) really serious and
attacked (2) in earlier thread; in particular in

http://article.gmane.org/gmane.linux.kernel/665584

>>
Actually the real reason the 4K stacks were introduced IIRC was that
the VM is not very good at allocation of order > 0 pages and that only
using order 0 and not order 1 in normal operation prevented some
stalls.

This rationale also goes back to 2.4 (especially some of the early 2.4
VMs were not very good) and the 2.6 VM is generally better and on
x86-64 I don't see much evidence that these stalls are a big problem
(but then x86-64 also has more lowmem).
<<

This was corrected by Ingo who was one of the primary authors of the patch:

http://thread.gmane.org/gmane.linux.kernel/665420:

>>
no, the primary motivation Arjan and me started working on 4K stacks
and implemented it was what Denys mentioned: i had a testcase that ran
50,000 threads before it ran out of memory - i wanted it to run
100,000 threads. The improved order-0 behavior was just icing on the
cake.

        Ingo
<<

and then from Arjan:

http://thread.gmane.org/gmane.linux.kernel/665420

>>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that

well that and the fact that RH had customers who had major issues at
fewer threads
with 8Kb versus fragmentation.
<<

So both the primary authors of the patch state that 50k threads
was the main reason. I didn't believe it at first either, but after
these forceful corrections I do now.

You're totally wrong when you call it a straw man.

-Andi
--

From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:48 pm 2008

On Sun, 20 Apr 2008 19:26:10 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Daniel Hazelton wrote:
> 
> > Andi, you're the only one I've seen seriously pounding the "50k
> > threads" thing. I don't think anyone is really fooled by the
> > straw-man, so I'd suggest you drop it.
> 
> Ok, perhaps we can settle this properly. Like historicans. We study
> the original sources.
> 
> The primary resource is the original commit adding the 4k stack code.
> You cannot find this in latest git because it predates 2.6.12, but it
> is available in one of the historic trees imported from BitKeeper like
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> 
> Here's the log:
> >>
> commit 95f238eac82907c4ccbc301cd5788e67db0715ce
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Sun Apr 11 23:18:43 2004 -0700
> 
>     [PATCH] ia32: 4Kb stacks (and irqstacks) patch
> 
>     From: Arjan van de Ven <arjanv@redhat.com>
> 
>     Below is a patch to enable 4Kb stacks for x86. The goal of this
> is to
> 
>     1) Reduce footprint per thread so that systems can run many more
> threads (for the java people)
> 
>     2) Reduce the pressure on the VM for order > 0 allocations. We see
> real life
>        workloads (granted with 2.4 but the fundamental fragmentation
> issue isn't
>        solved in 2.6 and isn't solvable in theory) where this can be a
> problem.
>        In addition order > 0 allocations can make the VM "stutter" and
> give more
>        latency due to having to do much much more work trying to
> defragment
> 
>     ...
> <<
> 
> This gives us two reasons as you can see, one of them many threads
> and another mostly only relevant to 2.4
> 
> Now I was also assuming that nobody took (1) really serious and

I'm sorry but I really hope nobody shares your assumption here.
These are real customer workloads; java based "many things going on" at a time
showed several thousands of threads fin the system (a dozen or two per request, multiplied
by the number of outstanding connections) for *real customers*.

That you don't take that serious, fair, you can take serious whatever you want.

> attacked (2) in earlier thread; in particular in

yes you did attack.
But lets please use more friendly conversation here than words like
"attack". This is not a war, and we really shouldn't be hostile in this forum, neither
in words nor in intention.

> 
> http://article.gmane.org/gmane.linux.kernel/665584
> 
> >>
> Actually the real reason the 4K stacks were introduced IIRC was that
> the VM is not very good at allocation of order > 0 pages and that only
> using order 0 and not order 1 in normal operation prevented some
> stalls.
> 
> This rationale also goes back to 2.4 (especially some of the early 2.4
> VMs were not very good) and the 2.6 VM is generally better and on
> x86-64 I don't see much evidence that these stalls are a big problem
> (but then x86-64 also has more lowmem).
> <<

What you didn't atta^Waddress
was the observation that fragmentation is fundamentally unsolvable.
Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
We don't have effective physical address based reclaim yet for higher order allocs.

> 
> http://thread.gmane.org/gmane.linux.kernel/665420:
> 
> >>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that ran
> 50,000 threads before it ran out of memory - i wanted it to run
> 100,000 threads. The improved order-0 behavior was just icing on the
> cake.
> 
>         Ingo
> <<
> 
> and then from Arjan:
> 
> http://thread.gmane.org/gmane.linux.kernel/665420
> 
> >>
> > no, the primary motivation Arjan and me started working on 4K stacks
> > and implemented it was what Denys mentioned: i had a testcase that
> 
> well that and the fact that RH had customers who had major issues at
> fewer threads
> with 8Kb versus fragmentation.
> <<
> 
> So both the primary authors of the patch state that 50k threads
> was the main reason. I didn't believe it at first either, but after
> these forceful corrections I do now.

I'm sorry but I fail to entirely understand where your "So" or the rest of
your conclusion comes from in terms of "both the authors".
Which part of "fewer threads" and "8kb versus fragmentation" did you
misunderstand to get to your conclusion?

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:01 pm 2008

> These are real customer workloads; java based "many things going on" at a time
> showed several thousands of threads fin the system (a dozen or two per request, multiplied
> by the number of outstanding connections) for *real customers*.

Several thousands or 50k? Several thousands sounds large, but not
entirely unreasonable, but it is far from 50k.

> That you don't take that serious, fair, you can take serious whatever you want.

No I don't take 50k threads on 32bit serious. And I hope you do not either.

Why I don't take it serious: on 32bit 50k threads will lead to lowmem exhaustion
if the threads are actually doing something (like keeping select pages
around or similar and having some thread local data). You'll easily
be at 16-32K/thread and that is already far beyond the lowmem available
on any 3:1 split 32bit kernel, likely even beyond 2:2. Even with 3:1
it could be tight.

So you can say about customer workloads what you want, but you'll have a
hard time convincing me they really run 50k threads doing something on 32bit.

Now if we take the real realistic overhead of a thread into account 4k or more
less don't really matter all that much and the decreased safety from the 4k
stack starts to look like a very bad bargain.

>> attacked (2) in earlier thread; in particular in
>
> yes you did attack.
> But lets please use more friendly conversation here than words like
> "attack". This is not a war, and we really shouldn't be hostile in this forum, neither
> in words nor in intention.

Ok what word would you prefer? There is no war involved right, just a technical
argument. I previously always assumed that "attacking" was a standard
term in discussions, but if you don't like I can switch to another one.

Regarding war like terminology: I used to think that people who commonly talk
about "nuking code" went a little too far, but at some point I adapted
to them I think. Perhaps it comes from that.

> What you didn't atta^Waddress

Fine, I will call it address from now.

> was the observation that fragmentation is fundamentally unsolvable.

Where was that observation?

> Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
> We don't have effective physical address based reclaim yet for higher order allocs.

I don't see any evidence that there are serious order 1 fragmentation issues
on 2.6. If you have any please post it.

-Andi

--

From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 6:33 pm 2008

On Sun, 20 Apr 2008 22:01:46 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> >
> > These are real customer workloads; java based "many things going
> > on" at a time showed several thousands of threads fin the system (a
> > dozen or two per request, multiplied by the number of outstanding
> > connections) for *real customers*.
>
> Several thousands or 50k? Several thousands sounds large, but not
> entirely unreasonable, but it is far from 50k.

it is you who keeps putting up the 50k argument.
What I'm talking about is in the 10k to 20k range; and that is actual workloads
by real customers.

>
> > That you don't take that serious, fair, you can take serious
> > whatever you want.
>
> No I don't take 50k threads on 32bit serious. And I hope you do not
> either.

[ removed a bunch of stuff about 50k again ]

>
> > was the observation that fragmentation is fundamentally unsolvable.
>
> Where was that observation?

it was in the commit message from me you quoted, and was rather widely discussed
at the time. It's also basic math; the Linux VM gets to deal with both short and
long lasting allocations; no matter how hard you try to get some degree of
fragmentation; especially due to the 15:1 acceleration you get due to the lowmem
issue. And before you say "you should use 64 bit on such machines"; I would love
it if more people used 64 bit linux. Sadly the adoption rate of that is not very
good still.... by far ;(

>
> > Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and
> > does) have fragmentation issues. We don't have effective physical
> > address based reclaim yet for higher order allocs.
>
> I don't see any evidence that there are serious order 1 fragmentation
> issues on 2.6.

I assume you're not asking me to give you customer confidential data from a
previous job in public ;)

>If you have any please post it.

just like you're posting the evidence that 4k stacks overflows?

Google scores:
1-order allocation failed    54000 pages
do_IRQ: stack overflow        4560 pages

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 7:16 pm 2008

Arjan van de Ven <arjan@infradead.org> writes:
>
> it is you who keeps putting up the 50k argument.

See the links I posted and quote in an earlier message up the thread
if you don't remember what you wrote yourself.

I originally only hold up the fragmentation argument (or rather only
argued against it), until I was corrected by both Ingo and you in the
earlier thread and you both insisted that 50k threads were the real
reason'd'etre for 4k stacks.

You're saying that was wrong and the fragmentation issue was really
the real reason for 4k stacks? If both you and Ingo can agree on that
I would be happy to forget the 50k threads :)

> What I'm talking about is in the 10k to 20k range; and that is actual workloads
> by real customers.

On a 32bit kernel?

My estimate is that you need around 32k for a functional blocked
thread in a network server (8k + 2*4k for poll with large fd table
and wait queues + some pinned dentries and inodes + misc other
stuff). With 20k you're 625MB into your lowmem which leaves about
200MB left on a 3:1 system with 16GB (and ~128MB mem_map). That
might work for some time, but I expect it will fall over at some
point because there is just too much pinned lowmem and not enough
left for other stuff (like networking buffers etc.)

10k sounds more doable. But again do 4k more or less make
a big difference with the other thread overhead? I don't think so.

And trading reliability (and functionality -- you basically have
to cut off XFS)just for 4k/thread doesn't seem like good bargain to me.
Especially with kernel code getting more complicated all the time.

>> I don't see any evidence that there are serious order 1 fragmentation
>> issues on 2.6.
>
> I assume you're not asking me to give you customer confidential data from a previous job in public ;)

Well if it is that serious a problem surely it will have hit some
public bugzillas or mailing lists? Arguing with something secret
is also not very useful.

Also I find it always important to reevaluate assumptions when new facts
come up. In this case we should reevaluate a decision that made sense[1]
in 2.4 with the new facts of 2.6 (e.g. new VM with much better reclaim)

[1] refering to the fragmentation argument, not the 50k threads which
were always unrealistic.

-Andi

--

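The per-thread estimate in the message above can be checked with a few lines of arithmetic. This is only a rough sketch and not part of the original thread: the 32k per-thread figure and the 10k/20k thread counts come from the emails, while the helper name `pinned_lowmem_mib` and the assumption that a 4K stack simply removes 4 KiB from that 32 KiB figure are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def pinned_lowmem_mib(threads, per_thread_kib):
    """Total pinned lowmem in MiB for `threads` threads (hypothetical helper)."""
    return threads * per_thread_kib * KIB / MIB


# Figure from the thread: roughly 32 KiB of lowmem per blocked thread
# when the kernel stack takes 8 KiB of it.
per_thread_with_8k_stack = 32
# Assumption for illustration: a 4K stack saves exactly 4 KiB per thread.
per_thread_with_4k_stack = per_thread_with_8k_stack - 4

for threads in (10_000, 20_000):
    with_8k = pinned_lowmem_mib(threads, per_thread_with_8k_stack)
    with_4k = pinned_lowmem_mib(threads, per_thread_with_4k_stack)
    print(f"{threads} threads: {with_8k} MiB (8K stacks), "
          f"{with_4k} MiB (4K stacks), difference {with_8k - with_4k} MiB")
```

Under these assumptions 20k threads pin 625 MiB, matching the number quoted in the email, and the smaller stack accounts for one eighth of the per-thread total.
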
From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 21, 1:53 am 2008

On Mon, 21 Apr 2008 01:16:22 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Arjan van de Ven <arjan@infradead.org> writes:
> >
> > it is you who keeps putting up the 50k argument.
>
> See the links I posted and quote in an earlier message up the thread
> if you don't remember what you wrote yourself.
>
> I originally only hold up the fragmentation argument (or rather only
> argued against it), until I was corrected by both Ingo and you in the
> earlier thread and you both insisted that 50k threads were the real
> reason'd'etre for 4k stacks.
>
> You're saying that was wrong and the fragmentation issue was really
> the real reason for 4k stacks? If both you and Ingo can agree on that
> I would be happy to forget the 50k threads :)

I already corrected you misquoting/misunderstanding me; should I do this again?

>
> > What I'm talking about is in the 10k to 20k range; and that is
> > actual workloads by real customers.
>
> On a 32bit kernel?
>
> My estimate is that you need around 32k for a functional blocked
> thread in a network server (8k + 2*4k for poll with large fd table
> and wait queues + some pinned dentries and inodes + misc other
> stuff). With 20k you're 625MB into your lowmem which leaves about
> 200MB left on a 3:1 system with 16GB (and ~128MB mem_map). That
> might work for some time, but I expect it will fall over at some
> point because there is just too much pinned lowmem and not enough
> left for other stuff (like networking buffers etc.)
>
> 10k sounds more doable. But again do 4k more or less make
> a big difference with the other thread overhead? I don't think so.

no but the other ones are order 0..

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

--
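
The "order 0" remark can be illustrated with a toy model. The sketch below is not from the thread: it assumes a made-up 16-page zone in which every other page is pinned by a long-lived allocation, and it uses the buddy-allocator rule that an order-n block is 2**n contiguous, naturally aligned free pages. The function name `free_blocks` and the bitmap are hypothetical.

```python
def free_blocks(free_map, order):
    """Count naturally aligned free blocks of 2**order pages in a free-page bitmap."""
    size = 1 << order
    return sum(
        all(free_map[start:start + size])
        for start in range(0, len(free_map) - size + 1, size)
    )


# Hypothetical zone: 16 pages, every odd-numbered page is pinned,
# so half of the memory is free but no two free pages are adjacent.
free_map = [page % 2 == 0 for page in range(16)]

print(free_blocks(free_map, 0))  # 8 single pages are still available
print(free_blocks(free_map, 1))  # 0 two-page blocks are available
```

In this toy zone half the memory is free, yet an order-1 request such as an 8K stack cannot be satisfied while order-0 requests still succeed. Real zones are far larger and reclaim can move or free pages, so this only shows the shape of the argument.
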

From: Willy Tarreau <w@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 8:47 am 2008

On Sun, Apr 20, 2008 at 02:27:14PM +0200, Andi Kleen wrote:
> Adrian Bunk <bunk@kernel.org> writes:
> >
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested 
> 
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?

Clearly if I have the choice between a kernel which can run 50k threads
and a kernel which does not crash under me during an I/O error, I choose
the latter! I don't even imagine what purpose 50k kernel threads may serve.
I certainly can understand that reducing memory footprint is useful, but
if we want wider testing of 4k stacks, considering they may fail in error
path in complex I/O environment, it's not likely during -rc kernels that
we'll detect problems, and if we push them down the throat of users in a
stable release, of course they will thank us very much for crashing their
NFS servers in production during peak hours.

I have nothing against changing the default setting to 4k provided that
it is easy to get back to the safe setting (i.e. changing a config option,
or better, a cmdline parameter). I just don't agree with the idea of
forcing users to swim in the sh*t, it only brings bad reputation to
Linux.

What would really help would be to have 8k stacks with the lower page
causing a fault and print a stack trace upon first access. That way,
the safe setting would still report us useful information without
putting users into trouble.

Willy

--

From: Mark Lord <lkml@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 9:27 am 2008

Willy Tarreau wrote:
>
> What would really help would be to have 8k stacks with the lower page
> causing a fault and print a stack trace upon first access. That way,
> the safe setting would still report us useful information without
> putting users into trouble.
..

That's the best suggestion from this thread, by far!
Can you produce a patch for 2.6.26 for this?
Or perhaps someone else here, with the right code familiarity, could?

Some sort of CONFIG option would likely be wanted to
either enable/disable this feature, of course.

Cheers

--

From: Willy Tarreau <w@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 9:38 am 2008

On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
> Willy Tarreau wrote:
> >
> >What would really help would be to have 8k stacks with the lower page
> >causing a fault and print a stack trace upon first access. That way,
> >the safe setting would still report us useful information without
> >putting users into trouble.
> ..
>
> That's the best suggestion from this thread, by far!
> Can you produce a patch for 2.6.26 for this?

Unfortunately, I can't. I wouldn't know where to start from.

> Or perhaps someone else here, with the right code familiarity, could?

I hope so.

> Some sort of CONFIG option would likely be wanted to
> either enable/disable this feature, of course.

If we want to migrate to 4k sooner or later, this behaviour would not
need a config option, maybe just a /proc or /sys tunable to disable the
warning. Config would be either (4k + risk of crash) or (8k + warning).

The *real* issue is to decide whether we need/want 4k or not, because
I think we're still discussing the subject for no reason, as usual...

Willy

--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 10:19 am 2008

Willy Tarreau wrote:
> On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
>> Willy Tarreau wrote:
>>> What would really help would be to have 8k stacks with the lower page
>>> causing a fault and print a stack trace upon first access. That way,
>>> the safe setting would still report us useful information without
>>> putting users into trouble.
>> ..
>>
>> That's the best suggestion from this thread, by far!

Only if you believe that 4K stack pages are a worthy goal.
As far as I can figure out they are not. They might have been
a worthy goal on crappy 2.4 VMs, but these times are long gone.

The "saving memory on embedded" argument also does not
quite convince me, it is unclear if that is really
a significant amount of memory on these systems and if that
couldn't be addressed better (e.g. in running generally
less kernel threads). I don't have numbers on this,
but then the people who made this argument didn't have any
either :)

If anybody has concrete statistics on this (including other kernel
memory users in realistic situations) please feel free to post them.

>> Can you produce a patch for 2.6.26 for this?
>
> Unfortunately, I can't. I wouldn't know where to start from.

The problem with his suggestion is that the lower 4K of the stack page
are accessed in normal operation too because it contains the thread_struct.
That could be changed, but it would be a relatively large change
because you would need to audit/change a lot of code who assumes
thread_struct and stack are continuous

If that was changed implementing Willy's suggestion would not be that
difficult using cpa() at the cost of some general slowdown in increased
TLB misses and much higher thread creation/tear down cost etc, Using the
alternative vmalloc way has also other issues.

But still the fundamental problem is that it would likely only hit
the interesting cases in real production setups and I don't think
the production users would be very happy to slow down their
kernels and handle strange backtraces just to act as guinea pigs
for something dubious

-Andi

--

From: Jörn <joern@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 12:41 pm 2008

On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
>
> Only if you believe that 4K stack pages are a worthy goal.
> As far as I can figure out they are not. They might have been
> a worthy goal on crappy 2.4 VMs, but these times are long gone.
>
> The "saving memory on embedded" argument also does not
> quite convince me, it is unclear if that is really
> a significant amount of memory on these systems and if that
> couldn't be addressed better (e.g. in running generally
> less kernel threads). I don't have numbers on this,
> but then the people who made this argument didn't have any
> either :)

It is not uncommon for embedded systems to be designed around 16MiB.
Some may even have less, although I haven't encountered any of those
lately.

When dealing in those dimensions, savings of 100k are substantial. In
some causes they may be the difference between 16MiB or 32MiB, which
translates to manufacturing costs. In others it simply means that the
system can cache a bit more and run faster, or it can have a little
more functionality. In most cases it simply allows userspace programmers
to avoid looking harder to save those 100k, as they are already saved
in kernel space.

Therefore we made life hard for us in order to make life easier for
someone else, saving them time and money. Whether that is worth it
depends on your personal point of view. Many embedded people will claim
"Hell yes!" Of those that don't, most are simply ignoring currently
mainline kernels and will regret the development later. They care, thay
just don't tend to care enough to engage in these discussions or even
know about them. :(

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen

--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:19 pm 2008

Jörn Engel wrote:
> On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
>> Only if you believe that 4K stack pages are a worthy goal.
>> As far as I can figure out they are not. They might have been
>> a worthy goal on crappy 2.4 VMs, but these times are long gone.
>>
>> The "saving memory on embedded" argument also does not
>> quite convince me, it is unclear if that is really
>> a significant amount of memory on these systems and if that
>> couldn't be addressed better (e.g. in running generally
>> less kernel threads). I don't have numbers on this,
>> but then the people who made this argument didn't have any
>> either :)
>
> It is not uncommon for embedded systems to be designed around 16MiB.

But these are SoC systems. Do they really run x86?
(note we're talking about an x86 default option here)

Also I suspect in a true 16MB system you have to strip down
everything kernel side so much that you're pretty much outside
the "validated by testers" realm that Adrian cares about.

> When dealing in those dimensions, savings of 100k are substantial. In
> some causes they may be the difference between 16MiB or 32MiB, which
> translates to manufacturing costs. In others it simply means that the
> system can cache

If you need the stack you don't have any less cache foot print.
If you don't need it you don't have any either.

-Andi

--

From: Jörn <joern@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:43 pm 2008

On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
>
> But these are SoC systems. Do they really run x86?
> (note we're talking about an x86 default option here)
>
> Also I suspect in a true 16MB system you have to strip down
> everything kernel side so much that you're pretty much outside
> the "validated by testers" realm that Adrian cares about.

Maybe. I merely showed that embedded people (not me) have good reasons
to care about small stacks. Whether they care enough to actually spend
work on it - doubtful.

> > When dealing in those dimensions, savings of 100k are substantial. In
> > some causes they may be the difference between 16MiB or 32MiB, which
> > translates to manufacturing costs. In others it simply means that the
> > system can cache
>
> If you need the stack you don't have any less cache foot print.
> If you don't need it you don't have any either.

This part I don't understand.

Jörn

--

From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:19 pm 2008

Jörn Engel wrote:
> On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
>> But these are SoC systems. Do they really run x86?
>> (note we're talking about an x86 default option here)
>>
>> Also I suspect in a true 16MB system you have to strip down
>> everything kernel side so much that you're pretty much outside
>> the "validated by testers" realm that Adrian cares about.
>
> Maybe. I merely showed that embedded people (not me) have good reasons
> to care about small stacks.

Sure but I don't think they're x86 embedded people. Right now there
are very little x86 SOCs if any (iirc there is only some obscure rise
core) and future SOCs will likely have more RAM.

Anyways I don't have a problem to give these people any special options
they need to do whatever they want. I just object to changing the
default options on important architectures to force people in completely
different setups to do part of their testing.

Whether they care enough to actually spend
> work on it - doubtful.
>
>>> When dealing in those dimensions, savings of 100k are substantial. In
>>> some causes they may be the difference between 16MiB or 32MiB, which
>>> translates to manufacturing costs. In others it simply means that the
>>> system can cache
>> If you need the stack you don't have any less cache foot print.
>> If you don't need it you don't have any either.
>
> This part I don't understand.

I was just objecting to your claim that small stack implies smaller cache
foot print. Smaller stacks rarely give you smaller cache foot print in my
kernel coding experience:

First some stack is always safety and in practice unused. It won't be in cache.
Then typically standard kernel stack pigs are just too large buffers on the
stack which are not fully used. These also don't have much cache foot print.

Or if you have a complicated call stack the typical fix is to move parts
of it into another thread. But that doesn't give you less cache footprint
because the cache foot print is just in someone else's stack. In fact you'll
likely have slightly more cache foot print from that due to the context
of the other thread.

In theory if you e.g. convert a recursive algorithm to iterative you might
save some cache foot print, but I don't think that really happens in kernel
code.

-Andi

--


it's a long discussion on lkml already

April 23, 2008 - 1:53pm
Tomasz Chmielewski (not verified)

The discussion on 4k stacks has been popping up on lkml now and then for at least 4 years.

I wonder what were the technical arguments for defaulting to 8k years ago?

Strange question - it's much

April 23, 2008 - 2:37pm
Anonymous (not verified)

Strange question - it's much simpler to write code for a larger stack size (think recursion).

So why didn't they choose

April 23, 2008 - 4:11pm
Anonymous (not verified)

So why didn't they choose 16k? ;)

Because 8k apparently was

April 23, 2008 - 4:19pm
Anonymous (not verified)

Because 8k apparently was enough.

Actually, 640kb should be

April 23, 2008 - 5:58pm
Anonymous (not verified)

Actually, 640kb should be enough for anyone.
8k is only enough for linux hippies.

8K stacks

April 23, 2008 - 9:35pm
Anonymouse (not verified)

Some processors have a 'page size' of 8192B. If you allocate 8KB stacks, then allocation is trivial on 8KB, 4KB, 2KB, 1KB page sizes (but really, all CPU's I've dealt with over the past 10 years have 4K and 8K page sizes only). The only substantial issue I can think of is that on 8KB page machines, using 4KB would make memory management unnecessarily difficult. So 8K paged machines need to maintain that 8K stack and 4K paged machines can use 4K. Some poor person has got to go through an awful lot of code and make sure that there aren't any problems with merely changing definitions from 8K to 4K.

The issue is "order(1)"

April 24, 2008 - 10:29am

Nothing's magic about 4K stacks in general. The issue is the difference between a single page allocation—an "order(0)" allocation—versus a two page allocation—an "order(1)" allocation. Kernel stacks need to have physically contiguous addresses because they need to be "always present," and therefore have trivial mappings not subject to VM management. Finding two physically contiguous pages is much harder than simply "finding a page."

On most architectures, the page size is 4K. This is true on x86 and x86-64, which are the architectures under discussion. Thus, the issue for these architectures is whether they can support a 4K stack size so that kernel stack allocations are always order(0). On architectures with larger page sizes, it's trivial to have order(0) kernel stacks. It seems rather unlikely someone would try for a sub-page kernel stack on a machine with 8K pages. It's the architectures with 4K pages that have this issue.

That's why this feature is generally referred to as "4K kernel stacks," since it's the 4K page architectures that impose the maximum stack depth on the rest of the kernel should they try to go for order(0) kernel stack allocations.
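
The relationship between stack size, page size and allocation order described above can be sketched in a few lines. This is an illustration only: `allocation_order` is a hypothetical helper written for this example (the kernel has its own `get_order()` for the same purpose), and the page sizes are the ones mentioned in the comments here.

```python
def allocation_order(size_bytes, page_size=4096):
    """Smallest order n such that 2**n pages of `page_size` cover `size_bytes`."""
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


# 4K pages (x86): only a 4K stack is an order-0 allocation.
print(allocation_order(4 * 1024))  # 0
print(allocation_order(8 * 1024))  # 1
# A 6K stack still needs two pages, so it is order 1 as well.
print(allocation_order(6 * 1024))  # 1

# Larger pages: an 8K stack fits in a single page.
print(allocation_order(8 * 1024, page_size=8 * 1024))   # 0
print(allocation_order(8 * 1024, page_size=64 * 1024))  # 0
```

With 4K pages any stack larger than one page has to be rounded up to a contiguous two-page block, which is why the stack size matters only on the 4K-page architectures.
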

On a different note, PPC can also do 64K hardware pages, and there's some evidence this has a huge performance benefit too due to fewer trips through the VM code. (This has been confirmed by building 64K pages out of 4K pages, to sort the kernel benefit from the MMU benefit.) Thus, on PPC with 64K hardware pages, they can have very generous kernel stacks. :-)

--
Program Intellivision and play Space Patrol!

But Linux was developed on

April 24, 2008 - 3:17pm
Anonymous (not verified)

But Linux was developed on x86, which has 4k pages. So what is your point again?

recursion is not allowed in

April 23, 2008 - 5:31pm
Anonymous (not verified)

recursion is not allowed in kernel code

But there used to be

April 24, 2008 - 5:25am
Anonymous (not verified)

But there used to be recursive (which has different meanings, btw) algorithms and the question was a historic one.

"Since the

April 23, 2008 - 6:36pm
Anonymous (not verified)

"Since the second-most-common reason for stack overages is ndiswrapper... Well,
with there being so much more hardware now supported directly by the linux
kernel... I'm stunned every time someone tells me "I can't run Linux on my
laptop, there is hardware that isn't supported without me having to get
ndiswrapper". The last time someone said that to me I pointed to the fact
that their hardware is supported by the latest kernel and even offered to
build&install it for them."

Is this guy joking? I use 3 different laptops with Linux (2 work, 1 home) and none of them work without ndiswrapper. As far as I can tell, the only wireless with good support is intel. If you don't have intel, you're kind of screwed without ndiswrapper.

ndiswrapper

April 24, 2008 - 1:22am
Anonymous (not verified)

Are you so sure?

A *lot* of people think they need NDISwrapper but are wrong. At some point in the past their card wasn't well supported and ndiswrapper got traction. So now anyone who does a search finds a dozen (old) forums recommending ndiswrapper, and a dozen current people who are confused... but today, many of those cards have very robust native support.

There are, no doubt, some cards which are still not (well) supported and it's possible that you have some of those cards, but I'd bet that all three of your laptops have the same card .. which wouldn't be a fair comparison.

Intel cards are well supported and fairly popular, Atheros cards are very popular and are well supported by the blobby native driver, and some are well supported by the free driver. People used to use ndiswrapper for both of these, and some people still insist that they need it.

You should have mentioned what cards you have...

ndiswrapper vs bcm43xx

April 25, 2008 - 11:46am
Anonymous (not verified)

The most common need for using ndiswrapper is having a broadcom wireless chipset.
Most broadcom chips are now supported by bcm43xx but I'm still using ndiswrapper. Why? Because it works better. Bcm43xx loses connection every ~30min and has lower power output (loses connection when behind 2 walls in my room). Ndiswrapper doesn't have these problems. It never crashed my system.
Some would say that ndiswrapper is not free because it has to use the Windows driver. Bcm43xx needs firmware, which has to be extracted from Windows drivers, so what's the difference?

> There are, no doubt, some

April 27, 2008 - 9:12pm
Anonymous (not verified)

> There are, no doubt, some cards which are still not (well) supported and it's possible that you have some of those cards, but I'd bet that all three of your laptops have the same card .. which wouldn't be a fair comparison.

Just to give you an idea of the real state of wireless:

The only wireless hardware I've got working without a problem is a D-Link - shame that they no longer stock them in the shop I got it from.

I've also got a US Robotics USB dongle (USR8054-22) that doesn't work at all without ndiswrapper, and then only after making sure the kernel doesn't "helpfully" autoload prism54 locking it up. On the rare occasions when it actually initialises properly and connects to the access point it lasts about an hour before dropping the connection until next reboot. I tried searching around to see if there really was an open driver for this but there's nothing even close to working.

And then there's the Atheros chipset in the eeePC that works, most of the time anyway, with its native linux driver - a binary blob.

32-bit vs 64-bit

April 23, 2008 - 7:49pm

Actually, it doesn't surprise me that 32-bit would use more stack space than 64-bit. More function arguments get passed on the stack in the 32-bit model, and there will be more register spills to the stack with the smaller register file. (I'm not 100% sure the stack args argument is completely true with -mregparm=3, but I believe it is.)

The number of actual pointers stored on the stack ought to get dwarfed by both these phenomena.

--
Program Intellivision and play Space Patrol!

Need for 8 KiB stacks

April 24, 2008 - 6:44pm
David VomLehn (not verified)

One place where I'm seeing some pretty big stack sizes is running NFS over Ethernet on USB with 32-bit MIPS processors. That seems to just pile on the stack frames. 4 KiB stacks just don't look acceptable with this configuration. Note that this is a development environment for embedded systems.
