Defaulting To 4K Stacks

Submitted by Jeremy
on April 22, 2008 - 9:42pm

Andrew Morton replied to the commit that makes 4K stacks the default on 32-bit x86, saying, "this patch will cause kernels to crash." Ingo Molnar replied, "what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years." He added, "we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks." During the lengthy discussion it was suggested that deeply stacked configurations such as nfs+xfs+raid, and the use of ndiswrapper, are the most common causes of 4K stack overflows.

Andi Kleen questioned the usefulness of 4k stacks, "as far as I can figure out they are not [a worthy goal]. They might have been a worthy goal on crappy 2.4 VMs, but these times are long gone." Arjan van de Ven suggested that though the 2.6 VM is much improved over the 2.4 VM, fragmentation with 8K stacks remains an unsolvable problem, "it's basic math; the Linux VM gets to deal with both short and long lasting allocations; no matter how hard you try to get some degree of fragmentation; especially due to the 15:1 acceleration you get due to the lowmem issue. And before you say 'you should use 64 bit on such machines'; I would love it if more people used 64 bit linux. Sadly the adoption rate of that is not very good still.... by far ;(" In another email, Arjan listed two advantages to 4K stacks, "1) less memory consumption in the lowmem zone (critical for enterprise use, also good for general performance), and 2) kernel stacks at 8K are one of the most prominent order-1 allocations in the kernel; again with big-memory systems the fragmentation of the lowmem zone is a problem (and the distros that ship 4K stacks went there because of customer complaints)".
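
As an editorial aside, the "order-1 allocation" point can be made concrete with a little arithmetic. The kernel's page allocator hands out blocks of 2^order physically contiguous pages, so with 4 KiB pages an 8K stack needs two adjacent free pages while a 4K stack needs only one. The sketch below is illustrative only; the helper name `allocation_order` is made up for this note and is not a kernel function.

```python
PAGE_SIZE = 4096  # bytes; assumes the standard 4 KiB page size on 32-bit x86


def allocation_order(size_bytes, page_size=PAGE_SIZE):
    """Return the smallest order n such that 2**n contiguous pages hold size_bytes."""
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for stack_kib in (4, 8):
    order = allocation_order(stack_kib * 1024)
    pages = 1 << order
    print(f"{stack_kib}K stack: order-{order} allocation, {pages} contiguous page(s)")
```

Any single free page satisfies an order-0 request, whereas an order-1 request fails if free memory is fragmented into isolated pages, which is the concern Arjan raises for the lowmem zone.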


From: Andrew Morton <akpm@...>
Subject: Re: x86: 4kstacks default
Date: Apr 18, 5:29 pm 2008

On Fri, 18 Apr 2008 17:37:36 GMT
Linux Kernel Mailing List wrote:

> Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=c...
> Commit: d61ecf0b53131564949bc4196e70f676000a845a
> Parent: f408b43ceedce49f26c01cd4a68dbbdbe2743e51
> Author: Ingo Molnar
> AuthorDate: Fri Apr 4 17:11:09 2008 +0200
> Committer: Ingo Molnar
> CommitDate: Thu Apr 17 17:41:34 2008 +0200
>
> x86: 4kstacks default
>
> Signed-off-by: Ingo Molnar
> ---
> arch/x86/Kconfig.debug | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index f4413c0..610aaec 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -106,8 +106,8 @@ config DEBUG_NX_TEST
>
>  config 4KSTACKS
>         bool "Use 4Kb for kernel stacks instead of 8Kb"
> -       depends on DEBUG_KERNEL
>         depends on X86_32
> +       default y

This patch will cause kernels to crash.

It has no changelog which explains or justifies the alteration.

afaict the patch was not posted to the mailing list and was not
discussed or reviewed.
--


From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:23 am 2008

* Andrew Morton wrote:

> > config 4KSTACKS
> > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > - depends on DEBUG_KERNEL
> > depends on X86_32
> > + default y
>
> This patch will cause kernels to crash.

what mainline kernels crash and how will they crash? Fedora and other
distros have had 4K stacks enabled for years:

$ grep 4K /boot/config-2.6.24-9.fc9
CONFIG_4KSTACKS=y

and we've conducted tens of thousands of bootup tests with all sorts of
drivers and kernel options enabled and have yet to see a single crash
due to 4K stacks. So basically the kernel default just follows the
common distro default now. (distros and users can still disable it)

Ingo
--
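
Editorial aside: the grep check in the message above can also be scripted. The following is a minimal sketch, assuming the usual `.config` conventions (`CONFIG_FOO=y` for an enabled option, `# CONFIG_FOO is not set` for a disabled one). The function name `config_enabled` and the sample text are hypothetical and not taken from any real system.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to y in the given kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments; skip them and any blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False


# Hypothetical sample; on a real machine, read /boot/config-<version> instead.
sample = """\
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_4KSTACKS=y
"""

print(config_enabled(sample, "CONFIG_4KSTACKS"))
```

Matching the full option name avoids the false positives a bare `grep 4K` could produce on unrelated options.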


From: Andrew Morton <akpm@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 1:49 pm 2008

> On Sat, 19 Apr 2008 16:23:29 +0200 Ingo Molnar wrote:
>
> * Andrew Morton wrote:
>
> > > config 4KSTACKS
> > > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > - depends on DEBUG_KERNEL
> > > depends on X86_32
> > > + default y
> >
> > This patch will cause kernels to crash.
>
> what mainline kernels crash and how will they crash?

There has been a dribble of reports - I don't have the links handy, nor did
I search for them.

> Fedora and other
> distros have had 4K stacks enabled for years:
>
> $ grep 4K /boot/config-2.6.24-9.fc9
> CONFIG_4KSTACKS=y
>
> and we've conducted tens of thousands of bootup tests with all sorts of
> drivers and kernel options enabled and have yet to see a single crash
> due to 4K stacks.

I doubt if you're testing things like nfsd-on-xfs-on-md-on-porky-scsi-driver.

Enable CONFIG_DEBUG_STACK_USAGE. Monitor the results. It's so scary that
I wonder if the feature is busted.

> So basically the kernel default just follows the
> common distro default now. (distros and users can still disable it)

Apparently not. I wouldn't enable it if I had a distro.

Anyway. We should be having this sort of discussion _before_ a patch
gets merged, no?
--

From: Shawn Bohrer <shawn.bohrer@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:59 am 2008

On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
>
> * Andrew Morton wrote:
>
> > > config 4KSTACKS
> > > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > - depends on DEBUG_KERNEL
> > > depends on X86_32
> > > + default y
> >
> > This patch will cause kernels to crash.
>
> what mainline kernels crash and how will they crash? Fedora and other
> distros have had 4K stacks enabled for years:

If by other distros you mean RHEL then yes. However, openSUSE,
Ubuntu, and Mandriva all still have 8K stacks. I know of no other
distributions that default to 4K.

--
Shawn
--


From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 2:00 pm 2008

On Sat, 19 Apr 2008 09:59:48 -0500
Shawn Bohrer wrote:

> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> >
> > * Andrew Morton wrote:
> >
> > > > config 4KSTACKS
> > > > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > - depends on DEBUG_KERNEL
> > > > depends on X86_32
> > > > + default y
> > >
> > > This patch will cause kernels to crash.
> >
> > what mainline kernels crash and how will they crash? Fedora and
> > other distros have had 4K stacks enabled for years:
>
> If by other distros you mean RHEL then yes. However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks. I know of no other
> distributions that default to 4K.

centos, oracle and redflag tend to follow the RHEL/fedora settings.

To be honest, at this point we're at a situation where
* Several very popular distributions have this enabled for 5+ years,
apparently without any real issues (otherwise the enterprise releases
would have turned this off)
* The early "hot known issues" have been resolved afaik, things like
block device stacking, and symlink recursion lookups are either no longer
recursive, or a lot less recursive than they used to be.

There are clear benefits to 4K stacks (no need to reiterate the flamewar,
but worth mentioning)
* Less memory consumption in the lowmem zone (critical for enterprise use,
also good for general performance)
* Kernel stacks at 8K are one of the most prominent order-1 allocations in the
kernel; again with big-memory systems the fragmentation of the lowmem zone
is a problem (and the distros that ship 4K stacks went there because of customer
complaints)

On the flipside the arguments tend to be
1) certain stackings of components still run the risk of overflowing
2) I want to run ndiswrapper
3) general, unspecified uneasiness.

For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
they can be a problem (just because the stack frames are bigger, although not quite double, there).
I've not seen any recent reports, I'll try to extend the kerneloops.org client to collect the
"stack is getting low" warning to be able to see how much this really happens.

for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"

for 3), this is hard to deal with but also generally unfounded... you can use this argument against any change in the kernel.

--


From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 2:33 pm 2008

* Arjan van de Ven wrote:

> On Sat, 19 Apr 2008 09:59:48 -0500
> Shawn Bohrer wrote:
>
> > On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > >
> > > * Andrew Morton wrote:
> > >
> > > > > config 4KSTACKS
> > > > > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > > - depends on DEBUG_KERNEL
> > > > > depends on X86_32
> > > > > + default y
> > > >
> > > > This patch will cause kernels to crash.
> > >
> > > what mainline kernels crash and how will they crash? Fedora and
> > > other distros have had 4K stacks enabled for years:
> >
> > If by other distros you mean RHEL then yes. However, openSUSE,
> > Ubuntu, and Mandriva all still have 8K stacks. I know of no other
> > distributions that default to 4K.
>
> centos, oracle and redflag tend to follow the RHEL/fedora settings.
>
> To be honest, at this point we're at a situation where
> * Several very popular distributions have this enabled for 5+ years,
> apparently without any real issues (otherwise the enterprise releases
> would have turned this off)
> * The early "hot known issues" have been resolved afaik, things like
> block device stacking, and symlink recursion lookups are either no longer
> recursive, or a lot less recursive than they used to be.
>
> There are clear benefits to 4K stacks (no need to reiterate the flamewar,
> but worth mentioning)
> * Less memory consumption in the lowmem zone (critical for enterprise use,
> also good for general performance)
> * Kernel stacks at 8K are one of the most prominent order-1 allocations in the
> kernel; again with big-memory systems the fragmentation of the lowmem zone
> is a problem (and the distros that ship 4K stacks went there because of customer
> complaints)
>
> On the flipside the arguments tend to be
> 1) certain stackings of components still run the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasiness.
>
> For 1), we need to know which they are, and then solve them, because
> even on x86-64 with 8k stacks they can be a problem (just because the
> stack frames are bigger, although not quite double, there). I've not
> seen any recent reports, I'll try to extend the kerneloops.org client
> to collect the "stack is getting low" warning to be able to see how
> much this really happens.
>
> for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"
>
> for 3), this is hard to deal with but also generally unfounded... you
> can use this argument against any change in the kernel.

and let's observe that 8K stacks are of course still offered, so if
anyone disables 4K stacks in the .config, it will stay disabled.

Ingo
--


From: Eric Sandeen <sandeen@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 10:36 pm 2008

Arjan van de Ven wrote:

> On the flipside the arguments tend to be
> 1) certain stackings of components still run the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasiness.
>
> For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
> they can be a problem (just because the stack frames are bigger, although not quite double, there).

Except, apparently, not, at least in my experience.

Ask the xfs guys if they see stack overflows on x86_64, or on x86.

I've personally never seen common stack problems with xfs on x86_64, but
it's very common on x86. I don't have a great answer for why, but
that's my anecdotal evidence.

> I've not seen any recent reports, I'll try to extend the kerneloops.org client to collect the
> "stack is getting low" warning to be able to see how much this really happens.

That sounds like a very good thing to collect, and maybe if I re-send a
"clearly state stack overflows at oops time" patch you can easily keep tabs.

Thanks,

-Eric
--


From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:11 am 2008

On Sat, 19 Apr 2008 21:36:16 -0500
Eric Sandeen wrote:

> > For 1), we need to know which they are, and then solve them,
> > because even on x86-64 with 8k stacks they can be a problem (just
> > because the stack frames are bigger, although not quite double,
> > there).
>
> Except, apparently, not, at least in my experience.

if you actually go over on x86, it's not unlikely that you're getting close to the edge on 64 bit.

At minimum we really do want to fix these things...

> I've personally never seen common stack problems with xfs on x86_64,
> but it's very common on x86. I don't have a great answer for why, but
> that's my anecdotal evidence.

One thing I've learned with the kerneloops.org work is that people don't read
their dmesg.....
>
> > I've not seen any recent reports, I'll try to extend the
> > kerneloops.org client to collect the "stack is getting low" warning
> > to be able to see how much this really happens.
>
> That sounds like a very good thing to collect, and maybe if I re-send
> a "clearly state stack overflows at oops time" patch you can easily
> keep tabs.

... which makes me think we need to strengthen this part of the kernel.
(and then have kerneloops.org collect the issues)

If there's a clear pattern in the backtraces we will find it.
And then we can fix it... which is absolutely the right thing,
I don't think anyone disagrees with that.

So yes if you can dig up your patch, yes please!

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--

From: Eric Sandeen <sandeen@...>
Subject: Re: x86: 4kstacks default
Date: Apr 19, 11:29 pm 2008

Ingo Molnar wrote:
> * Andrew Morton wrote:
>
>>> config 4KSTACKS
>>> bool "Use 4Kb for kernel stacks instead of 8Kb"
>>> - depends on DEBUG_KERNEL
>>> depends on X86_32
>>> + default y
>> This patch will cause kernels to crash.
>
> what mainline kernels crash and how will they crash? Fedora and other
> distros have had 4K stacks enabled for years:
>
> $ grep 4K /boot/config-2.6.24-9.fc9
> CONFIG_4KSTACKS=y
>
> and we've conducted tens of thousands of bootup tests with all sorts of
> drivers and kernel options enabled and have yet to see a single crash
> due to 4K stacks.

Really, not one?

https://bugzilla.redhat.com/show_bug.cgi?id=247158
https://bugzilla.redhat.com/show_bug.cgi?id=227331
https://bugzilla.redhat.com/show_bug.cgi?id=240077

(hehe, ok, xfs is a common component there...)

and it's not always obvious that you've overflowed the stack.

CONFIG_DEBUG_STACKOVERFLOW isn't very useful because the warning printk
it generates uses the remaining amount of stack, and tips the box.

> So basically the kernel default just follows the
> common distro default now. (distros and users can still disable it)

If Fedora is the common distro, ok. :)

Fedora is a pretty narrow sample in terms of IO stacks at least. I have
plenty of fondness for Fedora, but it's almost 100% ext3[1]. I spent a
fair amount of time getting xfs+lvm to survive 4k on F8; gcc caused
stack usage to grow in general from F7 to F8, and F9 seems to have
gotten tight again but I haven't gotten to the bottom of it yet.

Heck my ext3-root-on-sda1 pre-beta F9 box, no nfs or lvm or xfs or
anything gets within 744 bytes of the end of the 4k stack simply by
*booting* (it was a modprobe process... maybe some module needs help)

How many other distros use 4K stacks on x86, really?

-Eric

[1] http://www.smolts.org/static/stats/stats.html shows 24588 ext3
filesystems, compared to 366 xfs, 248 reiserfs, 76 jfs ...
--


From: Ingo Molnar <mingo@...>
Subject: Re: x86: 4kstacks default
Date: Apr 21, 10:31 am 2008

* Eric Sandeen wrote:

> > and we've conducted tens of thousands of bootup tests with all sorts
> > of drivers and kernel options enabled and have yet to see a single
> > crash due to 4K stacks.
>
> Really, not one?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=247158
> https://bugzilla.redhat.com/show_bug.cgi?id=227331
> https://bugzilla.redhat.com/show_bug.cgi?id=240077
>
> (hehe, ok, xfs is a common component there...)
>
> and it's not always obvious that you've overflowed the stack.
>
> CONFIG_DEBUG_STACKOVERFLOW isn't very useful because the warning printk
> it generates uses the remaining amount of stack, and tips the box.

note that in -rt we have an ftrace plugin that measures _precise_ stack
footprint, when it happens.

so it's possible to measure exact stack footprint and save a stack trace
when that happens.

Ingo
--

From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:09 am 2008

On Sat, Apr 19, 2008 at 09:59:48AM -0500, Shawn Bohrer wrote:
> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> >
> > * Andrew Morton wrote:
> >
> > > > config 4KSTACKS
> > > > bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > - depends on DEBUG_KERNEL
> > > > depends on X86_32
> > > > + default y
> > >
> > > This patch will cause kernels to crash.
> >
> > what mainline kernels crash and how will they crash? Fedora and other
> > distros have had 4K stacks enabled for years:
>
> If by other distros you mean RHEL then yes. However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks. I know of no other
> distributions that default to 4K.

MontaVista offers 4k stacks for arm (currently an external patch) and
markets that as a feature to customers, so many of them might use it.

In-kernel the sh and m68knommu ports also offer 4k stacks (for both
archs there's also a defconfig using it), and the mn10300 port contains
an #ifdef but no config option.

The stack problems in the kernel tend to not be in arch code, and if
we don't get i386 to always run with 4k stacks there's no chance that
it will ever work reliably on other architectures.

> Shawn

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--


From: Alan Cox <alan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:06 am 2008

> The stack problems in the kernel tend to not be in arch code, and if
> we don't get i386 to always run with 4k stacks there's no chance that
> it will ever work reliably on other architectures.

Not really the case - embedded tends not to use deep stacks of drivers.

Alan
--


From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:51 am 2008

On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > The stack problems in the kernel tend to not be in arch code, and if
> > we don't get i386 to always run with 4k stacks there's no chance that
> > it will ever work reliably on other architectures.
>
> Not really the case - embedded tends not to use deep stacks of drivers.

Something like nfsd-over-xfs-over-raid is (or was) the most common
problem - and this or similar stackings might be used in NAS devices.

> Alan

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--


From: Alan Cox <alan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 5:36 am 2008

On Sun, 20 Apr 2008 11:51:04 +0300
Adrian Bunk wrote:

> On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > The stack problems in the kernel tend to not be in arch code, and if
> > > we don't get i386 to always run with 4k stacks there's no chance that
> > > it will ever work reliably on other architectures.
> >
> > Not really the case - embedded tends not to use deep stacks of drivers.
>
> Something like nfsd-over-xfs-over-raid is (or was) the most common
> problem - and this or similar stackings might be used in NAS devices.

Specific cases yes, but such NAS devices have big processors and are not
little embedded CPUs. On an embedded box you know at build time what it
will be doing.
--


From: Adrian Bunk <bunk@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 6:44 am 2008

On Sun, Apr 20, 2008 at 10:36:11AM +0100, Alan Cox wrote:
> On Sun, 20 Apr 2008 11:51:04 +0300
> Adrian Bunk wrote:
>
> > On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > > The stack problems in the kernel tend to not be in arch code, and if
> > > > we don't get i386 to always run with 4k stacks there's no chance that
> > > > it will ever work reliably on other architectures.
> > >
> > > Not really the case - embedded tends not to use deep stacks of drivers.
> >
> > Something like nfsd-over-xfs-over-raid is (or was) the most common
> > problem - and this or similar stackings might be used in NAS devices.
>
> Specific cases yes, but such NAS devices have big processors and are not
> little embedded CPUs. On an embedded box you know at build time what it
> will be doing.

The code in the kernel that gets the least coverage of all is our
error paths, and some vendor might try 4k stacks, validate it works in
all use cases - and then it will blow up in some error condition he
didn't test.

6k is known to work, and there aren't many problems known with 4k.

And from a QA point of view the only way of getting 4k thoroughly tested
by users, and well also tested in -rc kernels for catching regressions
before they get into stable kernels, is if we get 4k stacks enabled
unconditionally on i386.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 8:27 am 2008

Adrian Bunk writes:
>
> 6k is known to work, and there aren't many problems known with 4k.
>
> And from a QA point of view the only way of getting 4k thoroughly tested

But you have to first ask why do you want 4k tested? Does it serve
any useful purpose in itself? I don't think so. Or you're saying
it's important to support 50k kernel threads on 32bit kernels?

-Andi
--


From: Daniel Hazelton <dhazelton@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 11:44 am 2008

On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> Adrian Bunk writes:
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested
>
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?
>
> -Andi

Andi, you're the only one I've seen seriously pounding the "50k threads"
thing - I don't think anyone is really fooled by the straw-man, so I'd
suggest you drop it.

The real issue is that you think (and are correct in thinking) that people are
idiots. Yes, there will be breakages if the default is changed to 4k stacks -
but if people are running new kernels on boxes that'll hit stack use problems
(that *AREN'T* related to ndiswrapper) and haven't made sure that they've
configured the kernel properly, then they deserve the outcome. It isn't the
job of the Linux Kernel to protect the incompetent - nor is it the job of
linux kernel developers to do such.

If people are doing a "zcat /proc/config.gz > .config && make oldconfig" (or
similar) the problem shouldn't even appear, really. They'll get whatever
setting was in their old config for the stack size. And until the problems
with deep-stack setups - like nfs+xfs+raid - get resolved I'd think that the
option to configure the stack size would remain.

Since the second-most-common reason for stack overages is ndiswrapper... Well,
with there being so much more hardware now supported directly by the linux
kernel... I'm stunned every time someone tells me "I can't run Linux on my
laptop, there is hardware that isn't supported without me having to get
ndiswrapper". The last time someone said that to me I pointed to the fact
that their hardware is supported by the latest kernel and even offered to
build&install it for them.

DRH

--
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:26 pm 2008

Daniel Hazelton wrote:

> Andi, you're the only one I've seen seriously pounding the "50k threads"
> thing. I don't think anyone is really fooled by the straw-man, so I'd
> suggest you drop it.

Ok, perhaps we can settle this properly. Like historians. We study the
original sources.

The primary resource is the original commit adding the 4k stack code.
You cannot find this in latest git because it predates 2.6.12, but it is
available in one of the historic trees imported from BitKeeper like
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Here's the log:
>>
commit 95f238eac82907c4ccbc301cd5788e67db0715ce
Author: Andrew Morton
Date: Sun Apr 11 23:18:43 2004 -0700

[PATCH] ia32: 4Kb stacks (and irqstacks) patch

From: Arjan van de Ven

Below is a patch to enable 4Kb stacks for x86. The goal of this is to

1) Reduce footprint per thread so that systems can run many more threads
(for the java people)

2) Reduce the pressure on the VM for order > 0 allocations. We see real life
workloads (granted with 2.4 but the fundamental fragmentation issue isn't
solved in 2.6 and isn't solvable in theory) where this can be a problem.
In addition order > 0 allocations can make the VM "stutter" and give more
latency due to having to do much much more work trying to defragment

...
<<

This gives us two reasons as you can see, one of them many threads
and another mostly only relevant to 2.4

Now I was also assuming that nobody took (1) really serious and
attacked (2) in earlier thread; in particular in

http://article.gmane.org/gmane.linux.kernel/665584

>>
Actually the real reason the 4K stacks were introduced IIRC was that
the VM is not very good at allocation of order > 0 pages and that only
using order 0 and not order 1 in normal operation prevented some stalls.

This rationale also goes back to 2.4 (especially some of the early 2.4
VMs were not very good) and the 2.6 VM is generally better and on
x86-64 I don't see much evidence that these stalls are a big problem
(but then x86-64 also has more lowmem).
<<

This was corrected by Ingo who was one of the primary authors of the patch:

http://thread.gmane.org/gmane.linux.kernel/665420:

>>
no, the primary motivation Arjan and me started working on 4K stacks and
implemented it was what Denys mentioned: i had a testcase that ran
50,000 threads before it ran out of memory - i wanted it to run 100,000
threads. The improved order-0 behavior was just icing on the cake.

Ingo
<<

and then from Arjan:

http://thread.gmane.org/gmane.linux.kernel/665420

>>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that

well that and the fact that RH had customers who had major issues at
fewer threads with 8Kb versus fragmentation.
<<

So both the primary authors of the patch state that 50k threads
was the main reason. I didn't believe it at first either, but after
these forceful corrections I do now.

You're totally wrong when you call it a straw man.

-Andi

--


From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:48 pm 2008

On Sun, 20 Apr 2008 19:26:10 +0200
Andi Kleen wrote:

> Daniel Hazelton wrote:
>
> > Andi, you're the only one I've seen seriously pounding the "50k
> > threads" thing. I don't think anyone is really fooled by the
> > straw-man, so I'd suggest you drop it.
>
> Ok, perhaps we can settle this properly. Like historians. We study
> the original sources.
>
> The primary resource is the original commit adding the 4k stack code.
> You cannot find this in latest git because it predates 2.6.12, but it
> is available in one of the historic trees imported from BitKeeper like
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>
> Here's the log:
> >>
> commit 95f238eac82907c4ccbc301cd5788e67db0715ce
> Author: Andrew Morton
> Date: Sun Apr 11 23:18:43 2004 -0700
>
> [PATCH] ia32: 4Kb stacks (and irqstacks) patch
>
> From: Arjan van de Ven
>
> Below is a patch to enable 4Kb stacks for x86. The goal of this
> is to
>
> 1) Reduce footprint per thread so that systems can run many more
> threads (for the java people)
>
> 2) Reduce the pressure on the VM for order > 0 allocations. We see
> real life workloads (granted with 2.4 but the fundamental
> fragmentation issue isn't solved in 2.6 and isn't solvable in
> theory) where this can be a problem.
> In addition order > 0 allocations can make the VM "stutter" and
> give more latency due to having to do much much more work trying
> to defragment
>
> ...
> <<
>
> This gives us two reasons as you can see, one of them many threads
> and another mostly only relevant to 2.4
>
> Now I was also assuming that nobody took (1) really serious and

I'm sorry but I really hope nobody shares your assumption here.
These are real customer workloads; java based "many things going on" at a time
showed several thousands of threads in the system (a dozen or two per request, multiplied
by the number of outstanding connections) for *real customers*.
That you don't take that serious, fair, you can take serious whatever you want.

> attacked (2) in earlier thread; in particular in

yes you did attack. But lets please use more friendly conversation here than words like
"attack". This is not a war, and we really shouldn't be hostile in this forum, neither
in words nor in intention.

>
> http://article.gmane.org/gmane.linux.kernel/665584
>
> >>
> Actually the real reason the 4K stacks were introduced IIRC was that
> the VM is not very good at allocation of order > 0 pages and that only
> using order 0 and not order 1 in normal operation prevented some
> stalls.
>
> This rationale also goes back to 2.4 (especially some of the early 2.4
> VMs were not very good) and the 2.6 VM is generally better and on
> x86-64 I don't see much evidence that these stalls are a big problem
> (but then x86-64 also has more lowmem).
> <<

What you didn't atta^Waddress was the observation that fragmentation is fundamentally unsolvable.
Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
We don't have effective physical address based reclaim yet for higher order allocs.

>
> http://thread.gmane.org/gmane.linux.kernel/665420:
>
> >>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that ran
> 50,000 threads before it ran out of memory - i wanted it to run
> 100,000 threads. The improved order-0 behavior was just icing on the
> cake.
>
> Ingo
> <<
>
> and then from Arjan:
>
> http://thread.gmane.org/gmane.linux.kernel/665420
>
> >>
> > no, the primary motivation Arjan and me started working on 4K stacks
> > and implemented it was what Denys mentioned: i had a testcase that
>
> well that and the fact that RH had customers who had major issues at
> fewer threads with 8Kb versus fragmentation.
> <<
>
> So both the primary authors of the patch state that 50k threads
> was the main reason. I didn't believe it at first either, but after
> these forceful corrections I do now.

I'm sorry but I fail to entirely understand where your "So" or the rest of your
conclusion comes from in terms of "both the authors". Which part of "fewer threads" and
"8kb versus fragmentation" did you misunderstand to get to your conclusion?

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 4:01 pm 2008

> These are real customer workloads; java based "many things going on" at a time
> showed several thousands of threads in the system (a dozen or two per request, multiplied
> by the number of outstanding connections) for *real customers*.

Several thousands or 50k? Several thousands sounds large, but not entirely unreasonable,
but it is far from 50k.

> That you don't take that serious, fair, you can take serious whatever you want.

No I don't take 50k threads on 32bit serious. And I hope you do not
either.

Why I don't take it serious: on 32bit 50k threads will lead
to lowmem exhaustion if the threads are actually doing something
(like keeping select pages around or similar and having some thread
local data). You'll easily be at 16-32K/thread and that is already
far beyond the lowmem available on any 3:1 split 32bit kernel, likely
even beyond 2:2. Even with 3:1 it could be tight.

So you can say about customer workloads what you want, but you'll
have a hard time convincing me they really run 50k threads
doing something on 32bit.

Now if we take the realistic overhead of a thread into
account, 4k more or less doesn't really matter all that much
and the decreased safety from the 4k stack starts to look
like a very bad bargain.

>> attacked (2) in earlier thread; in particular in
>
> yes you did attack.
> But lets please use more friendly conversation here than words like
> "attack". This is not a war, and we really shouldn't be hostile in this forum, neither
> in words nor in intention.

Ok what word would you prefer?

There is no war involved right, just a technical argument. I previously
always assumed that "attacking" was a standard term in discussions, but
if you don't like it I can switch to another one.

Regarding war like terminology: I used to think that people who commonly
talk about "nuking code" went a little too far, but at some point
I adapted to them I think. Perhaps it comes from that.

> What you didn't atta^Waddress

Fine, I will call it address from now on.

> was the observation that fragmentation is fundamentally unsolvable.

Where was that observation?

> Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
> We don't have effective physical address based reclaim yet for higher order allocs.

I don't see any evidence that there are serious order 1 fragmentation
issues on 2.6. If you have any please post it.

-Andi
--


From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 6:33 pm 2008

On Sun, 20 Apr 2008 22:01:46 +0200
Andi Kleen wrote:

>
>
> > These are real customer workloads; java based "many things going
> > on" at a time showed several thousands of threads in the system (a
> > dozen or two per request, multiplied by the number of outstanding
> > connections) for *real customers*.
>
> Several thousands or 50k? Several thousands sounds large, but not
> entirely unreasonable, but it is far from 50k.

it is you who keeps putting up the 50k argument.
What I'm talking about is in the 10k to 20k range; and that is actual workloads
by real customers.
>
> > That you don't take that serious, fair, you can take serious
> > whatever you want.
>
> No I don't take 50k threads on 32bit serious. And I hope you do not
> either.

[ removed a bunch of stuff about 50k again ]

>
> > was the observation that fragmentation is fundamentally unsolvable.
>
> Where was that observation?

it was in the commit message from me you quoted, and was rather widely discussed at the time.
It's also basic math; the Linux VM gets to deal with both short and long lasting allocations;
no matter how hard you try, you get some degree of fragmentation; especially due to the
15:1 acceleration you get due to the lowmem issue.
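
A rough way to see the "basic math" is a toy model. It is an assumption made for illustration, not something stated in the thread: suppose each lowmem page is in use independently with probability p. A single free page (order 0) then turns up with probability 1 - p, while an aligned pair of free pages (order 1, which an 8K stack needs) turns up only with probability (1 - p)^2. A minimal Python sketch, with hypothetical function names:

```python
import random

def free_page_fraction(occupancy):
    """Fraction of single pages (order 0) that are free."""
    return 1.0 - occupancy

def free_pair_fraction(occupancy):
    """Fraction of aligned 2-page blocks (order 1) with both pages free,
    assuming pages are occupied independently (a simplification)."""
    return (1.0 - occupancy) ** 2

def simulate_pairs(num_pages, occupancy, seed=0):
    """Monte Carlo check of the same model on a random page bitmap."""
    rng = random.Random(seed)
    used = [rng.random() < occupancy for _ in range(num_pages)]
    pairs = num_pages // 2
    free_pairs = sum(
        1 for i in range(pairs) if not used[2 * i] and not used[2 * i + 1]
    )
    return free_pairs / pairs

if __name__ == "__main__":
    # Compare the analytic model with the simulation at a few occupancies.
    for p in (0.5, 0.9, 0.99):
        print(p, free_page_fraction(p), free_pair_fraction(p),
              simulate_pairs(200_000, p))
```

At 90% occupancy this model leaves 10% of the pages free but only about 1% of the aligned pairs. A real allocator does not place pages at random, so the numbers are illustrative only.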

And before you say "you should use 64 bit on such machines"; I would love it if more people used 64 bit linux.
Sadly the adoption rate of that is not very good still.... by far ;(

>
> > Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and
> > does) have fragmentation issues. We don't have effective physical
> > address based reclaim yet for higher order allocs.
>
> I don't see any evidence that there are serious order 1 fragmentation
> issues on 2.6.

I assume you're not asking me to give you customer confidential data from a previous job in public ;)

> If you have any please post it.

just like you're posting the evidence that 4k stacks overflow?

Google scores:

1-order allocation failed   54000 pages
do_IRQ: stack overflow       4560 pages

--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 7:16 pm 2008

Arjan van de Ven writes:
>
> it is you who keeps putting up the 50k argument.

See the links I posted and quoted in an earlier message up the thread if you
don't remember what you wrote yourself.

I originally only held up the fragmentation argument (or rather only
argued against it), until I was corrected by both Ingo and you in the
earlier thread and you both insisted that 50k threads were the real
raison d'être for 4k stacks.

You're saying that was wrong and the fragmentation issue was really the
real reason for 4k stacks? If both you and Ingo can agree on that
I would be happy to forget the 50k threads :)

> What I'm talking about is in the 10k to 20k range; and that is actual workloads
> by real customers.

On a 32bit kernel?

My estimate is that you need around 32k for a functional blocked thread
in a network server (8k + 2*4k for poll with large fd table and wait queues +
some pinned dentries and inodes + misc other stuff). With 20k you're 625MB into
your lowmem which leaves about 200MB left on a 3:1 system with 16GB
(and ~128MB mem_map). That might work for some time, but I expect it will fall
over at some point because there is just too much pinned lowmem
and not enough left for other stuff (like networking buffers etc.)
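
The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: the 32k-per-thread figure is the estimate from the message above, and the helper names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def pinned_lowmem_mib(threads, per_thread_kib):
    """Lowmem pinned by `threads` blocked threads, in MiB."""
    return threads * per_thread_kib * KIB / MIB

def stack_saving_mib(threads, saved_kib_per_thread=4):
    """Lowmem saved by shrinking every kernel stack from 8k to 4k, in MiB."""
    return threads * saved_kib_per_thread * KIB / MIB

if __name__ == "__main__":
    # Thread counts taken from the 10k to 20k range discussed in the thread.
    for threads in (10_000, 20_000):
        print(threads,
              pinned_lowmem_mib(threads, 32),
              stack_saving_mib(threads))
```

With 20k threads this gives 625 MiB pinned, of which the change from 8k to 4k stacks would recover about 78 MiB (4k out of every 32k, or one eighth).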

10k sounds more doable. But again do 4k more or less make
a big difference with the other thread overhead? I don't think so.

And trading reliability (and functionality -- you basically have to
cut off XFS) just for 4k/thread doesn't seem like a good bargain to
me. Especially with kernel code getting more complicated all the time.

>> I don't see any evidence that there are serious order 1 fragmentation
>> issues on 2.6.
>
> I assume you're not asking me to give you customer confidential data from a previous job in public ;)

Well if it is that serious a problem surely it will have hit some public
bugzillas or mailing lists? Arguing with something secret is also not
very useful.

Also I find it always important to reevaluate assumptions when new
facts come up. In this case we should reevaluate a decision that made
sense[1] in 2.4 with the new facts of 2.6 (e.g. new VM with much better
reclaim)

[1] referring to the fragmentation argument, not the 50k threads which
were always unrealistic.

-Andi
--


From: Arjan van de Ven <arjan@...>
Subject: Re: x86: 4kstacks default
Date: Apr 21, 1:53 am 2008

On Mon, 21 Apr 2008 01:16:22 +0200
Andi Kleen wrote:

> Arjan van de Ven writes:
> >
> > it is you who keeps putting up the 50k argument.
>
> See the links I posted and quoted in an earlier message up the thread
> if you don't remember what you wrote yourself.
>
> I originally only held up the fragmentation argument (or rather only
> argued against it), until I was corrected by both Ingo and you in the
> earlier thread and you both insisted that 50k threads were the real
> raison d'être for 4k stacks.
>
> You're saying that was wrong and the fragmentation issue was really
> the real reason for 4k stacks? If both you and Ingo can agree on that
> I would be happy to forget the 50k threads :)

I already corrected your misquoting/misunderstanding of me; should I do this again?

>
> > What I'm talking about is in the 10k to 20k range; and that is
> > actual workloads by real customers.
>
> On a 32bit kernel?
>
> My estimate is that you need around 32k for a functional blocked
> thread in a network server (8k + 2*4k for poll with large fd table
> and wait queues + some pinned dentries and inodes + misc other
> stuff). With 20k you're 625MB into your lowmem which leaves about
> 200MB left on a 3:1 system with 16GB (and ~128MB mem_map). That
> might work for some time, but I expect it will fall over at some
> point because there is just too much pinned lowmem and not enough
> left for other stuff (like networking buffers etc.)
>
> 10k sounds more doable. But again do 4k more or less make
> a big difference with the other thread overhead? I don't think so.

no but the other ones are order 0..

--

From: Willy Tarreau <w@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 8:47 am 2008

On Sun, Apr 20, 2008 at 02:27:14PM +0200, Andi Kleen wrote:
> Adrian Bunk writes:
> >
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested
>
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?

Clearly if I have the choice between a kernel which can run 50k threads
and a kernel which does not crash under me during an I/O error, I choose
the latter! I don't even imagine what purpose 50k kernel threads may serve.
I certainly can understand that reducing memory footprint is useful, but
if we want wider testing of 4k stacks, considering they may fail in error
path in complex I/O environment, it's not likely during -rc kernels that
we'll detect problems, and if we push them down the throat of users in a
stable release, of course they will thank us very much for crashing their
NFS servers in production during peak hours.

I have nothing against changing the default setting to 4k provided that
it is easy to get back to the safe setting (i.e. changing a config option,
or better, a cmdline parameter). I just don't agree with the idea of
forcing users to swim in the sh*t, it only brings bad reputation to
Linux.

What would really help would be to have 8k stacks with the lower page
causing a fault and print a stack trace upon first access. That way,
the safe setting would still report us useful information without
putting users into trouble.

Willy

--


From: Mark Lord <lkml@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 9:27 am 2008

Willy Tarreau wrote:
>
> What would really help would be to have 8k stacks with the lower page
> causing a fault and print a stack trace upon first access. That way,
> the safe setting would still report us useful information without
> putting users into trouble.
..

That's the best suggestion from this thread, by far!
Can you produce a patch for 2.6.26 for this?
Or perhaps someone else here, with the right code familiarity, could?

Some sort of CONFIG option would likely be wanted to
either enable/disable this feature, of course.

Cheers
--


From: Willy Tarreau <w@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 9:38 am 2008

On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
> Willy Tarreau wrote:
> >
> >What would really help would be to have 8k stacks with the lower page
> >causing a fault and print a stack trace upon first access. That way,
> >the safe setting would still report us useful information without
> >putting users into trouble.
> ..
>
> That's the best suggestion from this thread, by far!
> Can you produce a patch for 2.6.26 for this?

Unfortunately, I can't. I wouldn't know where to start from.

> Or perhaps someone else here, with the right code familiarity, could?

I hope so.

> Some sort of CONFIG option would likely be wanted to
> either enable/disable this feature, of course.

If we want to migrate to 4k sooner or later, this behaviour would not
need a config option, maybe just a /proc or /sys tunable to disable
the warning. Config would be either (4k + risk of crash) or (8k + warning).

The *real* issue is to decide whether we need/want 4k or not, because
I think we're still discussing the subject for no reason, as usual...

Willy

--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 10:19 am 2008

Willy Tarreau wrote:
> On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
>> Willy Tarreau wrote:
>>> What would really help would be to have 8k stacks with the lower page
>>> causing a fault and print a stack trace upon first access. That way,
>>> the safe setting would still report us useful information without
>>> putting users into trouble.
>> ..
>>
>> That's the best suggestion from this thread, by far!

Only if you believe that 4K stack pages are a worthy goal.
As far as I can figure out they are not. They might have been
a worthy goal on crappy 2.4 VMs, but these times are long gone.

The "saving memory on embedded" argument also does not
quite convince me, it is unclear if that is really
a significant amount of memory on these systems and if that
couldn't be addressed better (e.g. in running generally
less kernel threads). I don't have numbers on this,
but then the people who made this argument didn't have any
either :)

If anybody has concrete statistics on this
(including other kernel memory users in realistic situations)
please feel free to post them.

>> Can you produce a patch for 2.6.26 for this?
>
> Unfortunately, I can't. I wouldn't know where to start from.

The problem with his suggestion is that the lower 4K of the stack page
is accessed in normal operation too because it contains the thread_struct.
That could be changed, but it would be a relatively large change
because you would need to audit/change a lot of code that assumes
thread_struct and stack are contiguous.

If that was changed, implementing Willy's suggestion would not be that
difficult using cpa(), at the cost of some general slowdown from
increased TLB misses and much higher thread creation/tear down cost etc.
Using the alternative vmalloc way also has other issues.

But still the fundamental problem is that it would likely only
hit the interesting cases in real production setups and I don't
think the production users would be very happy to slow down
their kernels and handle strange backtraces just to act as guinea pigs
for something dubious.

-Andi

--


From: Jörn <joern@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 12:41 pm 2008

On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
>
> Only if you believe that 4K stack pages are a worthy goal.
> As far as I can figure out they are not. They might have been
> a worthy goal on crappy 2.4 VMs, but these times are long gone.
>
> The "saving memory on embedded" argument also does not
> quite convince me, it is unclear if that is really
> a significant amount of memory on these systems and if that
> couldn't be addressed better (e.g. in running generally
> less kernel threads). I don't have numbers on this,
> but then the people who made this argument didn't have any
> either :)

It is not uncommon for embedded systems to be designed around 16MiB.
Some may even have less, although I haven't encountered any of those
lately.

When dealing in those dimensions, savings of 100k are substantial. In
some cases they may be the difference between 16MiB or 32MiB, which
translates to manufacturing costs. In others it simply means that the
system can cache a bit more and run faster, or it can have a little more
functionality.
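
For a sense of scale: the message does not say where the 100k comes from, so the sketch below assumes it is the 4k saved per task when going from 8k to 4k stacks. The helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

def tasks_for_saving(saving_kib, saved_kib_per_task=4):
    """Number of tasks whose stacks must shrink by 4k to save `saving_kib`."""
    return saving_kib // saved_kib_per_task

def share_of_ram(saving_kib, ram_mib):
    """Saving expressed as a fraction of total RAM."""
    return saving_kib * KIB / (ram_mib * MIB)

if __name__ == "__main__":
    print(tasks_for_saving(100))                    # tasks needed for 100k
    print(round(share_of_ram(100, 16) * 100, 2))    # percent of 16 MiB
```

Under that assumption, 100k corresponds to 25 tasks (user processes plus kernel threads) and to roughly 0.6% of a 16MiB system.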

In most cases it simply allows userspace programmers to avoid looking
harder to save those 100k, as they are already saved in kernel space.
Therefore we made life hard for us in order to make life easier for
someone else, saving them time and money.

Whether that is worth it depends on your personal point of view. Many
embedded people will claim "Hell yes!" Of those that don't, most are
simply ignoring current mainline kernels and will regret the
development later. They care, they just don't tend to care enough to
engage in these discussions or even know about them. :(

Jörn

--
Eighty percent of success is showing up.
-- Woody Allen
--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:19 pm 2008

Jörn Engel wrote:
> On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
>> Only if you believe that 4K stack pages are a worthy goal.
>> As far as I can figure out they are not. They might have been
>> a worthy goal on crappy 2.4 VMs, but these times are long gone.
>>
>> The "saving memory on embedded" argument also does not
>> quite convince me, it is unclear if that is really
>> a significant amount of memory on these systems and if that
>> couldn't be addressed better (e.g. in running generally
>> less kernel threads). I don't have numbers on this,
>> but then the people who made this argument didn't have any
>> either :)
>
> It is not uncommon for embedded systems to be designed around 16MiB.

But these are SoC systems. Do they really run x86?
(note we're talking about an x86 default option here)

Also I suspect in a true 16MB system you have to strip down
everything kernel side so much that you're pretty much outside
the "validated by testers" realm that Adrian cares about.

> When dealing in those dimensions, savings of 100k are substantial. In
> some cases they may be the difference between 16MiB or 32MiB, which
> translates to manufacturing costs. In others it simply means that the
> system can cache

If you need the stack you don't have any less cache foot print.
If you don't need it you don't have any either.

-Andi
--


From: Jörn <joern@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 1:43 pm 2008

On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
>
> But these are SoC systems. Do they really run x86?
> (note we're talking about an x86 default option here)
>
> Also I suspect in a true 16MB system you have to strip down
> everything kernel side so much that you're pretty much outside
> the "validated by testers" realm that Adrian cares about.

Maybe. I merely showed that embedded people (not me) have good reasons
to care about small stacks. Whether they care enough to actually spend
work on it - doubtful.

> > When dealing in those dimensions, savings of 100k are substantial. In
> > some cases they may be the difference between 16MiB or 32MiB, which
> > translates to manufacturing costs. In others it simply means that the
> > system can cache
>
> If you need the stack you don't have any less cache foot print.
> If you don't need it you don't have any either.

This part I don't understand.

Jörn

--


From: Andi Kleen <andi@...>
Subject: Re: x86: 4kstacks default
Date: Apr 20, 2:19 pm 2008

Jörn Engel wrote:
> On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
>> But these are SoC systems. Do they really run x86?
>> (note we're talking about an x86 default option here)
>>
>> Also I suspect in a true 16MB system you have to strip down
>> everything kernel side so much that you're pretty much outside
>> the "validated by testers" realm that Adrian cares about.
>
> Maybe. I merely showed that embedded people (not me) have good reasons
> to care about small stacks.

Sure but I don't think they're x86 embedded people. Right now there
are very few x86 SoCs if any (iirc there is only some obscure rise
core) and future SoCs will likely have more RAM.

Anyways I don't have a problem to give these people any special options
they need to do whatever they want. I just object to changing the
default options on important architectures to force people in completely
different setups to do part of their testing.

> Whether they care enough to actually spend
> work on it - doubtful.
>
>>> When dealing in those dimensions, savings of 100k are substantial. In
>>> some cases they may be the difference between 16MiB or 32MiB, which
>>> translates to manufacturing costs. In others it simply means that the
>>> system can cache
>> If you need the stack you don't have any less cache foot print.
>> If you don't need it you don't have any either.
>
> This part I don't understand.

I was just objecting to your claim that small stack implies smaller
cache foot print. Smaller stacks rarely give you smaller cache foot
print in my kernel coding experience:

First, some stack is always safety margin and in practice unused. It won't
be in cache.

Then typically standard kernel stack pigs are just too large
buffers on the stack which are not fully used. These also
don't have much cache foot print.

Or if you have a complicated call stack the typical fix
is to move parts of it into another thread. But that doesn't
give you less cache footprint because the cache foot print
is just in someone else's stack. In fact you'll likely
have slightly more cache foot print from that due to the
context of the other thread.

In theory if you e.g. convert a recursive algorithm
to iterative you might save some cache foot print, but I don't
think that really happens in kernel code.

-Andi
--


it's a long discussion on lkml already

Tomasz Chmielewski (not verified)
on
April 23, 2008 - 1:53pm

The discussion on 4k stacks has been popping up on lkml now and then for at least 4 years.

I wonder what were the technical arguments for defaulting to 8k years ago?

Strange question - it's much

Anonymous (not verified)
on
April 23, 2008 - 2:37pm

Strange question - it's much simpler to write code for a larger stack size (think recursion).

So why didn't they choose

Anonymous (not verified)
on
April 23, 2008 - 4:11pm

So why didn't they choose 16k? ;)

Because 8k apparently was

Anonymous (not verified)
on
April 23, 2008 - 4:19pm

Because 8k apparently was enough.

Actually, 640kb should be

Anonymous (not verified)
on
April 23, 2008 - 5:58pm

Actually, 640kb should be enough for anyone.
8k is only enough for linux hippies.

8K stacks

Anonymouse (not verified)
on
April 23, 2008 - 9:35pm

Some processors have a 'page size' of 8192B. If you allocate 8KB stacks, then allocation is trivial on 8KB, 4KB, 2KB, 1KB page sizes (but really, all CPUs I've dealt with over the past 10 years have 4K and 8K page sizes only). The only substantial issue I can think of is that on 8KB page machines, using 4KB would make memory management unnecessarily difficult. So 8K paged machines need to maintain that 8K stack and 4K paged machines can use 4K. Some poor person has got to go through an awful lot of code and make sure that there aren't any problems with merely changing definitions from 8K to 4K.

The issue is "order(1)"

on
April 24, 2008 - 10:29am

Nothing's magic about 4K stacks in general. The issue is the difference between a single page allocation—an "order(0)" allocation—versus a two page allocation—an "order(1)" allocation. Kernel stacks need to have physically contiguous addresses because they need to be "always present," and therefore have trivial mappings not subject to VM management. Finding two physically contiguous pages is much harder than simply "finding a page."

On most architectures, the page size is 4K. This is true on x86 and x86-64, which are the architectures under discussion. Thus, the issue for these architectures is whether they can support a 4K stack size so that kernel stack allocations are always order(0). On architectures with larger page sizes, it's trivial to have order(0) kernel stacks. It seems rather unlikely someone would try for a sub-page kernel stack on a machine with 8K pages. It's the architectures with 4K pages that have this issue.

That's why this feature is generally referred to as "4K kernel stacks," since it's the 4K page architectures that impose the maximum stack depth on the rest of the kernel should they try to go for order(0) kernel stack allocations.
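
The relationship between stack size, page size and allocation order can be sketched in a few lines of Python. This is a simplified illustration with a hypothetical helper name, loosely modelled on what the kernel's get_order() computes, not the kernel's actual code:

```python
def allocation_order(size_bytes, page_size=4096):
    """Smallest order n such that 2**n pages cover size_bytes."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    pages = -(-size_bytes // page_size)   # ceiling division
    return (pages - 1).bit_length()       # ceil(log2(pages))

if __name__ == "__main__":
    print(allocation_order(4096))          # 4K stack, 4K pages
    print(allocation_order(8192))          # 8K stack, 4K pages
    print(allocation_order(8192, 8192))    # 8K stack, 8K pages
```

A 4K stack on 4K pages is order 0, an 8K stack on 4K pages is order 1, and an 8K stack on 8K pages is order 0 again, which is why the question only arises on 4K-page architectures.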

On a different note, PPC can also do 64K hardware pages, and there's some evidence this has a huge performance benefit too due to fewer trips through the VM code. (This has been confirmed by building 64K pages out of 4K pages, to sort the kernel benefit from the MMU benefit.) Thus, on PPC with 64K hardware pages, they can have very generous kernel stacks. :-)

--
Program Intellivision and play Space Patrol!

But Linux was developed on

Anonymous (not verified)
on
April 24, 2008 - 3:17pm

But Linux was developed on x86, which has 4k pages. So what is your point again?

recursion is not allowed in

Anonymous (not verified)
on
April 23, 2008 - 5:31pm

recursion is not allowed in kernel code

But there used to be

Anonymous (not verified)
on
April 24, 2008 - 5:25am

But there used to be recursive (which has different meanings, btw) algorithms and the question was a historic one.

"Since the

Anonymous (not verified)
on
April 23, 2008 - 6:36pm

"Since the second-most-common reason for stack overages is ndiswrapper... Well,
with there being so much more hardware now supported directly by the linux
kernel... I'm stunned every time someone tells me "I can't run Linux on my
laptop, there is hardware that isn't supported without me having to get
ndiswrapper". The last time someone said that to me I pointed to the fact
that their hardware is supported by the latest kernel and even offered to
build&install it for them."

Is this guy joking? I use 3 different laptops with Linux (2 work, 1 home) and none of them work without ndiswrapper. As far as I can tell, the only wireless with good support is intel. If you don't have intel, you're kind of screwed without ndiswrapper.

ndiswrapper

Anonymous (not verified)
on
April 24, 2008 - 1:22am

Are you so sure?

A *lot* of people think they need NDISwrapper but are wrong. At some point in the past their card wasn't well supported and ndiswrapper got traction. So now anyone who does a search finds a dozen (old) forums recommending ndiswrapper, and a dozen current people who are confused... but today, many of those cards have very robust native support.

There are, no doubt, some cards which are still not (well) supported and it's possible that you have some of those cards, but I'd bet that all three of your laptops have the same card .. which wouldn't be a fair comparison.

Intel cards are well supported and fairly popular, Atheros cards are very popular and are well supported by the blobby native driver, and some are well supported by the free driver. People used to use ndiswrapper for both of these, and some people still insist that they need it.

You should have mentioned what cards you have...

ndiswrapper vs bcm43xx

Anonymous (not verified)
on
April 25, 2008 - 11:46am

The most common need for using ndiswrapper is having a broadcom wireless chipset.
Most broadcom chips are now supported by bcm43xx but I'm still using ndiswrapper. Why? Because it works better. Bcm43xx loses connection every ~30min and has lower power output (loses connection when behind 2 walls in my room). Ndiswrapper doesn't have these problems. It never crashed my system.
Some would say that ndiswrapper is not free because it has to use the Windows driver. Bcm43xx needs firmware, which has to be extracted from Windows drivers, so what's the difference?

> There are, no doubt, some

Anonymous (not verified)
on
April 27, 2008 - 9:12pm

> There are, no doubt, some cards which are still not (well) supported and it's possible that you have some of those cards, but I'd bet that all three of your laptops have the same card .. which wouldn't be a fair comparison.

Just to give you an idea of the real state of wireless:

The only wireless hardware I've got working without a problem is a D-Link - shame that they no longer stock them in the shop I got it from.

I've also got a US Robotics USB dongle (USR8054-22) that doesn't work at all without ndiswrapper, and then only after making sure the kernel doesn't "helpfully" autoload prism54 locking it up. On the rare occasions when it actually initialises properly and connects to the access point it lasts about an hour before dropping the connection until next reboot. I tried searching around to see if there really was an open driver for this but there's nothing even close to working.

And then there's the Atheros chipset in the eeePC that works, most of the time anyway, with its native linux driver - a binary blob.

32-bit vs 64-bit

on
April 23, 2008 - 7:49pm

Actually, it doesn't surprise me that 32-bit would use more stack space than 64-bit. More function arguments get passed on the stack in the 32-bit model, and there will be more register spills to the stack with the smaller register file. (I'm not 100% sure the stack args argument is completely true with -mregparm=3, but I believe it is.)

The number of actual pointers stored on the stack ought to get dwarfed by both these phenomena.

--
Program Intellivision and play Space Patrol!

Need for 8 KiB stacks

David VomLehn (not verified)
on
April 24, 2008 - 6:44pm

One place where I'm seeing some pretty big stack sizes is running NFS over Ethernet on USB with 32-bit MIPS processors. That seems to just pile on the stack frames. 4 KiB stacks just don't look acceptable with this configuration. Note that this is a development environment for embedded systems.
