do_mount() uses a whopping 616 bytes of stack on x86_64 in
2.6.24-mm1, largely thanks to gcc inlining the various helper
functions.noinlining these can slim it down a lot; on my box this patch
gets it down to 168, which is mostly the struct nameidata nd;
left on the stack.These functions are called only as do_mount() helpers;
none of them should be in any path that would see a performance
benefit from inlining...Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---Index: linux-2.6.24-mm1/fs/namespace.c
===================================================================
--- linux-2.6.24-mm1.orig/fs/namespace.c
+++ linux-2.6.24-mm1/fs/namespace.c
@@ -1296,7 +1296,7 @@ out_unlock:
/*
* recursively change the type of the mountpoint.
*/
-static int do_change_type(struct nameidata *nd, int flag)
+static noinline int do_change_type(struct nameidata *nd, int flag)
{
struct vfsmount *m, *mnt = nd->path.mnt;
int recurse = flag & MS_REC;
@@ -1320,7 +1320,7 @@ static int do_change_type(struct nameida
/*
* do loopback mount.
*/
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static noinline int do_loopback(struct nameidata *nd, char *old_name, int recurse)
{
struct nameidata old_nd;
struct vfsmount *mnt = NULL;
@@ -1387,7 +1387,7 @@ static int change_mount_flags(struct vfs
* If you've mounted a non-root directory somewhere and want to do remount
* on it - tough luck.
*/
-static int do_remount(struct nameidata *nd, int flags, int mnt_flags,
+static noinline int do_remount(struct nameidata *nd, int flags, int mnt_flags,
void *data)
{
int err;
@@ -1425,7 +1425,7 @@ static inline int tree_contains_unbindab
return 0;
}-static int do_move_mount(struct nameidata *nd, char *old_name)
+static noinline int do_move_mount(struct nameidata *nd, char *old_name)
{
struct nameidata old_nd, parent_nd;
struct vfsmount *p;
@@ -1505,7 +1505,7 @@ out:
* create a new mount for userspace and re...
On Wed, 06 Feb 2008 16:13:58 -0600
hm, sizeof(nameidata)=136 and I can see about three of them on the stack.
Does the patch actually help? I mean, if a() calls b() and both use N
bytes of locals, our worst-case stack usage remains ~2N whether or not b()
was inlined in a()? In fact, uninlining makes things a little worse due toThere's no way for the reader to work out why this is here, so I do think
it should be commented somewhere.--
(updated with comments about the noinlines, and a fix for
an >80 char line)do_mount() uses a whopping 616 bytes of stack on x86_64 in
2.6.24-mm1, largely thanks to gcc inlining the various helper
functions.noinlining these can slim it down a lot; on my box this patch
gets it down to 168, which is mostly the struct nameidata nd;
left on the stack.These functions are called only as do_mount() helpers;
none of them should be in any path that would see a performance
benefit from inlining...Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---Index: linux-2.6.24-mm1/fs/namespace.c
===================================================================
--- linux-2.6.24-mm1.orig/fs/namespace.c
+++ linux-2.6.24-mm1/fs/namespace.c
@@ -1295,8 +1295,9 @@ out_unlock:/*
* recursively change the type of the mountpoint.
+ * noinline this do_mount helper to save do_mount stack space.
*/
-static int do_change_type(struct nameidata *nd, int flag)
+static noinline int do_change_type(struct nameidata *nd, int flag)
{
struct vfsmount *m, *mnt = nd->path.mnt;
int recurse = flag & MS_REC;
@@ -1319,8 +1320,10 @@ static int do_change_type(struct nameida/*
* do loopback mount.
+ * noinline this do_mount helper to save do_mount stack space.
*/
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static noinline int do_loopback(struct nameidata *nd, char *old_name,
+ int recurse)
{
struct nameidata old_nd;
struct vfsmount *mnt = NULL;
@@ -1386,8 +1389,9 @@ static int change_mount_flags(struct vfs
* change filesystem flags. dir should be a physical root of filesystem.
* If you've mounted a non-root directory somewhere and want to do remount
* on it - tough luck.
+ * noinline this do_mount helper to save do_mount stack space.
*/
-static int do_remount(struct nameidata *nd, int flags, int mnt_flags,
+static noinline int do_remount(struct nameidata *nd, int flags, int mnt_flags,
void *data)
{
int...
On Wed, 06 Feb 2008 17:11:38 -0600
What we could do here is defined a new noinline_because_of_stack_suckiness
and use that. Reasons:- self-documenting, so we don't need to comment each site
- can be made a no-op for suitable __GNUC__ values if gcc ever fixes this
- our children we can go through and delete them all when nobody is using
the offending gcc versions any more.what thinkest thou?
--
In theory it should be already fixed; iirc Richard H. (cc'ed) added
code for this somewhere in 4.x. Don't quite remember which x, likely
either 1 or 2.e.g. if I do a quick test here on gcc 4.2 then it definitely
reuses stack slots between inlines. As you can see only ~100 bytes
are allocated, not ~200.-Andi
% cat ts.c
static inline a(void)
{
char x[100];
extf(x);
}static inline b(void)
{
char y[100];
extf(y);
}f()
{
a();
b();
}
% gcc -O2 -S ts.c
% cat ts.s
...
f:
.LFB4:
pushq %rbx
.LCFI0:
xorl %eax, %eax
subq $112, %rsp
.LCFI1:
movq %rsp, %rdi
call extf
movq %rsp, %rdi
xorl %eax, %eax
call extf
addq $112, %rsp
popq %rbx
ret
...
%--
On gcc 4.1.2 and 4.3 (fedora flavors) I don't see it re-used in
do_mount, though... *shrug*-Eric
--
Frankly, a wrapper for path_lookup() that would take struct path *,
refuse to do LOOKUP_PARENT (i.e. guaranteed to have nothing stored
in nameidata other than ->mnt and ->dentry) and copied result of
lookup into passed struct path * would help a lot more.Taking it to fs/namei.c would prevent inlining just fine and we'd
be left with pair of pointers in caller's stack frame instead of
full struct nameidata. All users in fs/namespace.c can be trivially
converted to that and AFAICS it would save a lot more than we would
get from making everything in there not inlined.I'll do that in fs/namei.c sanitizing series (opens as soon as ro-bind
stuff goes in).
--
Something like:
Index: linux-2.6.24-mm1/include/linux/compiler-gcc.h
===================================================================
--- linux-2.6.24-mm1.orig/include/linux/compiler-gcc.h
+++ linux-2.6.24-mm1/include/linux/compiler-gcc.h
@@ -53,3 +53,9 @@
#define noinline __attribute__((noinline))
#define __attribute_const__ __attribute__((__const__))
#define __maybe_unused __attribute__((unused))
+
+/*
+ * When gcc inlines multiple functions into a parent function,
+ * the stack space used sometimes increases excessively...
+ */
+#define noinline_stackspace noinline?
I couldn't think of a great name for it. There are several noinline
users throughout the kernel with stackspace related comments, so if
desired, I could sprinkle this around. I'm not very pleased with it
aesthetically though. :)-Eric
--
On Thu, 07 Feb 2008 17:08:19 -0600
I think it's fine. People can go look at the definition site, read the
comment and come away happy.
--
On Thu, 07 Feb 2008 17:08:19 -0600
how about just __stackhog ?
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
--
Yes, sounds very good to me. I'm sure there are more places which could
use this.Or perhaps there are some magic flags to gcc so that it doesn't inline
so aggressively in situations like this? IOW is it gcc breakage, or
design - but maybe with defaults that don't fit well in the limited
stack... (well, I suppose the "for N mutually exclusive helper functions
each with stack usage S, use about N*S stack when inlining them all" bit
probably isn't a feature.)FWIW the XFS team finally just wrested control from GCC.
http://oss.sgi.com/archives/xfs/2006-11/msg00195.html-Eric
--
On Wed, 06 Feb 2008 17:34:39 -0600
The auto inlining is OK. The problem is that when gcc handles this:
static inline foo()
{
char a[10];
}static inline bar()
{
char a[10];
}zot()
{
foo();
bar();
}it will use 20 bytes of stack instead of using the same 10 bytes for both
"calls". It doesn't need to do that, and other compilers avoid it, iirc.Now, it _used_ to be the case that when presented with this:
foo()
{
{
char a[10];bar(a);
}
{
char a[10];bar(a);
}
}gcc would also consume 20 bytes of stack. But I see that this is fixed in
gcc-4.0.3.These two things are equivalent and hopefully gcc will soon fix the
inlined-functions-use-too-much-stack thing. Once that happens, we don'tyup.
--
I think it does.
[linux-2.6.24-mm1]$ make fs/namespace.o > /dev/null
[linux-2.6.24-mm1]$ objdump -d fs/namespace.o | scripts/checkstack.pl
x86_64 | grep do_mount
0x00002307 do_mount [namespace.o]: 616
[linux-2.6.24-mm1]$ quilt push
Applying patch patches/do_mount_stack
patching file fs/namespace.cNow at patch patches/do_mount_stack
[linux-2.6.24-mm1]$ make fs/namespace.o > /dev/null
[linux-2.6.24-mm1]$ objdump -d fs/namespace.o | scripts/checkstack.pl
x86_64 | grep do_mount
0x00002a8b do_mount [namespace.o]: 168So clearly that one function is reduced. But it's more than that....
I guess the problem is a() calls b() or c() or d() or e() or f(), and
gcc adds up all that stack usage, or seems to, and we get more like 6N
regardless of the path taken.For example, 2 of the helper functions, once un-inlined, are:
0x00001fd9 do_move_mount [namespace.o]: 288
0x00001e94 do_loopback [namespace.o]: 168so it looks like we do carry that baggage even if we go the
Ok, good point, will resend... want a comment on each, or perhaps above
do_mount? I suppose on each.-Eric
--
On Wed, 6 Feb 2008 14:34:57 -0800
it gets interesting at the three-way..
if a() calls b() and then calls c(), and they all use N,
the total usage is now 3N not 2N.(although current gcc is already somewhat smarter about this, and 3N might actually be 2N for some cases)
--
*nod*
It'd be nice if it could be max of (a,b,c) though. Or maybe compilers
on x86, gcc 4.1.2, do_mount goes from 360 to 112 bytes w/ the patch I sent.
with gcc 4.3, it goes from 364 to 104.
-ERic
--
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
| Roland McGrath | Re: Linus 2.6.23-rc1 |
| James Bottomley | Re: Announce: Linux-next (Or Andrew's dream :-)) |
| Greg Kroah-Hartman | [PATCH 004/196] Chinese: add translation of SubmittingPatches |
git: | |
| Gerrit Renker | [PATCH 13/37] dccp: Deprecate Ack Ratio sysctl |
| Corey Minyard | [PATCH 3/3] Convert the UDP hash lock to RCU |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| David Miller | [GIT]: Networking |
