I am seeing random kernel and userland application crashes on a
Powerbook running a 2.6.27-rc3 based kernel (wireless-testing.git).

The crashes appeared recently. They might have been introduced with the
merge of 2.6.27-rc1 into wireless-testing, but I'm not sure about that.
It's just a guess. I still need to do more testing (also on vanilla
upstream kernels).

The crashes are completely random and they look like bad hardware.
However, I cannot reproduce them on 2.6.25.9 (that's a kernel I still
had installed, so I tried that one). So they are most likely _not_
caused by faulty hardware.

The crashes are hard to reproduce; they happen about every 20 minutes
while compiling a kernel tree (gcc segfaults). Sometimes the kernel
oopses in random places with pointer dereference faults.

Is this a known issue?

I'm going to bisect this one, but it will take a lot of time, as
reproducing takes about 20 minutes. So that's about an hour for one
test round.

The kernel configuration is the following:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27-rc3
# Fri Aug 22 18:57:55 2008
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_6xx=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
# CONFIG_SMP is not set
CONFIG_PPC32=y
CONFIG_WORD_SIZE=32
CONFIG_PPC_MERGE=y
CONFIG_MMU=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not ...
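The reproduction procedure described above (compile a kernel tree repeatedly until something crashes) could be scripted roughly as follows. This is only a sketch: the helper name `repro_loop` and the round count are assumptions, not something the report uses.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and stop at the first failure.
# Usage: repro_loop ROUNDS COMMAND [ARGS...]
repro_loop() {
    rounds=$1
    shift
    i=1
    while [ "$i" -le "$rounds" ]; do
        # A segfaulting gcc makes the build command exit non-zero.
        if ! "$@"; then
            echo "round $i: FAILED"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $rounds rounds passed"
    return 0
}
```

On the machine under test this might be invoked as `repro_loop 3 sh -c 'make clean && make vmlinux'`. Because the crashes are random, a passing run only suggests that a kernel is good, so more rounds give more confidence. Each bisect step still needs a reboot into the kernel being tested, so the result would be recorded by hand with `git bisect good` or `git bisect bad`.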
Random guess:

CONFIG_FRAME_POINTER=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y

Not sure what those together do. Check if you have any file compiled
with -fno-omit-frame-pointer and, if you do, try to change things so
that you don't ... we found some miscompiles when that is set, exposed
typically by FTRACE (which you don't have enabled) but possibly by
other things.

Ben.
--
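One way to carry out the check suggested above is to search the command lines that kbuild saves next to each object file (the `.<name>.o.cmd` files in the build tree). A minimal sketch, where the function name `objs_with_flag` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list kbuild command files whose saved compiler
# command line contains FLAG.
# Usage: objs_with_flag BUILD_DIR FLAG
objs_with_flag() {
    dir=$1
    flag=$2
    # grep -F: match FLAG as a fixed string
    # grep -l: print only the names of matching files
    # grep -e: needed because FLAG itself starts with '-'
    find "$dir" -name '.*.o.cmd' -exec grep -l -F -e "$flag" {} + | sort
}
```

For example, `objs_with_flag . -fno-omit-frame-pointer` run at the top of a built tree prints one `.cmd` file per object that was compiled with the flag, and prints nothing if no object was.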
Ok, thanks for the suggestion. I could reproduce the crash with 2.6.26, so this is not a regression between 2.6.26 and 2.6.27-rcX. I'm currently running longer tests on 2.6.25 again to make sure it really isn't hardware related. NO_NO_OMIT is a brain screwer, btw :) -- Greetings Michael. --
Thanks for your random guess.
The following workaround seems to fix the crashes on powerpc.
However, this patch is clearly not what we want for other architectures,
as they might need -fno-omit-frame-pointer to function properly.
I reproduced the random crashes of kernel and userspace applications
(without the following patch) on a vanilla 2.6.26 and 2.6.27-rc{1-4}
kernel. I did _not_ try a 2.6.25 kernel with -fno-omit-frame-pointer, so
I don't know if it would also crash then.
I'm currently running more tests on a patched 2.6.27-rc4 kernel, and it
hasn't crashed yet. I have already done 5 complete kernel tree
compilations. It should have crashed by now, but it didn't :)
The compiler is:
gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
Index: linux-2.6/Makefile
===================================================================
--- linux-2.6.orig/Makefile 2008-08-24 11:49:53.000000000 +0200
+++ linux-2.6/Makefile 2008-08-24 12:16:42.000000000 +0200
@@ -523,13 +523,13 @@ endif
# Force gcc to behave correct even for buggy distributions
# Arch Makefiles may override this setting
KBUILD_CFLAGS += $(call cc-option, -fno-stack-protector)
ifdef CONFIG_FRAME_POINTER
-KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
+KBUILD_CFLAGS += -fno-optimize-sibling-calls
else
KBUILD_CFLAGS += -fomit-frame-pointer
endif
ifdef CONFIG_DEBUG_INFO
KBUILD_CFLAGS += -g
Index: linux-2.6/kernel/Makefile
===================================================================
--- linux-2.6.orig/kernel/Makefile 2008-08-24 11:50:23.000000000 +0200
+++ linux-2.6/kernel/Makefile 2008-08-24 12:15:54.000000000 +0200
@@ -92,13 +92,13 @@ obj-$(CONFIG_SMP) += sched_cpupri.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
# needed for x86 only. Why this used to be enabled for all architectures is beyond
# me. I suspect most platforms don't need this, but until we know ...

This has a better chance to be accepted. :-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8b5a7d3..f9a2e48 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -394,7 +394,7 @@ config LOCKDEP
	bool
	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
	select STACKTRACE
-	select FRAME_POINTER if !X86 && !MIPS
+	select FRAME_POINTER if !X86 && !MIPS && !PPC
	select KALLSYMS

CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER is already enabled on powerpc.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
--
This is not what my patch is doing. Your patch always forces FRAME_POINTER off. At least as far as lockdep is concerned. What about other parts of the kernel that enable FRAME_POINTER? I think this should be fixed in the makefile by substitution of -fno-omit-frame-pointer on PPC (and probably depending on the compiler version). Otherwise, if somebody else decides to do select FRAME_POINTER in some other code, the bug will reappear. I'm also not sure if it's desired to always force FRAME_POINTER off. -- Greetings Michael. --
Unfortunately, that won't solve the FTRACE problem. --
Well, and -pg requires it, even on powerpc, so that won't work for
ftrace.

Any chance you can try the workaround that Segher proposed, though?

http://penguinppc.de/~segher/0001-powerpc-Workaround-for-the-ftrace-problem.patch

His workaround only kicks in with CONFIG_FTRACE; that would have to be
fixed, of course.

Also, I suspect the bits that have -pg in a flag "remove" section
should have also "fno-omit-frame-pointer" in that

Thanks!
Ben.
--
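Before testing a flag-based workaround such as the -mno-sched-epilog option used in the patch that follows, it may help to confirm that the installed compiler accepts the flag at all. The sketch below mimics the idea behind kbuild's `cc-option` (compile an empty C file with the flag and see whether the compiler complains). The function name `cc_accepts` is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print FLAG if COMPILER accepts it when compiling
# an empty C translation unit; print nothing otherwise.
# Usage: cc_accepts COMPILER FLAG
cc_accepts() {
    cc=$1
    flag=$2
    tmp=$(mktemp) || return 1
    # -x c forces /dev/null to be treated as (empty) C source.
    if "$cc" "$flag" -c -x c /dev/null -o "$tmp" >/dev/null 2>&1; then
        echo "$flag"
    fi
    rm -f "$tmp"
}
```

For example, `cc_accepts gcc -mno-sched-epilog` prints the flag only when that gcc accepts it. This says nothing about whether the flag actually avoids the miscompile; that still needs the crash test.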
This bug is causing random crashes
(http://bugzilla.kernel.org/show_bug.cgi?id=11414).

-fno-omit-frame-pointer is only needed on powerpc when -pg is also
supplied. This patch ensures that CONFIG_FRAME_POINTER is only
selected by ftrace. When CONFIG_FTRACE is enabled we also pass
-mno-sched-epilog to work around the codegen bug.

Patch based on work by:
Andreas Schwab <schwab@suse.de>
Segher Boessenkool <segher@kernel.crashing.org>

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
---
arch/powerpc/Makefile | 5 +++++
arch/powerpc/kernel/Makefile | 7 ++++---
arch/powerpc/platforms/powermac/Makefile | 2 +-
lib/Kconfig.debug | 6 +++---
4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 9155c93..c6be19e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -116,6 +116,11 @@ ifeq ($(CONFIG_6xx),y)
KBUILD_CFLAGS += -mcpu=powerpc
endif
+# Work around a gcc code-gen bug with -fno-omit-frame-pointer.
+ifeq ($(CONFIG_FTRACE),y)
+KBUILD_CFLAGS += -mno-sched-epilog
+endif
+
cpu-as-$(CONFIG_4xx) += -Wa,-m405
cpu-as-$(CONFIG_6xx) += -Wa,-maltivec
cpu-as-$(CONFIG_POWER4) += -Wa,-maltivec
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 64f5948..946daea 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -14,12 +14,13 @@ endif
ifdef CONFIG_FTRACE
# Do not trace early boot code
-CFLAGS_REMOVE_cputable.o = -pg
-CFLAGS_REMOVE_prom_init.o = -pg
+CFLAGS_REMOVE_cputable.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_prom_init.o = -pg -mno-sched-epilog
+CFLAGS_REMOVE_btext.o = -pg -mno-sched-epilog
ifdef CONFIG_DYNAMIC_FTRACE
# dynamic ftrace setup.
-CFLAGS_REMOVE_ftrace.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog
endif
endif
diff --git a/arch/powerpc/platforms/powermac/Makefile b/arch/powerpc/platforms/powermac/Makefile
index ...
