Re: Regression [Was: Boot hang with stack protector on x86_64]

Previous thread: Re: [PATCH 2.6.25-rc2 5/9] Kconfig: Improve init/Kconfig help descriptions - IKCONFIG etc by Randy Dunlap on Thursday, February 21, 2008 - 5:19 pm. (1 message)

Next thread: [PATCH] kbuild: CONFIG_CC_OPTIMIZE_FOR_SIZE should not be default by Pavel Roskin on Thursday, February 21, 2008 - 5:36 pm. (1 message)
From: James Morris
Date: Thursday, February 21, 2008 - 5:29 pm

I've been getting an immediate hang on boot since:

e06b8b98da071f7dd78fb7822991694288047df0 kbuild: allow -fstack-protector to take effect

This is on an x86_64 system (actually an Intel SDV -- so it might be 
"special").

My .config has the following:

CONFIG_CC_STACKPROTECTOR=y
CONFIG_CC_STACKPROTECTOR_ALL=y

Let me know if you need more information.


Interestingly, git-bisect was unable to identify the changeset, and 
reliably mis-identified this as the problem:

71c044752cdae89136862495f244d37073e2cf66 V4L/DVB (7080): zr364xx: add support for Pentax Optio 50

(which I don't even build).


- James
-- 
James Morris
<jmorris@namei.org>
--

From: Arjan van de Ven
Date: Thursday, February 21, 2008 - 6:14 pm

On Fri, 22 Feb 2008 11:29:37 +1100 (EST)
Ingo has a set of stackprotector fixes that should go to mainline ASAP.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
--

From: Sam Ravnborg
Date: Friday, February 22, 2008 - 12:59 am

Hi Linus.
This is a regression. Can you please revert this commit.
We can redo it wehn the -fstack-protector stuff are properly fixed in x86.

	Sam

--

From: Ingo Molnar
Date: Friday, February 22, 2008 - 2:36 am

No!

James, could you please check whether x86.git#testing:

   http://people.redhat.com/mingo/x86.git/README

works fine for you? That has all the current stackprotector fixes. I 
plan to send a separate pull request with just the stackprotector fixes 
to Linus, they are looking good in testing so far.

	Ingo
--

From: James Morris
Date: Friday, February 22, 2008 - 3:33 am

Nope, same problem.

(I followed your instructions in the readme exactly).


- James
-- 
James Morris
<jmorris@namei.org>
--

From: Ingo Molnar
Date: Friday, February 22, 2008 - 5:17 am

stupid double check, does "git-log | grep stackpro" give you the fixes:

    This patch adds a simple self-test capability to the stackprotector
    x86: unify stackprotector features
    streamline the stackprotector features under a single option
    x86: stackprotector: mix TSC to the boot canary
    x86: fix the stackprotector canary of the boot CPU
    stackprotector: add boot_init_stack_canary()
    stackprotector: include files

?

Please send me your full .config and the gcc version you used for 
building the failing kernel.

	Ingo
--

From: James Morris
Date: Friday, February 22, 2008 - 6:02 am

config attached.

$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada 
--enable-java-awt=gtk --disable-dssi --enable-plugin 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre 
--enable-libgcj-multifile --enable-java-maintainer-mode 
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic 
--host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-36)


-- 

James Morris 
<jmorris@namei.org>
From: Ingo Molnar
Date: Friday, February 22, 2008 - 6:27 am

ok, i have a slightly older one:

  gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

i'll try to upgrade to your version to make sure we see the same 
regression.

	Ingo
--

From: Ingo Molnar
Date: Friday, February 22, 2008 - 11:18 pm

James,

could you try the fix below ontop of x86.git#testing, does it solve your 
boot hang?

	Ingo

--------------->
Subject: x86: stackprotector fix: do not zap %gs
From: Ingo Molnar <mingo@elte.hu>
Date: Sat Feb 23 07:06:55 CET 2008

pda_init() puts 0 into %gs - that's wrong because any %gs access will 
fault from now on and we already have a dummy PDA set up that can be 
accessed just fine.

This normally does not matter because almost nothing accesses %gs this 
early ... but the stackprotector now does to read the canary ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/setup64.c |    2 --
 1 file changed, 2 deletions(-)

Index: linux-x86.q/arch/x86/kernel/setup64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/setup64.c
+++ linux-x86.q/arch/x86/kernel/setup64.c
@@ -165,8 +165,6 @@ void pda_init(int cpu)
 { 
 	struct x8664_pda *pda = cpu_pda(cpu);
 
-	/* Setup up data that may be needed in __get_free_pages early */
-	asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0)); 
 	/* Memory clobbers used to order PDA accessed */
 	mb();
 	wrmsrl(MSR_GS_BASE, pda);
--

From: Ingo Molnar
Date: Saturday, February 23, 2008 - 12:37 am

find below another fix that is somewhat better as it does not affect the 
native kernel and !PARAVIRT.

btw., this also explains why this bug wasnt reported sooner against 
x86.git#testing: people done normally use PARAVIRT on 64-bit yet.
(there is no 64-bit host support)

	Ingo

---------------->
Subject: x86: stackprotector & PARAVIRT fix
From: Ingo Molnar <mingo@elte.hu>
Date: Sat Feb 23 07:06:55 CET 2008

on paravirt enabled 64-bit kernels the paravirt ops do function
calls themselves - which is bad with the stackprotector - for
example pda_init() loads 0 into %gs and then does MSR_GS_BASE
write (which modifies gs.base) - but that MSR write is a function
call on paravirt, which with stackprotector tries to read the
stack canary from the PDA ... crashing the bootup.

the solution was suggested by Arjan van de Ven: to exclude paravirt.c
from stackprotector, too many lowlevel functionality is in it. It's
not like we'll have paravirt functions with character arrays on
their stack anyway...

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/Makefile |    1 +
 1 file changed, 1 insertion(+)

Index: linux-x86.q/arch/x86/kernel/Makefile
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/Makefile
+++ linux-x86.q/arch/x86/kernel/Makefile
@@ -15,6 +15,7 @@ nostackp := $(call cc-option, -fno-stack
 CFLAGS_vsyscall_64.o	:= $(PROFILING) -g0 $(nostackp)
 CFLAGS_hpet.o		:= $(nostackp)
 CFLAGS_tsc_64.o		:= $(nostackp)
+CFLAGS_paravirt.o	:= $(nostackp)
 
 obj-y			:= process_$(BITS).o signal_$(BITS).o entry_$(BITS).o
 obj-y			+= traps_$(BITS).o irq_$(BITS).o
--

From: James Morris
Date: Sunday, February 24, 2008 - 3:53 pm

-- 
James Morris
<jmorris@namei.org>
--

From: Ingo Molnar
Date: Monday, February 25, 2008 - 1:23 am

thanks. We'll delay the stackprotector fixes for v2.6.26 (it's been 
broken for too long), but if you want to have it you can pick it up from 
x86.git.

	Ingo
--

From: Linus Torvalds
Date: Friday, February 22, 2008 - 8:43 am

Not really. The thing is, CONFIG_CC_STACKPROTECTOR has never done anything 
at all, now it does, and it shows that it never worked.

So the commit that made it do something shouldn't be reverted, but 
CONFIG_CC_STACKPROTECTOR should be marked BROKEN, because it obviously is 
broken right now.

But keeping the config option, and just not making it do anything is 
misleading and wrong.

So just something like this? To make sure normal people don't enable it..

			Linus

---
 arch/x86/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3be2305..4a88cf7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1054,7 +1054,7 @@ config SECCOMP
 
 config CC_STACKPROTECTOR
 	bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
-	depends on X86_64 && EXPERIMENTAL
+	depends on X86_64 && EXPERIMENTAL && BROKEN
 	help
          This option turns on the -fstack-protector GCC feature. This
 	  feature puts, at the beginning of critical functions, a canary
--

From: Arjan van de Ven
Date: Friday, February 22, 2008 - 8:59 am

On Fri, 22 Feb 2008 07:43:08 -0800 (PST)


looks fair; once the patches to make it work are merged the B0RKEN can
go away again easliy
--

From: Sam Ravnborg
Date: Friday, February 22, 2008 - 9:12 am

Whatever is fine with me - I hope Ingo & Co fixes the root cause soon so
we can get the right fix in.

	Sam
--

Previous thread: Re: [PATCH 2.6.25-rc2 5/9] Kconfig: Improve init/Kconfig help descriptions - IKCONFIG etc by Randy Dunlap on Thursday, February 21, 2008 - 5:19 pm. (1 message)

Next thread: [PATCH] kbuild: CONFIG_CC_OPTIMIZE_FOR_SIZE should not be default by Pavel Roskin on Thursday, February 21, 2008 - 5:36 pm. (1 message)