ah!
i got this:
	.file	""
	.text
.globl foo
	.type	foo, @function
foo:
.LFB2:
	pushq	%rbp
.LCFI0:
	movq	%rsp, %rbp
.LCFI1:
	subq	$208, %rsp
.LCFI2:
	movq	__stack_chk_guard(%rip), %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$3, %eax
	movq	-8(%rbp), %rdx
	xorq	__stack_chk_guard(%rip), %rdx
	je	.L3
	call	__stack_chk_fail
.L3:
	leave
	ret
but that's Fedora 8's gcc 4.1, and not the kernel-mode code generator either.
the code you cited looks far better - that's good news!
one optimization would be to do a 'jne' straight into __stack_chk_fail()
- it's not like we ever want to return. [and it's obvious from the
existing stack frame which one the failing function was] That way we'd
have about 3 bytes less per function? We don't want to return to the
original function so for the kernel it would be OK.
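
to make the control flow concrete, here is a rough C model of the check
that the listing above performs. it is a sketch only: the names
model_guard, model_chk_fail and foo_model are made up for illustration,
while the real code uses __stack_chk_guard and __stack_chk_fail and is
emitted by the compiler rather than written by hand. the point it shows
is that the failure handler never returns, so nothing is needed after
the call to it.

```c
#include <stdlib.h>

/* Hypothetical stand-in for __stack_chk_guard. */
static unsigned long model_guard = 0x5bd1e995UL;

/* Hypothetical stand-in for __stack_chk_fail(): it never returns. */
static void model_chk_fail(void) __attribute__((noreturn));
static void model_chk_fail(void)
{
	abort();
}

/* Hand-written model of what the protected foo() does. */
int foo_model(void)
{
	/* prologue: copy the guard value into the frame */
	volatile unsigned long canary = model_guard;
	int ret = 3;	/* the function body */

	/* epilogue: compare the frame copy against the guard */
	if (canary != model_guard)
		model_chk_fail();	/* control never comes back here */
	return ret;
}
```

in the intact case foo_model() simply returns 3, and only the mismatch
path ever reaches the handler.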
another potential optimization would be to exchange this:
into:
	pushq	%fs:40
	subq	$80, %rsp
or am i missing something? (is there perhaps an address generation
dependency between the pushq and the subq? Or would the canary be at the
wrong position?)
ok. is -fstack-protector-all basically equivalent to
--param=ssp-buffer-size=0 ? I'm wondering whether it would be easy for
gcc to completely skip stackprotector code on functions that have no
buffers, even under -fstack-protector-all. (perhaps it already does?)
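
one way to check this empirically would be a small test file like the
sketch below, with one function that has no local buffer and one that
has a char array. the function names are made up for illustration, and
what gcc emits will depend on the gcc version and target.

```c
#include <string.h>

/* Hypothetical test case: no local buffer at all. */
int no_buffer(int x)
{
	return x + 3;
}

/* Hypothetical test case: a 64-byte char array on the stack. */
int with_buffer(const char *s)
{
	char buf[64];

	/* bounded copy, always NUL-terminated */
	strncpy(buf, s, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	return (int)strlen(buf);
}
```

compiling it with -S once under -fstack-protector and once under
-fstack-protector-all, then looking for __stack_chk_fail in each
function's output, would show which functions get a canary. whether gcc
accepts a value of 0 for --param=ssp-buffer-size is itself something to
verify.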
Ingo