[PATCH] /dev/mem gcc weak function workaround

Previous thread: Linux 2.6.24.4+mmap with overcommit >= 1 by Sami Farin on Tuesday, April 29, 2008 - 6:19 pm. (4 messages)

Next thread: linux-next: upstream build failure: v4l/dvb by Stephen Rothwell on Tuesday, April 29, 2008 - 6:41 pm. (5 messages)
From: Venki Pallipadi
Date: Tuesday, April 29, 2008 - 6:31 pm

Some flavors of gcc 4.1.0 and 4.1.1 seems to have trouble understanding
weak function definitions. Calls to function from the same file where it is
defined as weak _may_ get inlined, even when there is a non-weak definition of
the function elsewhere. I tried using attribute 'noinline' which does not
seem to help either.

One workaround for this is to have weak functions defined in different
file as below. Other possible way is to not use weak functions and go back
to ifdef logic.

There are few other usages in kernel that seem to depend on weak (and noinline)
working correctly, which can also potentially break and needs such workarounds.
Example -
mach_reboot_fixups() in arch/x86/kernel/reboot.c is one such call which
is getting inlined with a flavor of gcc 4.1.1.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

---
 drivers/char/Makefile     |    2 +-
 drivers/char/mem.c        |   23 +----------------------
 drivers/char/mem_weak.c   |   25 +++++++++++++++++++++++++
 include/asm-x86/io.h      |    1 -
 include/asm-x86/pgtable.h |    2 --
 include/linux/mmap.h      |   15 +++++++++++++++
 6 files changed, 42 insertions(+), 26 deletions(-)

Index: linux-2.6/drivers/char/mem_weak.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/drivers/char/mem_weak.c	2008-04-29 18:14:10.000000000 -0700
@@ -0,0 +1,25 @@
+
+#include <linux/mmap.h>
+
+void __attribute__((weak)) unxlate_dev_mem_ptr(unsigned long phys, void *addr)
+{
+}
+
+int __attribute__((weak)) phys_mem_access_prot_allowed(struct file *file,
+	unsigned long pfn, unsigned long size, pgprot_t *vma_prot)
+{
+	return 1;
+}
+
+void __attribute__((weak))
+map_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+	/* nothing. architectures can override. */
+}
+
+void __attribute__((weak))
+unmap_devmem(unsigned long pfn, unsigned long len, ...
From: David Miller
Date: Tuesday, April 29, 2008 - 9:28 pm

From: Venki Pallipadi <venkatesh.pallipadi@intel.com>

This sounds like a bug.  And if gcc does multi-file compilation it
can in theory do the same mistake even if you move it to another
file.

We need something more bulletproof here.

Also, we have a macro for marking things weak "__weak" which should
be used here.
--

From: Pallipadi, Venkatesh
Date: Wednesday, April 30, 2008 - 5:49 am

The references here
http://gcc.gnu.org/ml/gcc-bugs/2006-05/msg02801.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27781

seem to suggest that the bug is only with weak definition in the same
file.
So, having them in a different file should be good enough workaround
here.


Yes. Will change that.

Thanks,
Venki
--

From: Tom Rini
Date: Wednesday, April 30, 2008 - 1:15 pm

Yes, that matches my recollection.  some versions of gcc 4.1.0 (and
possibly 4.1.1, it wouldn't be hard to check) could not handle having a
regular and weak function in the same file (I hit this trying to do
some abstraction or another in kgdb, using includes) but it was correct
if in separate files.  I assume blacklisting 4.1.0 is out of the
question :)

-- 
Tom Rini
--

From: Adrian Bunk
Date: Thursday, May 1, 2008 - 2:56 pm

A workaround here is the wrong solution since this isn't the only place 
that suffers from this issue.

We currently give a #warning for 4.1.0.
But not for 4.1.1.
(Accordingto the bug >= 4.1.2 is fixed.)

And a #warning is not enough.

The huge problem is that "empty __weak function in the same file and 
non-weak arch function" has recently become a common pattern with 
several new usages added during this merge window alone.

And the breakages can be very subtle runtime breakages.

I see only the following choices:
- remove __weak and replace all current usages
- move all __weak functions into own files, and ensure that also happens
  for future usages

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Andrew Morton
Date: Thursday, May 1, 2008 - 3:20 pm

On Fri, 2 May 2008 00:56:33 +0300


Can we detect the {0,1}?  __GNUC_EVEN_MORE_MINOR__?

Yes, I guess we should ban 4.1.x.  Ouch.

--

From: Linus Torvalds
Date: Thursday, May 1, 2008 - 3:27 pm

It's __GNUC_PATCHLEVEL__, I believe.

So yes, we can distinguish 4.1.2 (good, and very common) from 4.1.{0,1} 
(bad, and rather uncommon).

And yes, considering that 4.1.1 (and even more so 4.1.0) should be rare to 
begin with, I think it's better to just not support it.

			Linus
--

From: Andrew Morton
Date: Thursday, May 1, 2008 - 3:33 pm

On Thu, 1 May 2008 15:27:26 -0700 (PDT)

Drat.  There go my alpha, i386, m68k, s390, sparc and powerpc
cross-compilers.  Vagard, save me!

Meanwhile I guess I can locally unpatch that patch.


--

From: Tom Rini
Date: Thursday, May 1, 2008 - 4:24 pm

I know I'll come off as an ass, but you can't make new ones with 4.1.2?
It's not like we're talking about gcc 2.95/96 fun here :)

-- 
Tom Rini
--

From: Andrew Morton
Date: Thursday, May 1, 2008 - 4:59 pm

On Thu, 1 May 2008 16:24:47 -0700

Honestly, I nearly died when I built all those cross-compilers.  Sooooooo
many combinations of gcc/binutils/glibc refused to work for obscure
reasons.  Compilation on x86_64 just didn't work at all and I ended up
having to build everything on a slow i386 box, etc, etc.  The stream of
email to Dan got increasingly strident ;)

I think crosstool has become a lot better since then, judging from the ease
with which Jens was able to spin up the powerpc compiler, but the trauma
was a life-long thing.

Vegard has been making noises about (finally!) preparing and maintaining a
decent set of cross-compilers for us.  It would be a great service (begs).

--

From: Justin Mattock
Date: Thursday, May 1, 2008 - 5:21 pm

On Thu, May 1, 2008 at 11:59 PM, Andrew Morton

I hate to jump in like this, but I noticed something while compiling
madwifi as well with
gcc, I'm wondering if this is related to the above:
/home/name/Desktop/madwifi-trunk-r3574-20080426/Makefile:50
Makefile.inc: no such file or directory
even though the files exists,


Makefile.inc98: scripts/get_arch.mk: no such file or directory
Makefile.inc:102 ath_hal/ah_target.inc: no such file or directory
Makefile.inc:163 *** TARGET i386-elf is invalid, invalid target are: . Stop.
make[1] *** [modules] Error 2

After modifying the  Makefiles you're left with ***TARGET i386-elf problem

regards;

-- 
Justin P. Mattock
--

From: Vegard Nossum
Date: Friday, May 2, 2008 - 12:18 am

Hey :)


It's true; crosstool hasn't been much of a trouble (for me anyway). A

Noises? Heh...

So I've only just begun that work. For i386 host we have the following targets:
alpha, arm, ia64, sparc64, sparc, x86_64

though these are largely untested with Linux Kernel. Most of them are
gcc 3.4.5 (this should in theory compile the kernel correctly), though
I found an e-mail from Ingo saying that you need the -tls versions for
stackprotector to work correctly. This might be a good time to ask if
I should be making gcc 4.1.2s instead. I need to recompile in either
case.

In other words, we are NOT fully operational yet :-)

Though H. Peter gave me a directory to stuff it in, you could try it
out if you're impatient:

http://www.kernel.org/pub/tools/crosstool/

(Hm, it seems that all my .asc files disappeared in the transition.. Oh well.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--

From: Theodore Tso
Date: Friday, May 2, 2008 - 6:43 am

Out of curiosity, are you using the stock gcc/binutils from the FSF,
or one of the distro toolchains?  When we were investigating which
compiler/toolchain to use for the LSB Sample Implementation, one of
the comments that I got from folks like Eric Troan from rPath and
others was that out-of-the-box compiler/toolchain had so many bugs,
particularly on non-x86 architectures, that they found they were much
better off using a distro-patched/bug-fixed compiler/toolchain and
treating that as the upstream.

There has been some requests to include cross-compilation
functionality into the LSB Build Environment, so I'd be interested in
seeing how your work has been going, and maybe there's some
opportunity to work together.  Is there a mailing list you have for
this project?

						- Ted
--

From: Adrian Bunk
Date: Friday, May 2, 2008 - 1:10 am

I have my binutils 2.18.50.0.6 / gcc 4.3 based cross toolchains for an 
i386 host here (no glibc included since I'm using them only for kernel 
testcompiles).

If you are using an x86 machine I can rebuild them without -march=k8 
and give you a copy.

cu
Adrian


-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Andi Kleen
Date: Friday, May 2, 2008 - 2:09 am

This might be pointing out the obvious, but to just update the compiler
from 4.1.0 to 4.1.2 (or other) you don't need to recompile binutils and glibc.

-Andi
--

From: Venki Pallipadi
Date: Thursday, May 1, 2008 - 3:35 pm

Not sure whether #error on gcc 4.1.{0.1} is the right thing as I found atleast
one distro gcc which says itself as 4.1.1, do not exhibit the problem as it
most likely has fix backported.

Putting all weak functions in one file is something Suresh and I considered
before sending this patch. But, looking at various users of __weak, that
single file did not look very appropriate.

Thanks,
Venki
--

From: Andrew Morton
Date: Thursday, May 1, 2008 - 3:42 pm

On Thu, 1 May 2008 15:35:15 -0700

Is there some vaguely maintainable workaround we can do?  If the problem
only affects completely-empty weak functions then we could put something in
them to make them non-empty?

#if __GNUC__ == ...
#define weak_function_filler for ( ; ; );
#else
#define weak_function_filler()
#endif

because __weak and attribute((weak)) are pretty easy to grep for. Dunno.

--

From: Jakub Jelinek
Date: Thursday, May 1, 2008 - 3:49 pm

for (;;); isn't enough, the function would be still considered const and by
4.1.0 and some 4.1.1 incorrectly optimized out, without regard to weak
attribute.
But e.g.
asm ("");
should be enough.

	Jakub
--

From: Tom Rini
Date: Thursday, May 1, 2008 - 4:21 pm

On Thu, May 01, 2008 at 03:42:38PM -0700, Andrew Morton wrote:


My memory is a tiny bit hazy (it was a while ago), but no, it's not just
empty functions (again, I _think_ I hit it with a generic vs arch weak
function).

-- 
Tom Rini
--

From: Venki Pallipadi
Date: Thursday, May 1, 2008 - 4:30 pm

Other thing we observed was: this does not just depend on the __weak
function definition. It also depends on where the function is called from.

__weak function with single return statement, did not get inlined when called
from say

caller()
{
	function();
}

but got inlined when called as in

caller()
{
	for (;;) {
		function();
	}
}

Thanks,
Venki

--

From: Linus Torvalds
Date: Thursday, May 1, 2008 - 5:34 pm

Is it always about inlining? If so, can't we add a __noinline__ to the 
declaration of __weak?

Something like this oneliner?

		Linus

---
 include/linux/compiler-gcc.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 5c8351b..4061fc7 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -41,7 +41,7 @@
 
 #define __deprecated			__attribute__((deprecated))
 #define __packed			__attribute__((packed))
-#define __weak				__attribute__((weak))
+#define __weak				__attribute__((weak)) noinline
 #define __naked				__attribute__((naked))
 #define __noreturn			__attribute__((noreturn))
 
--

From: Suresh Siddha
Date: Thursday, May 1, 2008 - 5:39 pm

We tried that and it was still getting inlined.

thanks,
suresh
--

From: Jeremy Fitzhardinge
Date: Friday, May 2, 2008 - 2:11 pm

That's a pity.  I've worked around this bug with noinline before.

    J
--

From: David Miller
Date: Friday, May 2, 2008 - 3:02 pm

From: Jeremy Fitzhardinge <jeremy@goop.org>

It's "constness", as Jakub mentioned, not inlineability, that triggers
this bug.

That's why his workaround of using an empty asm("") to the function is
an effective workaround, because the compiler can no longer internally
decide that the function is "const".
--

From: Tom Rini
Date: Thursday, May 1, 2008 - 4:23 pm

Really?  At the time this was a very uncommon thing (hence the initial
it's not a bug, you just didn't use the right flags) comments.  I
suppose it's possible of course that some distro took a 4.1 snapshot and

Indeed.  I suspect that even if you go so far as to do a single patch
per "feature", it's gonna be a lotta stuff.

-- 
Tom Rini
--

From: David Miller
Date: Thursday, May 1, 2008 - 3:51 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

This is my impression as well.
--

From: Adrian Bunk
Date: Thursday, June 26, 2008 - 3:37 am

Some talk and one and a half months later we still don't abort the build 
for these gcc version.

If anyone has a better patch please step forward - otherwise I'd propose 

cu
Adrian


<--  snip  -->


gcc 4.1.0 and 4.1.1 are known to miscompile the kernel:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27781

Usage of weak functions has become a common pattern in the kernel, and 
usages get added in each kernel version increasing the probability of 
bugs with each kernel release.

This miscompilation of weak functions can result in subtle runtime 
errors.

#error for gcc 4.1.0 and 4.1.1 to prevent users from running into
this bug.

Note:
We already printed a #warning for gcc 4.1.0 due to a different bug.

Signed-off-by: Adrian Bunk <bunk@kernel.org>

---
ee78871a1d85fe60958748c208389adb4031fefe diff --git a/init/main.c b/init/main.c
index f7fb200..bede344 100644
--- a/init/main.c
+++ b/init/main.c
@@ -76,8 +76,9 @@
  * trouble.
  */
 
-#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ == 0
-#warning gcc-4.1.0 is known to miscompile the kernel.  A different compiler version is recommended.
+/*  due to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27781  */
+#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && (__GNUC_PATCHLEVEL__ == 0 || __GNUC_PATCHLEVEL__ == 1)
+#error gcc 4.1.0 and 4.1.1 are known to miscompile the kernel.
 #endif
 
 static int kernel_init(void *);

--

From: Jeremy Fitzhardinge
Date: Friday, May 2, 2008 - 2:09 pm

- make __weak also include noinline.  I think that's sufficient (at 
least it was when I encountered a gcc bug with these symptoms last year 
or so).

    J
--

From: Adrian Bunk
Date: Friday, May 2, 2008 - 2:19 pm

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

Previous thread: Linux 2.6.24.4+mmap with overcommit >= 1 by Sami Farin on Tuesday, April 29, 2008 - 6:19 pm. (4 messages)

Next thread: linux-next: upstream build failure: v4l/dvb by Stephen Rothwell on Tuesday, April 29, 2008 - 6:41 pm. (5 messages)