Re: [PATCH] debug: fix BUILD_BUG_ON() for non-constant expressions

Previous thread: [PATCH 0/5] merge io_apic_xx.c -- fix by Yinghai Lu on Saturday, August 16, 2008 - 3:07 am. (12 messages)

Next thread: SCHED_FIFO and SCHED_RR broken by cfs by Stefani Seibold on Saturday, August 16, 2008 - 2:55 am. (17 messages)
From: Alexey Dobriyan
Date: Saturday, August 16, 2008 - 3:09 am

BUILD_BUG_ON should have never existed -- BUG_ON could upgrade itself to
compile-breaking version if compiler has enough information and this is
what patch does.

The only downside is that one can't write BUG_ON(1) anymore.

Note: this compile-breaks jbd, jbd2, tipc,
      what this code is supposed to do?

	journal = handle->h_transaction->t_journal;
	if (!journal_set_features(journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)) {
		J_ASSERT (!"Cannot set revoke feature!");
			 ^^^^
		return -EINVAL;
        }

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 arch/cris/arch-v10/kernel/io_interface_mux.c |    8 ++++----
 arch/powerpc/include/asm/bug.h               |    3 +--
 drivers/infiniband/core/cma.c                |    2 +-
 drivers/infiniband/core/mad.c                |    2 +-
 drivers/infiniband/hw/amso1100/c2_ae.c       |    2 +-
 drivers/infiniband/hw/cxgb3/cxio_hal.c       |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_cm.c        |    6 +++---
 drivers/mmc/host/wbsd.c                      |    2 +-
 drivers/net/wireless/b43legacy/b43legacy.h   |    2 ++
 drivers/net/wireless/b43legacy/main.c        |   12 ++++++------
 drivers/net/wireless/b43legacy/phy.c         |    2 +-
 drivers/net/wireless/b43legacy/radio.c       |   22 +++++++++++-----------
 drivers/net/wireless/b43legacy/xmit.c        |   12 ++++++------
 drivers/scsi/device_handler/scsi_dh_emc.c    |    2 +-
 drivers/ssb/scan.c                           |    4 ++--
 drivers/ssb/ssb_private.h                    |    2 ++
 fs/jbd2/journal.c                            |    2 +-
 include/asm-generic/bug.h                    |    8 +++++++-
 include/asm-mips/bug.h                       |    5 ++++-
 kernel/sched.c                               |    2 +-
 net/bluetooth/rfcomm/tty.c                   |    2 +-
 21 files changed, 58 insertions(+), 46 deletions(-)

--- a/arch/cris/arch-v10/kernel/io_interface_mux.c
+++ b/arch/cris/arch-v10/kernel/io_interface_mux.c
@@ -650,7 +650,7 @@ int ...
From: Rusty Russell
Date: Saturday, August 16, 2008 - 3:55 am

Interesting idea, but I've come to actually like the semantic explicitness of 
BUILD_BUG_ON.  There's a difference between "we should never get here" 
and "this should never exist".

But maybe I just like it because we have it.  At very least BUILD_BUG_ON 
should definitely compile-barf on a non-constant expr, and vice versa for 
BUG_ON().

Note that BUG_ON() is a hack caused by lack of attribute((cold)).  "if (x) 
BUG()" is clearer, and possible in the long run as people upgrade compilers.

Cheers,
Rusty.
--

From: Linus Torvalds
Date: Saturday, August 16, 2008 - 1:07 pm

Agreed. I think Alexey's patch is broken.

The thing is, BUILD_BUG_ON() is a different thing. It says "this is a 
build error", while BUG_ON() says "this is an error if we reach it".

Very different.

The fact that you broke BUG_ON(1) should have made you think. Sometimes 
the "1" isn't necessarily a constant one. It might be

	if (something_that_can_never_happen_in_some_configuration) {
		...
		BUG_ON(CONFIG_XYZZY);
		...
	}

where the BUG_ON(1) is absolutely *not* the same thing as BUILD_BUG_ON().

			Linus
--

From: Ingo Molnar
Date: Sunday, August 17, 2008 - 3:32 am

agreed.

There's one aspect of BUILD_BUG_ON() that is quite dangerous though: it 
does not 'upgrade' into a runtime check if an expression is not 
constant. And it does not warn either. So BUILD_BUG_ON() can degrade 
into a no-op very silently, and that is inherently dangerous.

That aspect bit me once: i added a BUILD_BUG_ON() under the assumption 
that it would catch a mis-sized virtual memory sizing detail in 
arch/x86/, but it just remained silent.

To fix these problems i've added the two commits below to tip/core/debug 
[one to extend BUILD_BUG_ON, one to clean up its location] - any 
objections against that direction? I've started testing it through to 
make sure we dont have any stale non-constant BUILD_BUG_ON() instances 
around.

( Note, i have not changed BUILD_BUG_ON_ZERO() because that is used in 
  structure initializers so no comma expression can be used in them. 
  Such structure initializers wont allow non-constant expressions 
  anyway, so there's not much extra value in checking for that. )

( Note #2, BUILD_BUG_ON() had to remain a macro, so that
  __builtin_constant_expression_p() can do its work. )

	Ingo

From f5b5d41dd51a31fe70e3a04fb80a3b90b84c6a4e Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Sun, 17 Aug 2008 11:58:58 +0200
Subject: [PATCH] debug: fix BUILD_BUG_ON() for non-constant expressions
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

constant expressions get detected at build time via:

  kernel/sched.c: In function ‘test':
  kernel/sched.c:9187: error: size of array ‘type name' is negative
  make[1]: *** [kernel/sched.o] Error 1

but non-constant expressions (for example BUILD_BUG_ON(variable)) simply
get discarded by the compiler - turning BUILD_BUG_ON() into a dangerous
construct.

So add another layer at the link level to detect such mishaps:

  kernel/built-in.o: In function `test':
  : undefined reference to `__BUILD_BUG_ON_non_constant'

Signed-off-by: ...
From: Linus Torvalds
Date: Sunday, August 17, 2008 - 9:56 am

Gag me now.

Why not just do

 #define __BBO(c)                sizeof(const char[1 - 2*!!(c)])
 #define __BBONC(c)              __BBO(!__builtin_constant_p(c))
 #define BUILD_BUG_ON_ZERO(c)    (__BBO(c) - __BBONC(c))
 #define BUILD_BUG_ON(c)         (void)BUILD_BUG_ON_ZERO(c)

and be done with it?

The rules are trivial:

 - __BBO() causes a warning if 'c' is a constant non-zero, returns a
   (size_t) 1 otherwise

 - __BBONC() causes a warning if 'c' is not a constant, returns a
   (size_t) 1 otherwise

And then BUILD_BUG_ON[_ZERO] are totally _trivial_ on top of those.

Yeah, yeah, I didn't test this, but it looks a hell of a lot simpler, and 
gives the warning about non-constant issues at compile time with line 
numbers rather than at link-time.

		Linus
--

From: Ingo Molnar
Date: Sunday, August 17, 2008 - 10:33 am

yeah, i first tried a few variants of that (compile-time warnings are 
much better than link time warnings), but none worked when i tested 
various failure modes.

try the patch below - it only gives this error during build:

 kernel/sched.c: In function ‘test':
 kernel/sched.c:9193: error: size of array ‘type name' is negative
 make[1]: *** [kernel/sched.o] Error 1

it doesnt notice the error on the next line. I suspect this is because 
__builtin_constant_p() is special (or broken). I tried it with gcc 4.2.3 
and 4.3.0.

	Ingo

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -9181,3 +9181,16 @@ struct cgroup_subsys cpuacct_subsys = {
 	.subsys_id = cpuacct_subsys_id,
 };
 #endif	/* CONFIG_CGROUP_CPUACCT */
+
+#define __BBO(c)                sizeof(const char[1 - 2*!!(c)])
+#define __BBONC(c)              __BBO(!__builtin_constant_p(c))
+#define BUILD_BUG_ON_ZERO2(c)    (__BBO(c) - __BBONC(c))
+#define BUILD_BUG_ON2(c)         (void)BUILD_BUG_ON_ZERO(c)
+
+void test(void)
+{
+	BUILD_BUG_ON2(0);
+	BUILD_BUG_ON2(1);
+	BUILD_BUG_ON2(panic_timeout);
+}
+
--

From: Ingo Molnar
Date: Sunday, August 17, 2008 - 10:53 am

^^^^^^^^^^^^^^^^^
Lars Gullik Bjønnes pointed out that this should have been ZERO2 not 
ZERO - but same end result.

	Ingo
--

From: Linus Torvalds
Date: Sunday, August 17, 2008 - 11:39 am

Look at the #define of BUILD_BUG_ON2 a bit more.

Hint: you're using the _wrong_ BUILD_BUG_ON_ZERO. The old one, not the v2 
one!

That said, with that fixed, there's still something wrong. It does seem 
like gcc has some very odd interaction there with __builtin_constant_p. 
Odd.

			Linus
--

From: Ingo Molnar
Date: Sunday, August 17, 2008 - 11:45 am

yeah, i already tried various variants earlier today so i really didnt try

yeah. I tried various integer arithmetic expressions (which the array trick
relies on) and it didnt work as expected - it's always zero. It only makes
a difference when used in comparisons. (and that's where the kernel uses
__builtin_constant_p quite heavily, and it works fine there.)

Odd indeed.

	Ingo
--

From: Rusty Russell
Date: Sunday, August 17, 2008 - 6:09 pm

Hey, I thought I was the "undisputed ruler of Ugly-land".

How about this instead:

#define BUILD_BUG_ON(condition)						\
do {									\
	static struct { char arr[1 - 2*!!(condition)]; } x __maybe_unused;	\
} while(0)

Cheers,
Rusty.
--

From: Ingo Molnar
Date: Monday, August 18, 2008 - 12:54 am

hm, have you tried it and do we get a severe enough link error about 
that? If the macro gets ignored by the compiler that's really a hard 
error - such things are essential safeguards of kernel sanity:

	/*
	 * Build-time sanity checks on the kernel image and module
	 * area mappings. (these are purely build-time and produce no code)
	 */
	BUILD_BUG_ON(MODULES_VADDR < KERNEL_IMAGE_START);
	BUILD_BUG_ON(MODULES_VADDR-KERNEL_IMAGE_START < KERNEL_IMAGE_SIZE);
	BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
	BUILD_BUG_ON((KERNEL_IMAGE_START & ~PMD_MASK) != 0);
	BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
	BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
				(__START_KERNEL & PGDIR_MASK)));
	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);

( and propagating them into runtime failures not only increases bloat,
  it also makes failures harder to debug. These checks 'run' _early_. )

Link time warnings are easy enough to miss.

So unless there's a better way of doing it all at compile time (i'd 
really prefer that!) i'd prefer the link time error about botched 
BUILD_BUG_ON() conditions - as my commits introduce.

	Ingo
--

From: Boaz Harrosh
Date: Monday, August 18, 2008 - 2:55 am

#define BUILD_BUG_ON(condition)						\
do {
	enum { bad = !!(condition)}; 									\
	static struct { char arr[1 - 2*bad]; } x __maybe_unused;	\
} while(0)

the enum definition will not let in anything not compile-time constant.
But then I fail on: (include/linux/virtio_config.h:99)

	if (__builtin_constant_p(fbit))
		BUILD_BUG_ON(fbit >= 32);

is that code broken?

Boaz
--

From: Boaz Harrosh
Date: Monday, August 18, 2008 - 5:32 am

with this patch:
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index aaa998f..91d98f2 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -468,7 +468,12 @@ struct sysinfo {
 };
 
 /* Force a compilation error if condition is true */
-#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+// #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+#define BUILD_BUG_ON(condition)					\
+do { 								\
+	enum { bad = !!(condition)}; 				\
+	static struct { char arr[1 - 2*bad]; } x __maybe_unused;\
+} while(0)
 
 /* Force a compilation error if condition is true, but also produce a
    result (of value 0 and type size_t), so the expression can be used


I found another BUILD_BUG_ON() none const bug, here:

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index e4765b7..1469a38 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -5133,9 +5133,9 @@ static void niu_init_tx_bmac(struct niu *np, u64 min, u64 max)
 
 static void niu_init_tx_mac(struct niu *np)
 {
-	u64 min, max;
+	u64 max;
 
-	min = 64;
+	enum { min = 64 } ;
 	if (np->dev->mtu > ETH_DATA_LEN)
 		max = 9216;
 	else

--

From: Ingo Molnar
Date: Tuesday, August 19, 2008 - 6:34 am

hmm ... that's a bit sad, gcc ought to have been able to figure this 
out. Can this be fixed somehow, without losing the strength of the 
checking here? I think we should not change BUILD_BUG_ON() to make it 
less useful.

	Ingo
--

From: Boaz Harrosh
Date: Tuesday, August 19, 2008 - 9:33 am

The BUILD_BUG_ON I proposed is not less useful. Actually with current
BUILD_BUG_ON above code does nothing, since fbit is a function parameter
it will translate to nothing. Certainly it will not check the size of passed
parameter, since that was converted on the stack to unsigned int. So my
definition will catch the bad programing here.

If the user of virtio_has_feature() must pass a compile-time constant then
it must be converted to a MACRO, and then the BUILD_BUG_ON will work.
Or it should be changed to a BUG_ON() if fbit is a runtime variable.

From what I understood this is what you wanted, some way to make sure
a BUILD_BUG_ON will check that the check is actually useful (compile-time)
and is not silently ignored.

Same thing at the other place I sent.

Boaz
--

From: Ingo Molnar
Date: Wednesday, August 20, 2008 - 3:59 am

well, that's the question i'm asking: that sort of proposed 
BUILD_BUG_ON() variantcannot be used in inline functions like 
virtio_has_feature() does. If we get forced back to macros that's not an 
improvement.

Maybe the link-time last-line-of-defense mechanism i posed is the most 
flexible one perhaps after all? (it's ugly too but none of this is 
particularly pretty)

hm?

	Ingo
--

From: Boaz Harrosh
Date: Wednesday, August 20, 2008 - 5:31 am

The use of __builtin_constant_p in inline functions is broken. This
is because it will give different results depending on the -O level
used. So I think that using it in the Kernel with inlines is plain
broken. And should be discouraged.

That said, my trick with enum is exactly the same as __builtin_constant_p
when -O is off, that is, does not traverse inline. But it is consistent

I think it is an improvement, in a sense that now we think something is happening
but get silently ignored if compilation conditions are different, and/or the programmer
had a mistake. The new way will show us what code will be produced in the worse 

The link-time gives the same results. Only warns at link time instead of
compile time. The difference between our approaches is the use of
__builtin_constant_p which is suppose to work cross inline stack boundary,

Here is gcc documentation about __builtin_constant_p:

— Built-in Function: int __builtin_constant_p (exp)

    You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the -O option.

    You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example:

              #define Scale_Value(X)      \
                (__builtin_constant_p (X) \
                ? ((X) * SCALE + OFFSET) : Scale (X))
         

    You may use this built-in function in either a macro or an inline ...
From: adobriyan
Date: Wednesday, August 20, 2008 - 5:39 am

See how kmalloc() works.

--

From: Boaz Harrosh
Date: Wednesday, August 20, 2008 - 6:07 am

Right:

static inline void *kmalloc(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size)) {
	 	... do some stuff
	else
		return __kmalloc_xxx(..)
}

So if optimization is on we might gain some cycles
but we must make sure we have the proper else clause.

In any way virtio_has_feature() is broken and no amount
of optimization may ever fix it. Please look at it:

static inline bool virtio_has_feature(const struct virtio_device *vdev,
				      unsigned int fbit)
{
	/* Did you forget to fix assumptions on max features? */
	if (__builtin_constant_p(fbit))
		BUILD_BUG_ON(sizeof(fbit) >= 32);


the __builtin_constant_p(fbit) will never help.
sizeof(fbit) is always sizeof(unsigned int) in any case. Only
a macro can help here. Sorry

Boaz

--

From: Ingo Molnar
Date: Thursday, August 21, 2008 - 5:17 am

ok - that's convincing, so your approach is superior to my hack on 
multiple levels.

Could you please (re-)send your patch(es) with a changelog, etc., so 
that i can apply it and drop my patches?

Thanks,

	Ingo
--

From: Rusty Russell
Date: Sunday, August 24, 2008 - 6:19 pm

I was (ab)using the current slackness of BUILD_BUG_ON() to be a noop on 
non-constants.  We can replace with a BUG_ON if we fix BUILD_BUG_ON() to 
insist on a compile-time constant; I think that's still an overall win.

Cheers,
Rusty.
--

From: Boaz Harrosh
Date: Wednesday, August 20, 2008 - 6:21 am

Ingo Hi!

I got bitten by this duplication too. Please submit an equivalent patch
for today's code. That could be nice.

Then later we can take care of BUILD_BUG_ON which ever variant.

Thanks
Boaz
--

From: Andrew Morton
Date: Saturday, August 16, 2008 - 10:46 am

lol.  It's been there since I merged ext3 in 2.4.15.  Probably it was
in sct's ext3 patches in the RH kernel.

Don't change it - it might be important!
--

From: Theodore Tso
Date: Sunday, August 17, 2008 - 5:19 am

Heh!  Well, it does the right thing, and doesn't take any extra text
space assuming a vaguely competent C compiler optimizer.  :-)

I'm pretty sure that back in the 2.4 days, we didn't have BUG_ON.  We
should do a s/J_ASSERT/BUG_ON/g pass over all of fs/jbd and fs/jbd2.
I'll submit patches for application when the 2.6.27 merge window opens
up --- or is this an obvious enough and safe enough transformation
that it will get accepted mainline at this point?

     	     	 	  	      	   - Ted
--

From: Andrew Morton
Date: Sunday, August 17, 2008 - 9:33 am

May as well get it over and done with.

We presently have a mix of J_ASSERT, J_ASSERT_JH and J_ASSERT_BH.  That
will become BUG_ON, J_ASSERT_JH and J_ASSERT_BH.  Which a is slightly
unpleasing loss of consistency but whatever.


--

Previous thread: [PATCH 0/5] merge io_apic_xx.c -- fix by Yinghai Lu on Saturday, August 16, 2008 - 3:07 am. (12 messages)

Next thread: SCHED_FIFO and SCHED_RR broken by cfs by Stefani Seibold on Saturday, August 16, 2008 - 2:55 am. (17 messages)