Re: linux-next: PowerPC WARN_ON_ONCE() after merge of the final tree (tip related)

Previous thread: useless node/has_cpu sysfs attribute by Andreas Herrmann on Wednesday, April 14, 2010 - 11:12 pm. (4 messages)

Next thread: [Question] integrating KDB with linux kernel by santosh on Wednesday, April 14, 2010 - 11:22 pm. (3 messages)
From: Stephen Rothwell
Date: Wednesday, April 14, 2010 - 11:12 pm

Hi all,

Yesterday's (and today's) linux-next boot (PowerPC) failed like this:

------------[ cut here ]------------
Badness at kernel/lockdep.c:2301
NIP: c0000000000a35c8 LR: c0000000000084c4 CTR: 0000000000000000
REGS: c000000000bf77e0 TRAP: 0700   Not tainted  (2.6.34-rc4-autokern1)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 24000044  XER: 00000004
TASK = c000000000aa3d30[0] 'swapper' THREAD: c000000000bf4000 CPU: 0
GPR00: 0000000000000001 c000000000bf7a60 c000000000bf32f0 c0000000000084c4 
GPR04: 0000000000000000 0000000000000a00 0000000000000000 0000000000000068 
GPR08: 0000000000000008 c000000000c4fabe 0000000000000000 7265677368657265 
GPR12: 8000000000009032 c000000007691000 0000000001c00000 c000000000770bf8 
GPR16: c00000000076f390 0000000000000000 0000000000430000 00000000024876f0 
GPR20: c000000000887480 0000000002487480 c0000000008876f0 0000000001b5f8d0 
GPR24: c000000000770478 0000000003300000 c000000000c1f1c8 c000000000884610 
GPR28: c000000000c1b290 c0000000000084c4 c000000000b45068 c000000000aa3d30 
NIP [c0000000000a35c8] .trace_hardirqs_on_caller+0xb0/0x224
LR [c0000000000084c4] system_call_common+0xc4/0x114
Call Trace:
[c000000000bf7a60] [c000000000bf7ba0] init_thread_union+0x3ba0/0x4000 (unreliable)
[c000000000bf7af0] [c0000000000084c4] system_call_common+0xc4/0x114
--- Exception: c01 at .kernel_thread+0x28/0x70
    LR = .rest_init+0x34/0xf8
[c000000000bf7de0] [c00000000086916c] .proc_sys_init+0x20/0x64 (unreliable)
[c000000000bf7e50] [c0000000000099c0] .rest_init+0x20/0xf8
[c000000000bf7ee0] [c000000000848af0] .start_kernel+0x484/0x4a8
[c000000000bf7f90] [c0000000000083c0] .start_here_common+0x1c/0x5c
Instruction dump:
409e0188 0fe00000 48000180 801f08d8 2f800000 41be0050 880d01da 2fa00000 
41be0028 e93e8538 88090000 68000001 <0b000000> 2fa00000 41be0010 e93e8538 
------------[ cut here ]------------

Caused by commit bd6d29c25bb1a24a4c160ec5de43e0004e01f72b ("lockstat:
Make lockstat counting per cpu").  This added a WARN_ON_ONCE to
debug_atomic_inc() ...
From: Ingo Molnar
Date: Wednesday, April 14, 2010 - 11:49 pm

Ok, we'll fix the warning.

Btw., WARN_ON trapping on PowerPC is clearly a PowerPC bug - there's a good 
reason we have WARN_ON versus BUG_ON - it should be fixed.

	Ingo
--

From: David Miller
Date: Wednesday, April 14, 2010 - 11:55 pm

From: Ingo Molnar <mingo@elte.hu>

I disagree, an implementation should be allowed to use the most
efficient implementation possible for both interfaces.

I would be using traps for both on sparc64 if that were really
feasible on sparc64 (and actually with gcc-4.5's "asm goto" it might
actually be now)

The WARN and BUG macros, when implemented without traps, have serious
implications for overall code size and register pressure.
--

From: Ingo Molnar
Date: Thursday, April 15, 2010 - 12:32 am

It trades robustness for slightly better space/code efficiency.

Such a trap based mechanism exists on x86 as well and we use it for BUG_ON(). 
We intentionally dont use it to generate warnings and dont override __WARN(), 
because it would blow up way too often when a warning triggers in some 
sensitive codepath that cannot take a trap.

Anyway, the warning obviously has to be fixed - but the boot crash itself is 
PowerPC's own doing.

	Ingo
--

From: Benjamin Herrenschmidt
Date: Thursday, April 15, 2010 - 6:51 pm

Well, yes and no, as I explained in a separate branch of that thread. We
indeed can't cope with a WARN in that spot because it goes recursive. 

Now the reason we have this double-enable is due afaik to the way I
implemented IRQ trace, because things like syscalls basically
force-enable IRQs on powerpc and I don't necessarily have tracking
informations in the exception return path of what the "old" value was.

I need to double check what the exact scenario here is and whether I can
fix it but it's one of those cases where what lockdep is warning about
isn't actually an error I believe.

Cheers,
Ben. 

--

From: Benjamin Herrenschmidt
Date: Thursday, April 15, 2010 - 6:56 pm

Right, and I don't think the reason why we have WARN_ON vs. BUG_ON ever
had anything to do with whether it's implemented with a trap or not :-) 

It's purely related to whether it's supposed to be fatal or not. Now,
there is indeed the potential problem you mention of WARN_ON being
called in places where such a trap is unsafe, but so far, this is the
first time I can remember we hit that problem so we've been getting away
with it for quite a while :-)

Now, whether the trap is or is not more efficient than an explicit test
is something that is still being debated on powerpc. It has the
advantage of not un-leafing functions (and thus not creating stack
frames, adding register reloads, etc... when not needed), basically
putting the burden of saving/restoring registers to the (hopefully) rare
path of actually taking the WARN/BUG.

We could probably manufacture something similar with careful use of
inline asm and an out of line asm trampoline.

The benefit of the trap instruction vs. conditional branches per-se is
probably nil. It's really more about the codegen impact, register
clobber due to the added function call, etc.. at least for us.

Cheers,


--

From: Stephen Rothwell
Date: Thursday, April 15, 2010 - 3:04 am

Hi Ingo,


Thanks.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 6:37 am

When a path restore the flags while irqs are already enabled, we
update the per cpu var redundant_hardirqs_on in a racy fashion
and debug_atomic_inc() warns about this situation.

In this particular case, we need to explictly disable the irqs before
updating this stat var in order to update it safely.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 kernel/lockdep.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 78325f8..65d4336 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2298,7 +2298,11 @@ void trace_hardirqs_on_caller(unsigned long ip)
 		return;
 
 	if (unlikely(curr->hardirqs_enabled)) {
+		unsigned long flags;
+
+		raw_local_irq_save(flags);
 		debug_atomic_inc(redundant_hardirqs_on);
+		raw_local_irq_restore(flags);
 		return;
 	}
 	/* we'll do an OFF -> ON transition: */
-- 
1.6.2.3

--

From: Stephen Rothwell
Date: Tuesday, April 27, 2010 - 12:32 am

Hi Frederic,


Ping?  Any movement on this?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Frederic Weisbecker
Date: Tuesday, April 27, 2010 - 11:07 am

Ingo, if you have no problems with these two patches, could you please
apply them?

Thanks.

--

From: Stephen Rothwell
Date: Wednesday, April 28, 2010 - 12:12 am

Hi Frederic,


I applied this patch to linux-next today and it fixes my PowerPC boot
problems, so I will keep it in linux-next until it gets applied to the
tip tree or something better comes along.

Thanks.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 6:00 am

In this case, I guess the following fix should be sufficient?
I'm going to test it and provide a sane changelog.


diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 78325f8..65d4336 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2298,7 +2298,11 @@ void trace_hardirqs_on_caller(unsigned long ip)
 		return;
 
 	if (unlikely(curr->hardirqs_enabled)) {
+		unsigned long flags;
+
+		raw_local_irq_save(flags);
 		debug_atomic_inc(redundant_hardirqs_on);
+		raw_local_irq_restore(flags);
 		return;
 	}
 	/* we'll do an OFF -> ON transition: */

--

From: Ingo Molnar
Date: Thursday, April 15, 2010 - 7:03 am

that looks rather ugly. Why not do a raw:

	this_cpu_inc(lockdep_stats.redundant_hardirqs_on);

which basically open-codes debug_atomic_inc(), but without the warning?

Btw., using the this_cpu() methods might result in faster code for all the 
debug_atomic_inc() macros as well?

	Ingo
--

From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 10:15 am

Because that would open a race against interrupts that might
touch lockdep_stats.redundant_hardirqs_on too.

If you think it's not very important (this race must be pretty rare I guess),


Indeed, will change that too.

Thanks.

--

From: Peter Zijlstra
Date: Friday, April 16, 2010 - 3:38 am

How so, its a pure per-cpu variable right? so either the increment
happens before the interrupts hits, or after, in either case there
should not be a race with interrupts.
--

From: Frederic Weisbecker
Date: Friday, April 16, 2010 - 5:32 am

In x86 yeah, I guess the compiler simply loads the address
and does an inc directly, which is atomic wrt interrupts.

But what about another arch that would need an intermediate
load of the value:

load val, reg
add reg, 1
		<---interrupt here
store reg, val

--

From: Frederic Weisbecker
Date: Thursday, April 15, 2010 - 10:24 am

There is also no guarantee we are in a non-preemptable section. We can then
also race against another cpu.

I'm not sure what to do.

--

From: Ingo Molnar
Date: Thursday, April 15, 2010 - 10:39 am

it's a statistics counter so worst-case we lose a count. It's not a real issue 
- but might be worth adding a comment.

	Ingo
--

From: Benjamin Herrenschmidt
Date: Wednesday, April 14, 2010 - 11:48 pm

I think this is because our syscall entry pretty much force-enable irqs.

I remember deciding back then that getting lockdep balanced in that area
was tricky and I didn't do it to avoid adding more overhead to the
syscall path but I suppose I could revisit if necessary.

Cheers,
Ben.

--

Previous thread: useless node/has_cpu sysfs attribute by Andreas Herrmann on Wednesday, April 14, 2010 - 11:12 pm. (4 messages)

Next thread: [Question] integrating KDB with linux kernel by santosh on Wednesday, April 14, 2010 - 11:22 pm. (3 messages)