NOHZ: local_softirq_pending 20

From: Uwe Kleine-König
Date: Friday, November 23, 2007 - 4:24 am

Hello,

my kernel reported:

	NOHZ: local_softirq_pending 20

I cannot interpret it, but it is probably bad, because before commit
bc5393a6c9c0e70b4b43fb2fb63e3315e9a15c8f this used to BUG().

This happened under high load.  So far it has happened only once
and I cannot reproduce it.

The machine is a cc7ucamry running v2.6.24-rc3-36-g4dd2d66[1].
Config and dmesg are available at

	http://www.modarm9.com/~ukleinek/20071123-local_softirq_pending/

Maybe the relevant source file is arch/arm/mach-ns9xxx/time-ns921x.c?


[1] available at
git://www.modarm9.com/gitsrc/pub/people/ukleinek/linux-2.6.git
	


-- 
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962
-

From: Thomas Gleixner
Date: Friday, November 23, 2007 - 5:30 am

On Fri, 23 Nov 2007, Uwe Kleine-König wrote:

From: Uwe Kleine-König
Date: Friday, November 23, 2007 - 7:33 am

Hello,

I suppose you're right, and it's only me who fails to see that.  Just
from looking at the code, I'd say TASKLET_SOFTIRQ is 5.  Ah, OK, I see,
Is there something I can do to be able to report more if it reoccurs?
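
For reference, the reported value can be decoded as sketched below.  This
assumes the number in the message is printed in hexadecimal, so that 20
means 0x20 = 1 << 5, i.e. only softirq number 5 is pending.  Only
TASKLET_SOFTIRQ = 5 is confirmed in this thread; the other names in the
table are an assumption about the softirq numbering of a kernel of this
era, and decode_softirq_pending() is a hypothetical userspace helper, not
a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed softirq numbering (bit position -> name).  Only
 * TASKLET == 5 is taken from the thread; the rest is an assumption.
 */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX",
	"BLOCK", "TASKLET", "SCHED", "HRTIMER",
};

#define NR_NAMES (sizeof(softirq_names) / sizeof(softirq_names[0]))

/*
 * Write the names of all set bits in 'pending' into 'buf', separated
 * by single spaces.  Returns the number of names written.  Names that
 * do not fit into 'buf' are dropped.
 */
static int decode_softirq_pending(unsigned int pending, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;
	unsigned int bit;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (bit = 0; bit < NR_NAMES; bit++) {
		const char *name = softirq_names[bit];
		size_t need;

		if (!(pending & (1u << bit)))
			continue;

		/* name, plus a separator if this is not the first entry */
		need = strlen(name) + (count ? 1 : 0);
		if (used + need >= len)
			break;	/* no room left for name and trailing NUL */

		if (count)
			buf[used++] = ' ';
		memcpy(buf + used, name, strlen(name));
		used += strlen(name);
		buf[used] = '\0';
		count++;
	}
	return count;
}
```

Calling decode_softirq_pending(0x20, buf, sizeof(buf)) with a sufficiently
large buffer leaves "TASKLET" in buf, which matches the reading that a
tasklet was still pending when the tick was about to be stopped.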

Can you isolate the problem?  Does it have to do with the arch-specific
timing code?  With the hardware?

Best regards and thanks
Uwe

-- 
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962
-

From: Uwe Kleine-König
Date: Saturday, February 2, 2008 - 4:45 am

Hello,

Back then I added some debug code to tick_nohz_stop_sched_tick to get
some more information if this happened again.  It has just happened
again, and this is what I saw:

	- tick_nohz_stop_sched_tick was called from irq_exit
	  Actually this didn't surprise me, because
	  tick_nohz_stop_sched_tick is only called at two places, namely
	  irq_exit and cpu_idle.  And I cannot see how
	  local_softirq_pending() != 0 can happen in the latter (without
	  first happening in irq_exit maybe).

	- it happened three times in a row at the following times:

		[ 1593.470000] NOHZ: (c003a3ac) local_softirq_pending 20
		[ 1593.470000] Tasklet state=1, func=c0046248, data=0
		[ 1593.920000] NOHZ: (c003a3ac) local_softirq_pending 20
		[ 1593.920000] Tasklet state=1, func=c0046248, data=0
		[ 1594.980000] NOHZ: (c003a3ac) local_softirq_pending 20
		[ 1594.980000] Tasklet state=1, func=c0046248, data=0

	  (c003a3ac = irq_exit+0x24/0x94)

	- There was a single tasklet in __get_cpu_var(tasklet_vec).list:

		state = 1
		func = rcu_process_callbacks (= c0046248)
		data = 0

	- directly afterwards the oom-killer started killing tasks
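
The debug code itself is not part of this mail.  As a rough illustration
of how the "Tasklet state=..." lines can be read, here is a userspace
model of walking such a list.  struct tasklet_model and the helper names
are hypothetical stand-ins, not the kernel's definitions, and the function
address is kept as a plain number.  The one kernel fact relied on is that
bit 0 of the state word means "scheduled" and bit 1 means "running", so
state=1 describes a tasklet that was queued but has not run yet.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for one entry of a per-CPU tasklet list. */
struct tasklet_model {
	struct tasklet_model *next;
	unsigned long state;		/* bit 0: scheduled, bit 1: running */
	unsigned long func_addr;	/* address of the handler, as a number */
	unsigned long data;
};

#define MODEL_STATE_SCHED	(1UL << 0)
#define MODEL_STATE_RUN		(1UL << 1)

/* True if the tasklet is queued for execution. */
static int tasklet_is_scheduled(const struct tasklet_model *t)
{
	return (t->state & MODEL_STATE_SCHED) != 0;
}

/* True if the tasklet is currently executing. */
static int tasklet_is_running(const struct tasklet_model *t)
{
	return (t->state & MODEL_STATE_RUN) != 0;
}

/* Format one entry in the style of the log lines quoted above. */
static int format_tasklet(const struct tasklet_model *t, char *buf, size_t len)
{
	return snprintf(buf, len, "Tasklet state=%lx, func=%lx, data=%lx",
			t->state, t->func_addr, t->data);
}

/* Walk the list, print every entry to 'out', return the entry count. */
static int dump_tasklets(const struct tasklet_model *head, FILE *out)
{
	const struct tasklet_model *t;
	char line[96];
	int count = 0;

	for (t = head; t != NULL; t = t->next) {
		format_tasklet(t, line, sizeof(line));
		fprintf(out, "%s\n", line);
		count++;
	}
	return count;
}
```

With a single entry whose state is 1, func_addr is 0xc0046248 and data is
0, format_tasklet() reproduces the line seen in the log, and
tasklet_is_scheduled() is true while tasklet_is_running() is false.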

I think the only user of RCU in my kernel is the networking code.  Does
this help anyone to further debug my problem here?

Best regards
Uwe

-- 
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962
--
