Hello,
Back then I added some debug code to tick_nohz_stop_sched_tick to get
more information the next time this happens. It has just happened
again, and this is what I saw:
- tick_nohz_stop_sched_tick was called from irq_exit
  Actually this didn't surprise me, because
  tick_nohz_stop_sched_tick is only called in two places, namely
  irq_exit and cpu_idle. And I cannot see how
  local_softirq_pending() != 0 can happen in the latter (except
  perhaps after first happening in irq_exit).
- it happened three times in a row at the following times:
    [ 1593.470000] NOHZ: (c003a3ac) local_softirq_pending 20
    [ 1593.470000] Tasklet state=1, func=c0046248, data=0
    [ 1593.920000] NOHZ: (c003a3ac) local_softirq_pending 20
    [ 1593.920000] Tasklet state=1, func=c0046248, data=0
    [ 1594.980000] NOHZ: (c003a3ac) local_softirq_pending 20
    [ 1594.980000] Tasklet state=1, func=c0046248, data=0
  (c003a3ac = irq_exit+0x24/0x94)
- There was a single tasklet in __get_cpu_var(tasklet_vec).list:
    state = 1
    func = rcu_process_callbacks (= c0046248)
    data = 0
- directly afterwards the oom-killer started killing tasks
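
For anyone reading along, here is a minimal sketch of how the pending
mask above can be decoded by hand. It is not the debug code I added to
the kernel, just a standalone helper. It assumes that the value is
printed in hex (as in the stock NOHZ message) and that the softirq
numbering is HI, TIMER, NET_TX, NET_RX, BLOCK, TASKLET, which varies
between kernel versions, so please check include/linux/interrupt.h in
your tree. The names softirq_name and decode_softirq_pending are made
up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * ASSUMPTION: softirq numbering of a 2.6.2x-era kernel. Compare with
 * include/linux/interrupt.h of the kernel you are actually debugging.
 */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "TASKLET",
};

#define NR_KNOWN_SOFTIRQS (sizeof(softirq_names) / sizeof(softirq_names[0]))

/* Hypothetical helper: name of softirq number nr, or "unknown". */
static const char *softirq_name(unsigned int nr)
{
	return nr < NR_KNOWN_SOFTIRQS ? softirq_names[nr] : "unknown";
}

/*
 * Hypothetical helper: write the names of all bits set in pending into
 * buf, separated by spaces. The output is cut short if buf is too small.
 */
static void decode_softirq_pending(unsigned int pending, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int nr;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (nr = 0; nr < 32; nr++) {
		int n;

		if (!(pending & (1u << nr)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", softirq_name(nr));
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full, stop here */
		used += (size_t)n;
	}
}
```

Under these assumptions decode_softirq_pending(0x20, buf, sizeof(buf))
yields "TASKLET", which would fit the single pending tasklet listed
above.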
I think the only user of RCU in my kernel is the networking code. Does
this help anyone to debug my problem further?
Best regards
Uwe
--
Uwe Kleine-König, Software Engineer
Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany
Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962