Got this on today's git (2.6.24-rc4) while compiling stuff - Looks like it is related to CpuIdle stuff. I chose CONFIG_CPU_IDLE for the first time so I don't know when this was introduced. This is on x86_32, SMP. BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #3) EIP: 0060:[<c0603e22>] EFLAGS: 00000202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 00000001 ECX: f7873808 EDX: 00000293 ESI: 00000005 EDI: f7873808 EBP: 00000000 ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 004f5960 CR3: 372c5000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0438233>] tick_broadcast_oneshot_control+0x10/0xda [<c0437c82>] tick_notify+0x1d4/0x2eb [<c04281b4>] get_next_timer_interrupt+0x143/0x1b4 [<c0605819>] notifier_call_chain+0x2a/0x47 [<c04345a0>] raw_notifier_call_chain+0x17/0x1a [<c04378b7>] clockevents_notify+0x19/0x4f [<c0533cc3>] acpi_idle_enter_simple+0x183/0x1d0 [<c058cea3>] cpuidle_idle_call+0x53/0x78 [<c058ce50>] cpuidle_idle_call+0x0/0x78 [<c0402575>] cpu_idle+0x97/0xb8 --
On Fri, 7 Dec 2007 17:53:47 -0500 OK, thanks. Another one for the regression list, please. --
Added, http://bugzilla.kernel.org/show_bug.cgi?id=9525, thanks. --
Looks like tick_broadcast_lock did not get freed in some path. You do not see this when you CPU_IDLE is not configured? Thanks, Venki --
This looks pretty much like the problem I was solving yesterday. Parag, can you please try Linus latest and please check whether there is a stack trace with clockevents_program_event on the top in your dmesg. Thanks, tglx --
Just booted with latest git and there is no clockevents_program_event in dmesg anywhere. Any way I have enabled CPU_IDLE again and the machine's got to do lot of compiles today. I'll see if the problem reproduces. Thanks Parag --
Whoa whoa whoa - this happens left and right with latest Linus +CPU_IDLE. As soon as I start compiling, dmesg starts filling up with - Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[<c0533d06>] EFLAGS: 00000202 CPU: 1 EIP is at acpi_idle_enter_simple+0x166/0x1d0 EAX: f7829f88 EBX: 00000da1 ECX: 00000266 EDX: 00000000 ESI: 00000000 EDI: 00042070 EBP: 000412cf ESP: f7829f88 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 080ce178 CR3: 00715000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c058cf03>] cpuidle_idle_call+0x53/0x78 [<c058ceb0>] cpuidle_idle_call+0x0/0x78 [<c0402575>] cpu_idle+0x97/0xb8 ======================= BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #5) EIP: 0060:[<c0603eaa>] EFLAGS: 00000202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 00000001 ECX: f7873808 EDX: 00000293 ESI: 00000005 EDI: f7873808 EBP: 00000000 ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 080ff4b0 CR3: 37cf5000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0438293>] tick_broadcast_oneshot_control+0x10/0xda [<c0437ce2>] tick_notify+0x1d4/0x2eb [<c04281bc>] get_next_timer_interrupt+0x143/0x1b4 [<c06058a1>] notifier_call_chain+0x2a/0x47 [<c04345c0>] raw_notifier_call_chain+0x17/0x1a [<c043781e>] clockevents_notify+0x19/0x4f [<c0533d23>] acpi_idle_enter_simple+0x183/0x1d0 [<c058cf03>] cpuidle_idle_call+0x53/0x78 [<c058ceb0>] cpuidle_idle_call+0x0/0x78 [<c0402575>] cpu_idle+0x97/0xb8 ======================= BUG: soft lockup - CPU#1 stuck for 11s! [vim:3736] Pid: 3736, comm: vim Not tainted (2.6.24-rc4 #5) EIP: 0060:[<c060540b>] EFLAGS: 00000206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: f6ec1380 EBX: 005d0ff4 ECX: 0000007b EDX: 0...
does the patch below help? But the root cause is likely some timer problems - do you get consistent results from: while :; do time usleep 1111111; done or do these sleeps fluctuate? Ingo Index: linux/kernel/sched.c =================================================================== --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -774,6 +774,7 @@ void sched_clock_idle_wakeup_event(u64 d struct rq *rq = cpu_rq(smp_processor_id()); u64 now = sched_clock(); + touch_softlockup_watchdog(); rq->idle_clock += delta_ns; /* * Override the previous timestamp and ignore all --
They seem to fluctuate - not sure if that's supposed to be exact or if below variations are normal - This is when my compiles are running - [parag@mini ~]$ while :; do time usleep 1111111; done real 0m1.116s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.130s user 0m0.000s sys 0m0.004s real 0m1.144s user 0m0.000s sys 0m0.000s Thanks Parag --
it's normal for them to fluctuate a little when load goes up and the task gets delayed. 'bad' fluctuations are ones that happen when the system is idle. Ingo --
Even on 100% idle I get variations that are approx in the same range when not idle. Clocksource is hpet if that matters. Next I think I will disable CPU_IDLE, Tickless My ssh connection just died - another lockup in dmesg - BUG: soft lockup - CPU#1 stuck for 14s! [sshd:509] Pid: 509, comm: sshd Not tainted (2.6.24-rc4 #6) EIP: 0060:[<c060540b>] EFLAGS: 00200206 CPU: 1 EIP is at do_page_fault+0x117/0x583 EAX: ef5aaa80 EBX: 0048cfc0 ECX: 0000007b EDX: 00000003 ESI: ef691fb8 EDI: b7ca1ab4 EBP: ef47f050 ESP: ef691f70 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: b7ca1ab4 CR3: 2f555000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c06052f4>] do_page_fault+0x0/0x583 [<c06040aa>] error_code+0x72/0x78 [<c0600000>] netlbl_domhsh_walk+0x94/0xba $> while :; do time usleep 1111111; done real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.385s user 0m0.000s sys 0m0.004s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.279s user 0m0.000s sys 0m0.000s --
this would suggest that the softlockup watchdog is correct and indeed this, if done on an idle system, looks anomalous to me. Does hpet=disable help? Ingo --
With hpet=disable (clocksource is acpi_pm ) the usleeps are somewhat more closer to correct - under idle as well as under kernel compilation load. But there are still fluctuations under 100% idle - Kernel compile -------------------------- real 0m1.112s user 0m0.004s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.115s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s ------------------------------ IDLE real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.131s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.112s user 0m0.000s sys 0m0.000s real 0m1.139s user 0m0.000s Yep, I was about to say that there are indeed real soft-lockups going on as opposed to the psuedo ones that we thought may be due to timer inconsistencies. BTW this is no-preempt config and 2.6.22 and latest Centos-5.1 kernels ran fine with HPET. Parag --
these fluctuations would still be OK if they are due to HZ granularity. Ingo --
Yes - NO_HZ=y and HIGH_RES_TIMERS=y. My ssh connection still died with hpet=disable although this time I did not see soft lockup message. Also it randomly gets stuck - suddenly there is a freeze and I can't So are the hpet ones ok too or do they seem off even with HRTIMERS=y? Just trying to figure out if I should be testing further with hpet disabled or enabling it is ok. Parag --
On Sat, 8 Dec 2007 15:46:54 -0500 what chipset is this? (I wonder if there's one where we shouldn't be force-enabling the hpet) --
It's an Intel Mac Mini - Intel 945GM chipset. I believe OSX requires HPET and so there shouldn't be a need to force enable it on this chipset? Also I don't have problems under Centos 5.1 kernel (2.6.18-53.1.4.el5) and hpet is enabled there. Parag --
No problems after disabling CONFIG_HIGHRES_TIMERS , CONFIG_CPU_IDLE and CONFIG_NO_HZ. I will try enabling them one by one - HRT, NOHZ and CPU_IDLE last - that way we can at least tell what is required to be hit with this problem. Parag --
Looks like CPU_IDLE=y is necessary for the problem to show up. With CPU_IDLE=n HRT+NO_HZ+TICK_ONESHOT does not give soft lockup problems. (Actually with HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y I do see short freezes on ssh - when I cannot type anything for may be a second even under 100% idle. But Soft Lock up doesnt show up in dmesg with this configuration.) No Soft Lockup - CPU_IDLE=n 5+ Hours uptime --------------------------------------------------------------------------- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=y NO_HZ=TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=y TICK_ONESHOT=n HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y - short freezes. Soft Lockup - CPU_IDLE=y 2 hours uptime -------------------------------------------------------------------------- HIGH_RES_TIMERS=NO_HZ=TICK_ONESHOT=y dmesg - BUG: soft lockup - CPU#1 stuck for 11s! [swapper:0] Pid: 0, comm: swapper Not tainted (2.6.24-rc4 #13) EIP: 0060:[<c0603eaa>] EFLAGS: 00000202 CPU: 1 EIP is at _spin_lock_irqsave+0x16/0x27 EAX: c06b4110 EBX: 00000001 ECX: f7873808 EDX: 00000293 ESI: 00000005 EDI: f7873808 EBP: 00000000 ESP: f7829f10 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 800f5958 CR3: 37284000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0438293>] tick_broadcast_oneshot_control+0x10/0xda [<c0437ce2>] tick_notify+0x1d4/0x2eb [<c04281bc>] get_next_timer_interrupt+0x143/0x1b4 [<c06058a1>] notifier_call_chain+0x2a/0x47 [<c04345c0>] raw_notifier_call_chain+0x17/0x1a [<c043781e>] clockevents_notify+0x19/0x4f [<c0533d23>] acpi_idle_enter_simple+0x183/0x1d0 [<c058cf03>] cpuidle_idle_call+0x53/0x78 [<c058ceb0>] cpuidle_idle_call+0x0/0x78 [<c0402575>] cpu_idle+0x97/0xb8 ======================= Parag --
Can you please apply the patch below ? It prints out the internal state of the clockevents/timer system when the softlockup is detected. Thanks, tglx diff --git a/kernel/softlockup.c b/kernel/softlockup.c index 11df812..82f1a05 100644 --- a/kernel/softlockup.c +++ b/kernel/softlockup.c @@ -118,6 +118,7 @@ void softlockup_tick(void) show_regs(regs); else dump_stack(); + sysrq_timer_list_show(); spin_unlock(&print_lock); } --
Below is the relevant output after multiple soft lockups. Thanks Parag BUG: soft lockup - CPU#1 stuck for 14s! [squid_redirect:3109] Pid: 3109, comm: squid_redirect Not tainted (2.6.24-rc4 #15) EIP: 0073:[<0077a362>] EFLAGS: 00000202 CPU: 1 EIP is at 0x77a362 EAX: 0804c008 EBX: 007ca5cc ECX: 08199947 EDX: 0846da74 ESI: 0846da74 EDI: 08199968 EBP: bfcc2d58 ESP: bfcc2d20 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b CR0: 80050033 CR2: b7ea2c84 CR3: 3721b000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 ======================= Timer List Version: v0.3 HRTIMER_MAX_CLOCK_BASES: 2 now at 2899263521779 nsecs cpu: 0 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1197343983635176004 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: <f6eb1e9c>, tick_sched_timer, S:01, tick_nohz_restart_sched_tick, swapper/0 # expires at 2899384000000 nsecs [in 120478221 nsecs] #1: <f6eb1e9c>, hrtimer_wakeup, S:01, do_nanosleep, irqbalance/2663 # expires at 2902364827067 nsecs [in 3101305288 nsecs] #2: <f6eb1e9c>, it_real_fn, S:01, do_setitimer, syslogd/2643 # expires at 2918995045613 nsecs [in 19731523834 nsecs] #3: <f6eb1e9c>, hrtimer_wakeup, S:01, do_nanosleep, crond/3083 # expires at 2938118668932 nsecs [in 38855147153 nsecs] #4: <f6eb1e9c>, hrtimer_wakeup, S:01, do_nanosleep, atd/3184 # expires at 3037524193866 nsecs [in 138260672087 nsecs] #5: <f6eb1e9c>, hrtimer_wakeup, S:01, do_nanosleep, smartd/3313 # expires at 3645576708660 nsecs [in 746313186881 nsecs] .expires_next : 2899500000000 nsecs .hres_active : 1 .nr_events : 113175 .nohz_mode : 2 .idle_tick : 2899000000000 nsecs .tick_stopped : 0 .idle_jiffies : 649750 .idle_calls : 848768 .idle_sleeps : 642128 .idle_entrytime : 289958001...
does processor.max_cstate=1 make the failing configuration work? If yes, how about processor.max_cstate=2? what do you see in /proc/acpi/processor/*/power? -Len --
Until now 2 things were necessary to reproduce the problem -
1) CPU_IDLE=y and
2) Wakeups from Idle = 5-7 Per second (== Longer/deeper C state residency)
If I left the wakeups to high number (50-60) - there were no lockups
but it was very jerky over ssh.
(Typing keys had no effect for seconds etc.)
CPU_IDLE=y
With max_cstate=1 and CPU_IDLE=y things are pretty smooth - no lockups
for the last hour. (Soft lockups used to appear in minutes
previously.)
With max_cstate=2 - old story repeats - it's very jerky and soft
lockups appear in under a minute after going to 3-5 wakeups from idle
Normally (without any max_cstate= parameter) I see this -
(I admit I never thought there was a C8 - C3 was all I ever heard.)
[parag@mini router]$ cat /proc/acpi/processor/CPU0/power
active state: C2
max_cstate: C8
bus master activity: 00000000
maximum allowed latency: 8000 usec
states:
C1: type[C1] promotion[C2] demotion[--]
latency[001] usage[00001010] duration[00000000000000000000]
*C2: type[C2] promotion[--] demotion[C1]
latency[001] usage[07181700] duration[00000000379397304978]
Thanks
Parag
--I have a patch staged for Linus, which fixes a thinko in the broadcast
code. It might be related to your problem. Can you give it a try ?
Thanks,
tglx
------>
Subject: clockevents: fix reprogramming decision in oneshot broadcast
From: Thomas Gleixner <tglx@linutronix.de>
A previous version of the code did the reprogramming of the broadcast
device in the return from idle code. This was removed, but the logic in
tick_handle_oneshot_broadcast() was kept the same.
When a broadcast interrupt happens we signal the expiry to all CPUs
which have an expired event. If none of the CPUs has an expired event,
which can happen in dyntick mode, then we reprogram the broadcast
device. We do not reprogram otherwise, but this is only correct if all
CPUs, which are in the idle broadcast state have been woken up.
The code ignores, that there might be pending not yet expired events on
other CPUs, which are in the idle broadcast state. So the delivery of
those events can be delayed for quite a time.
Change the tick_handle_oneshot_broadcast() function to check for CPUs,
which are in broadcast state and are not woken up by the current event,
and enforce the rearming of the broadcast device for those CPUs.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/time/tick-broadcast.c | 56 ++++++++++++++++---------------------------
1 file changed, 21 insertions(+), 35 deletions(-)
Index: linux-x86.q/kernel/time/tick-broadcast.c
===================================================================
--- linux-x86.q.orig/kernel/time/tick-broadcast.c
+++ linux-x86.q/kernel/time/tick-broadcast.c
@@ -384,45 +384,19 @@ int tick_resume_broadcast_oneshot(struct
}
/*
- * Reprogram the broadcast device:
- *
- * Called with tick_broadcast_lock held and interrupts disabled.
- */
-static int tick_broadcast_reprogram(void)
-{
- ktime_t expires = { .tv64 = KTIME_MAX };
- struct tick_device *td;
- int cpu;
-
- /*
...On Sun, 9 Dec 2007 16:57:38 -0500 starts to sound like something is turning off the clock during a somewhat deeper C-state... --
On Dec 7, 2007 6:12 PM, Pallipadi, Venkatesh No, I did not see this prior to enabling CPU_IDLE. All previous kernels without CPU_IDLE also had soft lockup detection enabled. Parag --
