Hi;
With Linus's latest git, powertop reports following while system nearly %10=
0=20
idle;PowerTOP version 1.9 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 6,3%) 1,84 Ghz 0,4%
C1 0,0ms ( 0,0%) 1333 Mhz 0,0%
C2 0,1ms ( 0,5%) 1000 Mhz 99,6%
C3 3,7ms (93,2%)Wakeups-from-idle per second : 306,8 interval: 10,0s
Power usage (5 minute ACPI estimate) : 23,1 W (0,5 hours left)Top causes for wakeups:
59,9% (238,4) <kernel IPI> : Rescheduling interrupts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^=
^^^
14,7% ( 58,6) amarokapp : schedule_timeout (process_timeout)
5,5% ( 21,9) java : futex_wait (hrtimer_wakeup)
5,0% ( 19,8) <interrupt> : iwl3945
2,5% ( 10,0) java : schedule_timeout (process_timeout)
2,5% ( 10,0) <kernel core> : ehci_work (ehci_watchdog)
2,5% ( 10,0) <interrupt> : extra timer interrupt
1,6% ( 6,4) artsd : schedule_timeout (process_timeout)
1,0% ( 4,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
0,5% ( 2,0) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog=20
(clocksource_watchdog)
0,5% ( 2,0) kwin : schedule_timeout (process_timeout)
0,5% ( 1,9) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,2) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) artsd : do_setitimer (it_real_fn)
0,3% ( 1,0) gpg-agent : schedule_timeout (process_timeout)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)
0,3% ( 1,0) kicke...
hm, would be nice to figure out what causes these IPIs. Could you stick
something like this into arch/x86/kernel/smp_32.c's
smp_send_reschedule() function [this is the function that generates the
IPI]:static void native_smp_send_reschedule(int cpu)
{
WARN_ON(cpu_is_offline(cpu));
send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
if (panic_timeout > 0) {
panic_timeout--;
printk("IPI from task %s:%d on CPU#%d:\n",
current->comm, current->pid, cpu);
dump_stack();
}
}NOTE: if you run an SMP kernel then first remove these two lines from
kernel/printk.c:if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);otherwise you'll get lockups. (the IPI is sent while holding the
runqueue lock, so the printks will lock up)then wait for the bad condition to occur on your system and generate a
stream of ~10 backtraces, via:echo 10 > /proc/sys/kernel/panic
you should be getting 10 immediate backtraces - please send them to us.
The backtraces should show the place that generates the wakeups. [turn
on CONFIG_FRAME_POINTERS=y to get high quality backtraces.]If you do _not_ get 10 immediate backtraces, then something in the
system is generating such IPIs outside of the scheduler's control. That
would suggest some other sort of borkage.Ingo
--
Hi;
I grabbed the logs two times to make sure to catch needed info. 1st [1] one is generated while "Rescheduling interrupts" wakeups ~200 times and 2nd one generated for ~350 wakeups.
[1] http://cekirdek.pardus.org.tr/~caglar/dmesg.1st
[2] http://cekirdek.pardus.org.tr/~caglar/dmesg.2ndCheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--
thanks, these seem to be mostly normal wakeups from standard tasks:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task hald-addon-inpu:2009 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:2129 on CPU#0:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task X:2079 on CPU#0:
IPI from task yakuake:3422 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:could you also add a similar IPI printouts (with the same panic_timeout
logic) to arch/x86/kernel/smp_32.c's smp_reschedule_interrupt() function
- while still keeping the other printouts too?Could you also enable PRINTK_TIME timestamps, so that we can see the
timings? (And do a "dmesg -n 1" so that the printks happen fast and the
timings are accurate.) I'd suggest to increase CONFIG_LOG_BUF_SHIFT to
20, so that your dmesg buffer is large enough. Plus try to capture 100
events, ok?My theory is that for whatever reason we get "repeat" IPIs: multiple
reschedule IPIs although the other CPU only initiated one.Ingo
--
Hi;
Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--
hm, the IPI sending and receiving is nicely paired up:
[ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:amarokapp does wake up threads every 20 microseconds - that could
explain it. It's probably Xorg running on one core, amarokapp on the
other core. That's already 100 reschedules/sec.Ingo
--
Hi,
Was this IPI thing ever resolved/fixed? I'm seeing the exact same
thing on my newest Fedora 8 kernel (2.6.24.5-85.fc8) now:33.3% (334.0) <kernel IPI> : Rescheduling interrupts
And I know for sure that this did not show up a couple of months back.
If the cause of this is known, please let me know about it. If it
isn't, I will start a bisect run now :-)Thanks.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
Ok -- just a confirmation. I have this on v2.6.26-rc1:
Running only init=/bin/sh (from busybox) and powertop shows:
kernel IPI : Rescheduling interruptscompletely idle box (except powertop): 1 wakeups/sec
one busy-loop[1]: 2 wakeups/sec
two busy-loops: 500 wakeups/secSo it seems like a real bug to me.
Vegard
[1]: while true; do echo hi; done
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
That suggests we want an "anti-load-balancing" heuristic when CPU usage
is very low. Migrating everything onto one core when we're close to idle
will save power and probably reduce latencies.--
Mathematics is the supreme nostalgia of our time.--
that would probably be the case if it's multiple sockets - but for
multiple cores exactly the opposite is true: the sooner _both_ cores
finish processing, the deeper power use the CPU can reach. So effective
and immediate spreading of workloads amongst multiple cores - especially
with shared L2 caches where the cost of migration is low, helps power
consumption. (and it obviously helps latencies and bandwith)Ingo
--
Heh, killing amarok ends up with following;
PowerTOP version 1.9 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 0,9%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 0,0%)
C3 5,1ms (99,1%)Wakeups-from-idle per second : 197,8 interval: 10,0s
no ACPI power usage estimate availableTop causes for wakeups:
34,7% (130,7) USB device 3-2 : HP Integrated Module (Broadcom Corp)
26,5% (100,0) <interrupt> : uhci_hcd:usb3
5,8% ( 22,0) java : futex_wait (hrtimer_wakeup)
5,3% ( 20,0) <interrupt> : iwl3945
4,1% ( 15,4) USB device 2-2 : Microsoft Wireless Optical Mouse .00 (Microsoft)
2,9% ( 11,0) <interrupt> : libata
2,7% ( 10,1) <interrupt> : extra timer interrupt
2,7% ( 10,0) java : schedule_timeout (process_timeout)
2,7% ( 10,0) <kernel core> : scan_async (ehci_watchdog)
2,4% ( 9,0) <kernel IPI> : Rescheduling interrupts
2,1% ( 8,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
1,7% ( 6,4) <interrupt> : uhci_hcd:usb2
1,7% ( 6,4) artsd : schedule_timeout (process_timeout)
0,6% ( 2,1) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog (clocksource_watchdog)
0,5% ( 1,7) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,0) kicker : schedule_timeout (process_timeout)
0,3% ( 1,0) kwin : schedule_timeout (process_timeout)
0,3% ( 1,0) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)--
S.Çağlar Onur <caglar@pardus.org.tr>
[ message continues ]
Hi;
By the way loging out from KDE also suffers same problem, this time kdm migrated to CPU1 and powertop reports ~300 wakeups for "<kernel IPI> : Rescheduling interrupts" again.
[...]
[ 2058.246692] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.246737] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.246812] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.278947] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279070] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279175] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279251] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279301] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279377] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279425] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279503] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279565] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279637] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279683] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279758] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.311903] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.312028] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[...]Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--
also, this might reduce the number of cross-CPU wakeups on near-idle
systems:echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
[ or if it doesnt, it should ;) ]
Ingo
--
Hi;
Seems like nothing changes
zangetsu ~ # cat /sys/devices/system/cpu/sched_mc_power_savings
1Powertop still reports ~300 wakeups for "<kernel IPI> : Rescheduling interrupts"
PowerTOP version 1.9 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 4,8%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 2,4%)
C3 2,4ms (92,8%)Wakeups-from-idle per second : 495,2 interval: 3,0s
no ACPI power usage estimate availableTop causes for wakeups:
40,0% (330,7) <kernel IPI> : Rescheduling interrupts
12,3% (102,0) USB device 3-2 : HP Integrated Module (Broadcom Corp)
12,1% (100,0) <interrupt> : uhci_hcd:usb3
8,0% ( 66,3) <interrupt> : extra timer interrupt
7,0% ( 58,0) amarokapp : schedule_timeout (process_timeout)
4,0% ( 33,0) <interrupt> : uhci_hcd:usb2and this is what system is doing while powertop reports above;
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 1512 532 ? Ss 17:41 0:00 init [3]
root 2 0.0 0.0 0 0 ? S< 17:41 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/1]
root 7 0.0 0.0 0 0 ? S< 17:41 0:00 [events/0]
root 8 0.0 0.0 0 0 ? S< 17:41 0:00 [events/1]
root 9 0.0 0.0 0 0 ? S< 17:41 0:00 [khelper]
root 10 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/0]
root 11 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/1]
root 12 0.0 0.0 0...
| Greg Kroah-Hartman | [PATCH 012/196] nozomi driver |
| Ingo Molnar | Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 |
| Rafael J. Wysocki | [PATCH -mm 5/6] Freezer: Remove PF_NOFREEZE from bluetooth threads |
| Ingo Molnar | Re: [PATCH 00/23] per device dirty throttling -v8 |
git: | |
| David Miller | [GIT]: Networking |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| Natalie Protasevich | [BUG] New Kernel Bugs |
