Re: Rescheduling interrupts

Previous thread: [PATCH 00/16] ide: various fixes/cleanups by Bartlomiej Zolnierkiewicz on Monday, January 21, 2008 - 7:30 pm. (30 messages)

Next thread: Plus besoin d'investir dans un PDA... by webmaster on Monday, January 21, 2008 - 7:22 pm. (1 message)
To: LKML <linux-kernel@...>
Cc: Ingo Molnar <mingo@...>, Arjan van de Ven <arjan@...>
Date: Monday, January 21, 2008 - 7:19 pm

Hi;

With Linus's latest git, powertop reports following while system nearly %10=
0=20
idle;

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 6,3%) 1,84 Ghz 0,4%
C1 0,0ms ( 0,0%) 1333 Mhz 0,0%
C2 0,1ms ( 0,5%) 1000 Mhz 99,6%
C3 3,7ms (93,2%)

Wakeups-from-idle per second : 306,8 interval: 10,0s
Power usage (5 minute ACPI estimate) : 23,1 W (0,5 hours left)

Top causes for wakeups:
59,9% (238,4) <kernel IPI> : Rescheduling interrupts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^=
^^^
14,7% ( 58,6) amarokapp : schedule_timeout (process_timeout)
5,5% ( 21,9) java : futex_wait (hrtimer_wakeup)
5,0% ( 19,8) <interrupt> : iwl3945
2,5% ( 10,0) java : schedule_timeout (process_timeout)
2,5% ( 10,0) <kernel core> : ehci_work (ehci_watchdog)
2,5% ( 10,0) <interrupt> : extra timer interrupt
1,6% ( 6,4) artsd : schedule_timeout (process_timeout)
1,0% ( 4,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
0,5% ( 2,0) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog=20
(clocksource_watchdog)
0,5% ( 2,0) kwin : schedule_timeout (process_timeout)
0,5% ( 1,9) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,2) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) artsd : do_setitimer (it_real_fn)
0,3% ( 1,0) gpg-agent : schedule_timeout (process_timeout)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)
0,3% ( 1,0) kicke...

To: S.Çağlar <caglar@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 6:57 am

hm, would be nice to figure out what causes these IPIs. Could you stick
something like this into arch/x86/kernel/smp_32.c's
smp_send_reschedule() function [this is the function that generates the
IPI]:

static void native_smp_send_reschedule(int cpu)
{
WARN_ON(cpu_is_offline(cpu));
send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
if (panic_timeout > 0) {
panic_timeout--;
printk("IPI from task %s:%d on CPU#%d:\n",
current->comm, current->pid, cpu);
dump_stack();
}
}

NOTE: if you run an SMP kernel then first remove these two lines from
kernel/printk.c:

if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);

otherwise you'll get lockups. (the IPI is sent while holding the
runqueue lock, so the printks will lock up)

then wait for the bad condition to occur on your system and generate a
stream of ~10 backtraces, via:

echo 10 > /proc/sys/kernel/panic

you should be getting 10 immediate backtraces - please send them to us.
The backtraces should show the place that generates the wakeups. [turn
on CONFIG_FRAME_POINTERS=y to get high quality backtraces.]

If you do _not_ get 10 immediate backtraces, then something in the
system is generating such IPIs outside of the scheduler's control. That
would suggest some other sort of borkage.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 8:55 am

Hi;

I grabbed the logs two times to make sure to catch needed info. 1st [1] one is generated while "Rescheduling interrupts" wakeups ~200 times and 2nd one generated for ~350 wakeups.

[1] http://cekirdek.pardus.org.tr/~caglar/dmesg.1st
[2] http://cekirdek.pardus.org.tr/~caglar/dmesg.2nd

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--

To: S.Çağlar <caglar@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 11:23 am

thanks, these seem to be mostly normal wakeups from standard tasks:

IPI from task kdm_greet:2118 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task hald-addon-inpu:2009 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:2129 on CPU#0:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task X:2079 on CPU#0:
IPI from task yakuake:3422 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:

could you also add a similar IPI printouts (with the same panic_timeout
logic) to arch/x86/kernel/smp_32.c's smp_reschedule_interrupt() function
- while still keeping the other printouts too?

Could you also enable PRINTK_TIME timestamps, so that we can see the
timings? (And do a "dmesg -n 1" so that the printks happen fast and the
timings are accurate.) I'd suggest to increase CONFIG_LOG_BUF_SHIFT to
20, so that your dmesg buffer is large enough. Plus try to capture 100
events, ok?

My theory is that for whatever reason we get "repeat" IPIs: multiple
reschedule IPIs although the other CPU only initiated one.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 11:55 am

Hi;

Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--

To: S.Çağlar <caglar@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 12:05 pm

hm, the IPI sending and receiving is nicely paired up:

[ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:

amarokapp does wake up threads every 20 microseconds - that could
explain it. It's probably Xorg running on one core, amarokapp on the
other core. That's already 100 reschedules/sec.

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: S.Çağlar Onur <caglar@...>, LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, May 13, 2008 - 2:03 pm

Hi,

Was this IPI thing ever resolved/fixed? I'm seeing the exact same
thing on my newest Fedora 8 kernel (2.6.24.5-85.fc8) now:

33.3% (334.0) <kernel IPI> : Rescheduling interrupts

And I know for sure that this did not show up a couple of months back.

If the cause of this is known, please let me know about it. If it
isn't, I will start a bisect run now :-)

Thanks.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

To: Ingo Molnar <mingo@...>
Cc: S.Çağlar Onur <caglar@...>, LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, May 13, 2008 - 3:04 pm

Ok -- just a confirmation. I have this on v2.6.26-rc1:

Running only init=/bin/sh (from busybox) and powertop shows:
kernel IPI : Rescheduling interrupts

completely idle box (except powertop): 1 wakeups/sec
one busy-loop[1]: 2 wakeups/sec
two busy-loops: 500 wakeups/sec

So it seems like a real bug to me.

Vegard

[1]: while true; do echo hi; done

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

To: Ingo Molnar <mingo@...>
Cc: S.Çağlar <caglar@...>, LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 1:03 pm

That suggests we want an "anti-load-balancing" heuristic when CPU usage
is very low. Migrating everything onto one core when we're close to idle
will save power and probably reduce latencies.

--
Mathematics is the supreme nostalgia of our time.

--

To: Matt Mackall <mpm@...>
Cc: S.Çağlar <caglar@...>, LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 3:27 pm

that would probably be the case if it's multiple sockets - but for
multiple cores exactly the opposite is true: the sooner _both_ cores
finish processing, the deeper power use the CPU can reach. So effective
and immediate spreading of workloads amongst multiple cores - especially
with shared L2 caches where the cost of migration is low, helps power
consumption. (and it obviously helps latencies and bandwith)

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 12:11 pm

Heh, killing amarok ends up with following;

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 0,9%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 0,0%)
C3 5,1ms (99,1%)

Wakeups-from-idle per second : 197,8 interval: 10,0s
no ACPI power usage estimate available

Top causes for wakeups:
34,7% (130,7) USB device 3-2 : HP Integrated Module (Broadcom Corp)
26,5% (100,0) <interrupt> : uhci_hcd:usb3
5,8% ( 22,0) java : futex_wait (hrtimer_wakeup)
5,3% ( 20,0) <interrupt> : iwl3945
4,1% ( 15,4) USB device 2-2 : Microsoft Wireless Optical Mouse .00 (Microsoft)
2,9% ( 11,0) <interrupt> : libata
2,7% ( 10,1) <interrupt> : extra timer interrupt
2,7% ( 10,0) java : schedule_timeout (process_timeout)
2,7% ( 10,0) <kernel core> : scan_async (ehci_watchdog)
2,4% ( 9,0) <kernel IPI> : Rescheduling interrupts
2,1% ( 8,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
1,7% ( 6,4) <interrupt> : uhci_hcd:usb2
1,7% ( 6,4) artsd : schedule_timeout (process_timeout)
0,6% ( 2,1) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog (clocksource_watchdog)
0,5% ( 1,7) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,0) kicker : schedule_timeout (process_timeout)
0,3% ( 1,0) kwin : schedule_timeout (process_timeout)
0,3% ( 1,0) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)

--
S.Çağlar Onur <caglar@pardus.org.tr>
[ message continues ]

" title="http://cekirdek.pardus.or...">http://cekirdek.pardus.or...

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 12:22 pm

Hi;

By the way loging out from KDE also suffers same problem, this time kdm migrated to CPU1 and powertop reports ~300 wakeups for "<kernel IPI> : Rescheduling interrupts" again.

[...]
[ 2058.246692] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.246737] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.246812] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.278947] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279070] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279175] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279251] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279301] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279377] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279425] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279503] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279565] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279637] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279683] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279758] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.311903] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.312028] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[...]

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--

To: S.Çağlar <caglar@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 11:37 am

also, this might reduce the number of cross-CPU wakeups on near-idle
systems:

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

[ or if it doesnt, it should ;) ]

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: LKML <linux-kernel@...>, Arjan van de Ven <arjan@...>
Date: Tuesday, January 22, 2008 - 12:00 pm

Hi;

Seems like nothing changes

zangetsu ~ # cat /sys/devices/system/cpu/sched_mc_power_savings
1

Powertop still reports ~300 wakeups for "<kernel IPI> : Rescheduling interrupts"

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 4,8%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 2,4%)
C3 2,4ms (92,8%)

Wakeups-from-idle per second : 495,2 interval: 3,0s
no ACPI power usage estimate available

Top causes for wakeups:
40,0% (330,7) <kernel IPI> : Rescheduling interrupts
12,3% (102,0) USB device 3-2 : HP Integrated Module (Broadcom Corp)
12,1% (100,0) <interrupt> : uhci_hcd:usb3
8,0% ( 66,3) <interrupt> : extra timer interrupt
7,0% ( 58,0) amarokapp : schedule_timeout (process_timeout)
4,0% ( 33,0) <interrupt> : uhci_hcd:usb2

and this is what system is doing while powertop reports above;

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 1512 532 ? Ss 17:41 0:00 init [3]
root 2 0.0 0.0 0 0 ? S< 17:41 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/1]
root 7 0.0 0.0 0 0 ? S< 17:41 0:00 [events/0]
root 8 0.0 0.0 0 0 ? S< 17:41 0:00 [events/1]
root 9 0.0 0.0 0 0 ? S< 17:41 0:00 [khelper]
root 10 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/0]
root 11 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/1]
root 12 0.0 0.0 0...

Previous thread: [PATCH 00/16] ide: various fixes/cleanups by Bartlomiej Zolnierkiewicz on Monday, January 21, 2008 - 7:30 pm. (30 messages)

Next thread: Plus besoin d'investir dans un PDA... by webmaster on Monday, January 21, 2008 - 7:22 pm. (1 message)