Hello, i use kernel version 2.6.33.2 (x86-64, Core 2 Duo) and the perf events to track frequency and c states of my cpu. But if i want to trace power:power_start events, it does not work. After typing echo "power:power_start" | sudo tee set_event on the console, the trace doesn't contain such events. power:power_frequency and power:power_end events are reported correctly by the system. I read in the mailing list that Robert Schoene had a similiar problem, but with power:power_end events. Is there a known bug regarding to trace_power_start methods in process.c? Bye Ronny -- Ronny Tschueter Technische Universität Dresden Center for Information Services and High Performance Computing (ZIH) D-01062 Dresden Germany Office: Willersbau, room A31 Phone: (+49) 351/463-35448 E-Mail: ronny.tschueter@tu-dresden.de WWW: http://www.tu-dresden.de/zih --
FWIW, on that kernel version (Fedora 13 on Athlon X2), systemtap hooks
into all those tracepoints fine, and
# stap -e 'probe kernel.trace("power_start") { log($$parms) }'
produces plenty of traffic.
- FChE
--
Is this for perf or ftrace? I'm assuming you did this from the debugfs tracing directory. The set_event file is for ftrace not perf. -- Steve --
It is for perf - i'm interested in event tracing. I used the event tracing system like it is described in Documentation/trace/events.txt. In addition i wrote a program which uses syscall(__NR_perf_counter_open, ... ) to trace power events. But the result is the same - power:power_end events are reported, power:power_start events not. Bye Ronny --
Hi, can you please compare the perf and ftrace output to see if ftrace works while perf doesn't for example? In perf: perf record -f -c 1 -a -e power:power_start ^C perf report In ftrace: echo 1 > $DEBUGFS/tracing/events/power/power_start/enable cat $DEBUGFS/trace_pipe Thanks. --
Hi
IN PERF:
-> patched kernel (see my mail from yesterday)
perf record -f -c 1 -a -e power:power_start
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.500 MB perf.data (~21826 samples) ]
perf report
# Samples: 7880
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......
#
57.13% swapper 0 [k] 0000000000000000
42.86% init 0 [k] 0000000000000000
0.01% swapper [kernel] [k] 0000000000000000
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
-> unpatched kernel
perf record -f -c 1 -a -e power:power_start
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.263 MB perf.data (~11484 samples) ]
perf report
# Samples: 0
#
# Overhead Command Shared Object Symbol
# ........ ............... ............. ......
#
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
IN FTRACE:
-> patched kernel
cat trace_pipe shows events like
<idle>-0 [001] 59067.030622: power_start: type=1 state=2
<idle>-0 [001] 59067.032514: power_start: type=1 state=2
<idle>-0 [000] 59067.032522: power_start: type=1 state=2
<idle>-0 [001] 59067.032561: power_start: type=1 state=2
<idle>-0 [000] 59067.037527: power_start: type=1 state=2
<idle>-0 [001] 59067.037528: power_start: type=1 state=2
-> unpatched kernel
cat trace returns no output
Bye
--
Hello,
I took a look at ftrace and tried to narrow my problem down.
1) <idle>-0 | | cpuidle_idle_call() {
1) <idle>-0 | | menu_select() {
1) <idle>-0 | 0.219 us | pm_qos_requirement();
1) <idle>-0 | 0.219 us | tick_nohz_get_sleep_length();
1) <idle>-0 | 0.210 us | nr_iowait_cpu();
1) <idle>-0 | 0.213 us | this_cpu_load();
1) <idle>-0 | 0.213 us | nr_iowait_cpu();
1) <idle>-0 | 2.472 us | }
1) <idle>-0 | | acpi_idle_enter_simple() {
1) <idle>-0 | |
lapic_timer_state_broadcast() {
1) <idle>-0 | | clockevents_notify() {
1) <idle>-0 | 0.234 us | _raw_spin_lock_irqsave();
1) <idle>-0 | |
raw_notifier_call_chain() {
1) <idle>-0 | | tick_notify() {
1) <idle>-0 | |
tick_broadcast_oneshot_control() {
1) <idle>-0 | 0.237 us | _raw_spin_lock_irqsave();
1) <idle>-0 | |
clockevents_set_mode() {
1) <idle>-0 | |
lapic_timer_setup() {
1) <idle>-0 | 0.222 us |
native_apic_mem_read();
1) <idle>-0 | 0.213 us |
native_apic_mem_write();
1) <idle>-0 | 0.210 us |
native_apic_mem_write();
1) <idle>-0 | 1.554 us | }
1) <idle>-0 | 2.007 us | }
1) <idle>-0 | |
tick_dev_program_event() {
1) <idle>-0 | | ktime_get() {
1) <idle>-0 | 0.606 us | read_hpet();
1) <idle>-0 | 1.053 us | }
1) <idle>-0 | | ...Depending on gcc, the above can bloat the code since each call to
trace_power_start() is a macro expanded. Try to call it just once.
Perhaps one of the following:
trace_power_start(POWER_CSTATE,
cx->type == ACPI_STATE_C1 ? 1 :
cx->type == ACPI_STATE_C2 ? 2 :
3);
Or make a local variable instead, although the above would compile out
completely if tracing is not enabled.
int type:
switch (cx->type) {
case ACPI_STATE_C1:
type = 1;
break;
case ACPI_STATE_C2:
type = 2;
break;
case ACPI_STATE_C3:
type = 3;
break;
}
trace_power_start(POWER_CSTATE, type);
--
the code is also incorrect fundamentally. You need to pass in the mwait value or equivalent; the ACPI STATE type is pretty much useless random garbage and should completely be ignored. --
You're correct for your case (switching to c-state via monitor/mwait). But the current implementation is broken too. It works only if your kernel uses processor_idle AND monitor/mwait. For all other cases it fails - may it be on Ronnys system, that uses IO port based C-states or my system, that uses cpu_idle (not processor_idle). I'd suggest, that it should work for IO port based switches and could be included in the syntactic block of that case. Afaik, halt means that the processor goes to C1. So in this block, there could be a power_start event too with 1 as new state. (Maybe there are problems with AMDs C1E which would be reported as C1.) -- Robert Schoene Technische Universitaet Dresden Zentrum fuer Informationsdienste und Hochleistungsrechnen 01062 Dresden Tel.: (0351) 463-42483, Fax: (0351) 463-37773 E-Mail: Robert.Schoene@tu-dresden.de --
Unfortunately the bug still exists in kernel 2.6.34. On my system I'm
not able to trace power:power_start events, because they are not
reported. Regarding to the preceding comments I moved my code snippet
into the IO port based C-state part of acpi_idle_do_entry(). Therefore
it should not interfere with the other two C-state calls in this function.
Furthermore a lot of other code sequences in processor_idle.c use the
ACPI STATE type in if- or switch-statements. So I don't think that ACPI
STATE type is useless garbage.
Finally here is my new patch:
diff --git a/old/processor_idle.c b/new/processor_idle.c
index 5939e7f..5818522 100644
--- a/old/processor_idle.c
+++ b/new/processor_idle.c
@@ -43,6 +43,7 @@
#include <linux/clockchips.h>
#include <linux/cpuidle.h>
#include <linux/irqflags.h>
+#include <trace/events/power.h>
/*
* Include the apic definitions for x86 to have the APIC timer related
defines
@@ -807,6 +808,10 @@ static inline void acpi_idle_do_entry(struct
acpi_processor_cx *cx)
} else {
int unused;
/* IO port based C-state */
+ trace_power_start(POWER_CSTATE,
+ cx->type == ACPI_STATE_C1 ? 1 :
+ cx->type == ACPI_STATE_C2 ? 2 :
+ 3);
inb(cx->address);
/* Dummy wait op - must do something useless after P_LVL2 read
because chipsets cannot guarantee that STPCLK# signal
--
I use Linux performance counters to trace some performance related
events. Unfortunately power:power_start events are not reported on my
machine. In /drivers/acpi/processor_idle.c I found three different ways
to call a C-state, but only one way (via architectural FFH based C-state
and using MONITOR/MWAIT) is currently instrumented. Here is my patch to
instrument the remaining two possibilities:
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index b1b3856..08e7c42 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -43,6 +43,7 @@
#include <linux/clockchips.h>
#include <linux/cpuidle.h>
#include <linux/irqflags.h>
+#include <trace/events/power.h>
/*
* Include the apic definitions for x86 to have the APIC timer related
defines
@@ -136,6 +137,7 @@ static void acpi_safe_halt(void)
*/
smp_mb();
if (!need_resched()) {
+ trace_power_start(POWER_CSTATE,1);
safe_halt();
local_irq_disable();
}
@@ -797,6 +799,10 @@ static inline void acpi_idle_do_entry(struct
acpi_processor_cx *cx)
} else {
int unused;
/* IO port based C-state */
+ trace_power_start(POWER_CSTATE,
+ cx->type == ACPI_STATE_C1 ? 1 :
+ cx->type == ACPI_STATE_C2 ? 2 :
+ 3);
inb(cx->address);
/* Dummy wait op - must do something useless after P_LVL2 read
because chipsets cannot guarantee that STPCLK# signal
--
