Tracing of power:power_start events doesn't work

Previous thread: [PATCH 8/8] ipmi: Change timeout and event poll to one second by Corey Minyard on Monday, May 3, 2010 - 6:38 am. (1 message)

Next thread: Re: s2ram slow (radeon) / failing (usb) by Alan Stern on Monday, May 3, 2010 - 6:57 am. (3 messages)
From: Ronny Tschüter
Date: Monday, May 3, 2010 - 6:19 am

Hello,

i use kernel version 2.6.33.2 (x86-64, Core 2 Duo) and the perf events 
to track frequency and c states of my cpu. But if i want to trace 
power:power_start events, it does not work. After typing

echo "power:power_start" | sudo tee set_event

on the console, the trace doesn't contain such events. 
power:power_frequency and power:power_end events are reported correctly 
by the system. I read in the mailing list that Robert Schoene had a 
similiar problem, but with power:power_end events. Is there a known bug 
regarding to trace_power_start methods in process.c?

Bye Ronny

-- 
Ronny Tschueter

Technische Universität Dresden
Center for Information Services
and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Office: Willersbau, room A31
Phone:	(+49) 351/463-35448
E-Mail: ronny.tschueter@tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

--

From: Frank Ch. Eigler
Date: Monday, May 3, 2010 - 2:36 pm

FWIW, on that kernel version (Fedora 13 on Athlon X2), systemtap hooks
into all those tracepoints fine, and

# stap -e 'probe kernel.trace("power_start") { log($$parms) }'

produces plenty of traffic.

- FChE
--

From: Steven Rostedt
Date: Tuesday, May 4, 2010 - 5:31 pm

Is this for perf or ftrace? I'm assuming you did this from the debugfs
tracing directory. The set_event file is for ftrace not perf.

-- Steve


--

From: Ronny Tschüter
Date: Tuesday, May 4, 2010 - 11:34 pm

It is for perf - i'm interested in event tracing. I used the event 
tracing system like it is described in Documentation/trace/events.txt. 
In addition i wrote a program which uses syscall(__NR_perf_counter_open, 
... ) to trace power events. But the result is the same - 
power:power_end events are reported, power:power_start events not.

Bye Ronny
--

From: Frederic Weisbecker
Date: Wednesday, May 5, 2010 - 10:23 am

Hi, can you please compare the perf and ftrace output to
see if ftrace works while perf doesn't for example?

In perf:
	perf record -f -c 1 -a -e power:power_start
	^C
	perf report

In ftrace:
	echo 1 > $DEBUGFS/tracing/events/power/power_start/enable
	cat $DEBUGFS/trace_pipe

Thanks.

--

From: Ronny Tschüter
Date: Wednesday, May 5, 2010 - 11:32 pm

Hi

IN PERF:

-> patched kernel (see my mail from yesterday)
perf record -f -c 1 -a -e power:power_start
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.500 MB perf.data (~21826 samples) ]
perf report
# Samples: 7880
#
# Overhead          Command      Shared Object  Symbol
# ........  ...............  .................  ......
#
     57.13%          swapper                  0  [k] 0000000000000000
     42.86%             init                  0  [k] 0000000000000000
      0.01%          swapper  [kernel]           [k] 0000000000000000
#
# (For a higher level overview, try: perf report --sort comm,dso)
#

-> unpatched kernel
perf record -f -c 1 -a -e power:power_start
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.263 MB perf.data (~11484 samples) ]
perf report
# Samples: 0
#
# Overhead          Command  Shared Object  Symbol
# ........  ...............  .............  ......
#
#
# (For a higher level overview, try: perf report --sort comm,dso)
#


IN FTRACE:
-> patched kernel
cat trace_pipe shows events like
<idle>-0     [001] 59067.030622: power_start: type=1 state=2
<idle>-0     [001] 59067.032514: power_start: type=1 state=2
<idle>-0     [000] 59067.032522: power_start: type=1 state=2
<idle>-0     [001] 59067.032561: power_start: type=1 state=2
<idle>-0     [000] 59067.037527: power_start: type=1 state=2
<idle>-0     [001] 59067.037528: power_start: type=1 state=2

-> unpatched kernel
cat trace returns no output

Bye
--

From: Ronny Tschüter
Date: Wednesday, May 5, 2010 - 7:11 am

Hello,

I took a look at ftrace and tried to narrow my problem down.

  1) <idle>-0    |                     |  cpuidle_idle_call() {
  1) <idle>-0    |                     |    menu_select() {
  1) <idle>-0    |   0.219 us    |      pm_qos_requirement();
  1) <idle>-0    |   0.219 us    |      tick_nohz_get_sleep_length();
  1) <idle>-0    |   0.210 us    |      nr_iowait_cpu();
  1) <idle>-0    |   0.213 us    |      this_cpu_load();
  1) <idle>-0    |   0.213 us    |      nr_iowait_cpu();
  1) <idle>-0    |   2.472 us    |    }
  1) <idle>-0    |                     |    acpi_idle_enter_simple() {
  1) <idle>-0    |                     |      
lapic_timer_state_broadcast() {
  1) <idle>-0    |                     |        clockevents_notify() {
  1) <idle>-0    |   0.234 us    |          _raw_spin_lock_irqsave();
  1) <idle>-0    |                     |          
raw_notifier_call_chain() {
  1) <idle>-0    |                     |            tick_notify() {
  1) <idle>-0    |                     |              
tick_broadcast_oneshot_control() {
  1) <idle>-0    |   0.237 us    |                _raw_spin_lock_irqsave();
  1) <idle>-0    |                     |                
clockevents_set_mode() {
  1) <idle>-0    |                     |                  
lapic_timer_setup() {
  1) <idle>-0    |   0.222 us    |                    
native_apic_mem_read();
  1) <idle>-0    |   0.213 us    |                    
native_apic_mem_write();
  1) <idle>-0    |   0.210 us    |                    
native_apic_mem_write();
  1) <idle>-0    |   1.554 us    |                  }
  1) <idle>-0    |   2.007 us    |                }
  1) <idle>-0    |                     |                
tick_dev_program_event() {
  1) <idle>-0    |                     |                  ktime_get() {
  1) <idle>-0    |   0.606 us    |                    read_hpet();
  1) <idle>-0    |   1.053 us    |                  }
  1) <idle>-0    |                     |                  ...
From: Steven Rostedt
Date: Wednesday, May 5, 2010 - 7:31 am

Depending on gcc, the above can bloat the code since each call to
trace_power_start() is a macro expanded. Try to call it just once.
Perhaps one of the following:


	trace_power_start(POWER_CSTATE,
		cx->type == ACPI_STATE_C1 ? 1 :
		cx->type == ACPI_STATE_C2 ? 2 :
		3);

Or make a local variable instead, although the above would compile out
completely if tracing is not enabled.


	int type:

	switch (cx->type) {
	case ACPI_STATE_C1:
		type = 1;
		break;
	case ACPI_STATE_C2:
		type = 2;
		break;
	case ACPI_STATE_C3:
		type = 3;
		break;
	}
	trace_power_start(POWER_CSTATE, type);



--

From: Arjan van de Ven
Date: Wednesday, May 5, 2010 - 7:33 am

the code is also incorrect fundamentally.
You need to pass in the mwait value or equivalent; the ACPI STATE type is
pretty much useless random garbage and should completely be ignored.
--

From: Robert Schöne
Date: Wednesday, May 12, 2010 - 1:57 am

You're correct for your case (switching to c-state via monitor/mwait).
But the current implementation is broken too. It works only if your
kernel uses processor_idle AND monitor/mwait. For all other cases it
fails - may it be on Ronnys system, that uses IO port based C-states or
my system, that uses cpu_idle (not processor_idle).

I'd suggest, that it should work for IO port based switches and could be
included in the syntactic block of that case.
Afaik, halt means that the processor goes to C1. So in this block, there
could be a power_start event too with 1 as new state. (Maybe there are
problems with AMDs C1E which would be reported as C1.)



-- 
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden

Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de

--

From: Ronny Tschüter
Date: Friday, June 11, 2010 - 3:26 am

Unfortunately the bug still exists in kernel 2.6.34. On my system I'm 
not able to trace power:power_start events, because they are not 
reported. Regarding to the preceding comments I moved my code snippet 
into the IO port based C-state part of acpi_idle_do_entry(). Therefore 
it should not interfere with the other two C-state calls in this function.
Furthermore a lot of other code sequences in processor_idle.c use the 
ACPI STATE type in if- or switch-statements. So I don't think that ACPI 
STATE type is useless garbage.

Finally here is my new patch:


diff --git a/old/processor_idle.c b/new/processor_idle.c
index 5939e7f..5818522 100644
--- a/old/processor_idle.c
+++ b/new/processor_idle.c
@@ -43,6 +43,7 @@
  #include <linux/clockchips.h>
  #include <linux/cpuidle.h>
  #include <linux/irqflags.h>
+#include <trace/events/power.h>

  /*
   * Include the apic definitions for x86 to have the APIC timer related 
defines
@@ -807,6 +808,10 @@ static inline void acpi_idle_do_entry(struct 
acpi_processor_cx *cx)
      } else {
          int unused;
          /* IO port based C-state */
+        trace_power_start(POWER_CSTATE,
+        cx->type == ACPI_STATE_C1 ? 1 :
+        cx->type == ACPI_STATE_C2 ? 2 :
+        3);
          inb(cx->address);
          /* Dummy wait op - must do something useless after P_LVL2 read
             because chipsets cannot guarantee that STPCLK# signal
--

From: Ronny Tschüter
Date: Tuesday, June 29, 2010 - 7:49 am

I use Linux performance counters to trace some performance related 
events. Unfortunately power:power_start events are not reported on my 
machine. In /drivers/acpi/processor_idle.c I found three different ways 
to call a C-state, but only one way (via architectural FFH based C-state 
and using MONITOR/MWAIT) is currently instrumented. Here is my patch to 
instrument the remaining two possibilities:

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index b1b3856..08e7c42 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -43,6 +43,7 @@
  #include <linux/clockchips.h>
  #include <linux/cpuidle.h>
  #include <linux/irqflags.h>
+#include <trace/events/power.h>

  /*
   * Include the apic definitions for x86 to have the APIC timer related 
defines
@@ -136,6 +137,7 @@ static void acpi_safe_halt(void)
       */
      smp_mb();
      if (!need_resched()) {
+        trace_power_start(POWER_CSTATE,1);
          safe_halt();
          local_irq_disable();
      }
@@ -797,6 +799,10 @@ static inline void acpi_idle_do_entry(struct 
acpi_processor_cx *cx)
      } else {
          int unused;
          /* IO port based C-state */
+        trace_power_start(POWER_CSTATE,
+        cx->type == ACPI_STATE_C1 ? 1 :
+        cx->type == ACPI_STATE_C2 ? 2 :
+        3);
          inb(cx->address);
          /* Dummy wait op - must do something useless after P_LVL2 read
             because chipsets cannot guarantee that STPCLK# signal
--

Previous thread: [PATCH 8/8] ipmi: Change timeout and event poll to one second by Corey Minyard on Monday, May 3, 2010 - 6:38 am. (1 message)

Next thread: Re: s2ram slow (radeon) / failing (usb) by Alan Stern on Monday, May 3, 2010 - 6:57 am. (3 messages)