Re: [RFC] x86,perf: Implement minimal P4 PMU driver v14

From: Ingo Molnar
Date: Thursday, March 11, 2010 - 11:16 am

* Cyrill Gorcunov <gorcunov@openvz.org> wrote:


tried it on a Pentium-D dual core CPU, and it boots fine:

[    0.020009] using mwait in idle threads.
[    0.021004] Performance Events: Netburst events, Netburst P4/Xeon PMU driver.
[    0.024006] ... version:                0
[    0.025003] ... bit width:              40
[    0.026003] ... generic registers:      18
[    0.027003] ... value mask:             000000ffffffffff
[    0.028003] ... max period:             0000007fffffffff
[    0.029003] ... fixed-purpose events:   0
[    0.030003] ... event mask:             000000000003ffff
[    0.031027] ACPI: Core revision 20100121
[    0.050126] Setting APIC routing to flat
[    0.051010] enabled ExtINT on CPU#0
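The three hex values in that boot log look like they follow directly from the two decimal ones. A minimal sketch of the arithmetic, assuming the value mask covers all 40 counter bits, the max period is one bit narrower, and the event mask has one bit per generic register (these relationships are inferred from the printed numbers, and the Python names are made up for illustration, they are not the driver's identifiers):

```python
# Decimal values copied from the boot log above.
COUNTER_BITS = 40        # "bit width"
GENERIC_REGISTERS = 18   # "generic registers"


def low_bits_mask(bits: int) -> int:
    """Return an integer with the lowest `bits` bits set."""
    return (1 << bits) - 1


# Assumed relationships (inferred from the log, not quoted from the patch):
value_mask = low_bits_mask(COUNTER_BITS)        # every counter bit
max_period = low_bits_mask(COUNTER_BITS - 1)    # one bit narrower than the counter
event_mask = low_bits_mask(GENERIC_REGISTERS)   # one bit per generic register

for label, value in (("value mask", value_mask),
                     ("max period", max_period),
                     ("event mask", event_mask)):
    # Same 16-digit zero-padded hex layout as the log lines.
    print(f"... {label}: {value:016x}")
```

Under those assumptions the three printed values match the log lines for value mask, max period and event mask.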

perf stat seems to work fine as well:

rhea:~> perf stat ls >/dev/null

 Performance counter stats for 'ls':

       6.596037  task-clock-msecs         #      0.439 CPUs 
              1  context-switches         #      0.000 M/sec
              0  CPU-migrations           #      0.000 M/sec
            236  page-faults              #      0.036 M/sec
        4745843  cycles                   #    719.499 M/sec
              0  instructions             #      0.000 IPC  
  <not counted>  cache-references        
  <not counted>  cache-misses            

    0.015009286  seconds time elapsed
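The right-hand column of that perf stat output can be re-derived from the left-hand one, which is a quick sanity check that the cycles count is plausible. A small sketch, assuming the M/sec figures are normalized by task-clock time and the CPUs figure is task-clock time over elapsed wall time (the function names are hypothetical, only the input numbers come from the output above):

```python
# Raw numbers copied from the perf stat output above.
TASK_CLOCK_MSECS = 6.596037
ELAPSED_SECONDS = 0.015009286
CYCLES = 4745843
PAGE_FAULTS = 236


def millions_per_sec(count: int, busy_msecs: float) -> float:
    """Event rate in M/sec, normalized by the time the task was running."""
    return count / (busy_msecs / 1000.0) / 1e6


def cpus_utilized(busy_msecs: float, elapsed_seconds: float) -> float:
    """Fraction of wall-clock time the task spent on a CPU."""
    return (busy_msecs / 1000.0) / elapsed_seconds


print(f"cycles:      {millions_per_sec(CYCLES, TASK_CLOCK_MSECS):.3f} M/sec")
print(f"page-faults: {millions_per_sec(PAGE_FAULTS, TASK_CLOCK_MSECS):.3f} M/sec")
print(f"task-clock:  {cpus_utilized(TASK_CLOCK_MSECS, ELAPSED_SECONDS):.3f} CPUs")
```

With these inputs the sketch reproduces 719.499 M/sec, 0.036 M/sec and 0.439 CPUs, so the cycles counter is ticking at a rate consistent with the task-clock.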

perf top works fine as well:

------------------------------------------------------------------------------
   PerfTop:   25056 irqs/sec  kernel:25.7% [100000 cycles],  (all, 2 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

              845.00 -  6.6% : __switch_to
              785.00 -  6.1% : schedule
              687.00 -  5.3% : perf_poll
              455.00 -  3.5% : _raw_spin_lock_irqsave
              436.00 -  3.4% : delay_tsc
              371.00 -  2.9% : fget_light
              346.00 -  2.7% : pick_next_task_fair
              328.00 -  2.5% : fput
              285.00 -  2.2% : free_poll_entry

i also triggered this:

[  436.224139] PMU: Dep events are not implemented yet

i'm getting a healthy amount of NMIs:

NMI:      44400     108796   Non-maskable interrupts

perf record + report works fine too:

# Samples: 32829281626
#
# Overhead          Command       Shared Object  Symbol
# ........  ...............  ..................  ......
#
    11.22%     pipe-test-1m  [kernel.kallsyms]   [k] __switch_to
     4.82%     pipe-test-1m  [kernel.kallsyms]   [k] switch_mm
     4.37%     pipe-test-1m  [kernel.kallsyms]   [k] schedule
     3.01%     pipe-test-1m  [kernel.kallsyms]   [k] pipe_read
     2.96%     pipe-test-1m  [kernel.kallsyms]   [k] system_call
     2.53%     pipe-test-1m  [kernel.kallsyms]   [k] update_curr
     2.15%     pipe-test-1m  [kernel.kallsyms]   [k] vfs_read

perf annotate __switch_to works too, and sees inside irqs-disabled regions due 
to NMI sampling:

    0.00 :      ffffffff81001664:       48 89 c2                mov    %rax,%rdx
    0.18 :      ffffffff81001667:       b9 00 01 00 c0          mov    $0xc0000100,%ecx
    0.00 :      ffffffff8100166c:       48 c1 ea 20             shr    $0x20,%rdx
    0.00 :      ffffffff81001670:       0f 30                   wrmsr  
   67.80 :      ffffffff81001672:       45 85 ff                test   %r15d,%r15d
    1.85 :      ffffffff81001675:       66 89 b3 8c 04 00 00    mov    %si,0x48c(%rbx)
    5.35 :      ffffffff8100167c:       41 0f b7 bd 8e 04 00    movzwl 0x48e(%r13),%edi
    0.00 :      ffffffff81001683:       00 

(and that wrmsr is indeed one known overhead point in __switch_to.)

All in all, the P4 PMU perf driver works on this box like a charm and all the 
common profiling workflows work out of the box, without any serious limitations - 
really nice work! (Obviously some events won't work yet, etc.)

So it's pretty impressive and i've queued up your patch in tip:perf/x86 and 
will merge it into perf/core after others have had a chance to test it too.

	Ingo
