Re: TRACE_EVENT() declarations belong to include/trace/

From: Mathieu Desnoyers
Date: Monday, April 12, 2010 - 2:45 pm

Hi,

Ranting about:

commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <anton@samba.org>
Date:   Mon Oct 26 18:47:42 2009 +0000

    powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit

Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
irq_entry/exit?

What's so special about them that they cannot be put in include/trace/?

I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
in adding event declarations outside of include/trace/.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Frederic Weisbecker
Date: Monday, April 12, 2010 - 3:04 pm

Yeah,

If this is meant to trace all irqs, then it seems to me the wrong way to do it.
We already have generic irq_handler_entry and irq_handler_exit trace events.

Maybe those in powerpc are there to catch the spurious irqs by computing
a diff between the generic and arch irq events? In which case
it would be better to have dedicated spurious irq tracepoints.

--

From: Mathieu Desnoyers
Date: Monday, April 12, 2010 - 3:17 pm

The commit changelog:

<quote>
commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <anton@samba.org>
Date:   Mon Oct 26 18:47:42 2009 +0000

    powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
    
    This adds powerpc-specific tracepoints for interrupt entry and exit.
    
    While we already have generic irq_handler_entry and irq_handler_exit
    tracepoints there are cases on our virtualised powerpc machines where an
    interrupt is presented to the OS, but subsequently handled by the hypervisor.
    This means no OS interrupt handler is invoked.
    
    Here is an example on a POWER6 machine with the patch below applied:
    
    <idle>-0     [006]  3243.949840744: irq_entry: pt_regs=c0000000ce31fb10
    <idle>-0     [006]  3243.949850520: irq_exit: pt_regs=c0000000ce31fb10
    
    <idle>-0     [007]  3243.950218208: irq_entry: pt_regs=c0000000ce323b10
    <idle>-0     [007]  3243.950224080: irq_exit: pt_regs=c0000000ce323b10
    
    <idle>-0     [000]  3244.021879320: irq_entry: pt_regs=c000000000a63aa0
    <idle>-0     [000]  3244.021883616: irq_handler_entry: irq=87 handler=eth0
    <idle>-0     [000]  3244.021887328: irq_handler_exit: irq=87 return=handled
    <idle>-0     [000]  3244.021897408: irq_exit: pt_regs=c000000000a63aa0
    
    Here we see two phantom interrupts (no handler was invoked), followed
    by a real interrupt for eth0. Without the tracepoints in this patch we
    would have missed the phantom interrupts.
</quote>

states that this is done for setups where no in-kernel handler is called. But it
does not say if tracing the beginning and end of handle_IRQ_event() from
kernel/irq/handle.c would fix the problem. That would be a lot neater than this
arch-specific solution.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Anton Blanchard
Date: Monday, April 12, 2010 - 4:17 pm

Unfortunately that misses this problem completely. On some versions of the
POWER hypervisor we can be presented with interrupts for our virtualisation
layer that get handled in the get_irq hypervisor call. The code looks like
this:


void do_IRQ(struct pt_regs *regs)
{
        struct pt_regs *old_regs = set_irq_regs(regs);
        unsigned int irq;

        trace_irq_entry(regs);

        irq_enter();

        check_stack_overflow();

        irq = ppc_md.get_irq();         /* <-- jitter spikes here */

        if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
                handle_one_irq(irq);
        else if (irq != NO_IRQ_IGNORE)
                __get_cpu_var(irq_stat).spurious_irqs++;


We've had HPC customers who have experienced jitter in their applications
caused by this and as a result I added the events so we can monitor it.

Since this is a POWER-specific issue I'm happy to rename the trace events to
powerpc_irq_entry/exit. We could also look at moving the tracepoints, e.g.
putting them around the ppc_md.get_irq() call, but I can't see how we can
remove them completely.

Anton
--

From: Mathieu Desnoyers
Date: Monday, April 12, 2010 - 5:27 pm

OK, I see. How about arch_irq_entry/exit()?

This way, if we need to do something similar on another arch at the
architecture-level, we can use the same names.

Thanks,


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--

From: Anton Blanchard
Date: Monday, April 12, 2010 - 4:24 pm

The number of spurious irqs is not interesting - we track that in
/proc/interrupts. The duration of the disturbances is, and they were big
enough for people to see them in certain HPC loops.
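
To make the point about durations concrete, here is a minimal userspace
sketch (not kernel code, and not part of the patch) that scans trace text in
the format quoted from the changelog, pairs irq_entry with irq_exit per CPU,
and reports how many windows had no irq_handler_entry in between and how long
the longest one was. The names parse_trace_line(), scan_trace(), struct
phantom_stats and MAX_CPUS are made up for illustration. It assumes
timestamps always carry nine fractional digits, as in the sample, and it does
not handle nested interrupts.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 64   /* illustrative limit, not taken from the kernel */

/* One parsed trace line: CPU number, timestamp in ns, event name. */
struct trace_line {
    int cpu;
    long long ts_ns;
    char event[32];
};

/* Summary of what the scan found. */
struct phantom_stats {
    int phantom;         /* entry/exit windows with no handler invoked */
    int handled;         /* entry/exit windows that ran a handler */
    long long worst_ns;  /* longest phantom window, in nanoseconds */
};

/* Sample lines copied from the commit changelog. */
static const char *const sample_trace[] = {
    "<idle>-0     [006]  3243.949840744: irq_entry: pt_regs=c0000000ce31fb10",
    "<idle>-0     [006]  3243.949850520: irq_exit: pt_regs=c0000000ce31fb10",
    "<idle>-0     [007]  3243.950218208: irq_entry: pt_regs=c0000000ce323b10",
    "<idle>-0     [007]  3243.950224080: irq_exit: pt_regs=c0000000ce323b10",
    "<idle>-0     [000]  3244.021879320: irq_entry: pt_regs=c000000000a63aa0",
    "<idle>-0     [000]  3244.021883616: irq_handler_entry: irq=87 handler=eth0",
    "<idle>-0     [000]  3244.021887328: irq_handler_exit: irq=87 return=handled",
    "<idle>-0     [000]  3244.021897408: irq_exit: pt_regs=c000000000a63aa0",
};

/* Parse "[cpu]  sec.nsec: event: ..."; returns 1 on success, 0 otherwise. */
static int parse_trace_line(const char *line, struct trace_line *out)
{
    long long sec, nsec;
    const char *p = strchr(line, '[');

    if (!p)
        return 0;
    if (sscanf(p, "[%d] %lld.%9lld: %31[^:]:",
               &out->cpu, &sec, &nsec, out->event) != 4)
        return 0;
    /* Assumes exactly nine fractional digits (nanosecond resolution). */
    out->ts_ns = sec * 1000000000LL + nsec;
    return 1;
}

/* Walk the lines and classify each irq_entry..irq_exit window per CPU. */
static void scan_trace(const char *const *lines, int n,
                       struct phantom_stats *st)
{
    long long entry_ts[MAX_CPUS];
    int open[MAX_CPUS] = {0};
    int saw_handler[MAX_CPUS] = {0};
    struct trace_line tl;
    int i;

    memset(st, 0, sizeof(*st));
    for (i = 0; i < n; i++) {
        if (!parse_trace_line(lines[i], &tl))
            continue;
        if (tl.cpu < 0 || tl.cpu >= MAX_CPUS)
            continue;

        if (strcmp(tl.event, "irq_entry") == 0) {
            open[tl.cpu] = 1;
            saw_handler[tl.cpu] = 0;
            entry_ts[tl.cpu] = tl.ts_ns;
        } else if (strcmp(tl.event, "irq_handler_entry") == 0) {
            if (open[tl.cpu])
                saw_handler[tl.cpu] = 1;
        } else if (strcmp(tl.event, "irq_exit") == 0 && open[tl.cpu]) {
            long long d = tl.ts_ns - entry_ts[tl.cpu];

            if (saw_handler[tl.cpu]) {
                st->handled++;
            } else {
                st->phantom++;
                if (d > st->worst_ns)
                    st->worst_ns = d;
            }
            open[tl.cpu] = 0;
        }
    }
}
```

Calling scan_trace(sample_trace, 8, &st) on the changelog sample finds two
phantom windows (9776 ns on CPU 6 and 5872 ns on CPU 7) and one handled
interrupt. That per-window duration is the number that matters for jitter,
and a spurious-irq counter alone cannot provide it.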

Anton
--

From: Steven Rostedt
Date: Monday, April 12, 2010 - 3:01 pm

If there is any specific architecture data being recorded in the
TRACE_EVENT() macro, then it should be arch specific, but if not, then
it should go in include/trace/

/me goes to look at the code.

-- Steve

--
