Hi,
Ranting about:
commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <anton@samba.org>
Date: Mon Oct 26 18:47:42 2009 +0000
powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
irq_entry/exit ?
What's so special about them that they cannot be put in linux/trace/ ?
I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
in adding event declarations outside of include/trace/.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
Yeah, If this is to trace all irqs, then it seems to me the wrong way. We already have generic irq_handler_entry and irq_handler_exit trace events. May be those in powerpc are here to get the spurious irqs by computing a diff between generic and arch irq events? In which case it would be better to get dedicated spurious irq tracepoints. --
The commit changelog :
<quote>
commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <anton@samba.org>
Date: Mon Oct 26 18:47:42 2009 +0000
powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
This adds powerpc-specific tracepoints for interrupt entry and exit.
While we already have generic irq_handler_entry and irq_handler_exit
tracepoints there are cases on our virtualised powerpc machines where an
interrupt is presented to the OS, but subsequently handled by the hypervisor.
This means no OS interrupt handler is invoked.
Here is an example on a POWER6 machine with the patch below applied:
<idle>-0 [006] 3243.949840744: irq_entry: pt_regs=c0000000ce31fb10
<idle>-0 [006] 3243.949850520: irq_exit: pt_regs=c0000000ce31fb10
<idle>-0 [007] 3243.950218208: irq_entry: pt_regs=c0000000ce323b10
<idle>-0 [007] 3243.950224080: irq_exit: pt_regs=c0000000ce323b10
<idle>-0 [000] 3244.021879320: irq_entry: pt_regs=c000000000a63aa0
<idle>-0 [000] 3244.021883616: irq_handler_entry: irq=87 handler=eth0
<idle>-0 [000] 3244.021887328: irq_handler_exit: irq=87 return=handled
<idle>-0 [000] 3244.021897408: irq_exit: pt_regs=c000000000a63aa0
Here we see two phantom interrupts (no handler was invoked), followed
by a real interrupt for eth0. Without the tracepoints in this patch we
would have missed the phantom interrupts.
</quote>
states that this is done for setups where no in-kernel handler is called. But it
does not say if tracing the beginning and end of handle_IRQ_event() from
kernel/irq/handle.c would fix the problem. That would be a lot neater than this
arch-specific solution.
Thanks,
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
Unfortunately that misses this problem completely. On some versions of the
POWER hypervisor we can be presented with interrupts for our virtualisation
layer that get handled in the get_irq hypervisor call. The code looks like
this:
void do_IRQ(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq;
trace_irq_entry(regs);
irq_enter();
check_stack_overflow();
irq = ppc_md.get_irq(); <------------- jitter spikes here
if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
handle_one_irq(irq);
else if (irq != NO_IRQ_IGNORE)
__get_cpu_var(irq_stat).spurious_irqs++;
We've had HPC customers who have experienced jitter in their applications
caused by this and as a result I added the events so we can monitor it.
Since this is a POWER specific issue I'm happy to rename the trace events to
powerpc_irq_entry/exit. We could also look at changing the tracepoints, eg
putting it around the ppc_md.get_irq(), but I can't see how we can remove
them completely.
Anton
--
OK, I see. How about arch_irq_entry/exit() ? This way, if we need to do something similar on another arch at the architecture-level, we can use the same names. Thanks, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com --
The number of spurious irqs are not interesting - we track them in /proc/interrupts. The duration of the disturbances are, and they were big enough for people to see them in certain HPC loops. Anton --
If there is any specific architecture data being recorded in the TRACE_EVENT() macro, then it should be arch specific, but if not, then it should go in include/trace/ /me goes to look at the code. -- Steve --
