Re: [RFC] Create kinst/ or ki/ directory ?

To: Linus Torvalds <torvalds@...>
Cc: Jeff Garzik <jeff@...>, Randy Dunlap <rdunlap@...>, <hch@...>, <linux-kernel@...>, Sam Ravnborg <sam@...>, Jens Axboe <jens.axboe@...>, Prasanna S Panchamukhi <prasanna@...>, Ananth N Mavinakayanahalli <ananth@...>, Anil S Keshavamurthy <anil.s.keshavamurthy@...>, David S. Miller <davem@...>, Ingo Molnar <mingo@...>, Peter Zijlstra <pzijlstr@...>, Philippe Elie <phil.el@...>, William L. Irwin <wli@...>, Arjan van de Ven <arjan@...>, Christoph Lameter <christoph@...>, <Valdis.Kletnieks@...>
Date: Tuesday, October 30, 2007 - 4:40 pm

* Linus Torvalds (torvalds@linux-foundation.org) wrote:

vmstat "counter increments", blktrace instrumentation and profile.c
"profile_hits" calls could all be expressed as "generic markup", and
then used for profiling and tracing. But that would imply creating a
markup management infrastructure that permits it without hurting
performance.
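
To make the "generic markup" idea concrete, here is a minimal userspace
sketch. Everything in it is hypothetical (struct marker, MARK(), the
probe functions and the block_issue site are names made up for
illustration, not kernel API), and the enable check is a plain flag
test, which is only one possible way to keep a disabled site cheap.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace model of a marker; not kernel API. */
struct marker;
typedef void (*marker_probe_t)(struct marker *m, va_list args);

struct marker {
	const char *name;	/* stable identifier of the site */
	const char *format;	/* stable description of the arguments */
	int enabled;		/* 0: the site costs only a flag test */
	marker_probe_t probe;	/* consumer currently attached */
	unsigned long hits;
	char last_event[128];
};

static void marker_call(struct marker *m, ...)
{
	va_list args;

	va_start(args, m);
	m->probe(m, args);
	va_end(args);
}

/* Needs at least one argument after the marker pointer. */
#define MARK(m, ...) \
	do { if ((m)->enabled) marker_call((m), __VA_ARGS__); } while (0)

/* Profiling-style consumer: count hits, ignore the arguments. */
static void count_probe(struct marker *m, va_list args)
{
	(void)args;
	m->hits++;
}

/* Tracing-style consumer: also record the formatted event. */
static void trace_probe(struct marker *m, va_list args)
{
	m->hits++;
	vsnprintf(m->last_event, sizeof(m->last_event), m->format, args);
}

static struct marker blk_issue = {
	"block_issue", "sector %lu bytes %u", 0, count_probe, 0, ""
};

/* One marked-up site serves both consumers. */
static void submit_request(unsigned long sector, unsigned int bytes)
{
	MARK(&blk_issue, sector, bytes);
	/* ... the real work of the function would go here ... */
}
```

The point of the sketch is that submit_request() is written once:
whether the site counts hits or records full events is decided by
which probe is attached, and a disabled site only pays for the flag
test.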


If we have to put it that way, code markup can itself be seen as a
user-visible interface. If a particular analysis depends on a marker,
its name will have to stay unchanged. The same applies to the
arguments passed to it. Therefore, even though the scheduler code
changed a lot over the past 10 years, its context switch marker could
always be expressed as:

  trace_mark(kernel_sched_schedule,
         "prev_pid %d next_pid %d prev_state %ld",
         prev->pid, next->pid, prev->state);

Here kernel_sched_schedule and the format string field names are kept
unchanged. Only the marker's location and the names of the variables it
touches might have to be modified to follow the kernel tree.
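
As an illustration of that stability, here is a small userspace sketch.
The trace_mark() below is a stand-in I define for the example (it only
records the marker name and the formatted payload), and the two
"scheduler" functions and their structures are invented, not taken from
any kernel version.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel macro: records what a tracer would see. */
static char last_name[64];
static char last_payload[128];

#define trace_mark(name, format, ...) \
	do { \
		snprintf(last_name, sizeof(last_name), "%s", #name); \
		snprintf(last_payload, sizeof(last_payload), \
			 format, __VA_ARGS__); \
	} while (0)

/* Hypothetical "old" scheduler code. */
struct task_old {
	int pid;
	long state;
};

static void schedule_v1(struct task_old *prev, struct task_old *next)
{
	trace_mark(kernel_sched_schedule,
		   "prev_pid %d next_pid %d prev_state %ld",
		   prev->pid, next->pid, prev->state);
}

/* Hypothetical refactor: different structures, function and variables. */
struct thread {
	int id;
};

struct runqueue {
	struct thread *curr;
	long curr_state;
};

static void pick_and_switch_v2(struct runqueue *rq, struct thread *incoming)
{
	/* Same marker name and format string; only the arguments moved. */
	trace_mark(kernel_sched_schedule,
		   "prev_pid %d next_pid %d prev_state %ld",
		   rq->curr->id, incoming->id, rq->curr_state);
	rq->curr = incoming;
}
```

For the same logical context switch, both versions hand the tracer an
identical name and payload, so an analysis keyed on
kernel_sched_schedule would not notice the refactor.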



I am not a kprobe user myself, so I understand you completely. :)
What users expect when they try to fix that kind of issue, when oprofile
and gdb are not sufficient, is to start a data collection mechanism that
will tell them what is going on in their system at large, without
requiring them to write kernel code.

However, that involves marking up key kernel code that will call into a
tracer to extract that information. Other projects have done this in
different ways. SystemTap, for instance, does it out of tree by keeping
a separate list of addresses where kprobes must be installed. That does
the job from a distribution kernel maintainer's perspective (Red Hat),
since they freeze on a particular kernel version and update this list
every time it breaks, but it will always be a source of frustration for
vanilla kernel users and kernel developers. I think the best way to
follow the code flow is to add markup in the code itself: it would
follow the kernel HEAD and let each subsystem maintainer identify the
key instrumentation sites of their subsystem.

It's important to state that if anyone wants to keep his own marker set
in a separate patchset, he can do so. I currently have my own set of
markers to trace the most important kernel sites required to analyze and
show a trace of the Linux kernel in my LTTng kernel tracer. It's derived
from the set found in LTT, which did not change much in about 8 years. I
could always submit that for comments to see how subsystem maintainers
react to the proposed instrumentation.

About extending ptrace, I am sorry to say that this solution has the
same downsides as kprobes: it is too slow for high performance
applications, especially if turned on system-wide. It will also change
the system behavior so much that it may hide the bugs and performance
issues people are struggling to find. Ptrace is very good at what it
does: looking inside _one_ application and tracing its system calls and
signals, but the approach finds its limits when we are trying to look at
the interactions between multiple applications and the kernel more
globally.



Absolutely. Let's do it.


Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
