Hi,
Since we already have the Instrumentation menu in
kernel/Kconfig.instrumentation and instrumentation code all over the
kernel tree:

arch/*/oprofile/*.c
kernel/kprobes.c
arch/*/kernel/kprobes.c
kernel/marker.c
kernel/profile.c
kernel/lockdep.c
mm/vmstat.c
block/blktrace.c
drivers/base/power/trace.c

We could move them to

instrumentation/
arch/*/instrumentation/

Therefore, we could also move the kprobes and marker samples under

instrumentation/samples/

Here is a link to a git repository containing the changes, based on
2.6.24-rc1:

git://ltt.polymtl.ca/linux-2.6-instrumentation.git instrumentation-for-linus
(the interesting range is: v2.6.24-rc1..instrumentation-for-linus)

Through the gitweb interface:
http://ltt.polymtl.ca/cgi-bin/gitweb.cgi?p=linux-2.6-instrumentation.git

Feedback is appreciated. Sorry for the huge CC list, but the change
involves many maintainers.

Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
The vm statistics are important for the operation of the VM. They are not
optional. So I do not think that they fall under the category of
instrumentation.
-
But I guess vm stats can be useful to others; a kernel tracer for
instance?

Putting stuff in instrumentation/ in no way means that it becomes
optional for a subsystem, but merely that it could either export
information useful for kernel instrumentation or have some
infrastructure parts merged with others.

Mathieu
-
The vm statistics are intricately connected with other mm code. Best leave
it where it is. The other instrumentation is something that is put in
particularly for gaining statistics.
-
More reason why you should not be moving stuff all around the tree...

Really, file structure is one of the LEAST important issues around --
while moving files around introduces a non-zero amount of pain.

New files -- like that godawful and nearly empty samples/ directory --
sure, fix that up before release. But let's not break diffs of existing
architectures without good reason.

Jeff
-
Two more added. Jeff Garzik and Christoph H. sometimes have some comments
about this.

It would be helpful if we could get comments on this in the next day
or two [instead of in 1-2 weeks].

Thanks,
---
~Randy
-
"instrumentation" is long, and painful to the fingers :)
Jeff
-
And quoting the answer from Valdis.Kletnieks@vt.edu:

How so? i n s esc. 4 keystrokes (and still 2 more than D<ESC> ;)

Better suggestions are very welcome. However, auto-completion is cheap
in modern shells.

Mathieu
-
That is no excuse for extreme verbosity. It makes ls(1) displays ugly,
it makes diffstat ugly, it causes long pathnames to be truncated in
various display-oriented programs.

Pick a shorter word like probes or profile or what... or better yet...
just leave most things in their current directories.

Shuffling files around just to put them into directories with extra-long
names is highly undesirable.

Jeff
-
* Jeff Garzik (jeff@garzik.org) wrote:
...

How about something along the lines of

kinst or ki

(for "kernel instrumentation")?

Mathieu
-
No, that's horrible.

Also, in general, why do people want to have an "instrumentation" thing?
Yes, you can put random things into the same box, but that doesn't make
them be the same thing. Personally, I don't think "instrumentation" is
very useful at all. I consider "profiling" and "markers" to be two
fundamentally different things, and putting them both in the same box does
not make them any more similar.

Yes, technically they are both "instrumentation", but hey, technically the
VM and the VFS layer are both "infrastructure", but we don't put *those*
in an "infrastructure" subdirectory.

In other words, the fact that two different things share some attribute
does not mean that they should be collapsed together by that attribute,
does it?

I think "instrumentation" was/is a particularly bad thing to group things
by. It doesn't actually tell you anything about the thing, and it's not
even true that some people are interested in "instrumentation" and others
aren't.

For example: I think profiling support is something REALLY FUNDAMENTAL.
It's something each and every developer should generally care about, and
OProfile should be considered an indispensable tool for any developer, on
par with something like gdb.

In contrast, we should *not* expect most people to do any kernel markers
etc. That's a very esoteric thing.

So I actually think that the current Kconfig.instrumentation should be
*removed*. Rather than adding more groupings based on that fundamentally
flawed premise of false commonality.

Linus
-
The key idea for collapsing profiling, markup and tracing was that
marking up the code is required for both profiling and tracing. It's

It becomes interesting when they can share code and/or a common control
architecture. The fact that markup could be shared between profiling and

Ok, so maybe we should keep "markup", "tracing" and "profiling"

With SMP systems becoming cheap commodity hardware, each and every
developer increasingly faces thorny race problems, both in user-space
apps and in the kernel, which may involve hypervisor-kernel-userspace
interaction. Sadly, the blame is often put on kernel developers because
tools like gdb, oprofile and strace are practically useless to solve
such problems and people lack the right tool for the job.

Therefore, marking up the code to perform tracing should not be
considered esoteric: it's a very useful tool when one needs to
understand what is happening in their large scale system. Userspace
doesn't always have the ability to isolate problems and, worse, some
problems are just unreproducible when one tries to isolate them. I think
it is sensible to give them a tool that helps them understand what is going

Should it come with a re-duplication of its content into each
architecture, which was the case previously? The oprofile and kprobes menu
entries were literally cut and pasted from one architecture to another.
Should we put its content in init/Kconfig then?

Regards,
Mathieu
-
Stuff it into a new file: arch/Kconfig

We can then extend this file to include all the 'trailing'
Kconfig things that are anyway equal for all ARCHs.

But it should be kept clean - so if we introduce such a file
then we should use ARCH_HAS_whatever in the arch specific Kconfig
files to enable stuff that is not shared.

Sam
-
What code is actually shared?

Regardless, an internal implementation issue is *not* a good basis for a

I think so. At least conceptually - ie it might be fine to share a Kconfig

Well, the thing is, most of the time, those app developers will not be
doing kernel-level markers. But they may well be doing profiling.

Speaking as an application developer myself (git), I care deeply about
good profiling info, and I love Oprofile. But even though I'm a kernel
person too, I'd not want to do kprobes. It's just not relevant to me as a
user-land developer.

(I might want to extend on strace, but if so, I'd do it generically, not
as a "probe". For example, I'd love to see the page faults, but I think
they really *are* "system calls", so I think it would make more sense to

I don't think it's a good idea to go back to making it per-architecture,
although that extensive "depends on <list-of-architectures-here>" might
indicate that there certainly is room for cleanup there.

And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I
just think it's wrong to (a) lump the code together when it really doesn't
necessarily need to and (b) show it to users as some kind of choice that
is tied together (whether it then has common code or not).

On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like

	depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32

really shouldn't exist in a file like kernel/Kconfig.instrumentation.

It would be much better to do

	depends on ARCH_SUPPORTS_KPROBES

in that generic file, and then architectures that do support it would just
have a

	bool ARCH_SUPPORTS_KPROBES
	default y

in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interf...
-
vmstat "counter increments" and blktrace instrumentation, profile.c
"profile_hits" calls could be all expressed as "generic markup", and
then used for profiling and tracing. But that would imply the creation

If we have to put it that way, code markup can be itself seen as a
user-visible interface. The marker name, if a particular analysis
depends on it, will have to keep its name unchanged. The same applies to
the arguments passed to it. Therefore, even though the scheduler code
changed a lot over the past 10 years, its context switch marker could always
be expressed as

trace_mark(kernel_sched_schedule,
	"prev_pid %d next_pid %d prev_state %ld",
	prev->pid, next->pid, prev->state);

Where kernel_sched_schedule and the format string field names are kept
unchanged. Only its location and the name of the variables it touches

Since I am not a kprobe user myself, I understand you completely. :)
What users expect when they try to fix that kind of issue, when oprofile
and gdb are not sufficient, is to start a data collection mechanism that
will tell them what is going on in their system at large, without requiring
them to write kernel code.

However, that involves marking up key kernel code that will call into a
tracer to extract that information. Other projects have done this in
different ways. SystemTAP, for instance, does it out of tree by keeping
a separate list of addresses where kprobes must be installed. It does the
job from a distribution kernel maintainer's perspective (Redhat), since they
freeze to a particular kernel version and update this list every time it
breaks, but will always be a source of frustration for vanilla kernel
users and kernel developers. I think the best way to follow the code flow
is to add markup in the code itself: it would follow the kernel HEAD and
let each subsystem maintainer identify the key instrumentation sites of
their subsystem.

It's important to state that if anyone wants to have his own marker set
in a separate pat...
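
The idea of a marker as a stable interface (a fixed name and format
string, with a call site that is free to move and to rename its local
variables) can be illustrated with a small user-space sketch. This is
not the kernel marker implementation: struct marker, my_trace_mark(),
marker_arm(), record_probe() and fake_context_switch() are hypothetical
names made up for the illustration. Only the marker name and the format
string are taken from the trace_mark() example in this thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical probe signature: receives the stable name and format. */
typedef void (*probe_fn)(const char *name, const char *fmt, va_list ap);

/* Hypothetical user-space stand-in for a marker, not the kernel struct. */
struct marker {
	const char *name;	/* stable marker name */
	const char *fmt;	/* stable field names and types */
	int enabled;		/* 0 while dormant, 1 once a tracer arms it */
	probe_fn probe;		/* callback installed by the tracer */
};

/* Forwards the call-site arguments to the probe that was installed. */
static void marker_call(struct marker *m, ...)
{
	va_list ap;

	va_start(ap, m);
	m->probe(m->name, m->fmt, ap);
	va_end(ap);
}

/* While dormant, a call site only tests a flag; arguments stay unevaluated. */
#define my_trace_mark(m, ...)				\
	do {						\
		if ((m)->enabled)			\
			marker_call((m), __VA_ARGS__);	\
	} while (0)

/* Arms a marker by attaching a probe to it. */
static void marker_arm(struct marker *m, probe_fn probe)
{
	assert(m != NULL && probe != NULL);
	m->probe = probe;
	m->enabled = 1;
}

static char last_event[128];

/* Example probe: renders "name: fields" into last_event. */
static void record_probe(const char *name, const char *fmt, va_list ap)
{
	int n = snprintf(last_event, sizeof(last_event), "%s: ", name);

	if (n < 0 || (size_t)n >= sizeof(last_event))
		return;
	vsnprintf(last_event + n, sizeof(last_event) - (size_t)n, fmt, ap);
}

static struct marker sched_marker = {
	"kernel_sched_schedule",
	"prev_pid %d next_pid %d prev_state %ld",
	0,
	NULL
};

/* Stand-in call site: its local variable names are free to change. */
static void fake_context_switch(int from, int to, long state)
{
	my_trace_mark(&sched_marker, from, to, state);
}
```

Calling fake_context_switch() before marker_arm(&sched_marker,
record_probe) leaves last_event untouched; after arming, each call
records the event under the unchanged name and field names, whatever
the call site chooses to call its own variables.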
-
This misstates the details. What systemtap has out-of-tree is a list
of kernel function names (and parameter names), not addresses. This
list does change somewhat with kernel versions, but we generally keep
up. We do test with vanilla kernels, and several non-RH distributors

Of course - when and where the dormant overheads are acceptable, and
where the maintainers are willing to commit to a long-term interface
(marker name/arguments). Systemtap can connect to markers as well as
to kprobes and other event sources: mix & match based on what's
available in your particular kernel and what data/computation you

Roland McGrath's ptrace-replacement (utrace) should help with this.

- FChE
-
That's right, Systemtap uses symbols, thanks for the clarification. But
my point is still valid: SystemTAP expects function names and argument
names to stay unchanged, therefore using the kernel code itself as an
API to userspace tools. The markers act as a buffer between what

I have not been able to detect a significant dormant marker overhead
with the immediate values optimization. A load immediate and a predicted

I think that SystemTAP's flexibility is great, but leads to fragility
wrt kernel code changes. If the "core events" required by SystemTAP
(and also by LTTng by the way) could be turned into markers, I think it
would gain in robustness.

Providing the ability to instrument code locations with breakpoints, in
addition to this, will help users who are unsatisfied with the information
they have, unwilling to recompile their kernel or modules with their
own markers, and ready to accept two limitations:
- performance hit of a breakpoint
- inability to access variables within optimized functions

Yes, I think he did a good job at it. However, it is not a replacement
for the markers, SystemTAP or LTTng, because it defines a limited
set of hardcoded events (implying yet another type of code markup) that
is by itself a pain to extend. I am not willing to ask a subsystem
maintainer to do more than to "just identify" their important code
paths, the equivalent of adding a printk to their code. I don't think it
is realistic to ask them to create specialized callbacks for each of the
sites they would like to instrument.

So I would say: I'll try to submit a core set of markers patches for
review on LKML and see what people have to say.

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
Hi -
To be precise, this applies to *kprobes*-based probes only. In
acceptance of this fragility, systemtap includes constructs (aliases,
version-dependent conditionals) to make it reasonably easy to adapt to

That latter point has been repeatedly overstated. Markers provide a
fixed set of values. kprobes/dwarf provides access to any statements
and any values (including locals) that a compiler did not altogether
elide. While the latter set is by its nature variable, it will be
much bigger than anything a reasonable set of markers will ever

Right, not as a whole, but it *could* be an alternative way to hook

Thank you. Our team is already in contact to help.

- FChE
-
On Wed, 31 Oct 2007 11:48:20 -0400
yes so please please submit this stuff for mainline inclusion as has
been asked quite a few times before.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
-
Hi -
OK, but I don't recall receiving a clear answer as to how you envision
this would work. Would you support distribution of some systemtap
script files in some new subdirectory?

- FChE
-
On Wed, 31 Oct 2007 15:05:06 -0400
yes absolutely.
-
I completely agree. This is not the kind of thing we need to categorize.
-
I think I'm with jgarzik on this, let's not do this until it's clear where
the generalized instrumentation goes to.

That is, i386/x86_64 -> x86 was part of a full integration plan, one
that was immediately followed up by a series of integration patches.

With this, I see no such plan. Please draft this generic instrumentation
you talk about, if after that we all like it, we can go moving files
together with the immediate purpose of integrating them.
-
I'll keep the probes and profile directory name ideas in mind, thanks.

This patchset does more than moving things around: its purpose is to
gather various kernel files that have a similar purpose (instrumentation)
into a single directory so that it becomes easier to work on these
without duplicating the effort.

I see no good reason to have so many different ad hoc instrumentation
mechanisms for profiling (sched, vm, oprofile) and tracing (blktrace,
suspend/resume tracing) all over the place. Merging them in a single
directory seems like a good step towards a more generic
instrumentation/profiling/tracing infrastructure.

Back to "profile" and "probes" directory names, they might be short, but
they do not represent the whole markup-profiling-tracing trio:
"profile" lacks the tracing part and "probe" lacks the markup part.

Mathieu
-
Moving files about in directories should be at the /lowest/ end of the
priority scale. It makes diffs unreadable, file histories and diffing
difficult, and a host of other problems.

Please solve the /real/ problems, and then come back and clean up the
file structure after that is done. Massive file renaming to satisfy
some imagined future everything-is-golden scheme is the /last/ step. It
is the last step taken because the previous steps inevitably give you
guidance that you otherwise would not have had at the start of the task.

When I try to diff between old and new alpha oprofile code, I really
want to know that the reason why diffing is a pain in the ass is more

You can always add more letters (and words) to even reach the desired
level of specificity. That does nothing to help readability though.

Anyway, it should be clear from existing precedent -- existing pathnames
-- that "instrumentation" is too long, and really IMO too vague anyway.

Jeff
-
And how is this confirmed by the way the i386-x86_64 -> x86 merge is
done? It seems like a good current counter-example of what you just
affirmed.

First organizing the functionally similar existing code into a single
placeholder will just help finding code duplication, just like two very
similar architectures such as i386 and x86_64.

Talking about solving "real" problems, this is what I have been working
on for about 3 years in the kernel tracing area, writing the LTTng
tracer. What I see at this point is that there is a strong interest in
collaboration between the instrumentation projects (LTTng, SystemTAP,
DTI), but since the code ends up being sprinkled all across the kernel,
it's rather hard to spot duplicates. Actually, I just ran into Linus's
suspend/resume tracer _today_.

Talking about solving real problems, this is also what I did with the
Linux Kernel Markers patch, which can now be used to instrument the
kernel code. But it only deals with one aspect of instrumentation: the
markup itself.

I would categorize what we need for instrumentation in the following
categories:

- Data identification
* static markup, enabled dynamically, very low impact
* dynamic markup
* oprofile (especially for the performance counters)
* stack traces
- Control
* Tracing management
* Profiling management
* PMC management
- Data extraction
* relay
* debugfs
* serial port output
* LKCD

What I consider to fit into the instrumentation directory is the data
identification and the control mechanisms. The data extraction should be
done by generic pieces of infrastructure already present in the kernel.

Your suggestion of "first fixing the real problems" (do you mean by
this : add new code ?) and later bother about the file structure just
seems to go against most suggestions I have received from kernel
developers in the past years. Getting something new in the kernel is
much more straightforward if someone is willing to first clean up the
mess (I am quot...
-
i14m
hooks8)
- Arnaldo
