Agreed, although 65536 type IDs are probably overkill for the common case.
I prefer to go for approaches with a header that contains a smaller number of
bits, and use an extended header for those rare cases that need it.
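To illustrate the compact header with an extended escape, here is a minimal sketch. The 5-bit ID width, the escape value, the 16-bit extended ID and the function names are all assumptions made up for this example; this is not an existing LTTng or Ftrace layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   compact header:  1 byte, low 5 bits hold the event ID (0..30);
 *                    the upper 3 bits are left unused in this sketch.
 *   escape:          ID value 31 means a 16-bit little-endian ID follows.
 */
#define COMPACT_ID_BITS   5
#define COMPACT_ID_ESCAPE ((1u << COMPACT_ID_BITS) - 1)   /* 31 */

/* Write an event ID header into buf; returns the number of bytes used. */
static size_t write_event_id(uint8_t *buf, size_t len, uint16_t id)
{
    if (id < COMPACT_ID_ESCAPE) {
        assert(len >= 1);
        buf[0] = (uint8_t)id;
        return 1;
    }
    assert(len >= 3);
    buf[0] = COMPACT_ID_ESCAPE;
    buf[1] = (uint8_t)(id & 0xff);
    buf[2] = (uint8_t)(id >> 8);
    return 3;
}

/* Read an event ID header from buf; returns the number of bytes consumed. */
static size_t read_event_id(const uint8_t *buf, size_t len, uint16_t *id)
{
    uint8_t compact;

    assert(len >= 1);
    compact = buf[0] & COMPACT_ID_ESCAPE;
    if (compact != COMPACT_ID_ESCAPE) {
        *id = compact;
        return 1;
    }
    assert(len >= 3);
    *id = (uint16_t)(buf[1] | (buf[2] << 8));
    return 3;
}
```

With this kind of scheme the common events pay one byte for their ID, and only the rare high-numbered ones pay for the extended form.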
Also useless for lttng.
Ditto.
Yep :)
Finally ;)
Yep, you'd have to support the two formats side by side for a while anyway. So
we can definitely call it an ABI breakage rather than an extension.
That's right. It's more in the trace-clock area. Let's keep this problem for
later, as we are focusing on the ABIs.
Yep, also trace-clock related. No effect on ABI.
There were more details below on the impact of supporting flight recorder on the
trace format (using sub-buffers, etc). The ABI impact is more than just a flag,
although adding a flag is a good starting point. ;-)
This one could be done through ABI extension I guess.
This one is for when the kernel has crashed. So there is not much still
available, certainly not splice(). :) The idea is to keep the trace buffers
around in the system after an OOPS (or hard lockup) so that they can be gathered
later on.
Portable bitfields come to mind. And no, it's not enough to just reverse the
byte order across endianness.
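A minimal sketch of why byte swapping alone does not cover bitfields: the two readers below differ in how bits are numbered within each byte, not in byte order. The function names and the two conventions as written here are assumptions for illustration, not code from any existing tracer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read `len` bits (1..32) starting `start` bits into buf.
 * Little-endian convention assumed here: bit 0 is the least significant
 * bit of byte 0, and the first bit read is the field's least significant.
 */
static uint32_t bitfield_read_le(const uint8_t *buf, size_t start, unsigned len)
{
    uint32_t v = 0;

    assert(len >= 1 && len <= 32);
    for (unsigned i = 0; i < len; i++) {
        size_t bit = start + i;
        uint32_t b = (buf[bit / 8] >> (bit % 8)) & 1u;
        v |= b << i;
    }
    return v;
}

/*
 * Big-endian convention assumed here: bit 0 is the most significant bit
 * of byte 0, and the first bit read is the field's most significant.
 */
static uint32_t bitfield_read_be(const uint8_t *buf, size_t start, unsigned len)
{
    uint32_t v = 0;

    assert(len >= 1 && len <= 32);
    for (unsigned i = 0; i < len; i++) {
        size_t bit = start + i;
        uint32_t b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        v = (v << 1) | b;
    }
    return v;
}
```

The same bytes at the same bit offset yield different values under the two conventions, so the trace has to say which one was used when the field was written.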
The setup is that the traces are gathered on telecom switches, and brought to a
host machine for viewing. The user has to deal with traces gathered from various
kernel versions.
I did push Steven to support cross-endianness and self-describing types in
Ftrace in the past, and I have to admit that a large part of this requirement is
met, which is good.
Yep, this one requires that the trace metadata (currently exported through
debugfs) make its way along with the trace stream. One way to do it would
be to have a small separate buffer to transport the metadata.
Being able to set the periodic timer flush impacts the ABI (very slightly).
Yep. This mainly affects the trace clock implementation, and the impact there
is large.
There are ways to lay out the trace data so that a userspace tool can dig
through it quickly. Therefore it impacts the trace format too.
I'm working for the Linux Foundation CELF group and Ericsson, with the
Multi-Core Association, to come up with a standardized trace format across
trace providers in the industry, so that we can use the same tools to analyze
traces taken from heterogeneous systems (hardware traces, OS traces, user-space
traces...).
Given the live analysis and low-overhead requirements, being able to generate
this trace format natively would be a great gain.
Nope, this one is an ABI breakage. The current mmap shared control head/tail
values used for synchronization between the kernel (writer) and user-space
(reader) do not allow concurrent read/write in flight recorder mode. We need,
at the very least, to call the kernel after we've finished reading a sub-buffer.
Yep.
Because we need to get exclusive access to the next sub-buffer (exchanging it
with the one we currently own). This operation is an atomic pointer CAS (or
exchange for ftrace), which should only be done by the kernel.
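To show the shape of that operation, here is a minimal userspace sketch using C11 atomics. The struct layout, the slot count and the function name are assumptions for illustration; in the real design this exchange would be performed by the kernel on the reader's behalf, not by user-space code like this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS    2     /* arbitrary for this sketch */
#define SUBBUF_SIZE 4096  /* arbitrary for this sketch */

struct subbuf {
    char data[SUBBUF_SIZE];
    size_t used;
};

/* Slots visible to the writer; each holds a pointer to one sub-buffer. */
struct ring {
    _Atomic(struct subbuf *) slot[NR_SLOTS];
};

/*
 * Reader side: give the sub-buffer we currently own back to the ring and
 * take ownership of the one in slot `idx`, in a single atomic exchange.
 * After the call, the writer can only reach `owned`, and the returned
 * sub-buffer belongs exclusively to the reader.
 */
static struct subbuf *reader_exchange(struct ring *r, unsigned idx,
                                      struct subbuf *owned)
{
    assert(idx < NR_SLOTS && owned != NULL);
    return atomic_exchange(&r->slot[idx], owned);
}
```

The sketch uses a plain exchange; a CAS variant would additionally check that the slot still holds the sub-buffer the reader expects before swapping.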
This problem applies to both Ftrace and Perf. If you have the following
scenario:
1 - start tracing
2 - debugfs event descriptions are read
3 - load a module with tracepoints in it or add a dynamic kprobe
4 - hit the newly added events
5 - stop tracing
Then you end up being unable to parse the dynamically loaded information. That
is, if the dynamically loaded instrumentation ends up being activated at all.
In a context where distributions load modules like KVM on demand, it does not
make sense to keep these events out of the trace just because they have been
dynamically loaded without the user's knowledge. The problem is twofold here:
1 - we need to be able to specify which tracepoints are to be activated
independently of their location (kernel/modules) and of whether or not they
currently exist.
2 - we need to be able to append to the event list (metadata) while the trace is
being gathered.
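The second point can be sketched as a reader that accepts event descriptions interleaved with the events themselves. The record struct, the table and the replay() function are hypothetical and kept in memory for simplicity; a real format would be packed binary, and the event names are only sample strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENT_IDS 64   /* arbitrary bound for this sketch */

enum rec_type { REC_METADATA, REC_EVENT };

/* Hypothetical in-memory record. */
struct record {
    enum rec_type type;
    uint16_t id;
    const char *name;   /* only meaningful for REC_METADATA */
};

struct event_table {
    const char *name[MAX_EVENT_IDS];   /* NULL means "no description yet" */
};

/*
 * Walk the stream in order. Metadata records extend the table as they
 * appear; event records count as resolved or unknown depending on whether
 * their description has been seen by that point in the stream.
 * Returns the number of resolved events.
 */
static size_t replay(const struct record *recs, size_t n,
                     struct event_table *tbl, size_t *unknown)
{
    size_t resolved = 0;

    *unknown = 0;
    for (size_t i = 0; i < n; i++) {
        assert(recs[i].id < MAX_EVENT_IDS);
        if (recs[i].type == REC_METADATA)
            tbl->name[recs[i].id] = recs[i].name;
        else if (tbl->name[recs[i].id] != NULL)
            resolved++;
        else
            (*unknown)++;
    }
    return resolved;
}
```

If the description of a late-loaded event is appended to the stream before its first occurrence, every event resolves; if descriptions are only captured once at trace start, the late event is counted as unknown.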
Yep.
Well, it all depends on how much the ftrace tools expect the sub-buffer size to
be 4kB.
Nope, this is userspace tracing performed all in userspace. However, if we want
to share the same trace format, then we need to come up with a trace format that
is not inherently tied to a scheme where preemption can be disabled.
Yep.
Yep.
Explained above.
Thanks for the feedback!
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com