I would propose disabling the event source as early as possible when
tracing is disabled, without any conditional test if possible. This is
actually what the Kernel Markers do with the help of Immediate Values.
It should also come with finer-grained filtering based on a global
"enable" variable and per-buffer "enable" variables, so that a tracer
can atomically start collecting all of its event types. We could then
require both global tracing and per-buffer tracing to be enabled to
write into the buffer.

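
To make the "atomic start" idea concrete, here is a minimal userspace
sketch using C11 atomics. It is only an illustration under assumptions:
the names (global_enable, buffer_enable, tracer_start, tracer_stop,
may_write) and the fixed NR_BUFFERS are hypothetical, and real kernel
code would use its own primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_BUFFERS 4	/* hypothetical number of trace buffers */

/* Hypothetical flags: one global switch plus one switch per buffer. */
static atomic_bool global_enable;
static atomic_bool buffer_enable[NR_BUFFERS];

/*
 * Arm every buffer the tracer cares about first, then flip the global
 * switch last, so all of its event types start being collected at once.
 */
static void tracer_start(const unsigned int *bufs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (bufs[i] < NR_BUFFERS)
			atomic_store_explicit(&buffer_enable[bufs[i]], true,
					      memory_order_relaxed);
	/* Release: the per-buffer flags are visible before the global one. */
	atomic_store_explicit(&global_enable, true, memory_order_release);
}

/* Clearing the global switch stops all writes in a single store. */
static void tracer_stop(void)
{
	atomic_store_explicit(&global_enable, false, memory_order_release);
}

/* Both global and per-buffer tracing must be enabled to write. */
static bool may_write(unsigned int buf)
{
	if (buf >= NR_BUFFERS)
		return false;
	return atomic_load_explicit(&global_enable, memory_order_acquire) &&
	       atomic_load_explicit(&buffer_enable[buf], memory_order_relaxed);
}
```

A tracer would call tracer_start() with the indexes of its buffers and
the write path would test may_write() before touching a buffer.
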
I think I see where Frank is going, and I agree that this is more or
less what I have had in mind: the Markers could be used as a global
event ID registry and could hold the table of event names, IDs and
types (format strings). The markers would therefore simply become the
"write to buffer" interface, which would assign IDs automatically and
keep the format strings in a table dumped into a meta-information trace
buffer at trace start and whenever IDs are dynamically registered while
the trace is active.

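
As a rough illustration of such a registry, the sketch below keeps a
table of (ID, name, format string) entries, assigns IDs on first
registration and dumps the table as text. Everything here is an
assumption made for the example: struct event_desc, register_event(),
dump_event_table(), the fixed table size and the one-line-per-event
dump layout are hypothetical, not the markers' actual interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 64	/* hypothetical fixed table size */

/* One row of the hypothetical event table. */
struct event_desc {
	const char *name;	/* event name */
	const char *format;	/* format string describing the types */
	unsigned int id;	/* automatically assigned ID */
};

static struct event_desc event_table[MAX_EVENTS];
static unsigned int nr_events;

/*
 * Return the ID of the event called 'name', assigning the next free ID
 * the first time the name is seen. Returns -1 when the table is full.
 */
static int register_event(const char *name, const char *format)
{
	unsigned int i;

	for (i = 0; i < nr_events; i++)
		if (strcmp(event_table[i].name, name) == 0)
			return (int)event_table[i].id;
	if (nr_events == MAX_EVENTS)
		return -1;
	event_table[nr_events].name = name;
	event_table[nr_events].format = format;
	event_table[nr_events].id = nr_events;
	return (int)nr_events++;
}

/*
 * Dump the table as "id name format" lines, as could be done into a
 * meta-information buffer. Returns the number of bytes written, not
 * counting the terminating NUL; a record that does not fit is dropped.
 */
static size_t dump_event_table(char *buf, size_t len)
{
	size_t off = 0;
	unsigned int i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < nr_events; i++) {
		int n = snprintf(buf + off, len - off, "%u %s %s\n",
				 event_table[i].id, event_table[i].name,
				 event_table[i].format);
		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0';	/* drop the truncated record */
			break;
		}
		off += (size_t)n;
	}
	return off;
}
```

The dump would be called once at trace start and again whenever
register_event() hands out a new ID while the trace is active.
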
As I said above, the tracepoints are meant to be an in-kernel API which
instruments the kernel code. That leaves the markers, which are meant to
be exposed to userspace anyway, for such record_event use. They would
actually accomplish two things: they would register the event (just
declaring the marker puts the event in a special section, which is our
mapping table) and would also record the event when enabled and
executed.

Given that SystemTap may need to access the kernel state at the moment
the instrumentation is reached, I think it implies that its handlers
have to be called _before_ the data is written to the buffers. We can
thus see SystemTap as a very powerful filtering mechanism. SystemTap
could also choose to use the available instrumentation directly
(kprobes, tracepoints, ...) when it does not need to act as a filter,
but rather as a statistics-gathering module. So I think we should simply
provide a callback filter registration mechanism in the filtering chain,
giving filtering pseudo-code that looks like:

	if (unlikely(marker_enabled))
		if (likely(global_tracing_enabled))
			if (likely(buffer_tracing_enabled))
				if (likely(call_filter_chain()))
					write to buffer

If we exported the data to record through markers, we would be able to
let SystemTap (registered in our filter chain) look at the event types
being passed to the buffers and thus perform clever filtering on the
information.

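
The filter chain could be sketched in plain C as follows. This is a
userspace approximation under assumptions, not a proposed kernel
interface: struct event, filter_fn, register_filter(), trace_event(),
the events_written counter (standing in for the real buffer write) and
the example filter are all hypothetical, and likely()/unlikely() are
defined here with the GCC/Clang __builtin_expect builtin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Branch hints, defined locally with the GCC/Clang builtin. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

#define MAX_FILTERS 8	/* hypothetical chain length limit */

/* Hypothetical event as seen by the filters: ID, type info, payload. */
struct event {
	unsigned int id;
	const char *format;
	long arg;
};

/* A filter returns true to let the event through, false to drop it. */
typedef bool (*filter_fn)(const struct event *ev);

static filter_fn filter_chain[MAX_FILTERS];
static size_t nr_filters;

static bool marker_enabled;
static bool global_tracing_enabled;
static bool buffer_tracing_enabled;
static unsigned long events_written;	/* stand-in for the buffer write */

/* Append a filter to the chain; returns its index, or -1 if full. */
static int register_filter(filter_fn fn)
{
	if (nr_filters == MAX_FILTERS)
		return -1;
	filter_chain[nr_filters] = fn;
	return (int)nr_filters++;
}

/* The event passes only if every registered filter accepts it. */
static bool call_filter_chain(const struct event *ev)
{
	size_t i;

	for (i = 0; i < nr_filters; i++)
		if (!filter_chain[i](ev))
			return false;
	return true;
}

/* Same nesting order as the pseudo-code; returns true if written. */
static bool trace_event(const struct event *ev)
{
	if (unlikely(marker_enabled))
		if (likely(global_tracing_enabled))
			if (likely(buffer_tracing_enabled))
				if (likely(call_filter_chain(ev))) {
					events_written++;
					return true;
				}
	return false;
}

/* Example filter: only keep events with a non-negative argument. */
static bool filter_nonnegative_arg(const struct event *ev)
{
	return ev->arg >= 0;
}
```

Because the filters run before the write, a filter registered by
SystemTap would see both the format string and the payload and could
decide per event whether it reaches the buffer.
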
I think what Frank is trying to express is that we would not lose any
flexibility, but would make life much easier for everyone, if we used
the markers as the API to register event IDs, keep their type table and
export the data at runtime.

One addition the markers would need is a "binary blob" type (size %lu
binary %pW? W would be for "Write"), which would express that this data
is only parsable by specialized functions which know the types embedded
within the binary blob. If necessary, SystemTap could then incrementally
develop specialized functions to deal with these blobs, only where
required. All the other basic types would be easy to print or filter
with a vsnprintf-like function.

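
The split between basic types and blobs might look like the sketch
below. It does not implement the tentative %pW specifier; instead it
assumes a hypothetical tagged field (struct field), a hypothetical
blob_decoder callback type and an example struct point payload, simply
to show basic types going through snprintf while blobs are handed to a
specialized function when one exists.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field kinds: two basic types and the binary blob. */
enum field_type { FIELD_LONG, FIELD_STRING, FIELD_BLOB };

struct field {
	enum field_type type;
	union {
		long l;
		const char *s;
		struct {
			const void *data;
			size_t size;
		} blob;
	} u;
};

/* Specialized function that knows the types embedded in a blob. */
typedef int (*blob_decoder)(char *buf, size_t len,
			    const void *data, size_t size);

/*
 * Print one field. Basic types are printed directly; a blob is passed
 * to 'decode' if provided, otherwise only its size is reported.
 * Returns what snprintf would return, or -1 on error.
 */
static int format_field(char *buf, size_t len, const struct field *f,
			blob_decoder decode)
{
	switch (f->type) {
	case FIELD_LONG:
		return snprintf(buf, len, "%ld", f->u.l);
	case FIELD_STRING:
		return snprintf(buf, len, "%s", f->u.s);
	case FIELD_BLOB:
		if (decode)
			return decode(buf, len, f->u.blob.data,
				      f->u.blob.size);
		return snprintf(buf, len, "<blob %zu bytes>",
				f->u.blob.size);
	}
	return -1;
}

/* Example payload and its decoder, both purely illustrative. */
struct point {
	int x;
	int y;
};

static int decode_point(char *buf, size_t len,
			const void *data, size_t size)
{
	struct point p;

	if (size != sizeof(p))
		return -1;
	memcpy(&p, data, sizeof(p));	/* blob may be unaligned */
	return snprintf(buf, len, "x=%d y=%d", p.x, p.y);
}
```

Decoders such as decode_point() could be added one at a time, only for
the blobs someone actually needs to print or filter on.
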
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--