OK.
I guess timestamps should always be on.
This would affect the ptrace interface. There will be fewer options and I
need to drop the DRAIN and CLEAR commands. With multiple tracers, I can
no longer clear the BTS buffer. Drain will morph into a whole-buffer
read.
OK.
Branch trace buffers tend to fill up pretty fast.
For example:
- switching from a task (from user-mode to __switch_to_xtra()) takes
~300 entries (7.2k)
- handling a segv (from null function pointer call to
__switch_to_xtra()) takes ~400 entries (9.6k)
- 512k are not enough to hold a single time slice (measured on a printf
loop)
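To make the arithmetic behind these numbers explicit, here is a small sketch. The 24-byte record (three 64-bit fields) is inferred from the 7.2k / 300 entries ratio; the struct and helper names are made up for illustration and are not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed record layout: three 64-bit fields, 24 bytes in total.
 * This is inferred from the figures above (7200 bytes / 300 entries).
 */
struct bts_record {
	uint64_t from;   /* branch source address */
	uint64_t to;     /* branch destination address */
	uint64_t flags;  /* remaining 8 bytes of the record */
};

/* Bytes needed to hold `entries` records. */
static size_t bts_bytes(size_t entries)
{
	return entries * sizeof(struct bts_record);
}

/* Number of whole records that fit into `bytes`. */
static size_t bts_entries(size_t bytes)
{
	return bytes / sizeof(struct bts_record);
}
```

With these assumptions, a 512k buffer (taken as 512 * 1024 bytes) holds a bit under 22000 entries, which is why it does not survive a full time slice of a branch-heavy loop.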
On the other hand:
- knowing where you died takes 1 entry
- knowing how you got there takes <100 entries or you lose track, anyway
If we add ~10k to the buffer size the user requested, we should be able
to hold the extra kernel-mode trace and do the filtering in software -
assuming that we will never hold more trace than the tail of a single
time slice.
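The software filtering could look roughly like the sketch below. It assumes an x86-64 address layout where kernel addresses have the top bit set, and it drops every record that touches a kernel address on either side. Both the policy and the names (struct branch, filter_user_tail) are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

struct branch {
	uint64_t from;
	uint64_t to;
};

/* Assumption: x86-64, kernel addresses live in the upper half. */
static int is_kernel_addr(uint64_t addr)
{
	return (addr >> 63) != 0;
}

/* A record counts as user-mode only if both ends are user addresses. */
static int is_user_branch(const struct branch *b)
{
	return !is_kernel_addr(b->from) && !is_kernel_addr(b->to);
}

/*
 * raw[0..n) holds the collected trace, oldest entry first.
 * Copy the newest user-mode records into out[], at most `max` of them,
 * preserving their order. Returns the number of records written.
 */
static size_t filter_user_tail(const struct branch *raw, size_t n,
			       struct branch *out, size_t max)
{
	size_t start = n, kept = 0, i;

	/* Walk backwards to find the oldest record we can still keep. */
	for (i = n; i > 0 && kept < max; i--) {
		if (is_user_branch(&raw[i - 1])) {
			kept++;
			start = i - 1;
		}
	}

	/* Copy forward from there, skipping kernel-mode records. */
	kept = 0;
	for (i = start; i < n; i++) {
		if (is_user_branch(&raw[i]))
			out[kept++] = raw[i];
	}
	return kept;
}
```

Here raw[] would be the user-requested size plus the extra ~10k, and out[] would be what the tracer actually gets to see.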
Who should pay for this memory?
Should it be taken from the requesting task's rlimit?
Who should pay in the case of multiple tracers?
What if the initial tracer leaves while other tracers remain? (the
initial tracer would want its rlimit adjusted, but other tracers may not
even have enough resources).
Should I drop the whole rlimit check and allow everybody to request
arbitrarily big buffers?
Given the vast memory consumption, I would only consider circular
buffers. That's all you need for debugging. Trying to collect a full
system trace even for a few seconds will likely fill up an entire disk
server. We may add support for an overflow interrupt for each of the
various buffers, but I doubt that it will be very useful. Trying to get
a consistent global trace will likely eat up more memory and compute
time than it's worth.
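For reference, this is the circular buffer behaviour I have in mind, as a minimal software sketch. The capacity is tiny on purpose, and the names (struct ring, ring_push, ring_get) are hypothetical; the real buffer would be written by hardware, not by a push function.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4	/* tiny capacity, for illustration only */

struct trace_entry {
	uint64_t from;
	uint64_t to;
};

struct ring {
	struct trace_entry buf[RING_CAP];
	size_t head;	/* index of the next slot to write */
	size_t count;	/* number of valid entries, at most RING_CAP */
};

/* Append an entry; once the buffer is full, the oldest one is overwritten. */
static void ring_push(struct ring *r, struct trace_entry e)
{
	r->buf[r->head] = e;
	r->head = (r->head + 1) % RING_CAP;
	if (r->count < RING_CAP)
		r->count++;
}

/* Return the i-th retained entry, 0 being the oldest. Requires i < count. */
static struct trace_entry ring_get(const struct ring *r, size_t i)
{
	size_t oldest = (r->head + RING_CAP - r->count) % RING_CAP;

	return r->buf[(oldest + i) % RING_CAP];
}
```

Memory use stays fixed no matter how long the task runs, and the tail of the trace is always there when the task dies.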
I would go for per-task buffers and a catch-all per-cpu buffer (if
per-cpu trace is requested).
I would not try to remember the sequence of recent tasks and construct a
consistent global trace on the fly - the per-task buffers would need to
be too big. We could copy the per-task trace into the per-cpu buffer on
context switch, but, again, I doubt that this is overly useful.
I'm fine with moving it into the BTS layer. But it would duplicate the
allocation and accounting code in all DS users.
The model was to allow a single owner of the BTS and PEBS configurations
to prevent different users from overwriting their settings.
The first task to make a ds_request() for BTS or PEBS would own the
respective configuration until it ds_release()s it.
This is essentially a BTS/PEBS resource allocator.
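Stripped down to the ownership rule, the model looks like the sketch below. This is not the actual ds_request()/ds_release() code or signature; cfg_request(), cfg_release() and the integer task ids are placeholders, and locking is left out entirely.

```c
enum ds_kind { DS_BTS, DS_PEBS, DS_KINDS };

/* owner[k] is the id of the task holding configuration k, or 0 if free. */
static int owner[DS_KINDS];

/*
 * Claim the configuration for `task` (a non-zero id).
 * Returns 0 on success, -1 if a different task already owns it.
 */
static int cfg_request(enum ds_kind kind, int task)
{
	if (owner[kind] != 0 && owner[kind] != task)
		return -1;
	owner[kind] = task;
	return 0;
}

/*
 * Give the configuration back.
 * Returns 0 on success, -1 if `task` is not the current owner.
 */
static int cfg_release(enum ds_kind kind, int task)
{
	if (owner[kind] != task)
		return -1;
	owner[kind] = 0;
	return 0;
}
```

BTS and PEBS are tracked independently, so one task can own BTS while another owns PEBS.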
If we cut down on the BTS interface and collect all trace at all times
anyway, we would not need this any more.
PEBS would still need something like that, though. I wonder whether a
multiplexing model makes sense for PEBS at all.
regards,
markus.