Nah, it's really just a matter of fine-tuning. I have already run these
benchmarks and published the results, so I know what to expect, and I figured
out quickly that something was wrong with your setup. Disabling the extra
syscall tracing is the only detail I see missing for a fair comparison of
LTTng with your setup.
As for a "fair" setup to compare ring buffers, I would personally tend to say
that going through a system call is not the way I would benchmark the ring
buffer per se, but I must admit that the benchmark you are doing is very useful
for evaluating system call entry/exit tracing. On this particular aspect, I
have always admitted that lttng is not super-fast, but it actually incurs the
same overhead as the Linux kernel syscall tracing does today. Because it is
always enabled when lttng tracing runs, though, you ended up counting this
overhead against the lttng ring buffer, which is inappropriate.
I'm willing to help David with the setup, and I'm sure it won't take long. By
the way, I'm currently porting lttng to the new Generic Ring Buffer Library,
which might even be slightly faster than the current lttng code (and faster than
ftrace in the ring buffer benchmark Steven wrote). So once I'm done porting
lttng to it, it might be worthwhile to benchmark the Generic Ring Buffer too.
Thank you,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com