* Ingo Molnar (mingo@elte.hu) wrote:

I have not seen any counterargument to the in-depth analysis I've done of the instruction cache impact of the optimized markers. Arguing that the markers are "bloated" based only on "size kernel/sched.o" output is a bit misleading.

You will probably be interested in the following paper, which describes various situations in which using a tracer has solved real problems at Google, IBM and Autodesk, all of which are Linux users running large clusters or Linux systems with soft RT constraints.

Linux Kernel Debugging on Google-sized clusters, Ottawa Linux Symposium 2007
http://ltt.polymtl.ca/papers/bligh-Reprint.pdf

Now for some performance impact:

Here are some results I have taken comparing the optimized markers approach with the dynamic ftrace approach. These runs do some ALU work in tight loops, using clflush() to flush the cache lines pointing to "global" data (pointer read: current->pid) used in the loop.

I also have the numbers for running the loop without the ALU work, but I leave them out since they only make the tables harder to read: basically, the cached impact of running the empty loop with markers or ftrace instrumentation is about 0 to 3 cycles. It's the uncached impact which clearly makes the difference between the two approaches.

On AMD64, adding the markers or ftrace statement actually accelerates the runs when executed with an ALU work baseline. It adds 1 to 2 cycles when executed alone in the loop without any work.

Frank Ch. Eigler is preparing some macrobenchmarks. I hope he will find time to post them soon.
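As a rough userspace illustration of the measurement method described above (this is not the actual test bench code), the sketch below times a tight ALU loop, optionally flushes the cache line holding a "global" value before each iteration, and optionally reads that value. The names `fake_pid`, `cycles_per_loop` and `added_cycles` are hypothetical, `fake_pid` merely stands in for current->pid, and the rdtsc read is not serialized, so the numbers it gives are only indicative.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc(), _mm_clflush() */
static inline uint64_t read_cycles(void) { return __rdtsc(); }
static inline void flush_line(const volatile void *p)
{
    _mm_clflush((const void *)p);
}
#else
/* Coarse fallback so the sketch still builds elsewhere:
 * clock() ticks instead of cycles, and no cache flush at all. */
#include <time.h>
static inline uint64_t read_cycles(void) { return (uint64_t)clock(); }
static inline void flush_line(const volatile void *p) { (void)p; }
#endif

/* Hypothetical stand-in for the "global" data (current->pid in the kernel). */
static volatile int fake_pid = 1234;

/* Keeps the compiler from discarding the loop result. */
static volatile uint64_t sink;

/*
 * Average timer ticks per iteration of a small ALU loop.
 * flush != 0       : evict the line holding fake_pid before each iteration
 * read_global != 0 : read fake_pid inside the loop (the "pointer read" case)
 */
static double cycles_per_loop(unsigned long iters, int flush, int read_global)
{
    uint64_t acc = 1;
    uint64_t start = read_cycles();

    for (unsigned long i = 0; i < iters; i++) {
        if (flush)
            flush_line(&fake_pid);
        /* Some ALU work: one multiply and one add per iteration. */
        acc = acc * 6364136223846793005ULL + 1442695040888963407ULL;
        if (read_global)
            acc += (uint64_t)fake_pid;
    }

    uint64_t end = read_cycles();
    sink = acc;
    return (double)(end - start) / (double)iters;
}

/* "Added cycles" as reported in the tables: measured run minus its baseline. */
static double added_cycles(double measured, double baseline)
{
    return measured - baseline;
}
```

For example, `added_cycles(cycles_per_loop(n, 1, 1), cycles_per_loop(n, 1, 0))` estimates the uncached cost of the pointer read, in the same way the uncached columns are obtained by subtracting the clflush() and ALU loop baseline.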

Results in cycles per loop

baseline:
Cycles for ALU loop                 28.10013 (will be subtracted for cached runs)
Cycles for clflush() and ALU loop  230.11087 (will be subtracted from non-cached runs)

gcc version 4.1.3 20070812 (prerelease) (Debian 4.1.2-15), -O2
------------------------------------------------------------------------------
|x86 Pentium 4, 3.0GHz, Linux 2.6.25-rc7           | cached    | uncached    |
------------------------------------------------------------------------------
|Added cycles for optimized marker                 | 0.002     | 0.07        |
|Added cycles for normal marker                    | 0.004     | 154.7       |
|Added cycles for stack setup + (1+4 bytes) NOPs   |           |             |
|(6 local vars)                                    | 0.035     | 0.6         |
|Added cycles for stack setup + (1+4 bytes) NOPs   |           |             |
|(1 pointer read, 5 local vars)                    | 0.030     | 222.8       |
------------------------------------------------------------------------------

Results in cycles per loop

baseline:
Cycles for ALU and loop             25.32369 (will be subtracted for cached runs)
Cycles for clflush() and ALU loop  118.24227 (will be subtracted from non-cached runs)

gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21), -O2
------------------------------------------------------------------------------
|AMD64, 2.0GHz, Linux 2.6.25-rc7                   | cached    | uncached    |
------------------------------------------------------------------------------
|Added cycles for optimized marker                 | -1.0      | 0.2         |
|Added cycles for normal marker                    | -0.3      | 41.8        |
|Added cycles for stack setup + (1+4 bytes) NOPs   | -0.5      | 0.01        |
|(6 local vars)                                    |           |             |
|Added cycles for stack setup + (1+4 bytes) NOPs   | 2.7       | 51.8        |
|(1 pointer read, 5 local vars)                    |           |             |
------------------------------------------------------------------------------

test bench at: http://ltt.polymtl.ca/svn/markers-test/

Regards,

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
