* H. Peter Anvin (hpa@zytor.com) wrote:

Interesting. Actually, I use the "g" constraint in the code I showed you, so it might be more acceptable to put in the fast path without requiring registers to be populated. The only cases that would generate additional code would probably be arguments like:

    /* multiple pointer dereference */
    trace_mark(evname, "argname", ptr->stuff[index]->...);

    /*
     * having to do some work to prepare the variable (calling a macro or
     * inline function which does more than just a pointer deref).
     */
    trace_mark(evname, "argname", get_real_valueof(variable));

    /* constants */
    trace_mark(evname, "argname", 10000);

Those cases don't add code to the critical path with my current markers, but they would with the inline assembly "g" constraint approach. Looking at the "mm" instrumentation, where page_to_pfn, swp_offset and get_swap_info_struct are used, makes me think this would not be such a rare case.

I would also like to point out that maintaining a _separate_ piece of code for each instrumentation site, each heavily dependent on the inner kernel data structures, seems like a maintenance nightmare. This is why I am trying to get the instrumented site to export the meaningful data, self-described, in a standardized way. We can then simply hook on all the instrumented sites to either perform some in-kernel analysis on the data (ftrace) or to export it to a userspace trace analyzer (LTTV analyzing LTTng traces).

I would be happy with a solution that doesn't depend on this gigantic DWARF information and can be included in the kernel build process. See, I think tracing is, primarily, a facility that the kernel should provide to users so they can tune and find problems in their own applications. From this POV, it makes sense to consider tracing as part of the kernel code itself, not as a separate, kernel-debugging-oriented piece of code. If you require per-site dynamic pieces of code, you are only adding to the complexity of such a tracer.
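
To make the argument-evaluation point above concrete, here is a minimal user-space sketch. It is only a model, not the kernel's marker code: marker_enabled, probe, MARK_LAZY and MARK_EAGER are hypothetical names, and get_real_valueof is a stand-in with a made-up body. MARK_LAZY keeps the argument expression inside the unlikely branch, as the current markers do. MARK_EAGER computes it before the test, which is roughly what an asm input operand would force. A counter makes the difference observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of this is the kernel's marker API. */
int marker_enabled;   /* 0 = tracing off, the common case */
int prepare_calls;    /* how many times the argument was computed */
long last_traced;     /* last value handed to the probe */

/* Stand-in for an argument that needs real work to prepare. */
long get_real_valueof(long variable)
{
    prepare_calls++;
    return variable * 2;
}

/* Stand-in for whatever consumes the traced value. */
void probe(const char *name, long value)
{
    assert(name != NULL);
    last_traced = value;
}

/* Argument expression lives inside the unlikely branch. */
#define MARK_LAZY(name, expr)                         \
    do {                                              \
        if (__builtin_expect(marker_enabled, 0))      \
            probe((name), (expr));                    \
    } while (0)

/* Argument is computed first, then the branch is tested. */
#define MARK_EAGER(name, expr)                        \
    do {                                              \
        long val_ = (expr);                           \
        if (__builtin_expect(marker_enabled, 0))      \
            probe((name), val_);                      \
    } while (0)

/* Argument computations on the fast path with tracing off (lazy form). */
int lazy_calls_when_disabled(void)
{
    marker_enabled = 0;
    prepare_calls = 0;
    MARK_LAZY("argname", get_real_valueof(21));
    return prepare_calls;
}

/* Same measurement for the eager form. */
int eager_calls_when_disabled(void)
{
    marker_enabled = 0;
    prepare_calls = 0;
    MARK_EAGER("argname", get_real_valueof(21));
    return prepare_calls;
}
```

With tracing disabled, lazy_calls_when_disabled() returns 0 and eager_calls_when_disabled() returns 1. A compiler may sink a side-effect-free computation into the branch by itself, so the gap only shows up reliably when the value has to exist before the branch, as with an asm operand.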
Actually, an active tracer would trash the i-cache quite heavily due to these per-site pieces of code. Given that users want a tracer that disturbs normal system behavior as little as possible, I don't think this "per-site pieces of code" approach is that good.

So, in terms of complexity added to the kernel, i-cache impact of an active tracer, and maintainability, I think the register-constraining assembly isn't such a good approach. And why would we do that? The real contention point here seems to be removing a few bytes from an unlikely block. I think I should paste my reply to Ingo about the d-cache, i-cache and TLB impact of such code:

<quote>
Data cache bloat inspection:

If you use the "size" output, it will take into account all the data placed in special sections. At link time, these sections are put together far from the actual cache-hot kernel data.

Instruction cache bloat inspection:

If a code region is placed with cache-cold instructions (unlikely branches), it should not increase the cache impact: although we might use one more cache line, it won't often be loaded in cache because all the code that shares this cache line is unlikely.

TLB entries bloat:

If code is added in unlikely branches, the instruction size increase could increase the number of TLB entries required to map the cache-hot code. However, in our case, adding 10 (hot) + 50 (cold) bytes to the scheduler code per optimized marker would require 68 markers to occupy a whole 4kB TLB entry. Statistically, we could suppose that adding fewer than 34 markers to the scheduler should not use any supplementary TLB entry. Adding 3 markers is therefore very unlikely to increase the TLB impact. Given we have about 1024 TLB entries, adding roughly 1/25th of a TLB entry to the cache-hot kernel instructions should not matter much, especially since it might be absorbed by alignment.
And since the kernel core code is placed in "Huge TLB pages" on many architectures nowadays, I really don't think the impact of a few bytes out of 4MB is significant.

I therefore think that looking only at code size is misleading when considering the cache impact of markers, since they have been designed to put the bytes as far away as possible from cache-hot memory.
</quote>

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
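
The TLB estimate in the quoted text can be re-derived with a few lines of C. This is only a sketch of the arithmetic. The byte counts (10 hot, 50 cold) and the 4kB page size come from the quoted text, while the function and constant names are made up for illustration.

```c
#include <assert.h>

/* Figures taken from the quoted estimate. */
enum {
    HOT_BYTES  = 10,    /* bytes added inline on the hot path, per marker */
    COLD_BYTES = 50,    /* bytes added in the unlikely block, per marker */
    PAGE_BYTES = 4096   /* one 4kB page, i.e. one TLB entry */
};

/* Total text bytes added by n markers. */
unsigned int marker_bytes(unsigned int n)
{
    return n * (HOT_BYTES + COLD_BYTES);
}

/* Number of whole markers that fit in one 4kB page. */
unsigned int markers_per_page(void)
{
    return PAGE_BYTES / (HOT_BYTES + COLD_BYTES);
}

/* Fraction of one 4kB page taken by n markers. */
double page_fraction(unsigned int n)
{
    return (double)marker_bytes(n) / PAGE_BYTES;
}

/* Re-checks the numbers used in the estimate. */
void check_estimate(void)
{
    assert(markers_per_page() == 68);        /* 4096 / 60, rounded down */
    assert(marker_bytes(3) == 180);          /* three scheduler markers */
    assert(page_fraction(3) < 1.0 / 20);     /* well under 5% of a page */
}
```

Three markers come to 180 bytes, a little over 4% of a page, which is in line with the "about 1/25th of a TLB entry" figure in the quoted text.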
