* H. Peter Anvin (hpa@zytor.com) wrote:

Do you consider all unlikely blocks to be in-line? If the real issue is to make sure they don't share cache lines with the body of the function, that could be arranged. However, I assume that using an unlikely branch to let gcc, with -freorder-blocks, put the instructions at the end of the function is enough.

When disabled: 0 cycles? It additionally clobbers eax and EFLAGS.

For the parameters passed to the marker, I think the marker location should be chosen carefully so that most of the variables are live anyway, even without a marker.

I was perfectly happy with the immediate value + conditional branch, but apparently 0 cycles is more appealing than 2 :-) Let's consider this option:

First of all, I would not like to require tracing users to fetch the kernel debuginfo each time they want to trace. I think it should be an "on switch" kind of infrastructure, and downloading a few hundred MB of data isn't exactly that.

If I get your idea right, you propose to use inline assembly with "g" constraints to make sure gcc keeps the parameters alive. I just tested your approach on a marker in schedule(): as soon as one of the parameters requires dereferencing a pointer, this adds operations to the fast path. That is not the case for markers because, as Ingo explained, this work is done in a block outside the fast path.

So your assembly constraint solution works fine only if the information happens to already be in a register at the inline assembly site; then there is no added cost for register preparation. Since that won't always be true, you have to bear the cost of setting up the registers from the stack or, worse, from a pointer read in the function fast path. Markers offload this to the jump target, located outside of the fast path. Therefore, in the general case, which includes parameters not present in registers, markers seem like a more palatable solution.
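A minimal user-space sketch of the trade-off above, assuming GCC or Clang. Every name here (marker_enabled, marker_probe, struct task, fastpath_with_marker, fastpath_with_constraint) is a hypothetical stand-in, not the kernel markers API, and the on/off switch is a plain global rather than the patched immediate value discussed in this thread. The point is only where the parameter loads land: inside the unlikely branch for the marker style, unconditionally for the "g" constraint style.

```c
#include <assert.h>

/* Branch hint, spelled as in the kernel's compiler.h. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-ins for scheduler data structures. */
struct mm_info { long nr_pages; };
struct task { int pid; struct mm_info *mm; };

/* Hypothetical on/off switch. The real proposal encodes this as an
   immediate value patched into the instruction stream; a plain global
   keeps the sketch portable. */
int marker_enabled;

/* What the probe last recorded, so the behaviour can be checked. */
int probe_hits;
int probe_last_pid;
long probe_last_pages;

/* Out-of-line, cold probe: stands in for the code a marker calls. */
__attribute__((noinline, cold))
void marker_probe(int pid, long nr_pages)
{
    assert(marker_enabled);   /* only reachable while tracing is on */
    probe_hits++;
    probe_last_pid = pid;
    probe_last_pages = nr_pages;
}

/* Marker style: the fast path only tests the flag. The argument
   evaluation (prev->mm->nr_pages needs two loads) lives inside the
   unlikely branch, which gcc may move to the end of the function. */
int fastpath_with_marker(struct task *prev)
{
    if (unlikely(marker_enabled))
        marker_probe(prev->pid, prev->mm->nr_pages);
    return prev->pid;
}

/* Constraint style: an empty asm with "g" inputs forces both values to
   be materialized here on every call, so at least the load of prev->mm
   lands on the fast path even when nothing is tracing. */
int fastpath_with_constraint(struct task *prev)
{
    __asm__ __volatile__("" : : "g"(prev->pid), "g"(prev->mm->nr_pages));
    return prev->pid;
}
```

Compiling this with `gcc -O2 -S` and comparing the two functions is one way to see whether the pointer loads sit in the hot path or in the cold block on a given compiler and architecture.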
If you suppose the information is always live in registers at the instrumented site, then yes, I guess your constraint+call approach is good, modulo the fact that users will depend on hundreds of megabytes of debuginfo. However, to populate registers with a wider range of parameters without adding instructions to the fast path, markers, which add their instructions in a cache-cold block, seem like a good way to go. That, in turn, depends on the ability to branch efficiently to that block, when enabled, in order to prepare the stack and do the call.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
