* H. Peter Anvin <hpa@zytor.com> wrote:

the counter argument was that by specific sched.o analysis, this results in slower code. The reason is that the "function call parameter preparation" halo around that 5-byte patch site is larger than that single conditional branch operation to an offline place of the current function is.

i.e. the current optimized marker approach does roughly this:

  [ .... fastpath head .... ]
  [ immediate value instruction ] --->
  [ branch instruction ]          ---> these two get NOP-ed out
  [ .... fastpath tail .... ]
  [ ............................. ]
  [ ... offline area ............ ]
  [ ... parameter preparation ... ]
  [ ... marker call ............. ]

your proposed 5-byte call NOP approach (which btw. was what i proposed multiple times in the past 2 years) would do this:

  [ .... fastpath head ...... ]
  [ ... parameter preparation ... ]
  [ .... 5-byte CALL .......... ] ---> NOP-ed out
  [ .... fastpath tail .......... ]
  [ ............................. ]

in the first case we have little "marker parameter/value preparation" cost: it all happens in the 'offline area' _by GCC_. I.e. the fastpath is relatively undisturbed.

in the latter case, all the 'parameter preparation' phase has to happen at around the 5-byte CALL site, in the fastpath. This, in the specific, assembly level analysis of sched.o, was shown by Matthieu to be a pessimisation. We are better off by inserting that conditional and letting gcc generate the call, than by forcing it in the middle of the fastpath - even if we end up NOP-ing out the call.

wrt. complexity i agree with you - if the current optimization cannot be made correctly we have to fall back to a simpler variant, even if it's slower.

	Ingo
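
The two layouts compared above can be roughly sketched in plain userspace C. This is only an illustration under assumptions: `marker_enabled`, `marker_probe()`, `marker_probe_checked()`, `fastpath_branch()` and `fastpath_call()` are hypothetical names, not the kernel marker API, and an ordinary variable test stands in for the patched immediate value and for the NOP-ed 5-byte CALL. The point is only where the argument computation ends up relative to the fastpath.

```c
/* Hypothetical stand-ins; none of these names come from the marker patches. */
static int marker_enabled;   /* plays the role of the patched immediate value */
static long marker_hits;     /* side effect so the probe is observable */

/* Out-of-line probe, marked cold so GCC may place its call site offline. */
static void __attribute__((noinline, cold)) marker_probe(int cpu, long delta)
{
	marker_hits += delta + cpu;
}

/*
 * Variant 1: conditional branch around the marker.
 * The argument computation (sum * 3 + 1) sits inside the unlikely
 * branch, so the compiler is free to move it out of the fastpath.
 */
static long fastpath_branch(int cpu, long a, long b)
{
	long sum = a + b;                          /* fastpath head */

	if (__builtin_expect(marker_enabled, 0))   /* value test + branch */
		marker_probe(cpu, sum * 3 + 1);    /* preparation + call */

	return sum ^ cpu;                          /* fastpath tail */
}

/*
 * Variant 2: unconditional call site.
 * The enable check lives in the callee here (assumption: this mimics a
 * CALL that would be NOP-ed out), so the arguments must be computed
 * inline, in the fastpath, before the call instruction.
 */
static void __attribute__((noinline)) marker_probe_checked(int cpu, long delta)
{
	if (marker_enabled)
		marker_hits += delta + cpu;
}

static long fastpath_call(int cpu, long a, long b)
{
	long sum = a + b;                          /* fastpath head */

	marker_probe_checked(cpu, sum * 3 + 1);    /* preparation stays inline */

	return sum ^ cpu;                          /* fastpath tail */
}
```

Comparing the `gcc -O2 -S` output of the two `fastpath_*` functions is one way to see whether the argument setup stays in the straight-line path on a given compiler; the result depends on compiler version and flags.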
