I noticed expensive divides done in try_to_wakeup() and find_busiest_group()
on a bi dual core Opteron machine (total of 4 cores), moderatly loaded (15.000
context switch per second)
oprofile numbers :
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
608893 1.0413 find_busiest_group
1841 0.0031 :ffffffff802260bf: div %rdi
140109 0.2394 :ffffffff802260c2: test %sil,%sil
Some of these divides can use the reciprocal divides we introduced some time
ago (currently used in slab AFAIK)
We can assume a load will fit in a 32bits number, because with a
SCHED_LOAD_SCALE=128 value, its still a theorical limit of 33554432
When/if we reach this limit one day, probably cpus will have a fast hardware
divide and we can zap the reciprocal divide trick.
Ingo suggested to rename cpu_power to __cpu_power to make clear it should not
be modified without changing its reciprocal value too.
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may be not worth it ? We could use a static table of 32
reciprocal values but it would add a conditional branch and table lookup.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>| Ingo Molnar | [patch 02/13] syslets: add syslet.h include file, user API/ABI definitions |
| Heiko Carstens | Re: 2.6.25-rc6-git6: Reported regressions from 2.6.24 |
| Greg Kroah-Hartman | [PATCH 010/196] Chinese: add translation of Codingstyle |
| Rafael J. Wysocki | [Bug #10629] 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160 |
git: | |
| Gerrit Renker | [PATCH 0/37] dccp: Feature negotiation - last call for comments |
| Linus Torvalds | Re: [GIT]: Networking |
| Mark Lord | Re: [BUG] New Kernel Bugs |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
