Hello,
Please consider this patch. It makes a few minor changes to
sched_fair.c.
sched: Minor optimizations in wake_affine and select_task_rq_fair
This patch does following:
o Reduces the number of arguments to wake_affine().
o Removes unused variable "rq".
o Optimizes one of the "if" conditions in wake_affine() - i.e. if
"balanced" is true, we need not do rest of the calculations in the
condition.
o If this cpu is same as the previous cpu (on which woken up task
was running when it went to sleep), no need to call wake_affine at all.
Signed-off-by: Amit K Arora <aarora@linux.vnet.ibm.com>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Ingo Molnar <mingo@elte.hu>
---
kernel/sched_fair.c | 29 ++++++++++++-----------------
1 file changed, 12 insertions(+), 17 deletions(-)
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1143,17 +1143,18 @@ static inline unsigned long effective_lo
#endif
static int
-wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq,
- struct task_struct *p, int prev_cpu, int this_cpu, int sync,
- int idx, unsigned long load, unsigned long this_load,
- unsigned int imbalance)
+wake_affine(struct sched_domain *this_sd, struct task_struct *p, int prev_cpu,
+ int this_cpu, int sync, unsigned long load, unsigned long this_load)
{
+ struct rq *this_rq = cpu_rq(this_cpu);
struct task_struct *curr = this_rq->curr;
struct task_group *tg;
unsigned long tl = this_load;
unsigned long tl_per_task;
unsigned long weight;
int balanced;
+ int idx = this_sd->wake_idx;
+ unsigned int imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
return 0;
@@ -1191,8 +1192,8 @@ wake_affine(struct rq *rq, struct sched_
schedstat_inc(p, se.nr_wakeups_affine_attempts);
tl_per_task = cpu_avg_load_per_task(this_cpu);
- if ((tl <= load && tl + ...At what point is it cheaper to pass items as args rather than recalculating them? If reducing the number of args is desirable, what about removing the "this_cpu" and "prev_cpu" args and recalculating them in wake_affine()? Chris --
it's usually not worth it, especially if it leads to duplicated calculations (and code) like: + unsigned int imbalance = 100 + (this_sd->imbalance_pct - 100) / 2; gcc will optimize it away because it's all static functions, but still. 'size kernel/sched.o' should be a good guideline: if the .o's text section gets smaller due to a patch it usually gets faster as well. Ingo --
Ok. I will resubmit the patch with other suggested changes only. It won't try to reduce wake_affine's arguments (besides the first argument "rq" which is not being used at all). Regards, --
Thats a good question. Its kind of arguable and I wasn't sure if everyone will be happy if I removed more arguments from wake_affine() than what I did in my patch (because of the recalculations required). wake_affine() currently has 11 arguments and I thought it may make sense in reducing it to a sane number. For that I chose arguments which I thought can be recalculated with minimum overhead (involves single struct dereference, a simple per cpu variable and/or a simple arithmetic). And one argument ("rq") which is being removed, isn't used at all in the function! Regarding the two variables you have mentioned, I didn't remove them as args since I wasn't sure of "this_cpu" (which is nothing but smp_processor_id()) as it is arch dependent, and calculating "prev_cpu" involves two struct dereferences (((struct thread_info *)(task)->stack)->cpu). And the calculation for other arguments (like, this_sd, load and this_load) involves good amount of instructions. If you disagree, what do you suggest we do here ? Regards, Amit Arora --
As mentioned in http://lkml.org/lkml/2008/9/30/141 , this is the new patch. sched: Minor optimizations in wake_affine and select_task_rq_fair This patch does following: o Removes unused variable and argument "rq". o Optimizes one of the "if" conditions in wake_affine() - i.e. if "balanced" is true, we need not do rest of the calculations in the condition. o If this cpu is same as the previous cpu (on which woken up task was running when it went to sleep), no need to call wake_affine at all. Signed-off-by: Amit K Arora <aarora@linux.vnet.ibm.com> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Ingo Molnar <mingo@elte.hu> --- kernel/sched_fair.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1143,7 +1143,7 @@ static inline unsigned long effective_lo #endif static int -wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq, +wake_affine(struct sched_domain *this_sd, struct rq *this_rq, struct task_struct *p, int prev_cpu, int this_cpu, int sync, int idx, unsigned long load, unsigned long this_load, unsigned int imbalance) @@ -1191,8 +1191,8 @@ wake_affine(struct rq *rq, struct sched_ schedstat_inc(p, se.nr_wakeups_affine_attempts); tl_per_task = cpu_avg_load_per_task(this_cpu); - if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) || - balanced) { + if (balanced || (tl <= load && tl + target_load(prev_cpu, idx) <= + tl_per_task)) { /* * This domain has SD_WAKE_AFFINE and * p is cache cold in this domain, and @@ -1211,16 +1211,17 @@ static int select_task_rq_fair(struct ta struct sched_domain *sd, *this_sd = NULL; int prev_cpu, this_cpu, new_cpu; unsigned long load, this_load; - struct rq *rq, *this_rq; + struct rq *this_rq; unsigned int imbalance; int idx; prev_cpu = task_cpu(p); - rq = task_rq(p); this_cpu = ...
applied to tip/sched/devel, thanks Amit! Ingo --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
