On Tue, 2008-05-27 at 07:59 +0200, Mike Galbraith wrote:Hm, pbench's extreme dislike of preemption, and the starvation testcase I sent earlier having an absolute requirement of preemption kinda argues that some knobs and dials should be per task or task group (or, or... or scheduler should be all knowing all seeing;) 2.6.25.4-feat=45 2.6.25.4-feat=111 2.6.25.4-feat=47 1 11385.471887 10292.721924 9551.157672 2 16709.515434 15540.399522 16283.968970 3 25456.658841 20187.320016 24562.735943 4 24453.435157 24975.037450 23391.583053 5 25504.302958 23102.131056 23671.860667 6 27076.359200 24688.791507 25947.592071 8 31758.200682 29462.639752 29700.144372 10 32190.081142 30428.413809 27439.441838 15 31175.074906 11097.668025 20344.284129 20 30513.974332 10742.166624 19256.695409 30 28307.399275 10233.708047 17535.423344 40 26720.463867 10037.856773 16104.895695 50 24899.945793 9907.624283 15768.746911 Anyway, if patchlet flies, and Ingo concurs, I'll submit the below. Prevent short-running wakers of short-running threads from overloading a single cpu via wakeup affinity, and provide affinity related debug/tuning options. Signed-off-by: Mike Galbraith <efault@gmx.de> kernel/sched.c | 9 ++++++++- kernel/sched_fair.c | 25 ++++++++++++++----------- 2 files changed, 22 insertions(+), 12 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 1e4596c..d6d70a8 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -596,6 +596,8 @@ enum { SCHED_FEAT_START_DEBIT = 4, SCHED_FEAT_HRTICK = 8, SCHED_FEAT_DOUBLE_TICK = 16, + SCHED_FEAT_AFFINE_WAKEUPS = 32, + SCHED_FEAT_SYNC_WAKEUPS = 64, }; const_debug unsigned int sysctl_sched_features = @@ -603,7 +605,9 @@ const_debug unsigned int sysctl_sched_features = SCHED_FEAT_WAKEUP_PREEMPT * 1 | SCHED_FEAT_START_DEBIT * 1 | SCHED_FEAT_HRTICK * 1 | - SCHED_FEAT_DOUBLE_TICK * 0; + SCHED_FEAT_DOUBLE_TICK * 0 | + SCHED_FEAT_AFFINE_WAKEUPS * 1 | + SCHED_FEAT_SYNC_WAKEUPS * 1; #define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x) @@ -1902,6 +1906,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync) long old_state; struct rq *rq; + if (!sched_feat(SYNC_WAKEUPS)) + sync = 0; + smp_wmb(); rq = task_rq_lock(p, &flags); old_state = p->state; diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index 0080968..bf2defe 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -992,16 +992,27 @@ wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq, struct task_struct *curr = this_rq->curr; unsigned long tl = this_load; unsigned long tl_per_task; + int bad_imbalance; - if (!(this_sd->flags & SD_WAKE_AFFINE)) + if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS)) return 0; /* + * If sync wakeup then subtract the (maximum possible) + * effect of the currently running task from the load + * of the current CPU: + */ + if (sync && tl) + tl -= curr->se.load.weight; + + bad_imbalance = 100*(tl + p->se.load.weight) > imbalance*load; + + /* * If the currently running task will sleep within * a reasonable amount of time then attract this newly * woken task: */ - if (sync && curr->sched_class == &fair_sched_class) { + if (sync && !bad_imbalance && curr->sched_class == &fair_sched_class) { if (curr->se.avg_overlap < sysctl_sched_migration_cost && p->se.avg_overlap < sysctl_sched_migration_cost) return 1; @@ -1010,16 +1021,8 @@ wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq, schedstat_inc(p, se.nr_wakeups_affine_attempts); tl_per_task = cpu_avg_load_per_task(this_cpu); - /* - * If sync wakeup then subtract the (maximum possible) - * effect of the currently running task from the load - * of the current CPU: - */ - if (sync) - tl -= current->se.load.weight; - if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) || - 100*(tl + p->se.load.weight) <= imbalance*load) { + !bad_imbalance) { /* * This domain has SD_WAKE_AFFINE and * p is cache cold in this domain, and --
| Greg Kroah-Hartman | [PATCH 001/196] Chinese: Add the known_regression URI to the HOWTO |
| Andrew Morton | 2.6.25-mm1 |
| david | Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3 |
| Bart Van Assche | Integration of SCST in the mainstream Linux kernel |
git: | |
| Jarek Poplawski | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Gerrit Renker | [PATCH 15/37] dccp: Set per-connection CCIDs via socket options |
| David Miller | [GIT]: Networking |
| Natalie Protasevich | [BUG] New Kernel Bugs |
