On Mon, 2008-05-26 at 20:28 -0400, Greg Smith wrote:Care to give the below a whirl? If fixes the over-enthusiastic affinity bug in a less restrictive way. It doesn't attempt to addresss the needs of any particular load though, that needs more thought (tricky issue). With default features, I get the below. 2.6.26-smp x86_64 1 10121.600913 2 14360.229517 3 17048.770371 4 18748.777814 5 22086.493358 6 24913.416187 8 27976.026783 10 29346.503261 15 29157.239431 20 28392.257204 30 26590.199787 40 24422.481578 50 23305.981434 (I can get a bit more by disabling HR_TICK along with a dinky patchlet to reduce overhead when it's disabled. Bottom line is that the bug is fixed though, maximizing performance is separate issue imho) Prevent short-running wakers of short-running threads from overloading a single cpu via wakeup affinity, and wire up disconnected debug option. Signed-off-by: Mike Galbraith <efault@gmx.de> kernel/sched_fair.c | 25 ++++++++++++++----------- 1 files changed, 14 insertions(+), 11 deletions(-) Index: linux-2.6.26.git/kernel/sched_fair.c =================================================================== --- linux-2.6.26.git.orig/kernel/sched_fair.c +++ linux-2.6.26.git/kernel/sched_fair.c @@ -1057,16 +1057,27 @@ wake_affine(struct rq *rq, struct sched_ struct task_struct *curr = this_rq->curr; unsigned long tl = this_load; unsigned long tl_per_task; + int bad_imbalance; - if (!(this_sd->flags & SD_WAKE_AFFINE)) + if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS)) return 0; /* + * If sync wakeup then subtract the (maximum possible) + * effect of the currently running task from the load + * of the current CPU: + */ + if (sync && tl) + tl -= curr->se.load.weight; + + bad_imbalance = 100*(tl + p->se.load.weight) > imbalance*load; + + /* * If the currently running task will sleep within * a reasonable amount of time then attract this newly * woken task: */ - if (sync && curr->sched_class == &fair_sched_class) { + if (sync && !bad_imbalance && curr->sched_class == &fair_sched_class) { if (curr->se.avg_overlap < sysctl_sched_migration_cost && p->se.avg_overlap < sysctl_sched_migration_cost) return 1; @@ -1075,16 +1086,8 @@ wake_affine(struct rq *rq, struct sched_ schedstat_inc(p, se.nr_wakeups_affine_attempts); tl_per_task = cpu_avg_load_per_task(this_cpu); - /* - * If sync wakeup then subtract the (maximum possible) - * effect of the currently running task from the load - * of the current CPU: - */ - if (sync) - tl -= current->se.load.weight; - if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) || - 100*(tl + p->se.load.weight) <= imbalance*load) { + !bad_imbalance) { /* * This domain has SD_WAKE_AFFINE and * p is cache cold in this domain, and Because those were git _with_ Peter's patch. I was looking at the difference between a non-broken 2.6.26 with and without minimizing preemption, to match the way I tested 2.6.22. If the patch above flies, imho, that should be folded into the backport to stable. The heart of the patch is a bugfix, so definitely stable material. Whether to enable the debugging/tweaking knobs as well is a different question. (I would do it, but I ain't the maintainer;) (it doesn't fully address the 1:N needs, that needs much more thought) I suspect very many, certainly any load where latency is of major importance. Peak performance of the mysql+oltp benchmark for sure is injured with your settings. As a really good demonstration of how important wakeup preemption can be, try running the attached testcase, which was distilled from a real world load and posted to lkml a few years ago, without wakeup preemption and nailed to one cpu. See patch :) -Mike
| Greg Kroah-Hartman | [PATCH 006/196] Chinese: add translation of oops-tracing.txt |
| Andrew Morton | Re: -mm merge plans for 2.6.23 -- sys_fallocate |
| Eric W. Biederman | [PATCH] nfs lockd reclaimer: Convert to kthread API |
| James Bottomley | Re: Integration of SCST in the mainstream Linux kernel |
git: | |
| David Miller | [GIT]: Networking |
| Gerrit Renker | [PATCH 03/37] dccp: List management for new feature negotiation |
| David Miller | Re: [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
