[patch] Re: PostgreSQL pgbench performance regression in 2.6.23+

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Greg Smith <gsmith@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <peterz@...>, Dhaval Giani <dhaval@...>, lkml <linux-kernel@...>, Srivatsa Vaddagiri <vatsa@...>
Date: Tuesday, May 27, 2008 - 1:59 am

On Mon, 2008-05-26 at 20:28 -0400, Greg Smith wrote:


Care to give the below a whirl?  If fixes the over-enthusiastic affinity
bug in a less restrictive way.  It doesn't attempt to addresss the needs
of any particular load though, that needs more thought (tricky issue).

With default features, I get the below.

2.6.26-smp x86_64
1 10121.600913
2 14360.229517
3 17048.770371
4 18748.777814
5 22086.493358
6 24913.416187
8 27976.026783
10 29346.503261
15 29157.239431
20 28392.257204
30 26590.199787
40 24422.481578
50 23305.981434

(I can get a bit more by disabling HR_TICK along with a dinky patchlet
to reduce overhead when it's disabled.  Bottom line is that the bug is
fixed though, maximizing performance is separate issue imho) 

Prevent short-running wakers of short-running threads from overloading a single
cpu via wakeup affinity, and wire up disconnected debug option.

Signed-off-by: Mike Galbraith <efault@gmx.de>

 kernel/sched_fair.c |   25 ++++++++++++++-----------
 1 files changed, 14 insertions(+), 11 deletions(-)

Index: linux-2.6.26.git/kernel/sched_fair.c
===================================================================
--- linux-2.6.26.git.orig/kernel/sched_fair.c
+++ linux-2.6.26.git/kernel/sched_fair.c
@@ -1057,16 +1057,27 @@ wake_affine(struct rq *rq, struct sched_
 	struct task_struct *curr = this_rq->curr;
 	unsigned long tl = this_load;
 	unsigned long tl_per_task;
+	int bad_imbalance;
 
-	if (!(this_sd->flags & SD_WAKE_AFFINE))
+	if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
 		return 0;
 
 	/*
+	 * If sync wakeup then subtract the (maximum possible)
+	 * effect of the currently running task from the load
+	 * of the current CPU:
+	 */
+	if (sync && tl)
+		tl -= curr->se.load.weight;
+
+	bad_imbalance = 100*(tl + p->se.load.weight) > imbalance*load;
+
+	/*
 	 * If the currently running task will sleep within
 	 * a reasonable amount of time then attract this newly
 	 * woken task:
 	 */
-	if (sync && curr->sched_class == &fair_sched_class) {
+	if (sync && !bad_imbalance && curr->sched_class == &fair_sched_class)
{
 		if (curr->se.avg_overlap < sysctl_sched_migration_cost &&
 				p->se.avg_overlap < sysctl_sched_migration_cost)
 			return 1;
@@ -1075,16 +1086,8 @@ wake_affine(struct rq *rq, struct sched_
 	schedstat_inc(p, se.nr_wakeups_affine_attempts);
 	tl_per_task = cpu_avg_load_per_task(this_cpu);
 
-	/*
-	 * If sync wakeup then subtract the (maximum possible)
-	 * effect of the currently running task from the load
-	 * of the current CPU:
-	 */
-	if (sync)
-		tl -= current->se.load.weight;
-
 	if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) ||
-			100*(tl + p->se.load.weight) <= imbalance*load) {
+			!bad_imbalance) {
 		/*
 		 * This domain has SD_WAKE_AFFINE and
 		 * p is cache cold in this domain, and



Because those were git _with_ Peter's patch.  I was looking at the
difference between a non-broken 2.6.26 with and without minimizing
preemption, to match the way I tested 2.6.22.


If the patch above flies, imho, that should be folded into the backport
to stable.  The heart of the patch is a bugfix, so definitely stable
material.  Whether to enable the debugging/tweaking knobs as well is a
different question.  (I would do it, but I ain't the maintainer;)


(it doesn't fully address the 1:N needs, that needs much more thought)


I suspect very many, certainly any load where latency is of major
importance.  Peak performance of the mysql+oltp benchmark for sure is
injured with your settings.  As a really good demonstration of how
important wakeup preemption can be, try running the attached testcase,
which was distilled from a real world load and posted to lkml a few
years ago, without wakeup preemption and nailed to one cpu.


See patch :)

	-Mike
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
PostgreSQL pgbench performance regression in 2.6.23+, Greg Smith, (Wed May 21, 1:34 pm)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 3:10 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 5:05 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 6:34 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 7:25 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Peter Zijlstra, (Thu May 22, 7:44 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 8:09 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Peter Zijlstra, (Thu May 22, 8:24 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 6:00 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 9:05 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 9:35 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 6:15 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 7:46 pm)
[patch] Re: PostgreSQL pgbench performance regression in 2.6..., Mike Galbraith, (Tue May 27, 1:59 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Sat May 24, 4:08 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 9:16 am)