Re: [patch] Re: PostgreSQL pgbench performance regression in 2.6.23+

!MAILaRCHIVE_VOTE_RePLACE
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
To: Greg Smith <gsmith@...>
Cc: Ingo Molnar <mingo@...>, Peter Zijlstra <peterz@...>, Dhaval Giani <dhaval@...>, lkml <linux-kernel@...>, Srivatsa Vaddagiri <vatsa@...>
Date: Tuesday, May 27, 2008 - 4:20 am

On Tue, 2008-05-27 at 07:59 +0200, Mike Galbraith wrote:

Hm, pbench's extreme dislike of preemption, and the starvation testcase
I sent earlier having an absolute requirement of preemption kinda argues
that some knobs and dials should be per task or task group (or, or... or
scheduler should be all knowing all seeing;) 
 
2.6.25.4-feat=45  2.6.25.4-feat=111    2.6.25.4-feat=47
1  11385.471887        10292.721924         9551.157672
2  16709.515434        15540.399522        16283.968970
3  25456.658841        20187.320016        24562.735943
4  24453.435157        24975.037450        23391.583053
5  25504.302958        23102.131056        23671.860667
6  27076.359200        24688.791507        25947.592071
8  31758.200682        29462.639752        29700.144372
10 32190.081142        30428.413809        27439.441838
15 31175.074906        11097.668025        20344.284129
20 30513.974332        10742.166624        19256.695409
30 28307.399275        10233.708047        17535.423344
40 26720.463867        10037.856773        16104.895695
50 24899.945793         9907.624283        15768.746911

Anyway, if patchlet flies, and Ingo concurs, I'll submit the below.

Prevent short-running wakers of short-running threads from overloading a
single
cpu via wakeup affinity, and provide affinity related debug/tuning
options.

Signed-off-by: Mike Galbraith <efault@gmx.de>

 kernel/sched.c      |    9 ++++++++-
 kernel/sched_fair.c |   25 ++++++++++++++-----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 1e4596c..d6d70a8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -596,6 +596,8 @@ enum {
 	SCHED_FEAT_START_DEBIT		= 4,
 	SCHED_FEAT_HRTICK		= 8,
 	SCHED_FEAT_DOUBLE_TICK		= 16,
+	SCHED_FEAT_AFFINE_WAKEUPS	= 32,
+	SCHED_FEAT_SYNC_WAKEUPS		= 64,
 };
 
 const_debug unsigned int sysctl_sched_features =
@@ -603,7 +605,9 @@ const_debug unsigned int sysctl_sched_features =
 		SCHED_FEAT_WAKEUP_PREEMPT	* 1 |
 		SCHED_FEAT_START_DEBIT		* 1 |
 		SCHED_FEAT_HRTICK		* 1 |
-		SCHED_FEAT_DOUBLE_TICK		* 0;
+		SCHED_FEAT_DOUBLE_TICK		* 0 |
+		SCHED_FEAT_AFFINE_WAKEUPS	* 1 |
+		SCHED_FEAT_SYNC_WAKEUPS		* 1;
 
 #define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
 
@@ -1902,6 +1906,9 @@ static int try_to_wake_up(struct task_struct *p,
unsigned int state, int sync)
 	long old_state;
 	struct rq *rq;
 
+	if (!sched_feat(SYNC_WAKEUPS))
+		sync = 0;
+
 	smp_wmb();
 	rq = task_rq_lock(p, &flags);
 	old_state = p->state;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0080968..bf2defe 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -992,16 +992,27 @@ wake_affine(struct rq *rq, struct sched_domain
*this_sd, struct rq *this_rq,
 	struct task_struct *curr = this_rq->curr;
 	unsigned long tl = this_load;
 	unsigned long tl_per_task;
+	int bad_imbalance;
 
-	if (!(this_sd->flags & SD_WAKE_AFFINE))
+	if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
 		return 0;
 
 	/*
+	 * If sync wakeup then subtract the (maximum possible)
+	 * effect of the currently running task from the load
+	 * of the current CPU:
+	 */
+	if (sync && tl)
+		tl -= curr->se.load.weight;
+
+	bad_imbalance = 100*(tl + p->se.load.weight) > imbalance*load;
+
+	/*
 	 * If the currently running task will sleep within
 	 * a reasonable amount of time then attract this newly
 	 * woken task:
 	 */
-	if (sync && curr->sched_class == &fair_sched_class) {
+	if (sync && !bad_imbalance && curr->sched_class == &fair_sched_class)
{
 		if (curr->se.avg_overlap < sysctl_sched_migration_cost &&
 				p->se.avg_overlap < sysctl_sched_migration_cost)
 			return 1;
@@ -1010,16 +1021,8 @@ wake_affine(struct rq *rq, struct sched_domain
*this_sd, struct rq *this_rq,
 	schedstat_inc(p, se.nr_wakeups_affine_attempts);
 	tl_per_task = cpu_avg_load_per_task(this_cpu);
 
-	/*
-	 * If sync wakeup then subtract the (maximum possible)
-	 * effect of the currently running task from the load
-	 * of the current CPU:
-	 */
-	if (sync)
-		tl -= current->se.load.weight;
-
 	if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) ||
-			100*(tl + p->se.load.weight) <= imbalance*load) {
+			!bad_imbalance) {
 		/*
 		 * This domain has SD_WAKE_AFFINE and
 		 * p is cache cold in this domain, and




--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
PostgreSQL pgbench performance regression in 2.6.23+, Greg Smith, (Wed May 21, 1:34 pm)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 3:10 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 5:05 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 6:34 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 7:25 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Peter Zijlstra, (Thu May 22, 7:44 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 8:09 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Peter Zijlstra, (Thu May 22, 8:24 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 6:00 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 9:05 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 9:35 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 6:15 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Fri May 23, 7:46 pm)
Re: [patch] Re: PostgreSQL pgbench performance regression in..., Mike Galbraith, (Tue May 27, 4:20 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Sat May 24, 4:08 am)
Re: PostgreSQL pgbench performance regression in 2.6.23+, Mike Galbraith, (Thu May 22, 9:16 am)