After reverting below patch, volanoMark regression becomes less than 2% with CONFIG_GROUP_SCHED=n
on my 8-core stoakely. The improvement on 16-core tigerton is about 44%, but there is still about
20% regression, comparing with 2.6.26_nogroup.commit 93b75217df39e6d75889cc6f8050343286aff4a5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:33 2008 +0200sched: disable source/target_load bias
The bias given by source/target_load functions can be very large, disable
it by default to get faster convergence.Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>This patch adds a new feature LB_BIAS, but uses it with a NOT, so I lost it when I tested
single sched feature one by one. That also explains why wake_affine and load_balance_newidle
have more successful task pulling with kernel 2.6.27-rc, because MC and CPU domain's wake_idx
is 1, so this patch has impact on them.--
Ah - I assumed you already tried that knob since you mentioned fiddling
with the various feature flags.And I must admit to having overlooked the effect on wake_affine..
Chris, could you see the effect of this on smp group fairness?
---
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 862b06b..9353ca7 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -8,6 +8,6 @@ SCHED_FEAT(SYNC_WAKEUPS, 1)
SCHED_FEAT(HRTICK, 1)
SCHED_FEAT(DOUBLE_TICK, 0)
SCHED_FEAT(ASYM_GRAN, 1)
-SCHED_FEAT(LB_BIAS, 0)
+SCHED_FEAT(LB_BIAS, 1)
SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)--
applied your commit below to tip/sched/urgent, thanks.
Ingo
-------------->
From 939387c3a6141ec6aefc7acd40f8b186781bb098 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed, 20 Aug 2008 12:44:55 +0200
Subject: [PATCH] sched: enable LB_BIAS by defaultYanmin reported a significant regression on his 16-core machine due to:
commit 93b75217df39e6d75889cc6f8050343286aff4a5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:33 2008 +0200Flip back to the old behaviour.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched_features.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 862b06b..9353ca7 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -8,6 +8,6 @@ SCHED_FEAT(SYNC_WAKEUPS, 1)
SCHED_FEAT(HRTICK, 1)
SCHED_FEAT(DOUBLE_TICK, 0)
SCHED_FEAT(ASYM_GRAN, 1)
-SCHED_FEAT(LB_BIAS, 0)
+SCHED_FEAT(LB_BIAS, 1)
SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)
--
Just realized my brainfart..
---
Subject: sched: load-balance bias fixes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Aug 20 15:28:51 CEST 2008Yanmin spotted a regression with my patch that introduces LB_BIAS:
commit 93b75217df39e6d75889cc6f8050343286aff4a5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:33 2008 +0200And I just spotted the brainfart - I should have replaced min/max with avg
instead of removing it completely.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 6 ++++++
kernel/sched.c | 10 ++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -367,6 +367,12 @@ static inline char *pack_hex_byte(char *
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })+#define avg(x, y) ({ \
+ typeof(x) _avg1 = ((x)+1)/2; \
+ typeof(x) _avg2 = ((y)+1)/2; \
+ (void) (&_avg1 == &_avg2); \
+ _avg1 + _avg2; })
+
/**
* clamp - return a value clamped to a given range with strict typechecking
* @val: current value
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2008,9 +2008,12 @@ static unsigned long source_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (type == 0)
return total;+ if (!sched_feat(LB_BIAS))
+ return avg(rq->cpu_load[type-1], total);
+
return min(rq->cpu_load[type-1], total);
}@@ -2023,9 +2026,12 @@ static unsigned long target_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (t...
--
--
I don't think this implementation of avg should go in kernel.h?
It gives an average of 1 and 1 to be 2, 3 and 3 is 4, 1 and 3 is
3 etc.Maybe it is reasonable for very high numbers that would overflow
if added first, but it doesn't seem reasonable for a generic
averaging function.
--
The usual way of averaging numbers that may be large is
#define avg(x, y) ({ \
typeof(x) _x = (x); \
typeof(x) _y = (y); \
(void) (&_x == &_y); \
_x + (_y - _x)/2; })...which also works for small and negative numbers.
--
Ok, so one last time (I hope!)..
Everybody happy with this?
---
Subject: sched: load-balance bias fixes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Aug 20 15:28:51 CEST 2008Yanmin spotted a regression with my patch that introduces LB_BIAS:
commit 93b75217df39e6d75889cc6f8050343286aff4a5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:33 2008 +0200And I just spotted the brainfart - I should have replaced min/max with avg
instead of removing it completely.[ray-lk@madrabbit.org: better avg implementation]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/kernel.h | 6 ++++++
kernel/sched.c | 10 ++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2008,9 +2008,12 @@ static unsigned long source_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (type == 0)
return total;+ if (!sched_feat(LB_BIAS))
+ return avg(rq->cpu_load[type-1], total);
+
return min(rq->cpu_load[type-1], total);
}@@ -2023,9 +2026,12 @@ static unsigned long target_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (type == 0)
return total;+ if (!sched_feat(LB_BIAS))
+ return avg(rq->cpu_load[type-1], total);
+
return max(rq->cpu_load[type-1], total);
}Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -367,6 +367,12 @@ static inline char *pack_hex_byte(char *
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })+#define avg(x, y) ({ ...
looks good to me. What is missing is Yanmin's confirmation that the
patch indeed solves/improves the regression :-)Ingo
--
That's not going to work with unsigned types.
--
Uhm, I think it works fine, even with unsigned, even where _avg2 is
smaller than _avg1. Underflow is a good thing here. And I mocked up a
little test harness and it gives the correct answers for a half dozen
sets of values I tossed at itBut maybe I'm forgetting an obscure unsigned or signed int type
widening rule, so, care to elaborate?
--
Nick is right, try:
int main(int argc, char **argv)
{
unsigned int x = 7, y = 5;
printf("%d\n", avg(x,y));
return 0;
}It fails because 5-7 = -2, which needs a signed division or sign
extending right shift.we'd need something like:
#define avg(x, y) ({ \
typeof(x) _avg1 = (x); \
typeof(y) _avg2 = (y); \
(void) (&_avg1 == &_avg2); \
_avg1 + (signed typeof(x))(_avg2 - _avg1)/2; })except that typeof() doesn't work that way.
#define avg(x, y) ({ \
typeof(x) _avg1 = (x); \
typeof(y) _avg2 = (y); \
(void) (&_avg1 == &_avg2); \
_avg1 + (long)(_avg2 - _avg1)/2; })works for the above example, but when I make it long long, so as to
match the longest supported type, it goes boom again - for as of yet
unknown reasons.--
I think you'd want to cast it with a (signed) instead? as in:
#include <stdio.h>
#define avg(x, y) ({ \
typeof(x) _x = (x); \
typeof(y) _y = (y); \
(void) (&_x == &_y); \
_x + (signed)(_y - _x)/2; })int main (void) {
unsigned long long a=7,b=5;printf("%d %d\n", avg(a,b), avg(b,a));
}...which works here, for me, but hey, I managed to goof up my other
test case, so take it for a spin.Ray
--
signed is short for signed int, which is too short for say long or long
long input.Anyway, see my previous mail in which I explained that I got the cast
order wrong.--
Ok, people pointed out I got my promotion rules mixed up, I casted the
result of the division to signed, instead of ending up with a signed
division.#define avg(x, y) ({ \
typeof(x) _avg1 = (x); \
typeof(y) _avg2 = (y); \
(void) (&_avg1 == &_avg2); \
(typeof(x))(_avg1 + ((long long)_avg2 - _avg1)/2); })seems to work.
--
ok, could you please just send a patch that is local to sched.c and then
we can let this kernel.h change play out independently? There's too many
iterations of this and it's better to decouple the two.Ingo
--
Right, I guess that will work, but unfortunately the code gen on 32-bit
is a monstrosity. If you're going to cast to 64-bit anyway, we might as
well then just do the normal add rather than playing the game to avoid
overflow.Secondly, this is operating on the fixed point scaled load numbers, so in
the case of the scheduler I wouldn't worry too much about rounding... also
in most integer operations, rounding down is less surprising than rounding
up like the last code did.I still don't know whether it is appropriate to put it into kernel.h
(because of rounding, and variability when it comes to what type size will
hold the sum of parameters), but for the scheduler, I would use this:((unsigned long long)a + b) / 2;
Which gives this on 32-bit:
div:
movl %edx, %ecx
xorl %edx, %edx
pushl %ebx
xorl %ebx, %ebx
addl %ecx, %eax
adcl %ebx, %edx
popl %ebx
shrdl $1, %edx, %eax
shrl %edx
retRather than this:
div:
subl $8, %esp
xorl %ecx, %ecx
movl %ebx, (%esp)
movl %edx, %ebx
movl %esi, 4(%esp)
xorl %esi, %esi
subl %eax, %ebx
sbbl %ecx, %esi
movl %esi, %ecx
movl %esi, %edx
sarl $31, %ecx
movl %ecx, %edx
xorl %ecx, %ecx
shrl $31, %edx
addl %ebx, %edx
movl (%esp), %ebx
adcl %esi, %ecx
movl 4(%esp), %esi
addl $8, %esp
shrdl $1, %ecx, %edx
addl %edx, %eax
sarl %ecx
retAnd it's also slightly better on 64-bit.
--
Right - anyway the point is moot - as Yanmin says it still sucks rocks.
But since I couldn't let it rest :-)
---
Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -367,6 +367,45 @@ static inline char *pack_hex_byte(char *
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })+#define __avg_t(type, x, y) ({ \
+ typeof(x) __avg1 = (x); \
+ typeof(y) __avg2 = (y); \
+ __avg1 + ((type)(__avg2 - __avg1))/2; })
+
+extern void avg_unknown_size(void);
+
+#define __avg(x, y) ({ \
+ typeof(x) ret; \
+ switch (sizeof(ret)) { \
+ case 1: \
+ ret = __avg_t(s8, x, y); \
+ break; \
+ case 2: \
+ ret = __avg_t(s16, x, y); \
+ break; \
+ case 4: \
+ ret = __avg_t(s32, x, y); \
+ break; \
+ case 8: \
+ ret = __avg_t(s64, x, y); \
+ break; \
+ default: \
+ avg_unknown_size(); \
+ break; \
+ } \
+ ret; })
+
+#define avg(x, y) ({ \
+ typeof(x) _avg1 = (x); \
+ typeof(y) _avg2 = (y); \
+ (void) (&_avg1 == &_avg2); \
+ __avg(_avg1, _avg2); })
+
+#define avg_t(type, x, y) ({ \
+ type _avg1 = (x); \
+ type _avg2 = (y); \
+ __avg(_avg1, _avg2); })
+
/**
* clamp - return a value clamped to a given range with strict typechecking
* @val: current value--
D'oh, why didn't I think of that..
--
I had it in sched.c, then moved it to kernel.h and back again, etc.. I'm
fine with wherever..---
Subject: sched: load-balance bias fixes
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Aug 20 15:28:51 CEST 2008Yanmin spotted a regression with my patch that introduces LB_BIAS:
commit 93b75217df39e6d75889cc6f8050343286aff4a5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 27 13:41:33 2008 +0200And I just spotted the brainfart - I should have replaced min/max with avg
instead of removing it completely.Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1996,6 +1996,12 @@ void kick_process(struct task_struct *p)
preempt_enable();
}+#define avg(x, y) ({ \
+ typeof(x) _avg1 = ((x)+1)/2; \
+ typeof(y) _avg2 = ((y)+1)/2; \
+ (void) (&_avg1 == &_avg2); \
+ _avg1 + _avg2; })
+
/*
* Return a low guess at the load of a migration-source cpu weighted
* according to the scheduling class and "nice" value.
@@ -2008,9 +2014,12 @@ static unsigned long source_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (type == 0)
return total;+ if (!sched_feat(LB_BIAS))
+ return avg(rq->cpu_load[type-1], total);
+
return min(rq->cpu_load[type-1], total);
}@@ -2023,9 +2032,12 @@ static unsigned long target_load(int cpu
struct rq *rq = cpu_rq(cpu);
unsigned long total = weighted_cpuload(cpu);- if (type == 0 || !sched_feat(LB_BIAS))
+ if (type == 0)
return total;+ if (!sched_feat(LB_BIAS))
+ return avg(rq->cpu_load[type-1], total);
+
return max(rq->cpu_load[type-1], total);
}--
ok, i've applied this one to tip/sched/urgent instead of the
feature-disabling patchlet. Yanmin, could you please check whether this
one does the trick?Ingo
--
This new patch almost doesn't help volanoMark. Pls. use the patch
which sets LB_BIAS=1 by default.-yanmin
--
ok. That also removes the kernel.h complications ;-)
Ingo
--
Sorry, I have new update.
Originally, I worked on 2.6.27-rc1. I just move to 2.6.27-rc3 and found
something defferent when CONFIG_GROUP_SCHED=n.With 2.6.27-rc3, on my 8-core stoakley, all volanoMark regression disappears,
no matter if I enable LB_BIAS. On 16-core tigerton, the regression is still
there if I don't enable LB_BIAS and regression becomes 11% from 65%.8-core stoakley:
2.6.26_nogroup 340669
2.6.27-rc1_nogroup 267237
2.6.27-rc1_nogroup+LB_BIAS=1 330693
2.6.27-rc3_nogroup 352193
2.6.27-rc3_nogroup+LB_BIAS=1 35587216-core tigerton:
2.6.26_nogroup 539644
2.6.27-rc1_nogroup 309334
2.6.27-rc1_nogroup+LB_BIAS=1 478360
2.6.27-rc3_nogroup 348426
2.6.27-rc3_nogroup+LB_BIAS=1 483150-yanmin
--
I have new updates on this regression. I checked volanoMark web page and
found the client command line has option rooms and users. rooms means how many
chat room will be started. users means how many users are in 1 room. The default
rooms is 10 and users is 20, so every room has about 800 threads. As all threads of a
room just communicate within this room, so the rooms number is important.All my previous volanoMark testing uses default rooms 10 and users 20. With wake_offine
in kernel, waker/sleeper will be moved to the same cpu gradually. However, if the
rooms is not multiple of cpu number, due to load balance, kernel will move threads from
one cpu to another cpu continually. If there are too many threads to weaken the cache-hot
effect, load balance is more important. But if there are not too many threads running,
cache-hot is more important than load balance. Should we prefer to wake_affine more?Below is some data I collected with numerous testing on 3 machines.
On 2-quadcore processor stoakley (8-core):
kernel\rooms | 8 | 10 | 16 | 32
-------------------------------------------------------------------------------------------
2.6.26_nogroup | 385617 | 351247 | 323324 | 231934
-------------------------------------------------------------------------------------------
2.6.27-rc4_nogroup | 359124 | 336984 | 335180 | 235258
-------------------------------------------------------------------------------------------
2.6.26group | 381425 | 343636 | 312280 | 179673
-------------------------------------------------------------------------------------------
2.6.27-rc4group | 212112 | 270000 | 300188 | 228465
-------------------------------------------------------------------------------------------
On 2-quadcore+HT processor new x86_64 (8-core+HT, total 16 threads):
kernel\rooms | 10 |...
--
| Mariusz Kozlowski | [PATCH 01] kmalloc + memset conversion co kzalloc |
| Rafael J. Wysocki | [Bug #10629] 2.6.26-rc1-$sha1: RIP __d_lookup+0x8c/0x160 |
| Vladislav Bolkhovitin | Re: Integration of SCST in the mainstream Linux kernel |
| Jeff Garzik | Re: [RFC] Heads up on sys_fallocate() |
git: | |
| Linus Torvalds | Re: [GIT]: Networking |
| Gerrit Renker | [PATCH 27/37] dccp: Integration of dynamic feature activation - part 2 (server side) |
| Jarek Poplawski | [PATCH] pkt_sched: Destroy gen estimators under rtnl_lock(). |
| Andrew Morton | Re: [BUG] New Kernel Bugs |
