I tested 2.6.24-rc1 on my x86_64 machine, which has 2 quad-core processors.
Compared with 2.6.23, aim7 shows a regression of about 30%. I did a bisect and found that
patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi...
caused the issue.

kbuild/SPECjbb2000/SPECjbb2005 also have big regressions. On another machine of mine,
a tigerton (4 quad-core processors), SPECjbb2005 has a regression of more than 40%.
I didn't bisect with those benchmarks, but I suspect
the root cause is the same as aim7's.

-yanmin
-
these two commits might be relevant:
7a6c6bcee029a978f866511d6e41dbc7301fde4c
95dbb421d12fdd9796ed153853daf3679809274f

but a bisection result would be the best info.
Ingo
-
I got the tag from #git log. As for above link, I just added prior http address,
Above big patch doesn't include this one, which means if I do
'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't include
I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.

-yanmin
-
I ran git bisect using the kernel version as the tag, but git seems to
misbehave sometimes. So I checked the ChangeLog, used the number tag in place of
the kernel version, and retested.

It looks like at least 2 patches are responsible for the regression. I'm
doing a sub-bisect now.

I can reproduce the aim7 regression on all my test machines, although the regression
percentage differs:

Machine                                  Regression
8-core stoakley                          30%
16-core tigerton                         6%
tulsa (dual-core+HT, 16 logical CPUs)    20%

-yanmin
-
Sub-bisecting showed that patch 38ad464d410dadceda1563f36bdb0be7fe4c8938 (sched: uniform tunings)
caused a 20% regression in aim7.

The remaining 10% should also be related to sched parameters, such as
sysctl_sched_min_granularity.

-yanmin
-
ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
please try to figure out what the best value for
/proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
/proc/sys/kernel_sched_min_granularity is?

there's a tuning constraint for kernel_sched_nr_latency:
- kernel_sched_nr_latency should always be set to
kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
tunable)

i suspect a good approach would be to double the value of
kernel_sched_latency and kernel_sched_nr_latency in each tuning
iteration, while keeping kernel_sched_min_granularity unchanged. That
will exercise the tuning values of the 2.6.23 kernel as well.

Ingo
-
I followed your idea to test 2.6.24-rc1. The improvement is slow.
With sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
is still about 15% lower than 2.6.23.

-yanmin
-
I got the 30% aim7 regression on my newly upgraded stoakley machine. I found that
this machine is slower than the old one. Maybe the BIOS has issues, or the memory is slow
(it might not be dual-channel?). So I retested on the old machine and found that on the old
stoakley machine the regression is about 6%, quite similar to the regression on the tigerton
machine.

With sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
the regression drops to about 2%. Other latency values show a larger regression.

On my tulsa machine, with sched_nr_latency=640 and sched_latency_ns=640000000,
the regression drops to less than 1% (the original regression is about 20%).

When I ran a bad script to change the values of sched_nr_latency and sched_latency_ns,
I hit an OOPS on my tulsa machine. Below is the log. It looks like sched_nr_latency became
0.

*******************Log************************************
divide error: 0000 [1] SMP
CPU 1
Modules linked in: megaraid_mbox megaraid_mm
Pid: 7326, comm: sh Not tainted 2.6.24-rc1 #2
RIP: 0010:[<ffffffff8022c2bf>] [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP: 0018:ffff810105909e38 EFLAGS: 00010046
RAX: 000000005a000000 RBX: 0000000000000000 RCX: 000000002d000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff810105909e40 R08: ffff810103bfed50 R09: 00000000ffffffff
R10: 0000000000000038 R11: 0000000000000296 R12: ffff810100d6db40
R13: ffff8101058c4148 R14: 0000000000000001 R15: ffff810104c34088
FS: 00002b851bc59f50(0000) GS:ffff810100cb1b40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c64d8 CR3: 000000010752c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 7326, threadinfo ffff810105908000, task ffff810104c34040)
Stack: 0000000000000800 ffff810105909e58 ffffffff8022c2db 00000000079d292b
ffff810105909e88 ffffffff8022c36e ffff810100d6db...
I reran SPECjbb with sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
the regression is still more than 40%. On the stoakley machine, it becomes worse (26%;
the original is 9%). I will investigate further to check whether the SPECjbb regression is
also caused by the bad default values.

We need a smarter method to calculate the best default values for the key tuning
parameters.

One interesting thing is that sysbench+mysql (readonly) got the same result as 2.6.22 (no
regression). Good job!

-yanmin
-
Do you mean you couldn't reproduce the regression which was reported
with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? It
would be nice if you could provide some numbers for 2.6.22, 2.6.23 and
greetings
Cyrus
-
It looks like you missed my emails.
Firstly, I reproduced (or just found the same myself :) ) the issue with kernels 2.6.22,
2.6.23-rc and 2.6.23.

Ingo wrote a big patch to fix it, and the new patch is in 2.6.24-rc1 now.
Then I retested it with 2.6.24-rc1 on a couple of x86_64 machines. The issue
Sorry. Intel policy doesn't allow me to publish the numbers because only
specific departments in Intel can do that. But I can talk about the regression
percentages.

-yanmin
-
greetings
Cyrus
-
The patch is very big.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi...
-
Oops, yeah I think I overlooked that case :-/
I think limiting the sysctl parameters makes the most sense, as a 0 value
really doesn't.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3b4efbe..0f34c91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -94,6 +94,7 @@ static int two = 2;
 
 static int zero;
 static int one_hundred = 100;
+static int int_max = INT_MAX;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -239,7 +240,10 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_sched_nr_latency,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &one,
+		.extra2		= &int_max,
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
-
could we instead just make sched_nr_latency non-tunable, and recalculate
it from the sysctl handler whenever sched_latency or
sched_min_granularity changes? That would avoid not only the division by
zero bug but also other out-of-spec tunings.

Ingo
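
As a rough user-space illustration of this idea (not the actual kernel change), the sketch below keeps nr_latency out of the caller's hands and recomputes it whenever the other two values are set. All names (`struct sched_tunables`, `set_sched_tunables`, `sched_period`) are hypothetical. The period formula is an assumption: it models the period being scaled by nr_running / nr_latency once there are more runnable tasks than nr_latency, which is the kind of division the `__sched_period` oops earlier in this thread points at.

```c
#include <assert.h>

/* Hypothetical model of the three scheduler tunables. */
struct sched_tunables {
	unsigned int latency_ns;
	unsigned int min_granularity_ns;
	unsigned int nr_latency;	/* derived, never written directly */
};

/*
 * Model of a write handler: reject zero inputs, then recompute
 * nr_latency so it can never reach 0. Returns 0 on success and -1
 * if the values were rejected (the old values are left untouched).
 */
int set_sched_tunables(struct sched_tunables *t,
		       unsigned int latency_ns,
		       unsigned int min_granularity_ns)
{
	if (latency_ns == 0 || min_granularity_ns == 0)
		return -1;

	t->latency_ns = latency_ns;
	t->min_granularity_ns = min_granularity_ns;
	t->nr_latency = latency_ns / min_granularity_ns;
	if (t->nr_latency == 0)		/* latency < min_granularity */
		t->nr_latency = 1;
	return 0;
}

/* Assumed period model: stretch the period once nr_running > nr_latency. */
unsigned long long sched_period(const struct sched_tunables *t,
				unsigned long nr_running)
{
	unsigned long long period = t->latency_ns;

	assert(t->nr_latency > 0);	/* guaranteed by set_sched_tunables() */
	if (nr_running > t->nr_latency)
		period = period * nr_running / t->nr_latency;
	return period;
}
```

Because the divisor is only ever produced by `set_sched_tunables()`, a script writing bad values can at worst get its write rejected; it cannot drive the divisor to zero.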
-
Bit weird that you point to a merge commit, and not an actual patch. Are
you sure git bisect pointed at this one?
-
When I did the bisect, the kernel couldn't boot, and my testing log showed
it was at b5869ce7f68b233ceb81465a7644be0d9a5f3dbb. So I did a manual
checkout.

#git clone ...
#git pull ...
#git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb

Then I compiled the kernel and tested it. Then I reverted the above patch and recompiled/retested.
If I run git log, I can see this tag in the list.
-yanmin
-
