fresh back from the Kernel Summit, Peter Zijlstra and me are pleased to announce the latest iteration of the CFS scheduler development tree. Our main focus has been on simplifications and performance - and as part of that we've also picked up some ideas from Roman Zippel's 'Really Fair Scheduler' patch as well and integrated them into CFS. We'd like to ask people to give these patches a good workout, especially with an eye on any interactivity regressions.

The combo patch against 2.6.23-rc6 can be picked up from:

  http://people.redhat.com/mingo/cfs-scheduler/devel/

The sched-devel.git tree can be pulled from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

There are lots of small performance improvements in form of a finegrained 29-patch series. We have removed a number of features and metrics from CFS that might have been needed but ended up being superfluous - while keeping the things that worked out fine, like sleeper fairness.

On 32-bit x86 there's a ~16% speedup (over -rc6) in lmbench (lat_ctx -s 0 2) results:

 (microseconds, lower is better)
 ------------------------------------------------------------
    v2.6.22     2.6.23-rc6(CFS)     v2.6.23-rc6-CFS-devel
   ----------------------------------------------------
     0.70           0.75                  0.65
     0.62           0.66                  0.63
     0.60           0.72                  0.69
     0.62           0.74                  0.61
     0.69           0.73                  0.53
     0.66           0.73                  0.63
     0.63           0.69                  0.61
     0.63           0.70                  0.64
     0.61           0.76                  0.61
     0.69           0.74                  0.63
   ----------------------------------------------------
 avg: 0.64          0.72 (+12%)           0.62 (-3%)

there is a similar speedup on 64-bit x86 as well. We ...
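For readers who want to re-check the summary row, here is a minimal C sketch of the arithmetic behind it. The sample arrays are copied from the lat_ctx table above; the helper names mean() and percent_change() are made up for this illustration and are not part of lmbench or the kernel.

```c
#include <stddef.h>

/* lat_ctx -s 0 2 results in microseconds, copied from the table above. */
static const double v2_6_22[]   = {0.70, 0.62, 0.60, 0.62, 0.69,
                                   0.66, 0.63, 0.63, 0.61, 0.69};
static const double rc6_cfs[]   = {0.75, 0.66, 0.72, 0.74, 0.73,
                                   0.73, 0.69, 0.70, 0.76, 0.74};
static const double cfs_devel[] = {0.65, 0.63, 0.69, 0.61, 0.53,
                                   0.63, 0.61, 0.64, 0.61, 0.63};

/* Arithmetic mean of n samples (hypothetical helper). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of 'value' against 'baseline', in percent. */
static double percent_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

With these helpers, percent_change(mean(v2_6_22, 10), mean(rc6_cfs, 10)) comes out near +12 and the same call with cfs_devel near -3, matching the avg row. The "~16%" figure appears to be the -rc6 average measured against the CFS-devel average, i.e. percent_change(mean(cfs_devel, 10), mean(rc6_cfs, 10)); that reading is an interpretation, not something the announcement spells out.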
Hi, Out of curiosity: will I ever get answers to my questions? bye, Roman -
the last few weeks/months have been pretty hectic - i get more than 50 non-list emails a day so i could easily have missed some. (and to take a line from Linus: my attention span is roughly that of a slightly retarded golden retriever ;)

so it would be helpful if you could please re-state any questions you still have, in context of our latest CFS-devel queue. I tried to answer the error/rounding worries you had - which seemed to be the main theme of your patch.

There are lots of good kernel hackers on lkml who know the new scheduler code pretty well and who might be able to provide an answer even if i don't manage to answer. (Perhaps asking the questions without heavy math will also help more people be able to understand and answer your questions and their practical relevance.)

In any case - if you see packet loss on my side then please resend :) That would be hugely helpful.

Thanks, Ingo -
Hi, Well, let's just take the recent "Really Simple Really Fair Scheduler" thread. You had the time to ask me questions about my scheduler, I even explained to you how the sleeping bonus works in my model. At the end I was sort of hoping you would start answering my questions and explaining things how the same things work in CFS - but nothing. Then you had the time to reimplement the very things you've just asked me about and what do I get credit for - "two cleanups from RFS". And now I get this lame ass excuse for not answering my questions? :-( bye, Roman -
i'm sorry to say this, but you must be reading some other email list and a different git tree than what i am reading.

Firstly, about communications - in the past 3 months i've written you 40 emails regarding CFS - and that's more emails than my wife (or any member of my family) got in that timeframe :-( I just ran a quick script: i sent more CFS related emails to you than to any other person on this planet. I bent over backwards trying to somehow get you to cooperate with us (and i still haven't given up on that!) - instead of you disparaging CFS and me frequently :-(

Secondly, i prominently credited you as early as in the second sentence of our announcement:

| fresh back from the Kernel Summit, Peter Zijlstra and me are pleased
| to announce the latest iteration of the CFS scheduler development
| tree. Our main focus has been on simplifications and performance -
| and as part of that we've also picked up some ideas from Roman
| Zippel's 'Really Fair Scheduler' patch as well and integrated them
| into CFS. We'd like to ask people to give these patches a good
| workout, especially with an eye on any interactivity regressions.

http://lkml.org/lkml/2007/9/11/395

And you are duly credited in 3 patches:

------------------->
Subject: sched: introduce se->vruntime

introduce se->vruntime as a sum of weighted delta-exec's, and use that as the key into the tree.

the idea to use absolute virtual time as the basic metric of scheduling has been first raised by William Lee Irwin, advanced by Tong Li and first prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset.

also see:

   http://lkml.org/lkml/2007/9/2/76

for a simpler variant of this patch.

------------------->
Subject: sched: track cfs_rq->curr on !group-scheduling too

Noticed by Roman Zippel: use cfs_rq->curr in the !group-scheduling case too. Small micro-optimization and cleanup effect: ...
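The "sum of weighted delta-exec's" in the first changelog can be pictured with a small stand-alone C sketch. It is only an illustration of the idea, not the kernel code: struct toy_entity, toy_account(), toy_pick_next() and the weight values are all assumptions made up for this example.

```c
#include <stdint.h>

/* Assumed reference weight of a nice-0 task (illustrative value). */
#define NICE_0_WEIGHT 1024ULL

/* Hypothetical, stripped-down stand-in for a scheduling entity. */
struct toy_entity {
	uint64_t weight;    /* load weight derived from the nice level */
	uint64_t vruntime;  /* accumulated weighted runtime, in ns */
};

/*
 * Account delta_exec_ns nanoseconds of real CPU time to the entity.
 * The delta is scaled by NICE_0_WEIGHT / weight, so a heavier
 * (higher priority) entity sees its virtual clock advance more slowly.
 */
static void toy_account(struct toy_entity *se, uint64_t delta_exec_ns)
{
	se->vruntime += delta_exec_ns * NICE_0_WEIGHT / se->weight;
}

/*
 * Pick the entity with the smaller vruntime. In a tree keyed on
 * vruntime this is what the leftmost node provides.
 */
static struct toy_entity *toy_pick_next(struct toy_entity *a,
					struct toy_entity *b)
{
	return (a->vruntime <= b->vruntime) ? a : b;
}
```

After both a weight-2048 entity and a weight-1024 entity run for 1 ms, the heavier one has accumulated only half the virtual time and is picked next, which is how weighting the runtime turns into a larger CPU share.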
Hi,

This needs a little perspective: as I couldn't clone the repository (and you know that), all I had was this announcement, so using the patch descriptions now as a defense is unfair of you. In this announcement you make relatively few references to how this relates to my work. Maybe someone else can show me how to read that announcement differently, but IMO the casual reader is likely to get the impression that you only picked some minor cleanups from my patch; it's rather unclear that you already reimplemented key aspects of my patch. Don't

Let's compare this to the relevant part of the announcement:

| The ->vruntime metric is similar to the ->time_norm metric used by
| Roman's patch (and both are loosely related to the already existing
| sum_exec_runtime metric in CFS), it's in essence the sum of CPU time
| executed by a task, in nanoseconds - weighted up or down by their nice
| level (or kept the same on the default nice 0 level). Besides this basic
| metric our implementation and math differs from RFS.

In the patch you are more explicit about the virtual time aspect; in the announcement you're less clear that it's all based on the same idea, and somehow it's important to stress the point that "implementation and math differs", which is not untrue, but you forget to mention that the

This is ridiculous, I asked you multiple times to explain to me some of the differences relative to CFS as response to the splitup requests. Not once did you react, you didn't even ask what I'd like to know

I never claimed to understand every detail of CFS, I can _guess_ what _might_ have been intended, but from that it's impossible to know for certain how important they are. Let's take this patch fragment:

-	/*
-	 * Fix up delta_fair with the effect of us running
-	 * during the whole sleep period:
-	 */
-	if (sched_feat(SLEEPER_AVG))
-		delta_fair = div64_likely32((u64)delta_fair * load,
- ...
delta_fair = se->delta_fair_sleep; if we would have ran we would not have been removed from the rq and the weight would have been: rq_weight + weight so compensate for us having been removed from the rq by scaling the scale for nice levels
Or at least, I think that is how to read it :-) -
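The compensation described above can be sketched in a few lines of C. This is one possible reading, not the removed kernel code: the divisor in the quoted patch fragment is cut off, so the rq_weight + weight denominator below is taken from the prose explanation only, and toy_scale_sleep_delta() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Scale the fair time a task accumulated while sleeping as if the task
 * had stayed on the runqueue the whole time.
 *
 *   rq_weight: total weight of the entities on the runqueue while the
 *              task slept (the sleeper itself is not included)
 *   weight:    the sleeper's own weight
 *
 * Had the task kept running, the runqueue weight would have been
 * rq_weight + weight, so the delta is shrunk by rq_weight / (rq_weight + weight).
 * This ratio is an assumption based on the description in the mail.
 */
static uint64_t toy_scale_sleep_delta(uint64_t delta_fair,
				      uint64_t rq_weight, uint64_t weight)
{
	uint64_t total = rq_weight + weight;

	if (total == 0)
		return delta_fair;	/* nothing to scale against */
	return delta_fair * rq_weight / total;
}
```

For example, a sleeper of weight 1024 waking onto a runqueue of weight 2048 keeps two thirds of its accumulated delta, and one waking next to a single equal-weight task keeps half.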
Hi, AFAICT the compensation part is already done by the scaling part, without the load part it largely mirrors what __update_stats_wait_end() does, i.e. it gets the same time as other tasks, which have been on the rq. bye, Roman -
All it tried to do was approximate the situation where the task never left the rq. I'm not saying it makes sense or is the right thing to do, just what the thought behind that particular bit was. There is a reason it was turned off by default: - SCHED_FEAT_SLEEPER_AVG *0 |
How the hell is that unfair? The fact that nobody could clone the repo for about 24 hours is *totally* *irrelevant* to the whole discussion as it's simply a matter of a technical glitch. His point in referencing patch descriptions is to clear up matters of credit.

Ingo has never in this discussion been "out to get you". From the point of view of a sideline observer it's been *you* that has been demanding answers and refusing to answer questions directed at you. The most brilliant mathematician in the world would have nothing to contribute to the Linux scheduler if he couldn't describe, code, and comment his algorithm in detail so that others (even code-monkeys like myself) could grok at least the basic outline and be able to

As a casual reader and reviewer I have yet to actually see you post readable/reviewable patches in this thread. I was basically completely unable to follow the detailed math you go into (even with a math minor) due to your *complete* lack of comments. The fact that you renamed files and didn't split up your patch made it useless for actual practical kernel development, its only value was as a comparison point. I did however get the impression that Ingo got something significantly useful out of your code despite the problems, but I still haven't had time to read through his and Peter's patches in detail to understand exactly what it was.

From personal inspection of a fair percentage of the changes that Ingo and Peter committed, they certainly appear to be deleting a lot more code than they add. More specifically they appear to describe in detail what they are deleting and why, with the exception of one patch that's missing a changelog entry.

So yeah, I get the impression that Ingo re-implemented some ideas that you had because you refused to do so in a way that was acceptable for the upstream kernel. How exactly is this a bad thing? You came up with a great idea that worked and somebody ...
Ah, that would have been one of mine.
---
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Handle vruntime overflow by centering the key space around min_vruntime.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched_fair.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a306f05..b8e2a0d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -116,11 +116,18 @@ set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
 	cfs_rq->rb_leftmost = leftmost;
 	if (leftmost) {
 		se = rb_entry(leftmost, struct sched_entity, run_node);
-		cfs_rq->min_vruntime = max(se->vruntime,
-						cfs_rq->min_vruntime);
+		if ((se->vruntime > cfs_rq->min_vruntime) ||
+		    (cfs_rq->min_vruntime > (1ULL << 61) &&
+		     se->vruntime < (1ULL << 50)))
+			cfs_rq->min_vruntime = se->vruntime;
 	}
 }

+s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return se->fair_key - cfs_rq->min_vruntime;
+}
+
 /*
  * Enqueue an entity into the rb-tree:
  */
@@ -130,7 +137,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
 	struct rb_node *parent = NULL;
 	struct sched_entity *entry;
-	s64 key = se->fair_key;
+	s64 key = entity_key(cfs_rq, se);
 	int leftmost = 1;

 	/*
@@ -143,7 +150,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		 * We dont care about collisions. Nodes with
 		 * the same key stay together.
 		 */
-		if (key - entry->fair_key < 0) {
+		if (key < entity_key(cfs_rq, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
-
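The idea of centering the key space around min_vruntime can be tried out in user space. The sketch below is not the patch itself: toy_entity_key() and toy_key_before() are hypothetical helpers operating on plain 64-bit values, written to show why a key taken relative to the queue minimum still orders correctly after the absolute counter has wrapped.

```c
#include <stdint.h>

/*
 * Key of an entity relative to the queue's minimum virtual runtime.
 * The subtraction is done on unsigned values, where wraparound is well
 * defined, and the result is then reinterpreted as signed. The unsigned
 * to signed conversion is implementation-defined in C for out-of-range
 * values; this sketch assumes the usual two's complement behaviour.
 */
static int64_t toy_entity_key(uint64_t vruntime, uint64_t min_vruntime)
{
	return (int64_t)(vruntime - min_vruntime);
}

/* Non-zero if 'a' sorts to the left of 'b' in a tree keyed this way. */
static int toy_key_before(uint64_t a, uint64_t b, uint64_t min_vruntime)
{
	return toy_entity_key(a, min_vruntime) <
	       toy_entity_key(b, min_vruntime);
}
```

With min_vruntime just below the 64-bit limit, a value slightly above it and a value that has already wrapped past zero compare in the intended order by key, while a plain unsigned comparison of the two raw values would get the order backwards.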
On Thu, 13 Sep 2007 18:50:12 +0200 (CEST) Roman, this is... a strange comment. It almost sounds like you were holding the splitup hostage depending on some other thing happening.... that's not a good attitude in my book. Having big-blob patches that do many things at the same time leads to them being impossible to apply. Linux works by having smaller incrementals. You know that; you've been around for a long time. Complaining that someone finally did splitup work after you refused, and even puts credit in for you... that's beyond my comprehension. Sorry. -
Hi, There is actually a very simple reason for that, the actual patch is not my primary focus, for me it's actually more an afterthought of the actual design to show that it actually works. My primary interest is a _discussion_ of the scheduler design, but Ingo insists on patches. Sorry, but I don't really work this way, I want to think things through _first_, I need a solid concept and I don't like to rely on guesswork. How much response would I have gotten if I had only posted the example program and the math description as I initially planned? bye, Roman -
On Fri, 14 Sep 2007 16:50:22 +0200 (CEST) for someone who's not focused on patches/code, you make quite a bit of noise when someone does turn your discussion into smaller patches and only credits you three times. -
Hi, As I said before, it's not really the lack of credit, it's the lack of discussion. bye, Roman -
Hi Roman. I have read the announcement from Ingo and after reading it I concluded that it was good to see that Ingo had taken into consideration the feedback from you and improved the scheduler based on this. And when I read that he removed a lot of stuff I smiled. This reminded me of countless monkey aka code review sessions where I repeatedly do like my children and ask why so many times that the author realizes that something is not needed or no longer used. The above was my impression after reading the announcement with respect to your influence, and that goes far beyond "two cleanups". I bet many others read it roughly like I did. And no - I did not go back and re-read it. So do not answer by quoting the announcement or stuff like this. Because that will NOT change what my first impression was. So keep up the review - we get a better scheduler this way. Sam -
Hi, Sam, in a way you actually prove my point. Thanks. :) The primary thing you remember here from the announcements is the cleanup and tuning part, during which he picked up some small parts from my patch. That's what I was afraid of: most people won't realize what was added in this process, and even if they notice it, Ingo describes it as somewhat "similar", but actually "different". That part is pretty important to me, but Ingo treats it more as a minor matter. bye, Roman -
Hi Ingo,

When compiling, I get:

In file included from kernel/sched.c:794:
kernel/sched_fair.c: In function 'task_new_fair':
kernel/sched_fair.c:857: error: 'sysctl_sched_child_runs_first' undeclared (first use in this function)
kernel/sched_fair.c:857: error: (Each undeclared identifier is reported only once
kernel/sched_fair.c:857: error: for each function it appears in.)

Presumably because sched_fair.c is being included into sched.c before sysctl_sched_child_runs_first is defined.

Regards, Rob -
Yeah, this was my fault :(
I've had a chance to test this now, and everything feels great. I did
some benchmarks for 2.6.23-rc1, 2.6.23-rc6-cfs, and
2.6.23-rc6-cfs-devel:
lat_ctx -s 0 2:
2.6.23-rc1 2.6.23-rc6-cfs 2.6.23-rc6-cfs-devel
5.15 4.91 5.05
5.23 5.18 4.85
5.19 4.89 5.17
5.36 5.23 4.86
5.35 5.00 5.13
5.34 5.05 5.12
5.26 4.99 5.06
5.11 5.04 4.96
5.29 5.19 5.18
5.40 4.93 5.07
hackbench 50:
2.6.23-rc1 2.6.23-rc6-cfs 2.6.23-rc6-cfs-devel
6.301 5.963 5.837
6.417 5.961 5.814
6.468 5.965 5.757
6.525 5.926 5.840
6.320 5.929 5.751
6.457 5.909 5.825
pipe-test (http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test.c):
2.6.23-rc1 2.6.23-rc6-cfs 2.6.23-rc6-cfs-devel
14.29 14.03 13.89
14.31 14.01 14.10
14.27 13.99 14.15
14.31 14.02 14.16
14.53 14.02 14.14
14.53 14.27 14.16
14.51 14.36 14.12
14.48 14.33 14.16
14.52 14.36 14.17
14.47 14.36 14.15
I turned the results into graphs as well. I'll attach them, but they're also at:
http://www.healthcarelinen.com/misc/lat_ctx_benchmark.png
http://www.healthcarelinen.com/misc/hackbench_benchmark.png
http://www.healthcarelinen.com/misc/pipe-test_benchmark.png
The hackbench and pipe-test numbers are very encouraging. The averages
of the 2.6.23-rc6-cfs and 2.6.23-rc6-cfs-devel lat_ctx numbers are
nearly identical (5.041 and 5.045 respectively).
thanks for the numbers! Could you please also post the .config you used? Thx, Ingo -
Sure, .config for 2.6.23-rc1 and 2.6.23-rc6 attached.
thx! If you've got some time, could you perhaps re-measure with these disabled:

  CONFIG_SCHED_DEBUG=y
  CONFIG_SCHEDSTATS=y

these options mask some of the performance enhancements we made.

There's also a new code drop at:

  http://people.redhat.com/mingo/cfs-scheduler/devel/

with some fixes for SMP. (and you've got an SMP box it appears)

also, if you want to maximize performance, it usually makes more sense to build with these flipped around:

  # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
  CONFIG_FORCED_INLINING=y

i.e.:

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  # CONFIG_FORCED_INLINING is not set

because especially on modern x86 CPUs, smaller x86 code is faster. (and it also takes up less I-cache size)

Ingo -
Well, I was going over my config myself after you asked for me to post it, and I thought to do the same thing. Except, disabling sched_debug caused the same error as before:

In file included from kernel/sched.c:794:
kernel/sched_fair.c: In function 'task_new_fair':
kernel/sched_fair.c:857: error: 'sysctl_sched_child_runs_first' undeclared (first use in this function)
kernel/sched_fair.c:857: error: (Each undeclared identifier is reported only once
kernel/sched_fair.c:857: error: for each function it appears in.)
make[1]: *** [kernel/sched.o] Error 1
make: *** [kernel] Error 2

It only happens with sched_debug=y. I take it back, it wasn't my fault :)

As for everything else, I'd be happy to. -
I'm trying the patches now to see if they help. -
Current cfs-devel git compiles fine without sched_debug. Not sure how I broke things, but I need some sleep. I know the 2.6.23-rc1 numbers were good, but not sure about the others. I'll make the changes you suggested, and get some new and hopefully good numbers for 2.6.23-rc6-cfs and 2.6.23-rc6-cfs-devel. -
are you sure this is happening with the latest iteration of the patch too? (with the combo-3.patch?) You can pick it up from here: http://people.redhat.com/mingo/cfs-scheduler/devel/sched-cfs-v2.6.23-rc6-v21-combo-3.p... I tried your config and it builds fine here. Ingo -
I managed to work it all out (it was my fault after all), and I've now made the changes you suggested to my .configs for 2.6.23-rc1 and 2.6.23-rc6. I've done the benchmarks all over, including tests with the task bound to a single core. Without further ado, the numbers I promised:

lat_ctx -s 0 2
 #    rc1     rc6     cfs-devel
 1    4.58    4.39    4.42
 2    4.76    4.42    4.41
 3    4.74    4.52    4.67
 4    4.74    4.44    4.76
 5    4.79    4.74    4.59
 6    4.80    4.65    4.76
 7    4.52    4.54    4.50
 8    4.72    4.57    4.62
 9    4.87    4.67    4.80
10    4.69    4.47    4.65

hackbench 50
 #    rc1      rc6      cfs-devel
 1    6.634    5.969    5.894
 2    6.342    5.974    5.903
 3    6.219    5.913    5.941
 4    6.702    5.980    5.916
 5    6.287    6.007    5.943
 6    6.239    6.022    5.899
 7    6.434    5.946    5.904
 8    6.229    6.007    5.941
 9    6.387    5.947    5.880
10    6.383    5.946    5.933

pipe-test
 #    rc1      rc6      cfs-devel
 1    13.39    13.16    13.20
 2    13.37    13.12    13.22
 3    13.19    13.17    13.26
 4    13.17    13.16    13.18
 5    13.16    13.22    13.23
 6    13.15    13.19    13.18
 7    13.18    13.42    13.21
 8    13.45    13.39    13.26
 9    13.40    13.40    13.28
10    13.39    13.44    13.24

Bound to single core:

lat_ctx -s 0 2
 #    rc1     rc6     cfs-devel
 1    3.20    2.61    2.37
 2    3.20    2.60    2.40
 3    3.21    2.67    2.38
 4    3.19    2.66    2.34
 5    3.19    2.64    2.37
 6    3.22    2.67    2.36
 7    3.21    3.29    2.36
 8    3.22    2.61    2.44
 9    3.23    2.68    2.36
10    3.22    2.60    2.37

hackbench 50
 #    rc1      rc6      cfs-devel
 1    7.528    7.950    7.538
 2    7.649    8.026    7.548
 3    7.613    8.160    7.580
 4    7.550    8.054    7.558
 5    7.563    8.373    7.559
 6    7.617    8.152    7.550
 7    7.593    7.831    7.562
 8    7.602    8.311    7.588
 9    7.589    8.010    7.552
10    7.682    8.059    7.556

pipe-test
 #    rc1      rc6     cfs-devel
 1    10.29    9.27    8.54
 2    10.30    9.29    8.54
 3    10.31    9.28    8.54
 4    10.29    9.27    8.54
 5    10.28    9.28    8.53
 6    10.30    9.28    8.53
 7    10.30    9.28    8.54

I've made graphs like last ...
I knew there was no way I'd post all these numbers and not screw something up. Switch rc6 and rc1 for hackbench 50 (bound to single core). Updated graph: http://www.healthcarelinen.com/misc/BOUND_hackbench_benchmark_fixed.png Also attached.
Well, looking at these graphs (and the fixed one from your second email), it sure looks a lot like CFS is doing at *least* as well as the old scheduler in every single test, and doing much better in most of them (in addition it's much more consistent between runs). This seems to jibe with all the other benchmarks and overall empirical testing that everyone has been doing. Overall I have to say a job well done for Ingo, Peter, Con, and all the other major contributors to this impressive endeavor. Cheers, Kyle Moffett -
Initial test-drive looks good here, but I do see a regression. First the good news. fairtest2 is perfect, more perfect than ever seen before in fact. Mixed interval sleepers/hog looks fine as well (can't say perfect due to startup differences with the various proggies, but cpu% looks perfect). Amarok song switch time under hefty kbuild load is fine as well. I haven't done heavy multimedia testing yet, but will give it a more thorough workout later (errands). The regression: I see some GUI lurch, easily reproducible by running a make -j5 and moving the mouse in a circle... perceptible (100ms or so) lurches not present in rc5. -Mike -
Hi,

I must really say, I'm quite impressed by your efforts to give me as little credit as possible. On the one hand it's of course positive to see so much sudden activity, on the other hand I'm not sure how much would have happened if I hadn't posted my patch; I don't really think it was my complaints about CFS's complexity that finally led to the improvements in this area. I presented the basic concepts of my patch already with my first CFS review, but at that time you didn't show any interest and instead you were rather quick to simply dismiss it. My patch did not add that much new, it's mostly a conceptual improvement and describes the math in more detail, but it also

Am I the only one who can't clone that thing? So I can't go into much detail about the individual changes here.

The thing that makes me curious is that it also includes patches by others. It can't be entirely explained with the Kernel Summit, as this is not the first time patches appear out of the blue in form of a git tree. The funny/sad thing is that at some point Linus complained about Con that his development activity happened on a separate mailing list, but there was at least a place to go to. CFS's development appears to mostly happen in private.

Patches may be your primary form of communication, but that isn't true for many other people; with patches a lot of intent and motivation for a change is lost. I know it's rather tempting to immediately try out an idea first, but would it really hurt you so much to formulate an idea in a more conventional manner? Are you afraid it might hurt your ueberhacker status by occasionally screwing up in public? Patches on the other hand have the advantage to more easily cover that up by simply posting a fix - it makes it more difficult to understand what's going on.

A more conventional way of communication would give more people a chance to participate, they may not understand every detail of the patch, but they can try to understand the general ...
---------- Forwarded message ---------- From: Roman Zippel <zippel@linux-m68k.org> Date: Sep 12, 2007 6:17 PM Subject: Re: [announce] CFS-devel, performance improvements To: Ingo Molnar <mingo@elte.hu> Cc: linux-kernel@vger.kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>, Mike Galbraith <efault@gmx.de> Hi, I'm must really say, I'm quite impressed by your efforts to give me as little credit as possible. On the one hand it's of course positive to see so much sudden activity, on the other hand I'm not sure how much had happened if I hadn't posted my patch, I don't really think it were my complaints about CFS's complexity that finally lead to the improvements in this area. I presented the basic concepts of my patch already with my first CFS review, but at that time you didn't show any interest and instead you were rather quick to simply dismiss it. My patch did not add that much new, it's mostly a conceptual improvement and describes the math in more detail, but it also Am I the only one who can't clone that thing? So I can't go into much detail about the individual changes here. The thing that makes me curious, is that it also includes patches by others. It can't be entirely explained with the Kernel Summit, as this is not the first time patches appear out of the blue in form of a git tree. The funny/sad thing is that at some point Linus complained about Con that his development activity happend on a separate mailing list, but there was at least a place to go to. CFS's development appears to mostly happen in private. Patches may be your primary form of communication, but that isn't true for many other people, with patches a lot of intent and motivation for a change is lost. I know it's rather tempting to immediately try out an idea first, but would it really hurt you so much to formulate an idea in a more conventional manner? Are you afraid it might hurt your ueberhacker status by occasionally screwing up in public? 
Patches on the other hand have the advantage to ...
Please ignore the previous mail, I messed it up badly. Isn't it good to use all those tricks if it helps? We'll know soon if it helps from the testing which it'll get. I'm just concerned about doing these cleanups so late in the rc cycle. And Ingo, please do explain the reasons for all these cleanups and why I give you credit for coming up with the math which is so easily understandable compared to CFS. Don't lose patience, like Con did. Please keep fighting if you think your code is better. It'll help all of us out here. regards, debian dev -
Ah - i have messed up my sched-devel.git script so the git-push went to
kernel.org but into my home directory :-/ Should work now - let me know
if it doesnt.
i've also uploaded the patch series in quilt format, to:
i'm not sure what you mean, but i can definitely tell you that there was
no scheduler hacking at the Kernel Summit. (there's no good wireless in
the pubs and not enough space for a laptop anyway ;)
The impressive linecount has been mostly achieved by dumb removal:
sched: remove wait_runtime fields and features
4 files changed, 14 insertions(+), 161 deletions(-)
sched: remove wait_runtime limit
5 files changed, 3 insertions(+), 124 deletions(-)
sched: remove precise CPU load calculations #2
1 file changed, 1 insertion(+), 31 deletions(-)
sched: remove precise CPU load
3 files changed, 9 insertions(+), 41 deletions(-)
sched: remove stat_gran
4 files changed, 15 insertions(+), 50 deletions(-)
Hack time to do them: ~10 minutes apiece. Removing stuff is _easy_ :-)
The rest is finegrained, small changes. One of the harder patches was
this one:
commit 28c4b8ed35f0fc7050f186147da9e10b55e1e446
sched: introduce se->vruntime
3 files changed, 50 insertions(+), 33 deletions(-)
And i sent you the first variant of that already:
http://lkml.org/lkml/2007/9/2/76
we needed 2 days after the KS to put it into shape and send it out for
feedback.
Ingo
-
The rounding error we now still have is accumulative over the long time but has no real effect. The only effect is that a nice level would be a little different than it would have been had the division been perfect, not dissimilar to having a small error in the divisor series to begin with. (note that in order to see this little fuzz you need amazingly high context switch rates)

We've measured the effect with the strongest nice levels -20 and 19, a normal loop against two yield loops (this generated 700,000 context switches per second), and the effect is <1%. Not something worth fixing IMHO (unless it comes for free).

At that high switching rates the overhead of scheduling itself and caching causes more skew than this - the small error is totally swamped

While I agree that having this average is nice, your particular implementation has the problem that it quickly overflows u64 at which point it becomes a huge problem (a CPU hog could basically lock up your box when that happens).

I solved the wrap around problem in cfs-devel, and from that base I _could_ probably maintain the average without overflow problems, but

Currently we have 2 approximations in place:

  (leftmost + rightmost) / 2

and

  leftmost + period/2 (where period should match the span of the tree)

neither are perfect but they seem to work quite well.
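To get a feel for the two approximations, the following C sketch compares them with an exact mean over a handful of keys. It is a user-space illustration only: the function names are invented, the keys are assumed to be sorted in ascending order, and the midpoint is written as leftmost + (rightmost - leftmost) / 2, which gives the same value as (leftmost + rightmost) / 2 without the risk of the sum overflowing.

```c
#include <stddef.h>
#include <stdint.h>

/* Approximation 1: midpoint between the leftmost and rightmost keys. */
static uint64_t toy_avg_midpoint(uint64_t leftmost, uint64_t rightmost)
{
	return leftmost + (rightmost - leftmost) / 2;
}

/* Approximation 2: leftmost key plus half of the expected tree span. */
static uint64_t toy_avg_half_period(uint64_t leftmost, uint64_t period)
{
	return leftmost + period / 2;
}

/*
 * Exact (integer) mean of n ascending keys, n > 0. Offsets from the
 * first key are summed so the intermediate sum stays small.
 */
static uint64_t toy_avg_exact(const uint64_t *keys, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += keys[i] - keys[0];
	return keys[0] + sum / n;
}
```

For evenly spread keys such as {100, 200, 300} all three functions agree on 200. For a skewed set such as {100, 110, 120, 300} both approximations still return 200 while the exact mean is 157, which shows the kind of gap between the approximations and a maintained average that the thread is arguing about.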
Hi, If you look at the math, you'll see that I took the overflow into account, I even expected it. If you see this effect in my implementation, it would You need more than two busy loops. There's a reason I implemented a simple simulator first, so I could actually study the scheduling behaviour of different load situations. That doesn't protect from all surprises of course, but it gives me the necessary confidence the scheduler will work reasonably even in weird situations. From these tests I already know that your approximations only work with rather simple loads. bye, Roman -
Ah, ok, I shall look at your patches in more detail, it was not obvious

I'm missing context here, are you referring to the nice level error or

Right, I've built user-space simulators too, handy little things to play

I've not yet seen it go spectacularly wrong, although admittedly a highly concurrent kbuild is the most complex task I let loose on it. Could you perhaps be more specific on the circumstances it breaks down and what the negative impact is?
You indeed outlined it in your email, I must have forgotten it. I have the attention span of a goldfish these days :-/ -
Roman, i disagree strongly. I did test with different nice levels. Here are some hard numbers: the CPU usage table of 40 busy loops started at once, all running at a different nice level, from nice -20 to nice +19:

top - 12:25:07 up 19 min,  2 users,  load average: 40.00, 39.15, 28.35
Tasks: 172 total,  41 running, 131 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2455 root       0 -20  1576  248  196 R   20  0.0   3:47.56 loop
 2456 root       1 -19  1576  244  196 R   16  0.0   3:03.96 loop
 2457 root       2 -18  1576  244  196 R   13  0.0   2:24.80 loop
 2458 root       3 -17  1576  248  196 R   10  0.0   1:58.63 loop
 2459 root       4 -16  1576  244  196 R    8  0.0   1:33.04 loop
 2460 root       5 -15  1576  248  196 R    7  0.0   1:14.73 loop
 2461 root       6 -14  1576  248  196 R    5  0.0   0:59.61 loop
 2462 root       7 -13  1576  244  196 R    4  0.0   0:47.95 loop
 2463 root       8 -12  1576  248  196 R    3  0.0   0:38.31 loop
 2464 root       9 -11  1576  244  196 R    3  0.0   0:30.54 loop
 2465 root      10 -10  1576  244  196 R    2  0.0   0:24.47 loop
 2466 root      11  -9  1576  244  196 R    2  0.0   0:19.52 loop
 2467 root      12  -8  1576  248  196 R    1  0.0   0:15.63 loop
 2468 root      13  -7  1576  248  196 R    1  0.0   0:12.56 loop
 2469 root      14  -6  1576  248  196 R    1  0.0   0:10.00 loop
 2470 root      15  -5  1576  244  196 R    1  0.0   0:07.99 loop
 2471 root      16  -4  1576  244  196 R    1  0.0   0:06.40 loop
 2472 root      17  -3  1576  244  196 R    0  0.0   0:05.09 loop
 2473 root      18  -2  1576  244  196 R    0  0.0   0:04.05 loop
 2474 root      19  -1  1576  248  196 R    0  0.0   0:03.26 loop
 2475 root      20   0  1576  244  196 R    0  0.0   0:02.61 loop
 2476 root ...
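The %CPU column above falls off by a roughly constant factor per nice level. A small C sketch can show what an ideal distribution would look like under that reading. The ratio of 1.25 per nice level is an assumption chosen here because it reproduces the first rows of the table; it is not stated in the mail, and toy_nice_shares() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

#define NR_NICE_LEVELS 40	/* nice -20 .. +19, one busy loop each */

/* Assumed weight ratio between adjacent nice levels (illustrative). */
#define NICE_STEP 1.25

/*
 * Fill share[i] with the ideal CPU fraction of the busy loop running
 * at nice level (i - 20), assuming every level weighs NICE_STEP times
 * less than the level before it and CPU time is split by weight.
 */
static void toy_nice_shares(double share[NR_NICE_LEVELS])
{
	double weight[NR_NICE_LEVELS];
	double total = 0.0;
	double w = 1.0;
	size_t i;

	for (i = 0; i < NR_NICE_LEVELS; i++) {
		weight[i] = w;
		total += w;
		w /= NICE_STEP;
	}
	for (i = 0; i < NR_NICE_LEVELS; i++)
		share[i] = weight[i] / total;
}
```

Under this assumption the first three entries come out at about 20%, 16% and 12.8%, in line with the 20, 16 and 13 shown by top for nice -20, -19 and -18.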
Hi,

Ingo, you should have read the rest of the paragraph too, I said "it's needed for a good task placement", I didn't say anything about time distribution. Try to start a few niced busy loops and then try some interactivity tests. You should also increase the granularity, the rather small time slices can

You're forgetting that only a few days before that announcement, the worst

Did you read the rest of mail? I said a little bit more than that, which actually explains this already in large parts. (BTW this mail also has one example where I almost begged you to explain to me some of the CFS features in response to your splitup request - no response.)

Accuracy is an important aspect, but it's not really the primary goal. As I said I wanted a correct mathematical model of CFS, but due to the complexity of CFS (of which a lot has been removed now in CFS-devel) it was rather difficult to produce such a model. Producing an accurate model is meant as a _tool_ for further transformations, e.g. to analyze where further simplifications are possible, where the 64bit math can be replaced with something simpler without reducing scheduling quality significantly.

The added accuracy of course increases the complexity, but compared to the already existing complexity it was still less (at least according to the lmbench numbers), so IMO it's worth it. The advantage is that I didn't have to worry about any effects of unexpected rounding errors. This scheduler has to work with a wide range of clock implementations and AFAICT it's impossible to guarantee that it works in any situation; it may not break down completely, but I couldn't exclude unexplainable anomalies, especially after seeing the problems in the early CFS version, which got merged.

As I also mentioned this is only part of the problem (but to which early CFS version significantly contributed). The main problem was the limits: once the limits are exceeded, that overflow/underflow time is simply lost and ...
Roman,

I've been trying to follow your mails about CFS since your review posted on Aug 1st. Back then, I was thinking "cool, an in-depth review by someone who understands schedulers and mathematics very well, we'll quickly have a very solid design".

On Aug 10th, I was disappointed to see that you still had not provided the critical information that Ingo had been asking you for for 9 days (cfs-sched-debug output). Your motivations in this work started to become a bit fuzzy to me, since people who behave like this generally do so to get all the lights on them and you really don't need this. Your explanation was kind of "show me yours and only then I'll show you mine". Pretty childish, but you finally sent that long-requested information.

Since then, I've been noticing your now popular "will I get a response to my questions" stuffed into most of your mails. That was getting very suspicious from someone who can write down mathematical equations to prove his design is right, especially considering the fact that your "question" only relates to what a few lines were supposed to do. Nobody believes that someone as smart as you is still blocked on the same line of code after one month!

And if getting CFS fixed wasn't your real motivation... I'm now fairly convinced that you're not seeking credits either. There are more credits to your name per line of patch here than there are in your own code in the kernel. That complaint does not stand by itself.

In fact, I'm beginning to think that you're like a cat who has found a mouse. Why kill it if you can play with it? Each of your "will I get a response" is just like a small kick in the mouse's back to make it move. But by dint of doing this, you're slowly pushing the mouse to the door where it risks escaping from you, and you're losing your toy.

So right now, I'm sure you really do not want to get any code merged. It's so much fun for you to say "hey, Ingo, respond to me" that you would lose

At that time, if my memory ...
Hi,

Well, I admit it was a rather fruitless attempt to get some information out of Ingo, but I only did it _once_. The problem is that the flow of information hasn't improved since. Later I actually answered his questions, but my information requests still go to

Getting credit is indeed not really that important to me, but apparently some lousy credit notes are the only way to get any kind of acknowledgement. I want to get more attention, but not in the way you suspect.

I don't think that a mouse is a really good analogy, is he really that defenseless? All I want is to be taken a bit more seriously; the communication aspect I mentioned is really important. From my perspective Ingo is somewhere up on his pedestal and I have to scream to get any kind of attention.

I skip the rest of the mail. It's one big attempt to prove that I'm dishonest, but you only look at the issue from one side and thus make it very easy for yourself.

bye, Roman
Acknowledgement has always been one of the kernel's weaknesses it seems, given the recent issues on other subjects. But it's not always easy either, especially when you just change sparse parts of code based on someone else's analysis. I personally do credit people in the GIT changelogs for their ideas

I don't know. But I observe that you're very efficient at building the road

In my opinion, you're screaming in a language he does not understand, and when he proposes random responses, you don't understand them either. That game can last very long. You want to speak maths, he cannot. He wants to speak patches, you cannot. I'm not saying one is better than the other, but I know for sure that the common language here on LKML is patches.

So my conclusion is that you need someone to act as a translator when you want

Maybe there was a very prominent side then. I might be wrong in my analysis, but I cannot find any other interpretation, there are too many coincidences, and I don't believe in that, especially from smart people ;-)

Willy
