login
Header Space

 
 

Linux: Discussing the Really Fair Scheduler

September 3, 2007 - 5:18pm
Submitted by Jeremy on September 3, 2007 - 5:18pm.
Linux news

Ingo Molnar reviewed Roman Zippel's Really Fair Scheduler code, suggesting that much of the work was similar to that which was being done by Peter Zijlstra, "all in one, we don't disagree, this is an incremental improvement we are thinking about for 2.6.24. We do disagree with this being positioned as something fundamentally different though - it's just the same thing mathematically, expressed without a "/weight" divisor, resulting in no change in scheduling behavior. (except for a small shift of CPU utilization for a synthetic corner-case)"

Roman was not impressed with Ingo's review, asking, "did you even try to understand what I wrote?" He continued, "while Peter's patches are interesting, they are only a small step to what I'm trying to achieve." Regarding performance and code-quality concerns with his patch he added, "I explicitly said that my patch is only a prototype, so I haven't done any cleanups and tuning in this direction yet, so making any conclusions based on code size comparisons is quite ridiculous at this point. The whole point of this patch was to demonstrate the algorithmic changes, not to demonstrate a final and perfectly tuned scheduler."

Ingo noted that he was primarily interested in improving scheduling quality and scheduling performance:

"The main reason I'm interested in changing the fairness math under CFS (be that Peter's changes or your changes) is _not_ primarily to address any rounding behavior, but to potentially improve performance! Rounding errors are at most an academic issue, unless they are actually measurable in a workload we care about. (If rounding gets improved as a side-effect of a change that's an added, albeit lower-prio bonus.) But even more important is quality of scheduling - performance is secondary to that. (unless performance is so bad that it becomes a quality issue: such as an O(N) algorithm would do.)"

He went on to suggest that the main goals of Roman's patch could be implemented without the need for such a large patch, "the math in your patch _could_ be implemented as a much smaller add-on to the already existing variables maintained by CFS, but you chose to do lots of changes, variable-renames and removals at once and posted them as one big change." Finally, he acknowledged that he's fully willing to merge in large changes as necessary, though prefers them to be broken into small manageable pieces:

"I really welcome large changes to the scheduler (hey, in the past 2.5 years alone we added 350+ scheduler commits from over 95 unique contributors, so I'm as feature-happy as it gets), but it's far easier to review and merge stuff if it's nicely split up. (I'll eventually get through your patch too, but it's much harder that way and as you probably know every core kernel hacker is away for the Kernel Summit so don't expect much activity for a week or so.)"


From: Ingo Molnar [email blocked]
Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler
Date:	Fri, 31 Aug 2007 12:54:47 +0200


* Roman Zippel [email blocked] wrote:

> Hi,
> 
> I'm glad to announce a working prototype of the basic algorithm I 
> already suggested last time. As I already tried to explain previously 
> CFS has a considerable algorithmic and computational complexity. [...]

hey, thanks for working on this, it's appreciated! In terms of merging 
your stuff, your patch looks a bit large and because you did not tell us 
that you were coding in this area, you probably missed Peter Zijlstra's 
excellent CFS work:

  http://programming.kicks-ass.net/kernel-patches/sched-cfs/

The following portion of Peter's series does much of the core math 
changes of what your patch provides (and which makes up for most of the 
conceptual delta in your patch), on a step by step basis:

  sched-update_weight_inv.patch
  sched-se_load.patch
  sched-se_load-2.patch
  sched-64-wait_runtime.patch
  sched-scaled-limit.patch
  sched-wait_runtime-scale.patch
  sched-calc_delta.patch

So the most intrusive (math) aspects of your patch have been implemented 
already for CFS (almost a month ago), in a finegrained way.

Peter's patches change the CFS calculations gradually over from 
'normalized' to 'non-normalized' wait-runtime, to avoid the 
normalizing/denormalizing overhead and rounding error. Turn off sleeper 
fairness, remove the limit code and we should arrive to something quite 
close to the core math in your patch, with similar rounding properties 
and similar overhead/complexity. (there are some other small details in 
the math but this is the biggest item by far.) I find Peter's series 
very understandable and he outlined the math arguments in his replies to 
your review mails. (would be glad to re-open those discussions though if 
you still think there are disagreements.)

Peter's work fully solves the rounding corner-cases you described as:

> This model is far more accurate than CFS is and doesn't add an error 
> over time, thus there are no more underflow/overflow anymore within 
> the described limits.

( your characterisation errs in that it makes it appear to be a common
  problem, while in practice it's only a corner-case limited to extreme 
  negative nice levels and even there it needs a very high rate of 
  scheduling and an artificially constructed workload: several hundreds 
  of thousand of context switches per second with a yield-ing loop to be 
  even measurable with unmodified CFS. So this is not a 2.6.23 issue at 
  all - unless there's some testcase that proves the opposite. )

with Peter's queue there are no underflows/overflows either anymore in 
any synthetic corner-case we could come up with. Peter's queue works 
well but it's 2.6.24 material.

Non-normalized wait-runtime is simply a different unit (resulting in 
slightly higher context-switch performance), the principle and the end 
result does not change.

All in one, we dont disagree, this is an incremental improvement we are 
thinking about for 2.6.24. We do disagree with this being positioned as 
something fundamentally different though - it's just the same thing 
mathematically, expressed without a "/weight" divisor, resulting in no 
change in scheduling behavior. (except for a small shift of CPU 
utilization for a synthetic corner-case)

And if we handled that fundamental aspect via Peter's queue, all the 
remaining changes you did can be done (and considered and merged) 
evolutionarily instead of revolutionarily, ontop of CFS - this should 
cut down on the size and the impact of your changes significantly!

So if you'd like to work with us on this and get items that make sense 
merged (which we'd very much like to see happen), could you please 
re-shape the rest of your changes and ideas (such as whether to use 
ready-queueing or a runqueue concept, which does look interesting) ontop 
of Peter's queue, and please do it as a finegrained, reviewable, 
mergable series of patches, like Peter did. Thanks!

	Ingo


From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Fri, 31 Aug 2007 15:19:56 +0200 (CEST) Hi, On Fri, 31 Aug 2007, Ingo Molnar wrote: > So the most intrusive (math) aspects of your patch have been implemented > already for CFS (almost a month ago), in a finegrained way. Interesting claim, please substantiate. > Peter's patches change the CFS calculations gradually over from > 'normalized' to 'non-normalized' wait-runtime, to avoid the > normalizing/denormalizing overhead and rounding error. Actually it changes wait-runtime to a normalized value and it changes nothing about the rounding error I was talking about. It addresses the conversion error between the different units I was mentioning in an earlier mail, but the value is still rounded. > > This model is far more accurate than CFS is and doesn't add an error > > over time, thus there are no more underflow/overflow anymore within > > the described limits. > > ( your characterisation errs in that it makes it appear to be a common > problem, while in practice it's only a corner-case limited to extreme > negative nice levels and even there it needs a very high rate of > scheduling and an artificially constructed workload: several hundreds > of thousand of context switches per second with a yield-ing loop to be > even measurable with unmodified CFS. So this is not a 2.6.23 issue at > all - unless there's some testcase that proves the opposite. ) > > with Peter's queue there are no underflows/overflows either anymore in > any synthetic corner-case we could come up with. Peter's queue works > well but it's 2.6.24 material. Did you even try to understand what I wrote? I didn't say that it's a "common problem", it's a conceptual problem. The rounding has been improved lately, so it's not as easy to trigger with some simple busy loops. Peter's patches don't remove limit_wait_runtime() and AFAICT they can't, so I'm really amazed how you can make such claims. > All in one, we dont disagree, this is an incremental improvement we are > thinking about for 2.6.24. We do disagree with this being positioned as > something fundamentally different though - it's just the same thing > mathematically, expressed without a "/weight" divisor, resulting in no > change in scheduling behavior. (except for a small shift of CPU > utilization for a synthetic corner-case) Everytime I'm amazed how quickly you get to your judgements... :-( Especially interesting is that you don't need to ask a single question for that, which would mean you actually understood what I wrote, OTOH your wild claims tell me something completely different. BTW who is "we" and how is it possible that this meta mind can come to such quick judgements? The basic concept is quite different enough, one can e.g. see that I have to calculate some of the key CFS variables for the debug output. The concepts are related, but they are definitively not "the same thing mathematically", the method of resolution is quite different, if you think otherwise then please _prove_ it. bye, Roman
From: Ingo Molnar [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 11:26:11 +0200 * Roman Zippel [email blocked] wrote: > > with Peter's queue there are no underflows/overflows either anymore > > in any synthetic corner-case we could come up with. Peter's queue > > works well but it's 2.6.24 material. > > Did you even try to understand what I wrote? I didn't say that it's a > "common problem", it's a conceptual problem. The rounding has been > improved lately, so it's not as easy to trigger with some simple busy > loops. As i mentioned it in my first reply to you i really welcome your changes and your interest in the Linux scheduler, but i'm still somewhat surprised why you focus on rounding so much (and why you attack CFS's math implementation so vehemently and IMO so unfairly) - and we had this discussion before in the "CFS review" thread that you started. The kind of rounding error you seem to be worried about is very, very small. For normal nice-0 tasks it's in the "one part per a hundred million" range, or smaller. As a comparison: "top"'s CPU utilization statistics are accurate to "one part per thousand" - and that's roughly the precision range that humans care about. (Graphical tools are even coarser - one part per hundred or worse.) I suspect if rounding effects are measurable you will post numbers that prove it, correct? You did not do that so far. Or if they are not measurable for any workload we care about, why should we worry about it if it causes no problems in practice? (or if it causes problems, what are the actual effects and can they be addressed in a simpler way?) The main reason i'm interested in changing the fairness math under CFS (be that Peter's changes or your changes) is _not_ primarily to address any rounding behavior, but to potentially improve performance! Rounding errors are at most an academic issue, unless they are actually measurable in a workload we care about. (If rounding gets improved as a side-effect of a change that's an added, albeit lower-prio bonus.) But even more important is quality of scheduling - performance is secondary to that. (unless performance is so bad that it becomes a quality issue: such as an O(N) algorithm would do.) Your math is fairly simple (and that is _good_, just like CFS's existing math is simple), it can be summed up in essence as (without complicating it with nice-level weighting, for easy understandability): " use the already existing p->sum_exec_runtime 'task runtime' metric that CFS maintains, and use that as the key into the rb-tree that selects tasks that should be run next. To handle sleeping tasks: keep a per-rq sum of all runnable task's ->sum_exec_runtime values and start newly woken tasks at the average rq->sum/nr_running value. " Now your patch does not actually do it that way in a clearly discernible manner because lots of changes are intermixed into one big patch. ( please correct me if i got your math wrong. Your patch does not add any comments at all to the new code and this slowed down my review and analysis of your patch quite considerably. Lack of comments makes it harder to see the purpose and makes it harder to notice the benefits/tradeoffs involved in each change. ) Much of the remaining complexity in your patch i see as an add-on optimization to that concept: you use a quite complex Bresenham implementation that hides the real machinery of the code. Yes, rounding improves too with Breshenham, but to me that is secondary - i'm mainly interested in the performance aspects of Breshenham. There are advantages of your approach but it also has disadavantages: you removed sleeper fairness for example, which makes it harder to compare the scheduling quality of the two implementations. To sum up: the math in your patch _could_ be implemented as a much smaller add-on to the already existing variables maintained by CFS, but you chose to do lots of changes, variable-renames and removals at once and posted them as one big change. I really welcome large changes to the scheduler (hey, in the past 2.5 years alone we added 350+ scheduler commits from over 95 unique contributors, so i'm as feature-happy as it gets), but it's far easier to review and merge stuff if it's nicely split up. (I'll eventually get through your patch too, but it's much harder that way and as you probably know every core kernel hacker is away for the Kernel Summit so dont expect much activity for a week or so.) One conceptual reason behind the intrusive policy-modularization done in CFS was to _never again_ be forced to do "big" scheduler changes - we now can do most things in small, understandable steps. I posted my very first, raw version of CFS after 3 days of hacking which showed the raw math and nothing more, and i posted every iteration since then, so you can follow through its history. _Please_, separate things out so that they can be reviewed one by one. And _please_ order the patches in "level of intrusiveness" order - i.e. leave the more intrusive patches as late in the patch-queue as possible. Thanks! Ingo
From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Mon, 3 Sep 2007 04:58:03 +0200 (CEST) Hi, On Sun, 2 Sep 2007, Ingo Molnar wrote: > > Did you even try to understand what I wrote? I didn't say that it's a > > "common problem", it's a conceptual problem. The rounding has been > > improved lately, so it's not as easy to trigger with some simple busy > > loops. > > As i mentioned it in my first reply to you i really welcome your changes > and your interest in the Linux scheduler, but i'm still somewhat > surprised why you focus on rounding so much (and why you attack CFS's > math implementation so vehemently and IMO so unfairly) - and we had this > discussion before in the "CFS review" thread that you started. The rounding is insofar a problem as it makes it very difficult to produce a correct mathematical model of CFS, which can be used to verify the working of code. With the recent rounding changes they have indeed little effect in the short term, especially as the error is distributed equally to the task, so every task gets relatively the same time, the error still exists nonetheless and adds up over time (although with the rounding changes it doesn't grow into one direction anymore). The problem is now the limiting, which you can't remove as long as the error exists. Part of this problem is that CFS doesn't really maintain a balanced sum of the available runtime, i.e. the sum of all updated wait_runtime values of all active tasks should be zero. This means under complex loads it's possible to hit the wait_runtime limits and the overflow/underflow time is lost in the calculation and this value can be easily in the millisecond range. To be honest I have no idea how real this problem in the praxis is right now, quite possibly people will only see it as a small glitch and in most cases they won't even notice. Previously it had been rather easy to trigger these problems due to the rounding problems. The main problem for me here is now that with any substantial change in CFS I'm running risk of making this worse and noticable again somehow. For example CFS relies on the 64bit math to have enough resolution so that the errors aren't noticable. That's why I needed a mathematical model I can work with and with it for example I can easily downscale the math to 32bit and I know exactly the limits within it will work correctly. > Your math is fairly simple (and that is _good_, just like CFS's existing > math is simple), I really wouldn't call CFS's existing math simple, the basic ideas are indeed simple, but that the math of the actual implementation is quite a bit more complex. > it can be summed up in essence as (without complicating > it with nice-level weighting, for easy understandability): > > " use the already existing p->sum_exec_runtime 'task runtime' metric > that CFS maintains, and use that as the key into the rb-tree that > selects tasks that should be run next. To handle sleeping tasks: keep > a per-rq sum of all runnable task's ->sum_exec_runtime values and > start newly woken tasks at the average rq->sum/nr_running value. " > > Now your patch does not actually do it that way in a clearly discernible > manner because lots of changes are intermixed into one big patch. Well, it's part of the math, but I wouldn't call it the "essence" of the _math_, as it leaves many questions unanswered, the essence of the math mainly involves how the problems are solved created by the weighting. If you want to describe the basic logical differences, you can do it a little simpler: CFS maintains the runtime of a task relative to a global clock, while in my model every task has its own clock and an approximated average is used to initialize the clock of new tasks. > ( please correct me if i got your math wrong. Your patch does not add > any comments at all to the new code and this slowed down my review > and analysis of your patch quite considerably. Lack of comments makes > it harder to see the purpose and makes it harder to notice the > benefits/tradeoffs involved in each change. ) I hope you did notice that I explained in the announcement in quite detail what the code does. > To sum up: the math in your patch _could_ be implemented as a much > smaller add-on to the already existing variables maintained by CFS, but > you chose to do lots of changes, variable-renames and removals at once > and posted them as one big change. I didn't rename that much and there are only a few that could be reused, for most part I indeed removed quite a bit and the rest really has a different meaning. > _Please_, separate things > out so that they can be reviewed one by one. That's not really the problem, but _please_ someone explain to me the features you want to see preserved. I don't want to be left at guessing what they're intendend to do and blindly implement something which should do something similiar. Thanks. bye, Roman
From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sat, 1 Sep 2007 08:48:53 +0200 (CEST) Hi, On Fri, 31 Aug 2007, Ingo Molnar wrote: Maybe I should explain for everyone else (especially after seeing some of the comments on kerneltrap), why I reacted somewhat irritated on what looks like such an innocent mail. The problem is without the necessary background one can't know how wrong such statements as this are, the level of confidence is amazing though: > Peter's work fully solves the rounding corner-cases you described as: I'd expect Ingo to know better, but the more he refuses to answer my questions, the more I doubt it, at least than it comes to the math part. While Peter's patches are interesting, they are only a small step to what I'm trying to achieve. With these patches the basic CFS math pretty much looks like this: sum_{t}^{T}(round(time_{t} * round(WMULT / weight_{t}) * WEIGTH0 / WMULT)) = sum_{r}^{R}(round(time_{r} * round(WMULT / weight_sum) * WEIGTH0 / WMULT)) It's based on this equation: sum_{t}^{T}(time_{t} / weight_{t}) = sum_{r}^{R}(time_{r} / weight_sum) This is the time a single task gets relative to the the runtime of all tasks. These sums are incrementally added/substracted to wait_runtime and it should stay around zero. All Peter's wait_runtime-scale patch does is to move the weight_{t} from one side to the other and that's it, it changes _nothing_ about the rounding above - "the rounding corner-cases" are still there. In my announcement I describe now in quite some detail, how I get rid of this rounding effect. The only rounding from above equation which is still left is "round(WMULT / weight_{t})", but this is luckily a constant and so is the time one task gets relative to another (unless reniced). Anyone who has actually read and understood what I wrote will hopefully realize what complete nonsense a statement like this is: > So the most intrusive (math) aspects of your patch have been implemented > already for CFS (almost a month ago), in a finegrained way. I'm not repeating again the whole math here, if anyone has questions about it, I'll try my best to answer them. So instead here are some of the intrusive aspects, which supposedly have been implemented already. One key difference is that I don't maintain the global sum (fair_clock) directly anymore, I can calculate it if needed, but it's not used to update wait_runtime anymore. This has the advantage that the whole rounding involved in it has no influence anymore on how much time a task gets. Without this value the whole math around how to schedule a task is quite different as well. Another key difference is that I got rid of (WEIGTH0 / WMULT), this has the advantage that it completely gets rid of the problematic rounding and the scheduler can be now finally precise as the hardware allows it. OTOH this has consequences for the range of values, as they can and are expected to overflow now after some time, which the math has to take into account. One positive side effect of these overflows is that I can reduce the resolution the scheduler is working with and thus I can get rid of pretty much all of the 64bit math, if the reduced resolution is sufficient, e.g. for all archs which have a jiffies based scheduler clock, but even embedded archs might like it. > So if you'd like to work with us on this and get items that make sense > merged (which we'd very much like to see happen), could you please > re-shape the rest of your changes and ideas (such as whether to use > ready-queueing or a runqueue concept, which does look interesting) ontop > of Peter's queue, and please do it as a finegrained, reviewable, > mergable series of patches, like Peter did. Thanks! The funny thing is it should be no that hard to split the patch, but that wasn't point. The point is to discuss the differences - how the different approach effects the scheduling decisions, as the new scheduler maintains somewhat different values. If I'm now told "it's just the same thing mathematically", which is provably nonsense, I'm a little stunned and the point that aggravates me is that most people simply are going to believe Ingo, because they don't understand the issue (and they don't really have to). I'm still amazed how easily Ingo can just ignore the main part of my work and still gets away with it. The last thing I want is yet another flame war, all I want is that this work is taken seriously, but it took Ingo less than 9 hours to get to a thorough judgement... bye, Roman
From: Bill Davidsen [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sat, 01 Sep 2007 22:19:43 -0400 Roman Zippel wrote: > Hi, > > On Fri, 31 Aug 2007, Ingo Molnar wrote: > > Maybe I should explain for everyone else (especially after seeing some of > the comments on kerneltrap), why I reacted somewhat irritated on what > looks like such an innocent mail. > The problem is without the necessary background one can't know how wrong > such statements as this are, the level of confidence is amazing though: > >> Peter's work fully solves the rounding corner-cases you described as: > > I'd expect Ingo to know better, but the more he refuses to answer my > questions, the more I doubt it, at least than it comes to the math part. > The "math part" is important if you're doing a thesis defense, but demonstrating better behavior under some realistic load would probably be a better starting point. Maybe Ingo missed something in your math, and maybe he just didn't find a benefit. The ck and sd schedulers developed over a long time of public use and redefinition of goals. The cfs developed faster, but still had months of testing and the benefit of rt experience. Your scheduler is more of a virgin birth, no slow public discussion before, no time yet for people to run it under real loads, you're seeing first impressions. You dropped this on the world two days before a major US holiday, at the end of the summer when those of us not on vacation may be covering for those who are, did you expect Ingo to drop his regular work to look at your stuff? And do you think there are many people who can look at your math, and look at your code, and then have any clue how well it works in practice? I bet there aren't ten people in the world who would even claim to do that, and half of them are kidding themselves. So give people a few weeks to see if the rounding errors you eliminated mean anything in practice. Faster context is nice, but it's not something most people count as their major problem. I did a *lot* of cfs vs. sd testing, not just all the glitch1 tests I posted here, lots of stuff I just send to Ingo, and lots of tests like IPCbench and NNTP server loading showed nothing. (That's good, it means cfs isn't worse than the old scheduler for those loads.) I played with the tuning, I even diddled the code a bit, without finding meaningful improvement, so your possibly better math may not change the overall behavior much. I will wait until I post actual test results before I say any more. -- Bill Davidsen [email blocked] "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 19:02:55 +0200 (CEST) Hi, On Sat, 1 Sep 2007, Bill Davidsen wrote: > > I'd expect Ingo to know better, but the more he refuses to answer my > > questions, the more I doubt it, at least than it comes to the math part. > > > The "math part" is important if you're doing a thesis defense, but > demonstrating better behavior under some realistic load would probably be a > better starting point. That wasn't my design goal. The initial trigger for this was all the 64bit math in CFS and the question how can this be tuned using arch specific knowledge about the input values. For this I needed a correct and verifyable model to analyze, so I know where the limits are and how it reacts to different sets of inputs. This is where I got into trouble with all the rounding - I couldn't substantially change the math without convincing myself that it's still correct for all kinds of input data. That's why I completely redid the math from scratch - it's based on the same basic ideas, but the solution and thus the math is quite different. The main reason I didn't concentrate on the behaviour so much is that since both scheduler are conceptually not that far apart, it should be possible to apply any tuning done to CFS also to my model. But this requires someone actually understands what tuning was done and it wasn't done by random changes and seeing what happens, i.e. someone should be able to explain it to me. BTW in this context it's rather interesting to see that Ingo attacks me now for not having done this tuning yet and for which I explicitely asked for help... > Maybe Ingo missed something in your math, and maybe he > just didn't find a benefit. Maybe he didn't understand it at all and is too proud to admit it? (At least that's the only explanation which makes sense to me.) > You dropped this on the world two days before a major US holiday, at the end > of the summer when those of us not on vacation may be covering for those who > are, did you expect Ingo to drop his regular work to look at your stuff? Did I demand anything like that? It had been fine if he had been asking for more time or if he had more questions first, but it took him only a few hours to come to his conclusion without any need for further questions. > And > do you think there are many people who can look at your math, and look at your > code, and then have any clue how well it works in practice? I bet there aren't > ten people in the world who would even claim to do that, and half of them are > kidding themselves. I don't think the math is that complex, it may need some more explanations outside the math formulas though. But the math is needed to analyze what effect changes to scheduler have, otherwise there is a risk it becomes only guesswork with random changes which may help in some situations but break others. bye, Roman
From: Ingo Molnar [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 09:20:29 +0200 * Daniel Walker [email blocked] wrote: > The the patch is near the end of this email.. The most notable thing > about the rediff is the line count, > > 4 files changed, 323 insertions(+), 729 deletions(-) > > That's impressive (assuming my rediff is correct). [...] Yeah, at first glance i liked that too, then i looked into the diff and noticed that a good chunk of the removal "win" comes from the removal of ~35 comment blocks while adding new code that has no comments at all (!). And if you look at the resulting code size/complexity, it actually increases with Roman's patch (UP, nodebug, x86): text data bss dec hex filename 13420 228 1204 14852 3a04 sched.o.rc5 13554 228 1228 15010 3aa2 sched.o.rc5-roman Although it _should_ have been a net code size win, because if you look at the diff you'll see that other useful things were removed as well: sleeper fairness, CPU time distribution smarts, tunings, scheduler instrumentation code, etc. > I also ran hackbench (in a haphazard way) a few times on it vs. CFS in > my tree, and RFS was faster to some degree (it varied).. here are some actual numbers for "hackbench 50" on -rc5, 10 consecutive runs fresh after bootup, Core2Duo, UP: -rc5(cfs) -rc5+rfs ------------------------------- Time: 3.905 Time: 4.259 Time: 3.962 Time: 4.190 Time: 3.981 Time: 4.241 Time: 3.986 Time: 3.937 Time: 3.984 Time: 4.120 Time: 4.001 Time: 4.013 Time: 3.980 Time: 4.248 Time: 3.983 Time: 3.961 Time: 3.989 Time: 4.345 Time: 3.981 Time: 4.294 ------------------------------- Avg: 3.975 Avg: 4.160 (+4.6%) Fluct: 0.138 Fluct: 1.671 so unmodified CFS is 4.6% faster on this box than with Roman's patch and it's also more consistent/stable (10 times lower fluctuations). At lower hackbench levels (hackbench 10) the numbers are closer - that could be what you saw. But, this measurement too is apples to oranges, given the amount of useful code the patch removes - fact is that you can always speed up the scheduler by removing stuff, just run hackbench as SCHED_FIFO (via "chrt -f 90 ./hackbench 50") to see what a minimal scheduler can do. It would be far more reviewable and objectively judgeable on an item by item basis if Roman posted the finegrained patches i asked for. (which patch series should be sorted in order of intrusiveness - i.e. leaving the harder changes to the end of the series.) Ingo
From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 17:16:23 +0200 (CEST) Hi, On Sun, 2 Sep 2007, Ingo Molnar wrote: > And if you look at the resulting code size/complexity, it actually > increases with Roman's patch (UP, nodebug, x86): > > text data bss dec hex filename > 13420 228 1204 14852 3a04 sched.o.rc5 > 13554 228 1228 15010 3aa2 sched.o.rc5-roman That's pretty easy to explain due to differences in inlining: text data bss dec hex filename 15092 228 1204 16524 408c kernel/sched.o 15444 224 1228 16896 4200 kernel/sched.o.rfs 14708 224 1228 16160 3f20 kernel/sched.o.rfs.noinline Sorry, but I didn't spend as much time as you on tuning these numbers. Index: linux-2.6/kernel/sched_norm.c =================================================================== --- linux-2.6.orig/kernel/sched_norm.c 2007-09-02 16:58:05.000000000 +0200 +++ linux-2.6/kernel/sched_norm.c 2007-09-02 16:10:58.000000000 +0200 @@ -145,7 +145,7 @@ static inline struct task_struct *task_o /* * Enqueue an entity into the rb-tree: */ -static inline void +static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) { struct rb_node **link = &cfs_rq->tasks_timeline.rb_node; @@ -192,7 +192,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq, se->queued = 1; } -static inline void +static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) { if (cfs_rq->rb_leftmost == se) { @@ -240,7 +240,7 @@ static void verify_queue(struct cfs_rq * * Update the current task's runtime statistics. Skip current tasks that * are not in our scheduling class. */ -static inline void update_curr(struct cfs_rq *cfs_rq) +static void update_curr(struct cfs_rq *cfs_rq) { struct sched_entity *curr = cfs_rq->curr; kclock_t now = rq_of(cfs_rq)->clock; > Although it _should_ have been a net code size win, because if you look > at the diff you'll see that other useful things were removed as well: > sleeper fairness, CPU time distribution smarts, tunings, scheduler > instrumentation code, etc. Well, these are things I'd like you to explain a little, for example I repeatedly asked you about the sleeper fairness and I got no answer. BTW you seemed to haved missed that I actually give a bonus to sleepers as well. > > I also ran hackbench (in a haphazard way) a few times on it vs. CFS in > > my tree, and RFS was faster to some degree (it varied).. > > here are some actual numbers for "hackbench 50" on -rc5, 10 consecutive > runs fresh after bootup, Core2Duo, UP: > > -rc5(cfs) -rc5+rfs > ------------------------------- > Time: 3.905 Time: 4.259 > Time: 3.962 Time: 4.190 > Time: 3.981 Time: 4.241 > Time: 3.986 Time: 3.937 > Time: 3.984 Time: 4.120 > Time: 4.001 Time: 4.013 > Time: 3.980 Time: 4.248 > Time: 3.983 Time: 3.961 > Time: 3.989 Time: 4.345 > Time: 3.981 Time: 4.294 > ------------------------------- > Avg: 3.975 Avg: 4.160 (+4.6%) > Fluct: 0.138 Fluct: 1.671 > > so unmodified CFS is 4.6% faster on this box than with Roman's patch and > it's also more consistent/stable (10 times lower fluctuations). Was SCHED_DEBUG enabled or disabled for these runs? bye, Roman
From: Ingo Molnar [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 17:29:00 +0200 * Roman Zippel [email blocked] wrote: > > And if you look at the resulting code size/complexity, it actually > > increases with Roman's patch (UP, nodebug, x86): > > > > text data bss dec hex filename > > 13420 228 1204 14852 3a04 sched.o.rc5 > > 13554 228 1228 15010 3aa2 sched.o.rc5-roman > > That's pretty easy to explain due to differences in inlining: > > text data bss dec hex filename > 15092 228 1204 16524 408c kernel/sched.o > 15444 224 1228 16896 4200 kernel/sched.o.rfs > 14708 224 1228 16160 3f20 kernel/sched.o.rfs.noinline no, when generating those numbers i used: CONFIG_CC_OPTIMIZE_FOR_SIZE=y # CONFIG_FORCED_INLINING is not set (but i also re-did it for all the other combinations of these build flags and similar results can be seen - your patch, despite removing lots of source code, produces a larger sched.o.) > Sorry, but I didn't spend as much time as you on tuning these numbers. some changes did slip into your patch that have no other purpose but to reduce code size: +#ifdef CONFIG_SMP unsigned long cpu_load[CPU_LOAD_IDX_MAX]; +#endif [...] +#ifdef CONFIG_SMP /* Used instead of source_load when we know the type == 0 */ unsigned long weighted_cpuload(const int cpu) { return cpu_rq(cpu)->ls.load.weight; } +#endif [...] so i thought you must be aware of the problem - at least considering how much you've criticised CFS's "complexity" both in your initial review of CFS (which included object size comparisons) and in this patch submission of yours (which did not include object size comparisons though). > > so unmodified CFS is 4.6% faster on this box than with Roman's patch > > and it's also more consistent/stable (10 times lower fluctuations). > > Was SCHED_DEBUG enabled or disabled for these runs? debugging disabled of course. (your patch has a self-validity checking function [verify_queue()] that is called on SCHED_DEBUG=y, it would have been unfair to test your patch with that included.) Ingo
From: Roman Zippel [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 19:16:34 +0200 (CEST) Hi, On Sun, 2 Sep 2007, Ingo Molnar wrote: > so i thought you must be aware of the problem - at least considering how > much you've criticised CFS's "complexity" both in your initial review of > CFS (which included object size comparisons) and in this patch > submission of yours (which did not include object size comparisons > though). Ingo as long as you freely mix algorithmic and code complexity we won't get very far. I did code size comparisons for your _stable_ code, which was merged into Linus' tree. I explicitly said that my patch is only a prototype, so I haven't done any cleanups and tuning in this direction yet, so making any conclusions based on code size comparisions is quite ridiculous at this point. The whole point of this patch was to demonstrate the algorithmic changes, not to demonstrate a final and perfectly tuned scheduler. > > > so unmodified CFS is 4.6% faster on this box than with Roman's patch > > > and it's also more consistent/stable (10 times lower fluctuations). > > > > Was SCHED_DEBUG enabled or disabled for these runs? > > debugging disabled of course. (your patch has a self-validity checking > function [verify_queue()] that is called on SCHED_DEBUG=y, it would have > been unfair to test your patch with that included.) I'll look into it next week. bye, Roman
From: Ingo Molnar [email blocked] Subject: Re: [ANNOUNCE/RFC] Really Fair Scheduler Date: Sun, 2 Sep 2007 21:21:52 +0200 * Roman Zippel [email blocked] wrote: > > > > so unmodified CFS is 4.6% faster on this box than with Roman's > > > > patch and it's also more consistent/stable (10 times lower > > > > fluctuations). > > > > > > Was SCHED_DEBUG enabled or disabled for these runs? > > > > debugging disabled of course. (your patch has a self-validity > > checking function [verify_queue()] that is called on SCHED_DEBUG=y, > > it would have been unfair to test your patch with that included.) > > I'll look into it next week. thanks. FYI, there's plenty of time - i'll be at the KS next week so i'll be quite unresponsive to emails. Would be nice if you could take a quick look at the trivial patch i posted today though. How close is it to your algorithm, have i missed any important details? (not counting nice levels and rounding, it's just a quick & dirty prototype to show the first layer of the core math and nothing more.) Ingo



Related Links:

Its amazing to see Ingo's

September 4, 2007 - 4:02pm
Anonymous (not verified)

Its amazing to see Ingo's responses, quite rude and arrogant. Unwilling to discuss anything that questions his scheduler ...

Turnover on the kernel team

September 4, 2007 - 4:47pm
Anonymous (not verified)

I is pretty obvious that *some* turnover in the core kernel group would be healthy. It is also obvious the CFS should not have been merged as quickly as it was without wider review. Molnar's handling of the scheduler this year has been dismal.

INGO is the rude one? Heh

September 4, 2007 - 5:26pm
Anonymous (not verified)

INGO is the rude one? Heh

"I don't think it means what you think it means"

September 4, 2007 - 8:45pm

Ingo's responses struck me as rather straightforward, calm and to the point. They didn't seem rude at all.

Unable to discuss changes to his scheduler?

So if you'd like to work with us on this and get items that make sense
merged (which we'd very much like to see happen), could you please
re-shape the rest of your changes and ideas (such as whether to use
ready-queueing or a runqueue concept, which does look interesting) ontop
of Peter's queue, and please do it as a finegrained, reviewable,
mergable series of patches, like Peter did. Thanks!


Indeed.

--
Program Intellivision and play Space Patrol!

Ingo's responses struck me

September 4, 2007 - 11:25pm
Anonymous (not verified)

Ingo's responses struck me as rather straightforward, calm and to the point. They didn't seem rude at all.

Unable to discuss changes to his scheduler?

the fact is that Ingo compiled with debug sched on. Hey how many of you use sched_debug on a workstation, I presume very few unless you work embedding linux onto slim devices and you maintain git tree.
Besides of patching against git -rc5 instead of -rc3.

bye!!

Debug was disabled

September 5, 2007 - 11:52am
> > so unmodified CFS is 4.6% faster on this box than with Roman's patch 
> > and it's also more consistent/stable (10 times lower fluctuations).
> 
> Was SCHED_DEBUG enabled or disabled for these runs?

debugging disabled of course. (your patch has a self-validity checking 
function [verify_queue()] that is called on SCHED_DEBUG=y, it would have 
been unfair to test your patch with that included.)

	Ingo

What part of "debugging disabled of course" indicates debugging was enabled?
--
Program Intellivision and play Space Patrol!

You're biased

September 5, 2007 - 3:53am
renoX-nc (not verified)

I'd say both were a little "rude" (that's not really the correct word, but it's a good first approximation): Ingo was a too dismissive at first and Roman was too aggressive.

It's nice to see that the discussions have been more civil after the thorny beginning.

Also I'd say that's there is a conflict between theoretical and 'bottom-up' approach:
A-how about we're doing this, the math are nicer?
B-I don't care about the math, show-me real world improvement.

Historically, Linux developers have been much more on the B-side.

Where is the patch available?

September 23, 2007 - 10:12am
Rohit PC (not verified)

Can someone plz tell me where I can find RFS patch for installation
or RFS code.... to have a look at it

The patch is in the original

February 3, 2008 - 4:11pm
Anonymous (not verified)

The patch is in the original post: [ANNOUNCE/RFC] Really Fair Scheduler

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
speck-geostationary