Re: [patch] CFS (Completely Fair Scheduler), v2

From: Ingo Molnar
Date: Monday, April 16, 2007 - 3:07 pm

this is the second release of the CFS (Completely Fair Scheduler) 
patchset, against v2.6.21-rc7:

   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch

i'd like to thank everyone for the tremendous amount of feedback and 
testing the v1 patch got - i could hardly keep up with just reading the 
mails! Some of the stuff people addressed i couldnt implement yet, i 
mostly concentrated on bugs, regressions and debuggability.

there's a fair amount of churn:

   15 files changed, 456 insertions(+), 241 deletions(-)

But it's an encouraging sign that there was no crash bug found in v1, 
all the bugs were related to scheduling-behavior details. The code was 
tested on 3 architectures so far: i686, x86_64 and ia64. Most of the 
code size increase in -v2 is due to debugging helpers, they'll be 
removed later. (The new /proc/sched_debug file can be used to see the 
fine details of CFS scheduling.)

Changes since -v1:

 - make nice levels less starvable. (reported by Willy Tarreau)

 - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first 
   flag can be used to turn it on/off. (This might fix the Kaffeine bug
   reported by S.Çağlar Onur)

 - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)

 - UP build fix. (reported by Gabriel C)

 - timer tick micro-optimization (Dmitry Adamushko)

 - preemption fix: sched_class->check_preempt_curr method to decide 
   whether to preempt after a wakeup (or at a timer tick). (Found via a
   fairness-test-utility written for CFS by Mike Galbraith)

 - start forked children with neutral statistics instead of trying to 
   inherit them from the parent: Willy Tarreau reported that this 
   results in better behavior on extreme workloads, and it also 
   simplifies the code quite nicely. Removed sched_exit() and the 
   ->task_exit() methods.

 - make nice levels independent of the sched_granularity value

 - new /proc/sched_debug file listing runqueue details and the rbtree

 - new SCH-* fields in ...
From: S.Çağlar Onur
Date: Monday, April 16, 2007 - 3:12 pm

On Tuesday, 17 April 2007, Ingo Molnar wrote:

Sorry for the delayed response, but i just found some free time. Do you still
want me to test mainline + the "parent-runs first" patch, or should i drop that
one and test v2, which can change the default behaviour?

-- 
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:59 am

i suspect for now it would be sufficient if you could check the v2 
patch.

if it _works_, please try this:

    echo 0 > /proc/sys/kernel/sched_child_runs_first

this should break Kaffeine again :)

(if it doesnt work then the Kaffeine problem is unrelated to 
child-runs-first.)

	Ingo
-

From: S.Çağlar Onur
Date: Tuesday, April 17, 2007 - 7:45 am

On Tuesday, 17 April 2007, Ingo Molnar wrote:

OK, i tested both plain -rc7 and -rc7 + CFSv2 with
sched_child_runs_first enabled and disabled.

I always use the same video file and try to reproduce the freeze by constantly
pressing the forward/backward buttons. With CFS, 2-3 forward/backward attempts
reproduce this behaviour.

And here are the results.

Mainline still has no issues with either xine-lib/kaffeine or xine-ui
(kaffeine-0.8.4, xine-lib-1.1.5 [both xcb enabled], xine-ui-0.99.4). I tried
really hard to reproduce the freeze, but i can't...

And CFSv2 still fails for both the child_runs_first and parent_runs_first cases,
with the same strace output (FUTEX_WAIT).

If you want me to test something else, just ask please :)

Cheers
-- 
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
From: Gabriel C
Date: Tuesday, April 17, 2007 - 8:48 am

I have the same problem here ( same packages ).

Even VLC if I go forward/backward and then play again its start to 

-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 9:01 am

yes, it would be nice to do a:

	strace -o kaffine.log -f -tttTTT kaffeine

log. Because in your old log this is visible:

 clone(child_stack=0xb02394a4, 
 flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, 
 parent_tidptr=0xb0239bd8, {entry_number:6, base_addr:0xb0239b90, 
 limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, 
 limit_in_pages:1, seg_not_present:0, useable:1}, 
 child_tidptr=0xb0239bd8) = 11340
 futex(0x89ac218, FUTEX_WAKE, 1)         = 1

we cloned a task and immediately afterwards we used futex 0x89ac218. 
After that point many things happen, but the lockup itself:

 futex(0x89ac218, FUTEX_WAIT, 2, NULL)   = 0
 futex(0x89ac218, FUTEX_WAIT, 2, NULL)   = 0
 futex(0x89ac218, FUTEX_WAIT, 2, NULL)   = 0

is the same futex. Probably related to the same child thread? It would 
be nice to also get a gdb backtrace:

	gdb kaffeine
	<reproduce the hang>
	Ctrl-C
	bt

this should give you a gdb backtrace of that kaffeine hang. Thanks,

	Ingo
-

From: Peter Williams
Date: Monday, April 16, 2007 - 9:06 pm

Can I make a suggestion?

Would it be possible (from now on) to publish changes relative to the 
previous patch (eventually leading to a series of patches that describes 
the evolution of the new scheduler) so that it's easier for us 
reviewers/critics to see the latest changes?  E.g. if I import such 
changes into something like quilt (using my gquilt GUI wrapper, of 
course :-)) I can then use meld (or similar) to follow what's going on as 
suggestions get folded in and bugs get fixed etc.

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-

From: Ingo Molnar
Date: Monday, April 16, 2007 - 11:49 pm

the v1 patch is still downloadable so you can do a delta by first 
applying the v1 patch to a quilt queue, doing a 'quilt snapshot', then 
'quilt pop', add the v2 patch to the series file, do a 'quilt push', 
then doing a "quilt diff --snapshot". (I just posted the delta patch in 
this thread so you can pick it from there too.)

	Ingo
-

From: Gene Heskett
Date: Monday, April 16, 2007 - 9:53 pm

This one (v2-rc2) is not a keeper I'm sorry to say, Ingo.  v2-rc0 was much 
better.  Watching amanda run with htop, kmail's composer is being subjected to 
5 to 10 second pauses, and htop says that gzip -best isn't getting more than 
15% of the cpu, and the /amandatapes drive is being written to in a regular 
pattern that seems to be the cause of the pauses according to gkrellm, which 
also seems to track the size of the writes, and can show anything from 4.3k 
to 54 megs as being written in one cycle of its screen update.

Normally hdd will fire up and take it at about 40+M/second steady till it's 
done when there is a file ready to write even if it's a 7GB file.  And I can 
type right on during the disk i/o.  But not now.

In short, I seem to be heavily I/O bound.  But when the write to /dev/hdd3 is 
done, then gzip -best pops right up to 90% plus cpu and I get my machine 
back.

In between file writes I checked the drive's speed with hdparm:

[root@coyote ~]# hdparm -Tt /dev/hdd

/dev/hdd:
 Timing cached reads:   856 MB in  2.01 seconds = 426.15 MB/sec
 Timing buffered disk reads:  222 MB in  3.01 seconds =  73.68 MB/sec

That's not too shabby, and obviously dma is active at least for the reading.

gzip -best was running while this was executing. So I think the drive is fine 
and the scheduling is what's funkity.  Sorry.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
  After they got rid of capital punishment, they had to hang twice
  as many people as before.
-

From: Willy Tarreau
Date: Monday, April 16, 2007 - 10:25 pm

Hi Gene,


Have you tried the previous version with the fair-fork patch? It might be
possible that your workload is sensitive to the fork()'s child getting much
CPU upon startup.

Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
new tasks are "forked", they are queued at the end of the run queue with a
fixed priority. In our case, this would translate into assigning them the
same prio and timeslice as their parent, but queuing them at the end so that
they don't make existing tasks starve during huge fork() loads.

I don't know how that would be possible (nor if that would help in anything),
but I found it was a good compromise over sharing the timeslice with the
parent. Perhaps we should have some absolute timeslice and some relative
timeslice (eg: X percent of total time divided by the number of tasks) ?

Regards,
Willy

-

From: Gene Heskett
Date: Monday, April 16, 2007 - 10:51 pm

Somewhat interesting to this, I have amanda doing a verify phase too.  During 
the verify phase (and while I was waiting for gmail to transmit this message, 
it took 30 minutes before it showed up on the list) I noted that when 
amrestore fired up, it, and its child tar were only taking about 20% of the 
cpu between them, and that /dev/hdd was showing a pretty steady 55 to 
75MB/sec being read.  As to what this tells us, I'm not going to hazard a 
guess because it wouldn't, this time of the night here in WV, USA, even be a 
SWAG.  It's coming up on 2am and the toothpicks holding my eyes open are 

Willy, I think that patch went by, and was followed by the v2-rc2 so fast that 
I never got a chance to try it with the v2-rc0 framework.  So I believe the 
answer there is probably no.  I never saw a problem with the v2-rc0, but Ingo 
shot me a message about it without enough detail that I could have tested for 
it.

FWIW, I've been using the CFQ I/O scheduler for quite a while, is it time I 
gave the AS or Deadline versions another check?  They are all built in but I 

Thanks Willy.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"I take Him shopping with me. I say, 'OK, Jesus, help me find a bargain'" 
--Tammy Faye Bakker
-

From: Paolo Ornati
Date: Tuesday, April 17, 2007 - 12:18 am

On Tue, 17 Apr 2007 01:51:08 -0400

easy :)

# cat /sys/block/DEVICE/queue/scheduler
as noop [cfq] ...

# echo IO_SCHED > /sys/block/DEVICE/queue/scheduler

-- 
	Paolo Ornati
	Linux 2.6.21-rc7 on x86_64
-

From: Mike Galbraith
Date: Monday, April 16, 2007 - 10:51 pm

Dunno about that, but here's a possibly related datapoint.  I reported
to Ingo yesterday that I was sometimes losing control of my GUI (KDE)
under heavy IO.  I just reproduced it in mainline rc7.  If I start a
bonnie, and click around popping windows to the foreground, then poke
KDE's menu button, I may lose all GUI capability for a _very_ long time.
Here, with bonnie, that means until it gets past writing with putc, and
moves on to rewrite.  Ages.

	-Mike

-

From: Ingo Molnar
Date: Monday, April 16, 2007 - 11:27 pm

the fair-fork patch is now included in -v2, but that was already in 
-v2-rc0 too that i sent to Gene separately. I've attached the 
-rc0->final delta.

Gene, could you please apply this patch to your -v2-rc0 tree and do a 
quick double-check that indeed these changes cause the regression?

	Ingo
From: Peter Williams
Date: Tuesday, April 17, 2007 - 5:06 pm

One way of handling forked tasks is to give them a high priority but a 
small chunk (i.e. give them a relatively short time to do some work and 
surrender the CPU voluntarily before you boot them off).  If you choose 
the size of this reduced chunk well the vast majority of tasks will 
never be booted off and will do a small bit of work and either exit or 
sleep and will suffer no penalty as a result of this mechanism.  But it 
gives you a chance to move any newly forked process that turns out to be 
a CPU hog to a lower priority before it gets its next chunk of CPU at 
which time it can revert to getting normal size chunks as pre-emption 
will stop it hogging the CPU from then on.

I've trialled this mechanism in some of my schedulers and it works well.

I found that 10 milliseconds was a good value for the initial chunk of 
CPU for a newly forked process.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-
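A rough user-space sketch of the fork handling Peter describes above: a new task starts with a priority boost and a short first chunk, and loses the boost only if it is booted off for using that whole chunk. Only the 10 ms figure comes from the thread. The struct, the function names, the boost size and the normal chunk length are all hypothetical values chosen for illustration.

```c
#include <assert.h>

/* Only INITIAL_CHUNK_MS comes from the thread; the rest are assumptions. */
enum {
    INITIAL_CHUNK_MS = 10,   /* value Peter reports as working well */
    NORMAL_CHUNK_MS  = 100,  /* hypothetical normal chunk size */
    FORK_PRIO_BOOST  = 5     /* hypothetical boost; lower value = higher priority */
};

/* Hypothetical per-task state, not a kernel structure. */
struct task_sketch {
    int base_prio;       /* priority the task would normally run at */
    int prio;            /* priority currently in effect */
    int chunk_ms;        /* CPU time granted before forced preemption */
    int on_first_chunk;  /* 1 until the first chunk has been used up */
};

/* Set up a newly forked task: boosted priority, short first chunk. */
void fork_task(struct task_sketch *t, int base_prio)
{
    t->base_prio = base_prio;
    t->prio = base_prio - FORK_PRIO_BOOST;
    t->chunk_ms = INITIAL_CHUNK_MS;
    t->on_first_chunk = 1;
}

/*
 * Called only when the task is booted off after using its whole first
 * chunk. Tasks that sleep or exit earlier never get here, so they pay
 * no penalty.
 */
void first_chunk_expired(struct task_sketch *t)
{
    if (!t->on_first_chunk)
        return;
    t->on_first_chunk = 0;
    t->prio = t->base_prio;        /* drop the boost: it looks like a CPU hog */
    t->chunk_ms = NORMAL_CHUNK_MS; /* normal preemption now limits it */
}
```

Whether the demotion should land on the base priority or somewhere below it is not stated in the message; the sketch simply removes the boost.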

From: Ingo Molnar
Date: Monday, April 16, 2007 - 11:18 pm

ok - fortunately the delta between -v2-rc0 and -v2-final is pretty 
small. One difference is the child-runs-first fix. To restore the 
parent-runs-first logic, do this: 

	echo 0 > /proc/sys/kernel/sched_child_runs_first

does this make any difference?

If not then pretty much the only other change was the nice level tweak i 
did. Could you try to grab a few snapshots of scheduling state via 
something like:

   while sleep 1; do cat /proc/sched_debug >> to-ingo.txt; done

(and tell me the PID of the kmail composer, to make sure i'm checking 
the right task's behavior.)

also, as a separate experiment, could you perhaps run this script as 
root:

   cd /proc; for N in [1-9]*; do renice -n 0 $N; done

this will move all tasks in the system to nice level 0 and should make 
any nice level handling logic in the scheduler irrelevant. Do you have X 
reniced perhaps?

Lots of system threads have negative or positive nice levels, so once 
you have executed this script, only a reboot will be a practical way to 
restore it to the previous settings.

	Ingo
-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 12:01 am

ok, i've got something better to test: i separated the delta out into a 
more fine-grained stack of 3 patches. You can pick them up from:

 http://redhat.com/~mingo/cfs-scheduler/older/sched-cfs-v2-rc0.patch
 http://redhat.com/~mingo/cfs-scheduler/older/sched-cfs-v2-rc0-preempt-fix.patch
 http://redhat.com/~mingo/cfs-scheduler/older/sched-cfs-v2-rc0-child-runs-first.patch
 http://redhat.com/~mingo/cfs-scheduler/older/sched-cfs-v2-rc0-misc.patch

i test-built and test-booted all 4 steps of this. The baseline -v2-rc0 
patch should be the one that works - you might want to double-check it, 
just to be sure. One of the other 3 patches on top of this baseline 
causes the regression on your desktop. My current bet is on preempt-fix, 
so i have put that one first. The other one would be the second patch, 
child-runs-first. The misc patch should have no effect on behavior - but 
i've included it for completeness. (and i was wrong about the 'nice 
fix', it is not in this delta)

	Ingo
-

From: Davide Libenzi
Date: Tuesday, April 17, 2007 - 12:31 am

Wouldn't it be easier for everyone if you kept them as a quilt series (a la 
syslets)?


- Davide


-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 12:39 am

i _do_ have a quilt tree, but i never had the clean splitup above. Why? 
Because i worked on all of these aspects (and a whole lot of other 
aspects as well) in parallel during the past 2 days, back and forth, 
often mixing changes, etc. and there was never any clean splitup.

Now it turned out that the clean splitup of -rc0->final delta would ease 
Gene's testing so i created it. Note that this is just 30% of the total 
v1->v2 delta and i just saved the work of having to do a clean splitup 
of the other 70%. (and note that this splitup will be undone because it 
makes no sense for any potential upstream merge at all, it's only to 
ease testing for Gene)

	Ingo
-

From: Gene Heskett
Date: Tuesday, April 17, 2007 - 10:18 am

Now he tells me.  :-)  But I have some CHO stuff to do, so it will be about 36 



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Support the Girl Scouts!
	(Today's Brownie is tomorrow's Cookie!)
-

From: Gene Heskett
Date: Tuesday, April 17, 2007 - 10:22 am

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
This life is yours.  Some of it was given to you; the rest, you made yourself.
-

From: Gene Heskett
Date: Tuesday, April 17, 2007 - 10:15 am

Ahh, so many cats, and so few recipes here Ingo.  In this case cats=patches & 
recipes=time to test adequately.  I do have another box, but it would 
probably take a week & about a big buck to get that old rh7.3 brought up to 
date & suitable, and its only a 500MHZ K-III, which might make the diffs more 
obvious.  It would need a video card to replace its dinosaur Diamond and a 
fresh dvd drive.  And its motherboard has very buggy usb chips.  TYAN S-1590.  



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Q:	What's the difference between a dead dog in the road and a dead
	lawyer in the road?
A:	There are skid marks in front of the dog.
-

From: Davide Libenzi
Date: Tuesday, April 17, 2007 - 1:03 am

Sorry, I did not follow the latest developments, but how many tunables do we 
have so far in CFS? Are those for debug only or are they supposed to stay?
Weren't those listed inside the Axis of Evil (just to remain on topic :) 
till yesterday?


- Davide


-

From: Nick Piggin
Date: Tuesday, April 17, 2007 - 1:18 am

Actually I think this is something that makes sense to add, even if
just for debugging, but maybe also for production, depending on how
much it impacts things. Child runs first is an heuristic optimisation
that exploits a VM detail (however fundamental). But for things that
don't exec right after forking (and maybe some things that do), it
can be nicer to reduce context switches, improve cache patterns, and
allow children to be load balanced away before touching memory, if
child_runs_first is turned off.

-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:26 am

yeah, the primary intent was debug. Nick, am i confused to conclude that 
mainline in fact runs the _parent_ first, despite all the elaborate 
runqueue juggling we do there? This piece of code in wake_up_new_task() 
caught my eye:

                                p->prio = current->prio;
                                p->normal_prio = current->normal_prio;
                                list_add_tail(&p->run_list, &current->run_list);
                                p->array = current->array;
                                p->array->nr_active++;
                                inc_nr_running(p, rq);

shouldnt the list_add_tail() be list_add(), so that task pickup sees the 
child first? Maybe we still do child-runs-first in practice, due to the 
timeslice and sleep average fixups that happen if the parent preempts, 
but the above piece of code seems a quite elaborate way of doing 
activate_task(). To have the child _before_ the parent we'd need the 
add-on patch below. But ... i could be wrong, this is just a quick 
thought.

	Ingo

---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1685,7 +1685,7 @@ void fastcall wake_up_new_task(struct ta
 			else {
 				p->prio = current->prio;
 				p->normal_prio = current->normal_prio;
-				list_add_tail(&p->run_list, &current->run_list);
+				list_add(&p->run_list, &current->run_list);
 				p->array = current->array;
 				p->array->nr_active++;
 				inc_nr_running(p, rq);
-

From: Nick Piggin
Date: Tuesday, April 17, 2007 - 1:41 am

I think that it works because the list we're adding to is not the
normal runqueue list head, but the parent's list_head on that runqueue.
-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:57 am

yeah, you are right, i was confused: list_add() adds _after_ the head, 
list_add_tail() adds _before_ the head - and in the middle of the list 
if we do a list_add_tail() it adds before that entry. So everything's 
fine and working as expected :)

	Ingo
-
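A small user-space sketch of the list semantics Ingo describes above. The helpers follow the kernel's list_head idiom but are reimplemented here so the example stands alone. struct task, queue_position() and child_runs_before_parent() are hypothetical names, and the three-entry queue is an assumed stand-in for one priority list of the runqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Link 'entry' between two entries that are currently adjacent. */
static void list_insert_between(struct list_head *entry,
                                struct list_head *prev,
                                struct list_head *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Insert 'entry' right after 'head'. */
void list_add(struct list_head *entry, struct list_head *head)
{
    list_insert_between(entry, head, head->next);
}

/* Insert 'entry' right before 'head'. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    list_insert_between(entry, head->prev, head);
}

/* Hypothetical task carrying only what the demo needs. */
struct task {
    int pid;
    struct list_head run_list;
};

/* 0-based position of 'target' walking forward from 'queue', -1 if absent. */
static int queue_position(struct list_head *queue, struct list_head *target)
{
    int pos = 0;

    for (struct list_head *it = queue->next; it != queue; it = it->next, pos++)
        if (it == target)
            return pos;
    return -1;
}

/*
 * Build the queue [a, parent, b], insert the child relative to the
 * parent's own list entry (not the queue head), and report whether a
 * forward walk from the queue head meets the child before the parent.
 */
int child_runs_before_parent(int use_list_add_tail)
{
    struct list_head queue;
    struct task a = { .pid = 1 }, parent = { .pid = 2 };
    struct task b = { .pid = 3 }, child = { .pid = 4 };

    init_list_head(&queue);
    list_add_tail(&a.run_list, &queue);
    list_add_tail(&parent.run_list, &queue);
    list_add_tail(&b.run_list, &queue);

    if (use_list_add_tail)
        list_add_tail(&child.run_list, &parent.run_list);
    else
        list_add(&child.run_list, &parent.run_list);

    return queue_position(&queue, &child.run_list) <
           queue_position(&queue, &parent.run_list);
}
```

With list_add_tail() the queue becomes a, child, parent, b, so the child is picked first; with list_add() it becomes a, parent, child, b. That matches the conclusion that the existing list_add_tail() call already gives child-runs-first ordering.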

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:20 am

yeah, debug only. I strongly suspect the Kaffeine breakage for example 
was related to child-runs-first, so userspace developers might be 
interested in a switch to turn this on/off.

while reviewing the upstream scheduler it occurred to me that we are 
probably _not_ doing child-runs-first there due to the list_add_tail() 
[it should be a list_add() for it to be child-first. But i havent 
instrumented this heavily and this portion of the mainline scheduler is 
pretty fragile.]. So via this flag we could also see the performance 

heh ;)

	Ingo
-

From: Gene Heskett
Date: Tuesday, April 17, 2007 - 9:12 am

And I let the rc0 version run longer as I was looking for the composer's pid, 
but htop (or I) can't see it.  Even a ps -e isn't seeing it!  But its 
running, I'm actively typing in it.  So you get 3 files, the third one called 



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I have many CHARTS and DIAGRAMS..
-

From: Peter Williams
Date: Monday, April 16, 2007 - 11:46 pm

Have you considered using rq->raw_weighted_load instead of 
rq->nr_running in calculating fair_clock?  This would take the nice 
value (or RT priority) of the other tasks into account when determining 
what's fair.

Peter
PS You'd have to change the migration thread's load_weight from 0 to 1 
in order to prevent divide by zero without having to explicitly check 
for it every time.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-

From: William Lee Irwin III
Date: Tuesday, April 17, 2007 - 12:51 am

I suspect you mean (curr->load_weight*delta_exec)/rq->raw_weighted_load
in update_curr().


-- wli
-
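A minimal sketch of the expression wli quotes, written as a stand-alone function so the arithmetic can be checked. The field names load_weight and raw_weighted_load come from the thread; the two structs, the function name and the integer widths are assumptions, and nothing here shows how update_curr() actually uses the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins holding only the fields named in the thread. */
struct rq_sketch {
    uint64_t raw_weighted_load;  /* sum of load_weight over runnable tasks */
};

struct task_sketch {
    uint64_t load_weight;        /* weight derived from nice / RT priority */
};

/*
 * (curr->load_weight * delta_exec) / rq->raw_weighted_load
 *
 * The caller must ensure raw_weighted_load is non-zero; the thread
 * discusses that divide-by-zero case separately.
 */
uint64_t weighted_delta(const struct task_sketch *curr,
                        const struct rq_sketch *rq,
                        uint64_t delta_exec)
{
    return (curr->load_weight * delta_exec) / rq->raw_weighted_load;
}
```

As an example, with two runnable tasks of weight 3 and 1 (total 4) and delta_exec = 1000, the heavier task is credited 750 and the lighter one 250, whereas a division by nr_running would treat both the same.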

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:16 am

good idea, i'll try that.

	Ingo
-

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 1:52 am

i'll try another thing too: we could perhaps get rid of rq->nr_running 
and only use raw_weighted_load, because now the only main remaining 
property of ->nr_running is "is it zero or not".

[ ->nr_running's only other significant use is 'group_capacity', but in
  reality it is only interested in whether all CPUs in the group are
  busy and what the combined cpu power of that group is, and this could
  be restructured to use rq->curr and cpu_power - and become independent
  of nr_running. ]

[ then there are other details like load-average, but we could change
  that to be weighted-cpu-load driven - that makes sense anyway: a
  reniced task should have less effect on the 'system load' than a
  non-reniced task. ]

that would be one less variable to maintain in the scheduler hotpath, 
and it would make smpnice an effective _replacement_ for nr_running, 
instead of an add-on thing that costs a bit of performance.

	Ingo
-

From: Peter Williams
Date: Tuesday, April 17, 2007 - 7:05 am

In the longer term, I'd suggest modifying this idea to use the maximum 
of rq->raw_weighted_load and a running average of rq->raw_weighted_load 
much the same as was done within the load balancer code.  This will tend 
to make scheduling "smoother".  To try the idea out you could (on an SMP 
system) use one of the rq->cpu_load[] metrics as the running average.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-
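A sketch of Peter's suggestion to use the larger of the instantaneous weighted load and a running average of it. The averaging formula below is a simple power-of-two decaying average chosen for illustration; the thread only says to reuse one of the rq->cpu_load[] metrics, so the exact formula, the function names and the shift parameter are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decaying average: new_avg = (old_avg * (2^shift - 1) + sample) / 2^shift.
 * A larger shift makes the average react more slowly.
 */
uint64_t update_running_avg(uint64_t old_avg, uint64_t sample, unsigned shift)
{
    uint64_t scale = (uint64_t)1 << shift;

    return (old_avg * (scale - 1) + sample) / scale;
}

/*
 * Load value to divide by: the maximum of the current weighted load and
 * its running average, so a momentary dip in load does not immediately
 * change the result.
 */
uint64_t smoothed_load(uint64_t raw_weighted_load, uint64_t running_avg)
{
    return raw_weighted_load > running_avg ? raw_weighted_load : running_avg;
}
```

Taking the maximum means a sudden rise in load takes effect at once, while a sudden drop is smoothed out by the average.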

From: Peter Williams
Date: Tuesday, April 17, 2007 - 1:30 am

Or something like that, yes. :-)

I was trying to make the point that the weighted load stuff provides 
useful data for implementing nice (in a number of ways e.g. see spa_ebs).

Also, now that the old time slices are gone, a simpler more efficient 
function for mapping RT priority or nice (as appropriate) to 
p->load_weight can be used instead of the current one which uses the 
time slice the task would have been allocated as a basis.  I'd suggest 
the function that the current one replaced.  (Because it was mine :-)).

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-

From: Peter Williams
Date: Wednesday, April 18, 2007 - 12:15 pm

Actually, this formula can't be used for the migration thread itself as 
its load_weight isn't an accurate reflection of its static priority. 
But as the migration thread is a real time task this probably isn't an 
issue, right?

If this assumption is correct (i.e. curr is never a real time task) then 
my earlier caveat re division by zero being possible is invalid because 
the migration task will never be the only task on the runqueue when this 
code is called.

I'm also assuming here that (because of its name) curr is already on the 
runqueue when this code is called.  If it isn't the divisor in the above 
expression should be (rq->raw_weighted_load + curr->load_weight).  This 
would also preclude the possibility of divide by zero.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
-
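A sketch of the divisor choice Peter raises above. If curr's weight is not yet included in the runqueue's weighted load, it is added to the divisor, which also rules out a zero divisor whenever curr's weight is non-zero. The function name, the flat parameter list and the curr_on_rq flag are hypothetical; the thread only gives the two divisor expressions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of delta_exec credited to the current task.
 *
 * curr_on_rq tells whether curr_weight is already counted in
 * rq_weighted_load; the caller is assumed to know this.
 */
uint64_t scaled_delta(uint64_t curr_weight, uint64_t rq_weighted_load,
                      uint64_t delta_exec, int curr_on_rq)
{
    uint64_t divisor = rq_weighted_load;

    if (!curr_on_rq)
        divisor += curr_weight;  /* rq->raw_weighted_load + curr->load_weight */

    /* Once curr is counted and curr_weight > 0, this cannot be zero. */
    assert(divisor != 0);
    return (curr_weight * delta_exec) / divisor;
}
```

For a task of weight 1024 running alone and not yet counted on the runqueue, the divisor is 0 + 1024, so the whole delta_exec is credited and no division by zero occurs.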

From: Ingo Molnar
Date: Tuesday, April 17, 2007 - 2:53 am

yeah - nice idea, i'll try this.

	Ingo
-
