Re: [RFC -v2 PATCH 2/3] sched: add yield_to function

Previous thread: [RFC -v2 PATCH 1/3] kvm: keep track of which task is running a KVM vcpu by Rik van Riel on Monday, December 13, 2010 - 8:45 pm. (1 message)

Next thread: [RFC -v2 PATCH 0/3] directed yield for Pause Loop Exiting by Rik van Riel on Monday, December 13, 2010 - 8:44 pm. (2 messages)
From: Rik van Riel
Date: Monday, December 13, 2010 - 8:46 pm

Add a yield_to function to the scheduler code, allowing us to
give the remainder of our timeslice to another thread.

We may want to use this to provide a sys_yield_to system call
one day.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
- move to a per sched class yield_to
- fix the locking

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c79e92..408326f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1086,6 +1086,8 @@ struct sched_class {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	void (*task_move_group) (struct task_struct *p, int on_rq);
 #endif
+
+	void (*yield_to) (struct rq *rq, struct task_struct *p);
 };
 
 struct load_weight {
@@ -1947,6 +1949,7 @@ extern void set_user_nice(struct task_struct *p, long nice);
 extern int task_prio(const struct task_struct *p);
 extern int task_nice(const struct task_struct *p);
 extern int can_nice(const struct task_struct *p, const int nice);
+extern void requeue_task(struct rq *rq, struct task_struct *p);
 extern int task_curr(const struct task_struct *p);
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
@@ -2020,6 +2023,10 @@ extern int wake_up_state(struct task_struct *tsk, unsigned int state);
 extern int wake_up_process(struct task_struct *tsk);
 extern void wake_up_new_task(struct task_struct *tsk,
 				unsigned long clone_flags);
+
+extern u64 slice_remain(struct task_struct *);
+extern void yield_to(struct task_struct *);
+
 #ifdef CONFIG_SMP
  extern void kick_process(struct task_struct *tsk);
 #else
diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..6399641 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5166,6 +5166,46 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
 	return ret;
 }
 
+/*
+ * Yield the CPU, giving the remainder of our time slice to task p.
+ * Typically used to hand CPU time to another thread inside the ...
From: Mike Galbraith
Date: Monday, December 13, 2010 - 11:08 pm

That part looks ok, except for the yield cross cpu bit.  Trying to yield

<ramble>
slice_remain() measures the distance to your last preemption, which has
no relationship with entitlement.  sched_slice() is not used to issue
entitlement, it's only a ruler.

You have entitlement on your current runqueue only, that entitlement
being the instantaneous distance to min_vruntime in a closed and fluid
system.  You can't inject some instantaneous relationship from one
closed system into an another without making the math go kind of fuzzy,
so you need tight constraints on how fuzzy it can get.

We do that with migrations, inject fuzz.  There is no global fair-stick,
but we invent one by injecting little bits of fuzz.  It's constrained by
chaos and the magnitude constraints of the common engine.  The more you
migrate, the more tightly you couple systems.  As long as we stay fairly
well balanced, we can migrate without the fuzz getting out of hand, and
end up with a globally ~fair system.

What you're injecting isn't instantaneously irrelevant lag-fuzz, which
distributed over time becomes relevant. you're inventing entitlement out
of the void.  Likely not a big hairy deal unless you do it frequently,
but you're doing something completely bogus and seemingly unconstrained.

This has an excellent chance of moving the recipient rightward.. and the
yielding task didn't yield anything.  This may achieve the desired
result or may just create a nasty latency spike... but it makes no
arithmetic sense.

	-Mike

--

From: Srivatsa Vaddagiri
Date: Tuesday, December 14, 2010 - 3:24 am

So another (crazy) idea is to move the "yieldee" task on another cpu over to 
yielding task's cpu, let it run till the end of yielding tasks slice and then
let it go back to the original cpu at the same vruntime position!

- vatsa
--

From: Mike Galbraith
Date: Tuesday, December 14, 2010 - 4:03 am

Yeah, pulling the intended recipient makes fine sense.  If he doesn't
preempt you, you can try to swap vruntimes or whatever makes arithmetic
sense and will help.  Dunno how you tell him how long he can keep the
cpu though, and him somehow going back home needs to be a plain old
migration, no fancy restoration of ancient history vruntime.

	-Mike

--

From: Srivatsa Vaddagiri
Date: Tuesday, December 14, 2010 - 4:26 am

can't we adjust the new task's [prev_]sum_exec_runtime a bit so that it is 

What is the issue if it gets queued at the old vruntime (assuming fair stick is
still behind that)? Without that it will hurt fairness for the yieldee (and
perhaps of the overall VM in this case).

- vatsa
--

From: Mike Galbraith
Date: Tuesday, December 14, 2010 - 5:47 am

And dork up accounting.  Why?  Besides, it won't work because you have
no idea who may preempt whom, when, and for how long.

(Why do people keep talking about timeslice?  The only thing that exists

Who all are you placing this task in front of or behind based upon a
non-existent relationship?

Your recipient may well have been preempted, and is now further behind
than the completely irrelevant to the current situation stored vruntime
would indicate, so why would you want to move it rightward?  Certainly
not in the interest of fairness.

	-Mike

--

From: Rik van Riel
Date: Thursday, December 16, 2010 - 12:49 pm

The current task just donated the rest of its timeslice.

Surely that makes it a reasonable idea to call yield, and
get one of the other tasks on the current CPU running for


Good point, the current task calls yield() in the function
that calls yield_to_fair, but I seem to have lost the code
that penalizes the current task's runtime...

I'll reinstate that.

-- 
All rights reversed
--

From: Mike Galbraith
Date: Thursday, December 16, 2010 - 11:56 pm

There's nothing wrong with trying to give up the cpu.  It's the concept

If you want to yield_to(task), task needs to move to the resource you're
sitting on.  That's the only thing that makes any sense to me.  Pull the
task, maybe do some vruntime twiddling (likely very bad idea) to improve
success chances, and schedule.

Dunno how effective that would be at solving real problems though.  You
might get him to run, might not, you'll be fighting load balancing, and

See comment in parentheses above :)

	-Mike

--

From: Mike Galbraith
Date: Friday, December 17, 2010 - 12:15 am

BTW, with this vruntime donation thingy, what prevents a task from
forking off accomplices who do nothing but wait for a wakeup and
yield_to(exploit)?

Even swapping vruntimes in the same cfs_rq is dangerous as hell, because
one party is going backward.

	-Mike

--

From: Avi Kivity
Date: Saturday, December 18, 2010 - 10:08 am

What's the difference between that and forking off accomplices who 
run(exploit) directly?

Many threads dominating the scheduler has been solved by group 
scheduling.  We need to make sure directed yield doesn't violate that, 
but I don't see new problems.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Mike Galbraith
Date: Saturday, December 18, 2010 - 12:13 pm

Let it become possible to stop clocks, and you surely will :)

	-Mike


--

From: Avi Kivity
Date: Saturday, December 18, 2010 - 11:08 pm

That's scheduler jargon, I'm not familiar with scheduler internals.  
Let's talk about this at a high level, since whenever I describe it in 
scheduler terms I'm likely to get it wrong.

If there are N tasks on the machine, each is entitled to 1/N of the 
machine's cpu resources (ignoring nice and cgroups for the moment).  
What I'd like is for one task to temporarily pass a portion of its 
entitlement to another.  No other task's entitlement is affected; they 
still get their 1/N.

If a task donates its entitlement and immediately commits suicide, still 
we won't have fundamentally changed anything; the task could have kept 
itself alive and consumed that entitlement, so the scheduler was already 
prepared to give it this entitlement so nothing was gained by the 
forking task.  The problem of fork() creating new entitlements out of 
thin air has nothing to do with directed yield, and is solved by cgroups.

We already do something similar with priority inheritance.  This doesn't 
involve cpu entitlement, but we have the same situation where a task's 
ability to make progress depends on another task receiving cpu time.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Rik van Riel
Date: Monday, December 20, 2010 - 8:40 am

I just realized the answer to this question.

We only give cpu time to tasks that are runnable, but not
currently running.  That ensures one task cannot block others
from running by having time yielded to it constantly.
--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 9:04 am

Hm.  Don't think that will 100% sure prevent clock stoppage, because the
running task doesn't necessarily advance min_vruntime.

What about something like the below instead?  It compiles, but may eat
your first born child.

sched: implement fair class yield_to(task) using cfs_rq->next as a selection hint.

<CHANGELOG>

Not-signed-off-by: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h |    1 
 kernel/sched.c        |   47 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched_fair.c   |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1056,6 +1056,7 @@ struct sched_class {
 	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
 	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
 	void (*yield_task) (struct rq *rq);
+	int (*yield_to_task) (struct task_struct *p, int preempt);
 
 	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int flags);
 
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -5325,6 +5325,53 @@ void __sched yield(void)
 }
 EXPORT_SYMBOL(yield);
 
+/**
+ * yield_to - yield the current processor to another thread in
+ * your thread group, or accelerate that thread toward the
+ * processor it's on.
+ *
+ * It's the caller's job to ensure that the target task struct
+ * can't go away on us before we can do any checks.
+ */
+void __sched yield_to(struct task_struct *p, int preempt)
+{
+	struct task_struct *curr = current;
+	struct rq *rq, *p_rq;
+	unsigned long flags;
+	int yield = 0;
+
+	local_irq_save(flags);
+	rq = this_rq();
+
+again:
+	p_rq = task_rq(p);
+	double_rq_lock(rq, p_rq);
+	while (task_rq(p) != p_rq) ...
From: Mike Galbraith
Date: Monday, December 27, 2010 - 10:54 pm

From: Gene Heskett
Date: Monday, December 27, 2010 - 11:08 pm

Mike, I'd be glad to try it, but the last kernel I was able to make the 
nvidia drivers install on was 2.6.36.1-latencyfix, which had your earier 


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I have more humility in my little finger than you have in your whole 
____BODY!
		-- from "Cerebus" #82
--

From: Mike Galbraith
Date: Monday, December 27, 2010 - 11:16 pm

It wouldn't do diddly spit by itself anyway, needs someone to stick
yield_to(amigo) in interesting spots.  Might not do diddly spit even
then :)

	-Mike

--

From: Gene Heskett
Date: Tuesday, December 28, 2010 - 9:18 am

Oh, in that case it will eventually filter down to me anyway after the rest 
of this list beats on it to see what falls out.  But I occasionally like to 
be part of the filtering media as long as I don't bleed too much. ;-)

Thanks Mike.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Without facts, the decision cannot be made logically.  You must rely on
your human intuition.
		-- Spock, "Assignment: Earth", stardate unknown
--

From: Rik van Riel
Date: Tuesday, December 28, 2010 - 3:34 pm

It's christmas.

I've replaced my original patch with your patch in my
series, just haven't gotten around to running Avi's
latest test on it yet :)

-- 
All rights reversed
--

From: Avi Kivity
Date: Friday, December 17, 2010 - 8:09 am

What's so strange about it?  From a high level there are N runnable 
tasks contending for M cpus.  If task X really needs task Y to run, what 
does it matter if task Y last ran on the same cpu as task X or not?

Do I correctly read between the lines that CFS maintains complete 
fairness only on a cpu, but not globally?  I hope I'm wrong, but

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
  8648 avi       20   0  106m 1092  148 R 80.4  0.0   0:26.03 1 bash
  8656 avi       20   0  106m 1080  136 R 47.3  0.0   0:14.73 0 bash
  8652 avi       20   0  106m 1080  136 R 47.0  0.0   0:15.36 0 bash

doesn't look too good (three infinite loops in bash started at the 
roughly same time)

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Mike Galbraith
Date: Friday, December 17, 2010 - 12:51 pm

Task X wants control of when runnable task Y gets the cpu.  Task X
clearly wants to be the scheduler.  This isn't about _yielding_ diddly
spit, it's about individual tasks wanting to make scheduling decisions,
so calling it a yield is high grade horse-pookey.  You're trying to give
the scheduler a hint, the stronger that hint, the happier you'll be.

I can see the problem, and I'm not trying to be Mr. Negative here, I'm
only trying to point out problems I see with what's been proposed.  

If the yielding task had a concrete fee he could pay, that would be
fine, but he does not.

If he did have something, how often do you think it should be possible
for task X to bribe the scheduler into selecting task Y?  Will his
pockets be deep enough to actually solve the problem?  Once he's
yielded, he's out of the picture for a while if he really gave anything
up.  What happens to donated entitlement when the recipient goes to
sleep?  If you try to give it back, what happens if the donor exited?
Where did the entitlement come from if task A running alone on cpu A
tosses some entitlement over the fence to his pal task B on cpu B.. and
keeps on trucking on cpu A?  Where does that leave task C, B's

Nothing between the lines about it.  There are N individual engines,
coupled via load balancing.

	-Mike

--

From: Avi Kivity
Date: Saturday, December 18, 2010 - 10:02 am

It does.  The yielding task is entitled to its fair share of the cpu, as 
modified by priority and group scheduling.  The yielding task is willing 
to give up some of this cpu, in return for increasing another task's 


Unless the other task donates some cpu share back.  This is exactly what 



Eventually C would replace A, since its share will be exhausted.  If C 

Is this not seen as a major deficiency?

I can understand intra-cpu scheduling decisions at 300 Hz and inter-cpu 
decisions at 10 Hz (or even lower, with some intermediate rate for 
intra-socket scheduling).  But this looks like a major deviation from 
fairness - instead of 33%/33%/33% you get 50%/25%/25% depending on 
random placement.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Mike Galbraith
Date: Saturday, December 18, 2010 - 12:06 pm

The donor does absolutely NOT have any claim to any specific quantum of
any cpu at any given time.  There is no reservation, only a running
clock.  If you let one task move another task's clock backward, you open

Hm, so it needs to be very cheap, and highly repeatable.

What if: so you're trying to get spinners out of the way right?  You
somehow know they're spinning, so instead of trying to boost some task,
can you do a directed yield in terms of directing a spinner that you
have the right to diddle to yield.  Drop his lag, and resched him.  He's
not accomplishing anything anyway.

If the only thing running is virtualization, and nobody else can use the
interface being invented, all is fair, but this passing of vruntime
around is problematic when innocent bystanders may want to play too.





Doesn't seem to be.  That's what SMP-nice was all about.  It's not

Yeah, you have to bounce tasks around regularly to make that work out.

	-Mike


--

From: Avi Kivity
Date: Saturday, December 18, 2010 - 11:21 pm

Of course not, but we aren't interested in a specific quantum of time.  
All we want is to boost task A, and to unboost the donor in order to 
maintain fairness.  We don't want A to run now (though it would be 

There are a couple of problems with this approach:

- current yield() is a no-op
- even if it weren't, the process (containing the spinner and the 
lock-holder) would yield as a whole.  If it yielded for exactly the time 
needed (until the lock holder releases the lock), it wouldn't matter, 
since the spinner isn't accomplishing anything, but we don't know what 
the exact time is.  So we want to preserve our entitlement.

With a pure yield implementation the process would get less than its 
fair share, even discounting spin time, which we'd be happy to donate to 

We definitely want to maintain fairness.  Both with a dedicated virt 


What's the problem exactly?  What's the difference, system-wide, with 
the donor continuing to run for that same entitlement?  Other tasks see 


I guess random perturbations cause task migrations periodically and 
things balance out.  But it seems wierd to have this devotion to 

That's fine as long as the bounce period is significantly larger than 
the cost of a migration.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--

From: Mike Galbraith
Date: Sunday, December 19, 2010 - 3:05 am

I don't get this part.  How does the whole process yield if one thread

And that's the hard part.  If can drop lag, you may hurt yourself, but

That makes it difficult to the point of impossible.

You want a specific task to run NOW for good reasons, but any number of
tasks may want the same godlike power for equally good reasons.

You could create a force select which only godly tasks could use that
didn't try to play games with vruntimes, just let the bugger run, and
let him also eat the latency hit he'll pay for that extra bit of cpu IFF
you didn't care about being able to mix loads.

Or, you could just bump his nice level with an automated return to
previous level on resched.




What's that? :)  No task may run until there are enough of you to fill

It doesn't ignore it complete, it just doesn't try to do all the math
continuously (danger Will Robinson: Peter has scary patches).  Prodding
it in the right general direction with migrations is cheaper.

	-Mike

--

From: Avi Kivity
Date: Sunday, December 19, 2010 - 2:19 am

The process is the sum of its threads.  If a thread yield loses 1 msec 
of runtime due to the yield, the process loses 1 msec due to the yield.  
If the lock is held for, say, 100 usec, it would be better for the 
process to spin rather than yield.

With directed yield the process loses nothing by yielding to one of its 

We already have a "hurt only yourself" thing.  We sleep for 100 usec 

We aren't happy to donate it to the rest of the system, since it will 
cause a guest with lots of internal contention to make very little 

I don't want it to run now.  I want it to run before some other task.  I 
don't care if N other tasks run before both.  So no godlike powers 

Since task A is running now, clearly the scheduler thinks it deserves to 
run.  What I want to do is take just enough of the "deserves" part to 



Why is that a consequence of global fairness? three tasks get 100% cpu 


Doesn't seem to work from my brief experiment.

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Sunday, December 19, 2010 - 4:18 am

If behaviors are very similar, and tasks are not likely to try to
exploit it (as described), you can likely swap lags without horrible
consequences.




..I was just trying to say that "global fairness" is not well defined.

Never mind.

	-Mike

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 1:39 am

Ponders that...

What if: we test that both tasks are in the same thread group, if so,
use cfs_rq->next to pass the scheduler a HINT of what you would LIKE to
happen.  If the current task on that rq is also in your group, resched
it, then IFF the task you would like to run isn't too far right, it'll
be selected.  If the current task is not one of yours, tough, you can
set cfs_rq->next and hope it doesn't get overwritten, but you may not
preempt a stranger.  If you happen to be sharing an rq, cool, you
accomplished your yield_to().  If not, there's no practical way (I can
think of) to ensure that the target runs before you run again if you try
to yield, but you did your best to try to get him to the cpu sooner, and
in a manner that preserves fairness without dangerous vruntime diddling.

Would that be good enough to stop (or seriously improve) cpu wastage?

	-Mike

--

From: Avi Kivity
Date: Monday, December 20, 2010 - 1:45 am

In my use case, both tasks are in the same thread group, so that works.  


The cross-cpu limitation is bothersome.  Since there are many cpus in 
modern machines, particularly ones used for virt, the probability of the 
two tasks being on the same cpu is quite low.

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 1:55 am

Because preempting a perfect stranger is not courteous, all tasks have


What would you suggest?  There is no global execution timeline, so if
you want to definitely run after this task, you're stuck with moving to
his timezone or moving him to yours.  Well, you could sleep a while, but
we know how productive sleeping is.

	-Mike



--

From: Avi Kivity
Date: Monday, December 20, 2010 - 2:03 am

I don't want to preempt anybody, simply make the task run before me.

Further, this is a kernel internal API, so no need for these types of 

Since I'm running and the target isn't, it's clear the scheduler thinks 
the target had more cpu than I did [73].  That's why I want to donate 
cpu time.

[73] at least it'd be clear if the scheduler were globally fair.  As it 
is, I might be the only task running on my cpu, therefore in a cpu glut, 
while the other task shares the cpu with some other task and is 

I don't know.  The whole idea of donating runtime was predicated on CFS 
being completely fair.  Now I find that (a) it isn't (b) donating 
runtimes between tasks on different cpus is problematic.

Moving tasks between cpus is expensive and sometimes prohibited by 
pinning.  I'd like to avoid it if possible, but it's better than nothing.

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 2:30 am

I thought you wanted to get the target to the cpu asap?  You just can't

Doesn't matter whether it's kernel or not afaikt.  If virtualization has
to coexist peacefully with other loads, it can't just say "my hints are

That's not necessarily true, in fact, it's very often false.  Last/next
buddy will allow a task to run ahead of leftmost so we don't always

True and true.  However, would you _want_ the scheduler to hold runnable
tasks hostage, and thus let CPU go to waste in the name of perfect

Expensive in many ways, so let's try to not do that.

So why do you need this other task to run before you do, even cross cpu?
If he's a lock holder, getting him to the cpu will give him a chance to
drop, no?  Isn't that what you want to get done?  Drop that lock so you
or someone else can get something other than spinning done?

	-Mike


--

From: Avi Kivity
Date: Monday, December 20, 2010 - 2:46 am

You're right, of course.  I'm fine with running in parallel.  I'm fine 
with him running before or instead of me.  I'm not fine with running 

What does that have to do with being in the same group or not?  I want 
to maintain fairness (needed for pure virt workloads, one guest cannot 
dominate another), but I don't see how being in the same thread group is 
relevant.

Again, I don't want more than one entitlement.  I want to move part of 



Correct.  I don't want the other task to run before me, I just don't 
want to run before it.

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 3:33 am

My thought is that you can shred your own throughput, but not some other



OK, so what I gather is that if you can preempt another of your own
threads to get the target to cpu, that would be a good thing whether
he's on the same cpu as yield_to() caller or not.  If the target is
sharing a cpu with you, that's even better.  Correct?

Would a kick/hint option be useful?

	-Mike

--

From: Avi Kivity
Date: Monday, December 20, 2010 - 3:39 am

Correct.  I'm not interested in scheduling wrt other tasks, just my tasks.

However, if I'm all alone on my cpu, and the other task is runnable but 
not running, behind some unrelated task, then I do need that task to be 

Depends on what it does...

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 3:46 am

Let you decide whether you only want to drop a hint and leave it at
that, or also attempt a preemption.

	-Mike


--

From: Avi Kivity
Date: Monday, December 20, 2010 - 3:49 am

Who is "you" in this?  the scheduler?

I'm fine with hints so long as they are usually acted upon (if it isn't, 
I'll go back to the guest, spin a bit more, and retry).

-- 
error compiling committee.c: too many arguments to function

--

From: Mike Galbraith
Date: Monday, December 20, 2010 - 3:50 am

From: Avi Kivity
Date: Monday, December 20, 2010 - 4:06 am

I guess it works.  We can start out without preemption, if we see we 
spin to much, preempt.

-- 
error compiling committee.c: too many arguments to function

--

From: Peter Zijlstra
Date: Tuesday, December 14, 2010 - 5:22 am

That wants to be plain: schedule(), possibly conditional on having

You're calling this from
yield_to()->sched_class::yield_to()->yield_to_fair()->slice_remain(),

Like Mike said, the returned figure doesn't really mean anything, its
definitely not the remaining time of a slice. It might qualify for a

Here you assume @p lives on @rq, but you passed:

+               current->sched_class->yield_to(rq, p);


You cannot simply subtract wall-time from virtual time, see the usage of


Also, modifying the vruntime of one task without also modifying the
vruntime of the other task breaks stuff. You're injecting time into p
without taking time out of current. 


Maybe something like:

static void yield_to_fair(struct rq *p_rq, struct task_struct *p)
{
	struct rq *rq = this_rq();
	struct sched_entity *se = &current->se;
	struct cfs_rq *cfs_rq = cfs_rq_of(se);
	struct sched_entity *pse = &p->se;
	struct cfs_rq *p_cfs_rq = cfs_rq_of(pse);

	/*
	 * Transfer wakeup_gran worth of time from current to @p,
	 * this should ensure current is no longer eligible to run.
	 */
	unsigned long wakeup_gran = ACCESS_ONCE(sysctl_sched_wakeup_granularity);

	update_rq_clock(rq);
	update_curr(cfs_rq);

	if (pse != p_cfs_rq->curr) {
		__dequeue_entity(p_cfs_rq, pse);
	} else {
		update_rq_clock(p_rq);
		update_curr(p_cfs_rq);
	}

	se->vruntime += calc_delta_fair(wakeup_gran, se);
	pse->vruntime -= calc_delta_fair(wakeup_gran, pse);

	clear_buddies(cfs_rq, se);
	
	if (pse != p_cfs_rq->curr) {
		__enqueue_entity(p_cfs_rq, pse);
		check_preempt_curr(prq, p, 0)
	}
}

This isn't strictly correct for the group scheduling case though, that
wants a for_each_sched_entity() loop for both se and pse, but I'd have
to like actually think about that ;-)

A quick hack might be simply dis-allowing yield_to between different
groups, add something like the below to the above function:

#ifdef CONFIG_FAIR_GROUP_SCHED
	if (cfs_rq->tg != p_cfs_rq->tg)
		return;
#endif


--

From: Rik van Riel
Date: Saturday, December 18, 2010 - 7:50 am

On 12/14/2010 07:22 AM, Peter Zijlstra wrote:

... fixed all the obvious stuff.   No idea what the hell I was
thinking while doing that "cleanup" - probably too busy looking
at the tests that I was running on a previous codebase :(

For the next version of the patches, I have switched to your
version of yield_to_fair :)

-- 
All rights reversed
--