Hi Steven, The following series are patches that have been accepted by Ingo in sched-devel post 2.6.25. They probably should be considered for the next RT release as well. The series applies to 25.4-rt3. Regards, -Greg --
Dmitry Adamushko pointed out a logic error in task_wake_up_rt() where we will always evaluate to "true". You can find the thread here: http://lkml.org/lkml/2008/4/22/296 In reality, we only want to try to push tasks away when a wake up request is not going to preempt the current task. So lets fix it. Note: We introduce test_tsk_need_resched() instead of open-coding the flag check so that the merge-conflict with -rt should help remind us that we may need to support NEEDS_RESCHED_DELAYED in the future, too. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Dmitry Adamushko <dmitry.adamushko@gmail.com> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- include/linux/sched.h | 7 ++++++- kernel/sched_rt.c | 2 +- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 823819c..6fff051 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2118,6 +2118,11 @@ static inline void clear_tsk_need_resched(struct task_struct *tsk) clear_tsk_thread_flag(tsk,TIF_NEED_RESCHED); } +static inline int test_tsk_need_resched(struct task_struct *tsk) +{ + return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED)); +} + static inline int signal_pending(struct task_struct *p) { return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING)); @@ -2132,7 +2137,7 @@ static inline int fatal_signal_pending(struct task_struct *p) static inline int _need_resched(void) { - return unlikely(test_thread_flag(TIF_NEED_RESCHED)); + return unlikely(test_tsk_need_resched(current)); } static inline int need_resched(void) diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 5a5da00..80ef2ee 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -987,7 +987,7 @@ static void post_schedule_rt(struct rq *rq) static void task_wake_up_rt(struct rq *rq, struct task_struct *p) { if (!task_running(rq, p) && - (p->prio >= rq->rt.highest_prio) && + ...
We currently use an optimization to skip the overhead of wake-idle
processing if more than one task is assigned to a run-queue. The
assumption is that the system must already be load-balanced or we
wouldnt be overloaded to begin with.
The problem is that we are looking at rq->nr_running, which may include
RT tasks in addition to CFS tasks. Since the presence of RT tasks
really has no bearing on the balance status of CFS tasks, this throws
the calculation off.
This patch changes the logic to only consider the number of CFS tasks
when making the decision to optimze the wake-idle.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched_fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 24c0830..0ade6f8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -951,7 +951,7 @@ static int wake_idle(int cpu, struct task_struct *p)
* sibling runqueue info. This will avoid the checks and cache miss
* penalities associated with that.
*/
- if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
+ if (idle_cpu(cpu) || cpu_rq(cpu)->cfs.nr_running > 1)
return cpu;
for_each_domain(cpu, sd) {
--
Dmitry Adamushko pointed out a known flaw in the rt-balancing algorithm that could allow suboptimal balancing if a non-migratable task gets queued behind a running migratable one. It is discussed in this thread: http://lkml.org/lkml/2008/4/22/296 This issue has been further exacerbated by a recent checkin to thing. Using Dmitry's nomenclature, if T0 is on cpu1 first, and T1 wakes up at equal or lower priority (affined only to cpu1) later, it *should* wait for T0 to finish. However, in reality that is likely suboptimal from a system perspective if there are other cores that could allow T0 and T1 to run concurrently. Since T1 can not migrate, the only choice for higher concurrency is to try to move T0. This is not something we addessed in the recent rt-balancing re-work. This patch tries to enhance the balancing algorithm by accomodating this scenario. It accomplishes this by incorporating the migratability of a task into its priority calculation. Within a numerical tsk->prio, a non-migratable task is logically higher than a migratable one. We maintain this by introducing a new per-priority queue (xqueue, or exclusive-queue) for holding non-migratable tasks. The scheduler will draw from the xqueue over the standard shared-queue (squeue) when available. There are several details for utilizing this properly. 1) During task-wake-up, we not only need to check if the priority preempts the current task, but we also need to check for this non-migratable condition. Therefore, if a non-migratable task wakes up and sees an equal priority migratable task already running, it will attempt to preempt it *if* there is a likelyhood that the current task will find an immediate home. 2) Tasks only get this non-migratable "priority boost" on wake-up. Any requeuing will result in the non-migratable task being queued to the end of the shared queue. This is an attempt to prevent the system from being completely unfair to migratable tasks during things ...
