Hello, I noticed a short thread on LKML regarding "sched: add vslice" causing horrible interactivity under load. I can see similar behavior. If I stress both CPU cores, even typing on the keyboard suffers from huge latencies; I can see letters appearing with a delay (typing into xterm). No swap is used at all, and I have 1GB of free RAM. I noticed this bad behavior with 2.6.24-git[46]; 2.6.24-rc8-git was OK. My config is attached. -- Luk
if you apply the current sched-fixes (rollup patch below), does it get
any better?
Ingo
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -520,7 +520,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
if (!initial) {
/* sleeps upto a single latency don't count. */
- if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
+ if (sched_feat(NEW_FAIR_SLEEPERS))
vruntime -= sysctl_sched_latency;
/* ensure we never gain time by being placed backwards. */
@@ -1106,7 +1106,11 @@ static void check_preempt_wakeup(struct
}
gran = sysctl_sched_wakeup_granularity;
- if (unlikely(se->load.weight != NICE_0_LOAD))
+ /*
+ * More easily preempt - nice tasks, while not making
+ * it harder for + nice tasks.
+ */
+ if (unlikely(se->load.weight > NICE_0_LOAD))
gran = calc_delta_fair(gran, &se->load);
if (pse->vruntime + gran < se->vruntime)
--
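For readers following the second hunk, the wakeup-preemption test can be sketched in user space as below. This is only a minimal sketch under stated assumptions, not the kernel code: scale_gran() is a hypothetical stand-in for calc_delta_fair(), assumed here to scale by NICE_0_LOAD / weight, NICE_0_LOAD is assumed to be 1024, and should_preempt() and its parameter names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: load weight of a nice-0 task. */
#define NICE_0_LOAD 1024ULL

/*
 * Hypothetical stand-in for calc_delta_fair(): scale the granularity
 * inversely to the weight of the currently running entity.
 */
static uint64_t scale_gran(uint64_t gran, uint64_t weight)
{
	assert(weight > 0);
	return gran * NICE_0_LOAD / weight;
}

/*
 * Returns true if the woken entity (vruntime wakee_vr) should preempt
 * the running entity (vruntime curr_vr, weight curr_weight).
 * patched == false models the old "!=" test, true the new ">" test.
 */
static bool should_preempt(uint64_t wakee_vr, uint64_t curr_vr,
			   uint64_t curr_weight, uint64_t gran, bool patched)
{
	bool rescale = patched ? (curr_weight > NICE_0_LOAD)
			       : (curr_weight != NICE_0_LOAD);

	if (rescale)
		gran = scale_gran(gran, curr_weight);

	return wakee_vr + gran < curr_vr;
}
```

Under these assumptions, a running entity of weight 512 with gran = 10 has its granularity doubled to 20 by the old test, so a wakee that is 15 units behind does not preempt it; with the new test the granularity stays at 10 and the same wakee does preempt.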
No. Another observation: running two instances of while true; do true; done (on one dual-core CPU) does not break interactivity. Running make clean; make -j2 in the kernel tree breaks interactivity terribly. It looks like disk I/O activity is needed to break interactivity. While compiling, I have more than 1GB of RAM free. A friend of mine suggests that the kernel is swapping out binaries, which causes the non-interactivity. The swap area is clean, though. He also reports that the behavior can be seen even in 2.6.24-rc8. -- Lukáš Hejtmánek --
Ingo, any progress here? I've tried to revert this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=67e9fb... as it was marked as a suspicious patch in this case (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0801.3/1665.html), but with it reverted, kernel 2.6.24-git13 oopses at startup in sched_slice. I think this is a really *big* regression in the 2.6.24 kernel. -- Lukáš Hejtmánek --
I can't reproduce this with a pure CPU load. I started 10 while :; do :; done & instances and aside from slowing down, nothing bad happened. May I suggest you try latencytop to see if there is something in your build scenario that generates horrible latencies (some IO path or whatnot). --
yes, while true; do true; done does nothing wrong. But running make -j2 in the kernel tree does; see my previous mail to Ingo (you were Cc'd). latencytop says that Xorg and gnome-terminal suffer 300+ms latency in "Scheduler: waiting for cpu". -- Lukáš Hejtmánek --
If I disable CONFIG_FAIR_GROUP_SCHED, it is a lot better. I would not call it optimal, though. Xorg has 20ms latency, gnome-terminal another 20ms latency. If I just hold down a key (a letter, for instance) to see how autorepeat fills the terminal, I can see that autorepeat is not smooth and stops for a little while (the stops are extremely short, but still visible). But it is really a ton better than it was with fair group sched. So, any conclusion? Is the case closed, or should any further investigation be done? -- Lukáš Hejtmánek --
could you tell me more about this oops? You booted unmodified, latest -git and it oopsed in sched_slice()?
The patch below should work around any oopses in sched_slice(). [but this is really a 'must not happen' scenario - so a just-for-testing patch]
Ingo
Index: linux-x86.q/kernel/sched_fair.c
===================================================================
--- linux-x86.q.orig/kernel/sched_fair.c
+++ linux-x86.q/kernel/sched_fair.c
@@ -268,7 +268,8 @@ static u64 sched_slice(struct cfs_rq *cf
u64 slice = __sched_period(cfs_rq->nr_running);
slice *= se->load.weight;
- do_div(slice, cfs_rq->load.weight);
+ if (cfs_rq->load.weight)
+ do_div(slice, cfs_rq->load.weight);
return slice;
}
--
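The guard in the workaround above can be sketched in plain user-space C as follows. This is an illustrative sketch only: slice_sketch() and its parameter names are hypothetical, period_ns stands in for whatever __sched_period() returns, and ordinary 64-bit division replaces do_div().

```c
#include <stdint.h>

/*
 * User-space sketch of the guarded slice computation.
 * period_ns: stand-in for the result of __sched_period()
 * se_weight: load weight of the entity
 * rq_weight: total load weight of the runqueue (may be zero)
 */
static uint64_t slice_sketch(uint64_t period_ns, uint64_t se_weight,
			     uint64_t rq_weight)
{
	uint64_t slice = period_ns * se_weight;

	/* Skip the division when the runqueue weight is zero. */
	if (rq_weight)
		slice /= rq_weight;

	return slice;
}
```

Note that when rq_weight is zero the sketch returns the unscaled product, just as the patched hunk would. That avoids the divide fault but does not yield a meaningful slice, which fits the description of the patch as just-for-testing.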
No, I booted a modified latest git to see if the mentioned patch (reverting slices) solves the horrible non-interactivity problem. With your fix, I can boot now, but the revert did not help. Make -j2 in the kernel sources significantly decreases interactivity. Any ideas? -- Lukáš Hejtmánek --
yes, please run latencytop - does it pinpoint any latency source? Enable CONFIG_LATENCYTOP in the -git13 kernel and run the utility from latencytop.org. Also, please send me the output of this script: http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh Ingo --
Not sure whether the application works correctly. With make -j2, it reports for all processes: "Scheduler: waiting for cpu" with a latency of about 300 and more. attached. -- Luk
Hi Lukas,
Can you check if the patch below helps improve interactivity for you? The patch is against 2.6.25-rc1. I would request you to check for difference it makes with CONFIG_FAIR_GROUP_SCHED and CONFIG_FAIR_USER_SCHED turned on.
---
kernel/sched.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -7431,8 +7431,8 @@
local_load = tg->cfs_rq[i]->load.weight;
local_shares = (local_load * total_shares) / total_load;
- if (!local_shares)
- local_shares = MIN_GROUP_SHARES;
+ if (!local_load)
+ local_shares = tg->shares;
if (local_shares == tg->se[i]->load.weight)
continue;
@@ -7710,7 +7710,7 @@
struct rq *rq = cfs_rq->rq;
int on_rq;
- if (!shares)
+ if (shares < MIN_GROUP_SHARES)
shares = MIN_GROUP_SHARES;
on_rq = se->on_rq;
--
Regards,
vatsa
--
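The combined effect of the two hunks above on a group's per-CPU share can be sketched in user space roughly as below. This is a hedged illustration, not the kernel code: local_shares_sketch() is a hypothetical helper that folds both hunks into one function, the value 2 for MIN_GROUP_SHARES is an assumption, and the extra total_load check is added here only to keep the sketch safe against division by zero.

```c
#include <stdint.h>

/* Assumption: minimum share value; the real constant is defined in kernel/sched.c. */
#define MIN_GROUP_SHARES 2UL

/*
 * local_load:   load of the group on this CPU
 * total_load:   load of the group summed over the CPUs considered
 * total_shares: shares to distribute among those CPUs
 * tg_shares:    the group's configured shares (tg->shares in the patch)
 */
static unsigned long local_shares_sketch(unsigned long local_load,
					 unsigned long total_load,
					 unsigned long total_shares,
					 unsigned long tg_shares)
{
	unsigned long shares;

	/* First hunk: an idle group on this CPU gets the full group shares. */
	if (!local_load || !total_load)
		shares = tg_shares;
	else
		shares = local_load * total_shares / total_load;

	/* Second hunk: clamp anything below the minimum, not only zero. */
	if (shares < MIN_GROUP_SHARES)
		shares = MIN_GROUP_SHARES;

	return shares;
}
```

With these assumptions, a group that has no load on a CPU starts there with tg_shares rather than the minimum, so the first task waking on that CPU is not weighted as almost nothing until the shares are next rebalanced.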
well, I tried the patch against 2.6.25-rc2-git1. It seems to be better but without CONFIG_FAIR_GROUP_SCHED it is still even better. -- Lukáš Hejtmánek --
could you try latest sched-devel.git, does it behave any better? It includes patches from Peter Zijlstra that should also address latencies under the group scheduler: http://people.redhat.com/mingo/sched-devel.git/README Ingo --
Here, it does not. It seems fine without CONFIG_FAIR_GROUP_SCHED. Oddity: mainline git with Srivatsa's test patch improves markedly, and using sched_latency_ns and sched_wakeup_granularity_ns, I can tweak the regression into submission. With sched-devel, I cannot tweak it away with or without the test patch. Dunno how useful that info is. -Mike --
My hunch is it's because of the vruntime driven preemption which shoots
up latencies (and the fact perhaps that Peter hasn't focused more on SMP case
yet!).
Lukas,
Does tweaking these make any difference for you?
# echo 10000000 > /proc/sys/kernel/sched_latency_ns
# echo 10000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
FWIW, the test patch I had sent earlier didn't address the needs of UP, as Peter
pointed out to me. In that direction, I had done more experimentation with the
patch below, which seemed to improve UP latencies also. Note that I
don't particularly like the first hunk below, perhaps it needs to be
surrounded by an if(something) ..
---
kernel/sched_fair.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -523,8 +523,6 @@ place_entity(struct cfs_rq *cfs_rq, stru
if (sched_feat(NEW_FAIR_SLEEPERS))
vruntime -= sysctl_sched_latency;
- /* ensure we never gain time by being placed backwards. */
- vruntime = max_vruntime(se->vruntime, vruntime);
}
se->vruntime = vruntime;
@@ -816,6 +814,13 @@ hrtick_start_fair(struct rq *rq, struct
}
#endif
+static inline void dequeue_stack(struct sched_entity *se)
+{
+ for_each_sched_entity(se)
+ if (se->on_rq)
+ dequeue_entity(cfs_rq_of(se), se, 0);
+}
+
/*
* The enqueue_task method is called before nr_running is
* increased. Here we update the fair scheduling stats and
@@ -828,6 +833,9 @@ static void enqueue_task_fair(struct rq
*topse = NULL; /* Highest schedulable entity */
int incload = 1;
+ if (wakeup)
+ dequeue_stack(se);
+
for_each_sched_entity(se) {
topse = se;
if (se->on_rq) {
P.S : Sorry about slow responses, since I am now in a different project :(
--
Regards,
vatsa
--
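To make the first hunk of the last patch concrete, here is a small user-space sketch of wakeup placement with and without the "never gain time" clamp. It rests on assumptions not shown in the thread: base_vr stands in for the vruntime the entity is placed relative to, max_vr() is a hypothetical stand-in for max_vruntime(), and place_on_wakeup() and its keep_clamp flag exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for max_vruntime(). */
static uint64_t max_vr(uint64_t a, uint64_t b)
{
	/* Signed difference keeps the comparison correct across wraparound. */
	return (int64_t)(b - a) > 0 ? b : a;
}

/*
 * base_vr:    reference vruntime of the runqueue (assumption)
 * se_vr:      vruntime the entity had when it went to sleep
 * latency:    stand-in for sysctl_sched_latency
 * keep_clamp: true models the code before the patch, false after it
 */
static uint64_t place_on_wakeup(uint64_t base_vr, uint64_t se_vr,
				uint64_t latency, bool keep_clamp)
{
	/* Sleeper credit: place the entity one latency behind the base. */
	uint64_t vruntime = base_vr - latency;

	/* The clamp removed by the first hunk. */
	if (keep_clamp)
		vruntime = max_vr(se_vr, vruntime);

	return vruntime;
}
```

In this sketch, an entity that slept at vruntime 990 and wakes with base 1000 and latency 20 is placed at 990 with the clamp and at 980 without it. In other words, without the clamp a short sleep can move an entity backwards, which may be why vatsa suggests that the hunk should be made conditional.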
