2.6.24-git4+ regression

From: Lukas Hejtmanek
Date: Wednesday, January 30, 2008 - 6:56 am

Hello,

I noticed a short thread on LKML about "sched: add vslice" causing horrible
interactivity under load.

I can see similar behavior. If I stress both CPU cores, even typing on the
keyboard suffers from huge latencies: I can see letters appearing with a delay
(typing into xterm). No swap is used at all, and 1GB of RAM is free.

I noticed this bad behavior with 2.6.24-git[46]; 2.6.24-rc8-git was OK.

My config is attached.

-- 
Lukáš Hejtmánek
--

From: Ingo Molnar
Date: Thursday, January 31, 2008 - 3:29 am

if you apply the current sched-fixes (rollup patch below), does it get 
any better?

	Ingo

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -520,7 +520,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 
 	if (!initial) {
 		/* sleeps upto a single latency don't count. */
-		if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
+		if (sched_feat(NEW_FAIR_SLEEPERS))
 			vruntime -= sysctl_sched_latency;
 
 		/* ensure we never gain time by being placed backwards. */
@@ -1106,7 +1106,11 @@ static void check_preempt_wakeup(struct 
 	}
 
 	gran = sysctl_sched_wakeup_granularity;
-	if (unlikely(se->load.weight != NICE_0_LOAD))
+	/*
+	 * More easily preempt - nice tasks, while not making
+	 * it harder for + nice tasks.
+	 */
+	if (unlikely(se->load.weight > NICE_0_LOAD))
 		gran = calc_delta_fair(gran, &se->load);
 
 	if (pse->vruntime + gran < se->vruntime)
--
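
To make the effect of the second hunk concrete, here is a minimal userspace
sketch of the granularity selection before and after the change. It assumes
that calc_delta_fair() scales a delta by NICE_0_LOAD / weight and that
NICE_0_LOAD is 1024; scale_by_weight(), wakeup_gran_old() and
wakeup_gran_new() are hypothetical names used only for illustration, not
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* assumed load weight of a nice-0 task */

/* Assumed behaviour of calc_delta_fair(): scale inversely to the weight. */
static uint64_t scale_by_weight(uint64_t delta_ns, uint64_t weight)
{
	assert(weight != 0);
	return delta_ns * NICE_0_LOAD / weight;
}

/* Old check: any current task that is not nice-0 gets a rescaled granularity. */
static uint64_t wakeup_gran_old(uint64_t gran_ns, uint64_t curr_weight)
{
	if (curr_weight != NICE_0_LOAD)
		return scale_by_weight(gran_ns, curr_weight);
	return gran_ns;
}

/* New check: only tasks heavier than nice-0 get a (smaller) granularity. */
static uint64_t wakeup_gran_new(uint64_t gran_ns, uint64_t curr_weight)
{
	if (curr_weight > NICE_0_LOAD)
		return scale_by_weight(gran_ns, curr_weight);
	return gran_ns;
}
```

With a 10 ms granularity, a current task of weight 335 (a positively niced
task) was protected by roughly 30 ms of granularity under the old check and
by the plain 10 ms under the new one, while a task of weight 3121 (a
negatively niced task) gets about 3.3 ms in both cases.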

From: Lukas Hejtmanek
Date: Thursday, January 31, 2008 - 3:55 am

No. 

Another observation: running two instances of while true; do true; done (on
one dual-core CPU) does not break interactivity.

Running make clean; make -j2 in the kernel tree breaks interactivity terribly.
It looks like disk I/O activity is needed to break interactivity.

While compiling, I have more than 1GB of RAM free. A friend of mine suggests
that the kernel is swapping out binaries, which causes the non-interactivity.
The swap area is clean, though. He also reports that the behavior can be seen
even in 2.6.24-rc8.

-- 
Lukáš Hejtmánek
--

From: Lukas Hejtmanek
Date: Monday, February 4, 2008 - 4:17 am

Ingo,

any progress here? I've tried to revert this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=67e9fb...

as it was marked as a suspicious patch in this case
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0801.3/1665.html)

but with the revert applied, kernel 2.6.24-git13 oopses at startup in
sched_slice.

I think this is a really *big* regression in the 2.6.24 kernel.


-- 
Lukáš Hejtmánek
--

From: Peter Zijlstra
Date: Monday, February 4, 2008 - 4:36 am

I can't reproduce this with a pure cpu load. I started 10 
  while :; do :; done &
instances and aside from slowing down, nothing bad happened.

May I suggest you try latencytop to see if there is something in your
build scenario that generates horrible latencies (some IO path or
whatnot).

--

From: Lukas Hejtmanek
Date: Monday, February 4, 2008 - 7:36 am

Yes, while true; do true; done does nothing wrong, but running make -j2 in the
kernel tree does.

See my previous mail to Ingo (you were Cc'd): latencytop says that Xorg and
gnome-terminal suffer 300+ms latency in "Scheduler: waiting for cpu".

-- 
Lukáš Hejtmánek
--

From: Peter Zijlstra
Date: Monday, February 4, 2008 - 7:45 am

what happens when you turn CONFIG_FAIR_GROUP_SCHED off?

--

From: Lukas Hejtmanek
Date: Monday, February 4, 2008 - 10:00 am

If I disable CONFIG_FAIR_GROUP_SCHED, it is a lot better. I would not call it
optimal, though.

Xorg has 20ms latency, gnome-terminal another 20ms latency. If I just hold
down a key (a letter, for instance) to see how autorepeat fills the terminal,
I can see that autorepeat is not smooth and stops for a little while (the
stops are really short, but still visible). But it is really a ton better
than it was with fair group sched.

So, any conclusion? Is the case closed, or should any further investigation
be done?

-- 
Lukáš Hejtmánek
--

From: Ingo Molnar
Date: Monday, February 4, 2008 - 5:01 am

could you tell me more about this oops? You booted unmodified, latest 
-git and it oopsed in sched_slice()? The patch below should work around 
any oopses in sched_slice(). [but this is really a 'must not happen' 
scenario - so a just-for-testing patch]

	Ingo

Index: linux-x86.q/kernel/sched_fair.c
===================================================================
--- linux-x86.q.orig/kernel/sched_fair.c
+++ linux-x86.q/kernel/sched_fair.c
@@ -268,7 +268,8 @@ static u64 sched_slice(struct cfs_rq *cf
 	u64 slice = __sched_period(cfs_rq->nr_running);
 
 	slice *= se->load.weight;
-	do_div(slice, cfs_rq->load.weight);
+	if (cfs_rq->load.weight)
+		do_div(slice, cfs_rq->load.weight);
 
 	return slice;
 }
--
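
The arithmetic that the workaround guards can be sketched in userspace as
follows. This is only an illustration of the patched hunk: slice_ns() is a
hypothetical name, the period is passed in rather than derived from
__sched_period(), and a plain 64-bit division stands in for do_div().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of the scheduling period given to one entity:
 * period * entity weight / runqueue weight.
 * As in the test patch, the division is skipped when the runqueue
 * weight is zero, so the undivided product is returned in that case.
 */
static uint64_t slice_ns(uint64_t period_ns, uint64_t se_weight,
			 uint64_t rq_weight)
{
	uint64_t slice;

	/* Illustration only: refuse inputs whose product would overflow. */
	assert(se_weight == 0 || period_ns <= UINT64_MAX / se_weight);

	slice = period_ns * se_weight;
	if (rq_weight)
		slice /= rq_weight;

	return slice;
}
```

For example, slice_ns(20000000, 1024, 2048) yields 10000000: one of two
equally weighted entities gets half of a 20 ms period. With a runqueue weight
of zero the result is not meaningful, which fits the description of the patch
as a just-for-testing workaround.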

From: Lukas Hejtmanek
Date: Monday, February 4, 2008 - 5:29 am

No, I booted a modified latest git to see if the mentioned patch (reverting
slices) solves the horrible non-interactivity problem. With your fix, I can
boot now, but the revert did not help. Make -j2 in the kernel sources
significantly decreases interactivity. Any ideas?
 
-- 
Lukáš Hejtmánek
--

From: Ingo Molnar
Date: Monday, February 4, 2008 - 6:04 am

yes, please run latencytop - does it pinpoint any latency source? Enable 
CONFIG_LATENCYTOP in the -git13 kernel and run the utility from 
latencytop.org. Also, please send me the output of this script:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

	Ingo
--

From: Lukas Hejtmanek
Date: Monday, February 4, 2008 - 6:49 am

Not sure whether the application works correctly. With make -j2, it reports
"Scheduler: waiting for cpu" for all processes, with latency of about 300 ms
or more.

attached.

-- 
Lukáš Hejtmánek
--

From: Srivatsa Vaddagiri
Date: Thursday, February 14, 2008 - 9:55 am

Hi Lukas,
	Can you check if the patch below helps improve interactivity for you?

The patch is against 2.6.25-rc1. I would request you to check for
difference it makes with CONFIG_FAIR_GROUP_SCHED and
CONFIG_FAIR_USER_SCHED turned on.

---
 kernel/sched.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -7431,8 +7431,8 @@
 
 			local_load = tg->cfs_rq[i]->load.weight;
 			local_shares = (local_load * total_shares) / total_load;
-			if (!local_shares)
-				local_shares = MIN_GROUP_SHARES;
+			if (!local_load)
+				local_shares = tg->shares;
 			if (local_shares == tg->se[i]->load.weight)
 				continue;
 
@@ -7710,7 +7710,7 @@
 	struct rq *rq = cfs_rq->rq;
 	int on_rq;
 
-	if (!shares)
+	if (shares < MIN_GROUP_SHARES)
 		shares = MIN_GROUP_SHARES;
 
 	on_rq = se->on_rq;

-- 
Regards,
vatsa
--
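
A minimal userspace sketch of the per-CPU share computation as it would look
with both hunks applied. cpu_group_shares() is a hypothetical name, the
minimum is passed in as min_shares instead of assuming a value for
MIN_GROUP_SHARES, and the two steps (fallback for an idle CPU, then the
clamp) are folded into one function although the patch applies them in two
different places. The zero-load check is done before the division here so
that a zero total load cannot be divided by; the result is the same as in
the patch.

```c
#include <assert.h>

/*
 * Shares given to a task group's entity on one CPU.
 *   local_load   - the group's load on this CPU
 *   total_load   - the group's load summed over the CPUs considered
 *   total_shares - shares to distribute over those CPUs
 *   tg_shares    - the group's configured shares (tg->shares in the patch)
 *   min_shares   - lower bound (MIN_GROUP_SHARES in the patch)
 */
static unsigned long cpu_group_shares(unsigned long local_load,
				      unsigned long total_load,
				      unsigned long total_shares,
				      unsigned long tg_shares,
				      unsigned long min_shares)
{
	unsigned long shares;

	assert(local_load <= total_load);

	if (!local_load)
		shares = tg_shares;	/* idle CPU: keep the full group weight */
	else
		shares = local_load * total_shares / total_load;

	if (shares < min_shares)	/* second hunk: clamp small values too */
		shares = min_shares;

	return shares;
}
```

Before the patch, a CPU on which the group had no load ended up with the
minimum share, so a task of that group waking up there would presumably start
out with a very small weight. With the patch, such a CPU keeps the group's
full shares.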

From: Lukas Hejtmanek
Date: Sunday, February 17, 2008 - 1:26 pm

Well, I tried the patch against 2.6.25-rc2-git1. It seems to be better, but
it is still better without CONFIG_FAIR_GROUP_SCHED.

-- 
Lukáš Hejtmánek
--

From: Ingo Molnar
Date: Sunday, February 17, 2008 - 9:28 pm

could you try latest sched-devel.git, does it behave any better? It 
includes patches from Peter Zijlstra that should also address latencies 
under the group scheduler:

  http://people.redhat.com/mingo/sched-devel.git/README

	Ingo
--

From: Mike Galbraith
Date: Monday, February 18, 2008 - 12:38 am

Here, it does not.  It seems fine without CONFIG_FAIR_GROUP_SCHED.

Oddity:  mainline git with Srivatsa's test patch improves markedly, and
using sched_latency_ns and sched_wakeup_granularity_ns, I can tweak the
regression into submission.  With sched-devel, I cannot tweak it away
with or without the test patch.  Dunno how useful that info is.

	-Mike

--

From: Srivatsa Vaddagiri
Date: Monday, February 18, 2008 - 1:20 am

My hunch is it's because of the vruntime-driven preemption, which shoots
up latencies (and perhaps the fact that Peter hasn't focused more on the SMP
case yet!).


Lukas,
	Does tweaking these make any difference for you?

	# echo 10000000 > /proc/sys/kernel/sched_latency_ns
	# echo 10000000 > /proc/sys/kernel/sched_wakeup_granularity_ns


FWIW, the test patch I had sent earlier didn't address the needs of UP, as
Peter pointed out to me. In that direction, I had done more experimentation
with the patch below, which seemed to improve UP latencies also. Note that I
don't particularly like the first hunk below; perhaps it needs to be
surrounded by an if(something) ..

---
 kernel/sched_fair.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -523,8 +523,6 @@ place_entity(struct cfs_rq *cfs_rq, stru
 		if (sched_feat(NEW_FAIR_SLEEPERS))
 			vruntime -= sysctl_sched_latency;
 
-		/* ensure we never gain time by being placed backwards. */
-		vruntime = max_vruntime(se->vruntime, vruntime);
 	}
 
 	se->vruntime = vruntime;
@@ -816,6 +814,13 @@ hrtick_start_fair(struct rq *rq, struct 
 }
 #endif
 
+static inline void dequeue_stack(struct sched_entity *se)
+{
+	for_each_sched_entity(se)
+		if (se->on_rq)
+			dequeue_entity(cfs_rq_of(se), se, 0);
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -828,6 +833,9 @@ static void enqueue_task_fair(struct rq 
 			    *topse = NULL;	/* Highest schedulable entity */
 	int incload = 1;
 
+	if (wakeup)
+		dequeue_stack(se);
+
 	for_each_sched_entity(se) {
 		topse = se;
 		if (se->on_rq) {



P.S : Sorry about slow responses, since I am now in a different project :(

-- 
Regards,
vatsa
--
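
What the first hunk changes can be sketched in userspace as follows. The
sketch assumes that a woken entity is placed relative to the runqueue's
min_vruntime (the quoted hunk does not show that part), and place_woken() is
a hypothetical name; max_vruntime() follows the kernel helper of the same
name in comparing through a signed difference so that counter wrap-around is
handled.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe maximum of two vruntime values. */
static uint64_t max_vruntime(uint64_t a, uint64_t b)
{
	int64_t delta = (int64_t)(b - a);

	return delta > 0 ? b : a;
}

/*
 * Placement of a woken (non-initial) entity with NEW_FAIR_SLEEPERS on.
 * clamp != 0 models the code before the patch, clamp == 0 the code
 * with the first hunk applied.
 */
static uint64_t place_woken(uint64_t min_vruntime, uint64_t se_vruntime,
			    uint64_t latency_ns, int clamp)
{
	uint64_t vruntime;

	/* The signed comparison only makes sense for offsets below 2^63. */
	assert(latency_ns < (UINT64_C(1) << 63));

	vruntime = min_vruntime - latency_ns;	/* sleeper credit */
	if (clamp)
		vruntime = max_vruntime(se_vruntime, vruntime);

	return vruntime;
}
```

With min_vruntime at 100 ms, a 20 ms latency and an entity that went to sleep
at a vruntime of 95 ms, the clamped version keeps the entity at 95 ms, while
the version without the clamp moves it back to 80 ms, so a task that slept
only briefly still receives the full sleeper credit.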

From: Mike Galbraith
Date: Monday, February 18, 2008 - 1:36 am

I'll try this patch later (errands).

	Thanks,


--
