Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

To: Al Boldi <a1426z@...>
Cc: Markus <mjt@...>, ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 5, 2007 - 6:10 pm

Hah I just wish gears would go away. If I get hardware where it runs at just
the right speed it looks like it doesn't move at all. On other hardware the
wheels go backwards and forwards where the screen refresh rate is just
perfectly a factor of the frames per second (or something like that).

This is not a cpu scheduler test and you're inferring that there are cpu
scheduling artefacts based on an application that has bottlenecks at
different places depending on the hardware combination.

To imply something is fishy with nice levels, do a test that _only_ uses cpu
(and not the bus, memory bandwidth, the gpu and has driver interactions) and
prove that there is something wrong. The cpu scheduler has no control over
what happens to other resources on the machine. The -rt tree tries to address
these factors for example but it's a huge - some would say insurmountable -
thing to try and manage at all levels on a general purpose operating system.
The cpu is proportioned out very fairly with rsdl on both quota and latency
according to nice vs cpu usage. When something is fully cpu bound on rsdl its
average and maximum latency will be greater than something that only
intermittently uses cpu. The (somewhat lengthy and not easily digestible)

Thanks!

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Al Boldi <a1426z@...>, Markus <mjt@...>, ck list <ck@...>, <linux-kernel@...>
Date: Tuesday, March 6, 2007 - 4:42 am

I'd add that Xorg has its own scheduler (for X11 operations, of course),
that has its own quirks, and chances are that it is the one you're
testing with glxgears. And as Con said, as long as glxgears does more
FPS than your screen refresh rate, its flickering is completely
meaningless: it doesn't even attempt to sync with vblank. Al, you'd
better try with Quake3 or Nexuiz, or even Blender if you want to test 3D
interactivity under load.

Xav

-

To: Xavier Bestel <xavier.bestel@...>
Cc: Markus <mjt@...>, ck list <ck@...>, <linux-kernel@...>, Con Kolivas <kernel@...>
Date: Tuesday, March 6, 2007 - 11:15 am

Actually, games aren't really useful to evaluate scheduler performance, due
to their bursty nature.

OTOH, gears runs full throttle, including any of its bottlenecks. In fact,
it's the bottlenecks that add to its realism. It exposes underlying
scheduler hiccups visually, unless buffered by the display-driver, in which
case you just use the vesa-driver to be sure.

If gears starts to flicker on you, just slow it down with a cpu hog like:

# while :; do :; done &

Add as many hogs as you need to make the hiccups visible.

Again, these hiccups are only visible when using uneven nice+ levels.

BTW, another way to show these hiccups would be through some kind of a
cpu/proc timing-tracer. Do we have something like that?

Thanks!

--
Al

-

To: Con Kolivas <kernel@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Sunday, March 11, 2007 - 2:11 pm

Here is something like a tracer.

Original idea by Chris Friesen, thanks, from this post:
http://marc.theaimsgroup.com/?l=linux-kernel&m=117331003029329&w=4

Try attached chew.c like this:
Boot into /bin/sh.
Run chew in one console.
Run nice chew in another console.
Watch timings.
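
[Editor's sketch] The attached chew.c is not reproduced in this archive. The following is a minimal sketch of how such a tracer could work, not the original file: it spins on the clock and reports any gap between two consecutive readings that exceeds a threshold, on the assumption that the task was scheduled out during that gap. The 2 ms threshold, the function names and the bounded run time are assumptions made for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Gaps longer than this are reported; 2 ms is an assumed value. */
#define THRESHOLD_USEC 2000ULL

/* Current wall-clock time in microseconds. */
unsigned long long stamp_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (unsigned long long)tv.tv_sec * 1000000ULL +
           (unsigned long long)tv.tv_usec;
}

/*
 * Spin on the clock for roughly run_usec microseconds. Whenever two
 * consecutive readings are more than THRESHOLD_USEC apart, assume the
 * task was off the cpu for that long and print a line in the format
 * used in this thread. Returns the number of gaps reported.
 */
int chew(unsigned long long run_usec)
{
    unsigned long long start = stamp_usec();
    unsigned long long last = start;
    int gaps = 0;

    for (;;) {
        unsigned long long cur = stamp_usec();

        /* cur > last guards against the wall clock stepping backwards. */
        if (cur > last && cur - last > THRESHOLD_USEC) {
            printf("pid %d, prio %d, out for %llu ms\n",
                   (int)getpid(), getpriority(PRIO_PROCESS, 0),
                   (cur - last) / 1000ULL);
            gaps++;
            cur = stamp_usec(); /* do not count the printf itself */
        }
        last = cur;
        if (cur < start || cur - start >= run_usec)
            break;
    }
    return gaps;
}
```

A main() that calls chew() with a long run time, started once plainly and once under nice in a second console, would produce output of the kind shown below. The "prio" column here is simply the nice value returned by getpriority().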

Console 1: ./chew
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 6 ms
pid 655, prio 0, out for 5 ms

Console 2: nice -10 ./chew
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 5 ms
pid 669, prio 10, out for 65 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 5 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 5 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 65 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms
pid 669, prio 10, out for 6 ms

Console 2: nice -15 ./chew
pid 673, prio 15, out for 5 ms
pid 673, prio 15, out for 6 ms
pid 673, prio 15, out for 95 ms
pid 673, prio 15, out for 5 ms
pid 673, ...

To: Al Boldi <a1426z@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Sunday, March 11, 2007 - 5:52 pm

And thank you! I think I know what's going on now. I think each rotation is
followed by another rotation before the higher priority task is getting a
look in in schedule() to even get quota and add it to the runqueue quota.
I'll try a simple change to see if that helps. Patch coming up shortly.

--
-ck
-

To: Al Boldi <a1426z@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Sunday, March 11, 2007 - 6:12 pm

Can you try the following patch and see if it helps. There's also one minor
preemption logic fix in there that I'm planning on including. Thanks!

---
kernel/sched.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c 2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
* provided it is not a realtime comparison.
*/
#define TASK_PREEMPTS_CURR(p, curr) \
- (((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+ (((p)->prio < (curr)->prio) || (!rt_task(p) && \
((p)->static_prio < (curr)->static_prio && \
- ((curr)->static_prio > (curr)->prio)) && \
- !rt_task(p)))
+ ((curr)->static_prio > (curr)->prio))))

/*
* This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
*/
static inline void rotate_runqueue_priority(struct rq *rq)
{
- int new_prio_level, remaining_quota;
+ int new_prio_level;
struct prio_array *array;

/*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
return;

- remaining_quota = rq_quota(rq, rq->prio_level);
array = rq->active;
if (rq->prio_level > MAX_PRIO - 2) {
/* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
}
rq->prio_level = new_prio_level;
/*
- * While we usually rotate with the rq quota being 0, it is possible
- * to be negative so we subtract any deficit from the new level.
+ * As we are merging to a prio_level that may not have anything in
+ * its quota we add 1 to ensure the...
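
[Editor's sketch] The TASK_PREEMPTS_CURR hunk above is easier to follow when written as two plain functions. This is only an illustration of the macro logic before and after the patch: struct task is a simplified stand-in for the kernel's task_struct, the function names are made up, and the example priorities further down are invented values.

```c
#include <stdbool.h>

/* Priorities below this are realtime in 2.6-era kernels. */
#define MAX_RT_PRIO 100

/* Simplified stand-in for task_struct: only the fields the macro reads. */
struct task {
    int prio;        /* current dynamic priority, lower is more urgent */
    int static_prio; /* priority derived from the nice value */
};

bool rt_task(const struct task *p)
{
    return p->prio < MAX_RT_PRIO;
}

/* Check before the patch: the static_prio comparison only applied when
 * both tasks were sitting at the same dynamic priority. */
bool preempts_old(const struct task *p, const struct task *curr)
{
    return p->prio < curr->prio ||
           (p->prio == curr->prio &&
            p->static_prio < curr->static_prio &&
            curr->static_prio > curr->prio &&
            !rt_task(p));
}

/* Check after the patch: the equal-prio requirement is gone, so a less
 * niced waker preempts any non-rt task whose prio is artificially elevated. */
bool preempts_new(const struct task *p, const struct task *curr)
{
    return p->prio < curr->prio ||
           (!rt_task(p) &&
            p->static_prio < curr->static_prio &&
            curr->static_prio > curr->prio);
}
```

As an invented example, take a nice 10 task (static_prio 130) currently running at prio 120 after a list merge, and a nice 0 waker (static_prio 120) at prio 121. The old check returns false because the dynamic priorities differ; the new check returns true.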

To: Con Kolivas <kernel@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 12:42 am

Applied on top of v0.28 mainline, and there is no difference.

What's it look like on your machine?

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 12:53 am

The higher priority one always gets 6-7ms whereas the lower priority one runs
6-7ms and then one larger perfectly bound expiration amount. Basically
exactly as I'd expect. The higher priority task gets precisely RR_INTERVAL
maximum latency whereas the lower priority task gets RR_INTERVAL min and full
expiration (according to the virtual deadline) as a maximum. That's exactly
how I intend it to work. Yes I realise that the max latency ends up being
longer intermittently on the niced task but that's -in my opinion- perfectly
fine as a compromise to ensure the nice 0 one always gets low latency.

Eg:
nice 0 vs nice 10

nice 0:
pid 6288, prio 0, out for 7 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms

nice 10:
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 66 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms

exactly as I'd expect. If you want fixed latencies _of niced tasks_ in the
presence of less niced tasks you will not get them with this scheduler. What
you will get, though, is a perfectly bound relationship knowing exactly what
the maximum latency will ever be.
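
[Editor's sketch] To check the "perfectly bound" claim against a longer trace than the excerpts above, the chew output can be reduced to its minimum and maximum gap. The helper below is a hypothetical one written for this archive; it assumes only the line format shown in this thread.

```c
#include <stdio.h>

/* Running min/max of the "out for N ms" values seen so far. */
struct gap_stats {
    int samples;
    int min_ms;
    int max_ms;
};

/*
 * Parse one line of chew output and fold it into the stats.
 * Returns 1 if the line matched the expected format, 0 otherwise.
 */
int gap_stats_add(struct gap_stats *s, const char *line)
{
    int pid, prio, ms;

    if (sscanf(line, "pid %d, prio %d, out for %d ms", &pid, &prio, &ms) != 3)
        return 0;
    if (s->samples == 0 || ms < s->min_ms)
        s->min_ms = ms;
    if (s->samples == 0 || ms > s->max_ms)
        s->max_ms = ms;
    s->samples++;
    return 1;
}
```

Fed the nice 10 excerpt above, the stats end up with a minimum of 6 ms and a maximum of 66 ms; the nice 0 excerpt stays within 6 to 7 ms.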

Thanks for the test case. It's interesting and nice that it confirms this
scheduler works as I expect it to.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 7:26 am

I think, it should be possible to spread this max expiration latency across
the rotation, should it not?

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 11:31 am

Can you try the attached patch please Al and Mike? It "dithers" the priority
bitmap which tends to fluctuate the latency a lot more but in a cyclical
fashion. This tends to make the max latency bound to a smaller value and
should make it possible to run -nice tasks without killing the latency of the
non niced tasks. Eg you could possibly run X nice -10 at a guess like we used
to in 2.4 days. It's not essential of course, but is a workaround for Mike's
testcase.

Thanks.

---
Modify the priority bitmaps of different nice levels to be dithered
minimising the latency likely when different nice levels are used. This
allows low cpu using relatively niced tasks to still get low latency in the
presence of less niced tasks.

Fix the accounting on -nice levels to not be scaled by HZ.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
kernel/sched.c | 69 +++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 45 insertions(+), 24 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c 2007-03-13 23:17:29.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-14 02:20:37.000000000 +1100
@@ -89,24 +89,34 @@ unsigned long long __attribute__((weak))
#define SCHED_PRIO(p) ((p)+MAX_RT_PRIO)
#define MAX_DYN_PRIO (MAX_PRIO + PRIO_RANGE)

-/*
- * Preemption needs to take into account that a low priority task can be
- * at a higher prio due to list merging. Its priority is artificially
- * elevated and it should be preempted if anything higher priority wakes up
- * provided it is not a realtime comparison.
- */
-#define TASK_PREEMPTS_CURR(p, curr) \
- (((p)->prio < (curr)->prio) || (!rt_task(p) && \
- ((p)->static_prio < (curr)->static_prio && \
- ((curr)->static_prio > (curr)->prio))))
+#define TASK_PREEMPTS_CURR(p, curr) ((p)->prio < (curr)->prio)

/*
* This is the time ...

To: Al Boldi <a1426z@...>
Cc: Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, ck list <ck@...>, <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 12:03 pm

Oops, one tiny fix. This is a respin of the patch, sorry.
---
Modify the priority bitmaps of different nice levels to be dithered
minimising the latency likely when different nice levels are used. This
allows low cpu using relatively niced tasks to still get low latency in the
presence of less niced tasks.

Fix the accounting on -nice levels to not be scaled by HZ.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
kernel/sched.c | 73 ++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 47 insertions(+), 26 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c 2007-03-13 23:17:29.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c 2007-03-14 03:01:58.000000000 +1100
@@ -89,24 +89,34 @@ unsigned long long __attribute__((weak))
#define SCHED_PRIO(p) ((p)+MAX_RT_PRIO)
#define MAX_DYN_PRIO (MAX_PRIO + PRIO_RANGE)

-/*
- * Preemption needs to take into account that a low priority task can be
- * at a higher prio due to list merging. Its priority is artificially
- * elevated and it should be preempted if anything higher priority wakes up
- * provided it is not a realtime comparison.
- */
-#define TASK_PREEMPTS_CURR(p, curr) \
- (((p)->prio < (curr)->prio) || (!rt_task(p) && \
- ((p)->static_prio < (curr)->static_prio && \
- ((curr)->static_prio > (curr)->prio))))
+#define TASK_PREEMPTS_CURR(p, curr) ((p)->prio < (curr)->prio)

/*
* This is the time all tasks within the same priority round robin.
* Set to a minimum of 6ms.
*/
-#define RR_INTERVAL ((6 * HZ / 1001) + 1)
+#define __RR_INTERVAL 6
+#define RR_INTERVAL ((__RR_INTERVAL * HZ / 1001) + 1)
#define DEF_TIMESLICE (RR_INTERVAL * 20)

+/*
+ * This contains a bitmap for each dynamic priority level with empty slots
+ * for the valid priorities each different nice level can have. It allows
+ * us t...
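
[Editor's sketch] The RR_INTERVAL macro in the hunk above uses integer division, so the resulting interval depends on HZ. The small helpers below repeat the same arithmetic with HZ passed in as a parameter; the function names are made up for this illustration.

```c
/* Same integer arithmetic as the RR_INTERVAL macro, with HZ as a
 * parameter so that several kernel configurations can be compared. */
int rr_interval_jiffies(int hz)
{
    return (6 * hz / 1001) + 1;
}

/* Length of that interval in microseconds (one jiffy is 1000000/HZ us). */
long rr_interval_usec(int hz)
{
    return (long)rr_interval_jiffies(hz) * 1000000L / hz;
}
```

With HZ=1000 this gives 6 jiffies (6 ms), with HZ=300 it gives 2 jiffies (about 6.7 ms), with HZ=250 it gives 2 jiffies (8 ms), and with HZ=100 it gives 1 jiffy (10 ms), which fits the "minimum of 6ms" comment in the patch.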

To: Al Boldi <a1426z@...>
Cc: Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, ck list <ck@...>, <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 4:58 pm

Bah with a bit more sleep under my belt it became clear that I forgot to
update the expired array in any proper way so this change almost breaks stuff
at the moment in the shape it's in. Please disregard this change for now
apart from interest in how I'm tackling the nice issue.

--
-ck
-

To: <ck@...>, Ingo Molnar <mingo@...>
Cc: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 7:08 pm

The rsdl patches queued up so far are stable and boot fine and are reasonably
performant on many architectures so I'm quite happy for them to get a run
in -mm. The changes planned will (as you may have seen on this email thread)
decrease average latencies across all nice levels, and make differential nice
levels run better together. This will allow -nice to be used without
significant latency harm to not niced tasks (as there is presently in rsdl
and mainline). The change required on top of the patch earlier in this email
is to make the dynamic bitmap reflect where the tasks will actually be on an
array swap.

However, I must inform people that I have to arrest the RSDL development for
at least this week. I have a new and fairly serious neck problem that is
being exacerbated badly by sitting in front of the computer for any extended
period.

I suspect the inner workings of RSDL currently are not well understood yet by
anyone else well enough to hack on it. I'm not at all opposed to someone
taking up the code at the moment and making the necessary changes I've
mentioned above in the meantime though if they can get their head around it.

--
-ck
-

To: <ck@...>, Mike Galbraith <efault@...>
Cc: Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 8:25 am

Thanks everyone for your offlist sympathy you sent me with respect to this
neck problem.

There is good news and bad news. The bad news is the nature of this problem is
that it will get a bit worse before it gets better, and it has not peaked
yet.

The good news is that in order to stay sane I've found ways of optimising my
computing position and could work for short stints on the rsdl code. So I'll
be releasing a new version shortly.

--
-ck
-

To: <ck@...>
Cc: Mike Galbraith <efault@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Subject: RSDL v0.31
Date: Friday, March 16, 2007 - 9:40 am

To: Con Kolivas <kernel@...>, <ck@...>
Cc: Mike Galbraith <efault@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 5:55 pm

Thanks! It looks much better now.

With X nice'd at -10, and 11 hogs loading the cpu, interactivity looks good
until the default timeslice/quota is exhausted and slows down. Maybe
adjusting this according to nice could help.

It may also be advisable to fix latencies according to nice, and adjust
timeslices instead. This may help scalability a lot, as there are some
timing sensitive apps that may crash under high load.

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 10:51 pm

Thank you. You should find most of your latency concerns you brought up have

Not sure what you mean by that. It's still a fair distribution system, it's
just that nice -10 has quite a bit more cpu allocated than nice 0. Eventually

You will find that is the case already with this version. Even under heavy
load if you were to be running one server niced (say httpd nice 19 in the
presence of mysql nice 0) the latencies would be drastically reduced compared
to mainline behaviour. I am aware this becomes an issue for some heavily
loaded servers because some servers run multithreaded while others do not,

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 12:40 am

I mean #DEF_TIMESLICE seems like an initial quota, which gets reset with each
major rotation. Increasing this according to nice may give it more room for
an interactivity boost, before being expired. Alternatively, you may just
reset the rotation, if the task didn't use its quota within one full major

The thing is, latencies are currently dependent on the number of tasks in the
run-queue; i.e. more rq-tasks means higher latencies, yet fixed timeslices
according to nice. Just switching this the other way around, by fixing
latencies according to nice, and adjusting the timeslices depending on
rq-load, may yield a much more scalable system.

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 12:57 am

DEF_TIMESLICE is a value used for smp balancing and has no effect on quota so
I doubt you mean that value. The quota you're describing of not resetting is
something like the sleep average idea of current systems where you accumulate
bonus points by sleeping when you would be running and redeem them later.
This is exactly the system I'm avoiding using in rsdl as you'd have to decide
just how much sleep time it could accumulate, and over how long it would run
out, and so on. ie that's the interactivity estimator. This is the system
that destroys any guarantee of cpu percentage, and ends up leading to periods
of relative starvation, is open to tuning that can either be too short or too

That is not really feasible to implement. How can you guarantee latencies when
the system is overloaded? If you have 1000 tasks all trying to get scheduled
in say 10ms you end up running for only 10 microseconds at a time. That will
achieve the exact opposite whereby as the load increases the runtime gets

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 12:12 pm

Ok, but I can clearly see an expiration happening for sleeping tasks, like X.
It looks like it's climbing a ladder halfway, then it sleeps, and when it
wakes up, it continues to complete the ladder to expiration. Couldn't this

Most of the time we only run a small number of tasks. And for the case of
1000's of tasks, you could put in some lower threshold, that would trigger
an increase of latency, if the timeslice became ridiculously small.
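
[Editor's sketch] The arithmetic behind this exchange can be written down in a few lines. The sketch below is not RSDL code; it assumes a plain round robin among equal tasks, divides a latency target by the number of runnable tasks, and applies the lower threshold suggested here. All names and parameters are hypothetical.

```c
/* Result of splitting a latency target across runnable tasks. */
struct slice_plan {
    long slice_usec;   /* how long each task runs per turn */
    long latency_usec; /* time for one full round of all tasks */
};

/*
 * Fixed-latency scheme: every task gets target/nr_running per turn.
 * If that falls below min_slice_usec the slice is clamped to the floor
 * and the latency grows with the load instead.
 */
struct slice_plan plan_slices(long target_latency_usec, long nr_running,
                              long min_slice_usec)
{
    struct slice_plan plan;

    if (nr_running < 1)
        nr_running = 1;
    plan.slice_usec = target_latency_usec / nr_running;
    if (plan.slice_usec < min_slice_usec)
        plan.slice_usec = min_slice_usec; /* floor hit: latency stretches */
    plan.latency_usec = plan.slice_usec * nr_running;
    return plan;
}
```

With a 10 ms target and 1000 tasks and no floor, each task runs for 10 microseconds, which is the case Con objects to. With an assumed 1 ms floor the slice stays at 1 ms but a full round takes one second, so under overload the latency guarantee is given up rather than the timeslice.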

Thanks!

--
Al

-

To: Con Kolivas <kernel@...>
Cc: Al Boldi <a1426z@...>, <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 9:50 am

Con

Think that Al may have a point. Yes there are cases where it will not work well. How about
reversing the idea? Set the initial timeslice so it will give good latency with a fairly high
load. Increase the timeslice when the load is lower than the load we attempt to guarantee
good latency for. Idea being to reduce the number of context switches while retaining a
fixed latency.

Ed Tomlinson
-

To: <linux-kernel@...>
Cc: Con Kolivas <kernel@...>, Al Boldi <a1426z@...>, <ck@...>, Ingo Molnar <mingo@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 1:15 am

Greetings Con & company;

I built and rebooted to 2.6.20.3-rdsl-0.31 earlier this evening, but
purposely waited till amanda was well underway to make a report.

The report is that I really really have to work hard to tell that amanda
is running even though the cpu according to gkrellm is running between 97
and 99%.

For my loading then, this is as much an improvement over the -ck1 patch as
it was over the un-patched but same version of the kernel. FWIW, I'd
also built a 2.6.20.3-rdsl-0.30 and ran it for a day but it was nearly as
spastic as no patch.

Did I say I like this yet? :)

Now I'm waiting for 2.6.21-rc4 to make the mirrors & see if tar is still
broken. Based on the clues I've been able to find, I bz'd the tar since
that's a fedora supplied rpm install. Humm, I just now recalled that I
have a tarball built tar-1.15.1 on another drive, I was using it when I
was running FC2, so that might be something else to bisect against, and I
will, bet on it.

Many thanks Con, this is very nice. I've only seen one split second when
the screen was about 2 chars behind my typing. This is great. :-)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
When a cow laughs, does milk come out of its nose?
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 1:12 pm

Hi, I tried to send this before but I'm not sure it got through.
All I wanted to say was that this version does not improve my
problem with Wengophone (tested with 2.6.21-rc4). Otherwise it
seems to run fine. I won't be able to test anything new for
about a month, but I guess by then it'll be perfect anyway :)

Ash
---------------------------------------------
Free POP3 Email from www.Gawab.com
Sign up NOW and get your account @gawab.com!!
-

To: AshMilsted <thatistosayiseenem@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 1:41 pm

Wengophone does not really work fine on 'Linux' yet with whatever kernel.

The Linux version has a lot of problems ( AFAIK all known by the Wengophone

Gabriel

-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 11:34 am

It still has trouble with the x/gforce vs two niced encoders scenario.
The previously reported choppiness is still present.

I suspect that x/gforce landing in the expired array is the trouble, and
that this will never be smooth without some kind of exemption. I added
some targeted unfairness to .30, and it didn't help much at all.

Priorities going all the way to 1 were a surprise.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 5:13 pm

It wasn't going to change that case without renicing X. I said that from the
start to maintain fairness it's the only way to keep a fair design, and give
more cpu to X. The major difference in this one is the ability to run
different nice values without killing the latency of the relatively niced
ones.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 6:30 pm

Con. You are trying to wedge a fair scheduler into an environment where
totally fair simply can not possibly function.

If this is your final answer to the problem space, I am done testing,
and as far as _I_ am concerned, your scheduler is an utter failure.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 10:32 am

The increased AIM7 throughput (and the other benchmark results)
looked very promising to me.

I wonder what we're doing wrong in the normal scheduler...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
-

To: Rik van Riel <riel@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 11:39 am

there's a relatively easy way to figure out whether it's related to the
interactivity code: try AIM7 with SCHED_BATCH as well, to take most of
the 'interactivity effects' out of the picture.

build the attached setbatch.c code and do "./setbatch $$" to change the
shell to SCHED_BATCH (and all its future children will be SCHED_BATCH
too).
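
[Editor's sketch] The attached setbatch.c is not reproduced in this archive. A minimal sketch of what such a helper might look like, assuming it simply calls sched_setscheduler() with SCHED_BATCH on the given pid, is shown below. The function name set_batch is hypothetical.

```c
#include <sched.h>
#include <sys/types.h>

/* glibc only exposes SCHED_BATCH when _GNU_SOURCE is defined;
 * 3 is the value used by Linux. */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif

/*
 * Switch the given pid (0 means the calling process) to SCHED_BATCH.
 * Returns 0 on success, -1 on failure with errno set.
 */
int set_batch(pid_t pid)
{
    struct sched_param param = { .sched_priority = 0 };

    return sched_setscheduler(pid, SCHED_BATCH, &param);
}
```

A main() that converts its first argument to a pid and passes it to set_batch() would be enough to run it as "./setbatch $$" from the shell, as described above.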

Ingo

To: Rik van Riel <riel@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 10:43 am

Yeah, I tried AIM7 with both schedulers, but apparently you need size
mondo hardware. My poor little box produced identical results.

(I only noticed one other benchmark result, and IIRC it showed a ~5%
drop in throughput, which was attributed to higher context switch rate)

-Mike

-

To: <ck@...>
Cc: <linux-kernel@...>
Date: Friday, March 16, 2007 - 7:05 pm

I can not let this comment stay like that. I have an AMD X2 4400+ Dual Core
running Gentoo and now kernel 2.6.21-rc3 with RSDL 0.30 (HZ=300).
Up till now whenever I wanted to watch a movie i had to stop compiling with
more than one task for the movie to run without skips. When playing games i
have to renice the game (-15-) or else it would get 'choppy'.
With the new RSDL i compile packages with -j3 (reniced to 15), my wife lets up
to 8 computations (scientific computations) running at the same time and the
game and a movie still run without any visible flaws. The only thing i saw
till now was that the mouse cursor was a little less responsive and scrolling
in firefox took a little longer. But amarok for music, the movie in mplayer,
the 3d game, everything went smooth though a load of > 11. This all without
even renicing anything but the compiles. With mainline kernel already
watching a movie with this load was impossible.
I used the staircase scheduler before RSDL but even with staircase such
overload was not possible while watching a movie.
Mike, maybe use higher nice levels for your encoders or just use one. Or maybe
check your memory, i guess if the memory bandwidth is too low there's no
scheduler which can foresee such thing and react accordingly. Since you have
a HT system it's just one physical ALU, so everything has to be squeezed onto
this one ALU, up to a certain degree it works, but not forever. And the lame
encoders i suppose won't wait that very much and long for their data to get
delivered from memory so they'll utilize the ALU quite a lot.
Con, continue your scheduler development as it helps many cases which were not
possible otherwise. I'm amazed of the ability of the scheduler to handle a 5
times overloaded system without too much hassle.
Great work Con.

Dirk.

PS: Con, don't stress your neck too much, your health is the only thing you
have to keep f...

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 12:24 am

Sorry, I haven't really been following this thread and now I'm confused.

You're saying that it's somehow the scheduler's fault that X isn't
running with a high enough priority?

--
Nicholas Miell <nmiell@comcast.net>

-

To: Nicholas Miell <nmiell@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 1:56 am

I'm saying that the current scheduler adjusts for interactive loads,
this new one doesn't. I'm seeing interactivity regressions, and they
are not fixed with nice unless nice is used to maximum effect. I'm
saying yes, I can lower my expectations, but no I don't want to.

A four line summary is as short as I can make it.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 2:26 am

Uh, no. Essentially, the current scheduler works around X's brokenness,
in an often unpredictable manner.

RSDL appears to be completely deterministic, which is a very strong
virtue.

The X people have plans for how to go about fixing this, but until then,
there's no reason to hold up kernel development.

--
Nicholas Miell <nmiell@comcast.net>

-

To: Nicholas Miell <nmiell@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:56 am

No. The two schedulers simply use different heuristics. RSDL uses _less_
heuristics, and thus gets some workloads right that the heuristics in
the current scheduler got wrong. But it also gets some other workloads
wrong.

so basically, the current scheduler has a built-in "auto-nice" feature,
while RSDL relies more on manual assignment of nice values.

if you want no heuristics at all you can do it in the current scheduler:
use SCHED_BATCH on your shell and start up X with that. I'd not mind
tweaking SCHED_BATCH with an RSDL-alike timeslice quota system.

so it is not at all clear to me that RSDL is indeed an improvement, if
it does not have comparable auto-nice properties.

Ingo
-

To: <ck@...>
Cc: Ingo Molnar <mingo@...>, Nicholas Miell <nmiell@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Saturday, March 17, 2007 - 7:07 am

Wasn't the point of RSDL to get rid of the auto-nice, because it caused
starvation, unpredictable behaviour and other problems?

Anyway, I think it's a good thing we keep having a look at mike's problem, but
it's not clear to me how far he got in solving it. Does the latest patch
solve the interactivity problem, providing X is niced -10 (or something)???

If it does, I think that's the solution - at least until the X ppl fix X
itself. Distributions can just go back renicing X (they did that before,
after all), and the biggest problem is fixed. Then all other users can have
the improvements RSDL offers, the developers can rejoice over the simpler and
cleaner design and code, and everybody is happy.

If it doesn't solve the problem, more work is in order. I think ignoring a
clear regression to mainline, no matter how rare, isn't smart. It might
indicate an underlying problem, and even if it doesn't - you don't want ppl
complaining the new kernel isn't interactive anymore or something...

/Jos

--
Disclaimer:

Everything I do, think and say is based on the world view I have right now.
I am not responsible for changes to the world, or to my view of it, nor
for any behaviour of mine that follows from them.
Everything I say is meant kindly, unless explicitly stated otherwise.

To: jos poortvliet <jos@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Saturday, March 17, 2007 - 8:44 am

it doesn't really get rid of it, it replaces it with another mechanism
that is fundamentally unfair too.

RSDL has _another_, albeit more hidden "auto-nice" behavior: this time
expressed not via the plain manipulation of priorities based on the
sleep average, but expressed via the quota-depletion flux of tasks over
time, fed into a complex dance of rotating priorities - which
quota-depletion flux is in essence a sleep average too, just more
derived and more hardcoded.

or looking at it from another angle, code size:

   text    data     bss     dec     hex filename
  15750      24    6008   21782    5516 sched.o.vanilla
  15960     360    6336   22656    5880 sched.o.rsdl

there's no reduction in complexity, it just moved elsewhere.
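
Reading the two `size` rows as differences makes the comparison concrete. This is a small sketch that only redoes the arithmetic on the figures quoted above; the dictionary and variable names are just for illustration.

```python
# Figures copied from the `size` output above (all in bytes).
vanilla = {"text": 15750, "data": 24, "bss": 6008}
rsdl = {"text": 15960, "data": 360, "bss": 6336}

# Per-section growth of sched.o when going from vanilla to RSDL.
delta = {section: rsdl[section] - vanilla[section] for section in vanilla}

total_vanilla = sum(vanilla.values())  # matches the "dec" column
total_rsdl = sum(rsdl.values())
growth_pct = 100.0 * (total_rsdl - total_vanilla) / total_vanilla

print(delta)
print(f"{total_vanilla} -> {total_rsdl} bytes ({growth_pct:+.1f}%)")
```

By these numbers the object grows by 874 bytes, roughly 4%, and most of that growth is in data and bss rather than text.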

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Saturday, March 17, 2007 - 9:44 am

Hmmm. I wonder, then, does RSDL give an advantage in the areas I mentioned
(starvation and predictability)?

RSDL does give equal timeslices (eg equal cpu time) to each process - it's
just that processes which didn't use their time yet can quickly run, right?
Now I might not understand things here, but that sounds more fair, though
you're more qualified to judge that. So, for me, as a user, it boils down
to: does this solve problems, and does it introduce problems?

I must say I compare RSDL to staircase, as that's what I'm used to - a bit
more interactive compared to mainline. RSDL does slightly worse AND slightly
better - worse in interactivity on heavy loads (8 makes running on my
dualcore) but it doesn't have the systemwide stalls sometimes occurring on
both mainline and staircase - though it replaces them with shorter,


To: jos poortvliet <jos@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Nicholas Miell <nmiell@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Saturday, March 17, 2007 - 10:04 am

Ingo,

The other point to make here is that you only need to nice X if you are heavily
overloading the box. Here X is NOT niced and RSDL 0.30 is giving me better
performance.

Ed Tomlinson
-

To: Nicholas Miell <nmiell@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:45 am

then we'll first have to wait for those X changes to at least be done in a
minimal manner so that they can be tested for real with RSDL. (is it
_really_ due to that? Or will X regress forever once we switch to RSDL?)
We cannot regress the scheduling of a workload as important as "X mixed
with CPU-intense tasks". And "in theory this should be fixed if X is
fixed" does not cut it. X is pretty much _the_ most important thing to
optimize the interactive behavior of a Linux scheduler for. Also,
paradoxically, it is precisely the improvement of _X_ workloads that
RSDL argues with.

this regression has to be fixed before RSDL can be merged, simply
because it is a pretty negative effect that goes beyond any of the
visible positive improvements that RSDL brings over the current
scheduler. If it is better to fix X, then X has to be fixed _first_, at
least in form of a prototype patch that can be _tested_, and then the
result has to be validated against RSDL.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Nicholas Miell <nmiell@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:44 am

why isn't niceing X to -10 an acceptable option?

if you overload the box enough things slow down, what scheduler avoids that?

where RSDL 'regresses' is with multiple CPU hogs running at once (more than the
number of real CPUs you have available) at the same priority, with one of them
being the X server process.

the initial report was that with X + 2 cpu hogs on 1.5 cpu's there's more of a
slowdown (even with a nice difference of 5 between X and the other processes)

the latest report is that with X and 11 cpu hogs and a nice difference of 10
things slow down (I don't remember the number of cpu's from this report)

how much of an overload should the scheduler adjust for? a load of 3xcpu's?
10x cpu's? what would be deemed acceptable?

if the key factor in a scheduler is to be able to run multiple CPU hogs at the
same time as the X process, why not just check the name of the process running,
and if it's X give it a nice boost?

while that would be easy to abuse, it would at least be predictable.

if the nice levels don't have enough of an effect, how much of an effect should
a given nice level have? (con has asked this several times, I haven't seen an
answer from anyone)

David Lang
-

To: David Lang <david.lang@...>
Cc: Ingo Molnar <mingo@...>, Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 4:46 am

Xorg's priority is only part of the problem. Every client that needs a
substantial quantity of cpu while a hog is running will also need to be

(Hmm. What's overload in a multi-tasking multi-threaded world? I'm
always going to have more tasks available than cpus at some time. With

I see interactivity regression with both X and client at nice -10 in the
presence of any cpu hog load. Maybe a bug lurks. Maybe it's fairness.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: David Lang <david.lang@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, <ck@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 10:09 am

On Sat, 17 Mar 2007 09:46:27 +0100

I don't suppose you can be a bit more specific, and define how much CPU
constitutes a "substantial quantity"? It looks to me like X already got

I'm hoping that actually quantifying this issue will result in a better
understanding of the issue...

Thanks,

Mark
-

To: Mark Glines <mark-ck@...>
Cc: David Lang <david.lang@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, <ck@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 10:33 am

This is a snippet from a hacked up by me version of RSDL.30, not stock.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: David Lang <david.lang@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, <ck@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 10:54 am

On Sat, 17 Mar 2007 15:33:41 +0100

Oops. Thanks, sorry about my confusion. What does it look like without
your patches? (I'm not sure if you've already sent this... I can't
find any in the list archives.)

Mark
-

To: Mark Glines <mark-ck@...>
Cc: David Lang <david.lang@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, <ck@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 10:58 am

If you go to the beginning of the thread, I described the test load.

(I don't want to rehash... 'nuff turbulence)

-Mike

-

To: Ingo Molnar <mingo@...>
Cc: Nicholas Miell <nmiell@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, <ck@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Saturday, March 17, 2007 - 4:41 am

Let me restate the fact, if it wasn't obvious enough, that most people
who tried RSDL (and most of them use desktop systems, myself included) never
saw any regressions compared to mainline. Quite the contrary -- their impression
was that with RSDL a desktop system runs more smoothly, even under fierce load,
which was never possible with the mainline scheduler.

(see http://article.gmane.org/gmane.linux.kernel/504068 for a list
of references.)
-

To: <ck@...>
Cc: Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 5:48 am

Well despite being in a drug induced stupor I find I have to reply on this
thread. Hopefully I'm not doing my code a disservice by doing so. Who knows,
maybe I make more sense?

The most frustrating part of a discussion of this nature on lkml is that
earlier information in a thread seems to be long forgotten after a few days
and all that is left is the one reporter having a problem. That's not to deny
the one user is having a problem, but when you have a thousand downloads (no
exaggeration) and only one person remains reporting badness it's frustrating
that the problem actually comes down to one of semantics rather than a bug
(will I nice or won't I).

So in an attempt to summarise the situation, what are the advantages of RSDL
over mainline?

Fairness
Starvation free
Much lower and bound latencies
Deterministic
Better interactivity for the majority of cases.

Now concentrating on the very last aspect since that seems to be the sticking
point.

I won't try and estimate what percentage is better, but overall it is _far_
more, _not_ less. The few scenarios where mainline remains better are
unpredictable. This is where it gets interesting, because unlike mainline,
which does not have a good solution for the rest of the problems, all it
takes is to renice X and then you have RSDL outperforming virtually always.

As for SCHED_BATCH on mainline, I think you'll find it is NOT as deterministic
as you believe, leads to woeful interactivity, and still is starveable (just
sleep just before your timeslice runs out). That is not a valid solution I'm
sorry to say.

Despite the claims to the contrary, RSDL does not have _less_ heuristics, it
does not have _any_. It's purely entitlement based.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: <linux-kernel@...>
Date: Saturday, March 17, 2007 - 11:13 am

even starvation is sometimes a good thing - there's a place for processes
that only use the CPU if it is otherwise idle. that is, they are

in an average sense? also, under what circumstances does this actually
matter? (please don't offer something like RT audio on an overloaded machine-

not a bad thing, but how does this make itself apparent and of value
to the user? I think everyone is extremely comfortable with non-determinism

how is this measured? is this statement really just a reiteration of

nah, I think the fairness and latency claims are the real issues.
-

To: Mark Hahn <hahn@...>
Cc: Con Kolivas <kernel@...>, <linux-kernel@...>
Date: Monday, March 19, 2007 - 11:06 am

Just so you know the context, I'm coming at this from the point of view
of an embedded call server designer.

Fairness is good because it promotes predictability. See the

If you have nice 19 be sufficiently low priority, then the difference
between "using cpu if otherwise idle" and "gets a little bit of cpu even
if not totally idle" is unimportant.

In my environment, latency *matters*. If a packet doesn't get processed
in time, you drop it. With mainline it can be quite tricky to tune the
latency, especially when you don't want to resort to soft realtime
because you don't entirely trust the code that's running (because it came

Determinism is really important. It almost doesn't matter what the
behaviour is, as long as we can predict it. We model the system to
determine how to tweak the system (niceness, sched policy, etc.), as
well as what performance numbers we can advertise. If the system is
non-deterministic, it makes this modelling extremely difficult--you end
up having to give up significant performance due to worst-case spikes.

If the system is deterministic, it makes it much easier to predict its
actions.

Chris
-

To: Mark Hahn <hahn@...>
Cc: Con Kolivas <kernel@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 1:22 pm

I guess I wonder what is wrong with the current scheduler? I am running
2.6.20.2 on a whitebook laptop with a 1.86GHz Core 2 Duo, 2GB of memory and
an Intel 945 shared-memory graphics processor. I am currently running X with
beryl and have vncserver connected to my main system, which I am using to
write this email. I am also running firefox and both of Ingo's test programs,
plus a make -j4 kernel rebuild, without any pauses or any problem doing
anything interactively.

So again what does this new scheduler fix?

Steve

--

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)

-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 7:49 am

RSDL very much still has heuristics, but this time they are hardcoded into
the design! Let me demonstrate this via a simple experiment.

in the vanilla scheduler, the heuristics are on top of a fairly basic
(and fast) scheduler, they are plainly visible and thus 'optional'. In
RSDL, the heuristics are still present but more hidden and more
ingrained into the design.

But it's easy to demonstrate this under RSDL: consider the following two
scenarios, which implement precisely the same fundamental computing
workload (everything running on the same, default nice 0 level):

1) a single task runs almost all the time and sleeps about 1 msec every
100 msecs.

[ run "while N=1; do N=1; done &" under bash to create such a
workload. ]

2) tasks are in a 'ring' where each runs for 100 msec, sleeps for 1
msec and passes the 'token' around to the next task in the ring. (in
essence every task will sleep 9900 msecs before getting another run)

[ run http://redhat.com/~mingo/scheduler-patches/ring-test.c to
create this workload. If the 100 tasks default is too much for you
then you can run "./ring-test 10" - that will show similar effects.
]

Workload #1 uses 100% of CPU time. Workload #2 uses 99% of CPU time.
They both do in essence the same thing.

if RSDL had no heuristics at all then if i mixed #1 with #2, both
workloads would get roughly 50%/50% of the CPU, right? (as happens if i
mix #1 with #1 - both CPU-intense workloads get half of the CPU)

in reality, in the 'ring workload' case, RSDL will only give about _5%_
of CPU time to the #1 CPU-intense task, and will give 95% of CPU time to
the #2 'ring' of tasks. So the distribution of timeslices is
significantly unfair!
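
The figures in this experiment can be checked with a few lines of arithmetic. This sketch uses only the numbers given above (100 tasks, 100 msec of running, 1 msec of sleep, the reported 5%/95% split); it models nothing about RSDL itself, and the variable names are illustrative.

```python
RUN_MS = 100      # each ring task runs this long per turn
SLEEP_MS = 1      # then sleeps this long before passing the token
RING_TASKS = 100  # default ring size mentioned above

# While one task holds the token, the other 99 are asleep.
wait_ms = (RING_TASKS - 1) * RUN_MS

# Fraction of wall time the ring as a whole wants a CPU.
ring_util = RUN_MS / (RUN_MS + SLEEP_MS)

# At any instant only two tasks want the CPU: the hog from workload #1
# and whichever ring task currently holds the token.
runnable = 2
expected_hog_share = 1 / runnable  # naive per-task expectation
reported_hog_share = 0.05          # figure reported in this message

print(f"each ring task waits about {wait_ms} msec between runs")
print(f"ring utilisation: {ring_util:.1%}")
print(f"hog share: expected {expected_hog_share:.0%}, "
      f"reported {reported_hog_share:.0%}")
```

The gap between the expected 50% and the reported 5% is the unfairness being pointed out here.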

Why? Because RSDL still has heuristics, just elsewhere and more hidden:
in the "straightforward CPU intense task" case RSDL will 'penalize' the
task by depleting its quota for running nearly all the time, in the
"ring of tasks" case the 100 tasks will each run near ...

To: Ingo Molnar <mingo@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 4:41 pm

Well, the heuristic here is that process == job. I'm not sure heuristic
is the right name for it, but it does point out a deficiency.

A cpu-bound process with many threads will overwhelm a cpu-bound
single-threaded process.

A job with many processes will overwhelm a job with a single process.

A user with many jobs can starve a user with a single job.

I don't think the problem here is heuristics, rather that the
scheduler manages cpu quotas at the task level rather than at the
user-visible level. If scheduling were managed at all three levels I
mentioned ('job' is a bit artificial, but process and user are not) then:

- if N users are contending for the cpu on a multiuser machine, each
should get just 1/N of available cpu power. As it is, a user can run a
few of your #1 workloads (or a make -j 20) and slow every other user down
- your example would work perfectly (if we can communicate to the kernel
what a job is)
- multi-threaded processes would not get an unfair advantage
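
To make the difference between per-task and hierarchical fairness concrete, here is a toy calculation. It is only a sketch: it assumes every task is CPU-bound and always runnable, splits time evenly at each level, and uses made-up users ("alice", "bob") and job names. No existing kernel interface is implied.

```python
from collections import defaultdict


def per_task_shares(tasks):
    """Every runnable task gets an equal slice, whoever owns it."""
    return {task: 1 / len(tasks) for task in tasks}


def hierarchical_shares(tasks):
    """Split the CPU evenly among users, then among each user's jobs,
    then among each job's tasks.

    `tasks` maps a task name to a (user, job) pair.
    """
    jobs_by_user = defaultdict(lambda: defaultdict(list))
    for task, (user, job) in tasks.items():
        jobs_by_user[user][job].append(task)

    shares = {}
    user_share = 1 / len(jobs_by_user)
    for jobs in jobs_by_user.values():
        job_share = user_share / len(jobs)
        for members in jobs.values():
            for task in members:
                shares[task] = job_share / len(members)
    return shares


# One user with a single cpu hog, another running a 20-way parallel make.
tasks = {"alice-hog": ("alice", "hog")}
tasks.update({f"bob-cc{i}": ("bob", "make") for i in range(20)})

flat = per_task_shares(tasks)
tree = hierarchical_shares(tasks)
print(f"alice, per task:     {flat['alice-hog']:.3f}")
print(f"alice, hierarchical: {tree['alice-hog']:.3f}")
```

With per-task fairness the single hog is squeezed down to 1/21 of the cpu; with the hierarchical split it keeps half, and the 20 make tasks divide the other half between them.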

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: Avi Kivity <avi@...>
Cc: Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 9:25 pm

I like this notion very much. I should probably mention pgrps' typical
association with the notion of "job," at least as far as shells go.

One issue this raises is prioritizing users on a system, threads within
processes, jobs within users, etc. Maybe sessions would make sense, too,
and classes of users, and maybe whatever they call the affairs that pid
namespaces are a part of (someone will doubtless choke on the hierarchy
depth implied here but it doesn't bother me in the least). It's not a
deep or difficult issue. There just needs to be some user API to set the
relative scheduling priorities of all these affairs within the next higher
level of hierarchy, regardless of how many levels of hierarchy (aleph_0?).

-- wli
-

To: William Lee Irwin III <wli@...>
Cc: Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 1:00 am

I think it follows naturally.

Note that more than the scheduler needs to be taught about this. The
page cache and swapper should prevent a user from swapping out too many
of another user's pages when there is contention for memory; there
should be per-user quotas for network and disk bandwidth, etc. Until
then people who want true multiuser with untrusted users will be forced
to use ugly hacks like virtualization. Fortunately it seems the
container people are addressing at least a part of this.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: William Lee Irwin III <wli@...>
Cc: Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 9:32 pm

Doing some "classing" even by just euid might be a good idea. It would
actually catch X automatically most of the time, because the euid of the X
server is likely to be root, so even for the "trivial" desktop example, it
would kind of automatically mean that X would get about 50% of CPU time
even if you have a hundred user clients, just because that's "fair" by
euid.
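
The 50% figure follows directly if time is split per euid first and per task second. A quick sketch of that arithmetic, assuming all tasks are runnable all the time; the function name and the euid 1000 for the desktop user are arbitrary choices for illustration.

```python
from collections import Counter


def euid_class_shares(euids):
    """CPU share of each task when time is split evenly per euid first,
    then evenly among the tasks running under that euid.

    `euids` holds one entry (the euid) per runnable task.
    """
    tasks_per_euid = Counter(euids)
    class_share = 1 / len(tasks_per_euid)
    return [class_share / tasks_per_euid[euid] for euid in euids]


# X running as root (euid 0) plus a hundred clients under one user.
euids = [0] + [1000] * 100
shares = euid_class_shares(euids)

x_share = shares[0]
client_share = shares[1]
flat_share = 1 / len(euids)  # plain per-task fairness, for comparison

print(f"X: {x_share:.1%} by euid vs {flat_share:.1%} per task")
print(f"each client: {client_share:.2%} by euid")
```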

Dunno. I guess a lot of people would like to then manage the classes,
which would be painful as hell.

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: William Lee Irwin III <wli@...>, Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 1:24 am

Warning: all these ideas seem interesting for desktop, but are definitely
not for servers. I found RSDL to be excellent on servers, compared to
mainline in which some services are starving under load. I can understand
that on the desktop people want some unfairness, and I like the pgrp idea
for instance. But this one will certainly fail on servers, or make the
admins get grey hair very soon.

Maybe we're all discussing the problem because we have reached the point
where we need two types of schedulers : one for the desktop and one for
the servers. After all, this is already what is proposed with preempt,
it would make sense provided they share the same core and avoid ifdefs
or unused structure members. Maybe adding OPTIONAL unfairness to RSDL
would help some scenarios, but in any case it is important to retain

Sure! I wouldn't like people to point the finger at Linux saying "hey
look, they can't write a good scheduler so you have to adjust the knobs
yourself!". I keep in mind that Solaris' scheduler is very good, both
fair and interactive. FreeBSD was good (I haven't tested for a long time).
We should manage to get something good for most usages, and optimize
later for specific uses.

Regards,
Willy

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, William Lee Irwin III <wli@...>, Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 2:26 am

Bingo.

-

To: <linux-kernel@...>
Cc: Willy Tarreau <w@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, William Lee Irwin III <wli@...>, <ck@...>, Avi Kivity <avi@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 2:54 am

Sounds like Staircase's interactive mode switch, except this actually
requires writing additional code.

The per-user system would also be nice for servers, provided there are
CPU/disc IO/swapper/... quotas or priorities at least.

All in all, I'd hate to see mldonkey eating 1/3 of CPU time, just
because it runs as another user.
-

To: Radoslaw Szkodzinski <astralstorm@...>
Cc: <linux-kernel@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, William Lee Irwin III <wli@...>, <ck@...>, Avi Kivity <avi@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 3:58 am

This is too hard to adjust. Imagine what would happen to your hundreds of
apache processes when the "backup" user will start the rsync or tar+gzip,
or when user "root" will start rotating and compressing the logs. Being
able to group processes may be useful on servers, but it should be enabled
on purpose by the admin.

Willy

-

To: Willy Tarreau <w@...>
Cc: Radoslaw Szkodzinski <astralstorm@...>, <linux-kernel@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, William Lee Irwin III <wli@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 4:45 am

Sure, if implemented, it should default to the old behavior to avoid
surprises. Maintenance jobs may have to be niced to avoid getting too
much cpu. But it should also make hosting different applications on the
server much more predictable and easier (think of your hundreds of
apache processes swamping an unrelated load when slashdotted).

--
error compiling committee.c: too many arguments to function

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, William Lee Irwin III <wli@...>, Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>, Bill Huey (hui) <billh@...>
Date: Sunday, March 18, 2007 - 2:09 am

Like I've said in a previous email, SGI schedulers have an interactive
term in addition to the normal "nice" values. If RSDL ends up being too
rigid for desktop use, then this might be a good idea to explore in
addition to priority manipulation.

However, it hasn't been completely proven that RSDL can't handle desktop
loads and that needs to be completely explored first. It certainly seems,
from the .jpgs that were posted earlier in the thread regarding mysql
performance, that RSDL has improved performance for those setups,
so it's not universally the case that it sucks for server loads. The
cause of this performance difference has yet to be pinpointed.

Also, bandwidth schedulers like this are a critical new development for
things like the -rt patch. It would benefit greatly if the basic RSDL
mechanisms (RR and deadlines) were to somehow slip into that patch and
be used for a stricter -rt based scheduling class. It would be the basis
for first-class control over process resource usage and would be a first
in Linux or any mainstream kernel.

This would be a powerful addition to Linux as a whole and RSDL should
not be dismissed without these considerations. If it can somehow be
integrated into the kernel with interactivity concerns addressed, then
it would be an all out win for the kernel in both these areas.

bill

-

To: <linux-kernel@...>
Cc: <ck@...>
Date: Monday, March 19, 2007 - 5:14 pm

I would say that RSDL is probably a bit better than default for server
use, although if the server starves for CPU interactive processing at
the console becomes leisurely indeed. The only thing I would like to
address is the order of magnitude blips in latency of nice processes,
which may be solved by playing with time slices. Con hasn't really

I don't think that RSDL and -rt should be merged, but that's for Ingo
and Con to discuss. I would love to see RSDL in mainline as soon as it
I don't think there are a lot of places where it underperforms the
default scheduler, and it avoids a lot of jackpot cases where an
overloaded system really bogs down. I would like to see more varied
testing before any changes are made, unless a simple change would
improve consistency of latency.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

-

To: Bill Huey <billh@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, William Lee Irwin III <wli@...>, Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 2:37 am

I've done that already (ain't perfect yet, maybe never be). The hard
part is making it automatic, and not ruining the good side of RSDL in
the process.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, William Lee Irwin III <wli@...>, Avi Kivity <avi@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>, Bill Huey (hui) <billh@...>
Date: Sunday, March 18, 2007 - 3:35 am

I can't fully qualify which aspects of the X server are creating this
problem. More experimentation is needed: various display drivers, etc.
should be played with to see what kind of problematic situations arise.
It's a bit too new, with too few users, to know what the specific
problems are just yet. Your case is too sparse to be a completely
exhaustive exploration of what's failing with this scheduler.

There's a policy decision that needs to be made on whether adding another
term to the scheduler calculation is blessed or not. My opinion is that
it should be. Meanwhile, we should experiment more with different
configurations.

bill

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, William Lee Irwin III <wli@...>, Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 1:55 am

I didn't suggest adding any unfairness! I suggested being fair by
user/job/process instead of being fair by thread (which is actually
unfair as it favors multi threaded processes over single threaded

I hope not. I think that reducing the timeslice base, combined with
renicing X all the way to hell should suffice.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

To: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Sunday, March 18, 2007 - 10:27 pm

Wouldn't that be unfair because it favors multi-user approaches over
single-user approaches with the same number of processes?

Consider two otherwise equivalent web server designs. They both use a helper
process owned by the user who owns the file the web server is sending. One
does a lot of work in the helper process, the other does very little. A
"fair by user" scheduler would give the approach that puts more work in the
helper process more CPU than the one that puts little work in the helper
process.

Being fair by user builds lots of assumptions into the scheduler. When
they're not true, the scheduler becomes sub-optimal. For example, consider a
web server that runs two very important tools, 'foo' and 'bar'. Rather than
running them as root, they run as users 'foo' and 'bar' for security. "Fair
to user" would mean that just because most other people are using 'foo', I
get less CPU when I try to use 'foo', because the OS doesn't know the "real
user", just the fake user who owns the process -- a security decision that
has no relationship to fairness. This would be handled perfectly by a "fair
to process" approach.

As for favoring multi-threaded processes over single-threaded processes,
sometimes that's what you want. Consider two servers, one using thread per
job the other using process per job. Does it make sense to give the "process
per job" server as much CPU to do a single task as the "thread per job"
server gets for all the clients it's dealing with?

It's really more important that the scheduler be tunable and predictable.
That way, we can tell it what we want and get it. But the scheduler cannot
read our minds.

DS

-

To: <davids@...>
Cc: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 11:25 am

[what happened to the 'To' header?]

A fairly contrived example, but I see your point. Of course any system
can be broken. I think that user-level scheduling is good for real
multi user systems, where 'user' means a person, not an artificial
entity. It's also good for a multi application server, where typically

Perhaps we need a scheduling class instead, defaulting to each user
being in its own class, and system processes in another (or maybe more
than one) class. Root can configure the relative priorities of the

That's why I wanted the job abstraction, which currently isn't
communicated to the kernel (or at least not well). Within a user's

For multiuser systems, it also has to provide predictable response to
unpredictable loads. RSDL accomplishes part of this by removing
heuristics. User-level scheduling does more by limiting the impact of a
single user to 1/N of the system's cpu capacity, and similarly limits
the impact of a single job to 1/number_of_active_jobs_for_this_user.
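
As a rough illustration of the two limits just described, the share of the machine one job can claim works out as below. This is only a sketch: the function name and its parameters are hypothetical, it assumes every user and every job is equally weighted, and it is not taken from RSDL or any posted patch.

```c
/*
 * Hypothetical two-level fair share: the CPU is first split evenly
 * between active users, then each user's slice is split evenly
 * between that user's active jobs.
 */
double job_cpu_share(int active_users, int active_jobs_for_user)
{
    if (active_users <= 0 || active_jobs_for_user <= 0)
        return 0.0;   /* nothing runnable, nothing to share */

    return 1.0 / active_users / active_jobs_for_user;
}
```

With 4 active users, a user running 2 jobs gets 1/8 of the cpu per job, no matter how many jobs the other three users start.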

--
error compiling committee.c: too many arguments to function

-

To: Avi Kivity <avi@...>
Cc: <davids@...>, Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 12:06 pm

For a not so contrived example, look at email delivery. Some mailservers do
all work as root (or some fixed email user).

Some servers will switch to the UID of the user receiving the message,
limiting the damage in case of buffer overflow etc. A fair amount of work
is then done as that user - running the message through virus/spam-checks
and then perhaps procmail.

Helge Hafting
-

To: Helge Hafting <helge.hafting@...>
Cc: <davids@...>, Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 12:37 pm

Actually that makes some sense with user level scheduling - delivering
email is charged to the recipient instead of to the system. But I agree
it's a surprising side effect and if this is ever implemented it should
be optional.

--
error compiling committee.c: too many arguments to function

-

To: <davids@...>
Cc: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 9:27 am

Not necessarily. Use GID rotations too.

Then, use a group quota. But checking that will be slower, and that

Not on desktop.

Typical multi-threaded workloads:
- apache
- some P2P clients
- some audio servers/applications (small number of threads)

^ ^ ^

Well, aren't we discussing desktops?

Not necessarily. You see, the processes themselves are schedulable,

This kind of scheduler, yes. Except it's much more tunable than a
simple fair or unfair scheduler, and much more suited to real-time

That's why the per-user or per-group part would have to be optional.
It just doesn't make much sense on single-user desktops.
Then, RSDL design or even RSDL+bonus could be used.

The bonus part would have to be really simple, e.g. priority
inheritance for pipes, startup priority boost for nice 0 tasks.
(warning - fork bombs. :P )
No sleep estimator. Maybe these would suffice?

The interactive bonus would be disabled by default, same as
per-user/per-group scheduling.

Some syscalls would have to be added, maybe using LSM framework?
-

To: Radoslaw Szkodzinski <astralstorm@...>
Cc: <davids@...>, Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 2:30 pm

how many multi-user desktops are there? most desktops that I have seen run just
about everything as a single user. I know that on mine, I don't want the
updatedb process that runs as 'nobody' out of cron to have the same percentage
of cpu as all the processes running as my userid.

David Lang
-

To: Ingo Molnar <mingo@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:02 am

We're obviously disagreeing on what heuristics are so call it what you like.

You're simply cashing in on the deep pipes that do kernel work for other
tasks. You know very well that I dropped the TASK_NONINTERACTIVE flag from
rsdl which checks that tasks are waiting on pipes and you're exploiting it.
That's not the RSDL heuristics at work at all, but you're trying to make it
look like it is the intrinsic RSDL system at work. Putting that flag back in
is simple enough when I'm not drugged. You could have simply pointed that out
instead of trying to make my code look responsible.

For the moment I'll assume you're not simply trying to make my code look bad
and that you thought there really was an intrinsic design problem, otherwise
I'd really be unhappy with what was happening to me.

--
-ck
-

To: <ck@...>
Cc: Con Kolivas <kernel@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:23 am

Well, re-reading his post, he has a point - one WOULD expect each of these
2 tasks to have an equal share of CPU, and if RSDL doesn't currently take
this pipe-thing into account, it might need some fixing. call it heuristics
or not (after all, how could one NOT say a scheduler uses heuristics of some
kind?).

Anyway, relax (you know getting angry won't help you getting better) and
remember this is email - not exactly a perfect way to communicate, esp in
the emotional area. I haven't said this anywhere else, as I'm waiting for
RSDL to be a bit more mature, but I have irritations with it as well - I
don't have the long full-system stalls I had with staircase (hail RSDL!)
but I do have more frequent, shorter stalls, when one app doesn't respond
for up to 10 seconds, while others just continue to work. So it's not
perfect yet, and when I have time, I'll try to find out what's wrong. BTW,
nice seems to help, but not entirely.

grtz

Jos

To: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Saturday, March 17, 2007 - 1:31 pm

I can't get myself to expect that no matter how hard I try. The scheduler
is, by design, fair to *tasks*. So why would one expect two different
approaches that do the same thing, but with different numbers of tasks, to
each get the same amount of CPU if they compete with each other? One would
expect the approach that uses the most tasks to get more CPU.

DS

-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:28 am

that could very well be so - it would be helpful if you could provide
your own rough definition for the term, so that we can agree on how to
call things?

[ in any case, there's no rush here, please reply at your own pace, as

Con, i am not 'cashing in' on anything and i'm not 'exploiting'
anything. The TASK_NONINTERACTIVE flag is totally irrelevant to my
argument because i was not testing the vanilla scheduler, i was testing
RSDL. I could have written this test using plain sockets, because i was
testing RSDL's claim of not having heuristics, i was not testing the
vanilla scheduler.

and i showed you a workload under _RSDL_ that clearly shows that RSDL is
an unfair scheduler too.

my whole point was to counter the myth of 'RSDL has no heuristics'. Of
course it has heuristics, which results in unfairness. (If it didnt have
any heuristics that tilt the balance of scheduling towards sleep-intense
tasks then a default Linux desktop would not be usable at all.)

so the decision is _not_ a puristic "do we want to have heuristics or
not", the question is a more practical "which heuristics are simpler,
which heuristics are more flexible, which heuristics result in better
behavior".

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:43 am

Ok but please look at how it appears from my end (illness aside).

I spent 3 years just diddling with scheduler code trying my hardest to find a
design that fixes a whole swag of problems we still have, and a swag of
problems we might get with other fixes.

You initially said you were pleased with this design.

..lots of code, testing, bugfixes and good feedback.

Then Mike has one testcase that most other users disagree is worthy of being
considered a regression. You latched onto that and basically called it a
showstopper in spite of who knows how many other positive things.

Then you quickly produce a counter patch designed to kill off RSDL with a
config option for mainline.

Then you boldly announce on LKML "is RSDL an "unfair" scheduler too?" with
some test case you whipped up to try and find fault with the design.

What am I supposed to think? Considering just how many problems I have
addressed and tried to correct with RSDL successfully I'm surprised that
despite your enthusiasm for it initially you have spent the rest of the time
trying to block it.

Please, either help me (and I'm in no shape to code at the moment despite what
I have done so far), or say you have no intention of including it. I'm
risking paralysis just by sitting at the computer right now so I'm dropping
the code as is at the moment and will leave it up to your better judgement as
to what to do with it.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, <ck@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 10:13 pm

No damn it! He's pointing out that you do have heuristics, they are just
built into the design. And of course he's whipping up test cases, how
else can anyone help you find corner cases where it behaves in an
unexpected or undesirable manner?

Actually I think Ingo has tried to help get it in, that's his patch
offered for CONFIG_SCHED_FAIR, lets people try it and all.

Now for something constructive... by any chance is Mike running KDE
instead of GNOME? I only had a short time to play because I had to look
at another problem in 2.6.21-rc3 (nbd not working), so the test machine
is in use. But it looked as if behavior was not as smooth with KDE. May
that thought be useful.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-

To: Bill Davidsen <davidsen@...>
Cc: Con Kolivas <kernel@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, <ck@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 1:37 am

Yes.

-Mike

-

To: <ck@...>
Cc: Mike Galbraith <efault@...>, Bill Davidsen <davidsen@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 6:58 am

Well, then, it might indeed be the KIOslave/pipe stuff. I experience
sometimes horrible behaviour in certain apps who use a lot of KIO slaves
(konqueror, kontact). Is there a solution to this? Cuz if there isn't, the
majority of the Linux Desktops are going to regress with RSDL...

--
Disclaimer:

Everything I do, think and say is based on the worldview I have now.
I am not responsible for changes to the world, or to the view I have of
it, nor for any resulting behaviour of my own.
Everything I say is meant kindly, unless explicitly stated otherwise.

To: Bill Davidsen <davidsen@...>
Cc: Con Kolivas <kernel@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, <ck@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 11:20 pm

Now i must say here, i use KDE, and have been testing 0.31, and i have
been observing all the effects, in contrast with vanilla and staircase.
this is on 2.6.20

I do not notice kde being slower, in fact i notice various interactivity
"speedups" compared to mainline.

first one i noticed (because i deliberately tested) was kicker. Kicker's
hide function was very very smooth during boot, and still is under load.
It is not entirely as smooth under vanilla during boot (i suspect an IO
issue); under staircase it is, however under huge loads it is not smooth
even under staircase.

then i started my konsole, and the one thing i immediately noticed was
that zsh started instantly, usually i can see/(feel) zsh starting, as in
it takes like 0.2 before my prompt comes. This is simply gone now with
rsdl, behavior used to be the same in vanilla/rsdl.

But the most interesting, and dare i say, completely unexpected things
are much more important.

I have for a long time had issues with tvtime, if i did stuff like move
windows, tvtime would drop frames, or simply hovering javascript stuff
on sites in konqueror, (this seemed to be introduced in 2.6.~5+), cause
in EARLY 2.6 i did not have this problem, but it was the same in
staircase and vanilla. But this is gone completely, tvtime no longer
drops any frames when doing this.

Another thing i noticed, which almost blew my mind as badly as with
tvtime, was with wine, and world of warcraft (and nvidia blob driver, but
this IS what many "desktop" users run). While loading a level, the
sound no longer skipped. This problem, afaik, EVERYBODY who runs wine
+wow has (unless they change the buffer size to something ridiculously
high, which annoys gameplay).

And more playing wow has shown me that rsdl seems to be doing an
extremely good job of not letting other tasks interfere. For example i
have spamassassin running very often (every minute, for lots of
accounts), and this usually kills all sorts of high performance opengl
stuff, causing severe stuttering, b...

To: Con Kolivas <kernel@...>
Cc: Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 12:34 pm

( i really think we should continue this debate after you get better.

I said that 2 years ago about the staircase scheduler and i am still
saying this about RSDL today. That doesnt make my position automatically
correct though :-) For example i wrote and maintained the 4g:4g patchset
for over 2 years and still that was no guarantee of it making sense
upstream ;) And it was a hell of a lot of work (much uglier and nastier
work than any scheduler hacking and tuning, believe me), and it was
thrown away as a cute but unnecessary complication we dont need. So
what? To me what matters is the path you walk, not the destination you
reach.

in terms of RSDL design, i like it, still i'm kind of asking myself
'couldnt something in this direction be done in a much simpler and less
revolutionary way'? For example couldnt we introduce per-priority level
timeslice quotas in the current scheme as well, instead of the very
simplistic and crude STARVATION_LIMIT approach? Furthermore, couldnt we
make the timeslices become smaller as the runqueue length increases, to
make starvation less of a problem? It seems like the problem cases with
the current scheduler arent so much centered around the interactivity
estimator, it is more that timeslices get distributed too coarsely,
while RSDL distributes timeslices in a more finegrained way and is thus
less susceptible to starvation under certain workloads.

in any case, regardless the technical picture, i do get nervous when a
group of people tries to out-shout clearly valid feedback like Mike's,
and i'll try to balance such effects out a bit. _Of course_ those people
that are not happy with the current scheduler have a higher likelihood
to try out another scheduler and be happy about it - but we should not
allow that natural bias (which, btw., could easily be the _truth_, so
i'm not assuming anything) to stand in the way of critical thinking. I
also get slightly nervous about what appears to be dubious technical
claims like "this ...

To: Ingo Molnar <mingo@...>
Cc: Con Kolivas <kernel@...>, Serge Belyshev <belyshev@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <vishpat@...>
Date: Saturday, March 17, 2007 - 11:23 pm

Yes. The "doorknob scheduler" was a scheduler which worked as follows:
the runnable processes in the system were put in a priority sorted list
and counted. Then the length of one cycle (turn) was divided by the
number of processes and that was the timeslice. In the next version
upper and lower limits were put on the length of a timeslice, so the
system didn't get eaten by context switches under load or be jerky under
light load. That worked fairly well.
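
The timeslice rule described above, including the later upper and lower limits, can be sketched roughly as follows. The function name, the millisecond units and the choice to hand a lone caller the maximum slice are all assumptions made for illustration; the original 1970 code is not available here.

```c
/*
 * Hypothetical "doorknob" timeslice: one full cycle (turn) is divided
 * evenly among the runnable processes, then clamped so that heavy load
 * does not degenerate into pure context switching and light load does
 * not produce very long, jerky slices.
 */
unsigned int doorknob_timeslice(unsigned int cycle_ms,
                                unsigned int nr_runnable,
                                unsigned int min_ms,
                                unsigned int max_ms)
{
    unsigned int slice;

    if (nr_runnable == 0)
        return max_ms;          /* assumption: idle system, longest slice */

    slice = cycle_ms / nr_runnable;
    if (slice < min_ms)
        slice = min_ms;         /* lower limit: bound context-switch cost */
    if (slice > max_ms)
        slice = max_ms;         /* upper limit: keep light load responsive */
    return slice;
}
```

With a 1000 ms cycle and limits of 5 ms and 200 ms, 10 runnable processes get 100 ms each, 1000 processes are held at the 5 ms floor, and 2 processes are capped at 200 ms.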

Then people got into creating unfairness to address what they thought
were corner cases, the code turned into a plumber's nightmare, and
occasional jackpot cases created occasional (non-reproducible) hangs.

Finally management noted that the peripherals cost three times as much
as the CPU, so jobs doing i/o should run first. That made batch run like
the clappers of hell, and actually didn't do all the bad things you
might expect. User input was waitio, disk was waitio, response was about
as good as it could be for that hardware.

===> it might be useful to give high priority to a process going from
waitio to runnable state, once only.

The "doorknob" term came from "everybody gets a turn" and the year was 1970.

>
> A job with many processes will overwhelm a job with a single process.
>
> A user with many jobs can starve a user with a single job.
>
> - multi-threaded processes would not get an unfair advantage

If we wanted to do this, a job would be defined as all children or
threads of the oldest parent process with a PPID of one. So if I logged
on and did
    make -j4
on a kernel, and someone else did:
    find /var -type f | xargs grep -l zumblegarfe
and someone else was doing:
    foo & mumble & barfe

We would all be equal. That's good! And there would be some recursive
scheduler which would pick a "job" and then a process, and run it. That
too is good!
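
One way to read that job definition is as a walk up the parent chain until a process whose PPID is 1 is reached. The sketch below does this over a plain array; the `struct proc` layout and the `job_of` name are hypothetical, and threads are assumed to appear in the table as entries whose parent is the owning process.

```c
#include <stddef.h>

/* Hypothetical, minimal process-table entry. */
struct proc {
    int pid;
    int ppid;
};

/*
 * Return the pid of the job leader for `pid`: the ancestor (or the
 * process itself) whose parent is init (PPID 1). Returns -1 if the
 * pid is unknown or the parent chain never reaches init.
 */
int job_of(const struct proc *tab, size_t n, int pid)
{
    size_t hops, i;

    for (hops = 0; hops <= n; hops++) {     /* bounded: no endless loops */
        const struct proc *p = NULL;

        for (i = 0; i < n; i++) {
            if (tab[i].pid == pid) {
                p = &tab[i];
                break;
            }
        }
        if (p == NULL)
            return -1;
        if (p->ppid == 1)
            return p->pid;
        pid = p->ppid;                      /* climb one level */
    }
    return -1;
}
```

A recursive scheduler of the kind described would then pick among distinct `job_of()` values first, and among the processes sharing that value second.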

But we have a mail server, and there are 671 threads with a socket and
POP3 user on each one, and they only ge...

To: <ck@...>
Cc: Ingo Molnar <mingo@...>, Con Kolivas <kernel@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:15 am

Isn't RSDL fair to each task? So each of the 101 tasks (1 and 2) gets an
equal share... That doesn't sound unfair to me. if the current scheduler
manages to give task one 10 times more cpu, that wouldn't be fair.

I don't see RSDL having heuristics here - it just gives each task an equal

I guess I just don't get it, would the current kernel give 50% cpu to the
single thread, and 0.5% to each of the 100 other threads?!? How would it do
that? Does it schedule a process with several threads equal to a process
having 1 thread, so each gets an equal share of cpu?
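
The arithmetic behind this question can be put as a small sketch. Under strictly per-task fairness, a workload's total CPU is simply its fraction of the runnable tasks; the helper name below is hypothetical and assumes all tasks run at the same nice level and are always runnable.

```c
/*
 * Hypothetical: total CPU fraction received by a group of tasks when
 * the scheduler is fair to individual tasks and knows nothing about
 * which tasks belong together.
 */
double task_fair_group_share(int group_tasks, int total_tasks)
{
    if (group_tasks < 0 || total_tasks <= 0 || group_tasks > total_tasks)
        return 0.0;   /* invalid input */

    return (double)group_tasks / (double)total_tasks;
}
```

For the 101-task case, the single task gets 1/101 (about 1%) and the 100 tasks together get 100/101 (about 99%). The 50% versus 0.5% split asked about would only come out of a scheduler that first divides the CPU between the two workloads and only then between the tasks inside each one.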

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 5:58 am

One? I'm not the only person who reported regression.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 12:03 pm

Ditto here. I'm not the only one after all!
Reverted back to stock scheduler, and desktop is interactive again.

-ml
-

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 4:55 pm

Mike, I'm not saying RSDL is perfect, but v0.31 is by far better than
mainline. Try this easy test:

startx with the vesa driver
run reflect from the mesa5.0-demos
load 5 cpu-hogs
start moving the mouse

On my desktop, mainline completely breaks down, and no nicing may rescue.

On RSDL, even without nicing, the desktop is at least useable.

What we need is constructive criticism to improve the situation, either with
mainline or with RSDL. And for now RSDL is better.

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 12:07 pm

I use a simpler, far more common (for lkml participants) workload:

Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
(1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
(2) try to read email and/or surf in Firefox/Thunderbird.

Stock scheduler wins easily, no contest.

Cheers
-

To: Mark Lord <lkml@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 4:53 pm

Try this on RSDL:

--- sched.bak.c 2007-03-16 23:07:23.000000000 +0300
+++ sched.c 2007-03-19 23:49:40.000000000 +0300
@@ -938,7 +938,11 @@ static void activate_task(struct task_st
                     (now - p->timestamp) >> 20);
     }
 
-    p->quota = rr_quota(p);
+    /*
+     * boost factor hardcoded to 5; adjust to your liking
+     * higher means more likely to DoS
+     */
+    p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
     p->prio = effective_prio(p);
     p->timestamp = now;
     __activate_task(p, rq);
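
To get a feel for the size of that boost, the expression can be pulled out into a standalone function. This is only a sketch: it assumes `now` and `p->timestamp` are nanosecond clocks (so `>> 20` divides by 1048576, roughly converting to milliseconds), and `boosted_quota`, `BOOST_FACTOR` and the plain integer `base_quota` are stand-ins for the kernel's `rr_quota(p)` machinery, not real kernel interfaces.

```c
#include <stdint.h>

#define BOOST_FACTOR 5   /* same hardcoded factor as in the patch */

/*
 * Hypothetical userspace copy of the quota computation: the longer a
 * task slept before being activated, the more quota it is handed.
 */
uint64_t boosted_quota(uint64_t base_quota,
                       uint64_t now_ns,
                       uint64_t timestamp_ns)
{
    /* ns >> 20 is ns / 1048576, i.e. approximately milliseconds */
    uint64_t slept = (now_ns - timestamp_ns) >> 20;

    return base_quota + slept * BOOST_FACTOR;
}
```

Under these assumptions, a task that slept for one second and has a base quota of 6 comes back with 6 + 953 * 5 = 4771, which is why a long-sleeping task that turns into a cpu-hog can monopolize the machine.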

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: <linux-kernel@...>
Date: Tuesday, March 20, 2007 - 3:50 pm

i've tried this and it lasted only a few minutes -- i was seeing
mouse cursor stalls lasting almost 1s. i/o bound tasks starving X?
After reverting the patch everything is smooth again.

artur

-

To: Artur Skawina <art_k@...>
Cc: <linux-kernel@...>
Date: Wednesday, March 21, 2007 - 12:15 am

This patch wasn't really meant for production, as any sleeping background
proc turned cpu-hog may DoS the system.

If you like to play with this, then you probably want to at least reset the
quota in its expiration.

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: <linux-kernel@...>, Con Kolivas <kernel@...>
Date: Wednesday, March 21, 2007 - 1:24 pm

well, the problem is that i can't reproduce the problem :) I tried
the patch because i suspected it could introduce regressions, and it
did. Maybe with some tuning a reasonable compromise could be found,
but first we need to know what to tune for... Does anybody have a
simple reproducible way to show the scheduling regressions of RSDL
vs mainline? ie one that does not involve (or at least is
independent of) specific X drivers, binary apps etc. Some reports
mentioned MP, is UP less susceptible?

I've now tried a -j2 kernel compilation on UP and in the not niced
case X interactivity suffers, which i guess is to be expected when
you have ~5 processes competing for one cpu (2*(cc+as)+X). "nice -5"
helps a bit, but does not eliminate the effect completely. Obviously
the right solution is to nice the makes, but i think the scheduler
could do better, at least in the case of almost idle X (once you
start moving windows etc it becomes a cpuhog just like the
compiler). I'll look into this, maybe there's a way to prioritize
often sleeping tasks which can not be abused.
Another thing is the nice levels; right now "nice -10" means ~35%
and "nice -19" gives ~5% cpu; that's probably 2..5 times too much.

artur
-

To: Mark Lord <lkml@...>
Cc: Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 12:26 pm

What happens when you renice X ?

Xav

-

To: Xavier Bestel <xavier.bestel@...>
Cc: Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 12:36 pm

Dunno -- not necessary with the stock scheduler.
Nicing the "make" helped with RSDL, though.
But again, the stock scheduler "just works" in that regard.

I agree with Ingo -- the auto-renice feature is something
very useful for desktops, and is missing from RSDL.

Cheers
-

To: Mark Lord <lkml@...>
Cc: Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 12:43 pm

Could you try something like renice -10 $(pidof Xorg) ?

Xav

-

To: Xavier Bestel <xavier.bestel@...>
Cc: Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Monday, March 19, 2007 - 11:11 pm

Could you try something as simple as accepting that maybe this is a
problem?

Quite frankly, I was *planning* on merging RSDL very early after 2.6.21,
but there is one thing that has turned me completely off the whole thing:

- the people involved seem to be totally unwilling to even admit there
might be a problem.

This is like alcoholism. If you cannot admit that you might have a
problem, you'll never get anywhere. And quite frankly, the RSDL proponents
seem to be in denial ("we're always better", "it's your problem if the old
scheduler works better", "just one report of old scheduler being better").

And the thing is, if people aren't even _willing_ to admit that there may
be issues, there's *no*way*in*hell* I will merge it even for testing.
Because the whole and only point of merging RSDL was to see if it could
replace the old scheduler, and the most important feature in that case is
not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO
FIX THE INEVITABLE PROBLEMS!

See?

Can you people not see that the way you're doing that "RSDL is perfect"
chorus in the face of people who report problems, you're just making it
totally unrealistic that it will *ever* get merged.

So unless somebody steps up to the plate and actually *talks* about the
problem reports, and admits that maybe RSDL will need some tweaking, I'm
not going to merge it.

Because there is just _one_ thing that is more important than code - and
that is the willingness to fix the code...

Linus
-

To: Linus Torvalds <torvalds@...>
Cc: Xavier Bestel <xavier.bestel@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 9:22 am

Not to mention that it seems to only be tested thus far
by a very vocal and supportive core. It needs much wider
exposure for much longer before risking it in mainline.
It likely will get there, eventually, just not yet.

I've dropped it from my machine -- interactive response is much
more important for my primary machine right now.

I believe Ingo's much simpler hack produces as good/bad results
as this RSDL thingie, and with one important extra:
it can be switched on/off at runtime.

----->forwarded message:

Subject: [patch] CFS scheduler: Completely Fair Scheduler
From: Ingo Molnar <mingo@elte.hu>

add the CONFIG_SCHED_FAIR option (default: off): this turns the Linux
scheduler into a completely fair scheduler for SCHED_OTHER tasks: with
perfect roundrobin scheduling, fair distribution of timeslices combined
with no interactivity boosting and no heuristics.

a /proc/sys/kernel/sched_fair option is also available to turn
this behavior on/off.

if this option establishes itself amongst leading distributions then we
could in the future remove the interactivity estimator altogether.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/sched.h | 1 +
kernel/Kconfig.preempt | 9 +++++++++
kernel/sched.c | 8 ++++++++
kernel/sysctl.c | 10 ++++++++++
4 files changed, 28 insertions(+)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -119,6 +119,7 @@ extern unsigned long avenrun[]; /* Load
load += n*(FIXED_1-exp); \
load >>= FSHIFT;

+extern unsigned int sched_fair;
extern unsigned long total_forks;
extern int nr_threads;
DECLARE_PER_CPU(unsigned long, process_counts);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,12 @@ config PREE...

To: Mark Lord <lkml@...>
Cc: Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 11:16 am

Help out with a data point? Are you running KDE as well? If you are,
then it looks like the common denominator that RSDL is handling poorly
is client-server communication. (KDE's KIO slaves in this case, but X
in general.)

If so, one would hope that a variation on Linus's 2.5.63 pipe wakeup
pass-the-interactivity idea could work here. The problem with that
original patch, IIRC, was that a couple of tasks could bounce their
interactivity bonus back and forth and thereby starve others. Which
might be expected given there was no 'decaying' of the interactivity
bonus, which means you can make a feedback loop.

Anyway, looks like processes that do A -> B -> A communication chains
are getting penalized under RSDL. In which case, perhaps I can make a
test case that exhibits the problem without having to have the same
graphics card or desktop as you.

Ray
-

To: <ray-gmail@...>
Cc: Mark Lord <lkml@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 21, 2007 - 4:55 am

im not experiencing any problems with KDE. if anything ktorrent seems to
be going a teeny tiny bit smoother, though its nothing i can back up
with data.

now i havent tested ALL kioslaves yet, but stuff like sftp, fish, tar

-

To: <ray-gmail@...>
Cc: Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 11:20 am

Yes, KDE.
-

To: <ck@...>
Cc: Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Mark Lord <lkml@...>, Nicholas Miell <nmiell@...>
Date: Tuesday, March 20, 2007 - 6:26 am

Con simply isn't available right now, but you're right. RSDL isn't ready
yet, imho, there seem to be some regressions (and I'm bitten by them, too).
But if con's past behaviour says anything about how he's going to behave in
the future (and according to my psych prof it's the most reliable predictor
;-) ), I'm pretty sure he'll jump on this when he's healthy again. He's gone
through great lengths to fix problems with staircase, no matter how obscure,
so I see no reason why he wouldn't do the same for RSDL... Though scheduler
problems

--
Disclaimer:

Everything I do, think and say is based on the worldview I have now.
I am not responsible for changes to the world, or to my view of it,
nor for my own resulting behaviour.
Everything I say is kindly meant, unless explicitly stated otherwise.

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

To: Linus Torvalds <torvalds@...>
Cc: Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 2:11 am

Linus, you're being unfair to Con. He initially held this position, and lately
worked with Mike by proposing changes to try to improve his X responsiveness.
But he's ill right now and cannot touch the keyboard, so only his supporters
speak for him, and as you know, speech is not code and does not fix problems.

Leave him a week or so to recover and let's see what he can propose. Hopefully
a week away from the keyboard will help him think with a more general approach.
Also, Mike has already modified the code a bit to get better experience.

Also, while I don't agree with starting to renice X to get something usable,
it seems real that there's something funny on Mike's system which makes it
behave particularly strangely when combined with RSDL, because other people
in comparable tests (including me) have found X perfectly smooth even with
loads in the tens or even hundreds. I really suspect that we will find a bug
in RSDL which triggers the problem and that this fix will help discover
another problem on Mike's hardware which was not triggered by mainline.

Regards,
Willy

-

To: Willy Tarreau <w@...>
Cc: Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 11:31 am

I was not actually so much speaking about Con, as about a lot of the
tone in general here. And yes, it's not been entirely black and white. I
was very happy to see the "try this patch" email from Al Boldi - not
because I think that patch per se was necessarily the right fix (I have no
idea), but simply because I think that's the kind of mindset we need to
have.

Not a lot of people really *like* the old scheduler, but it's been tweaked
over the years to try to avoid some nasty behaviour. I'm really hoping
that RSDL would be a lot better (and by all accounts it has the potential
for that), but I think it's totally na

To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 28, 2007 - 7:43 pm

May I suggest that if you want proper testing that it not only should be
a config option but a boot time option as well? Otherwise people will be
comparing an old scheduler with an RSDL kernel, and they will diverge as
time goes on.

More people would be willing to reboot and test on a similar load than
will keep two versions of the kernel around. And if you get people
testing RSDL against a vendor kernel which might be hacked, it will be
even less meaningful.

Please consider the benefits of making RSDL the default scheduler, and
leaving people with the old scheduler with an otherwise identical kernel
as a fair and meaningful comparison.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

-

To: <linux-kernel@...>
Cc: Willy Tarreau <w@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, March 21, 2007 - 4:22 am

Another data point: I'm getting stalls in mplayer. I'm assuming the stalls
occur when procmail runs messages through spamprobe, as the system is
otherwise idle. The stalls continue to occur (and I'm not sure that they
aren't worse) when X and/or mplayer are reniced to negative nice levels.

This is on a dual core amd64 system running 2.6.20.3 with rsdl 0.31.
Admittedly I'm also running the nvidia binary driver with X.

--
The universe hates you, but don't worry - it's nothing personal.
-

To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 2:08 pm

Well, it wasn't really meant as a fix, but rather to point out that
interactivity boosting is possible with RSDL.

It probably needs a lot more work, but just this one-liner gives an

Aside from ia boosting, I think fixed latencies per nice levels may be
desirable, when physically possible, to allow for more deterministic

Agreed.

Thanks!

--
Al

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 5:03 am

X looks very special to me: it's a big userspace driver, the primary
task handling user interaction on the desktop, and on some OS the part
responsible for moving the mouse pointer and interacting with windows is
even implemented as an interrupt handler, and that for sure provides for
smooth user experience even on very low-end hardware. Why not compensate
for X design by prioritizing it a bit ?
If RSDL + reniced X makes for a better desktop than stock kernel + X, on
all kinds of workloads, it's good to know.

-

To: Xavier Bestel <xavier.bestel@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 21, 2007 - 3:50 am

there were multiple attempts with renicing X under the vanilla
scheduler, and they were utter failures most of the time. _More_ people
complained about interactivity issues _after_ X has been reniced to -5
(or -10) than people complained about "nice 0" interactivity issues to
begin with.

The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets
X right most of the time. The fundamental issue is that sometimes X is
very interactive - we boost it then, there's lots of scheduling but nice
low latencies. Sometimes it's a hog - we penalize it then and things
start to batch up more and we get out of the overload situation faster.
That's the case even if all you care about is desktop performance.
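
To make "rewards behavior" concrete, here is a simplified sketch of a sleep-average based priority bonus. The constants are assumptions chosen to resemble the 2.6-era mainline scheduler as best recalled, so treat them as illustrative rather than authoritative. Lower numbers mean higher priority:

```python
MAX_BONUS = 10            # assumed width of the dynamic adjustment
MAX_SLEEP_AVG_MS = 1000   # assumed sleep average at which the bonus saturates


def dynamic_prio(nice, sleep_avg_ms):
    """Return a dynamic priority (100 = best, 139 = worst) for a task."""
    static_prio = 120 + nice
    sleep_avg_ms = max(0, min(sleep_avg_ms, MAX_SLEEP_AVG_MS))
    # A task that sleeps a lot earns up to +5; one that never sleeps gets -5.
    bonus = sleep_avg_ms * MAX_BONUS // MAX_SLEEP_AVG_MS - MAX_BONUS // 2
    return max(100, min(139, static_prio - bonus))
```

In this model a nice-0 X server that mostly sleeps lands at 115, and the same server acting as a hog drifts to 125, without anyone touching its nice level.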

no doubt it's hard to get the auto-nice thing right, but one thing is
clear: currently RSDL causes problems in areas that worked well in the
vanilla scheduler for a long time, so RSDL needs to improve. RSDL should
not lure itself into the false promise of 'just renice X statically'. It
wont work. (You might want to rewrite X's request scheduling - but if so
then i'd like to see that being done _first_, because i just dont trust
such 10-mile-distance problem analysis.)

Ingo
-

To: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Wednesday, March 21, 2007 - 6:43 am

Unfortunately, nicing X is not going to work. It causes X to pre-empt any
local process that tries to batch requests to it, defeating the batching.
What you really want is X to get scheduled after the client pauses in
sending data to it or has sent more than a certain amount. It seems kind of
crazy to put such logic in a scheduler.

Perhaps when one process unblocks another, you put that other process at the
head of the run queue but don't pre-empt the currently running process. That
way, the process can continue to batch requests, but X's maximum latency

I am hopeful that there exists a heuristic that both improves this problem
and is also inherently fair. If that's true, then such a heuristic can be
added to RSDL without damaging its properties and without requiring any
special settings. Perhaps longer-term latency benefits to processes that
have yielded in the past?

I think there are certain circumstances, however, where it is inherently
reasonable to insist that 'nice' be used. If you want a CPU-starved task to
get more than 1/X of the CPU, where X is the number of CPU-starved tasks,
you should have to ask for that. If you want one CPU-starved task to get
better latency than other CPU-starved tasks, you should have to ask for
that.

Fundamentally, the scheduler cannot do it by itself. You can create cases
where the load is precisely identical and one person wants X and another
person wants Y. The scheduler cannot know what's important to you.

DS

-

To: <davids@...>
Cc: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Wednesday, March 28, 2007 - 7:37 pm

I agree for giving a process more than a fair share, but I don't think
"latency" is the best term for what you describe later. If you think of
latency as the time between a process unblocking and the time when it
gets CPU, that is a more traditional interpretation. I'm not really sure
latency and CPU-starved are compatible.

I would like to see processes at the head of the queue (for latency)
which were blocked for long term events, keyboard input, network input,
mouse input, etc. Then processes blocked for short term events like
disk, then processes which exhausted their time slice. This helps
latency and responsiveness, while keeping all processes running.
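
A minimal sketch of that ordering, assuming the scheduler knows why each task was last off the CPU. The class names and the three wakeup classes are hypothetical labels for the categories listed above:

```python
import heapq
from itertools import count

# Hypothetical wakeup classes, best (lowest number) first.
LONG_WAIT = 0      # blocked on keyboard, mouse or network input
SHORT_WAIT = 1     # blocked on disk I/O
SLICE_EXPIRED = 2  # used up its timeslice


class RunQueue:
    def __init__(self):
        self._heap = []
        self._seq = count()  # keeps FIFO order within one class

    def enqueue(self, task, reason):
        heapq.heappush(self._heap, (reason, next(self._seq), task))

    def pick_next(self):
        return heapq.heappop(self._heap)[2]
```

As written this is a strict ordering, so a steady stream of long-wait wakeups could starve the expired tasks. Keeping all processes running would need some extra bound on how long the lower classes can be passed over.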

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-

To: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Thursday, March 29, 2007 - 3:10 am

For CPU-starvation, I think 'nice' is always going to be the fix. If you
want a process to get more than its 'fair share' of the CPU, you have to ask
for that. I think the scheduler should be fair by default.

However, cleverness in the scheduler with latency can make things better
without being unfair to anyone. It's perfectly fair for a task that has been
blocked for awhile to pre-empt a CPU-limited task when it unblocks.

What I'm arguing is that if your task is CPU-limited and the scheduler is
fair, that's your fault -- nice it. If your task is suffering from poor
latency, and it's using less than its fair share of the CPU (because it is
not CPU-limited), that is something the scheduler can be smarter about.

Two things that I think can help improve interactivity without breaking
fairness are:

1) Keep a longer-term history of tasks that have yielded the CPU so that
they can be more likely to pre-empt when they are unblocked by I/O. (The
improved accounting accuracy may go a long way towards doing this. I
personally like exponential decay measurements of CPU usage.)

2) Be smart about things like pipes. When one process unblocks another
through a pipe, socket, or the like, do not pre-empt (this defeats batching
and blows out caches needlessly), but do try to schedule the unblocked
process soon. Don't penalize one process for unblocking another, that's a
good thing for it to do.
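
For point 1, an exponentially decayed CPU usage estimate can be sketched in a few lines. The function name and the `alpha` weight are assumptions for illustration, not values from any scheduler:

```python
def decayed_usage(samples, alpha=0.25):
    """Exponentially weighted CPU usage.

    samples: fraction of each accounting period the task spent on the CPU
             (0.0 to 1.0), oldest first.
    alpha:   weight given to the newest sample (hypothetical value).
    """
    avg = 0.0
    for s in samples:
        avg = (1 - alpha) * avg + alpha * s
    return avg
```

A task that has been idle for a long time and then runs flat out for one period is still rated at only 0.25, so it keeps most of its claim to pre-empt, while a long-running hog converges to 1.0.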

I believe that the process of making schedulers smarter and fairer (and
fixing bugs in them) will get us to a place where interactivity is superb
without sacrificing fairness among tasks at equal static priority.
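
Point 2 above (wake the other side, but do not pre-empt) can be modelled with a plain double-ended queue. This is a sketch under the assumption of a single CPU and a single run queue; the `Cpu` class and its method names are hypothetical:

```python
from collections import deque


class Cpu:
    def __init__(self, current, waiting=()):
        self.current = current
        self.queue = deque(waiting)

    def wake(self, task):
        # The woken task goes to the head of the queue, so it runs next,
        # but the task that woke it keeps the CPU for now.
        self.queue.appendleft(task)

    def block_or_expire(self):
        """The current task stops running; switch to the head of the queue."""
        prev = self.current
        self.current = self.queue.popleft()
        self.queue.append(prev)
        return self.current
```

The client can keep batching requests until it blocks or its slice runs out, and the woken server is then guaranteed to be the next task picked.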

Honestly, I have always been against aggressive pre-emption. I think as CPUs
get faster and timeslices get shorter, it makes less and less sense. In many
cases you are better off just making the task ready-to-run and allowing its
higher dynamic priority to make it next. I strongly believe this for cases
where the running task unblocked the other task. (I think in too many cases,
you bl...

To: <davids@...>
Cc: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Thursday, March 29, 2007 - 3:34 am

Agreed. That's what I've been saying for years (since early 2.6 when we had

I think scheduler timeslices actually shouldn't really be getting shorter.
While I found it is quite easy to get good interactivity with a pretty
dumb scheduler and tiny timeslices (at least until load ramps up enough
that the "off-time" for your critical processes builds up too much), I
think we want to aim for large timeslices. CPU caches are still getting
bigger, and I don't think misses are getting cheaper (especially if you
consider multi core). Also, the energy cost of a memory access is much
higher even if hardware or software is able to hide the latency.

--
SUSE Labs, Novell Inc.
-

To: Xavier Bestel <xavier.bestel@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 8:31 am

No, running X at a different priority than its clients is not really
a good idea. If it isn't immediately obvious why try something like
this:

mkdir /tmp/tempdir
cd /tmp/tempdir
for i in `seq -w 1 10000` ; do touch
longfilenamexxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx$i
; done
nice --20 xterm &
xterm &
nice -20 xterm &

then do "time ls -l ." in each xterm.

This is what i get on UP 2.6.20+RSDL.31 w/ X at nice 0:
-20: 0m0.244s user 0m0.156s system 0m3.113s elapsed 12.84% CPU
0: 0m0.216s user 0m0.168s system 0m2.801s elapsed 13.70% CPU
19: 0m0.188s user 0m0.196s system 0m3.268s elapsed 11.75% CPU

I just made this simple example up and it doesn't show the problem
too well, but you can already see the ~10% performance drop. It's
actually worse in practice, because for some apps the increased
amount of rendering is clearly visible; text areas scroll
line-by-line, content is incrementally redrawn several times etc.
This happens because an X server running at a higher priority than a
client will often get scheduled immediately after some x11 traffic
arrives; when the process priorities are equal usually the client
gets a chance to supply some more data. IOW by renicing the server
you make X almost synchronous.
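
The effect can be seen in a toy model that only counts how often the server has to run. Everything here is an assumption for illustration: the function name, the idea that the server drains its whole queue each time it runs, and the burst size of 50 requests per client turn:

```python
def server_passes(total_requests, client_burst, server_preempts):
    """Count how many times the server runs to handle all requests."""
    passes = 0
    sent = 0
    while sent < total_requests:
        # A higher-priority server pre-empts after every single request;
        # otherwise the client gets to queue a whole burst first.
        burst = 1 if server_preempts else client_burst
        sent += min(burst, total_requests - sent)
        passes += 1  # the server runs and drains everything queued so far
    return passes


reniced = server_passes(10000, 50, server_preempts=True)
equal_prio = server_passes(10000, 50, server_preempts=False)
```

In this model the reniced server runs once per request, while the equal-priority case needs one pass per burst, which is the "almost synchronous" behaviour in numbers.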

This isn't specific to RSDL - it happens w/ any cpu scheduler; and
while the effects of less extreme prio differences (ie -5 instead of
-20 etc) may be less visible i also doubt they will help much.

A better approach to X interactivity might be allowing the server to
use (part of) the clients timeslice, but it's not trivial -- you'd
only want to do that when the client is waiting for a reply and you
almost never want to preempt the client just because the server
received some data.

As to RSDL - it seems to work great for desktop use and feels bette...

To: <unlisted-recipients@...>, <@...>, <UNEXPECTED_DATA_AFTER_ADDRESS@...>
Cc: Con Kolivas <kernel@...>, <linux-kernel@...>
Date: Tuesday, March 20, 2007 - 3:16 pm

ignore that - i didn't notice ccache was involved.

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Tuesday, March 20, 2007 - 4:03 am

I don't _think_ there's anything funny in my system, and Con said it was
the expected behavior with my testcase, but I won't rule it out.

Moving right along to the bugs part, I hope others are looking as well,
and not only talking.

One area that looks pretty fishy to me is cross-cpu wakeups and task
migration. p->rotation appears to lose all meaning when you cross the
cpu boundary, and try_to_wake_up() is using that information in the
cross-cpu case. In pull_task() OTOH, it checks to see if the task ran
on the remote cpu (at all, hmm), and if so tags the task accordingly.
It is not immediately obvious to me why this would be a good thing
though, because quotas of one runqueue don't appear to have any relation
to quotas of some other runqueue. (i'm going to it that this old
information is meaningless)

-Mike

-

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 21, 2007 - 10:57 am

Doing the same in try_to_wake_up() delivered a counterintuitive result.
I expected sleeping tasks to suffer a bit, because when a task wakes up
on a different cpu, the chance of it being in the same rotation is
practically nil, so it would be issued a new quota when it hit
recalc_task_prio() and begin a new walk down the stairs. In the case
where it is told that the awakening task is running in the same
rotation (as is done in pull_task, and with the patchlet below), since
p->array isn't NULLed any more when the task is dequeued, there would be
an array (last it was queued in), there's going to be time_slice (see no
way 0 time_slice can happen, and nothing good would happen in
task_running_tick() if it could), and since per instrumentation nobody
is ever overrunning runqueue quota, it should just continue to march
down the stairs, and receive less bandwidth than the full restart.

What happened is below.

'f' is a progglet which sleeps a bit and burns a bit, duration depending
on argument given. 'sh' is a shell 100% hog. In this scenario, the
argument was set such that 'f' used right at 50% cpu. All are started
at the same time, and I froze top when the first 'f' reached 1:00.
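
For anyone without the 'f' binary, a stand-in load generator might look like the sketch below. It is not the actual progglet: the function name, the 20 ms period and the busy-wait loop are all assumptions. It only reproduces the "sleep a bit, burn a bit" pattern at a chosen duty cycle:

```python
import time


def burn_and_sleep(duty, period_s=0.02, cycles=5):
    """Alternate busy-looping and sleeping; duty=0.5 aims at ~50% CPU."""
    burn_s = duty * period_s
    sleep_s = period_s - burn_s
    for _ in range(cycles):
        end = time.perf_counter() + burn_s
        while time.perf_counter() < end:
            pass               # burn CPU
        time.sleep(sleep_s)    # give the CPU away
    return burn_s, sleep_s
```

Running it with a large `cycles` value next to a shell hog gives a load of the same general shape as the one described here.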

virgin 2.6.21-rc3-rsdl-smp
top - 13:52:50 up 7 min, 12 users, load average: 3.45, 2.89, 1.51

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
6560 root 31 0 2892 1236 1032 R 82 0.1 1:50.24 1 sh
6558 root 28 0 1428 276 228 S 42 0.0 1:00.09 1 f
6557 root 30 0 1424 280 228 R 35 0.0 1:00.25 0 f
6559 root 39 0 1424 276 228 R 33 0.0 0:58.36 0 f
6420 root 23 0 2372 1068 764 R 3 0.1 0:04.68 0 top

patched as below 2.6.21-rc3-rsdl-smp
top - 14:09:28 up 6 min, 12 users, load average: 3.52, 2.70, 1.29

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
6517 root 38 0 2892 1240 1032 R 59 0.1 1:31.12 1 sh
6515 root 24 0 1424 280 228 R 51 0.0 1:00.10 0 f
6...

To: Mike Galbraith <efault@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 21, 2007 - 12:02 pm

May one enquire how much CPU the mythical 'f' uses when run alone? Just
to get a gauge for the numbers?

-

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 3:07 am

This is a rather long message, and isn't directed at anyone in
particular, it's for others who may be digging into their own problems
with RSDL, and for others (if any other than Con exist) who understand
RSDL well enough to tell me if I'm missing something. Anyone who's not
interested in RSDL's gizzard hit 'D' now.

Actually, the numbers are an interesting curiosity point, but not as
        /*
         * Accounting is performed by both the task and the runqueue. This
         * allows frequently sleeping tasks to get their proper quota of
         * cpu as the runqueue will have their quota still available at
         * the appropriate priority level. It also means frequently waking
         * tasks that might miss the scheduler_tick() will get forced down
         * priority regardless.
         */
        if (!--p->time_slice)
                task_expired_entitlement(rq, p);
        /*
         * We only employ the deadline mechanism if we run over the quota.
         * It allows aliasing problems around the scheduler_tick to be
         * less harmful.
         */
        if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
                if (unlikely(p->first_time_slice))
                        p->first_time_slice = 0;
                rotate_runqueue_priority(rq);
                set_tsk_need_resched(p);
        }

The reason for ticking both runqueue and task is that you can't sample a
say 100KHz information stream at 1KHz and reproduce that information
accurately. IOW, task time slices "blur" at high switch frequency, you
can't always hit tasks, so you hit what you _can_ hit every sample, the
runqueue, to minimize the theoretical effects of time slice theft.
(I've instrumented this before, and caught fast movers stealing 10s of
milliseconds in extreme cases.) Generally speaking, statistics even
things out very much, the fast mover eventually gets hit, and pays a
full tick for his sub-tick dip in the pool, so in practice it's not a
great big hairy deal.
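
The sampling problem can be shown with a small deterministic simulation. This is a sketch, not kernel code: the 1 ms tick, the task names and the burst layout are assumptions picked so that one task always runs strictly between ticks:

```python
TICK_US = 1000  # assumed tick period: 1 kHz


def charged_ticks(bursts, duration_us):
    """Charge each tick to whichever task is running at that instant.

    bursts: list of (start_us, end_us, task) run intervals.
    """
    charged = {}
    for t in range(0, duration_us, TICK_US):
        for start, end, task in bursts:
            if start <= t < end:
                charged[task] = charged.get(task, 0) + 1
    return charged


def cpu_time(bursts):
    """Real CPU time per task, in microseconds."""
    used = {}
    for start, end, task in bursts:
        used[task] = used.get(task, 0) + (end - start)
    return used


def dodger_schedule(duration_us):
    """'dodger' runs 800us of every millisecond, but never across a tick."""
    bursts = []
    for base in range(0, duration_us, TICK_US):
        bursts.append((base, base + 100, "victim"))
        bursts.append((base + 100, base + 900, "dodger"))
        bursts.append((base + 900, base + TICK_US, "victim"))
    return bursts


schedule = dodger_schedule(100_000)
ticks = charged_ticks(schedule, 100_000)
used = cpu_time(schedule)
```

Here the dodger gets 80% of the CPU and is never charged, while every tick still lands on the runqueue. Real workloads are not phase-locked to the tick like this, which is why the statistics usually even out.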

If you can accept that tasks can and do dodge the tick, an imbalance
between runqueue quota and task quota must occur. It isn't happening
here, and the reason appea...

To: Mike Galbraith <efault@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 6:50 pm

Thanks for taking the time to actually look at the code. All audits are most
welcome!

I had considered the quota not migrating to the new runqueue but basically it
screws up the "set quota once and deadline only kicks in if absolutely
necessary" policy. Migration means some extra quota is left behind on the
runqueue it left from. It is never a huge extra quota and is reset on major
rotation which occurs very frequently on rsdl. If I was to carry the quota
over I would need to deduct p->time_slice from the source runqueue's quota,
and add it to the target runqueue's quota. The problem there is that once the
time_slice has been handed out to a task it is my position that I no longer
trust the task to keep its accounting right and may well have exhausted all
its quota from the source runqueue and is pulling quota away from tasks that
haven't used theirs yet.
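
The bookkeeping for the "carry the quota over" variant described above would amount to something like the following. This is a hypothetical sketch of the approach that is being rejected here, not code from RSDL, and the plain integer quotas are a simplification:

```python
def migrate_with_quota(src_quota, dst_quota, time_slice):
    """Move a migrating task's remaining time_slice between runqueue quotas.

    Returns the (source, target) quotas after the move.
    """
    return src_quota - time_slice, dst_quota + time_slice
```

If the task's own `time_slice` overstates what is really left for it on the source runqueue, for example `migrate_with_quota(3, 20, 6)`, the source quota ends up at -3. In this simplified model that deficit is quota taken from tasks that have not run yet.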

Cross cpu migrating task can't have p->array pointing to the new runqueue's
active array by any means, but fork and friends could. The other point about
cross cpu history having the wrong effect though is most valid. Good
spotting! While it's unlikely this could cause an oops... you never know so

Thanks again!

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Friday, March 23, 2007 - 12:39 am

The accounting is easy iff tick resolution is good enough, the deadline
mechanism is harder. I did the "quota follows task" thing, but nothing
good happens. That just ensured that the deadline mechanism kicks in
constantly because tick theft is a fact of tick-based life. A
reasonable fudge factor would help, but...

I see problems with trying to implement the deadline mechanism.

As implemented, it can't identify who is doing the stealing (which
happens constantly, even if userland is a 100% hog) because of tick
resolution accounting. If you can't identify the culprit, you can't
enforce the quota, and quotas which are not enforced are, strictly
speaking, not quotas. At tick time, you can only close the barn door
after the cow has been stolen, and the thief can theoretically visit
your barn an infinite number of times while you aren't watching the
door. ("don't blink" scenarios, and tick is backward-assward blink)

You can count nanoseconds in schedule, and store the actual usage, but
then you still have the problem of inaccuracies in sched_clock() from
cross-cpu wakeup and migration. Cross-cpu wakeups happen quite a lot.
If sched_clock() _were_ absolutely accurate, you wouldn't need the
runqueue deadline mechanism, because at slice tick time you can see
everything you will ever see without moving enforcement directly into
the most critical of paths.

IMHO, unless it can be demonstrated that timeslice theft is a problem
with a real-life scenario, you'd be better off dropping the queue
ticking. Time slices are a deadline mechanism, and in practice the god
of randomness ensures that even fast movers do get caught often enough
to make ticking tasks sufficient.

(that was a very long-winded reply to one sentence because I spent a lot
of time looking into this very subject and came to the conclusion that

You're welcome.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Friday, March 23, 2007 - 1:59 am

The deadline mechanism is easy to hit and works. Try printk'ing it. There is
some leeway to take tick accounting into the equation and I don't believe
nanosecond resolution is required at all for this (how much leeway would you
give then ;)). Eventually there is nothing to stop us using highres timers
(blessed if they work as planned everywhere eventually) to do the events and
do away with scheduler_tick entirely. For now ticks work fine; a reasonable
estimate for smp migration will suffice (patch forthcoming).

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Friday, March 23, 2007 - 8:17 am

I tried rc4-rsdl.33, and in a log that's 782kb, there is only one
instance of an overrun, which I created. On my box, it's dead code.

-Mike

-

To: Con Kolivas <kernel@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Friday, March 23, 2007 - 2:11 am

Hm. I did (.30), and it didn't in an hour's time doing this and that.
After I did the take your quota with you, it did kick in. Lots.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 5:18 am

it's not just the scheduling accounting being off, RSDL also seems to be

it might point to a hot-unplugged CPU's runqueue as well. Which might
work accidentally, but we want this fixed nevertheless.

Ingo
-

To: Ingo Molnar <mingo@...>
Cc: Mike Galbraith <efault@...>, Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, <ck@...>, Serge Belyshev <belyshev@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 6:03 pm

All code reviews are most welcome indeed!

I don't think this is a problem because immediately after this in pull_task it
calls enqueue_task() which always updates p->array in recalc_task_prio().
Every enqueue_task always calls recalc_task_prio on non-rt tasks so the array
should always be set no matter where the entry point to scheduling is from
unless I have a logic error in setting the p->array in recalc_task_prio() or
there is another path to schedule() that I've not accounted for by making

The hot unplugged cpu's prio_rotation will be examined, and then it sets the
prio_rotation from this runqueue's value. That shouldn't lead to any more
problems than setting the timestamp based on the hot-unplugged cpu's timestamp
lower down, also in pull_task():

p->timestamp = (p->timestamp - src_rq->most_recent_timestamp) +
this_rq->most_recent_timestamp;
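
What that assignment does can be checked with plain numbers. The sketch below mirrors the expression above in a hypothetical helper; the function and argument names are made up for illustration:

```python
def rebase_timestamp(task_ts, src_rq_ts, dst_rq_ts):
    """Re-express a task timestamp relative to the destination runqueue clock."""
    return (task_ts - src_rq_ts) + dst_rq_ts


# The task was stamped 50 units before the source runqueue's latest timestamp.
new_ts = rebase_timestamp(task_ts=950, src_rq_ts=1000, dst_rq_ts=5000)
```

The task's offset from "now" is preserved (50 units behind on both runqueues), even though the two runqueue clocks themselves are unrelated.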

Thanks for looking!

--
-ck
-

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 5:34 am

Erk! I mentioned to Con offline that I've seen RSDL bring up only one
of my two (halves of a) penguins a couple three times out of a zillion
boots. Maybe that's why?

-Mike

-

To: Ingo Molnar <mingo@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Thursday, March 22, 2007 - 5:41 am

bzzt. singletasking brain :)

-

To: Peter Zijlstra <a.p.zijlstra@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, Xavier Bestel <xavier.bestel@...>, Mark Lord <lkml@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Andrew Morton <akpm@...>
Date: Wednesday, March 21, 2007 - 1:06 pm

Right at 50%

-Mike

(mythical? i can send you the binary if you want)

-

To: Al Boldi <a1426z@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 2:17 am

So neither does a good job with this load.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 2:47 am

that sorely depends on what you mean by good job.

It seems like what you call a good job is preserving the speed of the

-

To: Kasper Sandberg <lkml@...>
Cc: Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Sunday, March 18, 2007 - 3:08 am

Wrong. I call a good job giving a _preference_ to the desktop. I call
rigid fairness impractical for the desktop, and a denial of reality.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 3:22 am

My sound programs (audacity, non-RT) and mplayer disagree with you. :-)
Not to mention some more mundane stuff like Gajim. (no stall with its
slow PyGTK UI on RSDL)

(Hint: I'm using Xfce, not KDE)

I'd reckon KDE regresses because of kioslaves waiting on a pipe
(communication with the app they're doing IO for) and then expiring.
That's why splitting IO from an app isn't exactly smart. It should at
least be run in another thread.

A much better approach would be running IO in the context of the app,
but using a common shared library.
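
A rough sketch of the "IO in the context of the app" idea, in Python for brevity (KDE itself is C++, and the helper name read_in_thread is made up for illustration): the read is done by a worker thread of the app's own process and handed back over a queue, instead of by a separate helper process at the other end of a pipe.

```python
import os
import queue
import tempfile
import threading


def read_in_thread(path, results):
    """Read path on a worker thread of this process and post the bytes to results."""
    def worker():
        with open(path, "rb") as fh:
            results.put(fh.read())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# A temporary file stands in for whatever the app wants to load.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")
    demo_path = tmp.name

results = queue.Queue()
worker_thread = read_in_thread(demo_path, results)
data = results.get(timeout=5)  # a real UI loop would poll rather than block
worker_thread.join()
os.remove(demo_path)
print(data)  # b'payload'
```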

Also, Beryl works better with RSDL too.
(blur doesn't "disable" itself sometimes - which is the result of a lag)
-

To: Radoslaw Szkodzinski <astralstorm@...>
Cc: Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 3:38 am

Hm. Sounds rather a lot like the...
X sucks, fix X and RSDL will rock your world. RSDL is perfect.
...that I've been getting.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 11:44 am

Blah. Nothing's perfect. Especially not computer programs.

Still, it's not a smart decision on KDE's part.
It will break a lot of scheduling decisions, esp. you can't use IO priorities.
-

To: <ck@...>
Cc: Radoslaw Szkodzinski <astralstorm@...>, Mike Galbraith <efault@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Kasper Sandberg <lkml@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 12:09 pm

Well, if somebody could explain what they should do instead of what they're
doing, I can contact them - the libraries for KDE 4 aren't in feature freeze
yet (they will be, though) so they can solve the problem(s). The KIO
infrastructure is ATM under a redesign, so please, if you know what they
should do/are doing wrong, speak up!

grtz

Jos

To: Mike Galbraith <efault@...>
Cc: Radoslaw Szkodzinski <astralstorm@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 5:57 am

not really, only X sucks. KDE works at least as well with rsdl as
vanilla. i don't know who originally said kde works worse, wasn't it just

-

To: Kasper Sandberg <lkml@...>
Cc: Mike Galbraith <efault@...>, Radoslaw Szkodzinski <astralstorm@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Monday, March 19, 2007 - 4:47 pm

It was probably me, and I had the opinion that KDE is not as smooth as
GNOME with RSDL. I haven't had time to measure, but using for daily
stuff for about an hour each way hasn't changed my opinion. Every once
in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff
like redrawing a page, scrolling, etc. I don't see it with GNOME.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-

To: Bill Davidsen <davidsen@...>
Cc: Mike Galbraith <efault@...>, Radoslaw Szkodzinski <astralstorm@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Wednesday, March 21, 2007 - 4:58 am

umm, could you try to find something that always does it, so i can try
to reproduce? cause i don't really hit any such thing, and i only have a

-

To: <ck@...>
Cc: Bill Davidsen <davidsen@...>, Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Mike Galbraith <efault@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Tuesday, March 20, 2007 - 6:19 am

yeah, here too... sometimes even longer (and I have a dualcore, 3gb ram,
damnit!)

-- 
Disclaimer:

Everything I do, think and say is based on the worldview I currently hold.
I am not responsible for changes to the world, or to my view of it, nor for
any behaviour of mine that results from them.
Everything I say is meant kindly, unless explicitly stated otherwise.

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

To: Kasper Sandberg <lkml@...>
Cc: Mike Galbraith <efault@...>, Radoslaw Szkodzinski <astralstorm@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 9:57 am

Couldn't agree more, been using RSDL+KDE for a week now, and as far as
I'm concerned I will be sticking with this until it goes to mainline,
or mainline exhibits better behaviour. The fact of the matter is I was
always unsure why windows 'feels' so much better than linux. RSDL
makes it feel like windows, all the time, no matter what's going on.
I'm really miffed when anyone talks about regressions, because I have
scenarios that will completely lock up linux for 10s or so on my
athlon 4200x2, due to [really] poorly designed apps that I need to
run. RSDL seems to work right through it; it works faster, mouse feels
better, browser scrolls smoothly, can't make the sound skip, video is
fluid, opening a term is instant, boot up is faster, all which are a
step up from mainline. That's the facts. IME I have seen nothing but
good with RSDL.

--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
-

To: Mike Galbraith <efault@...>
Cc: Radoslaw Szkodzinski <astralstorm@...>, Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 4:20 am

maybe if it is possible to classify program behaviors that cause RSDL to
do badly (relatively) or the mainline scheduler to jitter, we could try
modifying the existing heuristics to get a better default scheduler.

of course, it wouldn't be able to cater to all the workloads or
meet everybody's definition of optimal. but getting close to optimal in
most cases should be a good enough goal for linux's default sched!

i've been following this thread, and there have been many instances of
'RSDL is gr8' and 'RSDL regresses'.

maybe RSDL isn't the answer. maybe the current mainline sched isn't
either. but RSDL definitely has done *something* right.

What i think is needed is 'why this works here' and 'how to get this
behavior to work with some other possibly conflicting but important
workloads'.

(just my 2c :-)

-jb
--
I am professionally trained in computer science, which is to say
(in all seriousness) that I am extremely poorly educated.
-- Joseph Weizenbaum
-

To: jimmy bahuleyan <knight.camelot@...>
Cc: Radoslaw Szkodzinski <astralstorm@...>, Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 4:34 am

IMHO, that's worth more than 2c.

-Mike

-

To: Radoslaw Szkodzinski <astralstorm@...>
Cc: Kasper Sandberg <lkml@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Sunday, March 18, 2007 - 4:04 am

P.S. For those folks who appear to think that I'm in love with mainline
behavior and blissfully ignorant of its shortcomings.

http://lwn.net/Articles/176635/

-

To: Mike Galbraith <efault@...>
Cc: Con Kolivas <kernel@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 9:58 am

Con is over-simplifying here -- he is saying the number of
regression-reporters is dwarfed by the number of positive responses.
One regression in particular, though, is rather persistent, and we are
unsure of how to solve it in a way that fits the ideals of RSDL.

There has been one class of problems that have been reported against
RSDL (problems with X or some X+GL-based app in the context of
CPU-intensive programs) that has yet to be "resolved", AFAICS. The
possible solutions being brought up (e.g. auto-nice in the kernel) go
against the fundamental reasons and logic behind RSDL.

The latest (RSDL 0.31) is supposed to help with relatively-niced
latency-sensitive programs (which were reported earlier) and there is
some progress, although I believe maybe one or two people are still
reporting issues here. We are making progress in this circumstance
that Con feels is acceptable. (Again, the remaining potential
solutions being proposed so far go against RSDL's fundamental ideals.
If we actually have to use these solutions, then basically we're just
doing the vanilla scheduler all over again -- defeating the purpose of
RSDL in the first place.)

akpm, IIRC, reported an issue on PPC with an early version of RSDL
(presumably a broken one) when he tried to build the first public
release with -mm; we have yet to hear negative feedback from him (on
the -ck list at least) saying this problem persisted with future
releases.

I think the idea is that Con has seen a much greater positive response
(particularly with earlier releases, i.e. a week ago), and such a lack of
thorough, viable solutions (particularly ones that either come with a
patch or fit in with the current ideal of RSDL) for the complaints
being brought up, that he is feeling frustrated. (Con's neck problem is
preventing him from spending too much time coding solutions himself,
and I feel the discussion that is going on here is starting to become
counterproductive in consideration of that.)

I've also seen no one mention the possibi...

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, <linux-kernel@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 6:49 am

Rocks and clubs at work (down boy <whack>, down i say!;).

This is .30 with some targeted unfairness. I seem to be making progress
toward beating it to a bloody but cooperative pulp. It might be
possible to have my cake and eat it too. Likely too ugly to live
though.

top - 11:35:50 up 57 min, 12 users, load average: 5.20, 4.30, 2.57

  PID USER  PR  NI   VIRT   RES   SHR S %CPU %MEM    TIME+ P COMMAND
 6599 root  26   0   174m   30m  8028 R   51  3.1  7:08.70 0 Xorg
 7991 root  29   0  18196   14m  5188 R   47  1.4  0:55.70 0 amarok_libvisua
 7995 root  37   5   3720  2444   976 R   44  0.2  0:27.53 1 lame
 7993 root  37   5   3720  2448   976 R   40  0.2  0:46.60 1 lame

-

To: <linux-kernel@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 8:05 am

What X driver are you using Mike? Are you on a builtin video chipset
using mobo shared memory? I'm running the nvidia 9755 driver here with a
jaton nvidia 6200-256 card, and YES I KNOW that taints the kernel,
but X is using, on average, 1.3% of the cpu. It's sitting at a -1 nice and
it continues to give usable results here in the face of a make -j8
running in another shell.

With figures (over 50% cpu to X) like that, it's no wonder you're
squawking about it.

An old phrase comes to mind "Physician, Heal thyself". Your X is broken
and you are trying to blame everything but X. Fix your hardware and
drivers.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Two can Live as Cheaply as One for Half as Long.
-- Howard Kandel
-

To: Gene Heskett <gene.heskett@...>
Cc: <linux-kernel@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 9:36 am

Xorg is using 50% cpu because I'm asking it to.

-Mike

-

To: <linux-kernel@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 1:03 pm

On Saturday 17 March 2007, Mike Galbraith wrote:

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Once upon a time, four AMPHIBIOUS HOG CALLERS attacked a family of
DEFENSELESS, SENSITIVE COIN COLLECTORS and brought DOWN their PROPERTY
VALUES!!
-

To: Gene Heskett <gene.heskett@...>
Cc: <linux-kernel@...>, Con Kolivas <kernel@...>, <ck@...>, Serge Belyshev <belyshev@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Nicholas Miell <nmiell@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 1:37 pm

It's a test scenario. Read the thread please, I really don't want to
repeat myself endlessly.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Gene Heskett <gene.heskett@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 2:23 pm

I've been following "this thread" since Con's .31 announcement - and
the only reference to your test scenario that you've given is that
you're still "having trouble with x/gforce vs two niced encoders",
that you've added "some targeted unfairness", that Con's new scheduler
is "an utter failure" and something about beating it to a bloody pulp.

You haven't detailed what your test actually is or what it's trying to
achieve, nor provided anyone else with the means to reproduce it or
understand any of the behaviour you're seeing. Now if you've done that
in any other thread, consider referencing it instead of worrying about
"repeating yourself endlessly". Otherwise, you're making it pretty
clear that you're just trying to be difficult, rather than being
heard.

Remember, this thread is not only cross-posted, but also exists in a
high-volume mailing list where things aren't as easy to track as in
one's own head.

And for Mark and others who are as confused as I was, this is the
thread that Mike meant to reference:
http://thread.gmane.org/gmane.linux.kernel/503455/focus=6614

Cheers,
-Kacper
-

To: Kacper Wysocki <kacperw@...>
Cc: Gene Heskett <gene.heskett@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, <ck@...>, Linus Torvalds <torvalds@...>, Nicholas Miell <nmiell@...>
Date: Saturday, March 17, 2007 - 2:45 pm

Nope, with all the back and forth (and noise), I lost track of which
thread was which. Thanks.

-Mike

-

To: Ingo Molnar <mingo@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 4:23 am

(sorry for the duplicate Ingo, this time I managed to Reply to All)

Yes, it's an X problem.

There's two issues, really -- smooth pointer movement or the lack
thereof and the servicing of clients at varying priorities. There's
vague plans floating around about moving all input processing off into a
separate high-priority thread and pretty much no ideas how to deal with
mixed priority clients.

So, the current scheduler works around this brain damage using

RSDL is, above all else, fair. Predictably so.
Hacking around X's stupidity makes it no longer *be* RSDL.

Until they catch up to the early-90s technology-wise, we can just nice
-19 X.
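
For anyone wanting to script that, a minimal sketch using Python's os.setpriority (the helper name renice is made up here; lowering a nice value normally needs root, so the demonstration only re-applies the current value of its own process):

```python
import os


def renice(pid, niceness):
    """Set the nice value of one process and return the value now in effect.

    Lowering the value below its current level normally needs root
    (CAP_SYS_NICE on Linux), so a refusal is printed rather than raised.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
    except PermissionError:
        print(f"not permitted to set nice {niceness} on pid {pid}")
    return os.getpriority(os.PRIO_PROCESS, pid)


# Harmless demonstration on this process: re-apply its current nice value.
me = os.getpid()
current = os.getpriority(os.PRIO_PROCESS, me)
print(renice(me, current))
```

The suggestion above would correspond to calling renice with the pid of the X server and -19, as root.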

--
Nicholas Miell <nmiell@comcast.net>

-

To: Nicholas Miell <nmiell@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 5:42 am

SCHED_BATCH (an existing feature of the current scheduler) is even
fairer and even more deterministic than RSDL, because it has _zero_
heuristics.
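
For readers who want to try that comparison from userspace, a process can ask for SCHED_BATCH itself through sched_setscheduler(2). A minimal Python sketch, assuming Linux (the helper name run_as_batch is made up, and a restricted environment may refuse the call):

```python
import os


def run_as_batch():
    """Switch the calling process to SCHED_BATCH; return True on success.

    Linux-only: other platforms lack os.SCHED_BATCH, and a restricted
    environment may refuse the change.
    """
    if not hasattr(os, "SCHED_BATCH"):
        return False
    try:
        # pid 0 means "this process"; SCHED_BATCH requires static priority 0.
        os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False
    return os.sched_getscheduler(0) == os.SCHED_BATCH


print("now SCHED_BATCH:", run_as_batch())
```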

so how about the patch below (against current -git), which adds the
"CFS, Completely Fair Scheduler" feature? With that you could test your
upcoming X fixes. (it also adds /proc/sys/kernel/sched_fair so that you
can compare the fair scheduler against the vanilla scheduler.) It's very
simple and unintrusive:

4 files changed, 28 insertions(+)

furthermore, this is just the first step: if CONFIG_SCHED_FAIR becomes
widespread amongst distributions then we can remove the interactivity
estimator code altogether, and simplify the code quite a bit.

( NOTE: more improvements are possible as well: right now most
interactivity calculations are still done even if CONFIG_SCHED_FAIR is
enabled - that could be improved upon. )
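
To picture what "perfect roundrobin, fair distribution of timeslices, no heuristics" means for a set of CPU-bound tasks, here is a toy userspace model in Python. It is only an illustration with made-up task names and tick counts, not the patch's code:

```python
from collections import deque


def round_robin(tasks, timeslice, total_ticks):
    """Toy model: hand out equal timeslices in strict rotation.

    tasks is a list of names; returns the ticks of CPU each task received.
    No task is boosted or penalised based on how it behaves.
    """
    run_queue = deque(tasks)
    received = {name: 0 for name in tasks}
    remaining = total_ticks
    while remaining > 0 and run_queue:
        name = run_queue.popleft()
        run = min(timeslice, remaining)
        received[name] += run
        remaining -= run
        run_queue.append(name)  # back to the tail, whatever the task did
    return received


shares = round_robin(["Xorg", "lame-1", "lame-2"], timeslice=10, total_ticks=300)
print(shares)  # {'Xorg': 100, 'lame-1': 100, 'lame-2': 100}
```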

Ingo

------------------------------>
Subject: [patch] CFS scheduler: Completely Fair Scheduler
From: Ingo Molnar <mingo@elte.hu>

add the CONFIG_SCHED_FAIR option (default: off): this turns the Linux
scheduler into a completely fair scheduler for SCHED_OTHER tasks: with
perfect roundrobin scheduling, fair distribution of timeslices combined
with no interactivity boosting and no heuristics.

a /proc/sys/kernel/sched_fair option is also available to turn
this behavior on/off.

if this option establishes itself amongst leading distributions then we
could in the future remove the interactivity estimator altogether.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/sched.h | 1 +
kernel/Kconfig.preempt | 9 +++++++++
kernel/sched.c | 8 ++++++++
kernel/sysctl.c | 10 ++++++++++
4 files changed, 28 insertions(+)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -119,6 +119,7 @@ extern unsigned long avenrun[]; /* Lo...

To: Nicholas Miell <nmiell@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:11 am

I'm not in a position to hold up development.

On a side note, I wonder how long it's going to take to fix all the
X/client combinations out there.

-Mike

-

To: <linux-kernel@...>
Cc: Mike Galbraith <efault@...>, Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>
Date: Saturday, March 17, 2007 - 7:48 am

And on yet another side note Mike, I just did a make -j8 for all make
options in my makeit script, something I don't normally do, and while
running 2.6.20.3-rdsl-0.31, building 2.6.21-rc4, the machine remained
100% responsive, worst case keyboard lag that I observed might have been
200 or 300 milliseconds. The machine remained usable, which to me is the
bottom line. And no, I wasn't running xmms at the time or watching
tvtime else I'd have awakened the missus.

I'm having a hard time justifying your continual fussing as it's obviously
a huge improvement to me. What makes your system and loading so much
different from mine I wonder...

In the meantime I'm a very happy camper about this patch.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I prefer rogues to imbeciles because they sometimes take a rest.
-- Alexandre Dumas, fils
-

To: Mike Galbraith <efault@...>
Cc: Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:25 am

AIUI X's clients largely access it via libraries X ships, so the X
update will sweep the vast majority of them in one shot. You'll have
to either run the clients from remote hosts with downrev libraries or
have downrev libraries around (e.g. in chroots) for clients to link to
for the clients not to cooperate.

-- wli
-

To: William Lee Irwin III <wli@...>
Cc: Mike Galbraith <efault@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 3:29 am

The changes will probably be entirely server-side anyway, so stray
ancient libraries won't be a problem.

--
Nicholas Miell <nmiell@comcast.net>

-

To: Nicholas Miell <nmiell@...>
Cc: Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 2:08 am

P.S. "utter failure" was too harsh. What sticks in my craw is that the
world has to adjust to fit this new scheduler.

-Mike

-

To: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Sunday, March 18, 2007 - 10:27 pm

Even when it's totally clear that this scheduler is doing what you asked it
to do while the old one wasn't? It still bothers you that now you have to ask
for what you want rather than asking for what happens to give you what you

Assuming you *want* that. It's possible that the desktop may not be
particularly important and the machine may be doing much more important
server work with critical latency issues. So if you want that, you have to
ask for it.

Again, your complaint is that the other server gave you what you wanted even
when you didn't ask for it. That's great for you but totally sucks for the
majority of other people who want something else.

DS

-

To: <davids@...>
Cc: Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 2:21 am

Amusing argument ;-) I doubt that there are many admins ripping and

I don't presume to speak for the majority...

-Mike

-

To: Mike Galbraith <efault@...>
Cc: <davids@...>, Linux-Kernel@Vger. Kernel. Org <linux-kernel@...>
Date: Monday, March 19, 2007 - 2:59 am

I've known one at least, he said he was ensuring the shiny new dual athlons
were stable enough for production ;-) But that does not make a rule.

Cheers,
Willy

-

To: Mike Galbraith <efault@...>
Cc: Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Sunday, March 18, 2007 - 3:37 pm

I have never seen X run nearly as smooth as our favorite proprietary
OS on similar spec hardware with ANY scheduler.

Lee
-

To: Lee Revell <rlrevell@...>
Cc: Mike Galbraith <efault@...>, Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Sunday, March 18, 2007 - 6:45 pm

i have never seen Windows (or were you talking about Mac OSX ?) run
smooth. Win2k (scheduler) is almost usable if your computer is very fast
but on common hardware every version of windows for me was a joke. or
maybe you have a special version ;) [1]

I don't run KDE or Gnome in linux so ... maybe that's the problem ;)

[1] And no, i don't consider waiting 2-5-20-50 seconds for a program to
start a feature. YMMV

--

"frate, trezeste-te, aici nu-i razboiul stelelor"
Radu R. pe offtopic at lug.ro

-

To: Lee Revell <rlrevell@...>
Cc: Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Sunday, March 18, 2007 - 3:55 pm

Heh, I _have_, but they do have an edge. It's a lot easier when you're
more or less a single user single tasking OS.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Nicholas Miell <nmiell@...>, Con Kolivas <kernel@...>, <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 9:56 am

If a new scheduler has a better 'normal' performance, adjusting to its quirks
is fine. Your testing is important. We need to understand what needs

Thanks
Ed Tomlinson
-

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 9:42 am

Can you please make an incremental diff as well? That will make it
easier to see what changes have taken place, and make it easier to
integrate those into hacked-upon trees.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 9:59 am

To: Con Kolivas <kernel@...>
Cc: <ck@...>, Ingo Molnar <mingo@...>, Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Friday, March 16, 2007 - 10:07 am

Ah, got it. Thanks.

-Mike

-

To: <ck@...>
Cc: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, <linux-kernel@...>
Date: Tuesday, March 13, 2007 - 12:08 pm

A few other minor things would need to be updated before this patch is in a
good enough shape to join the rsdl patches. This one will be good for testing
though.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, ck list <ck@...>, <linux-kernel@...>
Date: Wednesday, March 14, 2007 - 5:13 am

Oh my. I thought I was all done staring mindlessly at gforce (chinese
water torture). Oh well, a few more brain cells dying of boredom won't
kill me I guess ;-) Will give it a shot.

-Mike

-

To: Mike Galbraith <efault@...>
Cc: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, ck list <ck@...>, <linux-kernel@...>
Date: Wednesday, March 14, 2007 - 5:25 am

No don't. It's buggy and you missed the warning. Boy were you lucky I was
looking right now.

--
-ck
-

To: Con Kolivas <kernel@...>
Cc: Al Boldi <a1426z@...>, Andrew Morton <akpm@...>, Ingo Molnar <mingo@...>, ck list <ck@...>, <linux-kernel@...>
Date: Wednesday, March 14, 2007 - 5:42 am

I'll wait for a .31 and construct a 2.6.21-rc3 (isolation) test-tree
from that.

-Mike

-

To: Al Boldi <a1426z@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 8:52 am

There is a way that I toyed with of creating maps of slots to use for each
different priority, but it broke the O(1) nature of the virtual deadline
management. Minimising algorithmic complexity seemed more important to
maintain than getting slightly better latency spreads for niced tasks. It
also appeared to be less cache friendly in design. I could certainly try and
implement it but how much importance are we to place on latency of niced
tasks? Are you aware of any usage scenario where latency sensitive tasks are
ever significantly niced in the real world?

--
-ck
-

To: <ck@...>
Cc: Con Kolivas <kernel@...>, Al Boldi <a1426z@...>, <linux-kernel@...>
Date: Sunday, March 18, 2007 - 6:50 am

I do always nice down heavy games, it makes them more smooth...

To: Con Kolivas <kernel@...>
Cc: Al Boldi <a1426z@...>, ck list <ck@...>, <linux-kernel@...>
Date: Saturday, March 17, 2007 - 9:30 pm

It depends on how you reconcile "completely fair" and "order of
magnitude blips in latency." It looks (from the results, not the code)
as if nice is implemented by round-robin scheduling followed by once in
a while just not giving the CPU to the nice task for a while. Given the
smooth nature of the performance otherwise, it's more obvious than if
you weren't doing such a good job most of the time.
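
That impression can be put into numbers with a toy model. The Python sketch below is not taken from the RSDL code, and the skip patterns are invented purely for illustration: it measures the gap between consecutive turns of a task that is passed over in some rounds of an otherwise strict rotation.

```python
def run_gaps(pattern, rounds):
    """Gaps (in rounds) between consecutive runs of a task.

    pattern is a repeating list of booleans: True where the task gets a
    turn in that round, False where it is passed over.
    """
    ran = [r for r in range(rounds) if pattern[r % len(pattern)]]
    return [b - a for a, b in zip(ran, ran[1:])]


# Hypothetical patterns, not taken from the RSDL code:
nice0 = [True]  # gets a turn every round
niced = [True, True, True, False, False, False, False, False, False, False]

print(max(run_gaps(nice0, 40)))  # 1
print(max(run_gaps(niced, 40)))  # 8
```

Most gaps for the niced task are a single round, which is why the occasional long one stands out.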

Ugly stands out more on something beautiful!

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-

To: Con Kolivas <kernel@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 10:14 am

It only takes one negatively nice'd proc to affect X adversely.

Thanks!

--
Al

-

To: Al Boldi <a1426z@...>
Cc: ck list <ck@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 2:05 pm

I have an idea. Give me some time to code up my idea. Lack of sleep is making
me very unpleasant.

--
-ck
-

To: <ck@...>
Cc: Con Kolivas <kernel@...>, Al Boldi <a1426z@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 2:47 pm

You're excited by RSDL and the positive comments, aren't you? Well, don't
forget to sleep, sleeping makes ppl smarter you know ;-)

To: <linux-kernel@...>
Cc: <ck@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, jos poortvliet <jos@...>
Date: Monday, March 12, 2007 - 2:58 pm

IIRC, about two or three years ago (or maybe in the 2.6.10 timeframe),
there was a patch which managed to pass the interactivity from one app
to another when there was a pipe or udp connection between them. This
meant that a marked-as-interactive xterm would, when blocked waiting
for an Xserver response, transfer some of its interactiveness to the
Xserver, and apparently it worked very well for desktop workloads so,
maybe adapting it for this new scheduler would be good.

--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
windNOenSPAMntw@gmail.com
thesameasabove@amigascne.org

Every day, every year
you have to work
you have to study
you have to scene.
-

To: Antonio Vargas <windenntw@...>
Cc: <linux-kernel@...>, <ck@...>, Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, jos poortvliet <jos@...>
Date: Monday, March 19, 2007 - 6:47 am

And it was dropped because of some very nasty side effect,
probably a DOS opportunity.

Helge Hafting

-

To: <ck@...>
Cc: Al Boldi <a1426z@...>, Con Kolivas <kernel@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 10:58 am

Then, maybe, we should start nicing X again, like we did/had to do until a few
years ago? Or should we just wait until X gets fixed (after all, development

To: jos poortvliet <jos@...>
Cc: <ck@...>, Al Boldi <a1426z@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 12:37 pm

Take this with a grain of salt, but, I don't think this is the
scheduler's _fault_. That said, if the scheduler can fix it, it's not
necessarily a bad thing.

--
~Mike
- Just the crazy copy cat.
-

To: jos poortvliet <jos@...>, <ck@...>
Cc: Con Kolivas <kernel@...>, <linux-kernel@...>
Date: Monday, March 12, 2007 - 1:41 pm

It's not enough to renice X. You would have to renice it, and any app that
needed fixed latency, to the same nice of the negatively nice'd proc, which
defeats the purpose...

Thanks!

--
Al

-
