Linux: Scheduler Tunables

Submitted by Jeremy
on December 11, 2002 - 2:12pm

Robert Love [interview] recently posted a patch to the lkml that allows the root user to tune the O(1) scheduler. Robert describes these tunables, which live in '/proc/sys/sched', as "knobs which let one play with all of the important scheduler variables", explaining that they "may help in tuning and debugging the scheduler."

The complete list of added tunables is: child_penalty, max_sleep_avg, parent_penalty, exit_weight, max_timeslice, prio_bonus_ratio, interactive_delta, min_timeslice, and starvation_limit. For definitions of each of these tunables and full details as provided to the lkml by Robert, read on...
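Since each tunable is exposed as a plain integer file, it can be read and written like any other sysctl entry. The following is a minimal sketch of doing so from C. The helper names read_tunable and write_tunable are hypothetical, and the files under /proc/sys/sched exist only on a kernel carrying this patch; writing them requires root.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer tunable from a procfs-style file,
 * e.g. "/proc/sys/sched/child_penalty" on a patched kernel.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_tunable(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: write one integer tunable.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_tunable(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Keep in mind Robert's warning further down: the patch does not range-check what is written, so a helper like this will happily store values that make no sense to the scheduler.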


child_penalty
Percentage of the parent's sleep_avg that children inherit. sleep_avg is a running average of the time a process spends sleeping. Tasks with high sleep_avg values are considered interactive and given a higher dynamic priority and a larger timeslice. You typically want this set to some value just under 100.

exit_weight
When a CPU hog task exits, its parent's sleep_avg is reduced by a factor of exit_weight against the exiting task's sleep_avg.
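One way to read "reduced by a factor of exit_weight" is as a weighted average in which the parent's own history counts exit_weight times as much as the exiting child's. The sketch below assumes that reading, and assumes the adjustment only happens when the child slept less than the parent; the function name is hypothetical and this is not taken from the patch itself.

```c
/* Sketch: parent's sleep_avg after a child exits.
 * Assumption: a weighted average, applied only when the child was
 * more of a CPU hog (lower sleep_avg) than its parent. */
int parent_sleep_avg_after_exit(int parent_avg, int child_avg, int exit_weight)
{
    if (child_avg < parent_avg)
        return (parent_avg * exit_weight + child_avg) / (exit_weight + 1);
    return parent_avg;
}
```

With the default exit_weight of 3, a parent at 2000 whose pure CPU hog child (sleep_avg 0) exits would drop to 1500 under this model. A larger exit_weight makes the parent's own history dominate, so the penalty shrinks.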

interactive_delta
If a task is "interactive" it is reinserted into the active array after it has expired its timeslice, instead of being inserted into the expired array. How "interactive" a task must be in order to be deemed interactive is a function of its nice value. This interactive limit is scaled linearly by nice value and is offset by the interactive_delta.
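The linear scaling can be sketched as follows. This assumes 40 nice levels, a static priority of 120 + nice, and that lower numbers mean higher priority, none of which is spelled out above; the function names are hypothetical.

```c
#define MAX_USER_PRIO 40 /* assumed: number of nice levels, -20..19 */

/* Sketch: how many priority levels of bonus a task needs before it
 * counts as interactive. Scales linearly with nice, offset by
 * interactive_delta. */
int interactive_threshold(int nice, int prio_bonus_ratio, int interactive_delta)
{
    int bonus_range = MAX_USER_PRIO * prio_bonus_ratio / 100;
    return nice * bonus_range / MAX_USER_PRIO + interactive_delta;
}

/* Sketch: a task is interactive if its dynamic priority has improved on
 * its static priority by at least the threshold (lower value = better). */
int task_is_interactive(int static_prio, int dynamic_prio, int nice,
                        int prio_bonus_ratio, int interactive_delta)
{
    return dynamic_prio <= static_prio -
           interactive_threshold(nice, prio_bonus_ratio, interactive_delta);
}
```

With the defaults (prio_bonus_ratio 25, interactive_delta 2) a nice 0 task needs a bonus of 2 levels, while a nice +19 task would need 6, more than the 5 levels the default bonus range can grant, so under this model it is never reinserted into the active array.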

max_sleep_avg
max_sleep_avg is the largest value (in ms) stored for a task's running sleep average. The larger this value, the longer a task needs to sleep to be considered interactive (maximum interactive bonus is a function of max_sleep_avg).

max_timeslice
Maximum timeslice, in milliseconds. This is the value given to tasks of the highest dynamic priority.

min_timeslice
Minimum timeslice, in milliseconds. This is the value given to tasks of the lowest dynamic priority. Every task gets at least this slice of the processor per array switch.
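Between those two endpoints, a simple model is linear interpolation across the 40 user priority levels. The sketch below assumes exactly that; the level argument (0 for the highest priority, 39 for the lowest) and the function name are illustrative and not part of the patch.

```c
#define MAX_USER_PRIO 40 /* assumed: number of user priority levels */

/* Sketch: timeslice for a priority level, interpolated linearly between
 * max_timeslice (level 0, highest priority) and min_timeslice
 * (level 39, lowest priority). */
int timeslice_for_level(int level, int min_timeslice, int max_timeslice)
{
    return min_timeslice +
           (max_timeslice - min_timeslice) * (MAX_USER_PRIO - 1 - level) /
           (MAX_USER_PRIO - 1);
}
```

With the defaults of 10 and 300, level 20 (the middle of the range, where a nice 0 task would sit) comes out at 151, which lines up with the "default timeslice is 150 msecs" comment in kernel/sched.c quoted in the patch below.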

parent_penalty
Percentage of the parent's sleep_avg that it retains across a fork(). sleep_avg is a running average of the time a process spends sleeping. Tasks with high sleep_avg values are considered interactive and given a higher dynamic priority and a larger timeslice. Normally, this value is 100 and thus tasks retain their sleep_avg on fork. If you want to punish interactive tasks for forking, set this below 100.
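Taken together with child_penalty, the effect of a fork() on sleep_avg can be sketched as two percentage scalings of the parent's value at the time of the fork. The struct and function names are hypothetical, and the sketch assumes both percentages are applied to the parent's pre-fork sleep_avg.

```c
/* Sketch: sleep_avg of parent and child right after a fork(). */
struct fork_sleep_avg {
    int parent;
    int child;
};

/* Both percentages are assumed to apply to the parent's sleep_avg
 * as it was just before the fork. */
struct fork_sleep_avg apply_fork_penalties(int parent_avg,
                                           int parent_penalty,
                                           int child_penalty)
{
    struct fork_sleep_avg result;
    result.parent = parent_avg * parent_penalty / 100;
    result.child = parent_avg * child_penalty / 100;
    return result;
}
```

With the defaults (parent_penalty 100, child_penalty 95) a parent at 2000 stays at 2000 and its child starts at 1900. Lowering child_penalty to 50, which Robert mentions as the 2.4-aa default, would start the child at 1000.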

prio_bonus_ratio
Middle percentage of the priority range that tasks can receive as a dynamic priority. The default value of 25% ensures that nice values at the extremes are still enforced. For example, nice +19 interactive tasks will never be able to preempt a nice 0 CPU hog. Setting this higher will increase the size of the priority range the tasks can receive as a bonus. Setting this lower will decrease this range, making the interactivity bonus less apparent and user nice values more applicable.
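The arithmetic behind the nice +19 example can be sketched like this. The priority constants (user priorities 100 to 139, with nice 0 at 120 and lower values meaning higher priority) are assumptions about the O(1) scheduler, and the function name and the zero guard are additions of this sketch.

```c
#define MAX_USER_PRIO 40  /* assumed: number of nice levels */
#define MAX_RT_PRIO   100 /* assumed: first non-realtime priority */
#define MAX_PRIO      140 /* assumed: one past the lowest priority */

/* Sketch: dynamic priority from static priority and sleep_avg.
 * The bonus spans prio_bonus_ratio percent of the user priority range,
 * centred on zero, so CPU hogs are penalised and sleepers rewarded. */
int dynamic_prio(int static_prio, int sleep_avg,
                 int max_sleep_avg, int prio_bonus_ratio)
{
    if (max_sleep_avg <= 0)
        return static_prio; /* guard added for this sketch only */

    int bonus = MAX_USER_PRIO * prio_bonus_ratio * sleep_avg
                / max_sleep_avg / 100
                - MAX_USER_PRIO * prio_bonus_ratio / 100 / 2;
    int prio = static_prio - bonus;

    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

With the defaults, the bonus ranges from -5 to +5. A fully interactive nice +19 task (static priority 139) reaches 134 at best, while a nice 0 CPU hog (static priority 120) sinks no lower than 125, so the hog still wins. Setting prio_bonus_ratio to zero makes the bonus vanish and the dynamic priority equal the static one.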

starvation_limit
Sufficiently interactive tasks are reinserted into the active array when they run out of timeslice. Normally, tasks are inserted into the expired array. Reinserting interactive tasks into the active array allows them to remain runnable, which is important to interactive performance. This could starve expired tasks, however, since the interactive task could prevent the array switch. To prevent starving the tasks on the expired array for too long, the starvation_limit is the longest (in ms) we will let the expired array starve at the expense of reinserting interactive tasks back into active. Higher values here give more preference to running interactive tasks, at the expense of expired tasks. Lower values provide more fair scheduling behavior, at the expense of interactivity. The units are in milliseconds.
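The starvation check itself can be sketched as a comparison between how long the expired array has been waiting and the limit. Scaling the limit by the number of runnable tasks is an assumption of this sketch, based on how O(1) scheduler code of that era is generally described, and is not stated in the text above; the function and parameter names are hypothetical.

```c
/* Sketch: has the expired array waited too long?
 * expired_timestamp is the time the first task was placed on the expired
 * array since the last array switch, or 0 if the array is empty.
 * Assumption: the limit grows with the number of runnable tasks. */
int expired_starving(unsigned long now, unsigned long expired_timestamp,
                     int starvation_limit, int nr_running)
{
    if (expired_timestamp == 0)
        return 0;
    return (now - expired_timestamp) >=
           (unsigned long)starvation_limit * (unsigned long)nr_running + 1;
}
```

When this returns true, an interactive task that runs out of timeslice would go to the expired array like any other task, which lets the array switch happen.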


From: Robert Love
Subject: [PATCH] scheduler tunables
Date: 	10 Dec 2002 18:27:58 -0500

Attached patch implements sysctl/procfs scheduler tunables, knobs which
let one play with all of the important scheduler variables:

        [18:12:54]rml@phantasy:~$ ls /proc/sys/sched/
        child_penalty      max_sleep_avg  parent_penalty
        exit_weight        max_timeslice  prio_bonus_ratio
        interactive_delta  min_timeslice  starvation_limit

Which may help in tuning and debugging the scheduler.

I believe Ingo did something similar to this ages ago, so original
credit for the idea goes to him.

Note the values are not checked and you can probably cause a
divide-by-zero somewhere, but only root can write these.

Patch is against 2.5.51-mm1.  It also applies to 2.5.51 modulo a simple
failed hunk in sysctl.c.

	Robert Love

 include/linux/sysctl.h |   15 ++++++++++++++-
 kernel/sched.c         |   31 ++++++++++++++++++++++---------
 kernel/sysctl.c        |   35 ++++++++++++++++++++++++++++++++++-
 3 files changed, 70 insertions(+), 11 deletions(-)

diff -urN linux-2.5.51-mm1/include/linux/sysctl.h linux/include/linux/sysctl.h
--- linux-2.5.51-mm1/include/linux/sysctl.h	2002-12-10 17:48:10.000000000 -0500
+++ linux/include/linux/sysctl.h	2002-12-10 16:50:41.000000000 -0500
@@ -66,7 +66,8 @@
 	CTL_DEV=7,		/* Devices */
 	CTL_BUS=8,		/* Busses */
 	CTL_ABI=9,		/* Binary emulation */
-	CTL_CPU=10		/* CPU stuff (speed scaling, etc) */
+	CTL_CPU=10,		/* CPU stuff (speed scaling, etc) */
+	CTL_SCHED=11,		/* scheduler tunables */
 };
 
 /* CTL_BUS names: */
@@ -157,6 +158,18 @@
 	VM_LOWER_ZONE_PROTECTION=20,/* Amount of protection of lower zones */
 };
 
+/* Tunable scheduler parameters in /proc/sys/sched/ */
+enum {
+	SCHED_MIN_TIMESLICE=1,		/* minimum process timeslice */
+	SCHED_MAX_TIMESLICE=2,		/* maximum process timeslice */
+	SCHED_CHILD_PENALTY=3,		/* penalty on fork to child */
+	SCHED_PARENT_PENALTY=4,		/* penalty on fork to parent */
+	SCHED_EXIT_WEIGHT=5,		/* penalty to parent of CPU hog child */
+	SCHED_PRIO_BONUS_RATIO=6,	/* percent of max prio given as bonus */
+	SCHED_INTERACTIVE_DELTA=7,	/* delta used to scale interactivity */
+	SCHED_MAX_SLEEP_AVG=8,		/* maximum sleep avg attainable */
+	SCHED_STARVATION_LIMIT=9,	/* no re-active if expired is starved */
+};
 
 /* CTL_NET names: */
 enum
diff -urN linux-2.5.51-mm1/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.51-mm1/kernel/sched.c	2002-12-10 17:48:10.000000000 -0500
+++ linux/kernel/sched.c	2002-12-10 16:33:34.000000000 -0500
@@ -57,16 +57,29 @@
  * Minimum timeslice is 10 msecs, default timeslice is 150 msecs,
  * maximum timeslice is 300 msecs. Timeslices get refilled after
  * they expire.
+ *
+ * They are configurable via /proc/sys/sched
  */
-#define MIN_TIMESLICE		( 10 * HZ / 1000)
-#define MAX_TIMESLICE		(300 * HZ / 1000)
-#define CHILD_PENALTY		95
-#define PARENT_PENALTY		100
-#define EXIT_WEIGHT		3
-#define PRIO_BONUS_RATIO	25
-#define INTERACTIVE_DELTA	2
-#define MAX_SLEEP_AVG		(2*HZ)
-#define STARVATION_LIMIT	(2*HZ)
+
+int min_timeslice = (10 * HZ) / 1000;
+int max_timeslice = (300 * HZ) / 1000;
+int child_penalty = 95;
+int parent_penalty = 100;
+int exit_weight = 3;
+int prio_bonus_ratio = 25;
+int interactive_delta = 2;
+int max_sleep_avg = 2 * HZ;
+int starvation_limit = 2 * HZ;
+
+#define MIN_TIMESLICE		(min_timeslice)
+#define MAX_TIMESLICE		(max_timeslice)
+#define CHILD_PENALTY		(child_penalty)
+#define PARENT_PENALTY		(parent_penalty)
+#define EXIT_WEIGHT		(exit_weight)
+#define PRIO_BONUS_RATIO	(prio_bonus_ratio)
+#define INTERACTIVE_DELTA	(interactive_delta)
+#define MAX_SLEEP_AVG		(max_sleep_avg)
+#define STARVATION_LIMIT	(starvation_limit)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
diff -urN linux-2.5.51-mm1/kernel/sysctl.c linux/kernel/sysctl.c
--- linux-2.5.51-mm1/kernel/sysctl.c	2002-12-10 17:48:10.000000000 -0500
+++ linux/kernel/sysctl.c	2002-12-10 17:05:04.000000000 -0500
@@ -54,6 +54,15 @@
 extern int cad_pid;
 extern int pid_max;
 extern int sysctl_lower_zone_protection;
+extern int min_timeslice;
+extern int max_timeslice;
+extern int child_penalty;
+extern int parent_penalty;
+extern int exit_weight;
+extern int prio_bonus_ratio;
+extern int interactive_delta;
+extern int max_sleep_avg;
+extern int starvation_limit;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -111,6 +120,7 @@
 
 static ctl_table kern_table[];
 static ctl_table vm_table[];
+static ctl_table sched_table[];
 #ifdef CONFIG_NET
 extern ctl_table net_table[];
 #endif
@@ -155,6 +165,7 @@
 	{CTL_FS, "fs", NULL, 0, 0555, fs_table},
 	{CTL_DEBUG, "debug", NULL, 0, 0555, debug_table},
         {CTL_DEV, "dev", NULL, 0, 0555, dev_table},
+	{CTL_SCHED, "sched", NULL, 0, 0555, sched_table},
 	{0}
 };
 
@@ -357,7 +368,29 @@
 
 static ctl_table dev_table[] = {
 	{0}
-};  
+};
+
+static ctl_table sched_table[] = {
+	{SCHED_MAX_TIMESLICE, "max_timeslice",
+	&max_timeslice, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_MIN_TIMESLICE, "min_timeslice",
+	&min_timeslice, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_CHILD_PENALTY, "child_penalty",
+	&child_penalty, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_PARENT_PENALTY, "parent_penalty",
+	&parent_penalty, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_EXIT_WEIGHT, "exit_weight",
+	&exit_weight, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_PRIO_BONUS_RATIO, "prio_bonus_ratio",
+	&prio_bonus_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_INTERACTIVE_DELTA, "interactive_delta",
+	&interactive_delta, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_MAX_SLEEP_AVG, "max_sleep_avg",
+	&max_sleep_avg, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_STARVATION_LIMIT, "starvation_limit",
+	&starvation_limit, sizeof(int), 0644, NULL, &proc_dointvec},
+	{0}
+};
 
 extern void init_irq_proc (void);
 

From: Robert Love
Subject: Re: [PATCH] scheduler tunables
Date: 	10 Dec 2002 23:26:26 -0500

On Tue, 2002-12-10 at 18:27, Robert Love wrote:

> Attached patch implements sysctl/procfs scheduler tunables, knobs
> which let one play with all of the important scheduler variables:

Updated version at Andrew's prodding, now with documentation in
Documentation/filesystem/proc.txt

Again, this patch implements scheduler tunables in /proc/sys/sched:

        sched.starvation_limit = 2000
        sched.max_sleep_avg = 2000
        sched.interactive_delta = 2
        sched.prio_bonus_ratio = 25
        sched.exit_weight = 3
        sched.parent_penalty = 100
        sched.child_penalty = 95
        sched.min_timeslice = 10
        sched.max_timeslice = 300

Some notes for testers and tuners:

        - you can effectively disable the interactivity estimator (both
          bonuses and penalties) by setting prio_bonus_ratio to zero

        - some users seem to prefer a lower child_penalty (50 is the
          default in 2.4-aa)

        - some workloads may benefit from setting starvation_limit lower
          (maybe 0.5s?) while a few workloads may actually like it much
          higher (4s?)

Enjoy,

	Robert Love

 Documentation/filesystems/proc.txt |   87 +++++++++++++++++++++++++++++++++++++
 include/linux/sysctl.h             |   15 +++++-
 kernel/sched.c                     |   31 +++++++++----
 kernel/sysctl.c                    |   35 ++++++++++++++
 4 files changed, 157 insertions(+), 11 deletions(-)

diff -urN linux-2.5.51-mm1/Documentation/filesystems/proc.txt linux/Documentation/filesystems/proc.txt
--- linux-2.5.51-mm1/Documentation/filesystems/proc.txt	2002-12-10 17:48:09.000000000 -0500
+++ linux/Documentation/filesystems/proc.txt	2002-12-10 23:15:19.000000000 -0500
@@ -37,6 +37,7 @@
   2.8	/proc/sys/net/ipv4 - IPV4 settings
   2.9	Appletalk
   2.10	IPX
+  2.11	/proc/sys/sched - scheduler tunables
 
 ------------------------------------------------------------------------------
 Preface
@@ -1663,6 +1664,92 @@
 gives the destination network, the router node (or Directly) and the network
 address of the router (or Connected) for internal networks.
 
+2.11 /proc/sys/sched - scheduler tunables
+-----------------------------------------
+
+Useful knobs for tuning the scheduler live in /proc/sys/sched.
+
+child_penalty
+-------------
+
+Percentage of the parent's sleep_avg that children inherit. sleep_avg is
+a running average of the time a process spends sleeping. Tasks with high
+sleep_avg values are considered interactive and given a higher dynamic
+priority and a larger timeslice. You typically want this some value just
+under 100.
+
+exit_weight
+-----------
+
+When a CPU hog task exits, its parent's sleep_avg is reduced by a factor of
+exit_weight against the exiting task's sleep_avg.
+
+interactive_delta
+-----------------
+
+If a task is "interactive" it is reinserted into the active array after it
+has expired its timeslice, instead of being inserted into the expired array.
+How "interactive" a task must be in order to be deemed interactive is a
+function of its nice value. This interactive limit is scaled linearly by nice
+value and is offset by the interactive_delta.
+
+max_sleep_avg
+-------------
+
+max_sleep_avg is the largest value (in ms) stored for a task's running sleep
+average. The larger this value, the longer a task needs to sleep to be
+considered interactive (maximum interactive bonus is a function of
+max_sleep_avg).
+
+max_timeslice
+-------------
+
+Maximum timeslice, in milliseconds. This is the value given to tasks of the
+highest dynamic priority.
+
+min_timeslice
+-------------
+
+Minimum timeslice, in milliseconds. This is the value given to tasks of the
+lowest dynamic priority. Every task gets at least this slice of the processor
+per array switch.
+
+parent_penalty
+--------------
+
+Percentage of the parent's sleep_avg that it retains across a fork().
+sleep_avg is a running average of the time a process spends sleeping. Tasks
+with high sleep_avg values are considered interactive and given a higher
+dynamic priority and a larger timeslice. Normally, this value is 100 and thus
+task's retain their sleep_avg on fork. If you want to punish interactive
+tasks for forking, set this below 100.
+
+prio_bonus_ratio
+----------------
+
+Middle percentage of the priority range that tasks can receive as a dynamic
+priority. The default value of 25% ensures that nice values at the
+extremes are still enforced. For example, nice +19 interactive tasks will
+never be able to preempt a nice 0 CPU hog. Setting this higher will increase
+the size of the priority range the tasks can receive as a bonus. Setting
+this lower will decrease this range, making the interactivity bonus less
+apparent and user nice values more applicable.
+
+starvation_limit
+----------------
+
+Sufficiently interactive tasks are reinserted into the active array when they
+run out of timeslice. Normally, tasks are inserted into the expired array.
+Reinserting interactive tasks into the active array allows them to remain
+runnable, which is important to interactive performance. This could starve
+expired tasks, however, since the interactive task could prevent the array
+switch. To prevent starving the tasks on the expired array for too long. the
+starvation_limit is the longest (in ms) we will let the expired array starve
+at the expense of reinserting interactive tasks back into active. Higher
+values here give more preferance to running interactive tasks, at the expense
+of expired tasks. Lower values provide more fair scheduling behavior, at the
+expense of interactivity. The units are in milliseconds.
+
 ------------------------------------------------------------------------------
 Summary
 ------------------------------------------------------------------------------
diff -urN linux-2.5.51-mm1/include/linux/sysctl.h linux/include/linux/sysctl.h
--- linux-2.5.51-mm1/include/linux/sysctl.h	2002-12-10 17:48:10.000000000 -0500
+++ linux/include/linux/sysctl.h	2002-12-10 16:50:41.000000000 -0500
@@ -66,7 +66,8 @@
 	CTL_DEV=7,		/* Devices */
 	CTL_BUS=8,		/* Busses */
 	CTL_ABI=9,		/* Binary emulation */
-	CTL_CPU=10		/* CPU stuff (speed scaling, etc) */
+	CTL_CPU=10,		/* CPU stuff (speed scaling, etc) */
+	CTL_SCHED=11,		/* scheduler tunables */
 };
 
 /* CTL_BUS names: */
@@ -157,6 +158,18 @@
 	VM_LOWER_ZONE_PROTECTION=20,/* Amount of protection of lower zones */
 };
 
+/* Tunable scheduler parameters in /proc/sys/sched/ */
+enum {
+	SCHED_MIN_TIMESLICE=1,		/* minimum process timeslice */
+	SCHED_MAX_TIMESLICE=2,		/* maximum process timeslice */
+	SCHED_CHILD_PENALTY=3,		/* penalty on fork to child */
+	SCHED_PARENT_PENALTY=4,		/* penalty on fork to parent */
+	SCHED_EXIT_WEIGHT=5,		/* penalty to parent of CPU hog child */
+	SCHED_PRIO_BONUS_RATIO=6,	/* percent of max prio given as bonus */
+	SCHED_INTERACTIVE_DELTA=7,	/* delta used to scale interactivity */
+	SCHED_MAX_SLEEP_AVG=8,		/* maximum sleep avg attainable */
+	SCHED_STARVATION_LIMIT=9,	/* no re-active if expired is starved */
+};
 
 /* CTL_NET names: */
 enum
diff -urN linux-2.5.51-mm1/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.51-mm1/kernel/sched.c	2002-12-10 17:48:10.000000000 -0500
+++ linux/kernel/sched.c	2002-12-10 16:33:34.000000000 -0500
@@ -57,16 +57,29 @@
  * Minimum timeslice is 10 msecs, default timeslice is 150 msecs,
  * maximum timeslice is 300 msecs. Timeslices get refilled after
  * they expire.
+ *
+ * They are configurable via /proc/sys/sched
  */
-#define MIN_TIMESLICE		( 10 * HZ / 1000)
-#define MAX_TIMESLICE		(300 * HZ / 1000)
-#define CHILD_PENALTY		95
-#define PARENT_PENALTY		100
-#define EXIT_WEIGHT		3
-#define PRIO_BONUS_RATIO	25
-#define INTERACTIVE_DELTA	2
-#define MAX_SLEEP_AVG		(2*HZ)
-#define STARVATION_LIMIT	(2*HZ)
+
+int min_timeslice = (10 * HZ) / 1000;
+int max_timeslice = (300 * HZ) / 1000;
+int child_penalty = 95;
+int parent_penalty = 100;
+int exit_weight = 3;
+int prio_bonus_ratio = 25;
+int interactive_delta = 2;
+int max_sleep_avg = 2 * HZ;
+int starvation_limit = 2 * HZ;
+
+#define MIN_TIMESLICE		(min_timeslice)
+#define MAX_TIMESLICE		(max_timeslice)
+#define CHILD_PENALTY		(child_penalty)
+#define PARENT_PENALTY		(parent_penalty)
+#define EXIT_WEIGHT		(exit_weight)
+#define PRIO_BONUS_RATIO	(prio_bonus_ratio)
+#define INTERACTIVE_DELTA	(interactive_delta)
+#define MAX_SLEEP_AVG		(max_sleep_avg)
+#define STARVATION_LIMIT	(starvation_limit)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
diff -urN linux-2.5.51-mm1/kernel/sysctl.c linux/kernel/sysctl.c
--- linux-2.5.51-mm1/kernel/sysctl.c	2002-12-10 17:48:10.000000000 -0500
+++ linux/kernel/sysctl.c	2002-12-10 17:05:04.000000000 -0500
@@ -54,6 +54,15 @@
 extern int cad_pid;
 extern int pid_max;
 extern int sysctl_lower_zone_protection;
+extern int min_timeslice;
+extern int max_timeslice;
+extern int child_penalty;
+extern int parent_penalty;
+extern int exit_weight;
+extern int prio_bonus_ratio;
+extern int interactive_delta;
+extern int max_sleep_avg;
+extern int starvation_limit;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -111,6 +120,7 @@
 
 static ctl_table kern_table[];
 static ctl_table vm_table[];
+static ctl_table sched_table[];
 #ifdef CONFIG_NET
 extern ctl_table net_table[];
 #endif
@@ -155,6 +165,7 @@
 	{CTL_FS, "fs", NULL, 0, 0555, fs_table},
 	{CTL_DEBUG, "debug", NULL, 0, 0555, debug_table},
         {CTL_DEV, "dev", NULL, 0, 0555, dev_table},
+	{CTL_SCHED, "sched", NULL, 0, 0555, sched_table},
 	{0}
 };
 
@@ -357,7 +368,29 @@
 
 static ctl_table dev_table[] = {
 	{0}
-};  
+};
+
+static ctl_table sched_table[] = {
+	{SCHED_MAX_TIMESLICE, "max_timeslice",
+	&max_timeslice, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_MIN_TIMESLICE, "min_timeslice",
+	&min_timeslice, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_CHILD_PENALTY, "child_penalty",
+	&child_penalty, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_PARENT_PENALTY, "parent_penalty",
+	&parent_penalty, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_EXIT_WEIGHT, "exit_weight",
+	&exit_weight, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_PRIO_BONUS_RATIO, "prio_bonus_ratio",
+	&prio_bonus_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_INTERACTIVE_DELTA, "interactive_delta",
+	&interactive_delta, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_MAX_SLEEP_AVG, "max_sleep_avg",
+	&max_sleep_avg, sizeof(int), 0644, NULL, &proc_dointvec},
+	{SCHED_STARVATION_LIMIT, "starvation_limit",
+	&starvation_limit, sizeof(int), 0644, NULL, &proc_dointvec},
+	{0}
+};
 
 extern void init_irq_proc (void);

tunables?

Anonymous
on
December 11, 2002 - 8:48pm

I don't think "tunable" is a noun. "Knobs" would be a better name.

nothing is a word until people use it

Anonymous
on
December 11, 2002 - 11:15pm

Did you not understand what "tunables" meant?

> dict tunable 1 defi

molo
on
December 11, 2002 - 11:21pm
> dict tunable
1 definition found

From Webster's Revised Unabridged Dictionary (1913) [web1913]:

  Tunable Tun"a*ble, a.
     Capable of being tuned, or made harmonious; hence,
     harmonious; musical; tuneful. -- {Tun"a*ble*ness}, n. --
     {Tun"a*bly}, adv.
  
           And tunable as sylvan pipe or song.      --Milton.

So, as per your definition,

Anonymous
on
December 12, 2002 - 5:43pm

So, as per your definition, "tunable" is an adjective, not a noun.

i.e. "tunable knob" is correct (although perhaps redundant), but not "tunable".

Hate to point out the obvious

Anonymous
on
December 12, 2002 - 5:59pm

That little 'a.' after 'Tun"a*ble' means it's an adjective, not a noun.

hehe

David Nielsen
on
December 12, 2002 - 8:38pm

You can call it tunasandwich for all I care... as long as it works and it stays that way...

I didn't say it was a noun.

molo
on
December 12, 2002 - 9:33pm

I didn't say it was a noun. I was just providing a definition, which obviously states it is an adjective.

To be tunable or not to be

Anonymous
on
December 13, 2002 - 12:08am

I guess the main point here is that Robert gives us a way to tune the scheduler, test it, and compare which settings work best for special workloads or whatever... (fun? ;-)
That's why I'll just say yippee! The tunable scheduling policy is out.
Thanks Robert.

I agree. Balancing so many pe

Anonymous
on
December 13, 2002 - 1:33am

I agree. Balancing so many performance "knobs" is a difficult task. Exposing these parameters lets all those LKML fans use their favorite benchmarks (like Contest) to do this fuzzy research. Brilliant way to divide and conquer!

Did Linux 2.4 have similar scheduler knobs?

To be, To be! :-)

KiTaSuMbA
on
December 13, 2002 - 7:54am

I currently use the 2.4.18-wolk-3.7.1 kernel that includes O(1) and Con's timeslice autoregulator... The system works very nicely for normal "desktop stuff" even at heavy loads but becomes awkward when you change desktops too fast and run one too many CPU hoggers... So there are specific loads where you'd like the kernel to act differently. Now think of this: change the kernel's behaviour on-the-fly in /proc... Think of scripts doing your workload presets (presets you had fun for a month choosing :-). I plan on heavy usage of my linux box for music... (and that's "special" enough, considering that the linux way is made of many apps working together at minimum latency). Knobs... music... hell, I might even try a front end :-))

Scheduler tuneables/tweaks/patches

Anonymous
on
December 14, 2002 - 12:37am

I think it's nice to have all those tuneables. But on the downside, isn't needing to have a lot of tuneables in the first place the sign of a bad design? For example, I run FreeBSD, and it can pretty much handle any load very smoothly without the need for all these scheduler patches and tweaks. By comparison, it seems like the other processes bog down in Linux a lot more in response to one or two other processes hogging the CPU. Maybe it's an issue with some sort of exponential adjustment period of each process (ramping function).

Re: Scheduler tuneables/tweaks/patches

rml
on
December 14, 2002 - 1:27am

I think it's nice to have all those tuneables. But on the downside, isn't needing to have a lot of tuneables in the first place the sign of a bad design?

Yes, I think having algorithms that require tuning on behalf of the user is an excellent sign of having algorithms that suck. No doubt.

The point of these TUNABLES (noun) is to allow debugging and tuning more-so of the scheduler algorithms themselves. I.e., it lets developers find what better default values may be or where some algorithms just suck regardless. Of course, advanced users can also play around just for fun. Even the most perfect algorithm could still be fine-tuned for a specific workload.

But, in short, I agree 100%. While this patch is now in 2.5-mm I do not think it is needed in mainline. What we will have in mainline is just damn good algorithms. We currently have a corner case where the interactivity estimator goes apeshit and that was the reason for me doing this patch.

Regards,
Robert

Contest runs

Con Kolivas
on
December 14, 2002 - 4:49am

I'd love to help out and do a whole family of runs with contest to see what all these do but 2.5 killed my osdl box with that ext3 bug. I'm too afraid to use my own machine any more for benchmarking development kernels so I'm afraid it will have to wait :-\

Re: Contest runs

rml
on
December 14, 2002 - 6:38am

I'd love to help out and do a whole family of runs with contest to see what all these do but 2.5 killed my osdl box with that ext3 bug. I'm too afraid to use my own machine any more for benchmarking development kernels so I'm afraid it will have to wait

Hey Con.

If it's the same ext3 bug that bit me (see lkml posting) the fix is out and now in 2.5-mm. Simple off-by-one mistake. I believe the bug was only in 2.5-mm, though. So if the problem happened on stock 2.5, you may have another bug. There are no other known ext3 issues, though.

Regards,
Robert

Ext3 fix

Con Kolivas
on
December 15, 2002 - 9:26am

Thanks Robert.

As soon as my osdl box is online again I'll do it. The machine is 10000 miles away so I'm at their mercy.

Cheers,
Con
