The recently merged Completely Fair Scheduler changes how the Linux kernel handles scheduling priorities set with the nice command. Ingo Molnar explained that each nice level adds or subtracts 10% of CPU utilization, "the '10% effect' is relative and cumulative: from _any_ nice level, if you go up 1 level, it's -10% CPU usage, if you go down 1 level it's +10% CPU usage." Ingo noted that with the earlier scheduler the effect of a nice level depended on HZ, offering three examples in which HZ is set to 100, 250, and 300, "a nice +19 task (the most commonly used nice level in practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler, depending on the value of HZ. This is quite inconsistent and illogical. This HZ dependency of nice levels existed for many years, and the new scheduler solves that inconsistency - every nice level will get the same amount of time, regardless of HZ."
Ingo went on to offer a table comparing positive nice values to the percentage of the CPU that a process gets in relation to a 'nice 0' process: "nice 0: 100.00%; nice 1: 80.00%; nice 2: 64.10%; nice 3: 51.28%; nice 4: 40.98%; nice 5: 32.78%; nice 6: 26.24%; nice 7: 21.00%; nice 8: 16.77%; nice 9: 13.42%; nice 10: 10.74%; nice 11: 8.59%; nice 12: 6.87%; nice 13: 5.50%; nice 14: 4.39%; nice 15: 3.51%; nice 16: 2.81%; nice 17: 2.25%; nice 18: 1.80%; nice 19: 1.44%". He offered another table comparing negative nice values to a 'nice -20' process: "nice 0: 1.15%; nice -1: 1.44%; nice -2: 1.80%; nice -3: 2.25%; nice -4: 2.81%; nice -5: 3.51%; nice -6: 4.39%; nice -7: 5.50%; nice -8: 6.87%; nice -9: 8.59%; nice -10: 10.74%; nice -11: 13.42%; nice -12: 16.77%; nice -13: 21.00%; nice -14: 26.24%; nice -15: 32.78%; nice -16: 40.98%; nice -17: 51.28%; nice -18: 64.10%; nice -19: 80.00%; nice -20: 100.00%"
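The shape of both tables can be roughly reproduced from the 1.25 weight multiplier discussed in the thread below. The following is a minimal sketch, assuming every nice level scales a task's weight by exactly 1.25; the names WEIGHT_STEP and share_relative_to are made up for illustration and are not kernel identifiers. Some quoted entries differ from pure powers of 1.25 in the second decimal (64.10% rather than 64.00% for nice 2), so treat the output as an approximation of the tables, not a reimplementation of the scheduler.

```python
# Assumption: one nice level changes a task's weight by a factor of 1.25.
WEIGHT_STEP = 1.25


def share_relative_to(nice: int, reference_nice: int = 0) -> float:
    """CPU time of a task at `nice`, as a fraction of what a competing
    task at `reference_nice` receives."""
    return WEIGHT_STEP ** (reference_nice - nice)


# A few rows of the positive table (relative to nice 0).
for nice in (1, 10, 19):
    print(f"nice {nice:3d}: {share_relative_to(nice):7.2%} of a nice 0 task")

# First row of the negative table (relative to nice -20).
print(f"nice   0: {share_relative_to(0, -20):7.2%} of a nice -20 task")

# How many times more CPU a nice -20 task gets than a nice 0 task.
print(f"nice -20 vs nice 0: {1 / share_relative_to(0, -20):.1f}x")
```

The last line gives the factor of roughly 86.7 that Roman Zippel objects to later in the thread.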
From: James Bruce [email blocked]
To: linux-kernel
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 02:18:06 -0400

Thomas Gleixner wrote:
> Roman Zippel noticed inconsistency of the wmult table.
> wmult[16] has a missing digit.
[snip]

While we're at it, isn't the comment above the wmult table incorrect?
The multiplier is 1.25, meaning a 25% change per nice level, not 10%.

 - Jim

From: Ingo Molnar [email blocked]
To: James Bruce [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 09:06:10 +0200

* James Bruce [email blocked] wrote:

> While we're at it, isn't the comment above the wmult table incorrect?
> The multiplier is 1.25, meaning a 25% change per nice level, not 10%.

yes, the weight multiplier 1.25, but the actual difference in CPU
utilization, when running two CPU intense tasks, is ~10%:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
 8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop

so the first task 'wins' +10% CPU utilization (relative to the 50% it
had before), the second task 'loses' -10% CPU utilization (relative to
the 50% it had before).

so what the comment says is true:

 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage.

for there to be a ~+10% change in CPU utilization for a task that
races against another CPU-intense task there needs to be a ~25% change
in the weight.

	Ingo

From: Ingo Molnar [email blocked]
To: James Bruce [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 09:41:02 +0200

* Ingo Molnar [email blocked] wrote:

> * James Bruce [email blocked] wrote:
>
> > While we're at it, isn't the comment above the wmult table incorrect?
> > The multiplier is 1.25, meaning a 25% change per nice level, not 10%.
>
> yes, the weight multiplier 1.25, but the actual difference in CPU
> utilization, when running two CPU intense tasks, is ~10%:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
>  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
>
> so the first task 'wins' +10% CPU utilization (relative to the 50% it
> had before), the second task 'loses' -10% CPU utilization (relative to
> the 50% it had before).
>
> so what the comment says is true:
>
>  * The "10% effect" is relative and cumulative: from _any_ nice level,
>  * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
>  * it's +10% CPU usage.
>
> for there to be a ~+10% change in CPU utilization for a task that
> races against another CPU-intense task there needs to be a ~25% change
> in the weight.

in any case more documentation is justified, so i've added some
clarification to the comments - see the patch below.

	Ingo

------------------------>
Subject: sched: improve weight-array comments
From: Ingo Molnar [email blocked]

improve the comments around the wmult array (which controls the weight
of niced tasks). Clarify that to achieve a 10% difference in CPU
utilization, a weight multiplier of 1.25 has to be used.

Signed-off-by: Ingo Molnar [email blocked]
---
 kernel/sched.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -736,7 +736,9 @@ static void update_curr_load(struct rq *
  *
  * The "10% effect" is relative and cumulative: from _any_ nice level,
  * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
- * it's +10% CPU usage.
+ * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
+ * If a task goes up by ~10% and another task goes down by ~10% then
+ * the relative distance between them is ~25%.)
  */
 static const int prio_to_weight[40] = {
 /* -20 */ 88818, 71054, 56843, 45475, 36380, 29104, 23283, 18626, 14901, 11921,

From: Roman Zippel [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 12:18:05 +0200 (CEST)

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> yes, the weight multiplier 1.25, but the actual difference in CPU
> utilization, when running two CPU intense tasks, is ~10%:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
>  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
>
> so the first task 'wins' +10% CPU utilization (relative to the 50% it
> had before), the second task 'loses' -10% CPU utilization (relative to
> the 50% it had before).

As soon as you add another loop the difference changes again, while it's
always correct to say it gets 25% more cpu time (which I still think is a
little too much).

bye, Roman

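Both positions can be checked with a little arithmetic. The sketch below assumes CPU time among busy tasks is divided in proportion to their weights and that one nice level is a factor of 1.25 in weight; the helper name cpu_split is hypothetical. It shows the two-task case Ingo describes and the three-task case Roman raises, where the change against an even split is no longer about 10% while the 1.25 ratio between neighbouring nice levels holds.

```python
def cpu_split(weights):
    """Divide the CPU among always-runnable tasks in proportion to weight."""
    total = sum(weights)
    return [w / total for w in weights]


# Two CPU hogs: nice 0 (weight 1.25) against nice 1 (weight 1.0).
two = cpu_split([1.25, 1.0])
print("two tasks:  ", [f"{s:.1%}" for s in two])

# Three CPU hogs: one nice 0 task against two nice 1 tasks.
three = cpu_split([1.25, 1.0, 1.0])
print("three tasks:", [f"{s:.1%}" for s in three])

# Change of the nice 0 task against an even split, in each case.
gain_two = two[0] / (1 / 2) - 1
gain_three = three[0] / (1 / 3) - 1
print(f"nice 0 gain vs even split: {gain_two:+.1%} (2 tasks), "
      f"{gain_three:+.1%} (3 tasks)")

# The ratio between the nice 0 and a nice 1 task is the same in both cases.
print(f"ratio: {two[0] / two[1]:.2f} and {three[0] / three[1]:.2f}")
```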
From: Ingo Molnar [email blocked]
To: Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 13:20:37 +0200

* Roman Zippel [email blocked] wrote:

> > yes, the weight multiplier 1.25, but the actual difference in CPU
> > utilization, when running two CPU intense tasks, is ~10%:
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
> >  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
> >
> > so the first task 'wins' +10% CPU utilization (relative to the 50%
> > it had before), the second task 'loses' -10% CPU utilization
> > (relative to the 50% it had before).
>
> As soon as you add another loop the difference changes again, while
> it's always correct to say it gets 25% more cpu time [...]

yep, and i'll add the relative effect to the comment too.

	Ingo

From: Roman Zippel [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 13:58:36 +0200 (CEST)

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > As soon as you add another loop the difference changes again, while
> > it's always correct to say it gets 25% more cpu time [...]
>
> yep, and i'll add the relative effect to the comment too.

Why did you cut off the rest of the sentence?
To illustrate the problem a little different: a task with a nice level -20
got around 700% more cpu time (or 8 times more), now it gets 8500% more
cpu time (or 86.7 times more).
You don't think that change to the nice levels is a little drastic?

bye, Roman

From: Linus Torvalds [email blocked]
To: Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 10:47:06 -0700 (PDT)

On Mon, 16 Jul 2007, Roman Zippel wrote:
>
> To illustrate the problem a little different: a task with a nice level -20
> got around 700% more cpu time (or 8 times more), now it gets 8500% more
> cpu time (or 86.7 times more).

Ingo, that _does_ sound excessive. How about trying a much less aggressive
nice-level (and preferably linear, not exponential)?

	Linus

From: Ingo Molnar [email blocked]
To: Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 14:12:46 +0200

* Roman Zippel [email blocked] wrote:

> On Mon, 16 Jul 2007, Ingo Molnar wrote:
>
> > > As soon as you add another loop the difference changes again,
> > > while it's always correct to say it gets 25% more cpu time [...]
> >
> > yep, and i'll add the relative effect to the comment too.
>
> Why did you cut off the rest of the sentence?

(no need to become hostile, i answered to that portion of your sentence
separately, which was logically detached from the other portion of your
sentence. I marked the cut with the '[...]' sign. )

> To illustrate the problem a little different: a task with a nice level
> -20 got around 700% more cpu time (or 8 times more), now it gets 8500%
> more cpu time (or 86.7 times more). You don't think that change to the
> nice levels is a little drastic?

This was discussed on lkml in detail, see the CFS threads. It has been a
common request for nice levels to be more logical (i.e. to make them
universal and to detach them from HZ) and for them to be more effective
as well.

	Ingo

From: Roman Zippel [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 14:42:30 +0200 (CEST)

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > > > As soon as you add another loop the difference changes again,
> > > > while it's always correct to say it gets 25% more cpu time [...]
> > >
> > > yep, and i'll add the relative effect to the comment too.
> >
> > Why did you cut off the rest of the sentence?
>
> (no need to become hostile, i answered to that portion of your sentence
> separately, which was logically detached from the other portion of your
> sentence. I marked the cut with the '[...]' sign. )

Could you please stop with these accusations?
Could you please point me to the mail with the separate answer?

> > To illustrate the problem a little different: a task with a nice level
> > -20 got around 700% more cpu time (or 8 times more), now it gets 8500%
> > more cpu time (or 86.7 times more). You don't think that change to the
> > nice levels is a little drastic?
>
> This was discussed on lkml in detail, see the CFS threads.

Which are quite big, so I skipped most of it, a more precise pointer would
be appreciated.

> It has been a
> common request for nice levels to be more logical (i.e. to make them
> universal and to detach them from HZ) and for them to be more effective
> as well.

Huh? What has this to do with HZ? The scheduler used ticks internally, but
it's irrelevant to what the user sees via the nice levels.
So the question still stands that this change may be a little drastic, as
you changed the nice levels of _all_ users, not just of those who were
previously interested in CFS.

bye, Roman

From: Ingo Molnar [email blocked]
To: Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 15:40:36 +0200

* Roman Zippel [email blocked] wrote:

> > It has been a common request for nice levels to be more logical
> > (i.e. to make them universal and to detach them from HZ) and for
> > them to be more effective as well.
>
> Huh? What has this to do with HZ? The scheduler used ticks internally,
> but it's irrelevant to what the user sees via the nice levels. [...]

unfortunately you are wrong again - there are various HZ related
artifacts in the nice level support code of the old scheduler.

v2.6.22, CONFIG_HZ=100, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2446 mingo     25   0  1576  244  196 R 90.9  0.0   0:32.79 loop
 2448 mingo     39  19  1580  248  196 R  9.1  0.0   0:02.94 loop

v2.6.22, CONFIG_HZ=250, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2358 mingo     25   0  1576  248  196 R 96.1  0.0   0:31.97 loop_silent
 2363 mingo     39  19  1576  244  196 R  3.9  0.0   0:01.24 loop_silent

v2.6.22, CONFIG_HZ=300, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2332 mingo     25   0  1580  248  196 R 95.1  0.0   0:11.84 loop_silent
 2335 mingo     39  19  1576  244  196 R  3.1  0.0   0:00.39 loop_silent

to sum it up: a nice +19 task (the most commonly used nice level in
practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler,
depending on the value of HZ. This is quite inconsistent and illogical.

this HZ dependency of nice levels existed for many years, and the new
scheduler solves that inconsistency - every nice level will get the same
amount of time, regardless of HZ.

	Ingo

From: Roman Zippel [email blocked]
To: Ingo Molnar [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 16:01:17 +0200 (CEST)

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> to sum it up: a nice +19 task (the most commonly used nice level in
> practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler,
> depending on the value of HZ. This is quite inconsistent and illogical.

You're correct that you can find artifacts in the extreme cases, it's
subjective whether this is a serious problem.
It's nice that these artifacts are gone, but that still doesn't explain
why this ratio had to be increase that much from around 1:10 to 1:69.

bye, Roman

From: Matt Mackall [email blocked]
To: Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 15:31:18 -0500

On Mon, Jul 16, 2007 at 04:01:17PM +0200, Roman Zippel wrote:
> Hi,
>
> On Mon, 16 Jul 2007, Ingo Molnar wrote:
>
> > to sum it up: a nice +19 task (the most commonly used nice level in
> > practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler,
> > depending on the value of HZ. This is quite inconsistent and illogical.
>
> You're correct that you can find artifacts in the extreme cases, it's
> subjective whether this is a serious problem.
> It's nice that these artifacts are gone, but that still doesn't explain
> why this ratio had to be increase that much from around 1:10 to 1:69.

More dynamic range is better? If you actually want a task to get 20x
the CPU time of another, the older scheduler doesn't really allow it.

Getting 1/69th of a modern CPU is still a fair number of cycles.
Nevermind 1/69th of a machine with > 64 cores.

--
Mathematics is the supreme nostalgia of our time.

From: Ingo Molnar [email blocked]
To: Matt Mackall [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date: Mon, 16 Jul 2007 23:18:51 +0200

* Matt Mackall [email blocked] wrote:

> More dynamic range is better? If you actually want a task to get 20x
> the CPU time of another, the older scheduler doesn't really allow it.
>
> Getting 1/69th of a modern CPU is still a fair number of cycles.
> Nevermind 1/69th of a machine with > 64 cores.

yeah. furthermore, nice -20 is only admin-selectable. Here are the
current CPU-use values for positive nice levels:

 nice   0: 100.00%
 nice   1:  80.00%
 nice   2:  64.10%
 nice   3:  51.28%
 nice   4:  40.98%
 nice   5:  32.78%
 nice   6:  26.24%
 nice   7:  21.00%
 nice   8:  16.77%
 nice   9:  13.42%
 nice  10:  10.74%
 nice  11:   8.59%
 nice  12:   6.87%
 nice  13:   5.50%
 nice  14:   4.39%
 nice  15:   3.51%
 nice  16:   2.81%
 nice  17:   2.25%
 nice  18:   1.80%
 nice  19:   1.44%

here's the CPU utilization table for negative nice levels (relative to a
nice -20 task):

 nice   0:   1.15%
 nice  -1:   1.44%
 nice  -2:   1.80%
 nice  -3:   2.25%
 nice  -4:   2.81%
 nice  -5:   3.51%
 nice  -6:   4.39%
 nice  -7:   5.50%
 nice  -8:   6.87%
 nice  -9:   8.59%
 nice -10:  10.74%
 nice -11:  13.42%
 nice -12:  16.77%
 nice -13:  21.00%
 nice -14:  26.24%
 nice -15:  32.78%
 nice -16:  40.98%
 nice -17:  51.28%
 nice -18:  64.10%
 nice -19:  80.00%
 nice -20: 100.00%

these are pretty sane, and symmetric across the origo. Nice -20 is the
odd one out, because there is no nice +20. But its value is still
logical, it's the mirror image of an imaginery nice +20.

and note that even on the old scheduler, nice-0 was "3200% more
powerful" than nice +19 (with CONFIG_HZ=300), and nice -19 was only 700%
more powerful than nice-0. So not only was it inconsistent (and i can
create scary numbers too ;), it gave the admin-controlled negative nice
levels less of a punch than to user-controlled nice +19. A number of
people complained about that, and CFS addresses this.

in fact i like it that nice -20 has a slightly bigger punch than it used
to have before: it might remove the need to run audio apps (and other
multimedia apps) under SCHED_FIFO. (SCHED_FIFO is unprotected against
lockups, while under CFS a nice 0 task is still starvation protected
against a nice -20 task.)

furthermore, there is a quality of implementation issue as well, look at
the definition of the nice system call:

  asmlinkage long sys_nice(int increment)

the "increment" is relative. So nice(1) has the same behavioral effect
under CFS, regardless of which nice level you start out from. Under the
old scheduler, the result depended on which nice level you started out
from.

	Ingo

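The relative nature of the increment is visible from user space as well. As a small illustration (not part of the thread), Python's os.nice takes an increment and returns the resulting nice value, so an increment of 0 simply reports the current level. Note that running this really does make the calling process one level nicer, and an unprivileged process normally cannot undo that.

```python
import os

# An increment of 0 changes nothing and reports the current nice value.
before = os.nice(0)

# The argument is relative: this moves the process one level "nicer"
# than wherever it started, not to an absolute nice value of 1.
after = os.nice(1)

print(f"nice {before} -> nice {after}")
```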
Important part intentionally left out?
You forgot the most important part of the thread:
http://lkml.org/lkml/2007/7/17/229
"Important" only if you have a preference for flamewars. This guy is challenging Ingo about code that apparently Ingo wrote originally, and pretends that Ingo does not understand his own code (in a very repulsive style). Ingo answers rather patiently, only at the end does he sarcastically question the approach of the guy.
When it turned out that Ingo was right all along (which is not a really big surprise under such circumstances), the guy loses his cool and starts flaming Ingo. That is the mail you quoted.
Precise values
I think it would be nice to have precise values such as 50% instead of 51.28%.
I think the gap between these two is too big:
nice 1: 80.00%
nice 2: 64.10%
I would like to have 25%, 50%, 75%, 100%.
You want a linear scale, like Linus also suggested.
I like the way it's implemented now:
from level to level you get that same proportional rise in cpu time.
no more "dynamic" cpu utilisation?
If I understand everything correctly, the old system had a more "dynamic" way of setting CPU utilization. AFAIK it's like this: setting nice 20, for example, would mean "don't schedule this process unless you have absolutely nothing else to do". The new system seems more "fixed": what will happen when you set nice 20 for a process and the CPU has nothing else to do?
It still does that.
If there's nothing else to do, the really really nice process will be given the full CPU.
The CFS code assigns everything a weight, and then sees the amount of time something ran as the actual time divided by the weight... so a "heavier" (negative nice) process "looks like" it ran for less time than it actually ran and the scheduler gives it proportionally more time, and a "lighter" (positive nice) process "looks like" it ran for more time than it really did and gets proportionally less CPU time.
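To make that concrete, here is a toy simulation of the idea as described above. It is only a sketch under stated assumptions: it always runs whichever task has accumulated the least weighted runtime, and charges each slice as real time divided by weight. The function and variable names are invented for illustration and the real kernel code is considerably more involved.

```python
def simulate(weights, slices=9000):
    """Toy weighted-fair scheduler: run the task that "looks like" it has
    run the least, and charge it (real time / weight) for each slice."""
    weighted_runtime = [0.0] * len(weights)  # what the scheduler compares
    real_runtime = [0] * len(weights)        # slices actually received
    for _ in range(slices):
        # Pick the task with the smallest weighted runtime so far.
        task = min(range(len(weights)), key=lambda i: weighted_runtime[i])
        real_runtime[task] += 1
        # A heavier task is charged less per slice, so it is picked more often.
        weighted_runtime[task] += 1.0 / weights[task]
    return [r / slices for r in real_runtime]


# A "heavier" nice 0 task (weight 1.25) against a "lighter" nice 1 task.
shares = simulate([1.25, 1.0])
print([f"{s:.1%}" for s in shares])
```

With these weights the split settles close to 5/9 versus 4/9, which is the roughly 55% versus 45% seen in the top output quoted in the thread.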
At least, that's what I get from a quick look at gitweb...
I think you're confused
First: Nice 19 isn't "run only when idle." Over the years, there have been attempts at a "SCHED_BATCH" and "SCHED_IDLE" scheduler class that only run a task if the machine is truly idle. Since these can lead to horrible priority inversion problems, they've never really made it into the mainline. "Nice 19" tasks (there is no "nice 20") always get a little bit of CPU if they need it. This prevents lockouts due to priority inversion at least among the SCHED_OTHER tasks. (You can still always hose the system with SCHED_FIFO and SCHED_RR, but you need to be root to do it.)
Ingo's first table describes what relative allocation a CPU hog at nice level 'N' will get relative to what that task would get at nice level 0, if it has to compete with another task for the CPU. For instance, suppose you have two CPU hogs running at "nice 0." Under CFS, they both get 50% of the CPU. If you move one to "nice 1" and the balance shifts to 45% vs 55%, the "nice 1" task is now getting 90% of its former allocation (50% goes down to 45%) with the extra going to the other task, giving it 10% more (50% goes up to 55%). In terms of relative distance, the "nice 0" task gets 21% more CPU (not 25% as Ingo stated) than the "nice 1" task. That's when they're both CPU hogs.
A "nice 19" CPU hogging task still gets a little over 1% of the allocation it would have gotten at "nice 0" when competing against other tasks demanding the CPU. That's not bad at all. When multiple nice levels are in play, it obviously gets more complex. The nice levels act as weighting factors when determining how big a slice of the pie each gets, when they all want the CPU. Sleeping tasks don't get included, and the CPU is never forced to be idle if there's something that's ready to run. Two "nice 19" tasks fighting for the CPU should each still get 50% if everyone else is sleeping.
The system is still dynamic. If you have a "nice 19" CPU hog and the system is idle, that task still can consume 100% of the CPU. If a "nice 0" task wakes up and demands 1000ms of CPU time, it'll get that 1000ms of CPU time over the next 1010ms-1020ms or so of real time, and the "nice 19" guy will get 10-20ms. That is, the "nice 19" task will still get to run for a little bit, but not very much. Once the "nice 0" task goes back to sleep, the "nice 19" guy resumes eating the CPU.
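The 1010ms-1020ms figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes exactly two runnable tasks, a weight factor of 1.25 per nice level, and CPU time divided in proportion to weight; wall_time_ms is a made-up helper name.

```python
WEIGHT_STEP = 1.25  # assumed weight factor per nice level


def wall_time_ms(cpu_needed_ms, own_nice, competitor_nice):
    """Real time needed to accumulate `cpu_needed_ms` of CPU time while
    one other always-runnable task competes for the same CPU."""
    own_weight = WEIGHT_STEP ** -own_nice
    competitor_weight = WEIGHT_STEP ** -competitor_nice
    own_share = own_weight / (own_weight + competitor_weight)
    return cpu_needed_ms / own_share


wall = wall_time_ms(1000, own_nice=0, competitor_nice=19)
print(f"nice 0 task finishes after about {wall:.1f} ms of real time")
print(f"nice 19 task runs for about {wall - 1000:.1f} ms in that window")
```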
Make sense?
--
Program Intellivision and play Space Patrol!
If you move one to "nice 1"
Yeah. Note that 25% is the right ratio: the differences are so small that even half a percentage point will change your 21% calculation. The exact CPU utilization is around 55.5% versus 44.5%, which gives almost exactly 25%.
Fair enough, I guess
Sure, 1.25 gives about the same spread, and you can even argue it's within the roundoff error. It still seemed like voodoo math until I realized the actual number for 55/45 was 1.21, not 1.25. At that point I realized it really is an approximation.
To get precisely 1.25 you need 55.5555555...% vs. 44.4444444...% (5/9 vs. 4/9).
1 for each percent
Why can't it be 1%, 2%, 3%, 4%, 5% ... 100%?
Then you can specify it very precisely, but if you don't need that precision, you could just choose 10%, 20%, 30% ... 100%.
I think it would be easier, make more sense, and be more useful.
But I don't know, because I am a noob.
POSIX is part of it
I'm pretty sure POSIX specifies the -20 to +19 nice range. And besides, what do you gain from finer granularity really? It only matters if all your tasks are CPU hogs, in which case you might rethink what you're doing if you really think you need such fine grain control.
Guaranteed access to console
I need guaranteed access to the shell/console.
So that no matter what happens, whether there is a fork bomb, some application freezes, something gets stuck in an infinite loop, or some encoder/decoder hogs 99 or 100% of the CPU, I can still access the console and kill the process.
I need guaranteed access to the console, so that I can always kill the process, and save the system, and have access to the system, even when the system is under really high load.
Ok...
Don't run X, and make getty a task with Real Time priority (SCHED_RR or SCHED_FIFO). I'm pretty sure your login process and therefore your shell will inherit this. SCHED_RR and SCHED_FIFO trump SCHED_OTHER, and offer 100 different hard priority levels. You could put this shell at the highest priority level.
If the disk is getting thrashed, you'll still have to wait behind the thrasher's page faults, but eventually you should be able to get in.
Note that unless you patch "login", though, anyone with console access could get real time priority in such a setup.
(You don't want to run X, because you won't be able to switch out of X back to a text console if the system is really blocking it out. An alternative is to have the emergency console exported over serial to another box.)
Oh, and I should point out that SCHED_RR/SCHED_FIFO's 100 priority levels serve a different purpose than nice levels. Nice levels serve to ratio CPU priority among multiple tasks that actively compete for the CPU. The POSIX real time queues are hard-real time, and do not ratio anything--a higher priority task always gets 100%. Thus, you typically want to divide a large task into many, many small subtasks. The 100 queues allow building up a hierarchy among these subtasks.
I am a noob
I am a noob, and don't understand much of what you said, or how to change it into that configuration. Nor do I think I should need to know much, or need to do such manual configurations myself.
I hear many good things about Linux stability, so I think that all Linux machines should have guaranteed access to the console.
Once again... with feeling. :-)
I said don't run X on console. By that I mean the X windowing environment. Since X lives outside the kernel and has to compete with everything else that's going on, you can't rely on X being responsive on a thrashing machine. Therefore, if you're running a server and you MUST always be able to get on console, don't run X on console.
(Note: You can run X applications as remote applications on the system, since that won't affect console as long as they display on another machine.)
The SCHED_OTHER/SCHED_FIFO/SCHED_RR stuff I mentioned is described in the sched_setscheduler(2) man page. These are different scheduling policies defined by POSIX, and Linux implements them. Most tasks operate under SCHED_OTHER.
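If you want to poke at these policies without changing anything, the following read-only sketch uses Python's os module, which (on Linux, Python 3.3 or later) wraps the corresponding system calls. It only queries the priority range each policy accepts and the policy of the current process. Actually switching a task to SCHED_FIFO or SCHED_RR normally requires root and is not attempted here.

```python
import os

# Static priority range accepted by each policy on this system.
ranges = {}
for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
    policy = getattr(os, name)
    ranges[name] = (os.sched_get_priority_min(policy),
                    os.sched_get_priority_max(policy))
    print(f"{name}: static priority {ranges[name][0]}..{ranges[name][1]}")

# Policy of the calling process (pid 0 means "this process").
current = os.sched_getscheduler(0)
names = {getattr(os, n): n for n in ranges}
print("this process runs under", names.get(current, f"policy {current}"))
```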
If you install schedutils, you can use the chrt command to change the priority of a task. In the context of logging in on console, you can make the getty process (which handles logins) have the highest real-time priority available by wrapping it with chrt -r 99. Anyone who logs in on that TTY will inherit the elevated scheduler class.
On my Ubuntu Feisty, which seems to have the non-SysV style init, I went into /etc/event.d and edited tty1 as follows: I left everything alone except the last line, which I changed to read: (the bold is what I added)
(Naturally, you need to do that as root.)
What this does is make virtual console 1 have maximum real-time priority. Anyone who logs in specifically on that terminal has higher priority than anything else in the system for the CPU, period. (Well, except anyone else who picked a real time priority of 99.)
To enable this change you need to run a couple more commands:
This causes init to restart the service with the change you just made.
Do I recommend this? Not really, actually. But it will give you a console that has higher priority than anything else running on the machine. Chances are, you'll still have a hard time logging in, because when a machine thrashes hard core, it's the disk activity that makes it unusable more than the CPU activity.
Fortunately, the schedutils package also offers ionice, which lets you raise your disk I/O priority as well. You can perform a similar trick as I mentioned above, replacing chrt -r 99 with ionice -c1 -n0 to specify that all logins on that console belong to the "real time" I/O scheduling class with maximum priority. That'll at least elevate normal I/O so you get ahead of whoever's thrashing the disk. I'm not sure if it has any impact on paging traffic.
If you're really nuts, you can combine them both, i.e. chrt -r 99 ionice -c1 -n0.