Linux: History Of Nice Levels

Submitted by Jeremy
on July 19, 2007 - 4:36am

In a continuing thread about how the recently merged Completely Fair Scheduler (CFS) affects the nice command, Ingo Molnar offered a history of nice levels in the Linux kernel. He began by describing the three complaints he had received most often. The first was that "nice levels were always so weak under Linux that people continuously bugged me about making nice +19 tasks use up much less CPU time"; the second was "the fact that nice level behavior depended on the _absolute_ nice level as well, while the nice API itself is fundamentally 'relative'"; and the third was that "negative nice levels were not 'punchy enough', so lots of people had to resort to run audio (and other multimedia) apps under RT priorities such as SCHED_FIFO."

Ingo then stated that "CFS addresses all three types of complaints." For the first complaint, he explained that he "decoupled the scheduler from 'time slice' and HZ concepts (and made granularity a separate concept from nice levels) and thus CFS was able to implement better and more consistent nice +19 support: now in CFS nice +19 tasks get a HZ-independent 1.5%, instead of the variable 3%-5%-9% range they got in the old scheduler." For the second, he "made nice(1) have the same CPU utilization effect on tasks, regardless of their absolute nice levels. So on CFS, running a nice +10 and a nice +11 task has the same CPU utilization 'split' between them as running a nice -5 and a nice -4 task. (one will get 55% of the CPU, the other 45%.)" The third type of complaint, he wrote, "is addressed by CFS almost automatically: stronger negative nice levels are an automatic side-effect of the recalibrated dynamic range of nice levels."


From: Ingo Molnar [email blocked]
To:	Roman Zippel [email blocked]
Subject: Re: [PATCH] CFS: Fix missing digit off in wmult table
Date:	Wed, 18 Jul 2007 18:02:51 +0200


* Roman Zippel [email blocked] wrote:

> > _changing_ it is an option within reason, and we've done it a couple 
> > of times already in the past, and even within CFS (as Peter 
> > correctly observed) we've been through a couple of iterations 
> > already. And as i mentioned it before, the outer edge of nice levels 
> > (+19, by far the most commonly used nice level) was inconsistent to 
> > begin with: 3%, 5%, 9% of nice-0, depending on HZ.
> 
> Why do you constantly stress level 19? Yes, that one is special, all 
> other positive levels were already relatively consistent.

i constantly stress it for the reason i mentioned a good number of 
times: because it's by far the most commonly used (and complained about) 
nice level. =B-)

but because you are asking, i'm glad to give you some first-hand 
historic background about Linux nice levels (in case you are interested) 
and the motivations behind their old and new implementations:

nice levels were always so weak under Linux (just read Peter's report) 
that people continuously bugged me about making nice +19 tasks use up 
much less CPU time. Unfortunately that was not that easy to implement 
(otherwise we'd have done it long ago) because nice level support was 
historically coupled to timeslice length, and timeslice units were 
driven by the HZ tick, so the smallest timeslice was 1/HZ.

In the O(1) scheduler (about 4 years ago) i changed negative nice levels 
to be much stronger than they were before in 2.4 (and people were happy 
about that change), and i also intentionally calibrated the linear 
timeslice rule so that nice +19 level would be _exactly_ 1 jiffy. To 
better understand it, the timeslice graph went like this (cheesy ASCII 
art alert!):


                   A
             \     | [timeslice length]
              \    |
               \   |
                \  |
                 \ |
                  \|___100msecs
                   |^ . _
                   |      ^ . _
                   |            ^ . _
 -*----------------------------------*-----> [nice level]
 -20               |                +19
                   |
                   |

so that if someone wants to really renice tasks, +19 would give a much 
bigger hit than the normal linear rule would do. (The solution of 
changing the ABI to extend priorities was discarded early on.)

This approach worked to some degree for some time, but later on with 
HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which 
we felt to be a bit excessive. Excessive _not_ because it's too small of 
a CPU utilization, but because it causes too frequent (once per 
millisec) rescheduling. (and would thus thrash the cache, etc. Remember, 
this was 4-5 years ago when hardware was weaker and caches were smaller, 
and people were running number crunching apps at nice +19.)

So for HZ=1000 i changed nice +19 to 5msecs, because that felt like the 
right minimal granularity - and this translates to 5% CPU utilization. 
But the fundamental HZ-sensitive property for nice+19 still remained, 
and i never got a single complaint about nice +19 being too _weak_ in 
terms of CPU utilization, i only got complaints about it (still) being 
way too _strong_.

To sum it up: i always wanted to make nice levels more consistent, but 
within the constraints of HZ and jiffies and their nasty design level 
coupling to timeslices and granularity it was not really viable.

The second (less frequent but still periodically occurring) complaint 
about Linux's nice level support was its asymmetry around the origin 
(which you can see demonstrated in the picture above), or more 
accurately: the fact that nice level behavior depended on the _absolute_ 
nice level as well, while the nice API itself is fundamentally 
"relative":

   int nice(int inc);

   asmlinkage long sys_nice(int increment);

(the first one is the glibc API, the second one is the syscall API.) 
Note that the 'inc' is relative to the current nice level. Tools like 
bash's "nice" command mirror this relative API.

With the old scheduler, if you for example started a niced task with +1 
and another task with +2, the CPU split between the two tasks would 
depend on the nice level of the parent shell - if it was at nice -10 the 
CPU split was different than if it was at +5 or +10.

A third complaint against Linux's nice level support was that negative 
nice levels were not 'punchy enough', so lots of people had to resort to 
run audio (and other multimedia) apps under RT priorities such as 
SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation 
proof, and a buggy SCHED_FIFO app can also lock up the system for good.

CFS addresses all three types of complaints:

To address the first complaint (of nice levels being not "punchy" 
enough), i decoupled the scheduler from 'time slice' and HZ concepts 
(and made granularity a separate concept from nice levels) and thus CFS 
was able to implement better and more consistent nice +19 support: now 
in CFS nice +19 tasks get a HZ-independent 1.5%, instead of the variable 
3%-5%-9% range they got in the old scheduler.

To address the second complaint (of nice levels not being consistent), i 
made nice(1) have the same CPU utilization effect on tasks, regardless 
of their absolute nice levels. So on CFS, running a nice +10 and a nice 
+11 task has the same CPU utilization "split" between them as running a 
nice -5 and a nice -4 task. (one will get 55% of the CPU, the other 
45%.) That is why I changed nice levels to be "multiplicative" (or 
exponential) - that way it does not matter which nice level you start 
out from, the 'relative result' will always be the same.

The third complaint (of negative nice levels not being "punchy" enough 
and forcing audio apps to run under the more dangerous SCHED_FIFO 
scheduling policy) is addressed by CFS almost automatically: stronger 
negative nice levels are an automatic side-effect of the recalibrated 
dynamic range of nice levels.

Hope this helps,

	Ingo

Anonymous (not verified) on July 19, 2007 - 1:05pm

I guess that the ineffectiveness of the nice setting is rather related to the fact that it is not coupled to I/O resources like disk I/O. How does CFS solve that? It does not. Not at all.

Anonymous (not verified) on July 19, 2007 - 1:25pm

Was it designed to? Or is that something suited to an I/O scheduler, which _should_ take nice into account?

Anonymous (not verified) on July 19, 2007 - 1:50pm

You can already use ionice for that. This is the domain of the IO scheduler, not of the task/CPU scheduler.

Where?

MungBean (not verified) on July 19, 2007 - 5:34pm

My Linux box doesn't have an ionice command, so I was wondering whether this is only a feature of certain kernel versions. A quick Google suggests it's limited to a couple of schedulers that are available as patchsets.

Anonymous (not verified) on July 20, 2007 - 3:04am

Your box is a bit out of date. From man ionice:
"Linux supports io scheduling priorities and classes since 2.6.13 with the CFQ io scheduler." CFQ is, if I am not mistaken, the default I/O scheduler.

Install schedutils

on July 20, 2007 - 9:58am

Even on my shiny Ubuntu Feisty Fawn install, I didn't have ionice by default. I had to install the schedutils package. Once that was in, though, I was in good shape. The schedutils package also includes chrt to set real-time scheduling priorities, and taskset to manage CPU affinity on an SMP box.

--
Program Intellivision and play Space Patrol!

Anonymous (not verified) on July 20, 2007 - 2:20pm

Actually, you don't need ionice or schedutils, though they won't hurt. With CFQ (the default), the I/O scheduler already takes nice levels, as set by nice(1), into account.
