Linux: 2.6.8.1 CPU Scheduler Documentation

Submitted by Jeremy
on February 14, 2005 - 10:02am

Josh Aas announced the availability of some comprehensive documentation for the Linux CPU scheduler as available in the 2.6.8.1 kernel [story]. The document begins with an introduction talking about the rapid rate of Linux kernel development noting, "this pace of development has led to a situation where very few of the kernel's major components are adequately documented at the implementation level." He goes on to add, "the goal of this paper is to provide in-depth documentation, [...] hopefully of use to kernel developers who must work with the code, a well as students and researchers who wish to understand the implementation of a real, working scheduler."


From: Josh Aas [email blocked]
To:  linux-kernel
Subject: Linux 2.6.8.1 CPU Scheduler Documentation
Date: 	Sun, 13 Feb 2005 18:23:15 -0600

Hello,

I have written an introduction to the Linux 2.6.8.1 CPU scheduler 
implementation. It should help people to understand what is going on in 
the scheduler code faster than they would be able to by just reading 
through the code. The paper can be downloaded in PDF or LyX form from here:

http://josh.trancesoftware.com/linux/

This paper will never be "done," as I'd like to keep improving it over 
time, and updating it to newer versions of the kernel as time allows. If 
you have comments, suggestions, or corrections you'd like to make, 
please email me. Technical corrections in particular would be 
appreciated. Hopefully this can be as accurate and helpful as possible, 
and will inspire more people to look into the Linux scheduler.

My employer, SGI, did not ask me to write this paper - it was done as 
part of a school project last semester. While SGI owns the copyright to 
the paper, they have allowed me to release it under the GNU FDL.

-- 
Josh Aas
Linux System Software
Silicon Graphics, Inc. (SGI)


From: Willy Tarreau [email blocked] Subject: Re: Linux 2.6.8.1 CPU Scheduler Documentation Date: Mon, 14 Feb 2005 06:28:12 +0100 Hello Josh, On Sun, Feb 13, 2005 at 06:23:15PM -0600, Josh Aas wrote: > Hello, > > I have written an introduction to the Linux 2.6.8.1 CPU scheduler > implementation. It should help people to understand what is going on in > the scheduler code faster than they would be able to by just reading > through the code. The paper can be downloaded in PDF or LyX form from > here: > > http://josh.trancesoftware.com/linux/ This is quite an interesting documentation. Howver, the pdf version shows a problem that most of us encountered when distributing PDF documents written in latex : the fonts are blurred and need to be zoomed in to be readable. This is because you used the default latex fonts. You just have to switch to PS fonts and it will be fine. For this, I often added one of the following lines in the latex headers : \usepackage{times} or \usepackage{palatino} I don't know how it must be translated into lyx, but you get the general idea. Thanks anyway for this document which seems rather complete. Regards, Willy
From: Nick Piggin [email blocked] Subject: Re: Linux 2.6.8.1 CPU Scheduler Documentation Date: Mon, 14 Feb 2005 17:18:56 +1100 On Mon, 2005-02-14 at 06:28 +0100, Willy Tarreau wrote: > Hello Josh, > > On Sun, Feb 13, 2005 at 06:23:15PM -0600, Josh Aas wrote: > > Hello, > > > > I have written an introduction to the Linux 2.6.8.1 CPU scheduler > > implementation. It should help people to understand what is going on in > > the scheduler code faster than they would be able to by just reading > > through the code. The paper can be downloaded in PDF or LyX form from > > here: > > > > http://josh.trancesoftware.com/linux/ > > This is quite an interesting documentation. > The multiprocessor scheduling documentation looks pretty accurate, at a quick glance. A few small things though: I don't think you place enough emphasis on the per-CPU-ness of the 2.6 scheduler. For most workloads, this probably yields by far the biggest improvement over the 2.4 scheduler on even small SMP systems, due to much less lock and cacheline contention. Secondly, 7.1.2 can probably be removed completely. We can actually handle this type of SMT balancing correctly without shared runqueues (and in an arguably nicer way). Nick

Related Links:

Doxygen?

Anonymous (not verified)
on
February 14, 2005 - 10:54am

Curious what Sir Linus thinks of such.

Thinks of what?

Anonymous (not verified)
on
February 14, 2005 - 6:11pm

Thinks of what? Documentation? I'm pretty sure he's all for it as long as someone other them him writes it.

More to it than that.

on
February 14, 2005 - 6:35pm

Right, but Doxygen requires specially formatted comments *in the code*, if I recall correctly. That means Linus would be brushing up against Doxygen stuff all the time. He's very likely to form an opinion of it, and it's likely not a positive one.

That's quite different from something living in a separate file or set of files in "linux-2.6/Documentation/".

Such comments *exist*

on
February 16, 2005 - 12:33pm

They exist and are used for the interface description (no example at hand). You can see them by typing a "make *docs", like make htmldocs, and looking at the "kernel-api" one, which is made exactly this way (not with Doxygen I think, but the idea is the same one).

Such comments proved to be a lot more maintainable than separate comments, especially for API description.

When you are going to describe the general algorithm used by a piece of code, with examples and aiming to reader's understanding, you often prefer to place the comment separately, to avoid polluting too much the source.

In code docs

on
February 16, 2005 - 3:18pm

I agree with Blaisorblade 100% on this. I hope the powers that be do also.

Great

Anonymous (not verified)
on
February 14, 2005 - 12:41pm

Absolutly great, up-to-date docs are really what linux needs to attract more hackers and developers.
I am now waiting for a similar document tat explains how the MM works :-)

Here you go

Josh Aas (not verified)
on
February 14, 2005 - 6:24pm

I found this to be up-to-date enough for getting a handle on things:

http://www.csn.ul.ie/~mel/projects/vm/

Too bad ...

Anonymous (not verified)
on
February 15, 2005 - 8:21pm

that you have to buy the book for the up-to-date version of this docs :-)

No, it's Open Source

on
February 16, 2005 - 12:34pm

It is not at all easy to find the link, however the book is under some kind of Open License and it's freely downloadable (I found it by using Google last time)...

Thanks Josh.

Anonymous (not verified)
on
February 14, 2005 - 12:48pm

Thanks Josh.

LaTeX (LyX) and fonts in PDF file

Jakub Narebski (not verified)
on
February 14, 2005 - 1:53pm


You just have to switch to PS fonts and it will be fine. For
this, I often added one of the following lines in the latex headers :
\usepackage{times}
or
\usepackage{palatino}

\usepackage{pslatex}

is also common in headers/preamble of LaTeX files meant for PDF creation. pslatex makes PDF file using embedded (build-in) fonts, which result in smaller file. And of course fonts are proper vector fonts, not bitmap fonts.

BTW. Josh, could you add title to mentioned page, http://josh.trancesoftware.com/linux/, please?

Re: LaTeX (LyX) and fonts in PDF file

Anonymous (not verified)
on
February 16, 2005 - 5:15am

I read that the problem with LaTeX and fonts is that by default, Type 1 fonts (i.e., bitmapped fonts) are used. Type 3 (scalable) fonts should be used instead. To do that, I've set this alias:
alias dvips='dvips -Pcmz -Pamz -t a4'
(the "-t a4" part sets the paper size, you may not need it) and produce PDF documents with dvips followed by ps2pdf. No changes in the LaTeX document required. Don't know how to set this for LyX, though.

Re: LaTeX (LyX) and fonts in PDF file

Rob Funk (not verified)
on
February 18, 2005 - 12:52am

You have it backwards. Type 1 fonts are totally scalable. Type 3 fonts are usually (though not always) bitmaps. If you find the config.cmz and config.amz files on your system, you'll probably find comments about using Type 1 fonts.

I find it best to use pslatex, since that way you don't even need any fonts embedded in the PDF.

Some errors in the paper

Anonymous (not verified)
on
February 14, 2005 - 2:30pm

The O(1) notation does not necessarily mean that algorithm executes in constant time. It means that it is possible to establish an upper bound to the execution size that is a constant.

Also, O(n^2) algorithms do not double their execution time for increase of input from n to n+1. The growth rate is much, much smaller. To compute it, (n+1)² - n² = n² + 2n + 1 - n² = 2n + 1. That's quite a bit different from the behaviour that the author implied that would be result of O(2^n) complexity.

Quite right.

on
February 14, 2005 - 6:14pm

Right. A more accurate statement is that for each linear increment of 'n', run time of a O(n^2) algorithm increases by a factor proportional to 'n'. For small values of 'n', this is ok, but for large or even moderate values of 'n', this can become burdensome.

To say it doubles for each step would imply, as you say, exponential growth--O(2^n).

righto

Josh Aas (not verified)
on
February 14, 2005 - 6:58pm

Quite right. I will be fixing that statement in a couple of minutes - encouraging how many people picked up on that quickly and emailed me :) I should have known better than to type that quickly thinking it was simplistic given how tricky that wording can be.

However, an algorithm that al

Anonymous (not verified)
on
February 15, 2005 - 3:48am

However, an algorithm that always executes in constant time will be O(1). Just pointing this out so that those not versed in algorithm analysis don't mistakenly chuck out everything they thought they knew about theta notation in favour of something more complex that they can't be bothered to understand and then be left with nothing :-)

Also, Wikipedia has a serviceable explanation of theta notation, or "big o notation" as it's often called.

Theta notation is not same as the big-O notation, though.

Anonymous (not verified)
on
February 15, 2005 - 6:29am

The O notation only establishes that we know an upper bound that limits the running time. The theta notation limits the running time from up and below. Here's how you could understand them.

O: establishes the upper bound to running time
Theta: establishes the upper and lower bound to running time
Omega: establishes the lower bound to running time
o: establishes that running time grows slower than the function
omega: establishes that the running time grows faster than the function

O(1) notation may sometimes hide the fact that the function inside contains smaller O(N) algorithms. If there's a limit to N, you can ignore them if you want to claim so. For instance, the linux kernel has 140 priority queues. If you read the paper you see that the kernel must find the first queue among those that contains some schedulable elements. I understand this is bitmap search among just 140 bits and therefore probably blindingly fast in cpu time terms, but in principle, if we had more queues then we would start to observe the properties of linear search for the first queue that has elements. But since the algorithm doesn't vary the amount of priority queues, this can be rolled into O(1) nevertheless.

I realize the above is a bit foolish but I hope this illustrated something about the usage of the O notation. One of the most important facts is that it doesn't mean the algorithm executes quickly. It could take a day every time it's called, and it would still be O(1) just fine. :) It just means that regardless of the input it is called on, you know it will be finished by some time bound (but possibly also much earlier than that). What it does not mean is that it takes a constant time, a fact that was repeatedly asserted in the paper and which sparked me to write this...

fonts

Anonymous (not verified)
on
February 15, 2005 - 4:20am

The problem is one of Acrobat not handling bitmapped fonts well. I think it should be much nicer looking in Ghostscript/GV.

The easiest remedy is not to change fonts but to use the vectorized Postscript fonts instead. They were released royalty free some 8 years ago and most TeX distributions include them. Just put the relevant paths in ~/.dvipsrc (some trivial googling required).

Wikicomments to code?

Tom Bäckström (not verified)
on
February 16, 2005 - 12:14pm

Would it be possible/useful to have wiki-style commenting on kernel code (or for other code for that matter)? Or is there software for this kind of feature already? Now I suppose that a lot of people read through parts of the kernel and if he/she is to stumble on some cryptic part of code, and wonders about it for days, when he finds out what it does, she/he could just add a note that this part does this-and-that. He'd thus save other code-readers time for the same amount of time.

Commenting probably doesn't/shouldn't need to be as rigorously maintained through Linus as code has to be, so I can't see why this wouldn't be possible in principle.

Just a though,
Tom

Linux 2.6.8.1 CPU Scheduler Documentation

on
June 11, 2005 - 10:52am

Hi,

You state in your document that the kernel runs as a thread awoken by a timer interrupt (probably every 1 msec or so).

Isn't the kernel never supposed to run as a thread (what if it blocks?) who schedules it?

Cheers,
k

Reply to Question

Anonymous (not verified)
on
June 26, 2006 - 7:13am

The kernel is also a process. Kernel itself regains the control of the CPU automatically whenever there is a need to schedule the processes.
There are different ways in which the kernel regains its control. They are:
1. Kernel(or part of it) may reside in each process which is executing. So, whenever the process's execution is completed or interrupted the kernel which is in its address space will take control and performs the necessary actions.

2. Kernel may be residing in the separate main memory segment apart from the processes.

In this way the kernel itself schedules.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.