Adrian went on to ask for the technical reasons for continuing to support four different IO schedulers, expressing concern that it could lead to bugs in individual schedulers going unreported. Jens explained that he was aiming for the perfect IO scheduler, but that for now different IO schedulers offer better results for different workloads: "with some hard work and testing, we should be able to get rid of [the anticipatory scheduler]. It still beats cfq for some of the workloads that deadline is good at, so not quite yet." Arjan van de Ven offered, "there is at least one technical reason to need more than one: certain types of storage (both big EMC boxes as well as solid state disks) don't behave like disks and have no seek penalty; any cpu time spent on avoiding seeks is wasted on those, so for these devices one really wants to use a different IO scheduler, one which is much lighter weight". Jens then acknowledged, "there's always a risk with 'duplication', like several drivers for the same hardware. I'm not disputing that."
From: Jens Axboe <jens.axboe@...>
Subject: Re: [2.6 patch] make I/O schedulers non-modular
Date: Nov 25, 12:21 pm 2007
On Sun, Nov 25 2007, Adrian Bunk wrote:
> There isn't any big advantage and doesn't seem to be much usage of
> modular schedulers.
>
> OTOH, the overhead made the kernel image of an x86 defconfig (that
> doesn't use modular schedulers) bigger by nearly 2 kB.
Big nack, I use it all the time for testing. Just because you don't
happen to use it is not a reason to remove it.
--
Jens Axboe
-
From: Adrian Bunk <bunk@...>
Subject: Re: [2.6 patch] make I/O schedulers non-modular
Date: Nov 25, 12:31 pm 2007
On Sun, Nov 25, 2007 at 05:21:07PM +0100, Jens Axboe wrote:
> On Sun, Nov 25 2007, Adrian Bunk wrote:
> > There isn't any big advantage and doesn't seem to be much usage of
> > modular schedulers.
> >
> > OTOH, the overhead made the kernel image of an x86 defconfig (that
> > doesn't use modular schedulers) bigger by nearly 2 kB.
>
> Big nack, I use it all the time for testing.
OK.
> Just because you don't
> happen to use it is not a reason to remove it.
s/you/you and all distributions you checked/
> Jens Axboe
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
From: Jens Axboe <jens.axboe@...>
Subject: Re: [2.6 patch] make I/O schedulers non-modular
Date: Nov 25, 12:45 pm 2007
On Sun, Nov 25 2007, Adrian Bunk wrote:
> On Sun, Nov 25, 2007 at 05:21:07PM +0100, Jens Axboe wrote:
> > On Sun, Nov 25 2007, Adrian Bunk wrote:
> > > There isn't any big advantage and doesn't seem to be much usage of
> > > modular schedulers.
> > >
> > > OTOH, the overhead made the kernel image of an x86 defconfig (that
> > > doesn't use modular schedulers) bigger by nearly 2 kB.
> >
> > Big nack, I use it all the time for testing.
>
> OK.
>
> > Just because you don't
> > happen to use it is not a reason to remove it.
>
> s/you/you and all distributions you checked/
Well they should make them modules (two of them, that is). It's been a
long time since I considered a distro .config a benchmark/guideline of
any sort.
--
Jens Axboe
-
From: Adrian Bunk <bunk@...>
Subject: Re: [2.6 patch] make I/O schedulers non-modular
Date: Nov 25, 12:56 pm 2007
On Sun, Nov 25, 2007 at 05:45:32PM +0100, Jens Axboe wrote:
> On Sun, Nov 25 2007, Adrian Bunk wrote:
> > On Sun, Nov 25, 2007 at 05:21:07PM +0100, Jens Axboe wrote:
> > > On Sun, Nov 25 2007, Adrian Bunk wrote:
> > > > There isn't any big advantage and doesn't seem to be much usage of
> > > > modular schedulers.
> > > >
> > > > OTOH, the overhead made the kernel image of an x86 defconfig (that
> > > > doesn't use modular schedulers) bigger by nearly 2 kB.
> > >
> > > Big nack, I use it all the time for testing.
> >
> > OK.
> >
> > > Just because you don't
> > > happen to use it is not a reason to remove it.
> >
> > s/you/you and all distributions you checked/
>
> Well they should make them modules (two of them, that is).
>...
Is there any technical reason why we need 4 different schedulers at all?
I have the gut feeling that the usual thing happens and people e.g. not
report some cfq problems because as works for them...
> Jens Axboe
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
-
From: Arjan van de Ven <arjan@...>
Subject: Re: [2.6 patch] make I/O schedulers non-modular
Date: Nov 25, 7:27 pm 2007
On Sun, 25 Nov 2007 17:56:54 +0100
Adrian Bunk wrote:
> Is there any technical reason why we need 4 different schedulers at
> all?
>
there is at least one technical reason to need more than one: certain
types of storage (both big EMC boxes as well as solid state disks)
don't behave like disks and have no seek penalty; any cpu time spent on
avoiding seeks is wasted on those, so for these devices one really
wants to use a different IO scheduler, one which is much lighter weight
-
Am I missing something?
As long as scheduler design (process or IO) is an issue in the main kernel branch, either a modular interface should be created to substitute alternate schedulers at compile time or runtime, or new scheduler rewrites should not be merged until they are rock solid. Why Linus rejects the modularity argument I'll never know. Why he accepts buggy scheduler rewrites into the main branch (hello CFS!) is even more baffling. The bottom line is that Linus' fence straddling is leading to a worse kernel.
what's your problem with CFS
what's your problem with CFS exactly? it's working great everywhere i tested it.
Hi, I'm not the one who
Hi,
I'm not the one who posted before, but what I can't understand is the difference between IO and process schedulers. Linux has refused to include several pluggable schedulers in the kernel, and I don't really think we can get _the perfect scheduler_ that performs outstandingly in every case.
Why do we have different IO schedulers and not multiple process schedulers? I mean, what's the technical reason?
As I understand it
As I understand it, a process scheduler is inherently a global resource, while for IO you could have different schedulers per device.
Do you mean, we can set up a
Do you mean we can set up one IO sched. for one hard drive and another IO sched. for the other hard drive? I didn't know that.
But if we had pluggable process schedulers, couldn't we have one of them for a certain workload and another when this workload changes?
Cheers
thing is this, different
thing is this, different hardware behaves differently, so applying a scheduler that takes that difference into account makes sense.
cpu timesharing does not behave differently from hardware to hardware, so finding the one "true" scheduler there makes sense.
a cpu under heavy load behaves the same as any other cpu under heavy load. but a hard drive under heavy load will behave differently from an ssd under heavy load thanks to differences in physical design (like the mail quotes say).
a hard disk has to bring the heads and platters into the right location for the write to happen, an ssd can just send the data to the right spot straight away. physical media have a built-in latency. it could also be that on a physical drive, one should give priority to data that's supposed to be written to the same area of a drive, to save wear on the drive heads and other parts.
a cpu scheduler seems to benefit more from being tunable depending on how it will be used. so that a user can give priority to gui related processes, or make sure that the audio or video encoding stream from a live source does not get messed up because the scheduler put a less important (to the user) process ahead of it.
so you set up a couple of baseline rules of fairness, and then add a framework to tune it on top. like how one is (or will be, can't say i have paid that much attention to the recent progress) able to group processes and give those groups priority.
in other words, storage media is a known quantity that rarely changes. so if you have a scheduler that works for one, you box it up and are done with it. but user scenarios change on an almost daily basis. so to expect to write one scheduler that works perfectly for one scenario, and another for a different one, is nearly hopeless, or an exercise in futility.
so for a cpu scheduler, workload is unimportant, but usage scenarios are king.
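To put a number on the heads-and-platters point above, here is a toy model in Python. It assumes seek cost is simply proportional to the distance between sector numbers, which is a simplification, and the request queue is made up for illustration. It is not how any of the kernel's IO schedulers is implemented; it only shows why reordering pays off on a device with a seek penalty and buys nothing on one without.

```python
def head_travel(start, requests):
    """Total distance the head moves when servicing requests in the given order."""
    position = start
    total = 0
    for sector in requests:
        total += abs(sector - position)
        position = sector
    return total


def one_sweep_order(start, requests):
    """Reorder requests: sweep upward from the head position, then come back down."""
    ahead = sorted(s for s in requests if s >= start)
    behind = sorted((s for s in requests if s < start), reverse=True)
    return ahead + behind


# Hypothetical queue of sector numbers, in arrival order.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

fifo_cost = head_travel(start, queue)                           # arrival order
sweep_cost = head_travel(start, one_sweep_order(start, queue))  # reordered

print("arrival order:", fifo_cost)
print("one sweep:    ", sweep_cost)
```

On a device with no seek penalty both orders cost the same, so the CPU time spent sorting is pure overhead, which is the argument Arjan makes in the quoted mail.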
Wow
OK man,
Thanks for the explanation!
Great explanation, too bad its wrong
It is very simple. A desktop workstation has interactivity requirements that a server does not. It is a question of minimizing latency on a desktop machine and maximizing throughput on a server. This topic has been discussed ad nauseam. Linux's one-size-fits-few scheduler is lousy on the desktop. Con showed this and provided an excellent alternative and a modularization interface, which were summarily rejected.
Bzzt. Wrong.
A great many server applications are very latency sensitive. Anything that involves locking, for example, can benefit greatly from fast turn-arounds. Furthermore, the kind of latencies that humans might demand are trivial compared to modern processor speeds and the cheapness of context switching.
So really there is *no* reason to have separate server and desktop schedulers when you can instead have a scheduler that doesn't suck(tm).
Con's work on a desktop cpu scheduler made the weaknesses of the old scheduler more obvious to some people... and Ingo's scheduler came about as a result.
- Furthermore, the tunables
- Furthermore, the tunables are there exactly for differing needs, there's absolutely no need for another scheduler.
(I'm not the parent)
thing is this, different
wow!! short and complete. nice!! Thanx!!
;)
Scheduling CPU and I/O.
* CPU-convoy.
* I/O-convoy.
cpu: _^^^^^^________^_^_^_^__^^^___^^^___^^^___
i/o: _______^^^^^^^^_^_^_^_^____^_____^_____^_____
_ means dead time.
Not always true
Your timeline shows only CPU or I/O active at any given time. That isn't always how it works out, though it certainly is true for the common case of dependent reads. (Dependent read meaning that the CPU can't proceed until the data comes back for the read, and the next thing being read is determined by what you're currently reading, so prefetch isn't an option.) You can get concurrency between I/O and CPU activity. Writes are the most common example, since they can be flushed in the background. Asynchronous I/O is another example.
Indeed, the fact that writes can drain in the background is, ironically, a big cause of interactivity problems. :-) The producer of data to be written doesn't get blocked by the writes very easily, and so can produce tons of data. This can slow all other readers way down, which was part of the motivation for the anticipatory I/O scheduler.
--
Program Intellivision and play Space Patrol!
You were missing
An 'A' to start with, a name, and the point.
Is a spelling flame...
the best you can do? The debate is whether or not it makes sense to continue to grope for a sufficiently tunable single scheduler or to support modularity to experiment with other schedulers.
Different schedulers for different disks.
I've found having multiple I/O schedulers to be handy for a system that is acting as a SAN server. The disk being served out to remote clients performed better using the deadline scheduler (or even the noop scheduler) than it did using the default cfq scheduler.
But I still wanted all the local filesystems to use the cfq scheduler.
By having multiple schedulers I could apply the appropriate ones to the appropriate disks based on their work loads.
I could imagine this might also apply to disks that are being used exclusively by a database server, for example.
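For anyone who wants to try the per-disk setup described above: the active scheduler for a block device is exposed in sysfs at /sys/block/<device>/queue/scheduler, with the current choice shown in square brackets. Below is a minimal Python sketch for reading and switching it. The helper names and the device names in the usage note are made up for illustration, and writing to the file needs root.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a line such as 'noop anticipatory deadline [cfq]'
    into (available schedulers, active scheduler)."""
    available = []
    active = None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return available, active


def read_scheduler(device, sysfs_root="/sys/block"):
    """Return (available, active) for a block device name such as 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Switch the device to the named scheduler (requires root)."""
    available, _ = read_scheduler(device, sysfs_root)
    if name not in available:
        raise ValueError(f"{name!r} is not available for {device}: {available}")
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    path.write_text(name)
```

With this, something like set_scheduler("sdb", "deadline") for the exported disk and leaving the local disks on cfq would mirror the setup above (sdb is only an example name). The change does not survive a reboot, so it would have to be reapplied from a boot script.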
For christs sake
This guy is like a trained parrot, repeating the same crap over and over again without even understanding what he's talking about. First mail: "there isn't any big advantage", Second mail: "whats the technical reason?". Sigh. The module namespace discussion is another good example of this.
Please Adrian, go fix some whitespace errors or clean up Kconfig files.
I just love how people here
I just love how people here with only minor or no kernel programming experience (myself included) seem to think they know better than highly experienced kernel developers...
solid state disk
So, which IO scheduler is better
for SSD?
Noop is the best of the
Noop is the best of the currently available schedulers for SSD.
However, a better SSD scheduler could be done: The best scheduler for SSD disks today would be noop for read requests but for writes it would delay non-sync writes and attempt to reorder so that writes form contiguous chunks.
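The write-reordering half of that idea can be sketched in a few lines of Python. This is only an illustration of "reorder so that writes form contiguous chunks", under the assumption that each pending write is a (start_sector, length) pair; it is not an existing kernel scheduler, and the function name is made up.

```python
def coalesce_writes(writes):
    """Merge pending (start_sector, length) writes into contiguous chunks.

    Writes are sorted by start sector, and any write that touches or
    overlaps the previous chunk is folded into it.
    """
    chunks = []  # each entry is [start, end), kept as a mutable pair
    for start, length in sorted(writes):
        end = start + length
        if chunks and start <= chunks[-1][1]:
            # Adjacent to or overlapping the previous chunk: extend it.
            chunks[-1][1] = max(chunks[-1][1], end)
        else:
            chunks.append([start, end])
    return [(start, end - start) for start, end in chunks]


# Hypothetical batch of delayed non-sync writes, in arrival order.
pending = [(8, 4), (0, 4), (4, 4), (100, 8), (20, 4)]
print(coalesce_writes(pending))
```

Reads and sync writes would bypass this and be issued as they arrive, noop-style; only the delayed non-sync writes would be batched and merged.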
would be nice if we could
would be nice if we could have a special one for usb media, one that does writes as they come, but isn't pedantic about the FAT or similar, so as to not wear out the flash of a memory stick.
if you do not have a memory device with a write indication light, you can't really tell when it's done writing. that is, unless you do mounts and unmounts manually (as it will do pending writes on unmount), and that's less than user friendly imo.
That's the responsibility of
That's the responsibility of your desktop software, not the kernel.
If your desktop doesn't tell you when it's done, then it's time to get a new desktop.
In gnome, you just
In gnome, you just right-click on the volume icon, and select "eject."
Then you can pull it out.
Same for CD drives.
There's no "idiot interlock," like there is on CD drives (to keep you from ejecting while it's in use), but that's the hardware folks' fault.
Re: Noop is the best of the
Thanks!
Will try it soon ...