Pluggable Schedulers vs. Pluggable Security

Submitted by Jeremy
on October 3, 2007 - 7:13am

In a continuing discussion about the difference between pluggable security and pluggable schedulers, Linus Torvalds quoted himself:

"Another difference is that when it comes to schedulers, I feel like I actually can make an informed decision. Which means that I'm perfectly happy to just make that decision, and take the flak that I get for it. And I do (both decide, and get flak). That's my job."

He added, "which you seem to not have read or understood (neither did apparently anybody on slashdot)". Linus continued, "the arguments that 'servers' have a different profile than 'desktop' is pure and utter garbage, and is perpetuated by people who don't know what they are talking about." He then asked and answered his own question, "really: tell me what the difference is between 'desktop' and 'server' scheduling. There is absolutely *none*," going on to explain:

"Yes, there are differences in tuning, but those have nothing to do with the basic algorithm. They have to do with goals and trade-offs, and most of the time we should aim for those things to auto-tune (we do have the things in /proc/sys/kernel/, but I really hope very few people use them other than for testing or for some extreme benchmarking - at least I don't personally consider them meant primarily for 'production' use)."

Regarding the comparison between pluggable schedulers and pluggable security, Linus stated:

"Really - not only is the whole 'desktop scheduler' argument totally bogus to begin with (and only brought up by people who either don't know anything about it, or who just want to argue, regardless of whether the argument is valid or not), quite frankly, when you say that it's the 'same issue' as with security models, you're simply FULL OF SH*T.

"The issue with LSM is that security people simply cannot even agree on the model. It has nothing to do with performance. It's about management, and it's about totally different models. Have you even *looked* at the differences between AppArmor and SELinux? Did you look at SMACK? They are all done by people who are interested in security, but have totally different notions of what 'security' even *IS*ALL*ABOUT."


From: Bill Davidsen
Subject: Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access   Control Kernel
Date: Oct 2, 2:02 pm 2007

Linus Torvalds wrote:
> 
> On Mon, 1 Oct 2007, Stephen Smalley wrote:
>> You argued against pluggable schedulers, right?  Why is security
>> different?
> 
> Schedulers can be objectively tested. There's this thing called 
> "performance", that can generally be quantified on a load basis.
> 
> Yes, you can have crazy ideas in both schedulers and security. Yes, you 
> can simplify both for a particular load. Yes, you can make mistakes in 
> both. But the *discussion* on security seems to never get down to real 
> numbers. 
> 
And yet you can make the exact same case for schedulers as security, you 
can quantify the behavior, but if your only choice is A it doesn't help 
to know that B is better.

You say "performance" as if it had universal meaning. In truth people 
want to optimize for total tps (servers), or responsiveness on the human 
scale (mail, dns, nntp servers), or perceived smoothness (with many 
threads updating a display to slow with load rather than start visibly 
jumping the motion from one to another), or very short term response 
(-rt patches). People want very different behavior under the same load, 
and that is what *they* call "performance," namely best delivery of 
what's important. The numbers are "hard science" but the choice of which 
numbers are important is still "people wanking around with their opinions".

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-

From: Linus Torvalds
Subject: Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Date: Oct 2, 2:20 pm 2007

On Tue, 2 Oct 2007, Bill Davidsen wrote:
>
> And yet you can make the exact same case for schedulers as security, you can
> quantify the behavior, but if your only choice is A it doesn't help to know
> that B is better.

You snipped a key part of the argument. Namely:

    Another difference is that when it comes to schedulers, I feel like I
    actually can make an informed decision. Which means that I'm perfectly
    happy to just make that decision, and take the flak that I get for it. And
    I do (both decide, and get flak). That's my job.

which you seem to not have read or understood (neither did apparently
anybody on slashdot).

> You say "performance" as if it had universal meaning.

Blah. Bogus and pointless argument removed.

When it comes to schedulers, "performance" *is* pretty damn well-defined,
and has effectively universal meaning.

The arguments that "servers" have a different profile than "desktop" is
pure and utter garbage, and is perpetuated by people who don't know what
they are talking about. The whole notion of "server" and "desktop"
scheduling being different is nothing but crap.

I don't know who came up with it, or why people continue to feed the
insane ideas. Why do people think that servers don't care about latency?
Why do people believe that desktop doesn't have multiple processors or
through-put intensive loads? Why are people continuing this *idiotic*
scheduler discussion?

Really - not only is the whole "desktop scheduler" argument totally bogus
to begin with (and only brought up by people who either don't know
anything about it, or who just want to argue, regardless of whether the
argumen is valid or not), quite frankly, when you say that it's the "same
issue" as with security models, you're simply FULL OF SH*T.

The issue with LSM is that security people simply cannot even agree on the
model. It has nothing to do with performance. It's about management, and
it's about totally different models. Have you even *looked* at the
differences between AppArmor and SELinux? Did you look at SMACK? They are
all done by people who are interested in security, but have totally
different notions of what "security" even *IS*ALL*ABOUT.

In contrast, anybody who claims that the CPU scheduler doesn't know what
it's all about is just tripping. And anybody who claims that desktop
workloads are so radically different from server workloads (or that the
hardware is so different) is just totally out to lunch.

So next time, think five minutes before you start your argument.

Linus
-

From: Bill Davidsen
Subject: Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Date: Oct 2, 8:54 pm 2007

Linus Torvalds wrote:
> On Tue, 2 Oct 2007, Bill Davidsen wrote:
>
>> And yet you can make the exact same case for schedulers as security, you can
>> quantify the behavior, but if your only choice is A it doesn't help to know
>> that B is better.
>>
>
> You snipped a key part of the argument. Namely:
>
>     Another difference is that when it comes to schedulers, I feel like I
>     actually can make an informed decision. Which means that I'm perfectly
>     happy to just make that decision, and take the flak that I get for it. And
>     I do (both decide, and get flak). That's my job.
>
> which you seem to not have read or understood (neither did apparently
> anybody on slashdot).
>
Actually I had quoted that, made a reply, and decided that my reply was
too close to a flame and deleted the quote and the nasty reply, because
I couldn't find a nice way to say what I wanted.

Oh well, I tried to keep to a higher level, but... on this topic you
seem to be off on an ego trip. You are not the decider, George Bush is
the decider, and the only time he's not wrong he didn't understand the
question. I checked the schedule, it's not you week to be God.

There are sensible people you respect on other topics, who have the
opinion that there is room for behaviors other than CFS, and who have
created a pluggable scheduler framework which they are trying to hand
you on a platter. And you won't even consider that they might be right,
because you believe there can be one scheduler which is close to optimal
for all loads.

>> You say "performance" as if it had universal meaning.
>>
>
> Blah. Bogus and pointless argument removed.
>
> When it comes to schedulers, "performance" *is* pretty damn well-defined,
> and has effectively universal meaning.
>
> The arguments that "servers" have a different profile than "desktop" is
> pure and utter garbage, and is perpetuated by people who don't know what
> they are talking about. The whole notion of "server" and "desktop"
> scheduling being different is nothing but crap.
>
Unfortunately not so, I've been looking at schedulers since MULTICS, and
desktops since the 70s (MP/M), and networked servers since I was the
ARPAnet technical administrator at GE's Corporate R&D Center. And on
desktops response is (and should be king), while on a server, like nntp
or mail, I will happily go from 1ms to 10sec for a message to pass
through the system if only I can pass 30% more messages per hour,
because in virtually all cases transit time in that range is not an
issue. Same thing for DNS, LDAP, etc, only smaller time range. If my
goal is <10ms, I will not sacrifice capacity to do it.

> I don't know who came up with it, or why people continue to feed the
> insane ideas. Why do people think that servers don't care about latency?
>
Because people who run servers for a living, and have to live with
limited hardware capacity realize that latency isn't the only issue to
be addressed, and that the policy for degradation of latency vs.
throughput may be very different on one server than another or a desktop.

> Why do people believe that desktop doesn't have multiple processors or
> through-put intensive loads? Why are people continuing this *idiotic*
> scheduler discussion?
>
Because people can't get you to understand that one size doesn't fit all
(and I doubt I've broken through).

> Really - not only is the whole "desktop scheduler" argument totally bogus
> to begin with (and only brought up by people who either don't know
> anything about it, or who just want to argue, regardless of whether the
> argumen is valid or not), quite frankly, when you say that it's the "same
> issue" as with security models, you're simply FULL OF SH*T.
>
The real issue is that you can't imagine that people who don't share
your opinion are not only wrong but don't understand the problem. You
may be right, but when you say anyone who disagrees is wrong by
definition, then you have lost sight of productive technical
differences. When your arguments drop to personal attacks and rants it's
time to look at your technical values.

> The issue with LSM is that security people simply cannot even agree on the
> model. It has nothing to do with performance. It's about management, and
> it's about totally different models. Have you even *looked* at the
> differences between AppArmor and SELinux? Did you look at SMACK? They are
> all done by people who are interested in security, but have totally
> different notions of what "security" even *IS*ALL*ABOUT.
>
Exactly, and I'm not the only one who doubts that more than one model
would be useful. I'm sorry you can't see that about CPU schedulers as well.

> In contrast, anybody who claims that the CPU scheduler doesn't know what
> it's all about is just tripping. And anybody who claims that desktop
> workloads are so radically different from server workloads (or that the
> hardware is so different) is just totally out to lunch.
>
> So next time, think five minutes before you start your argument.
>
I don't disagree with you lightly, in this case I think I have a better
superficial understanding of schedulers than you do of how production
servers are used.

--
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
-

From: Linus Torvalds
Subject: Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Date: Oct 2, 9:52 pm 2007

On Tue, 2 Oct 2007, Bill Davidsen wrote:
>
> Unfortunately not so, I've been looking at schedulers since MULTICS, and
> desktops since the 70s (MP/M), and networked servers since I was the ARPAnet
> technical administrator at GE's Corporate R&D Center. And on desktops response
> is (and should be king), while on a server, like nntp or mail, I will happily
> go from 1ms to 10sec for a message to pass through the system if only I can
> pass 30% more messages per hour, because in virtually all cases transit time
> in that range is not an issue. Same thing for DNS, LDAP, etc, only smaller
> time range. If my goal is <10ms, I will not sacrifice capacity to do it.

Bill, that's a *tuning* issue, not a scheduler logic issue. You can do
that today. The scheduler has always had (well, *almost* always: I think
the really really original one didn't) had tuning knobs. It in no way
excuses any "pluggable scheduler", because IT DOES NOT CHANGE THE PROBLEM.

[ Side note: not only doesn't it change the problem, but a good scheduler
  tunes itself rather naturally for most things. In particular, for things
  that really are CPU-limited, the scheduler should be able to notice
  that, and will not aim for latency to the same degree.

  In fact, what is really important is that the scheduler notice that some
  programs are latency-critical AT THE SAME TIME as other programs sharing
  that CPU are not, which very much implies that you absolutely MUST NOT
  have a scheduler that done one or the other: it needs to know about
  *both* behaviors at the same time.

  IOW, it is very much *not* about multiple different "pluggable modules",
  because the scheduler must be able to work *across* these kinds of
  barriers. ]

So for example, with the current scheduler, you can actually set things
like scheduler latency. Exactly so you can tune things. However, I actually
would argue that you generally shouldn't need to, and if you really do need
to, and it's a huge deal for a real load (and not just a few percent for a
benchmark), we should consider that a scheduler problem.

So your "argument" is nonsense. You're arguing for something else than what
you _claim_ to be arguing for. What you state that you want actually has
nothing what-so-ever to do with pluggable schedulers, quite the reverse!

It's also totally incorrect to state that this is somehow intrisicly a
feature of a "server load". Many server loads have very real latency
constraints. No, not the traditional UNIX loads of SMPT and NNTP, but in
many loads the latency guarantees are a rather important part of it, and
you'll have benchmarks that literally test how high the load can be until
latency reaches some intolerable value - ie latency ends up being the
critical part.

There's also a meta-development issue here: I can state with total
conviction that historically, if we had had a "server scheduler" and a
"desktop scheduler", we'd have been in much worse shape than we are now.
Not only are a lot of the loads the same or at least similar (and aiming
for _one_ scheduler - especially one that auto-tunes itself at least to to
some degree - gets you lots of good testing), but the hardware situation
changes.

For example, even just five years ago, there would have been people who
thought that multiprocessing is a server load - and they'd have been
largely right at the time. Would you have wanted a "server" (SMP, screw
latency) scheduler, a "workstation" (SMP but low-latency) scheduler and a
"desktop" (UP) scheduler for the different cases?

Because yes, SMP does impact the scheduler a lot... The locking, the
migration between CPU's, the CPU affinity.. Things that gamers five years
ago would have felt was just totally screwing them over and making the
scheduler slower and more complex "for no gain".

See? Pluggable things are generally a *bad* thing. You should generally aim
for *never* being pluggable if you can at all avoid it, because it not only
fragments the developer base over totally different code bases, it
generates unmaintainable decisions as the problem space evolves.

To get back to security: I didn't want pluggable security because I thought
that was a technically good solution. No, the reason Linux has LSM (and
yes, I was the one who pushed hard for the whole thing, even if I didn't
actually write any of it) was because the problem wasn't technical to begin
with. It was social/political and administrative.

See? Another fundamental difference between schedulers and security
modules.

> > I don't know who came up with it, or why people continue to feed the insane
> > ideas. Why do people think that servers don't care about latency?
>
> Because people who run servers for a living, and have to live with limited
> hardware capacity realize that latency isn't the only issue to be addressed,
> and that the policy for degradation of latency vs. throughput may be very
> different on one server than another or a desktop.

Quite frankly a lot of other people run servers for a living too, and their
main issue is often latency. And no, they don't do NNTP or SMTP, they do
strange java things around databases with thousands of threads.

Should they use a "desktop" scheduler? Because clearly their loads have
nothing what-so-ever in common with yours? Or can you possibly admit that
it's really the exact same problem?

Really: tell me what the difference is between "desktop" and "server"
scheduling. There is absolutely *none*.

Yes, there are differences in tuning, but those have nothing to do with the
basic algorithm. They have to do with goals and trade-offs, and most of the
time we should aim for those things to auto-tune (we do have the things in
/proc/sys/kernel/, but I really hope very few people use them other than
for testing or for some extreme benchmarking - at least I don't personally
consider them meant primarily for "production" use).

> > Why do people believe that desktop doesn't have multiple processors or
> > through-put intensive loads? Why are people continuing this *idiotic*
> > scheduler discussion?
>
> Because people can't get you to understand that one size doesn't fit all (and
> I doubt I've broken through).

I understand the "one size" argument, I just disagree vehemently about it
having anything to do with a pluggable scheduler. The scheduler does have
tuning, most of it 100% automatic (that's what the "fairness" thing is all
about!), and none of it needs - or would even remotely be helped by -
pluggability.

Take a really simple example: you have fifty programs all wanting to run on
the same machine at the same time. There simply *needs* to be some single
scheduler that picks which one to run. At some point, you have to make the
decision.

And no, they are not all "throughput" or all "latency", and you cannot make
your decision based on a "global pluggable scheduler policy". Some of the
processes may be purely about throughput, some may be purely about latency,
and some may change over their lifetime.

Not very amenable to "pluggable" things, is it? Especially since the thing
that eventually needs to give the CPU time to *somebody* simply needs to
understand all these different needs at some level anyway.

It always ends up having to be *something* that decides, and it can
absolutely never ignore the other "somethings". So a set of independent
pluggable modules simply wouldn't work.

See?

(Sure you could make a multi-level scheduler with different pluggable ones
for different levels, but that really doesn't work very well, since even in
a multi-level one, you do want to have some generic notion of "this one
cares about latency" and "this process is about throughput", so then the
pluggable stuff wouldn't add any advantage _anyway_ - the top-level
decision would have all the complexities of the one "perfect" scheduler you
want in the first place!)

In contrast, look at fifty programs that all want to run on the same
machine at the same time, but we have security issues. Now, security pretty
much by definition cuts _across_ those programs, with the whole point
generally being to make one machine secure, so almost always you'd
generally want to have "a security model" for the whole machine (or at
least virtual machine) - it's just that the policies may be totally
different in different circumstances and across different machines.

But even if you were to *not* want to have one single policy, but have
different policies for different processes, it at least still makes some
conceptual sense, in ways it does not to try to have independent
schedulers.

For schedulers, at some point, it just hits the hardware resource: the CPU
needs to be given to *one* of them. For a security policy, it's all
software choices - you don't need to limit yourself to one choice at any
time. So a pluggable module makes more sense there anyway.

But no, that's not really why we have LSM. I'd have *much* preferred to
have one unified security module setup that we could all agree on, and no
pluggable security modules. It was not to be - and the reason we have LSM
is not because "it makes more sense than a CPU scheduler", but simply
because "people didn't actually get anything done at all, because they just
argued about what to do".

In the CPU schedulers, Ingo still gets work done, even though people argue
about it. So we haven't needed to go to the extreme of an "LSM for CPU
schedulers", because the arguments don't actually hold up the work.

And THAT is what matters in the end.

Linus
-
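
The "fifty programs" example in the message above can be illustrated with a toy model. What follows is only a sketch of weighted fair scheduling by virtual runtime, loosely in the spirit of CFS; it is not the kernel's implementation, and the function name, task names, and weights are all hypothetical. The point it tries to show is that a single decision loop sees every task at once, whatever each task cares about.

```python
from fractions import Fraction


def simulate(weights, ticks):
    """Run a toy weighted-fair scheduler for `ticks` time units.

    weights: dict mapping task name -> positive integer weight
             (a hypothetical stand-in for a task's importance).
    Returns a dict mapping task name -> number of ticks of CPU received.
    """
    vruntime = {name: Fraction(0) for name in weights}
    received = {name: 0 for name in weights}
    for _ in range(ticks):
        # One scheduler looks at *all* tasks and runs the one that has
        # had the least weighted CPU time so far (name breaks ties).
        name = min(vruntime, key=lambda n: (vruntime[n], n))
        received[name] += 1
        # Heavier tasks accumulate virtual runtime more slowly,
        # so they are picked more often.
        vruntime[name] += Fraction(1, weights[name])
    return received
```

For instance, `simulate({"audio": 3, "batch": 1}, 40)` splits the CPU three to one between the two hypothetical tasks, and the same loop handles fifty tasks with mixed weights without any notion of a per-class scheduler.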

Tuning an inappropriate algorithm or using the right one?

Anonymous (not verified)
on
October 3, 2007 - 8:17am

"Yes, there are differences in tuning, but those have nothing to do with the basic algorithm."

Ok. But the basic algorithm is one of the most hotly debated areas of kernel design. The scheduler has undergone 3 recent major rewrites that I know of. That is telling us something: that an algorithm that can be smoothly varied between the needs of desktop and server use is elusive.

"an algorithm that can be

Anonymous (not verified)
on
October 3, 2007 - 8:47am

"an algorithm that can be smoothly varied between the needs of desktop and server user is elusive."

Is it really? Has anyone even collected real data on this?

What exactly is the difference between the desktop and server cases?

I have a lot of trouble believing that something can't be done well for both cases... if there even _are_ two cases to start with.

Algorithms vs. policies

iq-0
on
October 3, 2007 - 9:22pm

(Disclaimer: this posting is not all-inclusive, but aims to make the distinction between algorithms and policies clear. I have probably forgotten a number of things in my summaries.)

policies
========

Requirements:
* high throughput
* low latency
* relative importance
(a combination of these)

The variables which are used to achieve these requirements are:
* priorities
* time slices
* preemption

We want a certain level of fairness among processes whose overlapping requirements make them contend for resources. This is the basic reason we have to do scheduling at all. If you don't want basic fairness, you could just as well schedule randomly and stop caring about the details; this might actually work pretty well for certain workloads :-)

So far we are talking policy, not algorithm. Algorithms have everything to do with efficiently applying these policies to our problem domain. We use different algorithms mostly for performance reasons. A naive implementation could make (near?) perfect scheduling decisions but would take an inordinate amount of time to do so.
A perfect algorithm would simply apply all policies in no time, without using any resources (CPU / memory). This is impossible :-)
So we try different algorithms with different data structures to make decisions as fast as possible (minimum overhead) while still scaling well (few locks, small memory overhead). In reality, though, we often "cheat" by making assumptions about the policies they try to enforce. These assumptions often yield dramatic performance improvements for the scheduler but often (always?) make it biased (not really fair). Whether this bias is acceptable is in the eye of the beholder. But this bias *always* means the algorithm is flawed.

Do note that a specific workload applied to the scheduler adds additional variables to the scene and effectively invalidates the algorithm (since it needs to take additional information into account). In practice, though, we can make simple assumptions for the other cases to make them fit this model. (This is partly where tunables come from: they are default values for workloads that don't supply a specific value for a given problem. Many of these tunables can be tuned automatically, but only by approximation. However, they don't change the algorithm!)
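
The policy/algorithm split described in this comment can be sketched in a few lines. In the illustration below, one policy ("run the task with the smallest virtual runtime") is implemented by two different algorithms: a linear scan and a binary heap. Both function names and the sample values are made up for the example; neither is taken from any real scheduler.

```python
import heapq


def order_by_scan(vruntimes, slice_len, steps):
    """Apply the policy with a linear scan: O(n) work per decision."""
    state = dict(vruntimes)
    order = []
    for _ in range(steps):
        # Smallest virtual runtime wins; the name breaks ties.
        name = min(state, key=lambda n: (state[n], n))
        order.append(name)
        state[name] += slice_len
    return order


def order_by_heap(vruntimes, slice_len, steps):
    """Apply the same policy with a binary heap: O(log n) per decision."""
    heap = [(v, name) for name, v in vruntimes.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(steps):
        v, name = heapq.heappop(heap)
        order.append(name)
        heapq.heappush(heap, (v + slice_len, name))
    return order
```

Given the same input, the two functions return the same run order; only the cost of reaching each decision differs. That is the sense in which changing the algorithm need not change the policy.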

server vs desktop

Anonymous (not verified)
on
October 3, 2007 - 8:36am

Sure, the basic concepts for server and desktop scheduling are exactly the same, but on a server, I want fairness, while on my desktop, if I'm playing an online 3D FPS, I want extreme unfairness! I want top priority to my gaming resources, and everything else needs to go on the back burner. I want low latency for everything associated with the game, and everything else can bloody well wait until I exit the game.

Give the end user an easy way to do that, and the critics will be happy.

If you want top priority to

Anonymous (not verified)
on
October 3, 2007 - 8:52am

If you want top priority to your "gaming resources" you can probably also tell us what those "gaming resources" are, and perhaps also use the nice(1) command appropriately.
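
For readers who have not used nice, here is a minimal sketch of the same mechanism from inside a program, using Python's standard `os` module on a Unix-like system. The helper names are invented for this example. Note that raising the nice value (lowering priority) is always allowed, while lowering it typically requires privilege.

```python
import os


def current_nice():
    """Return the nice value of the calling process."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def renice_self(increment):
    """Add `increment` to this process's nice value and return the result.

    A positive increment makes the process "nicer" (lower priority).
    A negative increment usually fails with PermissionError unless
    the process is privileged.
    """
    return os.nice(increment)
```

From a shell, the equivalent for a background job would be something like `nice -n 10 some_command`, where `some_command` is a placeholder.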

More flexibility

AstralStorm (not verified)
on
October 3, 2007 - 9:34am

Someone should split nice into separate throughput and latency notions, e.g. add syscalls tnice and lnice.
Currently with CFS, nice governs both, which means apps which need a lot of CPU cannot be down-niced, as it'll reduce the latency of the whole system.

Likewise, a very bursty load could use high tnice and low lnice.

Example: the game itself would get low tnice, medium lnice, while the sound server it uses would get high tnice, low lnice, which would mean it almost always preempts the game, though is only allowed little CPU time per burst.

Currently with CFS, nice

Anonymous (not verified)
on
October 3, 2007 - 8:24pm

Currently with CFS, nice governs both, which means apps which need a lot of CPU cannot be down-niced, as it'll reduce the latency of the whole system.

That was probably true with earlier CFS versions, but the latest v22 one keeps to a constant "latency target", regardless of nice levels. (It's in /proc/sys/kernel/sched_latency, if you want to tune it)
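
A small sketch for inspecting these tunables: the exact file names under /proc/sys/kernel differ between kernel versions (and may be absent entirely on some), so the hypothetical helper below simply lists whatever `sched_*` entries exist rather than assuming a particular name.

```python
import glob
import os


def read_sched_tunables(directory="/proc/sys/kernel"):
    """Return {file name: contents} for every readable sched_* entry.

    Returns an empty dict when the directory or the entries do not
    exist, for example on a non-Linux system.
    """
    tunables = {}
    for path in sorted(glob.glob(os.path.join(directory, "sched_*"))):
        try:
            with open(path) as handle:
                tunables[os.path.basename(path)] = handle.read().strip()
        except OSError:
            # Skip directories and entries we are not allowed to read.
            continue
    return tunables
```

Calling `read_sched_tunables()` on a Linux machine shows which scheduler knobs that particular kernel exposes through this directory.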

With that I see even and uniform latencies in non-reniced tasks too, even if X is running at negative nice levels.

There's also RLIMIT_NICE which allows ordinary nonprivileged users to go down a few (configurable number of) nice levels, if they want to do that.
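
As a rough illustration of how RLIMIT_NICE maps to nice levels: the getrlimit(2) manual page gives the lowest reachable nice value as 20 minus the soft limit. The sketch below applies that formula using Python's `resource` module; the helper names are made up, and `RLIMIT_NICE` is looked up defensively because the constant is Linux-specific.

```python
import resource


def nice_floor(rlim_cur):
    """Lowest nice value permitted by an RLIMIT_NICE soft limit.

    Uses the 20 - rlim_cur rule from getrlimit(2), bounded at -20,
    which is the lowest nice value that exists.
    """
    return max(-20, 20 - rlim_cur)


def my_nice_floor():
    """Return this process's nice floor, or None if RLIMIT_NICE is unavailable."""
    limit_id = getattr(resource, "RLIMIT_NICE", None)
    if limit_id is None:
        return None
    soft, _hard = resource.getrlimit(limit_id)
    if soft == resource.RLIM_INFINITY:
        return -20
    return nice_floor(soft)
```

With a soft limit of 25, for example, an unprivileged process could go down to nice -5.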

nice

rmg
on
October 3, 2007 - 9:27am

I believe under a "proper" scheduler, the process priority will do this for you.

Question

Anonymous (not verified)
on
October 3, 2007 - 9:27am

It seems to me that Linus is making sense.

Is it possible to run a high throughput app and a low latency app on the same server?

Surely to maintain the low latency you have to interrupt the high throughput app as and when required thus reducing its throughput?

Basically you want 2 servers, one running with a 1000Hz timeslice and the other with a 100Hz timeslice?

one way to do that, that i

turn_self_off (not verified)
on
October 3, 2007 - 10:07pm

one way to do that, that i can see, is DMA and sub-processors.

basically the big iron way.

you have one for the storage controller, one for the network controller, and one that runs the low latency requirement app.

to expect all this to work out on a single core, single cpu system where the cpu must deal with both the traffic and the latency issue is just asking for trouble.

software can't fix a hardware problem...

Huh? All high-performance

intgr
on
October 3, 2007 - 10:41pm

Huh? All high-performance peripheral devices have used DMA since the 1990s. Except for a few circumstances, drivers need so little CPU power that it simply does not matter. If they don't, they can be preempted by enabling CONFIG_PREEMPT.

Making sure that interactive tasks stay interactive in face of high-throughput tasks is exactly the job of the scheduler. As long as the interactive task uses little CPU time (or is niced proportionally less than the other task), the high-throughput task will always be preempted immediately after an incoming interrupt (e.g. after a blocking I/O operation completed).
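
The preemption argument above can be checked with a toy simulation. This is a deliberately crude model under stated assumptions: time advances in 1 ms steps, there is one CPU hog and one interactive task that wakes periodically for a short burst, and context-switch and cache costs are ignored entirely. All names and numbers are hypothetical.

```python
def simulate(total_ms, period_ms, burst_ms, hog_slice_ms):
    """Return (CPU ms received by the hog, worst wakeup latency in ms).

    hog_slice_ms is how long the hog runs before the scheduler will
    switch to a woken task; 1 models immediate preemption at the
    1 ms granularity of this model.
    """
    hog_ms = 0
    hog_left = 0        # ms remaining in the hog's current slice
    pending = 0         # ms of work the interactive task still needs
    woke_at = 0
    started = False
    max_latency = 0
    for t in range(total_ms):
        if t % period_ms == 0:
            # The interactive task wakes up, e.g. after I/O completes.
            pending, woke_at, started = burst_ms, t, False
        if pending > 0 and hog_left == 0:
            if not started:
                max_latency = max(max_latency, t - woke_at)
                started = True
            pending -= 1
        else:
            if hog_left == 0:
                hog_left = hog_slice_ms
            hog_left -= 1
            hog_ms += 1
    return hog_ms, max_latency
```

In this model, `simulate(1000, 100, 1, 1)` and `simulate(1000, 100, 1, 10)` give the hog the same amount of CPU, while the worst wakeup latency is zero in the first case and several milliseconds in the second. What a real system pays for the lower latency is the switching overhead this sketch leaves out.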

The end performance of course depends on what kind of I/O the tasks are doing. If it's disk I/O, the I/O scheduler is not always fair by default, but this can be changed with ionice. Network packet queues can grow moderately long, and although TCP attempts to be fair among separate TCP connections, further tuning can be achieved by using CBQ (class-based queueing). Both of these problems disappear when the tasks use separate I/O devices.

If all of this is still not enough, real-time Linux can offer hard latency guarantees.

It's all about scheduling and priorities. CPU has all the control over hardware.

Linus is right.

Anonymous (not verified)
on
October 3, 2007 - 9:47am

Windows has basically the same scheduler in the desktop and server versions.

The difference between them is bias settings. CFS needs containers on processors to do this effectively. Then, for desktop machines, something has to give the program with the active window more processor time among user processes. This can be stacked on top of CFS. Really, the only requirement is that the scheduler provide a very controllable system.

There is no need for more than 1 scheduler if the scheduler is flexible enough.

In theory there is room for only one security system too. The problem is that most have security weaknesses or are a bugger to set up right.

We have benchmarks for schedulers to work out whether they lack flexibility. We need a benchmark system for security systems to weed out the problems, i.e. detect whether a system is too complex to configure right or misses security flaws.

Which is the best scheduler?

Fred Flinta (not verified)
on
October 3, 2007 - 10:32am
  • Borrowed-Virtual-Time Scheduling (BVT)
  • Completely Fair Scheduler (CFS)
  • Critical Path Method of Scheduling
  • Deadline-monotonic scheduling (DMS)
  • Deficit round robin (DRR)
  • Earliest deadline first scheduling (EDF)
  • Elastic Round Robin
  • Fair-share scheduling
  • First In, First Out (FIFO), also known as First Come First Served (FCFS)
  • Gang scheduling
  • Genetic Anticipatory
  • Highest response ratio next (HRRN)
  • Interval scheduling
  • Last In, First Out (LIFO)
  • Job Shop Scheduling (see Job shops)
  • Least-connection scheduling
  • Least slack time scheduling (LST)
  • List scheduling
  • Lottery Scheduling
  • Multilevel queue
  • Multilevel Feedback Queue
  • Never queue scheduling
  • O(1) scheduler
  • Proportional Share Scheduling
  • Rate-monotonic scheduling (RMS)
  • Round-robin scheduling (RR)
  • Shortest expected delay scheduling
  • Shortest job next (SJN)
  • Shortest remaining time (SRT)
  • Staircase Deadline scheduler (SD)
  • "Take" Scheduling
  • Two-level scheduling
  • Weighted fair queuing (WFQ)
  • Weighted least-connection scheduling
  • Weighted round robin (WRR)
  • Group Ratio Round-Robin: O(1)

    IO scheduler

    Anonymous (not verified)
    on
    October 3, 2007 - 12:25pm

    Why is it pluggable again?

    Because they made the wrong

    intgr
    on
    October 3, 2007 - 8:40pm

    Because they made the wrong choice by making it pluggable. This has very often been used as an example in CPU scheduler discussions of what to avoid.

    I'm not sure what's the reason for not de-pluggifying it again though. Maybe it will happen some time.

    Hm, never heard that it was

    Anonymous
    on
    October 3, 2007 - 8:59pm

    Hm, never heard that it was a wrong choice. AFAIK the different IO schedulers can be bound to different devices so that each device runs at its maximum performance. For example a flash drive performs other than a hdd.
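
    The per-device binding mentioned here is visible through sysfs, where each block device has a `queue/scheduler` file listing the available IO schedulers with the active one in square brackets. The following is a minimal sketch for reading it; the helper names are invented, and which devices and schedulers appear depends entirely on the machine and kernel.

```python
import glob
import os


def parse_scheduler_line(line):
    """Parse text such as 'noop anticipatory deadline [cfq]'.

    Returns (active scheduler or None, list of available schedulers).
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def list_io_schedulers(sys_block="/sys/block"):
    """Return {device name: (active, available)} for each block device found."""
    result = {}
    pattern = os.path.join(sys_block, "*", "queue", "scheduler")
    for path in sorted(glob.glob(pattern)):
        device = path.split(os.sep)[-3]
        try:
            with open(path) as handle:
                result[device] = parse_scheduler_line(handle.read())
        except OSError:
            continue
    return result
```

    Switching a device's scheduler is done by writing one of the listed names back to the same file as root; that step is left out of the sketch.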

    on that topic, is there now

    turn_self_off (not verified)
    on
    October 3, 2007 - 10:10pm

    on that topic, is there now an option to make sure that data is written to a flash right away, without having it burn to a cheap chip that lacks wear leveling?

    For example a flash drive

    Anonymous (not verified)
    on
    October 3, 2007 - 10:39pm


    For example a flash drive performs other than a hdd.

    An IO scheduler should be perfectly aware of the fact that it is handling a flash drive. It is suboptimal if the user has to make that decision/tuning.

    CPUs are a lot more uniform and a lot more shared, so no such fundamental variations occur. You don't set one CPU to run Oracle and the other CPU to run shell scripts. (Yes, it can be done, but generally CPUs are uniformly shared, while disks are much more bound to workloads. You further bind your workload to a group of disks by placing the data on the disk - which increases the binding.)

    And that's the crux of the matter: when there are fundamental physical differences (such as having a physically different networking card, or having different physical layouts of data/filesystems), a pluggable, "driver"-like design is forced. Even in those cases, the plugging should be done by the kernel, automatically. For everything else, pluggability has a cost - especially if the user is forced to do it.

    Why is it pluggable

    Anonymous (not verified)
    on
    October 3, 2007 - 8:55pm

    Why is it pluggable again?

    Now that it has been exposed to user-space apps the maintainers cannot remove IO scheduler pluggability anymore because application writers and vendors resist it. So Linux is stuck with an IO scheduler API that needs tweaking from sysadmins for every workload instead of a good IO scheduler that would take hints from applications.

    It's not a really big deal (yet), but the other (non-CFQ) IO schedulers are definitely hard to remove and they dilute testing and split developer attention.

    CPU schedulers are a completely different animal due to CPUs having a much higher workload sharing factor and them having a much lower context switch cost, so there pluggability is even less wanted than for IO schedulers.

    On a CPU it costs little to share workloads (up to a certain limit). On disks, workload sharing causes immediate disk seeks and thrashing. So due to the physical differences disks are already de-facto dedicated to particular apps/workloads (at least on servers, for IO bound workloads where IO performance matters), so the disadvantages of pluggable IO schedulers are much less visible. (It was probably still the wrong choice - but more borderline.)

    Because the hardware behaves

    Anonymous (not verified)
    on
    October 4, 2007 - 12:06am

    Because the hardware behaves differently.
