Hello,=20 Before I dive in, I should probably justify my motivations for writing this email. I'm working away on implementing an EDF scheduler for real time tasks in the kernel. This again leads to hacking at the existing source as I'm not about to toss out the entire scheduler - just replace (by some Kconfig switch) the RR/FIFO classes. As to why I'm looking at EDF, I think the answer to that is a bit too long (and not appropriate for this email anyway) so I'll leave that part out. However, what I do mean to discuss is the current state of the scheduler code. Working through the code, I must say I'm really impressed. The code is clean, it is well thought out and the new approach with sched_class and sched_entity makes it very modular. However, digging deeper, I find myself turning more and more desperate, looking over my shoulder for a way out. Now, I'm in no doubt that the code *is* modular, that it *is* clean and tidy, but coming from outside, it is not that easy to grasp it all. And, it is not just the sheer size and complexity of the scheduler itself, but also a lot with how the code is arranged. =46or instance, functions like free_fair_sched_group, alloc_fair_sched_group etc - does IMHO belong in sched_fair.c and not in sched.c. The same goes for several rt-functions and structs. So, if one drew up a list over all events that would cause functions in sched.c to be called, this could be used to make a minimized 'interface' and then let the scheduler call the appropriate function for the given class in the same fashion sched_tick is used today.=20 What I would like, is to rip out all the *actual* scheduling logic and put this in sched_[fair|rt].c and let sched be purely event-driven (which would be a nice design goal in itself). This would also lead to sched_[fair|rt].h, where the structs, macros, defines etc can be defined. Today these are defined in just about everywhere, making the code unnecessary complicated (I'm not going to say messy since I'm not *that* ...
You and a few other folks. The most interesting part of EDF is not the
actual scheduler itself (although there are fun issues with that as
well), but extending the Priority Inheritance framework to deal with all
I'd start out small by moving the functions to the right file. After
You might need to be careful there, or introduce sched_(fair|rt).h for
Sounds good, its been on the agenda for a while, but nobody ever got
around to it.
Other cleanups that can be done are:
- get rid of all the load balance iterator stuff and move
that all into sched_fair
- extract the common sched_entity members and create:
struct {
struct sched_entity_common common;
union {
struct sched_entity fair;
struct sched_rt_entity rt;
}
Well, adding a sched_class, no need to replace anything besides that.
--
well, yes, you trade in priority inversion with deadline inversion. Probabl= y a=20 Yes, I was thinking something along those lines, slowly moving things out. = I=20 just wanted to clear the air before I got started in case someone had some= =20 real issues with this approach. After all, if it's never going to get merge= d,=20 I wasn't really aiming at replacing anything (besides the rt-stuff, but as= =20 said earlier, that's not the issuen ow). I just want to move things to make= =20 it a bit clearer.=20 =2D-=20 med Vennlig Hilsen - Yours Sincerely Henrik Austad
Yes, here we are! :-)
We also have some code, but it still is highly experimental and we are
The main problem is that, especially to deal with SMP systems, we also
need to investigate theoretical issues and find out what the best
I'm not saying anything in possible sched.c and sched_{fair|rt}.c code
rearranging, I also only wonder why replacing fixed priority RT
scheduling with EDF.
I think they both could be in... Maybe we can discuss on where, I mean
on which position, in the linked list of scheduling classes, to put
each of them.
Regards,
Dario
Thanks for the Cc. Peter, I also added Fabio and Michael that, you
know, are working to this thing. :-)
----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
--
Right, ideally I'd like to see 2 EDF classes on top of FIFO, so that we
end up with the following classes
hedf - hard EDF
sedf - soft EDF (bounded tardiness)
fifo/rr - the current static priority scheduler
fair - the current proportional fair scheduler
idle - the idle scheduler
The two edf classes must share some state, so that the sedf class knows
about the utilisation consumed by hedf, and the main difference between
these two classes is the schedulability test.
[ NOTE: read EDF as any deadline scheduler, so it might as well be
pfair PD^2 scheduler. ]
The few problems this gives are things like kstopmachine and the
migration threads, which should run at the max priority available on the
system.
[ NOTE: although possibly we could make an exception for the migration
threads, as we generally don't need to migrate running RT tasks]
Perhaps we can introduce another class on top of hedf which will run
just these two tasks and is not exposed to userspace (yes, I understand
it will ruin just about any schedulability analysis).
Which leaves us with the big issue of priority inversion ;-)
We can do deadline inheritance and bandwidth inheritance by changing
plist to a rb-tree/binary heap and mapping the static priority levels
somewhere at the back and also propagating the actual task responsible
for the boost down the chain (so as to be able to do bandwidth
reported that DI between disjoint schedule domains (partitions) posed an
interesting problem.
Personally I'd like to see the full priority inversion issue solved by
something like the proxy execution protocol, however the SMP extension
thereof seems to be a tad expensive - found a book on graph theory, all
that remains is finding time to read it :-)
The advantage of proxy execution is that its fully invariant to the
schedule function and thus even works for proportional fair schedulers
and any kind of scheduler hierarchy.
--
Basically the problem that I'm looking at is: Given a directed acyclic graph G, with entry nodes E (those who don't have an inbound edge) and exit nodes X (those without an outbound edge) then given an exit node x of X, split the graph into G1 and G2 so that G1 contains x and all paths leading to it. --
ok, simply put: * give each task a relative deadline (will probably introduce a new syscall, please don't shoot me). * when the task enters TASK_RUNNING, set abosolute deadline to time_now + rel_deadline. * insert task in rq, sorted by abs_deadline * pick leftmost task and run it * when task is done, pick next task that's it. I have a very clear idea about *what* the scheduler should do, the problem is how to get it to work :-) Things would be a lot easier if code in the scheduler was a bit 'more separated. I have started separating things and moving it to separate files. Yes, well, EDF is not optimal for SMP systems, only for single core. However, you can do a pretty good attempt by assigning tasks to cores in a greedy fashion (simply put the next task at the CPU with the lowest load). As a further optimization, I guess you could do the whole sced-domain thing to No. You should have *either* FIFO/RR *or* EDF, not both at the same time. If you absolutely require both, you should at least separate them on a per-core basis. If you mix them, they need to be aware of the other in order to make Well.. why not just treat *all* RT-tasks as *either* FIFO/RR or EDF? Having fifo and edf together will complicate things. And, people looking for edf, will not use fifo/rr anyway (famous last words). Furthermore, hard/firm/soft could be treated as one class, but it should be treated differently at missed deadlines. * Soft: nothing. Scheduled at best effort, when deadline passes, prioritize other tasks to avoid cascading effect of deadlinemissies * Firm: keep some statistics that the user can modify and a possible event-handler when limits are exceeded * Hard: immediatly call a user registred function, or send signal to notify Well, the nice thing about EDF, is that it is provable optimal for any feasible schedule, IOW, if all the tasks are schedulable, you can have 100% utilization of the CPU. On a multicore, EDF is not optimal, as rearranging ...
Well, yes, I know how to do EDF, and its trivially simple - on UP. Sure, but implementing an EDF class isn't really all that hard - esp if you only want UP. The problem with greedy binpacking heuristics is that your schedulablity We _have_ to have both. Its that simple. POSIX mandates we have SCHED_FIFO/RR, there is tons and tons of userspace that uses it, we cannot just replace it with a deadline scheduler. Also, start a linux-rt kernel and look at all the RT threads you have (and that's only going to get worse when we go to threads per interrupt handler instead of threads per interrupt source). Who is going to assign useful and meaningful periods, deadlines and execution times to all those threads? So the only other option is to add a deadline scheduler and allow people to use it. When you do that, you'll see that you have to add it above the FIFO/RR errr, not so, see above. FIFO/RR are static priority schedulers, whereas deadline schedulers are job-static or dynamic priority schedulers. You cannot simply map FIFO onto them, you need more information and who is going to provide that? Also, with the above mentioned requirement that we must have FIFO/RR, Thing is, you have to run hard tasks first, and scheduler weaker forms in its slack time, otherwise you cannot guarantee anything. And the easiest way to do that slack time stealing, is with such a hierarchy. Sure, you can share a lot of code etc. but you need separate On UP - which is not interesting on a general purpose kernel that runs That is partitioned EDF or PEDF for short. There are many other EDF variants who do not suffer this, for example global EDF (GEDF). GEDF can guarantee a utilization bound of 50% for hard-rt and is soft-rt up to 100%. Then there are the pfair class of scheduling algorithms which can Yes, its brilliant, on UP :-) But it should work on SMP as well with a bit more effort. But you really need bandwidth inheritance as well, which is yet another bit of ...
As a start, that is the approach at least I would like to take. Once you have a proven, functional EDF on a single core, you can extend that to handle several Well, not really. I mean, to be optimal, you should also consider WCET, but then, that's really not interesting as IMHO that's the userspace-programmer's responsibility. If the user wants to add tasks that sum up to 210% utilization, it's really not much we can do anyway. You certainly wouldn't want the kernel to stop accepting new jobs. So, keep the kernel logic as simple as possible and move the job to the user. By keeping the kernel logic simple - we make the job easier for the end-users. A very complex EDF-scheduler will make the testing very difficult. If, on the other hand, we *know* that the scheduler is not optimal, but that it behaves in a predictable manner, the end users have a simpler task of finding out why something bad happened. Because, no matter *what* you do, and *how* you implement it, with *whatever* features, there will be cases when things fall apart, and having a simple, I didn't mean to rip the whole fifo/rr out of the kernel, but adding a switch at compile-time so that you could choose *either* normal, static RT *or* EDF. Then we could, at least for the first few versions, have it depend on !SMP to avoid yes, that would certainly be a problem. But, if we can configure in either EDF well, periods is not that difficult. Basically you treat everything as asynchronus events and put them in the runqueue when they become runnable. And the application itself should set a relative deadline via the syscall to tell the kernel "when I become runnable, I must finish before time_now + rel_deadline" RT-tasks that runs forever will certainly be a problem, but what about something like mark it as soft/firm and for every time a deadline is missed, have the trigger function set a new deadline? It would mean you have to rewrite a whole I'm not going to map anything ...
We prefer not to, but sometimes there just isn't any other option. Well, you're of course free to do so, but I don't think its a very I agree that the scheduler should be simple, and even something like PD^2 is relatively simple. But I disagree that we should not do schedulability tests. Doing those, and esp. enforcing tasks to their given limits increases the QoS for others in the presence of faulty/malicious tasks. Also, WCET is still the users responsibility. If for each deadline task you specify a period, a deadline and a budget. Then the WCET computation is reflected in the budget. By enforcing the schedulability test and execution budget you raise the quality of service, because even in the presence of a mis-behaving task only that task will be impacted. The other tasks will still meet their But _why_? why not leave FIFO/RR in? There is absolutely no downside to This is, if you make the soft-deadline class aware of the hard-deadline class's tasks and schedulability contraints, then you can keep the soft-rt class schedulable too. So srt is in no way less important, its just has less restrictions on the schedule, therefore we can run it in the hrt slack/idle time. And adding the schedulability test in the kernel avoids these starvation Not that large indeed, but people are interested in running RT workloads on machines in the 32/64 scale. And even the embedded folks are now staring quad core arm11 chips in the Yeah, the idea was to send SIGXCPU to tasks who exceed their budget (and I'm thinking there's little useful left after all that ;-) --
My motivation was the increased complexity in meeting deadlines etc. By having only EDF (or only RR/FIFO) things would be a lot simpler. Life is not simple bah, you just picked all my arguments to shreds and convinced me I'm wrong. I guess it's back to the drawing board then, but now I have a better understanding at least. Thans for all the feedback guys, especially Peter! I really appreciate this! -- med Vennlig Hilsen - Yours Sincerely Henrik Austad --
Ok, that is how EDF work, and I know it... I was asking something I agree, EDF is very simple and has a lot of very nice properties... Problem is do decide how to assign a deadline to a task, if it is not a classical soft or hard real-time one! :-O I definitely agree that hard real time workloads are better handled by partitioned EDF, but for soft ones, it was sad to suffer from the possible CPU utilization loss it entails. Also, what about resources shared by different tasks in different CPU/partitions? And if you avoid sharing resources between tasks in different partitions (is that acceptable?), what about system resources, that are shared by _all_ tasks in the system by definition? Sorry about asking so much questions, but these are the issues we are trying to address, and I'm quite interested in knowing if you have any Well, obvioulsy it's something that we have to think carefully, but I can't see any harmful situation in having a deadline based and a fixed priority based scheduling from where to pick task in (sorry) priority Ok, maybe it's a matter of personal feelings, but I think that such a Well, I personally think that partitioning _raises_ issues about resource Regards, Dario Faggioli PS. Sorry for the webmail... I'm abroad and I've not my laptop with me :-( --
Yep. Actually I think that schedulability test could be an issue as well, especially if we like group/hierarchical approach, since the known hierarchical admission tests are quite complex to implement and, probably, time consuming if performed on-line in an highly dynamic (with respec to Yeah, that's exactly what we was thinking too. Actually, I was also thinking that having fixed priority scheduling _before_ EDF could be of some benefit if you have to be sure that a task is going to be executed at a very precise instant in time, but have not a Well, could be a solution... And if this is only used for such kind of special tasks, maybe it is not impossible to bound or account for their scheduling contribution... But I'm just shooting inthe dark here, sorry Yes, that's right, this is what we are investigating and trying to do in Wow... So, good luck for that! :-) Maybe it's my fault, but I see some issues with proxy execution and similar protocols. That is, if you have, let's say, task A blocked on task B, blocked on task C, and you are using proxy execution, that means that you have not dequeued A and B when they blocked, but that you, for example, filled a pointer that reminds you, when you schedule them, that you have to actually run C, am I right? Now, what happens if C blocks on a non-rt mutex lock, or if it simply go to sleep? Is that acceptable to track the blocking chain in order to actually dequeue also A and B, and to requeue them again when C will wake up as well? Yes, I agree and I like it very much too. If you go for it, you could also add bandwidth inheritance (e.g., for group scheduling) and things like that almost for free (if wanted! :-)) Regards, Dario Faggioli --
