Jeff Roberson recently fixed a few bugs in the experimental ULE process scheduler in FreeBSD -current. He says:
"Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-)"
User reactions to these changes were generally positive, putting 5.x performance with the ULE scheduler enabled potentially above that of the current stable 4.8 kernel. This led into an interesting discussion about future plans for the scheduler itself, and how it interfaces with threads.
The ULE scheduler was written by Jeff Roberson and merged into FreeBSD 5.1 as an "experimental" process scheduler designed to bring many benefits to SMP servers. For a complete understanding, refer to this PDF document.
From: Jeff Roberson [email blocked] To: current Subject: More ULE bugs fixed. Date: Wed, 15 Oct 2003 03:51:18 -0400 (EDT) I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Cheers, Jeff
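For readers unfamiliar with the term, priority propagation means that a thread holding a lock temporarily borrows the priority of a more urgent thread that is waiting for that lock. The following is only a minimal user-space sketch of the idea, not the FreeBSD implementation: the `struct thread` and `struct lock` types and the `lock_block()` and `lock_release()` helpers are hypothetical names invented for illustration, and it ignores threads that hold several locks or chains of blocked owners.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread: lower prio value = more urgent. */
struct thread {
    int base_prio;          /* priority the thread was assigned */
    int prio;               /* effective priority after any lending */
};

/* Hypothetical lock that only records its current owner. */
struct lock {
    struct thread *owner;   /* NULL when the lock is free */
};

/* td is about to wait for lk: lend td's priority to the owner if it is
 * more urgent than the owner's current effective priority. */
void lock_block(struct lock *lk, struct thread *td)
{
    assert(lk->owner != NULL && lk->owner != td);
    if (td->prio < lk->owner->prio)
        lk->owner->prio = td->prio;
}

/* The owner drops the lock and falls back to its own priority. */
void lock_release(struct lock *lk)
{
    assert(lk->owner != NULL);
    lk->owner->prio = lk->owner->base_prio;
    lk->owner = NULL;
}
```

If the scheduler fails to act on the lent priority, the lock owner can sit behind unrelated work while the urgent thread waits, which is the kind of problem that would make a system feel unresponsive.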
From: Daniel Eischen [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 08:47:35 -0400 (EDT) On Wed, 15 Oct 2003, Jeff Roberson wrote: > Things should be much improved. Feedback, as always, is welcome. I'd > like to look into making this the default scheduler for 5.2 if things > start looking up. I hope that scares you all into using it more. :-) Before you do that, can you look into changing the scheduler interfaces to address David Xu's concern with it being suboptimal for KSE processes? -- Dan Eischen
From: Jeff Roberson [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 15:09:35 -0400 (EDT) On Wed, 15 Oct 2003, Daniel Eischen wrote: > > Before you do that, can you look into changing the scheduler > interfaces to address David Xu's concern with it being > suboptimal for KSE processes? Certainly, it may not happen if I can't find out what's making things so jerky for gnome/kde users. If it looks like it will, I'll investigate the kse issues. > > -- > Dan Eischen
From: Julian Elischer [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 17:16:25 -0700 (PDT) On Wed, 15 Oct 2003, Daniel Eischen wrote: > > Before you do that, can you look into changing the scheduler > interfaces to address David Xu's concern with it being > suboptimal for KSE processes? There is also some work that I'd like to get done re: cleaning up the scheduler interface a bit.. I know that Jeff and I have discussed this before but it was a long time ago, and I've forgotten a lot and also learned a bit since then.. Here's my logic on the matter: Any process has a number (fixed or variable) of kernel entities that can be scheduled. In KSE (gotta get a better name) there are a variable number of them. In libthr they are 1:1. I would postulate that the action of scheduling these items in a fair way is up to the scheduler. I had a very crude fairness module added to the BSD4.4 scheduler but I think that fairness is a property of the scheduler and not of the threading package. If the scheduler doesn't care if threads are scheduled fairly then it can just schedule all threads equally. I would say that the ksegrp in question (which represents a rough unit of 'fairness'), should make a call to the scheduler on creation specifying the required concurrency. At the moment KSE-M:N based ksegrps would specify N = NCPU, and THR based ksegrps would specify N = NTHREADS. KSE-1:1 runs with a KSEGRP with a concurrency of 1 per thread. (I still think that THR should allocate a KSEGRP per thread not a KSE but it's not critical.) Basically what I'm saying is that each scheduler should take a concurrency setting for each KSEGRP and how it implements it is hidden from higher layers. The current 4.4 scheduler would implement it using KSEs and the existing code but other schedulers may choose to implement it in different manners.
I think the top layer API calls for the scheduler should be: setrunnable(thread) choosethread() sched_clocktick() sched_set_concurrancy() (plus all the other 'entrypoints') I think that the scheduler needs to be in control of scheduling threads because there is too much inside information needed for it to be done properly by an outside entity. For example if the scheduler is not a priority based scheduler then an outside entity can not know how to juggle which thread should be run next if there is a choice of which to do.. this would mean that each scheduler would need its own module to do this juggling instead of having a separate module to do it.. it makes the job of the scheduler more difficult, but in fact it has to be so, because true posix process-scope threads require that the scheduler do this work. When a thread is made runnable (with a unix priority) the scheduler needs to look at this thread in the context of all the other threads from this process, the current concurrency rule for that ksegrp and the other runnable threads, and adjust things so that: 1/ the new thread is run some time 2/ the ksegrp doesn't get TOO MUCH cpu, possibly punishing other threads in the group to compensate.. This is all up for discussion, but it's my current thinking. Julian
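To make the proposal concrete, here is a rough user-space sketch of how a scheduler might honour a per-ksegrp concurrency setting behind the entry points Julian names. It is an illustration under stated assumptions, not FreeBSD code: the structure layouts, the fixed-size run queue, and the `sched_thread_off_cpu()` helper are all hypothetical, `sched_set_concurrancy()` keeps the spelling used in the email, and the model covers only the concurrency cap, not `sched_clocktick()` or the fairness accounting he describes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

struct ksegrp;

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct thread {
    int            prio;      /* lower value = more urgent */
    int            runnable;  /* 1 while waiting on the run queue */
    struct ksegrp *kg;        /* group this thread belongs to */
};

struct ksegrp {
    int concurrency;          /* max threads of this group on CPUs at once */
    int running;              /* threads of this group currently on a CPU */
};

static struct thread *runq[MAX_THREADS];
static size_t runq_len;

/* The group states its required concurrency; how the scheduler
 * enforces it stays hidden from the threading layer. */
void sched_set_concurrancy(struct ksegrp *kg, int n)
{
    assert(n > 0);
    kg->concurrency = n;
}

/* Hand a runnable thread to the scheduler. */
void setrunnable(struct thread *td)
{
    assert(runq_len < MAX_THREADS);
    td->runnable = 1;
    runq[runq_len++] = td;
}

/* Pick the most urgent thread whose group is still under its cap. */
struct thread *choosethread(void)
{
    size_t best = runq_len;

    for (size_t i = 0; i < runq_len; i++) {
        struct thread *td = runq[i];
        if (td->kg->running >= td->kg->concurrency)
            continue;                  /* group already at its limit */
        if (best == runq_len || td->prio < runq[best]->prio)
            best = i;
    }
    if (best == runq_len)
        return NULL;                   /* nothing eligible to run */

    struct thread *td = runq[best];
    runq[best] = runq[--runq_len];     /* remove the thread from the queue */
    td->runnable = 0;
    td->kg->running++;
    return td;
}

/* Hypothetical helper, not in the proposal: a thread leaves its CPU. */
void sched_thread_off_cpu(struct thread *td)
{
    assert(td->kg->running > 0);
    td->kg->running--;
}
```

With a group capped at a concurrency of 1, a second runnable thread from that group is skipped by `choosethread()` until the first one leaves its CPU, even if it is more urgent than threads from other groups. That is the sense in which the policy lives inside the scheduler rather than in the threading package.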
From: David Xu [email blocked] Subject: Re: More ULE bugs fixed. Date: Thu, 16 Oct 2003 09:49:57 +0800 These are what I want to see. Currently I am forced to maintain the nexus among kse, ksegrp and thread whenever kse_create is called or a thread is exiting, but those nexuses are not used by SA code at all. From my view as a scheduler interface user, I only want to see ksegrp and thread; the scheduler interface should be thread friendly, not kse, which is a scheduler internal detail to maintain scheduling fairness. For example, it might just be a token assigned to a thread: a thread that has this token can be picked to run on a physical CPU, the scheduler would maintain token assigning fairness, and the more tokens there are in a ksegrp, the higher the concurrency level the ksegrp will get. Another problem is that libkse in future would support cpu affinity. With the current scheduler interface, I can not specify that a newly created thread be bound to a specified CPU. This is needed by libkse: when an upcall is scheduled, I want to specify which cpu is preferred to run the newly created upcall thread, to give the userland scheduler a stable cpu affinity state. Dan once asked to add a kse_bind syscall to let him bind a userland kse to a specified cpu; if we want to support him, then we might need this feature in the kernel. David Xu
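David's two requests, a token that makes a thread eligible to run and a preferred-CPU hint for new threads, can be pictured with a small user-space model. Everything here is assumed for illustration: the field names, `ksegrp_init()`, `token_acquire()`, `token_release()`, `pick_cpu()` and the `NO_CPU` constant do not come from the thread or from FreeBSD, and a real scheduler would also have to decide which waiting thread gets a freed token.

```c
#include <assert.h>

#define NO_CPU (-1)

/* Hypothetical group: each token plays the role David suggests for a KSE. */
struct ksegrp {
    int tokens_total;   /* concurrency level granted to the group */
    int tokens_free;    /* tokens not currently held by any thread */
};

struct thread {
    struct ksegrp *kg;
    int has_token;      /* 1 if the thread may be placed on a CPU */
    int preferred_cpu;  /* affinity hint, or NO_CPU for no preference */
};

void ksegrp_init(struct ksegrp *kg, int tokens)
{
    assert(tokens > 0);
    kg->tokens_total = tokens;
    kg->tokens_free = tokens;
}

/* Try to make the thread eligible to run; returns 1 on success. */
int token_acquire(struct thread *td)
{
    if (td->has_token)
        return 1;
    if (td->kg->tokens_free == 0)
        return 0;                      /* group is at its concurrency level */
    td->kg->tokens_free--;
    td->has_token = 1;
    return 1;
}

/* Give the token back so another thread in the group can run. */
void token_release(struct thread *td)
{
    if (!td->has_token)
        return;
    td->has_token = 0;
    td->kg->tokens_free++;
    assert(td->kg->tokens_free <= td->kg->tokens_total);
}

/* Choose a CPU for a token holder: use the hinted CPU if it is idle,
 * otherwise the first idle CPU. Returns NO_CPU if the thread has no
 * token or no CPU is idle. */
int pick_cpu(const struct thread *td, const int *cpu_idle, int ncpu)
{
    if (!td->has_token)
        return NO_CPU;
    int hint = td->preferred_cpu;
    if (hint >= 0 && hint < ncpu && cpu_idle[hint])
        return hint;
    for (int i = 0; i < ncpu; i++)
        if (cpu_idle[i])
            return i;
    return NO_CPU;
}
```

In this model the hint is only a preference: when the hinted CPU is busy the thread runs elsewhere instead of waiting. A hard binding such as the `kse_bind` call mentioned in the email would need a stricter rule.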
From: Eirik Oeverby [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 13:30:33 +0200 Jeff Roberson wrote: > I fixed two bugs that were exposed due to more of the kernel running > outside of Giant. ULE had some issues with priority propagation that > stopped it from working very well. > > Things should be much improved. Feedback, as always, is welcome. I'd > like to look into making this the default scheduler for 5.2 if things > start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMWare. We'll see how it fares. Thanks, /Eirik
From: Eirik Oeverby [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 13:48:57 +0200 Eirik Oeverby wrote: > > Just tested, so far it seems good. System CPU load is floored (near 0), > system is very responsive, no mouse sluggishness or random > mouse/keyboard input. > Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and > running some SQLServer stuff in VMWare. We'll see how it fares. Hi, just a followup message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms as I have mentioned before, mouse jumpiness, bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, the last sentence I wote without seeing a single character on screen before I was finsihed writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel 100% equal to the SCHED_4BSD kernel. Best regards, /Eirik
From: Jeff Roberson [email blocked] Subject: Re: More ULE bugs fixed. Date: Wed, 15 Oct 2003 15:08:27 -0400 (EDT) On Wed, 15 Oct 2003, Eirik Oeverby wrote: > Eirik Oeverby wrote: > > Hi, just a followup message. > I'm now running the buildworld mentioned above, and the system is pretty > much unusable. It exhibits the same symptoms as I have mentioned before, > mouse jumpiness, bogus mouse input (movement, clicks), and the system is > generally very jerky and unresponsive. This is particularly evident > when doing things like webpage loading/browsing/rendering, but it's > noticeable all the time, no matter what I am doing. As an example, the > last sentence I wote without seeing a single character on screen before > I was finsihed writing it, and it appeared with a lot more typos than I > usually make ;) > > I'm running *without* invariants and witness right now, i.e. a kernel > 100% equal to the SCHED_4BSD kernel. Can you confirm the revision of your sys/kern/sched_ule.c file? How does SCHED_4BSD respond in this same test? Thanks, Jeff
From: Eirik Oeverby [email blocked] Subject: Re: More ULE bugs fixed. Date: Thu, 16 Oct 2003 10:16:03 +0200 Jeff Roberson wrote: > > Can you confirm the revision of your sys/kern/sched_ule.c file? How does > SCHED_4BSD respond in this same test? Yes I can. From file: __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $"); I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I do not experience any of the problems. Keyboard and mouse input is smooth, and though apps run slightly slower due to the massive load on the system, there is none of the jerkiness I have seen before. Anything else I can do to help? /Eirik
From: Jeff Roberson [email blocked] Subject: Re: More ULE bugs fixed. Date: Thu, 16 Oct 2003 04:32:28 -0400 (EDT) On Thu, 16 Oct 2003, Eirik Oeverby wrote: > > Yes I can. From file: > __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 > jeff Exp $"); > I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I > do not experience any of the problems. Keyboard and mouse input is > smooth, and though apps run slightly slower due to the massive load on > the system, there is none of the jerkiness I have seen before. > > Anything else I can do to help? Yup, try again. :-) I found another bug and tuned some parameters of the scheduler. The bug was introduced after I did my paper for BSDCon and so I never ran into it when I was doing serious stress testing. Hopefully this will be a huge improvement. I did a make -j16 buildworld and used mozilla while in kde2. It was fine unless I tried to scroll around rapidly in a page full of several megabyte images for many minutes.
From: Eirik Oeverby [email blocked] Subject: Re: More ULE bugs fixed. Date: Thu, 16 Oct 2003 13:42:26 +0200 Jeff Roberson wrote: > > Yup, try again. :-) I found another bug and tuned some parameters of the > scheduler. The bug was introduced after I did my paper for BSDCon and so > I never ran into it when I was doing serious stress testing. > > Hopefully this will be a huge improvement. I did a make -j16 buildworld > and used mozilla while in kde2. It was fine unless I tried to scroll > around rapidly in a page full of several megabyte images for many minutes. It is. Still not perfect, but now it's somewhere around the 4BSD mark I would say. Thing about 'make buildworld' is that it doesn't get real tough before it hits some of the larger directories, like the crypto stuff etc., where there are many .c files in one dir - before it gets that far, there are at most 2 or 3 cc1 processes going concurrently. As soon as I get 10-20 of them, things start getting sluggish, but I suppose it's hard to avoid that. What disturbs me somewhat, though, is that I get some of this sluggishness (and other symptoms I've mentioned before) even when I'm running 'nice -n 20 make -j 20 buildworld' .. meaning the cc1 processes and all that are running (very) nice. The fact that I still have issues even when doing that, would lead me to think the problem is somewhere else than in the scheduler.. Now I can't say I'm completely sure if this is also the case with 4BSD - I only tested the nice stuff after the last reboot. But all in all, things are better now than yesterday morning. Kudos! /Eirik
From: Peter Kadau [email blocked] Subject: Re: More ULE bugs fixed. Date: Thu, 16 Oct 2003 10:38:07 +0200 Hi ! > Things should be much improved. Feedback, as always, is welcome. Wow ! Smoothly working under a load of approx. 4. Running gnome2, mozilla, evolution, mplayer and kpdf. Running portsdb -Uu and a kernel build. No stuttering mouse, no irritating delays, fast rendering. That's definitely better than _4BSD. (UP machine) Cheers Peter -- [email blocked] Campus der Max-Planck-Institute Tübingen Netzwerk- und Systemadministration