Hi,
Over the past months I've been working intermittently on an I/O scheduler
framework and a fair queuing policy for DragonFly BSD. First off, I want to
note that I'm not a computer scientist and that my code is probably
sub-optimal and quite far from what scientific papers would consider an
I/O scheduler.
The FQ policy should serve mainly as a proof of concept of what the
framework can do. It seems to work pretty decently on its own, as some
rather naive benchmarks[1] I ran confirm. I really want to emphasize that
it's suboptimal: it has some problems that limit, for example, overall write
performance more than it should. Yet overall it should solve our extreme
interactivity issues. As the graphs[1] show, read performance with
ongoing writes has increased by about a factor of 3.
--
At this point I would like to make my work public and see some testing and
especially some reviews. You can either fetch my iosched-current branch on
leaf or just apply my patch[2] to the current master as of this writing;
although it probably also applies to older kernels.
The work basically consists of 4 parts:
- General system interfacing
- I/O scheduler framework (dsched)
- I/O scheduler fair queuing policy (dsched_fq or fq)
- userland tools (dschedctl and ionice)
--
After applying the patch you still won't notice any difference, as the
default scheduler is the so-called noop (no operation) scheduler; it
emulates our current behaviour. This can be confirmed by dschedctl -l:
# dschedctl -l
cd0 => noop
da0 => noop
acd0 => noop
ad0 => noop
fd0 => noop
md0 => noop
--
To enable the fq policy on a disk you have two options:
1) set scheduler_{diskname}="fq" in /boot/loader.conf; e.g. if it should be
enabled for da0, then scheduler_da0="fq". Certain wildcards are also
understood, e.g. scheduler_da* or scheduler_*. Note that sernos are not
supported (yet).
2) use dschedctl:
# dschedctl -s fq -d ...

I think the principle of least surprise would suggest that it should work
exactly like nice, rather than flip it around, or people will get confused
why the misbehaving program continues to eat all IO even after they
reniced it. :-)
Magnus

nice is actually intuitive - the higher the number, the 'nicer' the
processes are to the rest of the system. Negative nice => "mean". It's just
*system*-relative, not process-relative. Perhaps a commentary on our modern
individualistic nature, that we get this wrong.. or something.. shutting up
now.

Right now this is the least of my concerns. If someone wants to use dsched
right now, he should know that he's using something experimental and hence
should read up on whatever documentation is available, including the nice
level inversion. Eventually I might change it to be similar to the normal
nice levels, but it also would be only in one direction (i.e. -10 to 0, not
-20 to 20).
--
On another note, I really want to emphasize the fact that if you want to
try dsched, don't use the patch that was attached to the first email; pull
from my iosched-current branch on leaf, which is up to date. I've done quite
a few improvements since the original patch, mainly:
- changing the algorithm to estimate the disk usage percent. Now it's done
  right, by measuring the time the disk spends idle in one balancing period.
  (thanks to corecode for the idea)
- due to the previous change, I have also been able to add a feedback
  mechanism that tries to dispatch more requests if the disk becomes idle,
  even if all processes have already reached their rate limit, by
  increasing the limit if needed.
- moving the heavier balancing calculations out of the fq_balance thread and
  into the context of the processes/threads that do I/O, as far as this is
  possible. Some of the heavy balancing calculations will still occur in the
  dispatch thread instead of the issuing context. (thanks to Aggelos for the
  idea)
- ironing out a few bugs related to int32_t overflow.
- general cleanup & refactoring
--
I also forgot to mention in my original email that there are some other
interesting tools/settings, mainly:
- sysctl kern.dsched_debug: the higher the level, the more debug output
  you'll get. By default no debug output will be printed. At level 4, only
  the disk busy-% will be printed, and at 7 all details about the balancing
  will be shown.
- test/dsched_fq: If you build fqstats (just using 'make' in this
  directory), you'll be able to read some of the statistics that dsched_fq
  keeps track of, such as ...
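
To make the first two improvements above more concrete, here is a rough
model of the idea in Python. It is only a sketch under stated assumptions,
not the dsched_fq code (which is kernel C and may well work differently):
the function names, the 90% busy target and the doubling step are all
hypothetical. It only shows how a busy-% can be derived from the idle time
measured in one balancing period, and how a limit could be raised when the
disk idles although every process is already capped.

```python
def busy_percent(period_us, idle_us):
    """Percent of one balancing period during which the disk was not idle."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    # Clamp the measured idle time into [0, period] to guard against jitter.
    idle_us = min(max(idle_us, 0), period_us)
    return 100 * (period_us - idle_us) // period_us


def adjust_limit(limit, busy, all_at_limit, target_busy=90, step=2):
    """Raise the rate limit if the disk idles while every process is capped.

    target_busy and step are made-up values for illustration only.
    """
    if all_at_limit and busy < target_busy:
        return limit * step
    return limit


# One balancing period of 1 s in which the disk sat idle for 250 ms.
busy = busy_percent(1_000_000, 250_000)
new_limit = adjust_limit(10, busy, all_at_limit=True)
```

Python integers don't overflow, so this model sidesteps the int32_t issues
mentioned above; a C version would have to take care that the
multiplication by 100 cannot overflow.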

Hi,
I've checked out the branch and done some tests over the past two days.
Although I didn't go further than some basic tests (music playing under
heavy I/O, recursive finds, X program startup also during high I/O
activity, etc.) and some more intensive ones (bonnie++ and, more
importantly, hammer reblocking), I'd like to share my subjective good
feelings about it. For example, during a hammer cleanup music playback isn't
jumpy anymore, and applications like firefox are quite a bit happier than
they were before under certain loads.

On the other hand, I've had some unexpected freezes (machine not responding
to any input from the mouse and keyboard), but I think my hardware can take
the blame for that, because I have not been able to reproduce them on
another box. Also, if you switch the scheduler 5 or 6 times between fq and
noop, a panic is triggered related to TAILQ handling on cleanup.

Besides that, I would like to thank Alex for the great effort he's putting
in and also encourage people to give the scheduler a try. I would really
like to see this in master soon, disabled by default, so people who are on
the bleeding edge can benefit from it if they want.

For those who would like to try it and don't know exactly how, find below
some quick instructions:

a) Switch to the master branch (make sure you don't have local changes
before pulling) and update it.
# cd /usr/src
# git checkout master
# git pull

b) Add alexh's personal repo to your remotes and update it.
# git remote add leaf_alexh git://leaf.dragonflybsd.org/~alexh/dragonfly.git
# git remote update leaf_alexh

c) Check out the main scheduler branch and also bring in the latest changes
from master.
# git checkout -b iosched-current --no-track leaf_alexh/iosched-current
# git rebase master

d) Build and install everything.
# make -j2 buildworld && make -j2 buildkernel KERNCONF=GENERIC
# sudo -E make installkernel KERNCONF=GENERIC && sudo make installworld && sudo make upgrade

Hope you ...

Any testing is welcome! I would have expected some more interest (and hence
more testers), as this addresses an issue that many people have
experienced.

I've not been able to reproduce any system freezes and I'm not sure why
they would happen in the first place. On the other hand, I've finally been
able to solve the policy switching issue, and this fix is now committed to
my branch (commit 0e9144bec7970967edcd917909472d2dde8db23a). I've tried it
by switching about

IMHO this should be the way to go, as the impact is virtually non-existent
with the 'noop' policy, which is currently the default. Bringing the code
into master would allow for a wider testing base, as more people might feel
adventurous enough to switch the policy to 'fq' for at least a while. Just
to reiterate: the 'noop' policy just passes the requests through as has
happened so far, so no breakage is to be expected from just using the

Thanks for providing some instructions!
Cheers,
Alex Hornung

Hi Alex,
Thanks for dsched. So far it seems to work well. I haven't got a panic. It
feels maybe slightly more responsive using fq than noop. I don't know, I
haven't really "stress tested" it. One thing that I did notice, though, is
that setting scheduler_*="fq" doesn't work, but scheduler_da0="fq" does.
Petr

Thanks! Didn't notice that the loader doesn't like '*'. I've changed the
tunables to the following:
dsched_pol_da0 = "foo" (replaces scheduler_da0 = "foo")
dsched_pol_da = "foo" (replaces scheduler_da* = "foo")
dsched_pol = "bar" (replaces scheduler_* = "bar")
Cheers,
Alex Hornung
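
For anyone wondering how the three tunable forms above relate to each
other, the following Python sketch models one plausible lookup order. Note
the assumptions: that the most specific tunable wins (disk name, then
driver name, then the global one) is a guess and is not stated in the
thread, and lookup_policy and the sample dictionary are hypothetical names
for illustration, not part of dsched. The fallback to noop follows from
noop being the default policy.

```python
import re


def lookup_policy(disk, tunables, default="noop"):
    """Return the policy for a disk, trying the most specific tunable first.

    The precedence order is an assumption, not taken from the dsched source.
    """
    candidates = ["dsched_pol_" + disk]           # e.g. dsched_pol_da0
    m = re.match(r"^([a-z]+)(\d+)$", disk)
    if m:
        candidates.append("dsched_pol_" + m.group(1))  # e.g. dsched_pol_da
    candidates.append("dsched_pol")               # applies to all disks
    for name in candidates:
        if name in tunables:
            return tunables[name]
    return default


# Hypothetical tunables as they might be set in /boot/loader.conf.
sample = {"dsched_pol_da0": "fq", "dsched_pol_ad": "noop", "dsched_pol": "fq"}
policies = {d: lookup_policy(d, sample) for d in ("da0", "ad0", "cd0")}
```

With these sample values, da0 matches its own tunable, ad0 falls back to
the per-driver one, and cd0 falls back to the global one.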
