<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://kerneltrap.org"  xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>KernelTrap - Jens Axboe</title>
 <link>http://kerneltrap.org/taxonomy/term/338/0</link>
 <description></description>
 <language>en-local</language>
<item>
 <title>Budget Fair Queuing IO Scheduler</title>
 <link>http://kerneltrap.org/Linux/Budget_Fair_Queuing_IO_Scheduler</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&quot;&lt;i&gt;We are working [on] a new I/O scheduler based on CFQ, aiming at improved predictability and fairness of the service, while maintaining the high throughput it already provides,&lt;/i&gt;&quot; began Fabio Checconi, announcing the BFQ I/O scheduler.  &quot;&lt;i&gt;The Budget Fair Queueing (BFQ) scheduler turns the CFQ Round-Robin scheduling policy of time slices into a fair queuing scheduling of sector budgets,&lt;/i&gt;&quot; he continued, &quot;&lt;i&gt;more precisely, each task is assigned a budget measured in number of sectors instead of amount of time, and budgets are scheduled using a slightly modified version of WF2Q+.  The budget assigned to each task varies over time as a function of its behaviour.  However, one can set the maximum value of the budget that BFQ can assign to any task.&lt;/i&gt;&quot;  Fabio went on to explain:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&quot;The time-based allocation of the disk service in CFQ, while having the desirable effect of implicitly charging each application for the seek time it incurs, suffers from unfairness problems also towards processes making the best possible use of the disk bandwidth.  In fact, even if the same time slice is assigned to two processes, they may get a different throughput each, as a function of the positions on the disk of their requests.  On the contrary, BFQ can provide strong guarantees on bandwidth distribution because the assigned budgets are measured in number of sectors.  Moreover, due to its Round Robin policy, CFQ is characterized by an O(N) worst-case delay (jitter) in request completion time, where N is the number of tasks competing for the disk.  On the contrary, given the accurate service distribution of the internal WF2Q+ scheduler, BFQ exhibits O(1) delay.&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Jens Axboe reacted favorably, &quot;&lt;i&gt;Fabio, I&#039;ve merged the scheduler for some testing. Overall the code looks great, you&#039;ve done a good job!&lt;/i&gt;&quot;  He noted that the scheduler should soon appear in the -mm tree, and that it was worth considering merging the two I/O schedulers together.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/Budget_Fair_Queuing_IO_Scheduler&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/Budget_Fair_Queuing_IO_Scheduler#comments</comments>
 <category domain="http://kerneltrap.org/-mm">-mm</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1231">BFQ</category>
 <category domain="http://kerneltrap.org/CFQ">CFQ</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1232">Fabio Checconi</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/scheduler">scheduler</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Tue, 15 Apr 2008 18:02:59 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">15993 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Quote: Think Outside Your Own Sandbox</title>
 <link>http://kerneltrap.org/Quote/Think_Outside_Your_Own_Sandbox</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&quot;Sorry to sound a bit harsh, but sometimes it doesn&#039;t hurt to think a bit outside your own sandbox.&quot;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;</description>
 <comments>http://kerneltrap.org/Quote/Think_Outside_Your_Own_Sandbox#comments</comments>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/quote">quote</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1157">Jens Axboe</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1094">linux-kernel</category>
 <pubDate>Thu, 10 Jan 2008 14:39:26 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">15179 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Modular IO Schedulers</title>
 <link>http://kerneltrap.org/Linux/Modular_IO_Schedulers</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Adrian Bunk posted &lt;a href=&quot;http://kerneltrap.org/mailarchive/linux-kernel/2007/11/25/444191&quot;&gt;a patch&lt;/a&gt; to make Linux IO schedulers a non-modular option, which would require one IO scheduler to be selected at compile time.  He suggested, &quot;&lt;i&gt;there isn&#039;t any big advantage and doesn&#039;t seem to be much usage of modular schedulers.&lt;/i&gt;&quot;  He added that removing the option to make IO schedulers modular would save 2kB on each kernel image.  Jens Axboe did not like the patch, &quot;&lt;i&gt;big nack, I use it all the time for testing. Just because you don&#039;t happen to use it is not a reason to remove it.&lt;/i&gt;&quot;  When Adrian noted that no distros seemed to be making IO schedulers available as modules, Jens suggested that this was a mistake and quipped, &quot;&lt;i&gt;it&#039;s been a long time since I considered a distro .config a benchmark/guideline of any sort.&lt;/i&gt;&quot;&lt;/p&gt;
&lt;p&gt;Adrian went on to ask for the technical reasons for continuing to support four different IO schedulers, expressing concern that it could lead to bugs in individual schedulers going unreported.  Jens explained that he was aiming for the perfect IO scheduler, but at this time different IO schedulers offer better results for different workloads, &quot;&lt;i&gt;with some hard work and testing, we should be able to get rid of [the anticipatory scheduler].  It still beats cfq for some of the workloads that deadline is good at, so not quite yet.&lt;/i&gt;&quot;  Arjan van de Ven offered, &quot;&lt;i&gt;there is at least one technical reason to need more than one: certain types of storage (both big EMC boxes as well as solid state disks) don&#039;t behave like disks and have no seek penalty; any cpu time spent on avoiding seeks is wasted on those, so for these devices one really wants to use a different IO scheduler, one which is much lighter weight&lt;/i&gt;&quot;.  Jens then acknowledged, &quot;&lt;i&gt;there&#039;s always a risk with &#039;duplication&#039;, like several drivers for the same hardware. I&#039;m not disputing that.&lt;/i&gt;&quot;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/Modular_IO_Schedulers&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/Modular_IO_Schedulers#comments</comments>
 <category domain="http://kerneltrap.org/Adrian_Bunk">Adrian Bunk</category>
 <category domain="http://kerneltrap.org/anticipatory_scheduler">anticipatory scheduler</category>
 <category domain="http://kerneltrap.org/Arjan_van_de_Ven">Arjan van de Ven</category>
 <category domain="http://kerneltrap.org/CFQ">CFQ</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1146">IO scheduler</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Mon, 26 Nov 2007 21:23:24 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">14882 at http://kerneltrap.org</guid>
</item>
<item>
 <title>SG Chaining Merged</title>
 <link>http://kerneltrap.org/Linux/SG_Chaining_Merged</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&quot;&lt;i&gt;I think the SG stuff looks ok now, but I think we have a lot of &#039;fix up the rough edges&#039; to go!&lt;/i&gt;&quot; &lt;a href=&quot;http://kerneltrap.org/mailarchive/linux-kernel/2007/10/23/351488&quot;&gt;Linus Torvalds noted&lt;/a&gt; regarding some of the fallout from the recent merge of Jens Axboe&#039;s &lt;a href=&quot;http://kerneltrap.org/sg_chaining&quot;&gt;SG chaining&lt;/a&gt; patchset.  During one of the many discussions, Jens explained:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&quot;It&#039;s all about the end goal - having maintainable and resilient code.  And I think the sg code will be better once we get past the next day or so, and it&#039;ll be more robust. That is what matters to me, not the simplicity of the patch itself.&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Boaz Harrosh commented, &quot;&lt;i&gt;thanks Jens for doing all this, The performance gain is substantial and we will all enjoy it.&lt;/i&gt;&quot;  Jens replied, &quot;&lt;i&gt;my pleasure, I just wish it could have been a little less painful. But in a day or two, it should all be behind us and we can move forward with making good use of it.&lt;/i&gt;&quot;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/SG_Chaining_Merged&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/SG_Chaining_Merged#comments</comments>
 <category domain="http://kerneltrap.org/taxonomy/term/1106">Boaz Harrosh</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/Linus_Torvalds">Linus Torvalds</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/merge_window">merge window</category>
 <category domain="http://kerneltrap.org/sg_chaining">sg chaining</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Tue, 23 Oct 2007 19:56:24 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">14658 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Caution and Latency</title>
 <link>http://kerneltrap.org/Linux/Caution_and_Latency</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&quot;&lt;i&gt;With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies,&lt;/i&gt;&quot; &lt;a href=&quot;http://kerneltrap.org/mailarchive/linux-kernel/2007/10/15/343499&quot;&gt;Arjan van de Ven reported&lt;/a&gt;, describing a &quot;mass priority inversion&quot; caused by, &quot;&lt;i&gt;an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is &#039;special&#039; in that it does a lot of journal work&lt;/i&gt;&quot;.  Finally, he offered a tiny patch to resolve the issue, &quot;&lt;i&gt;the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !&lt;/i&gt;&quot;&lt;/p&gt;
&lt;p&gt;Andrew Morton took a cautious stance, &quot;&lt;i&gt;seems a pretty fundamental change which could do with some careful benchmarking, methinks.  See, your patch amounts to &#039;do more seeks to improve one test case&#039;.  Surely other testcases will worsen.  What are they?&lt;/i&gt;&quot; CFQ author Jens Axboe agreed, &quot;&lt;i&gt;It should not be merged as-is, instead I&#039;ll provide a function to do this.&lt;/i&gt;&quot;  Ingo Molnar wasn&#039;t convinced, &quot;&lt;i&gt;atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux&#039;s development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this.&lt;/i&gt;&quot;  He added, &quot;&lt;i&gt;if bandwidth hurts anywhere, it will be pointed out and fixed, we&#039;ve got  like tons of bandwidth benchmarks and it&#039;s _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??&lt;/i&gt;&quot;  Andrew calmly replied, &quot;&lt;i&gt;I think the situation is that we&#039;ve asked for some additional what-can-be-hurt-by-this testing.  Yes, we could sling it out there and wait for the reports.  But often that&#039;s a pretty painful process and regressions can be discovered too late for us to do anything about them.&lt;/i&gt;&quot;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/Caution_and_Latency&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/Caution_and_Latency#comments</comments>
 <category domain="http://kerneltrap.org/Andrew_Morton">Andrew Morton</category>
 <category domain="http://kerneltrap.org/Arjan_van_de_Ven">Arjan van de Ven</category>
 <category domain="http://kerneltrap.org/CFQ">CFQ</category>
 <category domain="http://kerneltrap.org/Ingo_Molnar">Ingo Molnar</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/latencytop">latencytop</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/low-latency">low-latency</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Mon, 22 Oct 2007 12:28:25 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">14641 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Measuring Kernel Marker Overhead</title>
 <link>http://kerneltrap.org/Linux/Measuring_Kernel_Marker_Overhead</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&quot;&lt;i&gt;It looks to be about 2.1% increase in time to do the make/mount/unmount operations with the marker patches in place and no blktrace operations,&lt;/i&gt;&quot; Alan Brunelle summarized some benchmarks testing the overhead of the kernel markers patches.  He continued, &quot;&lt;i&gt;with the blktrace operations in place we see about a 3.8% decrease in time to do the same ops.&lt;/i&gt;&quot;  Block layer maintainer Jens Axboe responded favorably, &quot;&lt;i&gt;thanks for running these numbers. I don&#039;t think you have to bother with it more. My main concern was a performance regression, increasing the overhead of running blktrace.&lt;/i&gt;&quot;  He added, &quot;&lt;i&gt;I&#039;d say the above is Good Enough for me,&lt;/i&gt;&quot; acking the kernel marker patches.&lt;/p&gt;
&lt;p&gt;Jens went on to muse, &quot;&lt;i&gt;I do wonder about that performance _increase_ with blktrace enabled. I remember that we have seen and discussed something like this before, it&#039;s still a puzzle to me...&lt;/i&gt;&quot;  Mathieu Desnoyers agreed, &quot;&lt;i&gt;interesting question indeed,&lt;/i&gt;&quot; going on to suggest possible future tests to understand the unexpected performance increase.  &lt;code&gt;blktrace&lt;/code&gt; is a block layer IO tracing tool for providing detailed information about request queue operations, originally developed by Jens Axboe and merged into the mainline kernel in 2.6.17-rc1.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/Measuring_Kernel_Marker_Overhead&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/Measuring_Kernel_Marker_Overhead#comments</comments>
 <category domain="http://kerneltrap.org/2.6.17">2.6.17</category>
 <category domain="http://kerneltrap.org/taxonomy/term/1049">Alan Brunelle</category>
 <category domain="http://kerneltrap.org/blktrace">blktrace</category>
 <category domain="http://kerneltrap.org/block_layer">block layer</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/kernel_markers">kernel markers</category>
 <category domain="http://kerneltrap.org/Mathieu_Desnoyers">Mathieu Desnoyers</category>
 <category domain="http://kerneltrap.org/performance">performance</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Sun, 07 Oct 2007 13:02:36 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">14531 at http://kerneltrap.org</guid>
</item>
<item>
 <title>SG Chaining Performance</title>
 <link>http://kerneltrap.org/Linux/SG_Chaining_Performance</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Jens Axboe &lt;a href=&quot;http://kerneltrap.org/mailarchive/linux-kernel/2007/9/21/265259&quot;&gt;detailed the changes&lt;/a&gt; in his &lt;a href=&quot;http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary&quot;&gt;linux-2.6-block.git&lt;/a&gt; tree that he plans to merge into the upcoming 2.6.24 kernel.  Among the changes were the necessary updates to enable SG chaining, which is used for &lt;a href=&quot;http://kerneltrap.org/node/8176&quot;&gt;large IO commands&lt;/a&gt;, &quot;&lt;i&gt;the goal of sg chaining is to allow support for very large sgtables, without requiring that they be allocated from one contiguous piece of memory.&lt;/i&gt;&quot;  Andrew Morton asked for more information, &quot;&lt;i&gt;presumably sg chaining means more overhead on the IO submission paths?  If so, has this been quantified?&lt;/i&gt;&quot;&lt;/p&gt;
&lt;p&gt;Jens explained that there is no overhead for existing logic which doesn&#039;t use sg chaining, &quot;&lt;i&gt;just cleanups to drivers to use &lt;code&gt;sg_next()&lt;/code&gt; and &lt;code&gt;for_each_sg()&lt;/code&gt; and so on.&lt;/i&gt;&quot;  He continued:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&quot;For actually using the sg chaining, there&#039;s some overhead of course. Say we support 256 entries without chaining, or 1mb with 4kb pages. A request with 1000 entries would require 4 trips to the allocator to allocate the chainable lists and 4 trips when freeing that list again.  We don&#039;t loop the sg list on setup or freeing, just jump to the correct locations.  So even for chaining, the cost isn&#039;t that big. It enables us to support much larger IO commands and potentially speed up some devices quite a lot, so CPU cost is less of a concern. And for small sglists, there isn&#039;t a noticeable overhead.&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/Linux/SG_Chaining_Performance&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/Linux/SG_Chaining_Performance#comments</comments>
 <category domain="http://kerneltrap.org/2.6.24">2.6.24</category>
 <category domain="http://kerneltrap.org/Andrew_Morton">Andrew Morton</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/taxonomy/term/339">large IO</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/merge_window">merge window</category>
 <category domain="http://kerneltrap.org/sg_chaining">sg chaining</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Sun, 23 Sep 2007 18:34:50 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">14430 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Linux: DRBD, The Distributed Replicated Block Device</title>
 <link>http://kerneltrap.org/node/13983</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Lars Ellenberg started an effort to get &lt;a href=&quot;http://www.drbd.org/&quot;&gt;DRBD&lt;/a&gt;, the Distributed Replicated Block Device, merged into the Linux kernel.  When asked for clarification as to what it was, Lars explained, &quot;&lt;i&gt;think of it as RAID1 over TCP.  Typically you have one Node in Primary, the other as Secondary, replication target only. But you can also have both Active, for use with a cluster file system.&lt;/i&gt;&quot;  Earlier in the thread he described it as &quot;&lt;i&gt;a stacked block device driver&lt;/i&gt;&quot;.&lt;/p&gt;
&lt;p&gt;Much of the initial review focused on the need to comply with kernel coding style guidelines.  Kyle Moffett offered a much lengthier review, noting at one point in the code, &quot;&lt;i&gt;how about fixing this to actually use proper workqueues or something instead of this open-coded mess?&lt;/i&gt;&quot;  Lars replied, &quot;&lt;i&gt;unlikely to happen &#039;right now&#039;.  But it is on our todo list...&lt;/i&gt;&quot;  Jens Axboe added, &quot;&lt;i&gt;but stuff like that is definitely a merge show stopper, jfyi&lt;/i&gt;&quot;.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/node/13983&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/node/13983#comments</comments>
 <category domain="http://kerneltrap.org/taxonomy/term/759">DRBD</category>
 <category domain="http://kerneltrap.org/HA">HA</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/taxonomy/term/758">Kyle Moffett</category>
 <category domain="http://kerneltrap.org/taxonomy/term/757">Lars Ellenberg</category>
 <category domain="http://kerneltrap.org/RAID">RAID</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Mon, 23 Jul 2007 18:53:55 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">13983 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Linux:  Large IO Commands</title>
 <link>http://kerneltrap.org/node/8176</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Jens Axboe [&lt;a href=&quot;http://kerneltrap.org/node/view/7637&quot;&gt;interview&lt;/a&gt;] posted a series of ten patches that add support for large IO commands.  He began by defining the problem:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&quot;Some people complain that Linux doesn&#039;t support really large IO commands. The main reason why we do not support infinitely sized IO is that we need to allocate a scatterlist to fill these elements into for dma mapping. The Linux scatterlist is an array of scatterlist elements, so we need to allocate a contiguous piece of memory to hold them all. On i386, we can at most fit 256 scatterlist elements into a page, and on x86-64 we are stuck with 128. So that puts us somewhere between 512kb and 1024kb for a single IO.&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Jens went on to explain his solution, &quot;&lt;i&gt;to get around that limitation, this patchset introduces an sg chaining concept. The way it works is that the last element of an sg table can point to a new sgtable, thus extending the size of the total IO scatterlist greatly.&lt;/i&gt;&quot;  Regarding the current status he noted, &quot;&lt;i&gt;it works for me, but you can&#039;t enable large commands on anything but i386 right now. I still need to go over the x86-64 iommu bits to enable it there as well.&lt;/i&gt;&quot;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/node/8176&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/node/8176#comments</comments>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/taxonomy/term/189">Kernel</category>
 <category domain="http://kerneltrap.org/taxonomy/term/339">large IO</category>
 <category domain="http://kerneltrap.org/Linux">Linux</category>
 <category domain="http://kerneltrap.org/sg_chaining">sg chaining</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate>Wed, 09 May 2007 12:37:24 +0000</pubDate>
 <dc:creator>Jeremy</dc:creator>
 <guid isPermaLink="false">8176 at http://kerneltrap.org</guid>
</item>
<item>
 <title>Linux:  Syslets &amp; Threadlets</title>
 <link>http://kerneltrap.org/node/7753</link>
 <description>&lt;div class=&quot;taxonomy-images&quot;&gt;&lt;a href=&quot;/news/linux&quot; class=&quot;taxonomy-image-links&quot;&gt;&lt;img src=&quot;http://kerneltrap.org/files/category_pictures/K-Linux.gif&quot; alt=&quot;Linux news&quot; title=&quot;Linux news&quot;  width=&quot;75&quot; height=&quot;75&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Announcing the third version of his syslets subsystem patches [&lt;a href=&quot;http://kerneltrap.org/node/7737&quot;&gt;story&lt;/a&gt;], Ingo Molnar [&lt;a href=&quot;http://kerneltrap.org/node/view/517&quot;&gt;interview&lt;/a&gt;] noted that he has implemented many fundamental changes to the code including the introduction of threadlets, &quot;&lt;i&gt;&#039;threadlets&#039; are basically the user-space equivalent of syslets: small functions of execution that the kernel attempts to execute without scheduling. If the threadlet blocks, the kernel creates a real thread from it, and execution continues in that thread. The &#039;head&#039; context (the context that never blocks) returns to the original function that called the threadlet.&lt;/i&gt;&quot;  As threadlets are only moved into a separate thread context if they block, Ingo refers to them as &#039;optional threads&#039;.  He also describes them as &#039;on-demand parallelism&#039;, &quot;&lt;i&gt;user-space does not have to worry about setting up, sizing and feeding a thread pool - the kernel will execute the workload in a single-threaded manner as long as it makes sense, but once the context blocks, a parallel context is created. So parallelism inside applications is utilized in a natural way.&lt;/i&gt;&quot; &lt;/p&gt;
&lt;p&gt;Ingo goes on to note that the syslet code and API have been significantly enhanced in this latest release, &quot;&lt;i&gt;the v3 code is ABI-incompatible with v2, due to these fundamental changes.&lt;/i&gt;&quot;  He adds, &quot;&lt;i&gt;syslets (small, kernel-side, scripted &#039;syscall plugins&#039;) are still supported - they are (much...) harder to program than threadlets but they allow the highest performance. Core infrastructure libraries like glibc/libaio are expected to use syslets. Jens Axboe&#039;s FIO tool already includes support for v2 syslets, and the following patch updates FIO to the v3 API&lt;/i&gt;&quot;.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;&lt;p&gt;&lt;a href=&quot;http://kerneltrap.org/node/7753&quot; target=&quot;_blank&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://kerneltrap.org/node/7753#comments</comments>
 <category domain="http://kerneltrap.org/taxonomy/term/386">API</category>
 <category domain="http://kerneltrap.org/Ingo_Molnar">Ingo Molnar</category>
 <category domain="http://kerneltrap.org/Jens_Axboe">Jens Axboe</category>
 <category domain="http://kerneltrap.org/taxonomy/term/412">syslets</category>
 <category domain="http://kerneltrap.org/taxonomy/term/413">threadlets</category>
 <category domain="http://kerneltrap.org/news/linux">Linux news</category>
 <pubDate