Linus Torvalds announced the first release candidate for the upcoming 2.6.21 kernel, ending the two-week merge window [story], "there's a lot of changes, as is usual for an -rc1 thing, but at least so far it would seem that 2.6.20 has been a good base, and I don't think we have anything *really* scary here." Linus noted that the tickless kernel patch [story] was finally merged into the mainline kernel, "the most interesting core change may be the dyntick/nohz one, where timer ticks will only happen when needed. It's been brewing for a _loong_ time, but it's in the standard kernel now as an option." Thomas Gleixner explained a year ago how this could result in cooler CPUs and power savings, "the tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer interrupts: if there is no timer to be expired for say 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds."
As for the rest of the changes, Linus added, "there's a ton of architecture updates (arm, mips, powerpc, x86, you name it), ACPI updates, and lots of driver work. And just a lot of cleanups." Release candidate kernels can be downloaded from your nearest kernel.org mirror. You can browse through all the changes using the gitweb interface. Kernel Newbies maintains a useful summary of all the changes going into the latest version of the Linux kernel.
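As a rough illustration of the "on-demand" behavior Thomas Gleixner describes, the following Python sketch counts how many timer interrupts an idle CPU would take with a periodic tick versus with interrupts programmed only for pending timer expiries. It assumes a 1 ms tick (HZ=1000) by default and that nothing but timer expiries wakes the CPU; the function name is hypothetical, and this is a back-of-the-envelope model, not how the kernel's NO_HZ code is structured.

```python
def count_timer_interrupts(timer_expiries_ms, duration_ms, tick_ms=1):
    """Return (periodic, on_demand) interrupt counts for an idle CPU.

    Hypothetical model: only timer expiries wake the CPU, and expiries
    that fall in the same tick are handled by a single interrupt.
    """
    # Periodic tick: one interrupt per tick, whether or not a timer is due.
    periodic = duration_ms // tick_ms
    # On-demand: the hardware timer is programmed for the next pending
    # expiry only, so the CPU wakes once per distinct tick with work to do.
    due_ticks = {t // tick_ms for t in timer_expiries_ms if 0 <= t < duration_ms}
    return periodic, len(due_ticks)


# An idle CPU over 3 seconds with a single timer due at 1.5 seconds.
print(count_timer_interrupts([1500], 3000))
```

With a 1 ms tick, the periodic case takes 3000 interrupts for the same amount of useful work that the on-demand case handles with one.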
Ingo Molnar [interview] posted a second version of his syslets subsystem patch set, which offers asynchronous system call support [story]. He noted that the effort is a work in progress with outstanding issues still to be fixed, "the biggest conceptual change in v2 is the ability of cachemiss threads to be turned into user threads. This fixes signal handling, makes them ptrace-eable, etc," going on to list numerous fixes since the first release. He added that before releasing a third version of the patch set he will add support for multiple completion rings, add logic to share the 'spare thread' between the rings to further reduce startup costs, and remove the reliance on mlock().
Linus Torvalds commented, "I'm still not a huge fan of the user space interface, but at least the core code looks quite clean. No objections on that front." He referred to earlier comments in which he had reacted strongly to the syslets userland interface saying, "I dislike it intensely, because it's so _close_ to being usable. But the programming interface looks absolutely horrid for any 'casual' use, and while the loops etc look like fun, I think they are likely to be less than useful in practice. Yeah, you can do the 'setup and teardown' just once, but it ends up being 'once per user', and it ends up being a lot of stuff to do for somebody who wants to just do some simple async stuff." He later said he was particularly concerned about the "register" functionality, which Ingo then simplified.
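To make the "completion ring" idea concrete, here is a single-threaded Python model of a fixed-size ring: the side that finishes calls posts completions, and the side that submitted them reaps them later, oldest first. The class and field names are hypothetical and the model says nothing about the actual syslets ABI or memory layout; multiple completion rings would simply be independent instances.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    request_id: int  # hypothetical: identifies the submitted call
    result: int      # hypothetical: the call's return value


class CompletionRing:
    """Fixed-size circular buffer: producer posts, consumer reaps."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0   # next slot the consumer reads
        self.tail = 0   # next slot the producer writes
        self.count = 0  # number of unreaped completions

    def post(self, completion):
        """Producer side: record a finished call; False if the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = completion
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def reap(self):
        """Consumer side: drain every posted completion, oldest first."""
        out = []
        while self.count:
            out.append(self.slots[self.head])
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return out
```

The fixed size is the point of the design: a full ring forces the producer to back off (here, `post()` returns False) instead of allocating memory on every completion.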
Rik van Riel [interview] posted some thoughts on the page replacement requirements of the Linux VM, noting that the same kinds of bugs have been getting fixed and reintroduced over the past few years, "this has convinced me that it is time to take a look at the actual requirements of a page replacement mechanism, so we can try to fix things without reintroducing other bugs. Understanding what is going on should also help us deal better with really large memory systems." He added his thoughts from this email to the linux-mm wiki, which he plans to update as new requirements surface.
The initial requirements shortlist included seven items:
"1) must select good pages for eviction; must not submit too much I/O at once. Submitting too much I/O at once can kill latency and even lead to deadlocks when bounce buffers (highmem) are involved. Note that submitting sequential I/O is a good thing;
2) must be able to efficiently evict the pages on which pageout I/O completed;
3) must be able to deal with multiple memory zones efficiently;
4) must always have some pages ready to evict. Scanning 32GB of 'recently referenced' memory is not an option when memory gets tight;
5) must be able to process pages in batches, to reduce SMP lock contention;
6) a bad decision should have bounded consequences. The VM needs to be resilient against its own heuristics going bad;
7) low overhead of execution."
He continued with a more in-depth discussion of the various requirements.
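Several of these requirements (batching, always having eviction candidates, bounded consequences) can be illustrated with the textbook second-chance, or CLOCK, algorithm. The Python sketch below is that generic algorithm, not the Linux VM's active/inactive list code: it selects victims in batches and caps how many pages a single call may scan, so a pass over memory full of recently referenced pages costs a bounded amount of work. The `Page` fields and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int                  # page frame number (illustrative)
    referenced: bool = False  # set when the page is accessed
    evicted: bool = False     # already chosen as a victim


def clock_select(pages, hand, batch, max_scan):
    """Pick up to `batch` victims with a second-chance (CLOCK) sweep.

    Returns (victim_pfns, new_hand). The sweep stops after `max_scan`
    pages, so one call does bounded work even when every page looks
    recently used.
    """
    victims = []
    scanned = 0
    n = len(pages)
    while n and len(victims) < batch and scanned < max_scan:
        page = pages[hand]
        if page.evicted:
            pass                      # selected earlier; skip it
        elif page.referenced:
            page.referenced = False   # clear the bit: one more chance
        else:
            page.evicted = True
            victims.append(page.pfn)
        hand = (hand + 1) % n
        scanned += 1
    return victims, hand
```

Returning the hand position lets the next call resume where this one stopped, which keeps the cost of each call proportional to `max_scan` rather than to the size of memory.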
Following the release of the 2.6.20 kernel [story], Andrew Morton [interview] posted a list of the patches in his -mm kernel, summarizing for each whether he plans to push it upstream for inclusion in the upcoming 2.6.21 kernel. Andrew commented, "I'm getting fed up of holding onto hundreds of patches against subsystem trees, sending them over and over again and seeing nothing happen. I sent 242 patches out to subsystem maintainers on Monday and look at what's still here." In response to some confusion about what happens to these patches, he went on to explain, "once a subsystem has a subsystem tree (git or quilt) I basically never merge anything which belongs to that tree. It's always originator->mm->subsystemtree->Linus."
Linux creator Linus Torvalds announced the release of the 2.6.20 kernel, summarizing, "a lot of stuff. All over. And KVM." He further noted, "I tried rather hard to make 2.6.20 largely a 'stabilization release'. Unlike a lot of kernels lately, there aren't really any big fundamental changes to some core infrastructure area, and while we always have bugs, I really am hoping that we fixed many more than we introduced." His announcement started with a news parody, "in a widely anticipated move, Linux 'headcase' Torvalds today announced the immediate availability of the most advanced Linux kernel to date, version 2.6.20." Linus continued:
"As ICD head analyst Walter Dickweed put it: "Releasing a new kernel on Superbowl Sunday means that the important 'pasty white nerd' constituency finally has something to do while the rest of the country sits comatose in front of their 65" plasma screens."
"Walter was immediately attacked for his racist and insensitive remarks by Geeks without Borders representative Marilyn vos Savant, who pointed out that not all of their members are either pasty nor white. "Some of them even shower!" she added, claiming that the constant stereotyping hurts nerds' standing in society.
"Geeks outside the US were just confused about the whole issue, and were heard wondering what the big hoopla was all about. Some of the more culturally aware of them were heard snickering about balls that weren't even round."
Jens Axboe has been involved with Linux since 1993. Thirty years old, he lives in Copenhagen, Denmark, and works as a Linux kernel developer for Oracle. His block layer rewrite launched the 2.5 kernel development branch, a layer he continues to maintain and improve. Interested in almost anything dealing with IO, he has introduced several new IO schedulers to the kernel, including the default CFQ, or Completely Fair Queuing, scheduler.
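The core idea behind fair queuing IO schedulers is to keep a separate queue per process and service those queues in turn, so one process issuing a flood of requests cannot starve the others. The Python sketch below shows only that round-robin skeleton, counting requests per turn; the real CFQ is considerably more involved (it works with time slices, among other things), and the function name and `quantum` parameter are hypothetical.

```python
from collections import deque


def round_robin_dispatch(requests, quantum=1):
    """Order (pid, request) pairs so each process gets `quantum` per turn.

    Simplified illustration of per-process fair queuing, not CFQ itself.
    """
    queues = {}  # pid -> FIFO of that process's pending requests
    for pid, req in requests:
        queues.setdefault(pid, deque()).append(req)
    order = []
    while queues:
        # Visit each process that still has work, in first-seen order.
        for pid in list(queues):
            q = queues[pid]
            for _ in range(min(quantum, len(q))):
                order.append((pid, q.popleft()))
            if not q:
                del queues[pid]
    return order


reqs = [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("c", 20), ("c", 21)]
print(round_robin_dispatch(reqs))
```

Even though process "a" queued all its requests first, "b" and "c" are each served before "a" gets its second turn.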
In this interview, Jens talks about how he got interested in Linux, how he became the maintainer of the block layer and other block devices, and what's involved in being a maintainer. He describes his work on IO schedulers, offering an in-depth look at the design and current status of the CFQ scheduler, including a peek at what's in store for the future. He conveys his excitement about the new splice IO model, explaining how it came about and how it works. And he discusses the current 2.6 kernel development process, the impact of git, and why the GPL is important to him.
"The Linux kernel community is offering all companies free Linux driver development," Greg Kroah-Hartman posted in an open offer on the lkml, for all types of devices "from USB toys to PCI video devices to high-speed networking cards." He explains, "all that is needed is some kind of specification that describes how your device works, or the email address of an engineer that is willing to answer questions every once in a while. A few sample devices might be good to have so that debugging doesn't have to be done by email, but if necessary, that can be done." He added, "if your company is worried about NDA issues surrounding your device's specifications, we have arranged a program with OSDL/TLF's Tech Board to provide the legal framework where a company can interact with a member of the kernel community in order to properly assure that all needed NDA requirements are fulfilled." Greg suggests that companies participating can allow their developers to focus on drivers for other operating systems, "and you can add 'supported on Linux' to your product's marketing material." He further explains:
"You will receive a complete and working Linux driver that is added to the main Linux kernel source tree. The driver will be written by some of the members of the Linux kernel developer community (over 1500 strong and growing). This driver will then be automatically included in all Linux distributions, including the 'enterprise' ones. It will be automatically kept up to date and working through all Linux kernel API changes. This driver will work with all of the different CPU types supported by Linux (for the CPUs that support the bus types that your device works on), the largest number of CPU types supported by any operating system ever before in the history of computing."
Linux creator Linus Torvalds announced the 2.6.20-rc6 release candidate kernel, "it's been more than a week since -rc5, but I blame everybody (including me) being away for Linux.conf.au and then me waiting for a few days afterwards to let everybody sync up." He asked that people test the regressions reported against earlier release candidates [story], "so that we can confirm whether they are still active and relevant." Linus noted that he hoped this would be the final release candidate before 2.6.20 is released, then went on to discuss what's new:
"As to -rc6 itself: the bulk of it are the MTD updates (including a few new drivers), and the POWER update (and the bulk of _that_ in terms of patch size being defconfig updates ;)
"But there's various random fixes in infiniband, DVB, network drivers, scsi, usb, some filesystems (cifs, jffs2, nfs, ntfs, ocfs2) as well as core networking too. Oh, and KVM, of course. And stuff I probably have already forgotten."
Theodore Ts'o announced that the 2007 Linux Kernel Summit will be moved from its usual location in Ottawa, Canada, taking place this year in Cambridge, England. Ted described the move as a one-time experiment, after which the organizers will evaluate whether it's worth moving the Kernel Summit to other locations in future years. He noted, "I understand that if it were only up to us developers, we'd want to have the conference in Honolulu, or perhaps in Australia or New Zealand. Unfortunately there are other stakeholders and other financial realities involved." Regarding this year's summit, Ted explained:
"This year, the Kernel Summit will be held in Cambridge, England, at the DeVere University Arms Hotel, September 5-6 (with a welcome reception on the 4th). The decision to move the Kernel Summit to England is a one-year experiment based on the very strong request of last year's kernel summit attendees to try a location outside of Ottawa, and especially from the roughly 1/3rd of the attendees that come from the UK or Europe. So the plan is for us to book the Ottawa Congress Centre space for July 2008 (which we will need to do by mid-year 2007), and pending how well the Cambridge venue works out in September 2007, we'll figure out how often we want to try moving the Kernel Summit to other locations in future years beyond 2008."
Nadia Derbey posted a set of patches to the Linux Kernel Mailing List titled Automatic Kernel Tunables, or AKT, explaining, "this is a series of patches that introduces a feature that makes the kernel automatically change the tunables values as it sees resources running out." The kernel portion of the AKT framework is described as providing sysfs interfaces for registering tunables, and for activating the automatic tuning of registered tunables. Nadia explains the second feature, "it can be called during resource allocation to tune up, and during resource freeing to tune down the registered tunable." The userland portion of the framework provides an interface for configuring whether or not a tunable should be set automatically.
The default automatic adjustment routine provided by the patches simply allows a tunable to be configured with minimum and maximum values, as well as a threshold. If a monitored value grows beyond the defined threshold, the tunable is increased. If the monitored value shrinks below the defined threshold, the tunable is decreased. The patches also allow more complicated adjustment routines to be defined. The effort is part of the larger libtune project, aiming "at providing a standard API to unify the various ways Linux developers have to access kernel tunables, system information, resource consumptions."
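The default routine amounts to a step-and-clamp rule. The following Python sketch is a guess at that behavior, not code from Nadia's patches: the `Tunable` class, the `auto_adjust()` name, and especially the fixed `step` size are hypothetical, since the description only says the tunable is "increased" or "decreased".

```python
from dataclasses import dataclass


@dataclass
class Tunable:
    """Hypothetical model of an auto-tuned value; not the AKT kernel API."""
    value: int
    minimum: int
    maximum: int
    threshold: int  # monitored level that triggers an adjustment
    step: int       # assumed fixed increment/decrement


def auto_adjust(t: Tunable, monitored: int) -> int:
    """Raise or lower the tunable around the threshold, clamped to [min, max]."""
    if monitored > t.threshold:
        t.value = min(t.value + t.step, t.maximum)   # resource running out
    elif monitored < t.threshold:
        t.value = max(t.value - t.step, t.minimum)   # resource being freed
    return t.value


t = Tunable(value=100, minimum=50, maximum=120, threshold=80, step=16)
print(auto_adjust(t, 90), auto_adjust(t, 90), auto_adjust(t, 10))
```

The clamp is what keeps an automatic routine from drifting without limit: however the monitored value behaves, the tunable stays within the administrator's configured bounds.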
Avi Kivity suggested that combining KVM, the Kernel-based Virtual Machine [story], with the dyntick patch [story] could improve overall KVM performance. He noted that it would likely improve performance both on the host, by "avoiding expensive vmexits due to useless timer interrupts," and on the guest, by "reducing the load on the host when the guest is idling (currently an idle guest consumes a few percent cpu)". Ingo Molnar [interview] pointed out that KVM with his -rt kernel already works with dynticks enabled on both the host and the guest, "using the dynticks code from the -rt kernel makes the overhead of an idle guest go down by a factor of 10-15". Ingo added that he hopes the dyntick patch will be ready to be merged into the upcoming mainline 2.6.21 kernel.
Rik van Riel [interview] noted that there were other ways to reduce the load of the guest when it's idling, "you do not need dynticks for this actually. Simple no-tick-on-idle like Xen has works well enough." Ingo explained, "s390 (and more recently Xen too) uses a next_timer_interrupt() based method to stop the guest tick - which works in terms of reducing guest load, but it doesnt stop the host-side interrupt. The highest quality approach is to have dynticks on both the host and the guest, and this also gives high-resolution timers and a modernized time/timer-events subsystem for both the host and the guest."
A thread on the lkml began with a query about using O_DIRECT when opening a file. An early white paper written by Andrea Arcangeli [interview] to describe the O_DIRECT patch before it was merged into the 2.4 kernel explains, "with O_DIRECT the kernel will do DMA directly from/to the physical memory pointed [to] by the userspace buffer passed as [a] parameter to the read/write syscalls. So there will be no CPU and memory bandwidth spent in the copies between userspace memory and kernel cache, and there will be no CPU time spent in kernel in the management of the cache (like cache lookups, per-page locks etc..)." Linux creator Linus Torvalds was quick to reply that, despite all the claims, there is no good reason for opening files with O_DIRECT, suggesting that interfaces like madvise() and posix_fadvise() be used instead, "there really is no valid reason for EVER using O_DIRECT. You need a buffer whatever IO you do, and it might as well be the page cache. There are better ways to control the page cache than play games and think that a page cache isn't necessary."
Linus went on to explain, "the only reason O_DIRECT exists is because database people are too used to it, because other OS's haven't had enough taste to tell them to do it right, so they've historically hacked their OS to get out of the way. As a result, our madvise and/or posix_fadvise interfaces may not be all that strong, because people sadly don't use them that much. It's a sad example of a totally broken interface (O_DIRECT) resulting in better interfaces not getting used, and then not getting as much development effort put into them." To further underscore his point, he humorously added:
"The whole notion of 'direct IO' is totally brain damaged. Just say no.
This is your brain: O
This is your brain on O_DIRECT: .
Any questions?"
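Linus's suggested alternative is to keep using the page cache and steer it with advice calls. A minimal sketch of that style uses Python's `os.posix_fadvise()` wrapper around posix_fadvise(2): hint sequential access before streaming a file, then ask the kernel to drop the file's cached pages once done, which covers one of the effects people often reach for O_DIRECT to get. The function name `stream_file` is hypothetical, and the advice values are hints the kernel is free to act on or ignore.

```python
import os


def stream_file(path, chunk=1 << 20):
    """Read a file once, front to back, without leaving it in the page cache.

    Sketch only: the page cache is still used for the IO itself; the
    posix_fadvise() calls just tell the kernel how we intend to use it.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        # Offset 0 with length 0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)  # real code would process `data` here
        # We will not reread this data, so let the kernel drop its pages.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT, this imposes no alignment rules on buffers or offsets, and other readers of the same file still see a coherent cache.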
Jeff Garzik noted that the hardware documentation for the Promise SX4 chipset is being opened up and therefore the sata_sx4 driver is a good candidate for improvements, "I would like to take this opportunity to point hackers looking for a project at this hardware. The Promise SX4 is pretty neat, and it needs more attention than I can give, to reach its full potential." He notes that it is an older chipset that's probably not sold anymore, that its ATA programming interface is similar to that in the sata_promise driver, and that it contains a fully programmable on-board DIMM and on-board RAID5 XOR. Jeff went on to explain:
"A key problem is that, under Linux, sata_sx4 cannot fully exploit the RAID-centric power of this hardware by driving the hardware in 'dumb ATA mode' as it does. A better driver would notice when a RAID1 or RAID5 array contains multiple components attached to the SX4, and send only a single copy of the data to the card (saving PCI bus bandwidth tremendously). Similarly, a better driver would take advantage of the RAID5 XOR offload capabilities, to offload the entire RAID5 read or write transaction to the card.
"All this is difficult within either the MD or DM RAID frameworks, because optimizing each RAID transaction requires intimate knowledge of the hardware. We have the knowledge... but I don't have good ideas -- aside from an SX4-specific RAID 0/1/5/6 driver -- on how to exploit this knowledge."
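The XOR offload Jeff mentions refers to RAID5 parity: the parity block of a stripe is the byte-wise XOR of its data blocks, so any single lost block can be rebuilt by XORing the survivors with the parity. A minimal Python sketch of that arithmetic, which software RAID otherwise performs on the host CPU, follows; it is illustrative only and is not taken from sata_sx4, MD, or DM, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity, old_data, new_data):
    """Small-write shortcut: new parity = old parity ^ old data ^ new data."""
    return xor_blocks([old_parity, old_data, new_data])


data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)
# Pretend the disk holding data[1] failed: rebuild it from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(parity.hex(), rebuilt == data[1])
```

`update_parity` is the read-modify-write path for small writes: only the old data block and the old parity need to be read, not the rest of the stripe, and that is exactly the kind of per-transaction work a card with its own XOR engine could absorb.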
A few hours before the new year, Linus Torvalds released the 2.6.20-rc3 Linux kernel, "in order to not get in trouble with MADR ("Mothers Against Drunk Releases") I decided to cut the 2.6.20-rc3 release early rather than wait for midnight, because it's bound to be new years _somewhere_ out there. So here's to a happy 2007 for everybody." In good humor, he noted that the new kernel would be available on all the kernel.org mirrors by the time everyone's New Year's celebrations had concluded, "it's probably going to be up-to-date by the time the hangovers are mostly gone. At which point the first thing on any self-respecting geek's mind should obviously be: 'is there a new kernel release for me to try?'" Regarding the changes in the new release candidate, which include a data corruption fix [story], Linus summarized:
"The big thing at least for me personally is that nasty shared mmap corruption fix, but there's a number of other changes in here, many of them just documentation (and some media and network drivers). Shortlog and diffstat appended."
2.4 kernel maintainer Willy Tarreau [story] announced the release of the 2.4.34 stable Linux kernel, "2.4.34 brings the usual bunch of security fixes, bugfixes, and adds support for gcc 4 to x86, x86-64 and sparc64, thanks to Mikael Pettersson's work." Willy also released the 2.4.33.7 kernel with a security fix added in 2.4.34-rc3. He went on to note some caveats:
"One user reported regular panics with aacraid since 2.4.32, so there's no regression here. I will seek for some help to get this fixed in 2.4.35. I also get reports of people getting trapped by NIC vendors who suddenly change their ethernet chips with no big warning notice. The i82546GB chip which replaced the i82546EB in e1000 cards come to mind. It is not supported by the driver in 2.4.34 but I will try to solve this in 2.4.35 (right now, you have to download the vendor's drivers when you replace a NIC). Another driver should get some lifting : skge. I have got a few reports of problems with the vendor's sk98lin driver and I noticed the same problems at work (UDP becoming silent on NFS server)."