A lengthy debate that began with a suggestion to dual license the Linux kernel under the GPLv2 and the GPLv3 [story] continues on the Linux Kernel Mailing List. Throughout the ongoing thread, Linux creator Linus Torvalds has spoken out on the GPLv2, the upcoming GPLv3, the BSD license, Tivo, the Free Software Foundation, and much more. During the discussion, he was asked why he chose the GPLv2 over the BSD license when he's obviously not a big fan of the FSF. Linus explained:
"Because I think the GPLv2 is a great license. And I don't like the FSF's radical world-view, but I am able to separate the license (the GPLv2) from the author and source of the license (rms and the FSF). Why do people always confuse the two? The GPLv2 stands on its own. The fact that I disagree with the FSF on how to act has _zero_ relevance for my choice of license.
"[...] But for a project I actually care about, I would never choose the BSD license. The license doesn't encode my fundamental beliefs of 'fairness'. I think the BSD license encourages a 'everybody for himself' mentality, and doesn't encourage people to work together, and to merge."
Chris Mason announced an early alpha release of his new Btrfs filesystem, "after the last FS summit, I started working on a new filesystem that maintains checksums of all file data and metadata." He listed the following features as "mostly implemented": "extent based file storage (2^64 max file size), space efficient packing of small files, space efficient indexed directories, dynamic inode allocation, writable snapshots, subvolumes (separate internal filesystem roots), checksums on data and metadata (multiple algorithms available), very fast offline filesystem check". He listed the following features as yet to be implemented: "object level mirroring and striping, strong integration with device mapper for multiple device support, online filesystem check, efficient incremental backup and FS mirroring". Regarding the current state of the project, Chris said:
"The current status is a very early alpha state, and the kernel code weighs in at a sparsely commented 10,547 lines. I'm releasing now in hopes of finding people interested in testing, benchmarking, documenting, and contributing to the code. I've gotten this far pretty quickly, and plan on continuing to knock off the features as fast as I can. Hopefully I'll manage a release every few weeks or so. The disk format will probably change in some major way every couple of releases."
"I was impressed in the sense that it was a hell of a lot better than the disaster that were the earlier drafts," Linus Torvalds explained in reply to a comment suggesting that he was impressed with the final draft of the GPLv3. He went on to add, "I still think GPLv2 is simply the better license." The discussion began with a suggestion that the Linux kernel be dual-licensed GPLv2 and GPLv3. Linus noted, "I consider dual-licensing unlikely (and technically quite hard), but at least _possible_ in theory. I have yet to see any actual *reasons* for licensing under the GPLv3, though. All I've heard are shrill voices about 'tivoization' (which I expressly think is ok) and panicked worries about Novell-MS (which seems way overblown, and quite frankly, the argument seems to not so much be about the Novell deal, as about an excuse to push the GPLv3)." In a followup email, Linus added:
"Btw, if Sun really _is_ going to release OpenSolaris under GPLv3, that _may_ be a good reason. I don't think the GPLv3 is as good a license as v2, but on the other hand, I'm pragmatic, and if we can avoid having two kernels with two different licenses and the friction that causes, I at least see the _reason_ for GPLv3. As it is, I don't really see a reason at all."
The translation of some kernel documentation into Japanese led to a discussion as to whether or not it was appropriate to include translated documentation with the kernel source code. One concern that was expressed was that as the number of included translations grows, so would the size of the kernel. Another concern was the likelihood that as time passes the various translations might become out of date. Jesper Juhl suggested one workaround, "since the common language of most kernel contributors is english I personally feel that we should stick to just that one language in the tree and then perhaps keep translations on a website somewhere. So the authoritative docs stay in the tree, in english, so that as many contributors as possible can read and update them."
Greg KH noted that there were a number of files in the kernel that change infrequently and that he would like to see included, "I really do want to see a translated copy of the HOWTO, stable-api-nonsense.txt, and possibly a few other files in the main kernel tree (SubmittingPatches, CodingStyle, and SubmittingDrivers might all be good canidates for this.) These files change relatively infrequently (the HOWTO file has had only 7 changes in 1 and 1/2 years, and they were very minor ones) and should be easy for the translators to keep up with."
Mel Gorman offered a first release of a patchset that compacts memory, "this is a prototype for compacting memory to reduce external fragmentation so that free memory exists as fewer, but larger contiguous blocks. Rather than being a full defragmentation solution, this focuses exclusively on pages that are movable via the page migration mechanism." He notes that the patchset is currently incomplete, and at this time memory is only compacted manually, not automatically, "this version of the patchset is mainly concerned with getting the compaction mechanism correct." Mel goes on to describe how it works:
"A single compaction run involves two scanners operating within a zone - a migration and a free scanner. The migration scanner starts at the beginning of a zone and finds all movable pages within one pageblock_nr_pages-sized area and isolates them on a migratepages list. The free scanner begins at the end of the zone and searches on a per-area basis for enough free pages to migrate all the pages on the migratepages list. As each area is respectively migrated or exhausted of free pages, the scanners are advanced one area. A compaction run completes within a zone when the two scanners meet."
What started as the review of a bug report grew into an interesting debate as Linus Torvalds slammed the current suspend and resume [story] design in the Linux kernel, "why the HELL cannot you realize that kernel threads are different? The right thing to do is AND HAS ALWAYS BEEN, to stop and start user threads only around the whole thing. Don't touch those kernel threads. Stop freezing them." Later in the discussion, Linus noted that he had no interest in Suspend to Disk (STD), and was only interested in a working Suspend to RAM (STR) implementation. He noted that the complexity introduced by STD was infecting the STR logic, and that the two should be completely separated, "what irritates me is that STR really shouldn't have _had_ that bug at all. The only reason STR had the same bug as STD was exactly the fact that the two features are too closely inter-twined in the kernel. That irritates me hugely. We had a bug we should never had had! We had a bug because people are sharing code that shouldn't be shared! We had a bug because of code that makes no sense in the first place!" Linus noted that he doesn't use laptops much, but still likes STR on his desktop, "STR means they are quiet and don't waste energy when I don't use them, but they're instantly available when I care." He then went on to point to design flaws in the freezer:
"I actually don't think that processes should be frozen really at all. I agree that filesystems have to be frozen (and I think that checkpointing of the filesystem or block device is 'too clever'), but I just don't think that has anything to do with freezing processes. So I'd actually much prefer to freeze at the VFS (and socket layers, etc), and make sure that anybody who tries to write or do something else that we cannot do until resuming, will just be blocked (or perhaps just buffered)!"
In a humorous announcement for the latest release candidate of the upcoming 2.6.22 Linux kernel, Linus Torvalds noted that there were updates to the ARM, SH and Blackfin architectures. He also noted fixes to USB suspend, InfiniBand, and the network stack, as well as updates to the ATA, DVB, MMC, and network drivers. Noting that a three-day weekend was starting in the US, he said, "so what's a pasty white nerd to do? You can't go out on the beach, because the goodlooking people will laugh at you, and kick sand in your face. I'm not bitter." Linus continued:
"But now you _can_ do something: you can download the latest -rc kernel, and smile smugly to yourself, knowing that you are running the latest and greatest on your machine. And suddenly it doesn't even matter that summer is coming, because you can just sit in the basement, and close the blinds, and bask in the warm light from your LCD, rather than the harsh glare of the daystar.."
Further information about what's new and changed in the upcoming 2.6.22 kernel can be found in the KernelNewbies wiki. The latest -rc can be downloaded from the Linux Kernel Archives [story], and the source changes can be browsed online using the gitweb interface.
In a recent lkml thread the concept of dumping an image of the kernel's memory to swap when the kernel hits a bug was discussed. Linus Torvalds pointed out that such a feature wasn't useful to an operating system like Linux that can run on such a diverse assortment of computers, "yes, in a controlled environment, dumping the whole memory image to disk may be the right thing to do. BUT: in a controlled environment, you'll never get the kind of usage that Linux gets. Why do you think Linux (and Windows, for that matter) took away a lot of the market from traditional UNIX?" He went on to explain that there are systems where swap is not larger than the size of the core, so collecting a crash dump would not be possible, that Linux instead tries to acknowledge bugs without crashing, and that quite often the bug is actually in the drivers, "writing to disk when the biggest problem is a driver to begin with is INSANE." Comparing Linux to Solaris he added, "so the fact is, Solaris is crap, and to a large degree Solaris is crap exactly _because_ it assumes that it runs in a 'controlled environment'."
Alan Cox went on to point out that there are also privacy issues, "there is an additional factor - dumps contain data which variously is - copyright third parties, protected by privacy laws, just personally private, security sensitive (eg browser history) and so on. The only reasons you can get dumps back in the hands of vendors is because there are strong formal agreements controlling where they go and what is done with them." He went on to note that dump utilities are also not user friendly, "diskdump (and even more so netdump) are useful in the hands of a developer crashing their own box just like kgdb, but not in the the normal and rational end user response of 'its broken, hit reset'". Linus heartily agreed, and suggested that anyone willing to use kernel dumps would be better off debugging through a firewire connection, "if you've ever picked through a kernel dump after-the-fact, I just bet you could have done equally well with firewire, and it would have had _zero_ impact on your kernel image. Now, contrast that with kdump, and ask yourself: which one do you think is worth concentrating effort on?"
Following a review of Ingo Molnar [interview]'s Completely Fair Scheduler [story], Srivatsa Vaddagiri posted a patch allowing the new scheduler to provide fairness at a per-group level rather than at a per-process level. He described the changes that he made and noted, "I have used 'uid' as the basis of grouping for timebeing (since that grouping concept is already in mainline today). The patch can be adapted to a more generic process grouping mechanism later."
Ingo reacted to the patch favorably, "yeah, i like this alot." He went on to comment, "the 'struct sched_entity' abstraction looks very clean, and that's the main thing that matters: it allows for a design that will only cost us performance if group scheduling is desired." He went on to ask, "if you could do a -v14 port and at least add minimal SMP support: i.e. it shouldnt crash on SMP, but otherwise no extra load-balancing logic is needed for the first cut - then i could try to pick all these core changes up for -v15. (I'll let you know about any other thoughts/details when i do the integration.)"
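The difference between per-process and per-group fairness can be shown with a little arithmetic. The following is only an illustrative sketch, not Srivatsa's patch: it assumes every group receives an equal share of the CPU and that each group splits its share equally among its own tasks, and the function and task names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def per_task_shares(tasks):
    """Per-process fairness: every task gets the same CPU share."""
    return {name: Fraction(1, len(tasks)) for _, name in tasks}


def per_group_shares(tasks):
    """Per-group fairness: each uid gets an equal share, which is then
    divided equally among that uid's tasks."""
    groups = defaultdict(list)
    for uid, name in tasks:
        groups[uid].append(name)
    group_share = Fraction(1, len(groups))
    return {
        name: group_share / len(names)
        for names in groups.values()
        for name in names
    }


# (uid, task) pairs: alice runs three tasks, bob runs one.
tasks = [("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]
print(per_task_shares(tasks)["b1"])   # prints 1/4
print(per_group_shares(tasks)["b1"])  # prints 1/2
print(per_group_shares(tasks)["a1"])  # prints 1/6
```

With per-process fairness a user can claim more of the CPU simply by starting more tasks; with uid-based grouping, bob's single task keeps half of the CPU no matter how many tasks alice runs.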
"In no case is it ok to just 'shut up the warning'," Linus Torvalds exclaimed in response to a patch that stifled a compiler warning. Reminiscent of a thread on the lkml last year [story], Linus pointed out that it is very important to understand and properly fix compiler warnings [story]:
"Please, we do NOT fix compiler warnings without understanding the code! That's a sure way to just introduce _new_ bugs, rather than fix old ones. So please, please, please, realize that the compiler is _stupid_, and fixing warnings without understanding the code is bad.
"In this case, anybody who actually spends 5 seconds looking at the code should have realized that the warning is just another way of saying that the author of the code was on some bad drugs, and the warnings WERE BETTER OFF REMAINING! Because that code _should_ have warnings. Big fat warnings about incompetence!?"
Miklos Szeredi posted a patch to allow files to be accessed as directories, offering the example of accessing the contents of a compressed tarball as you would any other directory. He noted that this is not the only application of the patch, "others might suggest accessing streams, resource forks or extended attributes through such an interface. However this patch only deals with the non-directory case, so directories would be excluded from that interface. But otherwise this patch doesn't limit the uses of the 'file as directory' concept in any way. It just adds the infrastructure to support these whacky beasts." Al Viro took an interest in the patch noting, "I'll look through the patch tonight; it sounds interesting, assuming that we don't run into serious crap with locking and revalidation logics." This was followed by an interesting discussion between Miklos and Al regarding the implementation of the patch.
Miklos went on to explain how the functionality works using mounts with special properties, "if a non-directory object is accessed with a trailing slash, then the filesystem may opt to let the file be accessed as a directory. In this case 'something' (as supplied by the filesystem) is mounted on top of the non-directory object." He then explained the following special properties of these mounts: "If there's no trailing slash after the file name, the mount won't be followed, even if the path resolution would otherwise follow mounts; The mount only stays there while it is referenced by some external object, like a pwd or an open file. When it is no longer referenced, it is automatically unmounted; Unlike 'real' mounts, this won't block unlink(2) or rename(2) on the underlying object."
Jesse Barnes posted a summary of recent efforts to improve the Linux kernel's support for graphics, "in collaboration with the [framebuffer] guys, we've been working on enhancing the kernel's graphics subsystem in an attempt to bring some sanity to the Linux graphics world and avoid the situation we have now where several kernel and userspace drivers compete for control of graphics devices." He then explained, "there are several reasons to pull modesetting and proper multihead support into the kernel: suspend/resume, debugging (e.g. panic), non-X uses, and more reliable VT switch," going on to offer detail on each of these listed reasons. Jesse followed these explanations with an overview of the current status of the code:
"The current codebase is still incomplete in many ways: locking needs to be (re-)added around our various list manipulation paths, we need better initial configuration logic, only the Intel driver has any support (and it's still missing suspend/resume and accelerated FB functions), we need to check modes against monitor limitations (which come from EDID or the user), CVT and GTF based mode generation still isn't used by the DRM modesetting code, and much more. I'm hoping that by posting this now, we can get some ideas about what requirements other people have for graphics on Linux so we can prioritize our work."
The task of tracking regressions between kernel releases [story] has been picked up by Michal Piotrowski, who maintains a "known regressions" wiki page at KernelNewbies. The list is divided into sections and mailed out to the lkml after each release candidate.
"As I understand, fair_clock is a monotonously increasing clock which advances at a pace inversely proportional to the load on the runqueue," Srivatsa Vaddagiri explained in a review of Ingo Molnar [interview]'s CFS CPU scheduler [story], "if load = 1 (task), it will advance at same pace as wall clock, as load increases it advances slower than wall clock." He continued on to ask some questions about the choices made in CFS as compared to the EEVDF CPU scheduler [story]. In the resulting discussion, Ingo offered some insight into the design of the CFS. He began:
"80% of CFS's design can be summed up in a single sentence: CFS basically models an 'ideal, precise multi-tasking CPU' on real hardware. 'Ideal multi-tasking CPU' is a (non-existent :-) CPU that has 100% physical power and which can run each task at precise equal speed, in parallel, each at 1/nr_running speed. For example: if there are 2 tasks running then it runs each at 50% physical power - totally in parallel.
"On real hardware, we can run only a single task at once, so while that one task runs the other tasks that are waiting for the CPU are at a disadvantage - the current task gets an unfair amount of CPU time. In CFS this fairness imbalance is expressed and tracked via the per-task p->wait_runtime (nanosec-unit) value. 'wait_runtime' is the amount of time the task should now run on the CPU for it become completely fair and balanced."
The question was asked on the lkml whether or not memory allocated by kmalloc and vmalloc is swappable. Rik van Riel offered a clear explanation as to why it is not, "unswappable kernel memory is simpler and faster," adding, "there really is no good reason for swapping kernel memory nowadays." He went on to explain:
"Over the last 15 years, the memory requirements of the Linux kernel have grown maybe a factor 10, while the memory of computers has grown by a factor of 1000.
"The data structures that grow with memory (mostly the mem_map[] array of page structs) has actually gotten smaller since the 2.4 kernel and now takes under 1% of memory even on x86-64."