Following up on a bug report against the 2.6.22 kernel, Andrew Morton and Linus Torvalds offered some tips on how to debug kernel problems. Andrew first pointed to netconsole.txt for instructions on setting up a netconsole, then added, "when the machine has stalled, see if you can get a task trace with ALT-SYSRQ-t. This will require CONFIG_MAGIC_SYSRQ=y and possibly setting ignore_loglevel on the kernel boot command line."
Linus Torvalds suggested "git bisect" as an alternative, "[it] will take some time, but is really a lot easier." He explains, "there's almost 7000 commits in between 2.6.21 and 22, but that still means that in about fourteen recompiles/reboots, "git bisect" should tell us where your problem starts, which will hopefully make it obvious what the problem is (or at least pinpoint it a *lot*)." He goes on to detail how to install git, obtain the latest kernel, and run "git bisect", "doing a git bisect isn't really that hard, but fourteen compiles/reboots will take some time (well, the compiles will, the reboots aren't that bad). But even if you're not a git user, it really is very simple". Specifically, he notes, "start the 'git bisect' with 'git bisect good v2.6.21', 'git bisect bad v2.6.22', and it will pick a kernel version about half-way between the two points, and you can now start testing. For each kernel you try, if it boots fine, do 'git bisect good', otherwise boot into a working kernel, and then do 'git bisect bad'. Git will then pick the next 'halfway' kernel for that case."
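The halving procedure Linus describes is a binary search over the commit history. The sketch below is only an illustration of that idea, not how git itself is implemented: it assumes a strictly linear history in which every commit after the first bad one is also bad, and the names `find_first_bad` and `is_bad` are hypothetical. Under those assumptions, roughly 7000 commits need at most 13 tests, which is in line with the "about fourteen" estimate quoted above.

```python
import math


def find_first_bad(num_commits, is_bad):
    """Binary-search commits 0..num_commits-1 for the first bad one.

    Assumes (hypothetically) a linear history where the last commit is bad
    and every commit after the first bad one is also bad.
    Returns (index_of_first_bad, number_of_tests_performed).
    """
    lo, hi = 0, num_commits - 1  # the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2     # the "halfway" kernel to build and boot
        tests += 1
        if is_bad(mid):
            hi = mid             # like "git bisect bad": culprit is mid or earlier
        else:
            lo = mid + 1         # like "git bisect good": culprit is after mid
    return lo, tests


if __name__ == "__main__":
    # Pretend commit 4321 (an arbitrary, made-up index) introduced the bug.
    culprit, tests = find_first_bad(7000, lambda i: i >= 4321)
    print(culprit, tests, math.ceil(math.log2(7000)))
```

Each test stands for one compile and reboot, so the cost grows with the logarithm of the number of commits rather than with the number itself.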
Offering a potential alternative to the existing suspend and restore implementations in the Linux kernel, Ying Huang posted a patch utilizing kexec, "kexec based hibernation has some potential advantages over uswsusp and suspend2." He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel", allowing someone to first boot from one kernel, then to kexec another crashdump kernel in reserved memory and run from it for a while, and finally to "jump back" to the original kernel.
Andrew Morton replied to the idea very positively, "this sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "how close do you think all this is to being a viable thing?" Ying replied, "the kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list a series of steps that still have to be implemented before kexec based hibernation would be a viable option.
Following the release of the 2.6.22 kernel [story], Andrew Morton [interview] posted a list of a wide range of patches that are in his -mm kernel, summarizing for each his plans as to whether or not they will be pushed upstream for inclusion in the upcoming 2.6.23 kernel. Comments included simply noting "merge" or "hold", as well as "these appear to need some work," "don't know, need to ping suitable developers over this work," and "sent to maintainer." Perhaps most entertaining was Andrew's response to the vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch, "this is scary. Will sit and admire it until it has been demonstrated to be a net gain." It is possible to track which patches are actually merged using the gitweb interface to Linus' kernel tree.
Andrew Morton submitted some documentation explaining the use of the "Signed-off-by" and "Acked-by" tags added when patches are submitted for inclusion into the Linux kernel. "The Signed-off-by: tag implies that the signer was involved in the development of the patch, or that he/she was in the patch's delivery path," the documentation explains, "if a person was not directly involved in the preparation or handling of a patch but wishes to signify and record their approval of it then they can arrange to have an Acked-by: line added to the patch's changelog." When asked about the possibility of including "Tested-by" tags, Andrew replied, "I think it's very useful information to have. For a start, it tells you who has the hardware and knows how to build a kernel. So if you're making a change to a driver and want it tested, you can troll the file's changelog looking for people who might be able to help."
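As a rough illustration of how such tags could be pulled out of a changelog to find potential testers, here is a minimal sketch. The function name `collect_tags`, the sample changelog, and the people named in it are all hypothetical; only the tag names come from the discussion above, and "Tested-by" was merely a proposal at the time.

```python
# Tag names taken from the discussion; everything else here is illustrative.
KNOWN_TAGS = ("Signed-off-by", "Acked-by", "Tested-by")


def collect_tags(changelog):
    """Return {tag: [values...]} for each known 'Tag: value' line in a changelog."""
    found = {}
    for line in changelog.splitlines():
        # Split on the first colon only, so e-mail addresses stay intact.
        tag, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and tag in KNOWN_TAGS and value:
            found.setdefault(tag, []).append(value)
    return found


if __name__ == "__main__":
    # Made-up changelog with made-up people, for demonstration only.
    sample = (
        "example-driver: fix a hypothetical timeout\n"
        "\n"
        "Signed-off-by: A Developer <a@example.org>\n"
        "Acked-by: B Reviewer <b@example.org>\n"
        "Tested-by: C Tester <c@example.org>\n"
    )
    for tag, people in collect_tags(sample).items():
        print(tag, people)
```

Looking up the "Tested-by" entry in the result gives the kind of list Andrew describes: people who have the hardware and know how to build a kernel.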
The thread went on to discuss whether Acks and Nacks from non-maintainers were useful. Andrew suggested that without additional information they don't offer much, "it's better to just provide constructive, detailed technical comments and from that it becomes pretty obvious to all parties whether or not the patch has a future. If you did properly provide that useful feedback then the 'ack' or 'nack' bit becomes redundant." He went on to stress the need for useful feedback, "frankly, I don't trust a simple 'ack' much at all. It's the kernel equivalent of 'whoa, kewl!'"
Andrew Morton [interview] sent out the latest lguest patches for review, noting that he intends to merge the code into the mainline kernel, "some concern was expressed over the lguest review status, so I shall send the patches out again for people to review, to test, to make observations about the author's personal appearance, etc. I'll plan on sending these patches off to Linus in a week's time, assuming all goes well." The project's FAQ notes, "lguest is designed to be simple to use and modify, with the aim of keeping the codebase small. Currently it's around 5000 lines including userspace utility, whereas kvm is over 10 times that size, and Xen is around 10 times bigger again (of course, both have far more features)."
The lguest patches are written and maintained by Rusty Russell [interview], who also authored Rusty's Remarkably Unreliable Guide to Lguest, the project's documentation. The guide explains, "lguest is designed to be a minimal hypervisor for the Linux kernel, for Linux developers and users to experiment with virtualization with the minimum of complexity. Nonetheless, it should have sufficient features to make it useful for specific tasks, and, of course, you are encouraged to fork and enhance it." In the FAQ, lguest is compared to kvm [story], "kvm requires hardware virtualization support (most recent Intel and AMD chips have it), but it can run almost any Operating System (since it does full virtualization). It also has 64-bit support. Lguest doesn't do full virtualization: it only runs a Linux kernel with lguest support." The FAQ also compares lguest to Xen, "Xen is similar, in that it doesn't need hardware virtualization support (although it can use it), but Xen supports an extensive range of features such as PAE (ie. lots of memory), SMP guests, 64-bit. You have to boot your kernel under the Xen hypervisor; you can't simply modprobe when you want to create a guest."
Following up to feedback on his merge plans [story], Andrew Morton [interview] posted an updated summary of what he is pushing upstream for inclusion in the upcoming 2.6.22 kernel. His list included, "a few serial bits, a few pcmcia bits, one little security patch, the blackfin architecture, small h8300 update, small alpha update, swsusp updates, m68k bits, and lots of UML updates." He also noted that he'll push some of the memory management queue including, "an enhancement to /proc/pid/smaps to permit monitoring of a running program's working set. The SLUB allocator, it's pretty green but I do want to push ahead with this pretty aggressively with a view to replacing slab altogether. Generic pagetable quicklist management. We have x86_64 and ia64 and sparc64 implementations, but I'll only include David's sparc64 implementation here. I'll send the x86_64 and ia64 implementations through maintainers."
Following the release of the 2.6.21 kernel [story], Andrew Morton [interview] posted a list of patches in his -mm kernel, summarizing for each his plans as to whether or not they will be pushed upstream for inclusion in the upcoming 2.6.22 kernel. He noted, "the overall stability in recent -mm's was not sufficiently high and we ran out of time to find all the bugs. I shouldn't have merged all those patches last week - they contained an exceptional amount of garbage. This all means that more bugs than usual will probably leak into mainline, and we'll have to fix them there." He went on to add, "I've been ducking most non-bugfix patches recently. I have ~200 feature and cleanup patches queued for later consideration, so people who sent those will be hearing from me eventually."
The future of Reiser4 was raised on the lkml, with the filesystem's creator, Hans Reiser [interview], awaiting his May 7th trial [story]. Concerns that the filesystem wasn't being maintained were laid to rest when Andrew Morton [interview] stated, "the namesys engineers continue to maintain reiser4 and I continue to receive patches for it." He further added, "the namesys guys are responsive and play well with others." As to why the filesystem hasn't yet been merged into the 2.6 kernel, Andrew explained, "to get it unstuck we'd need a general push, get people looking at and testing the code, get the vendors to have a serious think about it, etc. We could do that - it'd require that the namesys people (and I) start making threatening noises about merging it, I guess." He then made joking reference to the recent debate regarding the new CPU schedulers [story], "or we could move all the reiser4 code into kernel/sched.c - that seems to get people fired up."
Namesys developer and author of the Reiser4 encryption and compression plugins, Edward Shishkin, offered some updates. Replying to some comments about the need to remove plugins from the Reiser4 code he explained, "the popular opinion that plugins make more sense in the VFS is a great delusion, as plugins are entities related to reiser4 disk layouts." In an earlier thread it had been suggested that the plugins were misnamed and would be better called an internal abstraction layer [story]. Edward went on to note, "currently there are two namesys employees working [on Reiser4] mostly on enthusiasm." He linked to a wiki page listing known issues with the code needing to be fixed before it's likely to be merged into the 2.6 kernel, "the main issues here are xattrs and support for blocksize != pagesize. I think that adding xattrs will take ~1 month of full-time working. Not sure about blocksize support." When it was noted that other filesystems have already been merged without support for either of these features, Edward said that they'd lower their priority and finish up with the other remaining issues left on the old todo list and resume the merge discussion at that time.
Following the release of the 2.6.20 kernel [story], Andrew Morton [interview] posted a list of patches in his -mm kernel, summarizing for each his plans as to whether or not they will be pushed upstream for inclusion in the upcoming 2.6.21 kernel. Andrew commented, "I'm getting fed up of holding onto hundreds of patches against subsystem trees, sending them over and over again and seeing nothing happen. I sent 242 patches out to subsystem maintainers on Monday and look at what's still here." In response to some confusion as to what happens to these patches, he went on to explain, "once a subsystem has a subsystem tree (git or quilt) I basically never merge anything which belongs to that tree. It's always originator->mm->subsystemtree->Linus".
When the data corruption bug which is fixed as of 2.6.20-rc3 [story] was still being tracked down [story], it was thought that the bug, a race in shared mmap'ed page writeback, might have been in the 2.6 kernel for a very long time. It has since been determined that the bug was introduced much more recently. Nick Piggin [interview] explains, "this bug was only introduced in 2.6.19, due to a change that caused pte dirty bits to be discarded without a subsequent set_page_dirty() (nowhere else in the kernel should have done this)." Linus Torvalds noted that earlier kernels could have been affected by a less serious version of the bug:
"Actually, I think 2.6.18 may have a subtle variation on it. But that much older race would only trigger on SMP (or possibly UP with preempt). And I haven't actually thought about it that much, so I could be full of crap. But I don't see anything that protects against it: we may hold the page lock, but since the code that marks things _dirty_ doesn't necessarily always hold it, that doesn't help us. And we may hold the 'private_lock', but we drop it before we do the dirty bit clearing, and in fact on UP+PREEMPT that very dropping could cause an active preemption to take place, so.. I dunno. For older kernels? If there is a race there, it must be pretty damn hard to hit in practice (and it must have been there for a looong time), so trying to fix it is possibly as likely to cause problems as it might to fix them."
David Miller pointed out that some of the confusion as to when the bug was actually introduced comes from the fact that the original bug was against a 2.6.18 Debian kernel. Andrew Morton [interview] explained, "that was 2.6.18+debian-added-dirty-page-tracking-patches," then went on to caution that the fix still does not address a newly reported and currently unconfirmed BerkeleyDB corruption bug, "I'll assert (and emphasise) that the cause of the alleged BerkeleyDB corruption is not known at this time. The post-2.6.19 'fix' might make it go away. But if it does, we do not know why, and it might still be there, only harder to hit."
With the release of the 2.6.19-rc1-mm1 kernel, the ext4 filesystem [story] was merged into Andrew Morton [interview]'s -mm tree for further testing. In the announcement Andrew notes that the new filesystem is compatible with ext3 until you add a file that has extents. He also notes, "when comparing performance with other filesystems, remember that ext3/4 by default offers higher data integrity guarantees than most. So when comparing with a metadata-only journalling filesystem, use `mount -o data=writeback'. (Although this doesn't seem to make much difference with ext3)". The goal is to stabilize the new filesystem within the next six to nine months, and ultimately to replace the ext3 filesystem.
Andrew Morton [interview] posted his patch queue with numerous comments about merge plans into the mainline kernel. Among his comments he noted that he would not yet be merging the Reiser4 filesystem [story], "reiser4. I was planning on merging this, but the batch_write/writev problem might wreck things, and I don't think the patches arising from my recent partial review have come through yet. So it's looking more like 2.6.20."
A large discussion followed Andrew's posting that focused on the current kernel development process [story]. Andrew expressed his concerns on what's currently happening, "people seem to treat the stabilisation period as a wonderful quiet time in which to run off and develop new features, rather than participating in the stabilisation. This has the following effects: 1: release cycles get longer 2: the kernel has more bugs 3: we put new features into the kernel faster than we otherwise would (see 2:, above)." Alan Cox [interview] proposed an idea, "a suggestion from the department of evil ideas: Call even cycles development odd ones stabilizing. Nothing gets into an odd one without a review and linux-kernel signoff/ack?" Linus Torvalds replied favorably, going on to note that he was surprised at how well the decision to only accept big merges in the two weeks following a major release has been accepted, "I actually expected people to dislike arbitrary rules more than they do, but I've come to believe that people _like_ having rules that they have to obey, as long as it's not a big pain for them. In other words, arbitrary rules are not actually disliked at all, people actually _like_ them, because suddenly there's less need for making unnecessary judgement decisions." Linus went on to spell out the idea further, "2.6.<odd> is 'the big initial merges with all the obvious fixes to make it all work' (ie roughly the current -rc2 or perhaps -rc3). 2.6.<even> is 'no big merges, just careful fixes' (ie the current 'real release')". He went on to caution:
"That said, I think Andrew was of the opinion that it doesn't really _fix_ anything, and he may well be right. What's the point of the odd release, if the weekly snapshots after that are supposed to be strictly better than it anyway? So I think I may like it just because it _seems_ to combine the good features of both the old naming scheme and the current one, but I suspect Andrew may be right in that it doesn't _really_ change anything, deep down."
With the release of the 2.6.18-rc3-mm1 kernel, Andrew Morton [interview] included a brief note stating, "fwiw, I recently took a position with Google." He then linked to a Linux Today article which details the reasons behind his recent move. The article begins, "Andrew Morton has started working for a new company, but his day job as the Linux 2.6 kernel maintainer will remain exactly the same." In the article, Andrew discusses one of the reasons Google was a good fit, "in my position as kernel maintainer I feel that I should not be employed by a company which has a direct interest in the kernel.org kernel because this would put me in a position of making decisions which are commercially significant to my employer's competitors. As Google maintains their own kernel variant for internal use, their interests are largely decoupled from what happens in the kernel.org kernel."
The ongoing discussion about the Reiser4 filesystem [story] continues on the lkml. Jeff Garzik discussed the complexity introduced by a plugin layer [story], suggesting it is really a second VFS, "furthermore, it completely changes the notion of what a Linux filesystem is. Currently, each Linux filesystem is a tightly constrained set of metadata support. reiser4 changes 'tightly constrained' to 'infinity'. While that freedom is certainly liberating, it also has obvious support costs due to new admin paradigms and customer configuration possibilities."
Linux creator Linus Torvalds weighed in on the discussion, "as long as you call them 'plugins' and treat them as such, I (and I suspect a lot of other people) are totally uninterested, and in fact, a lot of people will suspect that the primary aim is to either subvert the kernel copyright rules, or at best to create a mess of incompatible semantics with no sane overlying rules for locking etc." He went on to add, "as far as I'm concerned, the problem with reiser4 is that it hasn't tried to work with the VFS people. Now, I realize that the main VFS people aren't always easy to work with (Al and Christoph, take a bow), but that doesn't really change the basic facts. Al in particular is _always_ right. I don't think I've ever had the cojones to argue with Al.."
Later in the same thread, Andrew Morton [interview] noted that he's currently reviewing the code, "meanwhile here's poor old me trying to find another four hours to finish reviewing the thing." Regarding the code he added, "the writeout code is ugly, although that's largely due to a mismatch between what reiser4 wants to do and what the VFS/MM expects it to do. If it works, we can live with it, although perhaps the VFS could be made smarter." He then suggested, "I'd say that reiser4's major problem is the lack of xattrs, acls and direct-io. That's likely to significantly limit its vendor uptake." As for the plugin debate, Andrew said, "the plugins appear to be wildly misnamed - they're just an internal abstraction layer which permits later feature additions to be added in a clean and safe manner. Certainly not worth all this fuss."
The question of if and when Reiser4 will be merged into the mainline Linux kernel has been an ongoing debate for a couple of years [story]. The filesystem was described as being "fairly stable for average users" by Hans Reiser [interview] over two years ago, in March of 2004 [story]. It has been merged into Andrew Morton [interview]'s -mm kernel [story], though issues such as Reiser4 plugins [story] and coding style [story] caused lengthy discussions last year. Two recent threads on the lkml raised the question again, asking at a non-technical level why Reiser4 has not been included in the Linux kernel. Some have offered theories that Reiser4 is being blocked for political reasons, others because of concerns that once Reiser4 is included Namesys might forget it and move on to another filesystem. Responses to these theories point out that in reality there are technical issues that must be resolved before the filesystem will be merged, and that much progress has been made toward this end. Additional discussion can be found on a relevant recently created kernel newbies wiki page.
Hans Reiser posted a "short term task list for Reiser4" to address the remaining technical issues. The todo list included getting batch_write merged into the -mm kernel [story], getting read optimization code merged into the -mm kernel, documenting everything in the Namesys wiki, exploring and addressing reports of system pauses when using Reiser4, a complete review of the crypt-compress code, a large effort in optimizing fsync, a review of installation instructions, and a review of the kernel documentation. Hans explains, "unfortunately, our code stability is going to decrease for a bit due to all these changes to the read and write code --- no way to cure that but passage of time. On the other hand, our CPU usage went way down. Reiser4's only performance weakness now is fsync. Once the crypt-compress code is ready, we will release Reiser4.1-beta (with plugins, releasing a beta means telling users that if they mount -o reiser4.1-beta then cryptcompress will be their default plugin, and if they don't, then they are using Reiser4.0 still). Doubling our performance and halving our disk usage is going to be fun."