"If you're wondering why I'm taking a long time to respond to your patches,", began Theodore Ts'o on the linux-ext4 mailing list, in a thread that offered much insight into how and why to properly submit and test patches. "Patches that are accepted into mainline should do one and only one thing," Ted continued, "so if someone suggests that you make changes to your submitted patch, ideally what you should do is to resubmit the patch with the fixes --- and not submit a patch which is a delta to the previous one." He also noted that patch submitters often greatly outnumber maintainers dictating a higher standard of quality, "consider that for some maintainers, there may be 10 or 20 or 30 or more patch submitters in their subsystem. With that kind of submitter-to-maintainer ratio, the patch submitter simply has to do much more of the work, since otherwise the subsystem maintainer simply can't keep up."
Ted went on to acknowledge, "I happen to believe that we need to encourage newcomers to the kernel developer community, and so I spend more time mentoring people who are new to the process." He noted that his time was finite however, and that patches are accepted more quickly when they are easy to review and integrate. Regarding the filesystem for which the patches had been submitted, he added, "Ext4 is actually quite stable at this point. Very large numbers of people are using it, and most users are quite happy." For this reason, he pointed out that it is even more critical that the patches merged be of high quality. That said, he continued, "there is no such thing as code which is not buggy. For any non-trivial program, it's almost certain there are bugs. [...] Ext4 is not exempt from these fundamental laws of software engineering. 'Code is always buggy until the last user of the program dies'." He tied this back to the importance of testing patches before submitting, "keep in mind that the maxim that code which is not buggy also applies to your patches."
Mel Gorman posted the seventh version of his Memory Compaction patches asking, "are there any further obstacles to merging?" The patches, first posted in May of 2007, provide a mechanism for moving GFP_MOVABLE pages into a smaller number of pageblocks, reducing externally fragmented memory. Mel explains that 'compaction' is another method of defragmenting memory, "for example, lumpy reclaim is a form of defragmentation as was slub 'defragmentation' (really a form of targeted reclaim). Hence, this is called 'compaction' to distinguish it from other forms of defragmentation."
The core compaction patch explains that memory is compacted in a zone by relocating movable pages towards the end of the zone:
"A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas and consumes the free pages within making them available for the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages."
We're using a lot of different kernel configurations on our servers running Gentoo Linux and Linux VServer. Kernels' configs depend on hardware and functions of every server. Every time new kernel is released and/or we need to enable/disable some features for all servers' kernels we have to update .config's manually.
"Or, we could just do the ugliest patch ever, namely
-#define pcibus_to_node(node) (-1) +#define pcibus_to_node(node) ((int)(long)(node),-1)
Wow. It's so ugly it's almost wraps around and comes out the other side and looks pretty."
"I bow down before you. I thought I had done some rather horrible things with gcc built-ins and macros, but I hereby hand over my crown to you. As my daughter would say: that patch fell out of the ugly tree, and hit every branch on the way down. Very impressive."
"'Too small' and 'too insignificant' is not a patch attribute that is in my vocabulary - by definition the best patches are very small and very insignificant (they just happen to end up doing something cool a 1000 steps later ;-) 99% of our problems come from 'too large' and 'too intrusive' patches."
"The __deprecated marker is quite useful in highlighting the remnants of old APIs that want removing. However, it is quite normal for one or more years to pass, before the (usually ancient, bitrotten) code in question is either updated or deleted," explained Jeff Garzik, posting a small patch to make it possible to silence "warning: 'foo' is deprecated" type messages.
Andrew Morton wasn't impressed, suggesting, "Sigh. Can't we just fix the dud code? Or mark it BROKEN and see what happens?" Linus Torvalds added, "I think removing __deprecated is the better option. Quite frankly, some people add '__deprecated' *way* too eagerly." Jeff agreed that
__deprecated is over used, "__deprecated has spread to just about every API that people don't consider fresh and up-to-date." He then added ,"like I noted in the patch description, rewriting grotty ISA/MCA/etc. probe code is a thankless, boring task that few are crazy enough to attempt :) As you can see from the patch flood recently I /have/ been working through the dud code, but it will still take years. The changes required for each are on average ~200 LOC changed, if not more."
Jeff Garzik posted a series of nine patchs to the lkml titled to "remove [the] 'irq' argument from all irq handlers", explaining, "the overwhelming majority of drivers do not ever bother with the 'irq' argument that is passed to each driver's irq handler. Of the minority of drivers that do use the arg, the majority of those have the irq number stored in their private-info structure somewhere." He noted that he had no intention to push the patches upstream anytime soon.
Feedback was entirely positive, with Thomas Gleixner suggesting, "Full ACK. We should do this right at the edge of -rc1. And let's do this right now in .24 and not drag it out for no good reason." Ingo Molnar concurred, "full ACK on the concept from me too. Please go ahead! :)" Eric Biederman noted that there was still work to be done, "the practical question is how do we make this change without breaking the drivers that use their irq argument." Jeff agreed, explaining why the code won't be pushed upstream during -rc1, "I am finding a ton of bugs in each get_irqfunc_irq() driver, so I would rather patiently sift through them, and push fixes and cleanups upstream. Once that effort is done, everything should be in the 'trivial' pile and not have the logic that you are worried about (and thus there would be no need to add an additional branch to the irq handling path)."
"We have seen ramdisk based install systems, where some pages of mapped libraries and programs were suddendly zeroed under memory pressure. This should not happen, as the ramdisk avoids freeing its pages by keeping them dirty all the time," Christian Borntraeger began, explaining the need for his small patch to the ramdisk driver. He continued, "it turns out that there is a case, where the VM makes a ramdisk page clean, without telling the ramdisk driver. On memory pressure shrink_zone runs and it starts to run shrink_active_list. There is a check for buffer_heads_over_limit, and if true, pagevec_strip is called. pagevec_strip calls try_to_release_page. If the mapping has no releasepage callback, try_to_free_buffers is called. try_to_free_buffers has now a special logic for some file systems to make a dirty page clean, if all buffers are clean. Thats what happened in our test case."
He provided two methods for duplicating the reported problem, "you have to make buffer_heads_over_limit true" This is done by either lowering
max_buffer_heads or having a system with lots of high memory. "The solution is to provide a noop-releasepage callback for the ramdisk driver. This avoids try_to_free_buffers for ramdisk pages."
"The following patch makes it possible to give kernel messages a selectable color which helps to distinguish it from other noise, such as boot messages," explained Jan Engelhardt, along with a 43 line patch to the
char driver. As justification for the patch he offered, "NetBSD has it, OpenBSD has it, FreeBSD to some extent, so I think Linux should too." He also noted that an earlier version of the small patch had been previously posted back in April.
Ingo Molnar responded favorably, "looks really good to me!" He went on to suggest, "feature request: would be interesting to have a color table (defined in the .config) dependent on message loglevel. That way KERN_CRIT messages could be red, KERN_INFO ones white, etc."
Greg KH posted three emails titled State of the Linux Driver Core Subsystem, State of the Linux USB Subsystem, and State of the Linux PCI Subsystem, noting that for each there were no known regressions then going on to list which patches were bound for the upcoming 2.6.24 kernel. Greg pointed out that there were a large number of open bugs against the USB subsystem, "yeah, there are way too many there, I've been really slack in trying to work through them. If anyone wants to help out, feel free :)" He continued:
"Note, there are over 100 patches in the USB queue, so I might have missed a few things in reviewing them by hand right now. If I failed to describe your patch that is already in the queue, and you feel it is important for everyone to know about, please feel free to add to the above list. I did not purposefully mean to exclude anything, merely try to summarize things"
"Here's a new version of my credentials patch. It's still very basic, with only Ext3, (V)FAT, NFS, AFS, SELinux and keyrings compiled in on an x86_64 arch kernel," stated David Howells. He described the patch as, "introduce a copy on write credentials record (struct cred). The fsuid, fsgid, supplementary groups list move into it (DAC security). The session, process and thread keyrings are reflected in it, but don't primarily reside there as they aren't per-thread and occasionally need to be instantiated or replaced by other threads or processes."
Casey Schaufler asked, "what I don't really understand is what value is gained by this exercise. Are the savings sufficiently significant to justify the effort?" Trond Myklebust explained, "it is not about savings, but about new functionality. Basically, the existence of reference-counted credentials will allow AFS and NFS to cache that information and use it for deferred writes etc." David added, "and also make it easier for cachefiles and hopefully NFSd to override the active security. There's a comment somewhere in, I think, the SunRPC code in the Linux kernel bemoaning the lack of this very feature:-)"
Randy Dunlap sent a patch to the Linux kernel mailing list described as adding "info about various email clients and their applicability in being used to send Linux kernel patches." The first revision generated quite a bit of discussion, quickly resulting in a second version, and eventually a third version that Andrew Morton referred to as "soon to be merged". In addition to some general suggestions about emailing patches, it also offers some specific configuration suggestions for a number of popular email clients. It begins:
"Patches for the Linux kernel are submitted via email, preferably as inline text in the body of the email. Some maintainers accept attachments, but then the attachments should have content-type 'text/plain'. However, attachments are generally frowned upon because it makes quoting portions of the patch more difficult in the patch review process.
"Email clients that are used for Linux kernel patches should send the patch text untouched. For example, they should not modify or delete tabs or spaces, even at the beginning or end of lines."
"This patch implements a new version of RCU which allows its read-side critical sections to be preempted," Paul McKenney said in a posting to the Linux Kernel mailing list. He described the RCU code contained in his 9 part patchset, "it uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit [the] read-side critical section." He continued:
"This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the 'classic' RCU implementations and can be switched to at compiler."
Ingo Molnar responded very favorably to the patch, "cool! Now after 2 years of development and testing I think this is one of the most mature patchsets on lkml - so i'd like to see this designated for potential upstream inclusion. I.e. everyone who can see some bug, please holler now."
"The cfs core has been enhanced since quite sometime now to understand task-groups and [to] provide fairness to such task-groups," began Srivatsa Vaddagiri, "what was needed was an interface for the administrator to define task-groups and specify group 'importance' in terms of its cpu share. The patch below adds such an interface."
Srivatsa requested that his patch be merged into Andrew Morton's -mm tree to receive more testing, "note that the load balancer needs more work, esp to deal with cases like 2-groups on 4-cpus, one group has 3 tasks and other having 4 tasks. We are working on some ideas, but nothing to share in the form of a patch yet. I felt sending this patch out would help folks start testing the feature and also improve it."