"In the kerneloops.org stats, a new oops is rapidly climbing the charts, began Arjan van de Ven, referring to his website where he automatically collects kernel oops and warning reports from mailing lists, bugzillas, and a special client. Regarding the latest oops, he noted, "the oops is a page fault in the ext3 'do_slit' function, and the first report of it was with 2.6.26-rc6-git3." Linux creator Linus Torvalds took a quick interest in the issue, observing that all the oopses seemed to be on the i686 architecture, suggesting, "could this perhaps be an indication that it is specific to i686 some way (eg a compiler issue?)"
Shortly before Linus sent out his emails, Dave Airlie confirmed that this was indeed a known compiler bug affecting GCC 4.3.1. The bug report notes, "any ext* filesystem which enables the dir_index feature is likely susceptible". Linus caught up on his email and lamented, "gaah. I should read all my email instead of wasting my time trying to match up the code with what I can reproduce.." The Red Hat bug report wasn't automatically picked up by the kerneloops website because the oops had been reported as a JPEG image, leading Arjan to quip, "maybe one day if I'm really bored I'll implement OCR into [kerneloops.org] ;)".
"My concern is that if there's something technological in the 'bleeding tree' that is so valuable to users that distros feel that it's ready 'enough' and that they need to pick it up for their users, we have a flaw in our processes in moving too slowly for users."
"The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas as well as with a client users can install to auto-submit oopses," began Arjan van de Ven, referring to a website first announced last December. He summarized, "this week, a total of 3670 oopses and warnings have been reported, compared to 3029 reports in the previous week." The 'kerneloops' client is available from the project's web page, and is now being included by multiple distributions. Arjan explains, "in addition to Fedora, Debian now has included the client application in their default GUI install targets, thanks a lot for that!" He went on to discuss some recent changes:
"This week, based on feedback, I've split the report into 'untainted' and 'caused by proprietary drivers'. Let me know if I should continue doing this or if the old format was better.
"As an experiment (on request) I've exported the database to text files (one file per report) and stuck it in a git repository. You can take a look with git clone git://www.kerneloops.org/ Suggestions for improving the format of this are obviously very welcome, as are 'yes useful' and 'no not useful' comments. Again, this is an experiment, if it's not seen as useful I may discontinue it."
Andrew Morton responded to a commit message making 4k stacks the default, saying, "this patch will cause kernels to crash." Ingo Molnar replied, "what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years." He added, "we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks." During the lengthy discussion, nfs+xfs+raid kernel configurations and the use of ndiswrapper were cited as the most common causes of overflowing a 4K stack.
Andi Kleen questioned the usefulness of 4k stacks, "as far as I can figure out they are not [a worthy goal]. They might have been a worthy goal on crappy 2.4 VMs, but these times are long gone." Arjan van de Ven suggested that though the 2.6 VM is much improved over the 2.4 VM, fragmentation with 8K stacks remains an unsolvable problem, "it's basic math; the Linux VM gets to deal with both short and long lasting allocations; no matter how hard you try to get some degree of fragmentation; especially due to the 15:1 acceleration you get due to the lowmem issue. And before you say 'you should use 64 bit on such machines'; I would love it if more people used 64 bit linux. Sadly the adoption rate of that is not very good still.... by far ;(" In another email, Arjan listed two advantages to 4K stacks, "1) less memory consumption in the lowmem zone (critical for enterprise use, also good for general performance), and 2) kernel stacks at 8K are one of the most prominent order-1 allocations in the kernel; again with big-memory systems the fragmentation of the lowmem zone is a problem (and the distros that ship 4K stacks went there because of customer complaints)".
"Slow servers, Skipping audio, Jerky video --everyone knows the symptoms of latency. But to know what's really going on in the system, what's causing the latency, and how to fix it... those are difficult questions without good answers right now," began Arjan van de Van, announcing version 0.1 of LatencyTop, "a tool for developers to visualize system latencies." He continued:
"LatencyTOP is a Linux tool for software developers (both kernel and userspace), aimed at identifying where system latency occurs, and what kind of operation/action is causing the latency to happen. By identifying this, developers can then change the code to avoid the worst latency hiccups.
"There are many types and causes of latency, and LatencyTOP focus on type that causes audio skipping and desktop stutters. Specifically, LatencyTOP focuses on the cases where the applications want to run and execute useful code, but there's some resource that's not currently available (and the kernel then blocks the process). This is done both on a system level and on a per process level, so that you can see what's happening to the system, and which process is suffering and/or causing the delays."
"This week, a total of 49 oopses and warnings have been reported, compared to 53 reports in the previous week," Arjan van de Ven noted, sending out a list of the week's top 10 kernel oopses. Al Viro suggested, "FWIW, people moaning about the lack of entry-level kernel work would do well by decoding those to the level of 'this place in this function, called from <here>, with so-and-so variable being <this>' and posting the results." This was met by multiple requests for documentation on how to actually decode an oops. Linus Torvalds explained:
"It's actually not necessarily at all that trivial, unless you have a deep understanding of the code generated for the architecture in question (and even then, some oopses take more time to figure out than others, thanks to inlining and tailcalls etc). If the oops happened with a kernel you generated yourself, it's usually rather easy. Especially if you said 'y' to the 'generate debugging info' question at configuration time."
Linus went on to detail how to debug a random oops reported on the lkml, "you will generally have to disassemble the hex sequence given in the oops (the 'Code:' line), and try to match it up against the source code to try to figure out what is going on." He then offered a number of tips on how this is best accomplished, followed by an example walking through one of the reported oopses. Al Viro replied describing his own methods, walking through another oops and isolating the bug.
"The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas," noted Arjan van de Ven, announcing the new website. He included a summary of the top 10 oopses collected in the past 7 days noting, "this is the first such report that I'm posting; Please let me know if this is useful or not."
Feedback was positive. Andrew Morton commented, "well that would have been fun to write." Steven Richter expressed some concern about the tool counting the same bug report multiple times when it is found in different places. Arjan agreed, "this is true however it's .. a hard issue. It's really hard to distinguish a duplicate report from two reports of the same bug." Another concern was separating oopses generated by 2.6.X-rcY kernels from those generated by 2.6.X-rcY-mmZ kernels. Arjan noted, "finding what exact kernel version an oops is from is... surprisingly hard. And to be honest, bugs against -mm are still very interesting, since they'll be the next mainline after all".
Adrian Bunk posted a patch to make Linux IO schedulers a non-modular option, which would require one IO scheduler to be selected at compile time. He suggested, "there isn't any big advantage and doesn't seem to be much usage of modular schedulers." He added that removing the option to make IO schedulers modular would save 2kB on each kernel image. Jens Axboe did not like the patch, "big nack, I use it all the time for testing. Just because you don't happen to use it is not a reason to remove it." When Adrian noted that no distros seemed to be making IO schedulers available as modules, Jens suggested that this was a mistake and quipped, "it's been a long time since I considered a distro .config a benchmark/guideline of any sort."
Adrian went on to ask for the technical reasons for continuing to support four different IO schedulers, expressing concern that it could lead to bugs in individual schedulers going unreported. Jens explained that he was aiming for the perfect IO scheduler, but at this time different IO schedulers offer better results for different workloads, "with some hard work and testing, we should be able to get rid of [the anticipatory scheduler]. It still beats cfq for some of the workloads that deadline is good at, so not quite yet." Arjan van de Ven offered, "there is at least one technical reason to need more than one: certain types of storage (both big EMC boxes as well as solid state disks) don't behave like disks and have no seek penalty; any cpu time spent on avoiding seeks is wasted on those, so for these devices one really wants to use a different IO scheduler, one which is much lighter weight". Jens then acknowledged, "there's always a risk with 'duplication', like several drivers for the same hardware. I'm not disputing that."
"I'd like to ask you to put a file in Documentation/ somewhere that describes what AppArmor's intended security protection is (it's different from SELinux for sure for example); by having such a document for each LSM user, end users and distros can make a more informed decision which module suits their requirements..." Arjan van de Ven suggested in an attempt to help focus future Linux Security Module discussions on technical issues. He explained, "it also makes it possible to look at the implementation to see if it has gaps to the intent, without getting into a pissing contest about which security model is better; but unless the security goals are explicitly described that's a trap that will keep coming back... so please spend some time on getting a good description going here.." Arjan continued:
"My main concern for now is a description of what it tries to protect against/in what cases you would expect to use it. THe reason for asking this explicitly is simple: Until now the LSM discussions always ended up in a nasty mixed up mess around disagreeing on the theoretical model of what to protect against and the actual implementation of the threat protection. The only way I can think of to get out of this mess is to have the submitter of the security model give a description of what his protection model is (and unless it's silly, not argue about that), and then only focus on how the code manages to achieve this model, to make sure there's no big gaps in it, within its own goals/reference."
"With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies," Arjan van de Ven reported, describing a "mass priority inversion" caused by, "an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is 'special' in that it does a lot of journal work". Finally, he offered a tiny patch to resolve the issue, "the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !"
Andrew Morton took a cautious stance, "seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to 'do more seeks to improve one test case'. Surely other testcases will worsen. What are they?" CFQ author Jens Axboe agreed, "It should not be merged as-is, instead I'll provide a function to do this." Ingo Molnar wasn't convinced, "atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this." He added, "if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??" Andrew calmly replied, "I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them."
When a Linux user reported a persistently high load average on an idle server and traced the problem to a specific patch titled "user of the jiffies rounding code", Andrew Morton replied, "this is unexpected. High load average is due to either a task chewing a lot of CPU time or a task stuck in uninterruptible sleep." Linus Torvalds disagreed, explaining:
"We saw high loadaverages with the timer bogosity with 'gettimeofday()' and 'select()' not agreeing, so they would do things like
'date = time(..); select(.. , timeout = );'and when 'date' wasn't taking the jiffies offset into account, and thus mixing these kinds of different time sources, the select ended up returning immediately because they effectively used different clocks, and suddenly we had some applications chewing up 30% CPU time, because they were in a loop that *tried* to sleep."
Linus offered what he described as an "idiotic patch" that keeps the load average from being calculated at exactly 5-second intervals, so that it cannot fall into lockstep with something else that wakes up every 5 seconds, noting, "the load average is not calculated every tick, because that's not just expensive, but we also want to have some time-based decay." Arjan van de Ven pointed out that this shouldn't help, "I mean, the load gets only updated in actual timer interrupts... and on a tickless system there's very few of those around..... and usually at places round_jiffies() already put a timer on." Linus agreed with this reasoning, suggesting, "maybe Anders' problem stems partly from the fact that he really is using the tweaks to make that tickless theory more true than it tends to be on most systems?" Arjan added that a lot of work had already gone into making tickless kernels wake up less often, "we fixed a TON of stuff over the last months.. standard desktops (F8 / next Ubuntu) will be around 10 wakeups/sec, in a lab environment you can get below 2 ;)"
James Bottomley announced the Linux Foundation Technical Advisory Board election results from September 5th, "sorry this has taken so long to get out ... I just, er, forgot." He noted that there were eight candidates. "Every candidate gave a nomination statement before the voting (with the three persons not present having their statements read to the meeting). We did single polling per position and had two rounds for a tie on the last candidate."
James then stated that the five people elected to the advisory board were Arjan van de Ven, Greg Kroah-Hartman, Christoph Lameter, Jon Corbet, and Olaf Kirch. The purpose of the advisory board was discussed earlier.
A short thread on the lkml discussed the lack of a memzero function in the Linux kernel. Cyrill Gorcunov asked, "could anyone tell me why there is no official memzero function (or macros) in the kernel?" Arjan van de Ven explained, "it doesn't add value.... memset with a constant 0 is just as fast (since the compiler knows it's 0) than any wrapper around it, and the syntax around it is otherwise the same." Linux creator Linus Torvalds went on to explain:
"The reason we have '
clear_page()' is not because the value we're writing is constant - that doesn't really help/change anything at all. We could have had a 'fill_page()' that sets the value to any random byte, it's just that zero is the only value that we really care about."So the reason we have '
clear_page()' is because the *size* and *alignment* is constant and known at compile time, and unlike the value you write, that actually matters. So 'memzero()' would never really make sense as anything but a syntactic wrapper around 'memset(x,0,size)'."
"Intel's Open Source Technology Center is pleased to announce the LessWatts.org project, an open source project for saving power on Linux," began an email posted to the lkml by Arjan van de Ven. The announcement continued:
"LessWatts.org is a place to bring users, developers and distribution makers together around power reduction for linux machines, from mobile to desktop to server to datacenter. LessWatts.org is about a system-level approach to power savings, from the lowest level device drivers in the kernel to the most advanced desktop applications. LessWatts.org is about things you can do to reduce power usage. LessWatts.org is about longer battery life, a lower airconditioning bill, about reducing the impact of computers on the environment."
The announcement went on to note, "at this time of launching the LessWatts.org project, the technology development projects are those that Intel has started, is involved in or has just started working on, such as PowerTOP, Tickless Idle, Graphics and various link power management techniques. We'd like to invite all developers and projects that focus on power saving to join the LessWatts.org effort and community."
"With all the tickless [story] and other goodies going into the kernel in the last few months, there is a lot of hope that this helps Linux reduce power consumption," Arjan van de Ven began on the lkml, "and the good news is that it does... once you fix some bugs and fix a bunch of userspace applications." He referred to a promising graph generated utilizing the recently introduced PowerTOP utility [story], measuring power consumption before and after applying a series of related bug fixes.
The tests began with a Lenovo T61 laptop running the stock 32-bit Fedora 7 kernel, which already includes tickless support. This was compared against a vanilla 2.6.22-rc4 kernel plus a series of improvements, including a fix for the Ondemand CPUFREQ governor, the new CPUIDLE infrastructure, the Active Link Power Management patch, disabling the laptop's TV-out capability, and using a command-line utility to properly reduce the laptop's backlight brightness. Arjan summarized, "with kernel fixes and features, the power consumption of this laptop went from 21.06 Watts to 18.25 Watts; with 2 additional userspace fixes the power consumption ended up at 15.5 Watts."
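Arjan's three measurements translate into relative savings as follows (simple arithmetic on the reported figures, rounded to one decimal place):

```latex
\begin{aligned}
\text{kernel fixes only: } & \frac{21.06 - 18.25}{21.06} = \frac{2.81}{21.06} \approx 13.3\% \\
\text{kernel and userspace fixes: } & \frac{21.06 - 15.5}{21.06} = \frac{5.56}{21.06} \approx 26.4\%
\end{aligned}
```

The two userspace fixes alone account for 18.25 - 15.5 = 2.75 W, almost as much as all of the kernel changes combined.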