bugs

Bugs And Bureaucracy

April 14, 2008 - 6:35pm
Submitted by Jeremy on April 14, 2008 - 6:35pm.
Linux news

A thread on the Linux Kernel mailing list discussed the process in place for reporting, bisecting and fixing bugs. In response to a suggestion that some of the issues could be solved by introducing new procedures, Al Viro retorted, "we've got ourselves a developing beaurocracy. As in 'more and more ways of generating activity without doing anything even remotely useful'. Complete with tendency to operate in the ways that make sense only to bureaucracy in question and an ever-growing set of bylaws..." Later in the thread, David Miller agreed and noted, "the resulting 'bureaucracy' or whatever you want to call it is perceived to undercut the very thing that makes the Linux kernel fun to work on. It's still largely free form, loose, and flexible. And that's a notable accomplishment considering how much things have changed. That feeling is why I got involved in the first place, and I know it's what gets other new people in and addicted too."

Andrew Morton tried to return the discussion to its original topic, "the problem we're discussing here is the apparently-large number of bugs which are in the kernel, the apparently-large number of new bugs which we're adding to the kernel, and our apparent tardiness in addressing them." Al argued that part of the problem is that git merges code so efficiently, "the patches going in during a merge (especially for a tree that collects from secondaries) are not visible enough. And it's too late at that point, since one has to do something monumentally ugly to get Linus revert a large merge. On the scale of Great IDE Mess in 2.5..." Another participant suggested replacing Bugzilla with something better, to which Andrew replied, "swapping out bugzilla for something else wouldn't help. We'd end up with lots of people ignoring a good bug tracking system just like they were ignoring a bad one."
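Bisecting, one of the steps the thread was about, is essentially a binary search over history: starting from a known-good and a known-bad commit, test the midpoint and discard half of the remaining range each time. The Python sketch below shows only that idea on a flat, chronological list of commits; the `first_bad` function, the `is_bad` predicate and the commit names are hypothetical, and the real `git bisect` walks a commit graph rather than a simple list.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and every commit after the culprit is also bad.
    Returns the culprit and how many commits had to be tested.
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):  # in practice: build, boot, try to reproduce
            bad = mid
        else:
            good = mid
    return commits[bad], tested


if __name__ == "__main__":
    # Hypothetical history of 1000 commits in which c613 introduced the regression.
    history = [f"c{i}" for i in range(1000)]
    culprit, tested = first_bad(history, lambda c: int(c[1:]) >= 613)
    print(culprit, tested)  # prints: c613 10
```

Because each test halves the remaining range, about ten build-and-test cycles are enough to search a thousand commits, which is why a reporter who bisects can save developers so much time.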

Quote: The Burden of Debugging

April 11, 2008 - 6:01pm
Submitted by Jeremy on April 11, 2008 - 6:01pm.

"The way I see it, the burden of debugging and fixing bugs is mainly on the developers of the code that breaks. You can't blame users for using the code, triggering bugs and then reporting the breakage. Users who report bugs are doing us all a great service regardless of their ability or willingness to do more work than just the initial report."

— Jesper Juhl, in an April 10th, 2008 message on the Linux Kernel mailing list.

Quote: Quality of the Bug Report

November 26, 2007 - 11:44pm
Submitted by Jeremy on November 26, 2007 - 11:44pm.

"This case is a good example to use the next time a stupid thread starts up about bug reports not being looked into. To me it seems clearly more a matter of the quality of the bug report."

— David Miller, in a November 15th, 2007 message on the Linux Kernel mailing list.

Quote: Fixing Bugs

November 18, 2007 - 7:23pm
Submitted by Jeremy on November 18, 2007 - 7:23pm.

"Five years ago I might have said that it's important to fix pre-existing bugs, but all the ACPI and suspend etc problems have long since convinced me that regressions are *much* more important than stuff that never worked."

— Linus Torvalds, in a November 15th, 2007 message on the Linux Kernel mailing list.

Bug Fixing and Kernel Code Quality

November 13, 2007 - 9:03am
Submitted by Jeremy on November 13, 2007 - 9:03am.
Linux news

"This is the listing of the open bugs that are relatively new, around 2.6.22 and up. They are vaguely classified by specific area," Natalie Protasevich said, posting a list of current bugs, each linked to the appropriate bugzilla.kernel.org entry. Andrew Morton reviewed the list, noting "no response from developers" against many of the bugs. David Miller pointed out that in some cases this wasn't true, citing 46 bug fixes queued in his networking tree and another 10 already pushed upstream, "when someone like me is bug fixing full time, I take massive offense to the impression you're trying to give especially when it's directed at the networking. So turn it down a notch Andrew." Andrew wasn't convinced, "first we need to work out whether we have a problem. If we do this, then we can then have a think about what to do about it. I tried to convince the 2006 KS attendees that we have a problem and I resoundingly failed. People seemed to think that we're doing OK." He continued:

"This is not a minor matter. If the kernel _is_ slowly deteriorating then this won't become readily apparent until it has been happening for a number of years. By that stage there will be so much work to do to get us back to an acceptable level that it will take a huge effort. And it will take a long time after that for the kerel to get its reputation back. So it is important that we catch deterioration *early* if it is happening."

Quote: Good For Development, Bad For Business

November 2, 2007 - 12:35am
Submitted by Jeremy on November 2, 2007 - 12:35am.

"Exposing bugs is good for development, bad for business."

— Rik van Riel, in an October 2nd, 2007 message on the Linux Kernel mailing list.

Compiler Optimization Bugs and World Domination

October 5, 2007 - 5:09am
Submitted by Jeremy on October 5, 2007 - 5:09am.
Linux news

A bug report filed by Ingo Molnar regarding a procfs crash in the recently released 2.6.23-rc9 kernel was quickly traced by Linus Torvalds to a compiler bug. The crash was ultimately attributed to a faulty optimization produced by an older version of GCC. Ingo was skeptical at first, "it's 4.0.2. Not the latest & greatest but I've been using it for 2 years and this would be the first time it miscompiles a 32-bit kernel out of tens of thousands of successful kernel bootups." Linus replied, "I am 100% sure. I can look at the disassembly, and point to the fact that your Oops happens on code that is simply totally bogus." He went on to walk through the crash, explaining line by line what code should have been generated and what was actually generated. In the end, Ingo switched to a distribution-compiled GCC 4.1.2 and confirmed that the crash went away, "so you are completely right, it's a compiler bug in 4.0.2."

During the thread, Linus suggested that the optimization made by the compiler wasn't "legal", to which Alan Cox retorted, "pedant: valid. Almost all optimizations are legal, nobody has yet written laws about compilers. Sorry but I'm forever fixing misuse of the word 'illegal' in printks, docs and the like and it gets annoying after a bit." Linus playfully responded, "heh. When I'm ruler of the universe, it *will* be illegal. I'm just getting a bit ahead of myself." When asked how long until he expected to be ruler, Linus added, "I'm working on it, I'm working on it. I'm just as frustrated as you are. It turns out to be a non-trivial problem."

Avoiding Unnecessary Delays

September 27, 2007 - 8:21pm
Submitted by Jeremy on September 27, 2007 - 8:21pm.
Linux news

"We don't want to introduce pointless delays in throttle_vm_writeout() when the writeback limits are not yet exceeded, do we?" asked Fengguang Wu in the description of his patch to mm/page-writeback.c. Andrew Morton replied, "this is a pretty major bugfix," explaining, "this patch has the potential to significantly alter the dynamics of the VM behaviour under particular workloads. It might turn up other stuff..." He continued:

"I wonder why nobody noticed this happening. Either a) it turns out that kswapd is doing a good job and such callers don't do direct reclaim much or b) nobody is doing any in-depth kernel instrumentation.

"Now, how _would_ one notice this problem? We don't have very good tools, really. Booting with 'profile=sleep' and looking at the profile data would be one way. Repeatedly doing sysrq-T is another. Perhaps the new lockstat-via-lockdep code would allow this to be observed in some fashion, dunno."
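One way a throttling loop can introduce the kind of pointless delay Wu describes is by sleeping before it checks whether the limit is exceeded at all. The Python simulation below is a hypothetical model of that pattern, not the kernel's `throttle_vm_writeout()` code; `FakeWriteback`, its `drain` rate, and both throttle functions are invented for illustration. The two loops behave identically when writeback is over the limit and differ only when it is under it, which also hints at why, as Andrew notes, such a delay can go unnoticed.

```python
from typing import Callable


def throttle_sleep_first(in_flight: Callable[[], int], limit: int,
                         sleep: Callable[[], None]) -> int:
    """Problematic pattern: always sleeps at least once, even when under the limit."""
    naps = 0
    while True:
        sleep()
        naps += 1
        if in_flight() <= limit:
            return naps


def throttle_check_first(in_flight: Callable[[], int], limit: int,
                         sleep: Callable[[], None]) -> int:
    """Sleeps only while the (hypothetical) writeback count exceeds the limit."""
    naps = 0
    while in_flight() > limit:
        sleep()
        naps += 1
    return naps


class FakeWriteback:
    """Toy model: each nap lets `drain` pages finish writeback."""

    def __init__(self, pages: int, drain: int) -> None:
        self.pages = pages
        self.drain = drain

    def count(self) -> int:
        return self.pages

    def nap(self) -> None:
        self.pages = max(0, self.pages - self.drain)


if __name__ == "__main__":
    for name, throttle in (("sleep-first", throttle_sleep_first),
                           ("check-first", throttle_check_first)):
        wb = FakeWriteback(pages=50, drain=10)  # already under the limit of 100
        print(name, throttle(wb.count, 100, wb.nap))
```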

Linux: Regressions In 2.6.22-git

July 19, 2007 - 1:49pm
Submitted by Jeremy on July 19, 2007 - 1:49pm.
Linux

Michal Piotrowski sent out an updated list of known regressions in the 2.6.22-git kernel.

OpenBSD: Intel Core 2 Bugs

June 30, 2007 - 4:37pm
Submitted by Jeremy on June 30, 2007 - 4:37pm.
OpenBSD news

Theo de Raadt described an active effort by OpenBSD developers to work around "serious bugs in Intel's Core 2 cpu". He went on to explain, "these processors are buggy as hell, and some of these bugs don't just cause development/debugging problems, but will *ASSUREDLY* be exploitable from userland code. As is typical, BIOS vendors will be very late providing workarounds / fixes for these processors bugs. Some bugs are unfixable and cannot be worked around. Intel only provides detailed fixes to BIOS vendors and large operating system groups. Open Source operating systems are largely left in the cold." He provided a link to the full errata (in PDF format) as well as a graphical overview, summarizing:

"Note that some errata like AI65, AI79, AI43, AI39, AI90, AI99 scare the hell out of us. Some of these are things that cannot be fixed in running code, and some are things that every operating system will do until about mid-2008, because that is how the MMU has always been managed on all generations of Intel/AMD/whoeverelse hardware. Now Intel is telling people to manage the MMU's TLB flushes in a new and different way. Yet even if we do so, some of the errata listed are unaffected by doing so.

As I said before, hiding in this list are 20-30 bugs that cannot be worked around by operating systems, and will be potentially exploitable. I would bet a lot of money that at least 2-3 of them are."

Linux: Introducing Bugs

June 20, 2007 - 6:15pm
Submitted by Jeremy on June 20, 2007 - 6:15pm.
Linux news

In another thread discussing the tracking of kernel regressions, Linux creator Linus Torvalds noted that the kernel is evolving so quickly that introducing bugs is inevitable. He used a git query to show that an average of over 65 patches are committed every single day, "that translates to five hundred commits a week, two _thousand_ commits per month, and 25 thousand commits per year. As a fairly constant stream. Will mistakes happen? Hell *yes*." He added, "and I'd argue that any flow that tries to 'guarantee' that mistakes don't happen is broken. It's a sure-fire way to just frustrate people, simply because it assumes a level of perfection in maintainers and developers that isn't possible." He then offered a number of calculations based on the number of lines of code added in the past 100 days, summarizing:

"Even by the most *stringent* reasonable rules, we add a new bug every four days. That's just something that people need to accept. The people who say 'we must never introduce a regression' aren't living on planet earth, they are living in some wonderful world of Blarney, where mistakes don't happen, developers are perfect, hardware is perfect, and maintainers always catch things."
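Linus's weekly, monthly and yearly figures follow from the daily rate by simple multiplication. The short Python check below reproduces them from the quoted 65 commits per day; the 30-day month is an assumption for illustration, and since the quoted rate is a lower bound ("over 65"), rounding the products up is reasonable. The last line combines his two quoted figures, taken at face value, into an approximate number of commits per new bug.

```python
PER_DAY = 65  # "an average of over 65 patches" committed per day, as quoted

per_week = PER_DAY * 7     # 455: Linus's "five hundred commits a week"
per_month = PER_DAY * 30   # 1950 with an assumed 30-day month: "two _thousand_"
per_year = PER_DAY * 365   # 23725: "25 thousand commits per year"

# "We add a new bug every four days" at 65 commits/day is roughly
# one new bug per this many commits:
commits_per_new_bug = PER_DAY * 4  # 260

if __name__ == "__main__":
    print(per_week, per_month, per_year, commits_per_new_bug)
```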

Linux: Rethinking Suspend and Resume

May 26, 2007 - 5:10am
Submitted by Jeremy on May 26, 2007 - 5:10am.
Linux news

What started as the review of a bug report grew into a lively debate when Linus Torvalds slammed the current suspend and resume design in the Linux kernel, "why the HELL cannot you realize that kernel threads are different? The right thing to do is AND HAS ALWAYS BEEN, to stop and start user threads only around the whole thing. Don't touch those kernel threads. Stop freezing them." Later in the discussion, Linus said that he had no interest in Suspend to Disk (STD) and cared only about a working Suspend to RAM (STR) implementation. He argued that complexity introduced by STD was leaking into the STR logic and that the two should be completely separated, "what irritates me is that STR really shouldn't have _had_ that bug at all. The only reason STR had the same bug as STD was exactly the fact that the two features are too closely inter-twined in the kernel. That irritates me hugely. We had a bug we should never had had! We had a bug because people are sharing code that shouldn't be shared! We had a bug because of code that makes no sense in the first place!" Linus added that he doesn't use laptops much, but still relies on STR for his desktop machines, "STR means they are quiet and don't waste energy when I don't use them, but they're instantly available when I care." He then pointed to design flaws in the freezer:

"I actually don't think that processes should be frozen really at all. I agree that filesystems have to be frozen (and I think that checkpointing of the filesystem or block device is 'too clever'), but I just don't think that has anything to do with freezing processes. So I'd actually much prefer to freeze at the VFS (and socket layers, etc), and make sure that anybody who tries to write or do something else that we cannot do until resuming, will just be blocked (or perhaps just buffered)!"
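Linus's alternative is to leave processes running and instead block operations at the VFS and socket layers until resume. The Python sketch below models that idea in user space with a `threading.Condition`; the `FreezeGate` class and its `freeze()`, `thaw()` and `write()` methods are hypothetical and do not correspond to any kernel interface. A writer that arrives while the gate is frozen is not frozen itself; it simply blocks inside `write()` until `thaw()` is called.

```python
import threading
from typing import List


class FreezeGate:
    """Hypothetical gate: while frozen, callers of write() block until thaw()."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frozen = False
        self.log: List[str] = []

    def freeze(self) -> None:
        with self._cond:
            self._frozen = True

    def thaw(self) -> None:
        with self._cond:
            self._frozen = False
            self._cond.notify_all()  # wake every writer blocked in write()

    def write(self, data: str) -> None:
        with self._cond:
            # wait_for() releases the lock while waiting and re-checks
            # the predicate after every wakeup.
            self._cond.wait_for(lambda: not self._frozen)
            self.log.append(data)


if __name__ == "__main__":
    gate = FreezeGate()
    gate.freeze()
    writer = threading.Thread(target=gate.write, args=("hello",))
    writer.start()
    writer.join(timeout=0.1)
    print("writer still blocked while frozen:", writer.is_alive())
    gate.thaw()
    writer.join()
    print("log after thaw:", gate.log)
```

Blocking at the gate keeps the decision in one place: nothing needs to know which threads are kernel threads or user threads, only which operations cannot proceed until resume.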

Linux: Compiler Warnings

May 24, 2007 - 7:00am
Submitted by Jeremy on May 24, 2007 - 7:00am.
Linux news

"In no case is it ok to just 'shut up the warning'," Linus Torvalds exclaimed in response to a patch that silenced a compiler warning. In a thread reminiscent of one on the lkml last year, Linus stressed that compiler warnings must be understood and properly fixed:

"Please, we do NOT fix compiler warnings without understanding the code! That's a sure way to just introduce _new_ bugs, rather than fix old ones. So please, please, please, realize that the compiler is _stupid_, and fixing warnings without understanding the code is bad.

"In this case, anybody who actually spends 5 seconds looking at the code should have realized that the warning is just another way of saying that the author of the code was on some bad drugs, and the warnings WERE BETTER OFF REMAINING! Because that code _should_ have warnings. Big fat warnings about incompetence!?"

Linux: Tracking Regressions

May 21, 2007 - 6:33am
Submitted by Jeremy on May 21, 2007 - 6:33am.
Linux news

The task of tracking regressions between kernel releases has been picked up by Michal Piotrowski, who maintains a "known regressions" wiki page at Kernel Newbies. The list is divided into sections and mailed out to the lkml after each release candidate.

Linux: Releasing With Known Regressions

April 27, 2007 - 3:32pm
Submitted by Jeremy on April 27, 2007 - 3:32pm.
Linux news

Following the release announcement of the 2.6.21 Linux kernel, Adrian Bunk noted that he no longer planned to track regressions. He explained, "if we would take 'no regressions' seriously, it might take 4 or 5 months between releases due to the lack of developer manpower for handling regressions. But that should be considered OK if avoiding regressions was considered more important than getting as quick as possible to the next two week regression-merge window."

Linus Torvalds disagreed with Adrian's view that increasing the length of the release cycle would improve stability, "regressions _increase_ with longer release cycles. They don't get fewer." He went on to add, "you are ignoring the reality of development. The reality is that you have to balance things. If you have a four-month release cycle, where three and a half months are just 'wait for reports to trickle in from testers', you simply won't get _anything_ done. People will throw their hands up in frustration and go somewhere else." He continued:

"Do you really think bugs get fixed faster just because there wasn't a release? Quite the reverse. Bugs get _found_ faster thanks to a release (simply because you tend to get more information thanks to more users), giving the stable people more information, causing the bugs to be able to be found and fixed _more_quickly_ in the stable release than if we had waited for four months to release 2.6.21."
