
David Miller

Bugs And Bureaucracy

April 14, 2008 - 6:35pm
Submitted by Jeremy on April 14, 2008 - 6:35pm.
Linux news

A thread on the Linux Kernel mailing list discussed the process in place for reporting, bisecting and fixing bugs. In response to a suggestion that some of the issues could be solved by introducing new procedures, Al Viro retorted, "we've got ourselves a developing beaurocracy. As in 'more and more ways of generating activity without doing anything even remotely useful'. Complete with tendency to operate in the ways that make sense only to bureaucracy in question and an ever-growing set of bylaws..." Later in the thread, David Miller agreed and noted that, "the resulting 'bureaucracy' or whatever you want to call it is perceived to undercut the very thing that makes the Linux kernel fun to work on. It's still largely free form, loose, and flexible. And that's a notable accomplishment considering how much things have changed. That feeling is why I got involved in the first place, and I know it's what gets other new people in and addicted too."

Andrew Morton tried to return the discussion to its original topic, "the problem we're discussing here is the apparently-large number of bugs which are in the kernel, the apparently-large number of new bugs which we're adding to the kernel, and our apparent tardiness in addressing them." Al suggested that part of the problem is that git merges code so efficiently, "the patches going in during a merge (especially for a tree that collects from secondaries) are not visible enough. And it's too late at that point, since one has to do something monumentally ugly to get Linus revert a large merge. On the scale of Great IDE Mess in 2.5..." Another participant suggested replacing Bugzilla with something better, to which Andrew replied, "swapping out bugzilla for something else wouldn't help. We'd end up with lots of people ignoring a good bug tracking system just like they were ignoring a bad one."

Quote: Not Investing the Necessary Time

April 14, 2008 - 12:34pm
Submitted by Jeremy on April 14, 2008 - 12:34pm.

"Every single argument you make that supports why you should not be investing the necessary time into the bug applies equally to the very developers you are so quick to quip at and want help from."

— David Miller, in an April 11th, 2008 message on the Linux Kernel mailing list.

Quote: A Fart In A Spacesuit

April 1, 2008 - 8:48am
Submitted by Jeremy on April 1, 2008 - 8:48am.

"About as cool as a fart in a spacesuit."

— David Miller, in a March 31st, 2008 message on the Linux Kernel mailing list.

Quote: Don't Panic

February 7, 2008 - 9:21am
Submitted by Jeremy on February 7, 2008 - 9:21am.

"It's ascii art I took it from someone's signature 12 years ago, it's meant to be the guy on the cover of some of the editions of the Hitchhikers Guide to the Galaxy by Douglas Adams. Don't Panic! :-)"

— David Miller, in a February 6th, 2008 message on the Linux Kernel mailing list.

Quote: Quality of the Bug Report

November 26, 2007 - 11:44pm
Submitted by Jeremy on November 26, 2007 - 11:44pm.

"This case is a good example to use the next time a stupid thread starts up about bug reports not being looked into. To me it seems clearly more a matter of the quality of the bug report."

— David Miller, in a November 15th, 2007 message on the Linux Kernel mailing list.

Quote: Glacial Development Pace

November 14, 2007 - 10:40pm
Submitted by Jeremy on November 14, 2007 - 10:40pm.

"Yeah, if you develop at the glacial pace Solaris does, don't add any features to cyclics or work on scalability improvements, sure it can be bug free and untouched for 6 years."

— David Miller, in a November 14th, 2007 message on the Linux Kernel mailing list.

Bug Fixing and Kernel Code Quality

November 13, 2007 - 9:03am
Submitted by Jeremy on November 13, 2007 - 9:03am.
Linux news

"This is the listing of the open bugs that are relatively new, around 2.6.22 and up. They are vaguely classified by specific area," Natalie Protasevich said, posting a current list of bugs, each linking to the corresponding bugzilla.kernel.org entry. Andrew Morton reviewed the list, noting "no response from developers" against many of the entries. David Miller pointed out that in some cases this wasn't true, citing 46 bug fixes queued in his networking tree and another 10 already pushed upstream, "when someone like me is bug fixing full time, I take massive offense to the impression you're trying to give especially when it's directed at the networking. So turn it down a notch Andrew." Andrew wasn't convinced, "first we need to work out whether we have a problem. If we do this, then we can then have a think about what to do about it. I tried to convince the 2006 KS attendees that we have a problem and I resoundingly failed. People seemed to think that we're doing OK." He continued:

"This is not a minor matter. If the kernel _is_ slowly deteriorating then this won't become readily apparent until it has been happening for a number of years. By that stage there will be so much work to do to get us back to an acceptable level that it will take a huge effort. And it will take a long time after that for the kerel to get its reputation back. So it is important that we catch deterioration *early* if it is happening."

Debugging Multiple CPUs

October 22, 2007 - 2:19pm
Submitted by Jeremy on October 22, 2007 - 2:19pm.
Linux news

"Sysrq-p is pretty useless unless you can force the keyboard interrupt and the spinning process onto the same CPU," noted Chuck Ebbert during a discussion about debugging tasks stuck in a running state. The <Alt><SysRq><p> key combination is a debugging aid that dumps to the console the registers and flags of whichever CPU handles the keypress interrupt. UltraSPARC maintainer David Miller replied, "yes, I find this a painful limitation too," adding:

"Sparc64 used to dump the registers on all active cpus for show_regs() via a cross-call, and this was incredibly useful. But I disabled that as soon as I started playing with Niagara because at 32 cpus and larger the output is just too voluminous to be useful."

David then suggested, "what might be appropriate is just to get a one-line program counter dump on every cpu via some new sysrq keystroke." Chuck recalled that the -mm kernel had carried a patch with similar functionality, "IIRC -mm had something like this but it was buggy because we were sending IPIs to each processor asking them to print their state. Maybe it would work if we had a way of making them dump their state to a memory location and then collected and printed it from the CPU that's handling the sysrq."
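Chuck's alternative (each CPU records its state to memory, and the CPU handling the sysrq collects and prints everything) can be illustrated with a user-space analogy. The sketch below is only that: threads stand in for CPUs, a `threading.Barrier` stands in for whatever synchronization a real kernel implementation would need, and the `collect_pc_snapshot` name and the program counter values are hypothetical. It is not how the -mm patch or any kernel code works.

```python
import threading


def collect_pc_snapshot(pcs):
    """Toy model of a one-line-per-CPU program counter dump.

    pcs[i] stands in for the program counter of simulated CPU i (hypothetical
    values). Each "CPU" writes only to its own slot; nothing is printed until
    the collecting "CPU" has every slot, so output lines never interleave.
    """
    slots = [None] * len(pcs)
    # One party per simulated CPU, plus one for the collector.
    all_recorded = threading.Barrier(len(pcs) + 1)

    def record_state(cpu):
        slots[cpu] = pcs[cpu]  # dump state to memory, not to the console
        all_recorded.wait()

    workers = [threading.Thread(target=record_state, args=(cpu,))
               for cpu in range(len(pcs))]
    for w in workers:
        w.start()
    all_recorded.wait()  # collector blocks until every CPU has recorded
    for w in workers:
        w.join()
    # Only the collector formats output, one line per CPU.
    return [f"CPU{cpu}: pc={pc:#x}" for cpu, pc in enumerate(slots)]


if __name__ == "__main__":
    for line in collect_pc_snapshot([0xffffffff8100a0b0, 0xffffffff81234560]):
        print(line)
```

The design point being illustrated is that no simulated CPU ever prints; only the collector formats the output, which sidesteps the interleaving that made cross-call printing noisy.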

Networking 2.6.24 Merge Plans

October 3, 2007 - 7:31am
Submitted by Jeremy on October 3, 2007 - 7:31am.
Linux news

"I'm a bit behind after investigating the TCP performance issues that turned out to be HW specific problems. It's a bit of a disappointment, I thought maybe there was a cool bug to fix in TCP :-)" explained David Miller, posting his networking merge plans for the upcoming 2.6.24 kernel. He noted, "I merged in Jeff Garzik's and John Linville's latest and I'm running the current tree on my workstation most of today with good results so far." David added, "I plan to commit my Neptune driver in it's current state, and that's the last new feature going in."

In a discussion on the netdev mailing list the previous month, David described how many changes were in his net-2.6.24 git tree, "it's to the point where every single bug fix put into Linus's tree creates a merge conflict with net-2.6.24, we are simply touching that much stuff. :-)" He added, "we've touched so much in net-2.6.24 that we really should be auditing the thing and fixing any bugs that have been added. If you're bored and looking for something to do, pick an odd NAPI driver and audit it in the net-2.6.24 tree."

Connect Specification versus Man Page

September 20, 2007 - 9:48pm
Submitted by Jeremy on September 20, 2007 - 9:48pm.
Linux news

Ulrich Drepper noted a discrepancy between the Linux connect(2) man page and the POSIX specification. The former states, "connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC." The latter reads, "if address is a null address for the protocol, the socket's peer address shall be reset." Ulrich said he preferred the man page's description, but the Linux kernel appeared to follow the POSIX specification instead, "is this functionality which got lost over time? Or is the man page wrong and this never was the case? Is this a worthwhile change?"

Alan Cox noted, "we got it from the 1003.4g draft socket specification if I remember rightly." David Miller suggested, "the whole AF_UNSPEC thing I'm almost certain comes from BSD, which has behaved that way for centuries." Alan concurred, "its entirely plausible that [the 1003.4g draft socket specification] got it from [4BSD]." Ulrich concluded, "I guess I'll just go ahead and file a problem report with the spec. Maybe the Unix vendors will test their implementations and provide feedback."
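The man page behavior Ulrich quoted can be probed from user space. The sketch below is Linux-only: it hard-codes the Linux `struct sockaddr` layout and calls libc `connect()` through `ctypes`, because Python's `socket.connect()` will not accept a bare AF_UNSPEC address on an AF_INET socket. The peer address `127.0.0.1:9` is an arbitrary choice, and the helper names are hypothetical. Whether the association is actually dissolved depends on the kernel it runs on, so treat the result as a probe of exactly the question raised in the thread, not as a statement of what any kernel version does.

```python
import ctypes
import os
import socket


# Assumption: Linux layout of struct sockaddr (2-byte sa_family, 14 bytes of
# sa_data). BSD systems prepend an sa_len byte, so this does not port as-is.
class SockAddr(ctypes.Structure):
    _fields_ = [("sa_family", ctypes.c_ushort),
                ("sa_data", ctypes.c_char * 14)]


# CDLL(None) exposes symbols already loaded into the process, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.connect.argtypes = [ctypes.c_int, ctypes.POINTER(SockAddr), ctypes.c_uint32]
_libc.connect.restype = ctypes.c_int


def has_peer(sock):
    """True if the kernel currently reports a peer address for the socket."""
    try:
        sock.getpeername()
        return True
    except OSError:
        return False


def dissolve_association(sock):
    """connect() to an address whose sa_family is AF_UNSPEC, as connect(2) describes."""
    addr = SockAddr(sa_family=int(socket.AF_UNSPEC))  # sa_data stays zeroed
    if _libc.connect(sock.fileno(), ctypes.byref(addr), ctypes.sizeof(addr)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def probe():
    """Return (peer set after connect, peer still set after AF_UNSPEC connect)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # For a datagram socket, connect() only records a default peer; no
        # packet is sent, so nothing needs to be listening on port 9.
        s.connect(("127.0.0.1", 9))
        before = has_peer(s)
        dissolve_association(s)
        after = has_peer(s)
    return before, after


if __name__ == "__main__":
    before, after = probe()
    print(f"peer set after connect: {before}; after AF_UNSPEC connect: {after}")
```

If the second value comes back `False`, the running kernel honors the man page's AF_UNSPEC convention; the POSIX "null address" variant would need a separate probe.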

Linux: Introducing The Data Corruption Bug

January 4, 2007 - 7:04pm
Submitted by Jeremy on January 4, 2007 - 7:04pm.
Linux news

While the data corruption bug fixed in 2.6.20-rc3 was still being tracked down, it was suspected that the bug, a race in shared mmap'ed page writeback, might have been in the 2.6 kernel for a very long time. It has since been determined that the bug was introduced much more recently. Nick Piggin explains, "this bug was only introduced in 2.6.19, due to a change that caused pte dirty bits to be discarded without a subsequent set_page_dirty() (nowhere else in the kernel should have done this)." Linus Torvalds noted that earlier kernels could have been affected by a less serious version of the bug:

"Actually, I think 2.6.18 may have a subtle variation on it. But that much older race would only trigger on SMP (or possibly UP with preempt). And I haven't actually thought about it that much, so I could be full of crap. But I don't see anything that protects against it: we may hold the page lock, but since the code that marks things _dirty_ doesn't necessarily always hold it, that doesn't help us. And we may hold the 'private_lock', but we drop it before we do the dirty bit clearing, and in fact on UP+PREEMPT that very dropping could cause an active preemption to take place, so.. I dunno. For older kernels? If there is a race there, it must be pretty damn hard to hit in practice (and it must have been there for a looong time), so trying to fix it is possibly as likely to cause problems as it migh to fix them."

David Miller pointed out that some of the confusion over when the bug was introduced stems from the fact that the original report was against a 2.6.18 Debian kernel. Andrew Morton explained, "that was 2.6.18+debian-added-dirty-page-tracking-patches," then cautioned that it is not yet known whether the fix addresses a newly reported and still unconfirmed BerkeleyDB corruption bug, "I'll assert (and emphasise) that the cause of the alleged BerkeleyDB corruption is not known at this time. The post-2.6.19 'fix' might make it go away. But if it does, we do not know why, and it might still be there, only harder to hit."
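The invariant Nick describes (a dirty bit cleared from a pte must be carried over to the page via set_page_dirty(), or the modification is never written back) can be shown with a deliberately simplified, single-threaded model. Everything here is hypothetical: `Page`, the two clearing functions and `writeback` are toy stand-ins, not kernel APIs. The real bug was a race in shared mmap writeback, which this sequential sketch does not reproduce; it only shows why discarding the dirty information loses data.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy page: the dirty bit in its (single) mapping pte plus the page's own dirty flag."""
    pte_dirty: bool = False
    page_dirty: bool = False
    data: str = "old"
    on_disk: str = "old"


def user_write(page, value):
    page.data = value
    page.pte_dirty = True  # hardware marks the pte dirty on a store


def clear_pte_dirty_buggy(page):
    page.pte_dirty = False  # dirty information is simply discarded


def clear_pte_dirty_fixed(page):
    if page.pte_dirty:
        page.pte_dirty = False
        page.page_dirty = True  # carry it over, in the spirit of set_page_dirty()


def writeback(page):
    # Only a page whose dirty state is still visible gets written out.
    if page.page_dirty or page.pte_dirty:
        page.on_disk = page.data
        page.page_dirty = False


def run(clear):
    """Write through the mapping, clear the pte dirty bit, then write back."""
    page = Page()
    user_write(page, "new")
    clear(page)
    writeback(page)
    return page.on_disk


if __name__ == "__main__":
    print("buggy:", run(clear_pte_dirty_buggy))
    print("fixed:", run(clear_pte_dirty_fixed))
```

In the buggy path, writeback finds nothing dirty and the on-disk copy keeps the stale contents, which is the shape of the corruption users reported.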
