Linux: Introducing Bugs

Submitted by Jeremy on June 20, 2007 - 6:15pm.
Linux news

In another thread discussing the tracking of kernel regressions [story], Linux creator Linus Torvalds noted that the kernel is evolving so quickly that it is inevitable bugs will be introduced. He used a git query to determine that an average of over 65 patches are committed every single day: "that translates to five hundred commits a week, two _thousand_ commits per month, and 25 thousand commits per year. As a fairly constant stream. Will mistakes happen? Hell *yes*." He went on to add, "and I'd argue that any flow that tries to 'guarantee' that mistakes don't happen is broken. It's a sure-fire way to just frustrate people, simply because it assumes a level of perfection in maintainers and developers that isn't possible." He then offered a number of calculations based on the number of lines of code added in the past 100 days, summarizing:

"Even by the most *stringent* reasonable rules, we add a new bug every four days. That's just something that people need to accept. The people who say 'we must never introduce a regression' aren't living on planet earth, they are living in some wonderful world of Blarney, where mistakes don't happen, developers are perfect, hardware is perfect, and maintainers always catch things."

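The rounded commit figures quoted above can be checked with quick arithmetic. This is only a sketch: it assumes a flat rate of exactly 65 commits per day (the article says "over 65", so the results are lower bounds) and a 30-day month, neither of which comes from the original mail.

```python
# Assumption: a flat 65 commits/day, the lower bound quoted in the article.
COMMITS_PER_DAY = 65


def projected_commits(days: int, per_day: int = COMMITS_PER_DAY) -> int:
    """Project a constant daily commit rate over a number of days."""
    return days * per_day


per_week = projected_commits(7)     # 455, quoted as "five hundred"
per_month = projected_commits(30)   # 1950, quoted as "two thousand"
per_year = projected_commits(365)   # 23725, quoted as "25 thousand"
```

The quoted numbers are therefore round-ups of a slightly higher real rate, not exact counts.
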

From: Adrian Bunk [email blocked]
Subject: How to improve the quality of the kernel?
Date:	Sun, 17 Jun 2007 16:29:50 +0200

On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> On 17/06/07, Adrian Bunk [email blocked] wrote:
>...
>> Fine with me, but:
>>
>> There are not so simple cases like big infrastructure patches with
>> 20 other patches in the tree depending on it causing a regression, or
>> even worse, a big infrastructure patch exposing a latent old bug in some
>> completely different area of the kernel.
>
> It is different case.
>
> "If the patch introduces a new regression"
>
> introduces != exposes an old bug

My remark was meant as a note "this sentence can't handle all 
regressions" (and for a user it doesn't matter whether a new 
regression is introduced or an old regression exposed).

It could be we simply agree on this one.  ;-)

> Removal of 20 patches will be painful, but sometimes you need to
> "choose minor evil to prevent a greater one" [1].
> 
>> And we should be aware that reverting is only a workaround for the real
>> problem which lies in our bug handling.
>...

And this is something I want to emphasize again.

How can we make any progress with the real problem and not only the 
symptoms?

There's now much money in the Linux market, and the kernel quality 
problems might result in real costs in the support of companies like
IBM, SGI, Redhat or Novell (plus it harms the Linux image which might 
result in lower revenues).

If [1] this is true, it might even pay off for them to each assign 
X man hours per month of experienced kernel developers to upstream 
kernel bug handling?

This is just a wild thought and it might be nonsense - better 
suggestions for solving our quality problems would be highly welcome...

cu
Adrian

[1] note that this is an "if"

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


From: Bartlomiej Zolnierkiewicz [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Sun, 17 Jun 2007 20:53:41 +0200

Hi,

On Sunday 17 June 2007, Adrian Bunk wrote:
> On Sun, Jun 17, 2007 at 03:17:58PM +0200, Michal Piotrowski wrote:
> > On 17/06/07, Adrian Bunk [email blocked] wrote:
> >...
> >> Fine with me, but:
> >>
> >> There are not so simple cases like big infrastructure patches with
> >> 20 other patches in the tree depending on it causing a regression, or
> >> even worse, a big infrastructure patch exposing a latent old bug in some
> >> completely different area of the kernel.
> >
> > It is different case.
> >
> > "If the patch introduces a new regression"
> >
> > introduces != exposes an old bug
>
> My remark was meant as a note "this sentence can't handle all
> regressions" (and for a user it doesn't matter whether a new
> regression is introduced or an old regression exposed).
>
> It could be we simply agree on this one.  ;-)
>
> > Removal of 20 patches will be painful, but sometimes you need to
> > "choose minor evil to prevent a greater one" [1].
> >
> >> And we should be aware that reverting is only a workaround for the real
> >> problem which lies in our bug handling.
> >...
>
> And this is something I want to emphasize again.
>
> How can we make any progress with the real problem and not only the
> symptoms?
>
> There's now much money in the Linux market, and the kernel quality
> problems might result in real costs in the support of companies like
> IBM, SGI, Redhat or Novell (plus it harms the Linux image which might
> result in lower revenues).
>
> If [1] this is true, it might even pay off for them to each assign
> X man hours per month of experienced kernel developers to upstream
> kernel bug handling?
>
> This is just a wild thought and it might be nonsense - better
> suggestions for solving our quality problems would be highly welcome...

IMO we should concentrate more on preventing regressions than on fixing them.
In the long-term preventing bugs is cheaper than fixing them afterwards.

First let me tell you all a little story...

Over two years ago I've reviewed some _cleanup_ patch and noticed three bugs
in it (in other words I potentially prevented three regressions).  I also
asked for more thorough verification of the patch as I suspected that it may
have more problems.  The author fixed the issues and replied that he hasn't
done the full verification yet but he doesn't suspect any problems...

Fast forward...

Year later I discover that the final version of the patch hit the mainline.
I don't remember ever seeing the final version in my mailbox (there are no
cc: lines in the patch description) and I saw that I'm not credited in the
patch description.  However the worse part is that it seems that the full
verification has never been done.  The result?  Regression in the release
kernel (exactly the issue that I was worried about) which required three
patches and over a month to be fixed completely.  It seems that a year
was not enough to get this ~70k _cleanup_ patch fully verified and tested
(it hit -mm soon before being merged)...

From reviewer's POV: I have invested my time into review, discovered real
issues and as a reward I got no credit at all and extra frustration from the
fact that part of my review was forgotten/ignored (the part which resulted in
real regression in the release kernel)...  Oh and in the past the said
developer has already been asked (politely in private message) to pay more
attention to his changes (after I silently fixed some other regression caused
by his other patch).

But wait there is more, I happened to be the maintainer of the subsystem which
got directly hit by the issue and I was getting bug reports from the users about
the problem... :-)

It wasn't my first/last bad experience as a reviewer... finally I just gave up
on reviewing other people's patches unless they are strictly for IDE subsystem.

The moral of the story is that currently it just doesn't pay off to do
code reviews.  From personal POV it pays much more to wait until buggy patch
hits the mainline and then fix the issues yourself (at least you will get
some credit).  To change this we should put more emphasis on the importance
of code reviews by "rewarding" people investing their time into reviews
and "rewarding" developers/maintainers taking reviews seriously.

We should credit reviewers more, sometimes it takes more time/knowledge to
review the patch than to make it so getting near to zero credit for review
doesn't sound too attractive.  Hmm, wait it can be worse - your review
may be ignored... ;-)

From my side I think I'll start adding less formal "Reviewed-by" to IDE
patches even if the review resulted in no issues being found (in addition to
explicit "Acked-by" tags and crediting people for finding real issues - which
I currently always do as a way for showing my appreciation for their work).

I also encourage other maintainers/developers to pay more attention to
adding "Acked-by"/"Reviewed-by" tags and crediting reviewers.  I hope
that maintainers will promote changes that have been reviewed by others
by giving them priority over other ones (if the changes are on more-or-less
the same importance level of course, you get the idea).

Now what to do with people who ignore reviews and/or have rather high
regressions/patches ratio?

I think that we should have info about regressions integrated into SCM,
i.e. in git we should have optional "fixes-commit" tag and we should be
able to do some reverse data collection.  This feature combined with
"Author:" info after some time should give us some very interesting
statistics (Top Ten "Regressors").  It wouldn't be ideal (ie. we need some
patches threshold to filter out people with 1 patch and >= 1 regression(s),
we need to remember that some code areas are more difficult than the others
and that patches are not equal per se etc.) however I believe that making it
into Top Ten "Regressors" should give the winners some motivation to improve
their work ethic.  Well, in the worst case we would just get some extra
trivial/documentation patches. ;-)

Sorry for a bit chaotic mail but I hope that message is clear.

Thanks,
Bart


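The "reverse data collection" described in the mail above could be sketched roughly as follows. Everything here is an illustrative assumption: the `fixes-commit:` trailer is only the tag proposed in the mail (not an existing git feature), and the `Commit` record, the `regression_ratios` helper and the `min_patches` threshold are hypothetical names. Commits are passed in as plain data, so no particular git interface is implied.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical trailer proposed in the thread; git has no such built-in tag.
FIXES_RE = re.compile(
    r"^fixes-commit:[ \t]*([0-9a-f]{7,40})[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    message: str


def regression_ratios(commits, min_patches=5):
    """Return (author, regressions, patches, ratio) rows, worst ratio first.

    A commit counts as a regression once, however many later commits
    name it in a fixes-commit trailer. Authors with fewer than
    min_patches commits are filtered out, as the mail suggests.
    """
    by_sha = {c.sha: c for c in commits}
    patches = Counter(c.author for c in commits)

    regressed = set()
    for commit in commits:
        for sha in FIXES_RE.findall(commit.message):
            if sha in by_sha:  # assumption: trailers use the same sha form
                regressed.add(sha)

    regressions = Counter(by_sha[sha].author for sha in regressed)
    rows = [
        (author, regressions[author], count, regressions[author] / count)
        for author, count in patches.items()
        if count >= min_patches
    ]
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows
```

As the mail itself cautions, such a ranking ignores how fragile the touched code is, so a table like this would be a starting point for discussion and not a verdict.
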
From: Andrew Morton [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Sun, 17 Jun 2007 11:52:58 -0700

On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz [email blocked] wrote:

> IMO we should concentrate more on preventing regressions than on fixing them.
> In the long-term preventing bugs is cheaper than fixing them afterwards.
>
> First let me tell you all a little story...
>
> Over two years ago I've reviewed some _cleanup_ patch and noticed three bugs
> in it (in other words I potentially prevented three regressions).  I also
> asked for more thorough verification of the patch as I suspected that it may
> have more problems.  The author fixed the issues and replied that he hasn't
> done the full verification yet but he doesn't suspect any problems...
>
> Fast forward...
>
> Year later I discover that the final version of the patch hit the mainline.
> I don't remember ever seeing the final version in my mailbox (there are no
> cc: lines in the patch description) and I saw that I'm not credited in the
> patch description.  However the worse part is that it seems that the full
> verification has never been done.  The result?  Regression in the release
> kernel (exactly the issue that I was worried about) which required three
> patches and over a month to be fixed completely.  It seems that a year
> was not enough to get this ~70k _cleanup_ patch fully verified and tested
> (it hit -mm soon before being merged)...

crap.  Commit ID, please ;)

> From reviewer's POV: I have invested my time into review, discovered real
> issues and as a reward I got no credit at all and extra frustration from the
> fact that part of my review was forgotten/ignored (the part which resulted in
> real regression in the release kernel)...  Oh and in the past the said
> developer has already been asked (politely in private message) to pay more
> attention to his changes (after I silently fixed some other regression caused
> by his other patch).
>
> But wait there is more, I happened to be the maintainer of the subsystem which
> got directly hit by the issue and I was getting bug reports from the users about
> the problem... :-)
>
> It wasn't my first/last bad experience as a reviewer... finally I just gave up
> on reviewing other people's patches unless they are strictly for IDE subsystem.
>
> The moral of the story is that currently it just doesn't pay off to do
> code reviews.

I dunno.  I suspect (hope) that this was an exceptional case, hence one
should not draw general conclusions from it.  It certainly sounds very bad.

> From personal POV it pays much more to wait until buggy patch
> hits the mainline and then fix the issues yourself (at least you will get
> some credit).  To change this we should put more emphasis on the importance
> of code reviews by "rewarding" people investing their time into reviews
> and "rewarding" developers/maintainers taking reviews seriously.
>
> We should credit reviewers more, sometimes it takes more time/knowledge to
> review the patch than to make it so getting near to zero credit for review
> doesn't sound too attractive.  Hmm, wait it can be worse - your review
> may be ignored... ;-)
>
> From my side I think I'll start adding less formal "Reviewed-by" to IDE
> patches even if the review resulted in no issues being found (in addition to
> explicit "Acked-by" tags and crediting people for finding real issues - which
> I currently always do as a way for showing my appreciation for their work).

yup, Reviewed-by: is good and I do think we should start adopting it,
although I haven't thought through exactly how.

On my darker days I consider treating a Reviewed-by: as a prerequisite for
merging.  I suspect that would really get the feathers flying.

> I also encourage other maintainers/developers to pay more attention to
> adding "Acked-by"/"Reviewed-by" tags and crediting reviewers.  I hope
> that maintainers will promote changes that have been reviewed by others
> by giving them priority over other ones (if the changes are on more-or-less
> the same importance level of course, you get the idea).
>
> Now what to do with people who ignore reviews and/or have rather high
> regressions/patches ratio?

Ignoring a review would be a wildly wrong thing to do.  It's so unusual
that I'd be suspecting a lost email or an i-sent-the-wrong-patch.

As for high regressions/patches ratio: that'll be hard to calculate and
tends to be dependent upon the code which is being altered rather than who
is doing the altering: some stuff is just fragile, for various reasons.

One ratio which we might want to have a think about is the patches-sent
versus reviews-done ratio ;)

> I think that we should have info about regressions integrated into SCM,
> i.e. in git we should have optional "fixes-commit" tag and we should be
> able to do some reverse data collection.  This feature combined with
> "Author:" info after some time should give us some very interesting
> statistics (Top Ten "Regressors").  It wouldn't be ideal (ie. we need some
> patches threshold to filter out people with 1 patch and >= 1 regression(s),
> we need to remember that some code areas are more difficult than the others
> and that patches are not equal per se etc.) however I believe that making it
> into Top Ten "Regressors" should give the winners some motivation to improve
> their work ethic.  Well, in the worst case we would just get some extra
> trivial/documentation patches. ;-)

We of course do want to minimise the amount of overhead for each developer.
I'm a strong believer in specialisation: rather than requiring that *every*
developer/maintainer integrate new steps in their processes it would be
better to allow them to proceed in a close-to-usual fashion and to provide
for a specialist person (or team) to do the sorts of things which you're
thinking about.


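The "patches-sent versus reviews-done" ratio mentioned in the mail above could be tallied from commit messages along these lines. This is a sketch under stated assumptions: reviews are taken to be recorded as `Reviewed-by:` trailers (the convention the thread is only proposing), people are identified by the literal trailer string, and `sent_vs_reviewed` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: reviews are recorded as "Reviewed-by: <person>" trailer lines.
REVIEWED_BY_RE = re.compile(
    r"^[ \t]*Reviewed-by:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def sent_vs_reviewed(commits):
    """Map each person to a (patches_sent, reviews_done) pair.

    commits is an iterable of (author, message) pairs. A Reviewed-by
    line naming the commit's own author is not counted as a review.
    """
    sent = Counter()
    reviewed = Counter()
    for author, message in commits:
        sent[author] += 1
        for reviewer in set(REVIEWED_BY_RE.findall(message)):
            if reviewer != author:
                reviewed[reviewer] += 1
    people = sorted(set(sent) | set(reviewed))
    return {person: (sent[person], reviewed[person]) for person in people}
```

Returning both counts, not a single quotient, avoids dividing by zero for people who only review and leaves the choice of ratio to whoever reads the table.
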
From: Bartlomiej Zolnierkiewicz [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Sun, 17 Jun 2007 23:49:08 +0200

On Sunday 17 June 2007, Andrew Morton wrote:
> On Sun, 17 Jun 2007 20:53:41 +0200 Bartlomiej Zolnierkiewicz [email blocked] wrote:
>
> > IMO we should concentrate more on preventing regressions than on fixing them.
> > In the long-term preventing bugs is cheaper than fixing them afterwards.
> >
> > First let me tell you all a little story...
> >
> > Over two years ago I've reviewed some _cleanup_ patch and noticed three bugs
> > in it (in other words I potentially prevented three regressions).  I also
> > asked for more thorough verification of the patch as I suspected that it may
> > have more problems.  The author fixed the issues and replied that he hasn't
> > done the full verification yet but he doesn't suspect any problems...
> >
> > Fast forward...
> >
> > Year later I discover that the final version of the patch hit the mainline.
> > I don't remember ever seeing the final version in my mailbox (there are no
> > cc: lines in the patch description) and I saw that I'm not credited in the
> > patch description.  However the worse part is that it seems that the full
> > verification has never been done.  The result?  Regression in the release
> > kernel (exactly the issue that I was worried about) which required three
> > patches and over a month to be fixed completely.  It seems that a year
> > was not enough to get this ~70k _cleanup_ patch fully verified and tested
> > (it hit -mm soon before being merged)...
>
> crap.  Commit ID, please ;)

Will send in pm.  I don't want to reveal the "guilty" person's identity
in public.

> > From reviewer's POV: I have invested my time into review, discovered real
> > issues and as a reward I got no credit at all and extra frustration from the
> > fact that part of my review was forgotten/ignored (the part which resulted in
> > real regression in the release kernel)...  Oh and in the past the said
> > developer has already been asked (politely in private message) to pay more
> > attention to his changes (after I silently fixed some other regression caused
> > by his other patch).
> >
> > But wait there is more, I happened to be the maintainer of the subsystem which
> > got directly hit by the issue and I was getting bug reports from the users about
> > the problem... :-)
> >
> > It wasn't my first/last bad experience as a reviewer... finally I just gave up
> > on reviewing other people's patches unless they are strictly for IDE subsystem.
> >
> > The moral of the story is that currently it just doesn't pay off to do
> > code reviews.
>
> I dunno.  I suspect (hope) that this was an exceptional case, hence one
> should not draw general conclusions from it.  It certainly sounds very bad.

I've been too long around to not learn a few things...

rule #3 of successful kernel developer

Ignore reviewers - fix the bugs but don't credit reviewers (crediting
them makes your patch and you look less perfect), if they are asking
question requiring you to do the work (verification of taken assumptions
etc.) do not check anything - answer in a misleading way and present the
assumptions you've taken as a truth written in the stone - eventually
they will do verification themselves.

I really shouldn't be giving these rules out (at least for free 8) so
this time only #3 but there are many more rules and they are as dead
serious as Linus' advice on Linux kernel management style...

> > From personal POV it pays much more to wait until buggy patch
> > hits the mainline and then fix the issues yourself (at least you will get
> > some credit).  To change this we should put more emphasis on the importance
> > of code reviews by "rewarding" people investing their time into reviews
> > and "rewarding" developers/maintainers taking reviews seriously.
> >
> > We should credit reviewers more, sometimes it takes more time/knowledge to
> > review the patch than to make it so getting near to zero credit for review
> > doesn't sound too attractive.  Hmm, wait it can be worse - your review
> > may be ignored... ;-)
> >
> > From my side I think I'll start adding less formal "Reviewed-by" to IDE
> > patches even if the review resulted in no issues being found (in addition to
> > explicit "Acked-by" tags and crediting people for finding real issues - which
> > I currently always do as a way for showing my appreciation for their work).
>
> yup, Reviewed-by: is good and I do think we should start adopting it,
> although I haven't thought through exactly how.

Adding Reviewed-by for reviews which highlighted real issues is obvious
(with more detailed credits for noticed problems in the patch
description).

Also when somebody reviewed your patch but in the discussion it turned
out that the patch is valid - the review itself was still valuable so it
would be appropriate to credit the reviewer by adding Reviewed-by:.

> On my darker days I consider treating a Reviewed-by: as a prerequisite for
> merging.  I suspect that would really get the feathers flying.

Easy to work around by a friendly "my Reviewed-by: for your Reviewed-by:"
deal (without any _proper_ review being done in reality)... ;)

> > I also encourage other maintainers/developers to pay more attention to
> > adding "Acked-by"/"Reviewed-by" tags and crediting reviewers.  I hope
> > that maintainers will promote changes that have been reviewed by others
> > by giving them priority over other ones (if the changes are on more-or-less
> > the same importance level of course, you get the idea).
> >
> > Now what to do with people who ignore reviews and/or have rather high
> > regressions/patches ratio?
>
> Ignoring a review would be a wildly wrong thing to do.  It's so unusual
> that I'd be suspecting a lost email or an i-sent-the-wrong-patch.

It is not unusual at all.  I mean patches which affect code in such way
that it is difficult to prove its (in)correctness without doing time
consuming audit.

ie. lets imagine doing a small patch affecting many drivers - you've
tested it quickly on your driver/hardware, then you skip the part of
verifying correctness of new code in other drivers and just push the
patch.

As a patch author you can either assume "works for me" and push the
patch or do the audit (requires good understanding of the changed code
and could be time consuming).  It is usually quite easy to find out
which approach the author has chosen - the very sparse patch description
combined with the changes in code behavior not mentioned in the patch
description should raise the red flag. :)

As a reviewer having enough knowledge in the area of code affected by
patch you can see the potential problems but you can't prove them
without doing the time consuming part.  You may try to NACK the patch if
you have enough power but you will end up being bypassed by not proving
incorrectness of the patch (not to mention that developer will feel bad
about you NACKing his patch).

Now the funny thing is that despite the fact that audit takes more
time/knowledge than making the patch you will end up with zero credit if
patch turns out to be (luckily) correct.  Even if you find out issues
and report them you are still on mercy of author for being credited so
from personal POV you are much better to wait and fix issues after they
hit mainline kernel.  You have to choose between being a good citizen
and preventing kernel regressions or being bastard and getting the
credit. ;)

If you happen to be maintainer of the affected code the choice is
similar with more pros for letting the patch in especially if you can't
afford the time to do audit (and by being maintainer you are guaranteed
to be heavily time constrained).

I hope this makes people see the importance of proper review and proper
recognition of reviewers in preventing kernel regressions.

> As for high regressions/patches ratio: that'll be hard to calculate and
> tends to be dependent upon the code which is being altered rather than who
> is doing the altering: some stuff is just fragile, for various reasons.
>
> One ratio which we might want to have a think about is the patches-sent
> versus reviews-done ratio ;)

Sounds like a good idea.

> > I think that we should have info about regressions integrated into SCM,
> > i.e. in git we should have optional "fixes-commit" tag and we should be
> > able to do some reverse data collection.  This feature combined with
> > "Author:" info after some time should give us some very interesting
> > statistics (Top Ten "Regressors").  It wouldn't be ideal (ie. we need some
> > patches threshold to filter out people with 1 patch and >= 1 regression(s),
> > we need to remember that some code areas are more difficult than the others
> > and that patches are not equal per se etc.) however I believe that making it
> > into Top Ten "Regressors" should give the winners some motivation to improve
> > their work ethic.  Well, in the worst case we would just get some extra
> > trivial/documentation patches. ;-)
>
> We of course do want to minimise the amount of overhead for each developer.
> I'm a strong believer in specialisation: rather than requiring that *every*
> developer/maintainer integrate new steps in their processes it would be
> better to allow them to proceed in a close-to-usual fashion and to provide
> for a specialist person (or team) to do the sorts of things which you're
> thinking about.

Makes sense... however we need to educate each and every developer about
importance of the code review and proper recognition of reviewers.

Thanks,
Bart


From: Stefan Richter [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 01:15:15 +0200

Bartlomiej Zolnierkiewicz wrote:
> despite the fact that audit takes
> more time/knowledge than making the patch you will end up with zero credit
> if patch turns out to be (luckily) correct.  Even if you find out issues
> and report them you are still on mercy of author for being credited

If we introduce a "Reviewed-by" with reasonably clear semantics
(different from Signed-off-by; e.g. the reviewer is not a middle-man in
patch forwarding; the reviewer might have had remaining reservations...
very similar to but not entirely the same as "Acked-by" as currently
defined in -mm) --- and also make the already somewhat established
"Tested-by" more official, --- then the maintainers could start to make
it a habit to add Reviewed-by and Tested-by.

Plus, reviewers and testers could formally reply with Reviewed-by and
Tested-by lines to patch postings and even could explicitly ask the
maintainer to add these lines.

> so from personal POV you are much better to wait and fix issues after they
> hit mainline kernel.  You have to choose between being a good citizen and
> preventing kernel regressions or being bastard and getting the credit. ;)
>
> If you happen to be maintainer of the affected code the choice is similar
> with more pros for letting the patch in especially if you can't afford the
> time to do audit (and by being maintainer you are guaranteed to be heavily
> time constrained).

I don't think that a maintainer (who signs off on patches after all) can
easily afford to take the "bastard approach".  I may be naive.
-- 
Stefan Richter
-=====-=-=== -==- =--=-
http://arcgraph.de/sr/


From: Andrew Morton [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Sun, 17 Jun 2007 22:09:27 -0700

On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter [email blocked] wrote:

> Tested-by

Tested-by would be good too.  Because over time, we will generate a list
of people who own the relevant hardware and who are prepared to test
changes.

So if you make changes to random-driver.c you can do `git-log
random-driver.c|grep Tested-by` to find people who can test your changes
for you.

Not that many people are likely to bother.  The consequences of being
slack are negligible, hence there is little incentive to do the extra
work.


From: Fortier,Vincent [email blocked]
Subject: RE: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 09:23:20 -0400

> -----Original Message-----
> From: [email blocked]
> On behalf of Andrew Morton
>
> On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter
> [email blocked] wrote:
>
> > Tested-by
>
> Tested-by would be good too.  Because over time, we will
> generate a list of people who own the relevant hardware and
> who are prepared to test changes.

Why not include a user-space tool that, when invoked, if you agree to
send personal info, sends your hardware vs driver info to a web
database + your email address (maybe even your .config, etc..) ...  In
case of help for testing new patches/finding a bug/etc.. your email
could be used by maintainers to ask for help...

> So if you make changes to random-driver.c you can do `git-log
> random-driver.c|grep Tested-by` to find people who can test
> your changes for you.

You wouldn't even need to search in GIT.  Maybe even whenever a
patchset is being proposed a mail could be sent to appropriate
hardware/or feature pseudo-auto-generated mailing-list?

On lkml I mostly try to follow patches/bugs associated with hardware I
use.  Why not try to automate the process and get more testers in?

- vin


From: "Natalie Protasevich" [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 15:31:04 -0700

On 6/18/07, Fortier,Vincent [Montreal] wrote:
> > -----Original Message-----
> > From: [email blocked]
> > On behalf of Andrew Morton
> >
> > On Mon, 18 Jun 2007 01:15:15 +0200 Stefan Richter
> > [email blocked] wrote:
> >
> > > Tested-by
> >
> > Tested-by would be good too.  Because over time, we will
> > generate a list of people who own the relevant hardware and
> > who are prepared to test changes.
>
> Why not include a user-space tool that, when invoked, if you agree to
> send personal info, sends your hardware vs driver info to a web
> database + your email address (maybe even your .config, etc..) ...  In
> case of help for testing new patches/finding a bug/etc.. your email
> could be used by maintainers to ask for help...
>
> > So if you make changes to random-driver.c you can do `git-log
> > random-driver.c|grep Tested-by` to find people who can test
> > your changes for you.
>
> You wouldn't even need to search in GIT.  Maybe even whenever a
> patchset is being proposed a mail could be sent to appropriate
> hardware/or feature pseudo-auto-generated mailing-list?
>
> On lkml I mostly try to follow patches/bugs associated with hardware I
> use.  Why not try to automate the process and get more testers in?
>

I think this is an excellent point.  One data point could be a field in
bugzilla to input the hardware information.  Simple query can select
common hardware and platform.  So far it's not working when hardware is
just mentioned in the text part.

--Natalie


From: Martin Bligh [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 15:41:38 -0700

>> > So if you make changes to random-driver.c you can do `git-log
>> > random-driver.c|grep Tested-by` to find people who can test
>> > your changes for you.
>>
>> You wouldn't even need to search in GIT.  Maybe even whenever a
>> patchset is being proposed a mail could be sent to appropriate
>> hardware/or feature pseudo-auto-generated mailing-list?
>>
>> On lkml I mostly try to follow patches/bugs associated with hardware I
>> use.  Why not try to automate the process and get more testers in?
>>
>
> I think this is an excellent point.  One data point could be a field in
> bugzilla to input the hardware information.  Simple query can select
> common hardware and platform.  So far it's not working when hardware is
> just mentioned in the text part.

if it's free text it'll be useless for search ... I suppose we could
do drop-downs for architecture at least?  Not sure much beyond that
would work ... *possibly* the common drivers, but I don't think
we'd get enough coverage for it to be of use.

M.


From: "Natalie Protasevich" [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 15:56:19 -0700

On 6/18/07, Martin Bligh [email blocked] wrote:
> >> > So if you make changes to random-driver.c you can do `git-log
> >> > random-driver.c|grep Tested-by` to find people who can test
> >> > your changes for you.
> >>
> >> You wouldn't even need to search in GIT.  Maybe even whenever a
> >> patchset is being proposed a mail could be sent to appropriate
> >> hardware/or feature pseudo-auto-generated mailing-list?
> >>
> >> On lkml I mostly try to follow patches/bugs associated with hardware I
> >> use.  Why not try to automate the process and get more testers in?
> >>
> >
> > I think this is an excellent point.  One data point could be a field in
> > bugzilla to input the hardware information.  Simple query can select
> > common hardware and platform.  So far it's not working when hardware is
> > just mentioned in the text part.
>
> if it's free text it'll be useless for search ... I suppose we could
> do drop-downs for architecture at least?  Not sure much beyond that
> would work ... *possibly* the common drivers, but I don't think
> we'd get enough coverage for it to be of use.
>

How about several buckets for model/BIOS version/chipset etc., at
least optional, and some will be relevant some not for particular
cases.  But at least people will make an attempt to collect such data
from their system, boards, etc.

--Natalie


From: Martin Bligh [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 16:59:25 -0700

Natalie Protasevich wrote:
> On 6/18/07, Martin Bligh [email blocked] wrote:
>> >> > So if you make changes to random-driver.c you can do `git-log
>> >> > random-driver.c|grep Tested-by" to find people who can test
>> >> > your changes for you.
>> >>
>> >> You would'nt even need to search in GIT. Maybie even when ever a
>> >> patchset is being proposed a mail could be sent to appropriate
>> >> hardware/or feature pseudo-auto-generated mailing-list?
>> >>
>> >> On lkml I mostly try to follow patches/bugs associated with hardware I
>> >> use. Why not try to automate the process and get more testers in?
>> >>
>> >
>> > I think this is an excellent point. One data point could be a field in
>> > bugzilla to input the hardware information. Simple query can select
>> > common hardware and platform. So far it's not working when hardware is
>> > just mentioned in the text part.
>>
>> if it's free text it'll be useless for search ... I suppose we could
>> do drop-downs for architecture at least? Not sure much beyond that
>> would work ... *possibly* the common drivers, but I don't think
>> we'd get enough coverage for it to be of use.
>
> How about several buckets for model/BIOS version/chipset etc., at
> least optional, and some will be relevant some not for particular
> cases. But at least people will make an attempt to collect such data
> from their system, boards, etc.

Mmm. the problem is that either they're:

1. free text, in which case they're useless, as everyone types
mis-spelled random crud into them. However, free-text search
through the comment fields might work out.

2. Drop downs, in which case someone has to manage the lists etc,
they're horribly crowded with lots of options. trying to do that
for model/BIOS version/chipset would be a nightmare.

If they're mandatory, they're a pain in the butt, and often
irrelevant ... if they're optional, nobody will fill them in.
Either way, they clutter the interface ;-(

Sorry to be a wet blanket, but I've seen those sort of things
before, and they just don't seem to work, especially in the
environment we're in with such a massive diversity of hardware.

If we can come up with some very clear, tightly constrained
choices, that's a decent possibility. Nothing other than
kernel architecture (i386 / x86_64 / ia64) or whatever springs
to mind, but perhaps I'm being unimaginative.

Nothing complicated ever seems to work ... even the simple stuff
is difficult ;-(

M.

From: Linus Torvalds [email blocked]
Subject: Re: How to improve the quality of the kernel?
Date:	Mon, 18 Jun 2007 17:09:37 -0700 (PDT)

On Mon, 18 Jun 2007, Martin Bligh wrote:
>
> Sorry to be a wet blanket, but I've seen those sort of things
> before, and they just don't seem to work, especially in the
> environment we're in with such a massive diversity of hardware.

I do agree. It _sounds_ like a great idea to try to control the flow of
patches better, but in the end, it needs to also be easy and painfree to
the people involved, and also make sure that any added workflow doesn't
require even *more* load and expertise on the already often overworked
maintainers..

In many cases, I think it tends to *sound* great to try to avoid
regressions in the first place - but it also sounds like one of those "I
wish the world didn't work the way it did" kind of things. A worthy goal,
but not necessarily one that is compatible with reality.

Linus

From: Oleg Verych [email blocked]
Subject: This is [Re:] How to improve the quality of the kernel[?].
Date:	Tue, 19 Jun 2007 06:06:47 +0200

[Dear Debbug developers, i wish your ideas will be useful.]

* From: Linus Torvalds
* Newsgroups: gmane.linux.kernel
* Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
>
> On Mon, 18 Jun 2007, Martin Bligh wrote:
>>
>> Sorry to be a wet blanket, but I've seen those sort of things
>> before, and they just don't seem to work, especially in the
>> environment we're in with such a massive diversity of hardware.
>
> I do agree. It _sounds_ like a great idea to try to control the flow of
> patches better,

There were some ideas, i will try to summarize:

* New Patches (or sets) need tracking, review, testing

  Zero (tracking) done by sending (To, or bcc) [RFC] patch to something
  like [email blocked] (like BTS now). Notifications will
  be sent to intrested maintainers (if meta-information is ok) or testers
  (see below)

  First is mostly done by maintainers or interested individuals.
  Result is "Acked-by" and "Cc" entries in the next patch sent. Due to
  lack of tracking this things are done manually, are generally in
  trusted manner. But bad like <200706172053.41806.bzolnier@gmail.com>
  sometimes happens.

  When patch in sent to this PTS, your lovely
  checkpatch/check-whatever-crap-has-being-sent tools can be set up as
  gatekeepers, thus making impossible stupid errors with MIME coding,
  line wrapping, whatever style you've came up with now in checking
  sent crap.

* Tracking results of review (Acked-by).

  This can be mostly e-mail exchange with comments and agreements.
  "Acked-by" semantic may be implemented in form of contlol message to
  tracking system, and this system will generate e-mail confirmation to
  the patch author in form of
  "Acked-by: Developer's Name <message-id of e-mail with acke-by>"
  Thus, next patch will have this entry. And if testing of this version
  ir regression happens, there's info about who is/was
  interested/involved.

* Testing.

  Mainly same for "Tested-by" (newly suggested by Stefan
  <4675C083.6080409@s5r6.in-berlin.de>)

  |-*- Feature Needed -*-
  Addition, needed is hardware user tested have/had/used. Currently
  ``reportbug'' tool includes packed specific and system specific
  additions automaticly gathered and inserted to e-mail sent to BTS.
  (e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)

  Formats of that hardware profile(as system information in reportbug)
  . arch
  . chipset
  . hdd
  . vga
  ...
  in meaningful fields, and not just lspci -v[vv]. If additional info
  (-vvv) or something required, profile can be exteded.

  For kernel's sub-system information(as packed info):
  . subsystem/driver/kernel version (or similar)
  . maintainers must know what they need to see more here
  |-*- back to patches -*-

  Last and not least tast cases, that everyone might came up with.
  All formats this can be agreed (or implemented and updated latter) and
  inserted automaticly.

* Commit.

  Id is recorded, patch archived. But any additions are welcome,
  regressions will pop up this patch again (reopen in BTS).

> but in the end, it needs to also be easy and painfree to the people
> involved, and also make sure that any added workflow doesn't require
> even *more* load and expertise on the already often overworked
> maintainers..

Experienced BTS users and developers. Please, correct me if i'm wrong.
At least e-mail part of Debian's BTS and whole idea of it is *exactly*
what is needed. Bugzilla fans, you can still use you useless pet,
because Debian developers have done things, to track and e-mail states
of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>

> In many cases, I think it tends to *sound* great to try to avoid
> regressions in the first place - but it also sounds like one of those "I
> wish the world didn't work the way it did" kind of things. A worthy goal,
> but not necessarily one that is compatible with reality.

I wish perl hackers out there will join this yet-new effort. I know
there many of them out there, writing kilobytes of checkfile and
checkpatch (i've wrote in few lines of ``sed''). BTS is written on perl,
but any interoperability interface, like stdin/stdout for python or
shell hackers is worth of thinking about.

Please, see more and make useful follows ups:

http://bugs.debian.org/

Please, do not (<46772321.9080602@mbligh.org>)
"""
I know you hate bugzilla ... but at least I can try to make that bit
of the process work better.
"""
[here's you fancy checkbox...]

Thanks.

From: Adrian Bunk [email blocked]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].
Date:	Tue, 19 Jun 2007 14:48:55 +0200

On Tue, Jun 19, 2007 at 06:06:47AM +0200, Oleg Verych wrote:
> [Dear Debbug developers, i wish your ideas will be useful.]
>
> * From: Linus Torvalds
> * Newsgroups: gmane.linux.kernel
> * Date: Mon, 18 Jun 2007 17:09:37 -0700 (PDT)
> >
> > On Mon, 18 Jun 2007, Martin Bligh wrote:
> >>
> >> Sorry to be a wet blanket, but I've seen those sort of things
> >> before, and they just don't seem to work, especially in the
> >> environment we're in with such a massive diversity of hardware.
> >
> > I do agree. It _sounds_ like a great idea to try to control the flow of
> > patches better,
>
> There were some ideas, i will try to summarize:
>
> * New Patches (or sets) need tracking, review, testing
>
>   Zero (tracking) done by sending (To, or bcc) [RFC] patch to something
>   like [email blocked] (like BTS now). Notifications will
>   be sent to intrested maintainers (if meta-information is ok) or testers
>   (see below)
>
>   First is mostly done by maintainers or interested individuals.
>   Result is "Acked-by" and "Cc" entries in the next patch sent. Due to
>   lack of tracking this things are done manually, are generally in
>   trusted manner. But bad like <200706172053.41806.bzolnier@gmail.com>
>   sometimes happens.

The goal is to get all patches for a maintained subsystem submitted to
Linus by the maintainer.

>   When patch in sent to this PTS, your lovely
>   checkpatch/check-whatever-crap-has-being-sent tools can be set up as
>   gatekeepers, thus making impossible stupid errors with MIME coding,
>   line wrapping, whatever style you've came up with now in checking
>   sent crap.

The -mm kernel already implements what your proposed PTS would do.

Plus it gives testers more or less all patches currently pending
inclusion into Linus' tree in one kernel they can test.

The problem are more social problems like patches Andrew has never heard
of before getting into Linus' tree during the merge window.

>...
>   |-*- Feature Needed -*-
>   Addition, needed is hardware user tested have/had/used. Currently
>   ``reportbug'' tool includes packed specific and system specific
>   additions automaticly gathered and inserted to e-mail sent to BTS.
>   (e.g. <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29740>)

The problem is that most problems don't occur on one well-defined
kind of hardware - patches often break in exactly the areas the patch
author expected no problems in.

And in many cases a patch for a device driver was written due to a bug
report - in such cases a tester with the hardware in question is
already available.

>...
> > but in the end, it needs to also be easy and painfree to the people
> > involved, and also make sure that any added workflow doesn't require
> > even *more* load and expertise on the already often overworked
> > maintainers..
>
> Experienced BTS users and developers. Please, correct me if i'm wrong.
> At least e-mail part of Debian's BTS and whole idea of it is *exactly*
> what is needed. Bugzilla fans, you can still use you useless pet,
> because Debian developers have done things, to track and e-mail states
> of bugs: <http://permalink.gmane.org/gmane.linux.debian.devel.kernel/29736>
>...

"useless pet"?

Be serious.
How many open source projects use Bugzilla and how many use the Debian BTS?
And then start thinking about why the "useless pet" has so many more
user...

The Debian BTS requires you to either write emails with control messages
or generating control messages with external tools. In Bugzilla the same
works through a web interface.

I know both the Debian BTS and Bugzilla and although they are quite
different they both are reasonable tools for their purpose.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

From: Linus Torvalds [email blocked]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].
Date:	Tue, 19 Jun 2007 08:01:19 -0700 (PDT)

On Tue, 19 Jun 2007, Adrian Bunk wrote:
>
> The goal is to get all patches for a maintained subsystem submitted to
> Linus by the maintainer.

Well, to be honest, I've actually over the years tried to have a policy of
*never* really having black-and-white policies.

The fact is, some maintainers are excellent. All the relevant patches
*already* effectively go through them.

But at the same time, other maintainers are less than active, and some
areas aren't clearly maintained at all.

Also, being a maintainer often means that you are busy and spend a lot of
time talking to *people* - it doesn't necessarily mean that you actually
have the hardware and can test things, nor does it necessarily mean that
you know every detail.

So I point out in Documentation/ManagementStyle (which is written very
much tongue-in-cheek, but at the same time it's really *true*) that
maintainership is often about recognizing people who just know *better*
than you!

> The -mm kernel already implements what your proposed PTS would do.
>
> Plus it gives testers more or less all patches currently pending
> inclusion into Linus' tree in one kernel they can test.
>
> The problem are more social problems like patches Andrew has never heard
> of before getting into Linus' tree during the merge window.

Not really. The "problem" boils down to this:

	[torvalds@woody linux]$ git-rev-list --all --since=100.days.ago | wc -l
	7147
	[torvalds@woody linux]$ git-rev-list --no-merges --all --since=100.days.ago | wc -l
	6768

ie over the last hundred days, we have averaged over 70 changes per day,
and even ignoring merges and only looking at "pure patches" we have more
than an average of 65 patches per day. Every day. Day in and day out.

That translates to five hundred commits a week, two _thousand_ commits per
month, and 25 thousand commits per year. As a fairly constant stream.

Will mistakes happen? Hell *yes*.

And I'd argue that any flow that tries to "guarantee" that mistakes don't
happen is broken. It's a sure-fire way to just frustrate people, simply
because it assumes a level of perfection in maintainers and developers
that isn't possible.

The accepted industry standard for bug counts is basically one bug per a
thousand lines of code. And that's for released, *debugged* code.

Yes, we should aim higher. Obviously. Let's say that we aim for 0.1 bugs
per KLOC, and that we actually aim for that not just in _released_ code,
but in patches.

What does that mean?

Do the math:

	git log -M -p --all --since=100.days.ago | grep '^+' | wc -l

That basically takes the last one hundred days of development, shows it
all as patches, and just counts the "new" lines. It takes about ten
seconds to run, and returns 517252 for me right now.

That's *over*half*a*million* lines added or changed!

And even with the expectation that we do ten times better than what is
often quoted as an industry average, and even with the expectation that
this is already fully debugged code, that's at least 50 bugs in the last
one hundred days.

Yeah, we can be even more stringent, and actually subtract the number of
lines _removed_ (274930), and assume that only *new* code contains bugs,
and that's still just under a quarter million purely *added* lines, and
maybe we'd expect just new 24 bugs in the last 100 days.

[ Argument: some of the old code also contained bugs, so the lines added
  to replace it balance out. Counter-argument: new code is less well
  tested by *definition* than old code, so.. Counter-counter-argument: the
  new code was often added to _fix_ a bug, so the code removed had an even
  _higher_ bug rate than normal code..

  End result? We don't know. This is all just food for thought. ]

So here's the deal: even by the most *stringent* reasonable rules, we add
a new bug every four days. That's just something that people need to
accept. The people who say "we must never introduce a regression" aren't
living on planet earth, they are living in some wonderful world of
Blarney, where mistakes don't happen, developers are perfect, hardware is
perfect, and maintainers always catch things.

> The problem is that most problems don't occur on one well-defined
> kind of hardware - patches often break in exactly the areas the patch
> author expected no problems in.

Note that the industry-standard 1-bug-per-kloc thing has nothing to do
with hardware. Somebody earlier in this thread (or one of the related
ones) said that "git bisect is only valid for bugs that happen due to
hardware issues", which is just totally *ludicrous*.

Yes, hardware makes it harder to test, but even *without* any
hardware-specific issues, bugs happen. The developer just didn't happen to
trigger the condition, or didn't happen to notice it when he *did* trigger
it.

So don't go overboard about "hardware". Yes, hardware-specific issues have
their own set of problems, and yes, drivers have a much higher incidence
of bugs per KLOC, but in the end, even *without* that, you'd still have to
face the music. Even for stuff that isn't drivers.

So this whole *notion* that you can get it right the first time is
*insane*.

We should aim for doing well, yes.

But quite frankly, anybody who aims for "perfect" without taking reality
into account is just not realistic. And if that's part of the goal of some
"new process", then I'm not even interested in listening to people discuss
it.

If this plan cannot take reality into account, please stop Cc'ing me. I'm
simply not interested.

Any process that tries to "guarantee" that regressions don't happen is
crap. Any process that tries to "guarantee" that we release only kernels
without bugs can go screw itself. There's one thing I _can_ guarantee, and
that's as long as we add a quarter million new lines per 100 days (and
change another quarter million lines), we will have new bugs.

No ifs, buts or maybe's about it.

The process should aim for making them *fewer*. But any process that aims
for total eradication of new bugs will result in one thing, and one thign
only: we won't be getting any actual work done.

The only way to guarantee no regressions is to make no progress.

Linus

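The arithmetic in this message can be reproduced with a short Python sketch. The input figures are the ones quoted in the message (7147 and 6768 commits, 517252 added lines, 274930 removed lines, 0.1 bugs per KLOC over 100 days); the variable and function names are made up here for illustration, and the result is the same back-of-the-envelope estimate as in the message, not a measurement.

```python
# Figures quoted in the message above, all for a 100-day window.
DAYS = 100
COMMITS_ALL = 7147          # git-rev-list --all
COMMITS_NO_MERGES = 6768    # git-rev-list --no-merges --all
LINES_ADDED = 517252        # '+' lines in the patch output
LINES_REMOVED = 274930
BUGS_PER_KLOC = 0.1         # ten times better than 1 bug per KLOC


def per_day(count, days=DAYS):
    """Average number of events per day over the window."""
    return count / days


def expected_bugs(lines, rate_per_kloc=BUGS_PER_KLOC):
    """Expected bug count for a number of lines at a given bugs-per-KLOC rate."""
    return lines / 1000 * rate_per_kloc


commits_per_day = per_day(COMMITS_ALL)
patches_per_day = per_day(COMMITS_NO_MERGES)

# Lenient estimate: every added or changed line counts.
bugs_gross = expected_bugs(LINES_ADDED)

# Stringent estimate: only net new lines count.
net_lines = LINES_ADDED - LINES_REMOVED
bugs_net = expected_bugs(net_lines)
days_per_bug = DAYS / bugs_net

if __name__ == "__main__":
    print(f"commits/day (all):       {commits_per_day:.2f}")
    print(f"commits/day (no merges): {patches_per_day:.2f}")
    print(f"patches/year (approx):   {patches_per_day * 365:.0f}")
    print(f"expected bugs, gross:    {bugs_gross:.1f}")
    print(f"net added lines:         {net_lines}")
    print(f"expected bugs, net:      {bugs_net:.1f}")
    print(f"days per new bug:        {days_per_bug:.1f}")
```

Under these assumptions the stringent estimate comes out at roughly one new bug every four days, which matches the figure stated in the message.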
From: Oleg Verych [email blocked]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].
Date:	Tue, 19 Jun 2007 18:53:00 +0200

Linus,

On Tue, Jun 19, 2007 at 08:01:19AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 19 Jun 2007, Adrian Bunk wrote:
> >
> > The goal is to get all patches for a maintained subsystem submitted to
> > Linus by the maintainer.

Nice quote. I'm trying to make proposition/convince Adrian, who is in
opposition, but whole thread gets just like obeying his extreme POV...

> But quite frankly, anybody who aims for "perfect" without taking reality
> into account is just not realistic. And if that's part of the goal of some
> "new process", then I'm not even interested in listening to people discuss
> it.

I'm proposing kind of smart tracking, summarized before. I'm not an
idealist, doing manual work. Making tools -- is what i've picked up from
one of your mails. Thus hope of having more opinions on that.

> If this plan cannot take reality into account, please stop Cc'ing me. I'm
> simply not interested.

This one is last at least from me. Sorry for taking you time.

From: Linus Torvalds [email blocked]
Subject: Re: This is [Re:] How to improve the quality of the kernel[?].
Date:	Tue, 19 Jun 2007 10:04:58 -0700 (PDT)

On Tue, 19 Jun 2007, Oleg Verych wrote:
>
> I'm proposing kind of smart tracking, summarized before. I'm not an
> idealist, doing manual work. Making tools -- is what i've picked up from
> one of your mails. Thus hope of having more opinions on that.

Don't get me wrong, I wasn't actually responing to you personally, I was
actually responding mostly to the tone of this thread.

So I was responding to things like the example from Bartlomiej about
missed opportunity for taking developer review into account (and btw, I
think a little public shaming might not be a bad idea - I believe more in
*social* rules than in *technical* rules), and I'm responding to some of
the commentary by Adrian and others about "no regressions *ever*".

These are things we can *wish* for. But the fact that we migth wish for
them doesn't actually mean that they are really good ideas to aim for in
practice.

Let me put it another way: a few weeks ago there was this big news story
in the New York Times about how "forgetting" is a very essential part
about remembering, and people passed this around as if it was a big
revelation. People think that people with good memories have a "good
thing".

And personally, I was like "Duh". Good memory is not about remembering
everything. Good memory is about forgetting the irrelevant, so that the
important stuff stands out and you *can* remember it. But the big deal is
that yes, you have to forget stuff, and that means that you *will* miss
details - but you'll hopefully miss the stuff you don't care for. The
keyword being "hopefully". It works most of the time, but we all know
we've sometimes been able to forget a detail that turned out to be crucial
after all.

So the *stupid* response to that is "we should remember everything". It
misses the point. Yes, we sometimes forget even important details, but
it's *so* important to forget details, that the fact that our brains
occasionally forget things we later ended up needing is still *much*
preferable to trying to remember everything.

The same tends to be true of bug hunting, and regression tracking.

There's a lot of "noise" there. We'll never get perfect, and I'll argue
that if we don't have a system that tries to actively *remove* noise,
we'll just be overwhelmed. But that _inevitably_ means that sometimes
there was actually a signal in the noise that we ended up removing,
because nobody saw it as anything but noise.

So I think people should concentrate on turning "noise" into "clear
signal", but at the same time remember that that inevitably is a "lossy"
transformation, and just accept the fact that it will mean that we
occasionally make "mistakes".

This is why I've been advocating bugzilla "forget" stuff, for example. I
tend to see bugzilla as a place where noise accumulates, rather than a
place where noise is made into a signal.

Which gets my to the real issue I have: the notion of having a process for
_tracking_ all the information is actually totally counter-productive, if
a big part of the process isn't also about throwing noise away.

We don't want to "save" all the crud. I don't want "smart tracking" to
keep track of everything. I want "smart forgetting", so that we are only
left with the major signal - the stuff that matters.

Linus




Automatic...

June 21, 2007 - 12:11am
Anonymous (not verified)

Can run software that automatically finds bugs in the source code?

Can run some kind of automated test suite to check if things break?

only up to a point. bugs

June 21, 2007 - 12:54am
turn.self.off (not verified)

only up to a point.

bugs come from all kinds of things. some can be detected at compile time. others can't be detected unless one runs each and every possible, and impossible, usage scenario for every line of code, more or less.

then there are regressions: bugs that come back because someone has been working on a new feature for a while, and in the meantime the area of code being worked on has had some bugs fixed that slipped under that person's radar.

so when said feature is committed to the main code, the bug comes back like some zombie out of the grave.

then there are false positives. i think linus torvalds has at times talked about not setting the compiler to "pedantic" mode, as often what it will warn about is stuff that has to be done in an abnormal way to get the job done for some reason or other.

and let's not forget my personal favorite, race conditions. these things may not be spotted by many because they come from two or more tasks being done at the same time, where if a specific one finishes ahead of the others it will lead to an error. often that only happens on machines that have a much higher or lower work pace than the machine the coder was working on at the time.

races ...

June 23, 2007 - 3:32am
Anonymous (not verified)

Yeah, I love them. I bet on them all the time (at least I bet they'll be present in code) and I'm rarely disappointed. I'm working on an embedded system at the moment and the hardware watchdog timer is of prime importance - no driver exists for my hardware as such, but the sbc_epx_c3 driver is close and I can modify that. I haven't done an extensive review of the code yet, but the module init function looks like it's got a race condition in it - in particular, it's a mutex implementation which doesn't quite work for a preemptible kernel. So from what I see so far, this code will work fine if the following conditions are met:
1. the system is not an SMP device (that condition is met in this particular case)
2. only 1 process ever opens the watchdog device - hey, I don't even trust myself to be able to enforce (2) - although there's not a high probability of multiple processes opening the watchdog, I certainly can't guarantee that it won't happen.
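The kind of race described in this comment, a hand-rolled "only one opener" guard that tests a flag and then sets it in two separate steps, can be sketched in Python. This is a user-space analogue for illustration only: it is not the sbc_epx_c3 driver's code, and the class and function names (`RacyGuard`, `AtomicGuard`, `count_successful_opens`) are hypothetical.

```python
import threading


class RacyGuard:
    """Check-then-set without a lock.

    Two callers can both see the flag as clear if one is preempted
    between the test and the write, so both believe they own the device.
    """

    def __init__(self):
        self.in_use = False

    def try_open(self):
        if self.in_use:        # step 1: test
            return False
        # A preemption here lets a second caller pass the same test.
        self.in_use = True     # step 2: set
        return True


class AtomicGuard:
    """Same interface, but the test and the set happen under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_use = False

    def try_open(self):
        with self._lock:
            if self.in_use:
                return False
            self.in_use = True
            return True


def count_successful_opens(guard, n_threads=8):
    """Release n_threads callers at once and count how many opens succeeded."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()                      # line all callers up together
        results.append(guard.try_open())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print("atomic guard, successful opens:", count_successful_opens(AtomicGuard()))
    print("racy guard, successful opens:  ", count_successful_opens(RacyGuard()))
```

The locked version always admits exactly one opener. The racy version usually does too, which is what makes this class of bug hard to spot: it only misbehaves when the scheduler happens to interleave two callers at the wrong moment.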

"Can run software that

June 21, 2007 - 2:00am

"Can run software that automatically finds bugs in the source code?"

Yes, you can.

"Can run some kind of automated test suite to check if things break?"

AutoTest is for you. If you need any additional information about testing, there is a "Linux Kernel Tester’s Guide":

PDF version
http://www.stardust.webpages.pl/files/handbook/handbook-en-0.3-rc1.pdf

LaTeX source
http://www.stardust.webpages.pl/files/handbook/handbook-current.tar.bz2

You need to have test suit

June 21, 2007 - 2:17am
Anonymous (not verified)

You need to have a test suite to run one.
Automated bug search can't find some bugs without knowledge of the hardware and the code's semantics. A tool can't tell whether this is OK or not:

   if (FLAG1 & option)
      p->uid = 0;

or this one:

   if (FLAG2 & option)
      p->uid = 0;

In a general case, this is

June 21, 2007 - 8:11am
Anonymous (not verified)

In a general case, this is provably impossible. But there are common types of bugs that can be checked for either by the compiler, or by some sort of lint software (or coverity as given in another comment). Even those can only do so well. Basically, humans are needed to discover and fix bugs.

Coverity & LTP helps to some degree

June 21, 2007 - 4:40am
Anonymous (not verified)

Tools like Coverity (which they gracefully allow kernel devs free access to) (http://www.coverity.com) and LTP (Linux Test Project) (http://ltp.sourceforge.net/) do help a lot, but they'll never catch everything.

So, we need someone that

June 22, 2007 - 5:55am
Anonymous (not verified)

So, we need someone who fixes, on average, a bug every 4 days B-)

Anyway, dedicating several people to bug finding and bug fixing could help to reduce the chances of severe bugs to levels of the Dutch Delta Works. Of course this might cause a lack of attention to bugs by other patchers. Therefore, I suggest also punishing people for each bug that has been found in their patches. I leave open for discussion what's the best punishment method. Maybe a temporary longer artificial delay before their patches are included? This might be a good motivator to produce high quality code with fewer bugs, I guess.

.. or just less code. ;)

June 23, 2007 - 9:44pm
Anonymous (not verified)

.. or just less code. ;)

I think Linus is right.

June 22, 2007 - 4:14pm
Flewellyn (not verified)

It's impossible, in any large software project (which qualifier is something of an understatement when used for the Linux kernel), to avoid introducing bugs. It's possible and reasonable and necessary to try and minimize this, but it will happen even with the best of intentions and the most vigorous code review.

I think it's more useful, and more feasible, for the kernel developers to work on making the kernel fault-tolerant, so that the damage from bugs is limited in its scope, and the kernel can recover from faults. I know that the kernel developers already make efforts to do this, and have (I think) succeeded pretty well over the years; but, it might be helpful for the "core" people, such as Linus, Alan Cox, Andrew Morton, Greg KH, and such, to explicitly emphasize fault-tolerance over "perfect code".

Of course, when someone finds a bug, fix it, but bugs can go undetected for indefinite periods, and if the problem is mitigated in the meantime, it could mean (for instance) less actual damage to data or performance.

Yeah I totally agree about

June 24, 2007 - 2:06am
Jezze (not verified)

Yeah I totally agree about fault tolerance and the solution is simple. Admit defeat and switch to a microkernel architecture... :)

Customer service

June 26, 2007 - 4:13pm
Dave Peck (not verified)

The easy way out of critique is argumentum ad absurdum. Everyone knows perfection is impossible.

The fallacy of Linus is that zero regressions means zero bugs. Zero regressions means zero known bugs, not perfection. But regression count per se is not the big issue, it's "customer service" for want of a better phrase.

In big-name distros, dozens of known bugs sit for literally years unfixed, because distros take the attitude, "it will be fixed in the next kernel." So users get to wait another 18 months for their USB scanner or bluetooth mouse to work. That's the timeframe for distro turnaround. The kernel undergirds all distros, so it needs far more serious attention to bugs. A distro bug affects only its M users. Each kernel bug affects N distros and N*M users.

The kernel releases are not offering distros reasonable bug-freeness on which to target releases, so they don't target at all. They just take whatever's on offer, bugs and all, leaving users to suffer from Linus's philosophy. Meanwhile kernel devs happily re-compile -mm and bisect everything, so they don't face the same brick walls a typical user faces. Your average person just concludes "Linux is broken."

Experimental features break things. CONFIG_USB_SUSPEND broke hardware that used to work in Linux and now does not. Like many other flags in Linux, CONFIG_USB_SUSPEND should live in boot-flag land, not compile-flag land, so users can make their own choices about which set of bugs they want to run.

https://launchpad.net/ubuntu/+source/sane-backends/+bug/85488/comments/103
"The time to fix this was during feisty development, by responding to this bug report. Instead the decision was taken to allow some of the best-supported scanners under linux to become broken, balancing that against the benefit elsewhere. Hopefully it was worth it - some kind of response to all these hundreds of bug reports would be nice!"

https://launchpad.net/ubuntu/+source/sane-backends/+bug/85488/comments/107
"A windows user was looking at me while I was trying to scan a page with my Feisty and Canon Lide-30... He went away from my desk laughing aloud...It's a shame."

And yes, vendors ship imperfect/buggy firmware. As we know, perfection is impossible! Amazingly, Microsoft deals with the "real world" that Linus talks about - and makes buggy-hardware users happy.
