crosscompiler [WAS: RFC: starting a kernel-testers group for newbies]

To: <linux-kernel@...>
Date: Tuesday, April 29, 2008 - 10:03 pm

This is starting to get beyond frustrating for me.

Yesterday, I spent the whole day bisecting boot failures
on my system due to the totally untested linux/bitops.h
optimization, which I fully analyzed and debugged.

Today, I had hoped that I could get some work done of my
own, but that's not the case.

Yet another bootup regression got added within the last 24
hours.

I don't mind fixing a regression or two during the merge
window but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!

The tree breaks every day, and it's becoming an extremely
non-fun environment to work in.

We need to slow down the merging, we need to review things
more, we need people to test their fucking changes!
--
To: David Miller <davem@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 3:36 pm

Well, I must say I second that.

I'm not seeing regressions myself this time (well, except for the one that
Jiri fixed), but I did find a few of them during the post-2.6.24 merge window
and I wouldn't like to repeat that experience, so to speak.

IMO, the merge window is way too short for actually testing anything.  I rebuild
the kernel once or even twice a day and there's no way I can really test it.
I can only check if it breaks right away.  And if it does, there's no time to
find out what broke it before the next few hundreds of commits land on top of
that.

Thanks,
Rafael
--
To: Rafael J. Wysocki <rjw@...>
Cc: <davem@...>, <linux-kernel@...>, <torvalds@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:15 pm

On Wed, 30 Apr 2008 21:36:57 +0200

<jumps up and down>

There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!

_anything_ which appears in 2.6.x-rc1 and which wasn't in 2.6.x-mm1 was
snuck in too late (OK, apart from trivia and bugfixes).


If we decide that we need to fix the oh-shit-lets-slam-this-in-and-hope
problem then I expect we can do so, via fairly reliable means.

But the first attempt at solving it should be to ask people to not do that.
--
To: Andrew Morton <akpm@...>
Cc: Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:31 pm

The problem I see with both -mm and linux-next is that they tend to be 
better at finding the "physical conflict" kind of issues (ie the merge 
itself fails) than the "code looks ok but doesn't actually work" kind of 
issue.

Why?

The tester base is simply too small.

Now, if *that* could be improved, that would be wonderful, but I'm not 
seeing it as very likely.

I think we have fairly good penetration these days with the regular -git 
tree, but I think that one is quite frankly a *lot* less scary than -mm or 
-next are, and there it has been an absolutely huge boon to get the kernel 
into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also 
started something like that).

So I'm very pessimistic about getting a lot of test coverage before -rc1.

Maybe too pessimistic, who knows?

		Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 8:31 pm

First of all:
I 100% agree with Andrew that our biggest problems are in reviewing code 
and resolving bugs, not in finding bugs (we already have far too many 
unresolved bugs).

But although testing mustn't replace code reviews it is a great help, 
especially for identifying regressions early.

Finding testers should actually be relatively easy since it doesn't 
require much knowledge from the testers.

And it could even solve a second problem:

It could be a way for getting newbies into kernel development.

We only rarely have tasks suitable as janitor tasks for 
newbies, and the results of people who know neither the kernel
nor C running checkpatch on files in the kernel have already
been discussed extensively...

I'll try to do this:
- create some Wiki page
- get a mailing list at vger
- point newbies to this mailing list
- tell people there which kernels to test
- figure out and document stuff like how to bisect between -next kernels

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 3:03 am

On Thu, 1 May 2008 03:31:25 +0300

I would argue instead that we don't know which bugs to fix first.
We're never going to fix all bugs, and to be honest, that's ok.
As long as we fix the important bugs, we're doing really well.
And at least for the kerneloops.org reported issues, we're doing quite ok.

For me, 'important' is a combination of the effect of the bug and the number of people
it'll hit. By that measure, a compiler warning on parisc is less important than easy-to-trigger
filesystem corruption in ext3; more people will hit the latter and the effect is more grave.
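
The prioritization described above can be sketched in a few lines. This is only an illustration: the `Bug` class, the 1-10 severity scale, the user counts, and the "severity times affected users" scoring rule are all assumptions made up for the example, not anything kerneloops.org or Bugzilla is known to compute.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    severity: int        # hypothetical scale: 1 = cosmetic ... 10 = data loss
    affected_users: int  # hypothetical estimate of how many people hit it


def priority(bug: Bug) -> int:
    # Assumed scoring rule: effect of the bug times the number of people hit.
    return bug.severity * bug.affected_users


def rank(bugs):
    # Highest score first, so the most "important" bug is at index 0.
    return sorted(bugs, key=priority, reverse=True)


# Made-up numbers, chosen only to mirror the two examples in the message.
bugs = [
    Bug("compiler warning on parisc", severity=1, affected_users=50),
    Bug("easy-to-trigger ext3 corruption", severity=10, affected_users=100_000),
]

ranked = rank(bugs)
for bug in ranked:
    print(priority(bug), bug.title)
```

Any monotonic combination of the two factors would give the same ordering for this pair; the product is just the simplest choice.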


For oopses and WARN_ON()s we're getting the hang of this now with kerneloops.org,
at least for the oopses that aren't really hard fatal. One thing I learned at least is that
lkml is a poor representation of what people actually hit; it's a very very selective
audience. 
oopses/warnons are only a subset of the bugs of course... but still.

So there's a few things we (and you / janitors) can do over time to get better data on what issues
people hit: 
1) Get automated collection of issues more widespread. The wider our net the better we know which
   issues get hit a lot, and plainly the more data we have on when things start, when they stop, etc etc.
   Especially if you get a lot of testers in your project, I'd like them to install the client for easy reporting
   of issues.
2) We should add more WARN_ON()s on "known bad" conditions. If it WARN_ON()'s, we can learn about it via
   the automated collection. And we can then do the statistics to figure out which ones happen a lot.
3) We need to get persistent-across-reboot oops saving going; there are some avenues for this


--
To: Arjan van de Ven <arjan@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 7:30 am

That might be OK.

But our current status quo is not OK:

Check Rafael's regressions lists asking yourself
"How many regressions are older than two weeks?" 

The kernel Bugzilla currently knows about 212 open regression bugs.
(And many more have not made it into Bugzilla.)

We have unmaintained and de facto unmaintained parts of the kernel where 

No disagreement on this, it's just a different issue than our bug fixing 
problem.

cu
Adrian

--
To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 10:20 am

On Thu, 1 May 2008 14:30:38 +0300

"ext4 doesn't compile on m68k".
YAWN.

Wrong question...
"How many bugs that a sizable portion of users will hit in reality are there?"

And how many people are hitting those issues? If a part of the kernel is really
important to enough people, there tends to be someone who stands up to either fix
the issue or start de-facto maintaining that part.
And yes I know there's parts where that doesn't hold. But to be honest, there's
not that many of them that have active development (and thus get the biggest

No it's not! Knowing earlier and better which bugs get hit is NOT different
--
To: Arjan van de Ven <arjan@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 9:21 am

"Kernel oops while running kernbench and tbench on powerpc" took more 
than 2 months to get resolved, and we ship 2.6.25 with this regression.

Granted that compared to x86 there's not a sizable portion of users 
crazy enough to run Linux on powerpc machines...

cu
Adrian

--
To: Adrian Bunk <bunk@...>
Cc: Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 10:08 pm

That was a very subtle bug that only showed up on one particular
powerpc machine.  I was not able to replicate it on any of the powerpc
machines I have here.  Nevertheless, we found it and we have a fix for
it.  I think that's an example of the process working. :)

Paul.
--
To: Paul Mackerras <paulus@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 11:10 pm

Was it even a regression in the classical sense of the word?  Seemed
more of a latent bug that was simply never triggered before.

josh

--
To: Josh Boyer <jwboyer@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 12:09 am

That's right.  The bug has been there basically forever (i.e. since
before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
before.

Paul.
--
To: Paul Mackerras <paulus@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 4:29 am

But for users this is a recent regression since 2.6.24 worked
and 2.6.25 does not.

If this problem was on x86 Linus himself and some other core developers 
would most likely have debugged this issue and Linus would have delayed 
the release of 2.6.25 for getting it fixed there.

And stuff that "only showed up on one particular machine" often shows up 
on many machines (we only know in hindsight) and the "one particular 
machine" is often due to the fact that of the many machines that might 
trigger a regression only one was used for testing this -rc kernel.

This is not in any way meant against you personally, and due to the fact 
that the powerpc port is among the better maintained parts of the kernel 
this regression eventually got fixed, but in many other parts of the 
kernel this would have been one more of the many regressions that were 

cu
Adrian

--
To: Adrian Bunk <bunk@...>
Cc: Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 10:58 am

Totally and utterly immaterial.

If it's a timing-related bug, as far as developers are concerned, nothing 
they did introduced the problem.

So anybody who thinks that "process" should have caught it is just being 
stupid. 

Adrian, you're one of the absolutely *worst* in the camp of "everything 
should be perfect". You really need to realize that reality is messy, and 
things cannot be perfect.

You also need to realize and *understand* that aiming for "good" is 
actually much BETTER than trying to aim for "perfect".

Perfect is the enemy of good.

			Linus
--
To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 11:44 am

So I would like to ask you what a user should do when facing what is
probably a timing-related bug, as it appears I have the bad luck
of hitting one.

See for example my comments after this one 
http://bugzilla.kernel.org/show_bug.cgi?id=10117#c11

This same problem is still present with yesterday's git, and sometimes
it hangs without hpet=disable and sometimes it doesn't. (And never
with hpet=disable in the boot command line)

And when it hangs I can see only _one_ "Switched to high resolution mode
on CPU x" message before the hang point, and when it boots fine there
are always the two of them in sequence:

Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0

And using vga=6 or vga=0x0364 makes a difference in the probability
of hanging.

I am just waiting for -rc1 to be released to send an email with my
problem again, as I am unable to debug this myself.
I think this is ok from my part, right?


--
To: Carlos R. Mafra <crmafra2@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 12:28 pm

Quite frankly, it will depend on the bug.

If it's *reliably* timing-related (which sounds crazy, but is not at all 
unheard of), it can be reliably bisected down to some totally unrelated 
commit that doesn't actually introduce the problem at all, but that 
reliably turns it on or off.

That can be very misleading, and can cause us to basically revert a good 
commit, only to not actually fix the bug (and possibly re-introduce the 
bug that the reverted commit tried to fix).

But sometimes it gives us a clue where the timing problem is. But quite 
frankly, that seems to be the exception rather than the rule.

There have been issues that literally seemed to depend on things like 
cacheline placement etc, where changing config options for code that was 
never actually even *run* would change timing just enough to show a bug 
pseudo-reliably or not at all.
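
The effect described above, where bisection reliably lands on a commit that merely toggles the bug, can be reproduced with a toy model. This is a sketch under stated assumptions, not how `git bisect` is implemented: the commit numbers, the `hangs` predicate, and the idea that one unrelated commit shifts timing enough to expose a latent bug are all hypothetical.

```python
def first_bad(is_bad, good, bad):
    """Binary search for the first 'bad' commit index.

    Assumes is_bad(good) is False and is_bad(bad) is True, which is the
    same precondition a manual bisection relies on.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical history of 60 commits, numbered 0..59.
LATENT_BUG_INTRODUCED = 3   # the commit that actually contains the bug
TIMING_SHIFT_COMMIT = 40    # unrelated commit that changes timing/layout


def hangs(commit):
    # The bug only becomes visible once the timing shift is also present.
    bug_present = commit >= LATENT_BUG_INTRODUCED
    timing_exposes_it = commit >= TIMING_SHIFT_COMMIT
    return bug_present and timing_exposes_it


culprit = first_bad(hangs, good=0, bad=59)
print(culprit)  # prints 40
```

The search is perfectly reliable and still blames commit 40, so reverting it would hide the hang without touching the real defect at commit 3.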

The good news is that those timing issues are really quite rare. 

The bad news is that when they happen, they are almost totally 

Hey, it may well be a HPET+NOHZ issue. But it could also be that HPET is 

.. and yeah, these kinds of really odd and obviously totally unrelated 
issues are a sign of a bug that is either simply hardware instability or 
very subtly timing-related.

The reason I mention hardware instability is that there really are bugs 
that happen due to (for example) power supply instabilities. Brownouts 
under heavy load have been causes of problems, but perhaps surprisingly, 
so has _idle_ time thanks to sleep-states!

The latter is probably due to bad power conditioning on the CPU power 
lines, where the huge current swings (going from high CPU power to low, and 
back again) not only have made some motherboards "sing" (or "hum", 
depending on frequency) but also cause voltage instability and then 
the CPU crashes.

Am I saying that's the reason you see problems? Probably not. Most 
instabilities really are due to kernel bugs. But hardware instabilities do 

Yes. You've been a good bug reporter, and...
To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <venkatesh.pallipadi@...>
Date: Friday, May 2, 2008 - 1:15 pm

It happens a bit before that because when it hangs it doesn't 
print the above lines, and when it does not hang these lines are

Yes you are right. When I have luck and the boot succeeds my Sony laptop

A few days ago I found this message in lkml in reply to a hpet patch
http://lkml.org/lkml/2007/5/7/361 in which the reporter also had 
a similar hang, which was cured by hpet=disable. 

So it is in my TODO list to try to check out if that patch is 
in the current -git and whether it can be reverted somehow (I 
added Venki to the Cc: now)

Thanks a lot for the answer!
--
To: Carlos R. Mafra <crmafra2@...>, Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 2, 2008 - 2:02 pm

It depends on whether HPET is being force detected based on the
chipset or whether it was exported by the BIOS in the ACPI table.

If it was force enabled and above patch is having any effect, then you

In any case, of late there seem to be quite a few breakages that are
related to HPET/timer interrupts. One of them was on a system which has
HPET being exported by BIOS
http://bugzilla.kernel.org/show_bug.cgi?id=10409
And the other one where we are force enabling based on chipset
http://bugzilla.kernel.org/show_bug.cgi?id=10561

And then we have the hangs-once-in-a-while reports by you, Roman and Mark
here
http://bugzilla.kernel.org/show_bug.cgi?id=10377
http://bugzilla.kernel.org/show_bug.cgi?id=10117


Thanks,
Venki
--
To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Carlos R. Mafra <crmafra2@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 12:32 pm

..

Yeah.  This particular bug first appeared when NOHZ & HPET were added.
Somebody once suggested it had something to do with an SMI interrupt
happening in the midst of HPET calibration or some such thing.

But nobody who works on the HPET code has ever shown more than a casual
interest in helping to track down and fix whatever the problem is.

Cheers
--
To: Mark Lord <lkml@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 3:30 pm

I said I was waiting for -rc1 to be released to send another email
about my HPET problem, but curiously with v2.6.26-rc1-6-gafa26be 
my laptop did not hang after 30+ boots and counting. 

Somewhere between 2.6.25-07000-(something) and the above kernel
something happened which changed significantly the probability
of hanging during boot. 

I could not boot more than 3 times in
a row without hanging with kernels up to 2.6.25-07000 (approximately),
and now I am still booting v2.6.26-rc1-6-gafa26be a few times a day
and no hangs yet.
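
How much a run of clean boots says about an intermittent hang can be estimated with a short calculation. This is a rough sketch: it assumes each boot hangs independently with a fixed probability, and the 25% figure is a made-up placeholder, since the real per-boot hang rate was not measured in this thread.

```python
import math


def clean_run_probability(hang_probability, boots):
    # Chance of seeing `boots` clean boots in a row if the bug is still there.
    return (1.0 - hang_probability) ** boots


def boots_needed(hang_probability, max_chance_of_luck):
    # Smallest number of clean boots after which pure luck is less likely
    # than `max_chance_of_luck`.
    return math.ceil(math.log(max_chance_of_luck) / math.log(1.0 - hang_probability))


ASSUMED_HANG_PROBABILITY = 0.25  # hypothetical: one hang in four boots

print(clean_run_probability(ASSUMED_HANG_PROBABILITY, 30))
print(boots_needed(ASSUMED_HANG_PROBABILITY, 0.01))  # prints 17
```

Under that assumption 30 clean boots would be very unlikely to be luck, but the independence assumption is doubtful for a timing-related bug, so a result like this suggests rather than proves that something changed.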

Yesterday I started a "reverse" bisection, trying to find which
commit "fixed" it, but I still didn't finish (but it is past
-7200).

Of course I am not sure if after the 100th boot the latest -git

Well, I would like to thank Venki for his effort because he even
answered some private emails from me about this issue and is 
tracking the bugzillas about it.
--
To: Mark Lord <lkml@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 4:39 pm

..

My experience with this bug, since 2.6.20 or so, has been that it comes
and goes with even the most innocent change in the .config file,
like turning frame pointers on/off.

Cheers
--
To: Adrian Bunk <bunk@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 6:16 am

I never actually saw a statement to that effect (i.e. that 2.6.24
worked) from Kamalesh.  I think people assumed that because he
reported it against version X that version X-1 worked, but we don't

If I had been able to replicate it, or if it had been seen on more
than one machine, I would probably have asked Linus to wait while we
fixed it.  

There's a risk management thing happening here.  Delaying a release is
a negative thing in itself, since it means that users have to wait
longer for the improvements we have made.  That has to be balanced
against the negative of some users seeing a regression.  It's not an
absolute, black-and-white kind of thing.  In this case, for a bug
being seen on only one machine, of a somewhat unusual configuration, I
considered it wasn't worth asking to delay the release.

Paul.
--
To: Paul Mackerras <paulus@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 7:58 am

He reported it as

[BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc


No general disagreement on this.

And my example was not in any way meant against you - it's actually 
unusual and positive that a bug that once got the attention of being
on the regression lists gets fixed later.

Even worse is the situation with regressions people run into when 

cu
Adrian

--
To: Adrian Bunk <bunk@...>
Cc: Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 11:49 am

Precisely.  Cherry-picking a single example such as the 68k thing and then

Another fallacy which Arjan is pushing (even though he doesn't appear to
have realised it) is "all hardware is the same".

Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
venture, most of our bugs don't affect most people.  So, over time, by
Arjan's "important to enough people" observation we just get more and more
and more unfixed bugs.

And I believe this effect has been occurring.

And please stop regaling us with this kerneloops.org stuff.  It just isn't
very interesting, useful or representative when considering the whole
problem.  Very few kernel bugs result in a trace, and when they do they are
usually easy to fix and, because of this, they will get fixed, often
quickly.  I expect netdevwatchdogeth0transmittimedout.org would tell a
different story.

One thing which muddies all this up is that bug reporters vanish.  Over the
years I have sent thousands and thousands of ping emails to people who have
reported bugs via email, three to six months after the fact.  Some were
solved - maybe a fifth.  About the same proportion of reporters reply and
give some reason why they cannot work on the bug.  In the majority of cases
people don't reply at all and I suspect they're in the same category of
cannot-work-on-the-bug.

And why can't they work on the bug?  Usually, because they found a
workaround.  People aren't going to spend months sitting in front of a
non-functional computer waiting for kernel developers to decide if their
machine is important enough to fix.  They will find a workaround.  They
will buy new hardware.  They will discover "noapic" (234000 google hits and
rising!).  They will swap it with a different machine.  They will switch to
a different distro which for some reason doesn't trigger the bug.  They
will use an older kernel.  They will switch to Solaris.  Etcetera.  People
are clever - they will find a way to get around it.

I figure that after a bug is reported w...
To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 9:13 pm

On Thu, 1 May 2008 08:49:19 -0700

no I'm pushing "some classes of hardware are much more popular/relevant

I did not say "most people". I believe "most people" aren't hitting
bugs right now (or there would be a lot more screaming).
What I do believe is that *within the bugs that hit*, even the hardware
specific ones, there's a clear prioritization by how many people hit

now that's a fallacy of your own.. if you care about that one, it's 1)
trivial to track and/or 2) could contain a WARN_ON_ONCE(), at which
point it's automatically tracked. (and more useful information I
suspect, since it suddenly has a full backtrace including driver info
in it)
By your argument we should work hard to make sure we're better at
creating traces for cases we detect something goes wrong.

if it's a hardware bug there's little we can do.
If it's a hardware specific bug, yeah then it becomes a function of how

Given that a normal PC has maybe 10 components... 
yes we don't want bugcreep that affects common hardware over time.
At the same time, by your argument, a bug that hits a piece of hardware
of which 5 are made (or left on this planet) is equally important to

This statement is so ridiculous and self-contradictory to what you
said before that I'm not even going to respond to it. 
--
To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 5:00 am

"popular/relevant" is hard to define.

E.g. if we'd go after "popular" we should only keep architectures like 
ARM and x86 and ditch architectures like ia64 and s390 that have puny 
userbases.


If your "or have the hardware in general" is meant seriously you have to
convince people that ARM must become a very high priority.

No matter whether one supports your "there's a clear prioritization" 
view or not it anyway doesn't currently work since the areas covered by 
people testing -rc kernels don't even remotely map the most popular 

kerneloops.org catches the easiest to solve bugs (there's a trace) and 
helps in getting them fixed.

That's a very good thing.

And if we get more bugs into this easy to resolve state that would be 
even better.

But it's only a small part of the complete picture of incoming bug 
reports.

cu
Adrian

--
To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 1:24 pm

So the question is if we have a thousand bugs which only affect one
person each, and 70 million Linux users, how much should we beat up
ourselves that 1,000 people can't use a particular version of the
Linux kernel, versus the 99.9% of the people for which the kernel
works just fine?
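
Taking the two figures in the question above at face value, the proportion works out as follows. The only inputs are the numbers quoted in the message; nothing here is measured data.

```python
affected_users = 1_000    # from the message: a thousand bugs, one person each
total_users = 70_000_000  # from the message: 70 million Linux users

affected_pct = 100 * affected_users / total_users
unaffected_pct = 100 - affected_pct

print(f"affected:   {affected_pct:.4f}%")    # prints affected:   0.0014%
print(f"unaffected: {unaffected_pct:.4f}%")  # prints unaffected: 99.9986%
```

With these inputs the unaffected share is closer to 99.999% than 99.9%, which only strengthens the point being made.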

Sometimes, we can't make everyone happy.

At the recent Linux Collaboration Summit, we had a local user walk up
to a microphone, and loosely paraphrased, said, "WHINE WHINE WHINE
WHINE I have a $30 DVD drive that doesn't work with Linux.  WHINE
WHINE WHINE WHINE WHINE What are *you* going to do to fix my problem?"

Some people like James responded very diplomatically, with "Well, you
have to understand, the developer might not have your hardware, and
there's a lot of broken out here, etc., etc."  What I wanted to tell
this user was, "Ask not what the Linux development community can do
for you.  Ask what *you* can do for Linux?"  Suppose this person had
filed a kernel bugzilla bug, and it was one of the hundreds or
thousands of non-handled bugs.  Sure, it's a tragedy that bugs pile
up.  But if they pile up because of crappy hardware, that's not a
major tragedy.  If we can figure out how to blacklist it, and move on,

Hey, in this particular case, if this user worked around the problem
by buying new hardware, it was probably the right solution.  As far as
we know we don't have a systematic problem where huge numbers of DVD
drives aren't working, so if there are a few odd ball ones that are
out there, we just CAN'T self-flagellate ourselves that we're not

... and maybe we can't solve hardware bugs.  Or that crappy hardware
isn't worth holding back Linux development.  And I'm not sure ignoring
it is that horrible of a thing.  And in practice, if it's a hardware
bug in something which is very common, it *will* get noticed very
quickly and fixed.  But if it's a hardware bug in some rare piece
of hardware, the user is going to have to either (a) help us fix it,
or (b) decide that his time is more ...
To: Theodore Tso <tytso@...>
Cc: <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 3:26 pm

On Thu, 1 May 2008 13:24:34 -0400

Many, many of these are regressions.  If old-linux works on that
hardware then new-linux can too.

(still wants to know what we did 2-3 years ago which caused thousands of
people to have to resort to using noapic and other apic-related boot option
workarounds)

--
To: Andrew Morton <akpm@...>
Cc: Theodore Tso <tytso@...>, <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Friday, May 2, 2008 - 6:23 am

Forcing APIC even when the BIOS didn't support them.

-Andi


--
To: Andrew Morton <akpm@...>
Cc: Theodore Tso <tytso@...>, <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:39 pm

Perhaps 2-3 years ago more people started using more hardware that
implements APIC. ;-)

-- Steve

--
To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:38 pm

And actually, core kernel developers are best for writing new bugs.

Really, the way I started out learning how the kernel ticks was to go and
try to solve some bugs that I was seeing (this was years ago). I get
people telling me that they want to learn to be a kernel developer and
asking what new feature they should work on. Well, honestly, the last thing
a newbie kernel developer should be doing is writing new bugs. We need to
send them to a URL that lists all the known bugs and have them pick one,
any one, and have them solve it. This would be the best way to learn part
of the kernel.

I even find that I understand my own code better when I'm in the debugging
phase.

People here mention different places to look at code, and besides
kerneloops.org I really don't even know where to look for bugs, because I
haven't seen a URL to point me to.

The next time someone asks me how to get started in kernel programming, I
would love to tell them to go and look here, and solve the bugs. I'm
guessing that I should just point them to:

  http://janitor.kernelnewbies.org/

and tell them to focus on real bugs (not just comments and such) to get
fixed if they really want to learn the kernel.

-- Steve

--
To: Steven Rostedt <rostedt@...>
Cc: <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:18 pm

On Thu, 1 May 2008 12:38:23 -0400 (EDT)

bugzilla.kernel.org is, umm, improving.

It would be an interesting exercise for someone to spend a few days seeing
how many of the bugzilla reports they personally can reproduce.  I'd guess
"zero".  There's a lesson in that.

The problem with bugzilla will be that it will be hard to find reports
where the reporter will be able to work with you on the fix - we've let
them go cold.

The most fruitful place to find fixable bugs is linux-kernel.  People who
report bugs there are sufficiently motivated to have actually sent the
email and the bug is still recent, so they probably haven't done the
Solaris install yet.

--
To: Arjan van de Ven <arjan@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 8:53 am

Agreed.

Thanks,
Rafael
--
To: Arjan van de Ven <arjan@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 4:13 am

<boggle>

How about "a bug which we just added"?  One which is repeatable. 
Repeatable by a tester who is prepared to work with us on resolving it. 
Those bugs.

Rafael has a list of them.  We release kernels when that list still has tens of
unfixed regressions dating back up to a couple of months.

--
To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 10:15 am

On Thu, 1 May 2008 01:13:46 -0700


I know he does. But I will still argue that if that is all we work from, and treat
all of those equally, we're doing the wrong thing.
I'm sorry, but I really do not consider "ext4 doesn't compile on m68k" which is
on that list to be as relevant as a "i915 drm driver crashes" bug which has been among
us for a while and is not on that list, just based on the total user base for either of those.

Does that mean nobody should fix the m68k bug?
Someone who cares about m68k for sure should work on it, or if it's easy for an ext4 developer,
sure. But if the ext4 person has to spend 8 hours on it figuring out cross compilers, I say
we're doing something very wrong here. (no offense to the m68k people, but there's just
a few of you; maybe I should have picked voyager instead)

Maybe that's a "boggle" for you; but for me that's symptomatic of where we are today:
We don't make (effective) prioritization decisions. Such decisions are hard, because it 
effectively means telling people "I'm sorry but your bug is not yet important". That's
unpopular, especially if the reporter is very motivated on lkml. And it will involve a 
certain amount of non-quantifiable judgement calls, which also means we won't always be
right. Another hard thing is that lkml is a very self-selective audience. A bug may be 
reported three times there, but never hit otherwise, while another bug might not be reported
at all (or only once) while thousands and thousands of people are hitting it.

Not that we're doing all that bad, we ARE fixing the bugs (at least the oopses/warnings) that
are frequently hit. So I wouldn't blindly say we're doing a bad job at prioritizing. I would
rather say that if we focus only on what is left afterwards without doing a reality check,
we'll *always* have a negative view of quality, since there will *always* be bugs we don't 
fix. Linux has well over ten million users (much more if you count embedded devices).
A lot of them will have "standard" hardware, and a bunch of the...
To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Sunday, May 4, 2008 - 8:45 am

On that note, I'd really like to see better binary availability of cross 
compilers. While it's improved over the last few years mostly due to the 
crossgcc stuff it's still a pain. Ideally, they would be available through 
the distribution package manager even but failing that some dedicated place 
on kernel.org with x86->lots and some of the more widely used other 
combinations would quite definitely be good. Perhaps not really directly 
relevant to this thread as such, but still good.

Andrew maintain{s,ed} a number of them at

http://userweb.kernel.org/~akpm/cross-compilers/

But as you see, most of the stuff there is really old again...

Rene
--
To: Rene Herman <rene.herman@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, Vegard Nossum <vegard.nossum@...>
Date: Sunday, May 4, 2008 - 9:00 am

You're most welcome to help out Vegard to do this:

http://www.kernel.org/pub/tools/crosstool/
--
To: linux kernel list <linux-kernel@...>
Date: Monday, May 5, 2008 - 9:13 am

You could also use ct-ng:

http://ymorin.is-a-geek.org/dokuwiki/projects/crosstool

Works excellent for me :)


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------
--
To: Pekka Enberg <penberg@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, Vegard Nossum <vegard.nossum@...>
Date: Sunday, May 4, 2008 - 9:19 am

Ah, thanks, lovely, just new I see (and yes, I meant s/crossgcc/crosstool/).
Good thing. I'll check it out and see if there's anything to add.

Rene.
--
To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 8:42 am

It's not that clear-cut, either. Something which manifests itself as a
build failure or an immediate test failure on m68k alone, might actually
turn out to cause subtle data corruption on other platforms.

You can't always know that it isn't important, just because it only
shows up in some esoteric circumstances. You only really know how
important it was _after_ you've fixed it.

That obviously doesn't help us to prioritise.

-- 
dwmw2

--
To: David Woodhouse <dwmw2@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Monday, May 5, 2008 - 6:03 am

Ideally, you'd do an analysis first and then prioritize, based
on the severity of the bug, its exposure, how easy it is to fix,
etc.  If while doing that you already have a fix at hand, you're
almost done :)

Recursively, there's the problem of which bugs you analyze first.
I'm inclined to say that you want to analyze most if not all bug reports
at a higher priority than working on fixing non-critical bugs.

Benny
--
To: David Woodhouse <dwmw2@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 11:02 am

On Thu, 01 May 2008 13:42:44 +0100

absolutely. I'm not going to argue that prioritization is easy. Or 
that we'll be able to get it right all the time.
--
To: Andrew Morton <akpm@...>
Cc: <arjan@...>, <bunk@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 5:16 am

And leave unfixed all the regressions introduced in earlier kernel versions 
and known at the time of the release of that version but still present in 
the current version? Not to mention all the other bugs reported by users of 

That can be true for not-so-recently introduced bugs too.

There are so many bugs out there and developers tend to focus on new ones 
leaving a lot of others unattended, both important and not so important 
ones.

Which ones should someone focus on? Maybe on the ones that someone (helped) 
introduce him/herself. Maybe that should even sometimes be prioritized over 
introducing new bugs^W^W^Wdoing new development.
--
To: linux kernel list <linux-kernel@...>
Date: Thursday, May 1, 2008 - 6:30 am

<big_snip />

Hi folks,


what do you think about Gentoo's "bug-wrangler" concept ?
Maybe we could do something similar:

A tester group (which e.g. should be the entry point for newbies),
is responsible for receiving bug reports from users (maybe even 
distro maintainers who're not directly involved in kernel dev.). 
They try to reproduce the bugs and find out as much as they can,
then file a report to the actual kernel devs (just critical bugs 
are directly kicked to the devs with high priority). Maybe this 
group could also keep users informed about fixes and give some 
upgrade advice, etc.

This way we can build good technical support (independent
from distributors ;-P), newbies can learn on the job and the 
load on kernel devs is reduced, so they can better concentrate
on their core competences.


What do you think about this ?


cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------
--
To: Enrico Weigelt <weigelt@...>
Cc: linux kernel list <linux-kernel@...>
Date: Thursday, May 1, 2008 - 9:02 am

Andrew already does more or less this.

The problems are:
- kernel bugs tend to very quickly reach the state where you need expert
  knowledge in some area, and there's definitely not much room for
  newbies in bug handling
- "try to reproduce the bugs" works for much software, but in the 

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--
To: <bunk@...>
Cc: <torvalds@...>, <akpm@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Wednesday, April 30, 2008 - 8:41 pm

From: Adrian Bunk <bunk@kernel.org>

kernel-testers@vger.kernel.org has been created, feel free to
use it
--
To: David Miller <davem@...>
Cc: <torvalds@...>, <akpm@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 9:23 am

Thanks  :-)
Adrian

--
To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 5:52 pm

One thing is that we keep fragmenting the tester base by adding new 
confidence levels: we now have -mm, -next, mainline -git, mainline -rc, 
mainline release, stable, distro testing, and distro release (and some 
distros even have aggressive versus conservative tracks.)  Furthermore, 
thanks to craniorectal immersion on the part of graphics vendors, a lot 
of users have to run proprietary drivers on their "main work" systems, 
which means they can't even test newer releases even if they would dare.

This fragmentation is largely intentional, of course -- everyone can 
pick a risk level appropriate for them -- but it does mean:

a) The lag for a patch to ride through the pipeline is pretty long.
b) The section of people who are going to use the more aggressive trees 
for "real work" testing is going to be small.

	-hpa

--
To: H. Peter Anvin <hpa@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:39 pm

And another problem is that often, it's hard to get good "real work" coverage
over the whole tree.  I just discovered an apparent borkage somewhere in
the networking/wireless area that seems to have gotten into Linus's tree
somewhere between 24-rc8 and 24-final, just because I haven't beaten on
my wireless card in the last few weeks, so I didn't notice a regression in
'ip link show' related to the rfkill switch...
To: H. Peter Anvin <hpa@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 11:24 pm

Since I poke my head out of the foxhole every once in a while with a
relatively late-breaking bug report, I thought I should chime in...
Mr. Anvin has pretty much nailed it...

As the kernel development process has evolved, which "confidence level"
I select has evolved as well.  The thing that *hasn't* changed through
the years is, I tend to pick a "confidence level" that is appropriately
close to "mainline" and has an update release schedule roughly compatible
with my ability to keep up with it.  Specifically, if it takes me several
hours to download a patch set, apply it, build the new kernel, and test
on multiple platforms/architectures, then the update release schedule is
probably going to have to be no more often than twice a week if I'm going
to be at all interested in even trying to keep up with it.  In 2008, the
"-rcX" updates are a good fit.  In the not-too-distant past, keeping up
with 2.5.X.Y was no problem.

Yes, I realize I don't *have* to test every revision level in every
major tree, but I don't have to think about which one to pick for testing
if I can keep up with the update release schedule :-).

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--
To: Linus Torvalds <torvalds@...>
Cc: <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:54 pm

On Wed, 30 Apr 2008 13:31:08 -0700 (PDT)

Well.  We'll see.

linux-next is more than another-tree-to-test.  It is (or will be) a change
in our processes and culture.  For a start, subsystem maintainers can no
longer whack away at their own tree as if the rest of us don't exist. 
They now have to be more mindful of merge issues.

Secondly, linux-next is more accessible than -mm: more releases, more
stable, better tested by he-who-releases it, available via git:// etc.  It
should be very easy for developers to do their weekly "does linux-next
boot" test.

Plus, of course, people who complain about merge-window breakage only to
find that the breakage was already in linux-next except they didn't test it
will not have a leg to stand on.


I feared that linux-next wouldn't work: that Stephen would stomp off in
disgust at all the crap people send at him.  But in fact it seems to be
going very well from that POV.

I get the impression that we're seeing very little non-Stephen testing of
linux-next at this stage.  I hope we can ramp that up a bit, initially by
having core developers doing at least some basic sanity testing.



linux-next does little to address our two largest (IMO) problems:
inadequate review and inadequate response to bug and regression reports. 
But those problems are harder to fix..

--
To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 9, 2008 - 5:28 am

Probably it would make sense also for distro vendors to make linux-next 
snapshots available in their development distro branches (redhat's 
rawhide, opensuse's factory, etc), to make it easier to test by those 
users who are willing to test if it works in their environment, but don't 
want to compile kernels themselves.

-- 
Jiri Kosina

--
To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 9, 2008 - 11:00 am

I try to test linux-next on a few SATA test boxes, but it's definitely 

Agreed...  any lead time on linux-next testing would be great.

	Jeff



--
To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Ingo Molnar <mingo@...>
Date: Wednesday, April 30, 2008 - 5:42 pm

Andrew, the latter thing is a very good point. For me personally, the fact
that -mm is not available via git is the major obstacle for trying your
tree more frequently than just a few times per year. How difficult would
it be for you to switch to git? I guess there are good reasons for still
using the source code management system from the last century; please
correct me if I'm wrong, but I believe that using a modern SCM system could

For busy (or lazy) people like myself, the big problem with linux-next is
the frequent merge breakages, when pulling the tree stops with "you are in
the middle of a merge conflict". Perhaps, there is a better way to resolve
this without just removing the whole repo and cloning it once again - this
is what I'm doing, please flame me for stupidity or ignorance if I simply
am not aware of some git feature that could be useful in such cases.

Finally, while the list is at it, I'd like to make another technical comment.
My development zoo is a pretty fast 4-way Xeon server, where I keep a handful
of trees, a few cross-toolchains, Qemu, etc. The network setup in our
organization is such that I can use git only over http from that server. This
cannot be changed, it's the company policy. In view of that, it's a pity that
quite a few tree owners don't make sure that http access to their trees works
(I added Ingo to the Cc: list in the hope that this will be corrected soon for
the x86 tree, which I am using quite extensively), and I have to use a much
slower machine (a two and a half year old laptop) for these trees. Please see
this:

<<<<<<<

[dmitri.vorobiev@amber ~]$ git clone http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Initialized empty Git repository in /home/dmitri.vorobiev/linux-2.6-x86/.git/
Getting alternates list for http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Also look at http://www.kernel.org/home/ftp/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
Getting pack list for http:...
To: Dmitri Vorobiev <dmitri.vorobiev@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:10 pm

On Thu, 01 May 2008 01:42:59 +0400

Every -mm release is available via git://, as described in the release
announcements.

The scripts which do this are a bit cantankerous but I believe they do
work.

<tests it>


Fatal, I expect.  A tool which manages source-code files is just the wrong

Really?  Doesn't Stephen handle all those problems?  It should be a clean

Don't know what to do about that, sorry.  An off-site git->http proxy might
work, but I doubt if anyone has written the code.

--
To: Andrew Morton <akpm@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Thursday, May 1, 2008 - 2:15 am

Would you mind using stgit? That way you have the patch queue
functionality, yet a simple git-push -f will send the whole
patch stack over to a repo (without the stgit bits that is),
leaving what looks like a regular tree with just lots of
recent commits. Does not even need extra scripts to do a

Indeed, assuming the remote is set up and you have a local branch,
`git reset --hard mm/master` after a fetch is the thing.
But be sure not to have any changed files.
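
The reset-after-fetch approach can be tried safely on throwaway repositories.
What follows is only a minimal sketch and was not part of the original mail:
a local repository stands in for the -mm tree, the remote is named "mm" to
match the command above, and the directory names, file contents and demo
identity are assumptions made for illustration. It also adds the "no changed
files" check as an explicit guard before the reset.

```shell
#!/bin/sh
# Sketch: follow a remote whose branch is rebuilt from scratch, using a local
# branch plus "git reset --hard". Every path and name here is hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in for the -mm tree, with one snapshot on "master".
git init -q "$work/mm-upstream"
cd "$work/mm-upstream"
git symbolic-ref HEAD refs/heads/master
echo one > file
git add file
git commit -q -m "mm snapshot 1"

# The tester clones it, naming the remote "mm" instead of "origin".
git clone -q -o mm "$work/mm-upstream" "$work/tester"

# Upstream is rebuilt: the new snapshot shares no history with the old one.
cd "$work/mm-upstream"
git checkout -q --orphan rebuilt
echo two > file
git add file
git commit -q -m "mm snapshot 2"
git branch -M master

# The tester fetches, checks for local changes, then moves the branch.
cd "$work/tester"
git fetch -q mm
if [ -n "$(git status --porcelain)" ]; then
    echo "uncommitted changes present, refusing to reset" >&2
    exit 1
fi
git reset -q --hard mm/master
git log -1 --format=%s
```

The guard matters because "git reset --hard" silently discards modifications
to tracked files, so anything worth keeping should be committed or stashed
before the reset.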
--
To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 7:04 pm

Andrew Morton wrote:


But there is another solution, which I believe is straightforward: have the tree
maintainer set up his tree properly.


--
To: Andrew Morton <akpm@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:19 pm

It should indeed be a clean fetch, but I wonder if Dmitri perhaps does a 
"git pull" - which will do the fetch, but then try to _merge_ that fetched 
state into whatever the last base Dmitri happened to have.

Dmitri: you cannot just "git pull" on linux-next, because each version of 
linux-next is independent of the next one. What you should do is basically

	# Set this up just once..
	git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git

and then after that, you keep on just doing

	git fetch linux-next
	git checkout linux-next/master

which will get you the actual objects and check out the state of that 
remote (and then you'll normally never be on a local branch on that tree, 
git will end up using a so-called "detached head" for this).

IOW, you should never need to do any merges, because Stephen did all those 
in linux-next already.

			Linus
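
The fetch-and-checkout workflow described above can be reproduced without
touching kernel.org. What follows is only a minimal sketch and was not part
of the original mail: it builds two throwaway repositories under a temporary
directory, where "upstream" plays the role of linux-next and is rebuilt from
an unrelated history between two fetches. The directory names, file contents
and demo identity are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: track a remote that is rebuilt for every release by fetching it and
# checking out the remote-tracking ref directly (detached HEAD), never merging.
# Every path and name here is hypothetical.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "upstream" stands in for the linux-next repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo one > file
git add file
git commit -q -m "snapshot 1"

# Set this up just once: add the remote, fetch, check out its state.
git init -q "$work/tester"
cd "$work/tester"
git remote add linux-next "$work/upstream"
git fetch -q linux-next
git checkout -q linux-next/master

# Upstream publishes a new version that is independent of the previous one.
cd "$work/upstream"
git checkout -q --orphan rebuilt
echo two > file
git add file
git commit -q -m "snapshot 2"
git branch -M master

# The tester just repeats fetch + checkout; no merge is attempted, so there
# is nothing that can conflict.
cd "$work/tester"
git fetch -q linux-next
git checkout -q linux-next/master
git log -1 --format=%s
```

Running "git pull" in place of the last fetch and checkout would instead try
to merge the two unrelated snapshots, which is the situation that produces
the "middle of a merge conflict" state complained about earlier in the thread.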
--
To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Dmitri Vorobiev <dmitri.vorobiev@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Thursday, May 1, 2008 - 7:06 pm

Just to add some emphasis here - this is something that took me a long time to figure out, and since it is the pattern for dealing with the x86 trees and with the mm git tree and with linux-next, it would help if it were documented somewhere (not that I can imagine where).  Once you know it, it becomes obvious, but try staring at a merge conflict for a while trying to figure out what to do, and it gets frustrating.  I wonder if we can guess how many testers abandon the mm git tree or the linux-next tree because of this.

It might be nice if git supported a command like git-remote-help or something that would fetch a predefined help file from a remote tree that describes the workflow for that tree.

But at least with an extra reply to this mail, it might creep higher in the google search results when looking for merge conflicts with linux-next.

-- 
Kevin Winchester
 
--
To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:28 pm

Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
tried to duplicate Stephen's work. In the future I'll do as you suggest here.


--
To: Dmitri Vorobiev <dmitri.vorobiev@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>, Stephen Rothwell <sfr@...>
Date: Thursday, May 1, 2008 - 12:26 pm

That "howto" should probably be added to the linux-next announcements...
(CC'ing Stephen)
--
To: Diego Calleja <diegocg@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <rjw@...>, &l