Re: RFC: starting a kernel-testers group for newbies

To: <linux-kernel@...>
Date: Tuesday, April 29, 2008 - 10:03 pm

This is starting to get beyond frustrating for me.

Yesterday, I spent the whole day bisecting boot failures
on my system due to the totally untested linux/bitops.h
optimization, which I fully analyzed and debugged.

Today, I had hoped that I could get some work done of my
own, but that's not the case.

Yet another bootup regression got added within the last 24
hours.

I don't mind fixing a regression or two during the merge
window but THIS IS ABSOLUTELY, FUCKING, RIDICULOUS!

The tree breaks every day, and it's becoming an extremely
non-fun environment to work in.

We need to slow down the merging, we need to review things
more, we need people to test their fucking changes!
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>, Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 3:36 pm

Well, I must say I second that.

I'm not seeing regressions myself this time (well, except for the one that
Jiri fixed), but I did find a few of them during the post-2.6.24 merge window
and I wouldn't like to repeat that experience, so to speak.

IMO, the merge window is way too short for actually testing anything. I rebuild
the kernel once or even twice a day and there's no way I can really test it.
I can only check if it breaks right away. And if it does, there's no time to
find out what broke it before the next few hundreds of commits land on top of
that.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: <davem@...>, <linux-kernel@...>, <torvalds@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:15 pm

On Wed, 30 Apr 2008 21:36:57 +0200

<jumps up and down>

There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!

_anything_ which appears in 2.6.x-rc1 and which wasn't in 2.6.x-mm1 was
snuck in too late (OK, apart from trivia and bugfixes).

If we decide that we need to fix the oh-shit-lets-slam-this-in-and-hope
problem then I expect we can do so, via fairly reliable means.

But the first attempt at solving it should be to ask people to not do that.
--

To: Andrew Morton <akpm@...>
Cc: Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:31 pm

The problem I see with both -mm and linux-next is that they tend to be
better at finding the "physical conflict" kind of issues (ie the merge
itself fails) than the "code looks ok but doesn't actually work" kind of
issue.

Why?

The tester base is simply too small.

Now, if *that* could be improved, that would be wonderful, but I'm not
seeing it as very likely.

I think we have fairly good penetration these days with the regular -git
tree, but I think that one is quite frankly a *lot* less scary than -mm or
-next are, and there it has been an absolutely huge boon to get the kernel
into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also
started something like that).

So I'm very pessimistic about getting a lot of test coverage before -rc1.

Maybe too pessimistic, who knows?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 8:31 pm

First of all:
I 100% agree with Andrew that our biggest problems are in reviewing code
and resolving bugs, not in finding bugs (we already have far too many
unresolved bugs).

But although testing mustn't replace code reviews it is a great help,
especially for identifying regressions early.

Finding testers should actually be relatively easy since it doesn't
require much knowledge from the testers.

And it could even solve a second problem:

It could be a way for getting newbies into kernel development.

We actually only rarely have tasks suitable as janitor tasks for
newbies, and the results of people who know neither the kernel
nor C running checkpatch on files in the kernel have already
been discussed extensively...

I'll try to do this:
- create some Wiki page
- get a mailing list at vger
- point newbies to this mailing list
- tell people there which kernels to test
- figure out and document stuff like how to bisect between -next kernels
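
The last item, bisecting between -next kernels, comes down to a binary search over an ordered list of commits, which is what `git bisect` automates. A minimal Python sketch of the idea, assuming a linear history and a test that gives the same answer every time; `commits` and `is_bad` are hypothetical names, and the daily rebuilding of -next trees is not modelled here:

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is known
    to be bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


if __name__ == "__main__":
    # Hypothetical history c0..c9 with the breakage introduced in c6.
    history = [f"c{i}" for i in range(10)]
    print(find_first_bad(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the remaining range, so about log2(N) kernels need to be built and booted for N commits.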

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 3:03 am

On Thu, 1 May 2008 03:31:25 +0300

I would argue instead that we don't know which bugs to fix first.
We're never going to fix all bugs, and to be honest, that's ok.
As long as we fix the important bugs, we're doing really well.
And at least for the kerneloops.org reported issues, we're doing quite ok.

For me, 'important' is a combination of the effect of the bug and the number of people
it'll hit. By that measure, a compiler warning on parisc is less important than easy-to-trigger
filesystem corruption in ext3; more people will hit the latter and its effect is more grave.
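
One way to read this definition is as a score: the severity of the effect multiplied by the number of people likely to hit the bug. A rough Python sketch of that reading; the severity scale and the user counts are made-up illustrative numbers, not data from this thread:

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    severity: int        # hypothetical scale: 1 = cosmetic ... 10 = data loss
    affected_users: int  # hypothetical estimate of how many people hit it

    @property
    def importance(self) -> int:
        # "important" = effect of the bug combined with how many people it hits
        return self.severity * self.affected_users


def rank(bugs: list[Bug]) -> list[Bug]:
    """Return the bugs ordered from most to least important."""
    return sorted(bugs, key=lambda b: b.importance, reverse=True)


if __name__ == "__main__":
    bugs = [
        Bug("compiler warning on parisc", severity=1, affected_users=50),
        Bug("easy-to-trigger ext3 corruption", severity=10, affected_users=100_000),
    ]
    for bug in rank(bugs):
        print(bug.importance, bug.title)
```

A plain product is only one possible weighting; the point is that both factors enter the ordering.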

For oopses and WARN_ON()'s we're getting the hang of this now with kerneloops.org,
at least for the oopses that aren't really hard fatal. One thing I learned at least is that
lkml is a poor representation of what people actually hit; it's a very very selective
audience.
oopses/warnons are only a subset of the bugs of course... but still.

So there's a few things we (and you / janitors) can do over time to get better data on what issues
people hit:
1) Get automated collection of issues more widespread. The wider our net the better we know which
issues get hit a lot, and, plainly, the more data we have on when things start, when they stop, etc etc.
Especially if you get a lot of testers in your project, I'd like them to install the client for easy reporting
of issues.
2) We should add more WARN_ON()s on "known bad" conditions. If it WARN_ON()'s, we can learn about it via
the automated collection. And we can then do the statistics to figure out which ones happen a lot.
3) We need to get persistent-across-reboot oops saving going; there are some avenues for this
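
The "statistics" in point 2 amount to counting how often each distinct warning or oops signature is reported. A minimal sketch of such a tally, assuming each report has already been reduced to a signature string; the signatures below are invented, and this is not a description of how kerneloops.org itself works:

```python
from collections import Counter


def top_signatures(reports: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequently reported signatures with their counts."""
    return Counter(reports).most_common(n)


if __name__ == "__main__":
    # Hypothetical signatures, one entry per submitted report.
    reports = [
        "WARN_ON at fs/example.c:120",
        "oops in example_driver_probe",
        "WARN_ON at fs/example.c:120",
        "WARN_ON at fs/example.c:120",
        "oops in example_driver_probe",
        "WARN_ON at net/example.c:77",
    ]
    for signature, count in top_signatures(reports):
        print(count, signature)
```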

--

To: Arjan van de Ven <arjan@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 7:30 am

That might be OK.

But our current status quo is not OK:

Check Rafael's regressions lists asking yourself
"How many regressions are older than two weeks?"

The kernel Bugzilla currently knows about 212 open regression bugs.
(And many more have not made it into Bugzilla.)

We have unmaintained and de facto unmaintained parts of the kernel where

No disagreement on this, it's just a different issue than our bug fixing
problem.

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 10:20 am

On Thu, 1 May 2008 14:30:38 +0300

"ext4 doesn't compile on m68k".
YAWN.

Wrong question...
"How many bugs that a sizable portion of users will hit in reality are there?"

And how many people are hitting those issues? If a part of the kernel is really
important to enough people, there tends to be someone who stands up to either fix
the issue or start de-facto maintaining that part.
And yes I know there's parts where that doesn't hold. But to be honest, there's
not that many of them that have active development (and thus get the biggest

No it's not! Knowing earlier and better which bugs get hit is NOT different
--

To: Arjan van de Ven <arjan@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 9:21 am

"Kernel oops while running kernbench and tbench on powerpc" took more
than 2 months to get resolved, and we ship 2.6.25 with this regression.

Granted that compared to x86 there's not a sizable portion of users
crazy enough to run Linux on powerpc machines...

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 10:08 pm

That was a very subtle bug that only showed up on one particular
powerpc machine. I was not able to replicate it on any of the powerpc
machines I have here. Nevertheless, we found it and we have a fix for
it. I think that's an example of the process working. :)

Paul.
--

To: Paul Mackerras <paulus@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 11:10 pm

Was it even a regression in the classical sense of the word? Seemed
more of a latent bug that was simply never triggered before.

josh

--

To: Josh Boyer <jwboyer@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 12:09 am

That's right. The bug has been there basically forever (i.e. since
before 2.6.12-rc2 ;) and no-one has been able to trigger it reliably
before.

Paul.
--

To: Paul Mackerras <paulus@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 4:29 am

But for users this is a recent regression since 2.6.24 worked
and 2.6.25 does not.

If this problem was on x86 Linus himself and some other core developers
would most likely have debugged this issue and Linus would have delayed
the release of 2.6.25 for getting it fixed there.

And stuff that "only showed up on one particular machine" often shows up
on many machines (we only know in hindsight) and the "one particular
machine" is often due to the fact that of the many machines that might
trigger a regression only one was used for testing this -rc kernel.

This is not in any way meant against you personally, and due to the fact
that the powerpc port is among the better maintained parts of the kernel
this regression eventually got fixed, but in many other parts of the
kernel this would have been one more of the many regressions that were

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 10:58 am

Totally and utterly immaterial.

If it's a timing-related bug, as far as developers are concerned, nothing
they did introduced the problem.

So anybody who thinks that "process" should have caught it is just being
stupid.

Adrian, you're one of the absolutely *worst* in the camp of "everything
should be perfect". You really need to realize that reality is messy, and
things cannot be perfect.

You also need to realize and *understand* that aiming for "good" is
actually much BETTER than trying to aim for "perfect".

Perfect is the enemy of good.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 11:44 am

So I would like to ask you what a user should do when facing what is
probably a timing-related bug, as it appears I have the bad luck
of hitting one.

See for example my comments after this one
http://bugzilla.kernel.org/show_bug.cgi?id=10117#c11

This same problem is still present with yesterday's git, and sometimes
it hangs without hpet=disable and sometimes it doesn't. (And never
with hpet=disable in the boot command line)

And when it hangs I can see only _one_ "Switched to high resolution mode
on CPU x" message before the hang point, and when it boots fine there
is always the two of them in sequence:

Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
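
The symptom described above can be checked mechanically by counting the "Switched to high resolution mode" lines in a captured boot log. A small sketch, assuming the log text has been saved somehow (the thread does not say how the messages were captured) and that two CPUs are expected, as on the laptop in question:

```python
import re

# Matches the message quoted above and captures the CPU number.
HRES_LINE = re.compile(r"Switched to high resolution mode on CPU (\d+)")


def cpus_in_hres_mode(boot_log: str) -> set[int]:
    """Return the set of CPU numbers that reported the switch."""
    return {int(m.group(1)) for m in HRES_LINE.finditer(boot_log)}


def looks_like_hang(boot_log: str, expected_cpus: int = 2) -> bool:
    """True if fewer CPUs than expected reported the switch."""
    return len(cpus_in_hres_mode(boot_log)) < expected_cpus


if __name__ == "__main__":
    good = (
        "Switched to high resolution mode on CPU 1\n"
        "Switched to high resolution mode on CPU 0\n"
    )
    bad = "Switched to high resolution mode on CPU 1\n"
    print(looks_like_hang(good), looks_like_hang(bad))
```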

And using vga=6 or vga=0x0364 makes a difference in the probability
of hanging.

I am just waiting for -rc1 to be released to send an email with my
problem again, as I am unable to debug this myself.
I think this is ok on my part, right?

--

To: Carlos R. Mafra <crmafra2@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 12:28 pm

Quite frankly, it will depend on the bug.

If it's *reliably* timing-related (which sounds crazy, but is not at all
unheard of), it can be reliably bisected down to some totally unrelated
commit that doesn't actually introduce the problem at all, but that
reliably turns it on or off.

That can be very misleading, and can cause us to basically revert a good
commit, only to not actually fix the bug (and possibly re-introduce the
bug that the reverted commit tried to fix).

But sometimes it gives us a clue where the timing problem is. But quite
frankly, that seems to be the exception rather than the rule.

There have been issues that literally seemed to depend on things like
cacheline placement etc, where changing config options for code that was
never actually even *run* would change timing just enough to show a bug
pseudo-reliably or not at all.

The good news is that those timing issues are really quite rare.

The bad news is that when they happen, they are almost totally

Hey, it may well be a HPET+NOHZ issue. But it could also be that HPET is

.. and yeah, these kinds of really odd and obviously totally unrelated
issues are a sign of a bug that is either simply hardware instability or
very subtly timing-related.

The reason I mention hardware instability is that there really are bugs
that happen due to (for example) power supply instabilities. Brownouts
under heavy load have been causes of problems, but perhaps surprisingly,
so has _idle_ time thanks to sleep-states!

The latter is probably due to bad power conditioning on the CPU power
lines, where the huge current swings (going from high CPU power to low, and
back again) not only have made some motherboards "sing" (or "hum",
depending on frequency) but also cause voltage instability and then
the CPU crashes.

Am I saying that's the reason you see problems? Probably not. Most
instabilities really are due to kernel bugs. But hardware instabilities do

Yes. You've been a good bug reporter, and...

To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <venkatesh.pallipadi@...>
Date: Friday, May 2, 2008 - 1:15 pm

It happens a bit before that because when it hangs it doesn't
print the above lines, and when it does not hang these lines are

Yes you are right. When I have luck and the boot succeeds my Sony laptop

A few days ago I found this message in lkml in reply to a hpet patch
http://lkml.org/lkml/2007/5/7/361 in which the reporter also had
a similar hang, which was cured by hpet=disable.

So it is in my TODO list to try to check out if that patch is
in the current -git and whether it can be reverted somehow (I
added Venki to the Cc: now)

Thanks a lot for the answer!
--

To: Carlos R. Mafra <crmafra2@...>, Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 2, 2008 - 2:02 pm

It depends on whether HPET is being force detected based on the
chipset or whether it was exported by the BIOS in the ACPI table.

If it was force enabled and above patch is having any effect, then you

In any case, of late there seem to be quite a few breakages that are
related to HPET/timer interrupts. One of them was on a system which has
HPET being exported by BIOS
http://bugzilla.kernel.org/show_bug.cgi?id=10409
And the other one where we are force enabling based on chipset
http://bugzilla.kernel.org/show_bug.cgi?id=10561

And then we have the once-in-a-while hang reports from you, Roman and Mark
here
http://bugzilla.kernel.org/show_bug.cgi?id=10377
http://bugzilla.kernel.org/show_bug.cgi?id=10117

Thanks,
Venki
--

To: Pallipadi, Venkatesh <venkatesh.pallipadi@...>
Cc: Carlos R. Mafra <crmafra2@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 12:32 pm

..

Yeah. This particular bug first appeared when NOHZ & HPET were added.
Somebody once suggested it had something to do with an SMI interrupt
happening in the midst of HPET calibration or some such thing.

But nobody who works on the HPET code has ever shown more than a casual
interest in helping to track down and fix whatever the problem is.

Cheers
--

To: Mark Lord <lkml@...>
Cc: Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 3:30 pm

I said I was waiting for -rc1 to be released to send another email
about my HPET problem, but curiously with v2.6.26-rc1-6-gafa26be
my laptop did not hang after 30+ boots and counting.

Somewhere between 2.6.25-07000-(something) and the above kernel
something happened which changed significantly the probability
of hanging during boot.

I could not boot more than 3 times in
a row without hanging with kernels up to 2.6.25-07000 (approximately),
and now I am still booting v2.6.26-rc1-6-gafa26be a few times a day
and no hangs yet.

Yesterday I started a "reverse" bisection, trying to find which
commit "fixed" it, but I still didn't finish (but it is past
-7200).
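
A reverse bisection of an intermittent hang only works if every "no hang" verdict can be trusted. If each boot of a kernel that still has the bug hangs independently with probability p, then n clean boots in a row happen by luck with probability (1 - p)^n. A quick sketch of that estimate; p is not known from this thread, so the value used below is an assumption, and independence between boots is assumed as well:

```python
import math


def prob_all_clean(p_hang: float, boots: int) -> float:
    """Chance that a still-buggy kernel survives `boots` boots without hanging."""
    return (1.0 - p_hang) ** boots


def boots_needed(p_hang: float, max_false_good: float = 0.05) -> int:
    """Smallest number of clean boots n with (1 - p_hang)**n <= max_false_good."""
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    p = 0.25  # assumed per-boot hang probability, for illustration only
    print(boots_needed(p), prob_all_clean(p, 30))
```

The rarer the hang, the more boots each bisection step needs before a commit can be marked as not hanging.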

Of course I am not sure if after the 100th boot the latest -git

Well, I would like to thank Venki for his effort because he even
answered some private emails from me about this issue and is
tracking the bugzillas about it.
--

To: Mark Lord <lkml@...>, Pallipadi, Venkatesh <venkatesh.pallipadi@...>, Linus Torvalds <torvalds@...>, Adrian Bunk <bunk@...>, Paul Mackerras <paulus@...>, Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, <tglx@...>, Len Brown <lenb@...>
Date: Friday, May 9, 2008 - 4:39 pm

..

My experience with this bug, since 2.6.20 or so, has been that it comes
and goes with even the most innocent change in the .config file,
like turning frame pointers on/off.

Cheers
--

To: Adrian Bunk <bunk@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 6:16 am

I never actually saw a statement to that effect (i.e. that 2.6.24
worked) from Kamalesh. I think people assumed that because he
reported it against version X that version X-1 worked, but we don't

If I had been able to replicate it, or if it had been seen on more
than one machine, I would probably have asked Linus to wait while we
fixed it.

There's a risk management thing happening here. Delaying a release is
a negative thing in itself, since it means that users have to wait
longer for the improvements we have made. That has to be balanced
against the negative of some users seeing a regression. It's not an
absolute, black-and-white kind of thing. In this case, for a bug
being seen on only one machine, of a somewhat unusual configuration, I
considered it wasn't worth asking to delay the release.

Paul.
--

To: Paul Mackerras <paulus@...>
Cc: Josh Boyer <jwboyer@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 7:58 am

He reported it as

[BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

No general disagreement on this.

And my example was not in any way meant against you - it's actually
unusual and positive that a bug that once got the attention of being
on the regression lists gets fixed later.

Even worse is the situation with regressions people run into when

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 11:49 am

Precisely. Cherry-picking a single example such as the 68k thing and then

Another fallacy which Arjan is pushing (even though he doesn't appear to
have realised it) is "all hardware is the same".

Well, it isn't. And most of our bugs are hardware-specific. So, I'd
venture, most of our bugs don't affect most people. So, over time, by
Arjan's "important to enough people" observation we just get more and more
and more unfixed bugs.

And I believe this effect has been occurring.

And please stop regaling us with this kerneloops.org stuff. It just isn't
very interesting, useful or representative when considering the whole
problem. Very few kernel bugs result in a trace, and when they do they are
usually easy to fix and, because of this, they will get fixed, often
quickly. I expect netdevwatchdogeth0transmittimedout.org would tell a
different story.

One thing which muddies all this up is that bug reporters vanish. Over the
years I have sent thousands and thousands of ping emails to people who have
reported bugs via email, three to six months after the fact. Some were
solved - maybe a fifth. About the same proportion of reporters reply and
give some reason why they cannot work on the bug. In the majority of cases
people don't reply at all and I suspect they're in the same category of
cannot-work-on-the-bug.

And why can't they work on the bug? Usually, because they found a
workaround. People aren't going to spend months sitting in front of a
non-functional computer waiting for kernel developers to decide if their
machine is important enough to fix. They will find a workaround. They
will buy new hardware. They will discover "noapic" (234000 google hits and
rising!). They will swap it with a different machine. They will switch to
a different distro which for some reason doesn't trigger the bug. They
will use an older kernel. They will switch to Solaris. Etcetera. People
are clever - they will find a way to get around it.

I figure that after a bug is reported w...

To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 9:13 pm

On Thu, 1 May 2008 08:49:19 -0700

no I'm pushing "some classes of hardware are much more popular/relevant

I did not say "most people". I believe "most people" aren't hitting
bugs right now (or there would be a lot more screaming).
What I do believe is that *within the bugs that hit*, even the hardware
specific ones, there's a clear prioritization by how many people hit

now that's a fallacy of your own.. if you care about that one, it's 1)
trivial to track and/or 2) could contain a WARN_ON_ONCE(), at which
point it's automatically tracked. (and more useful information I
suspect, since it suddenly has a full backtrace including driver info
in it)
By your argument we should work hard to make sure we're better at
creating traces for cases we detect something goes wrong.

if it's a hardware bug there's little we can do.
If it's a hardware specific bug, yeah then it becomes a function of how

Given that a normal PC has maybe 10 components...
yes we don't want bugcreep that affects common hardware over time.
At the same time, by your argument, a bug that hits a piece of hardware
of which 5 are made (or left on this planet) is equally important to

This statement is so ridiculous and self-contradicting to what you
said before that I'm not even going to respond to it.
--

To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Friday, May 2, 2008 - 5:00 am

"popular/relevant" is hard to define.

E.g. if we'd go after "popular" we should only keep architectures like
ARM and x86 and ditch architectures like ia64 and s390 that have puny
userbases.

If your "or have the hardware in general" is meant seriously you have to
convince people that ARM must become a very high priority.

No matter whether one supports your "there's a clear prioritization"
view or not it anyway doesn't currently work since the areas covered by
people testing -rc kernels don't even remotely map the most popular

kerneloops.org catches the easiest to solve bugs (there's a trace) and
helps in getting them fixed.

That's a very good thing.

And if we get more bugs into this easy to resolve state that would be
even better.

But it's only a small part of the complete picture of incoming bug
reports.

cu
Adrian

--

To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 1:24 pm

So the question is if we have a thousand bugs which only affect one
person each, and 70 million Linux users, how much should we beat up
ourselves that 1,000 people can't use a particular version of the
Linux kernel, versus the 99.9% of the people for which the kernel
works just fine?
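
Taking the figures in the question at face value (they are illustrative round numbers, not measurements), the affected share is easy to work out:

```python
def affected_fraction(affected: int, total_users: int) -> float:
    """Fraction of users hit, assuming each bug affects a distinct user."""
    return affected / total_users


if __name__ == "__main__":
    frac = affected_fraction(1_000, 70_000_000)
    print(f"affected: {frac:.4%}, unaffected: {1 - frac:.4%}")
```

With these inputs roughly 0.0014% of users are affected, so the unaffected share is, if anything, higher than the 99.9% quoted.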

Sometimes, we can't make everyone happy.

At the recent Linux Collaboration Summit, we had a local user walk up
to a microphone, and loosely paraphrased, said, "WHINE WHINE WHINE
WHINE I have a $30 DVD drive that doesn't work with Linux. WHINE
WHINE WHINE WHINE WHINE What are *you* going to do to fix my problem?"

Some people like James responded very diplomatically, with "Well, you
have to understand, the developer might not have your hardware, and
there's a lot of broken hardware out there, etc., etc." What I wanted to tell
this user was, "Ask not what the Linux development community can do
for you. Ask what *you* can do for Linux?" Suppose this person had
filed a kernel bugzilla bug, and it was one of the hundreds or
thousands of non-handled bugs. Sure, it's a tragedy that bugs pile
up. But if they pile up because of crappy hardware, that's not a
major tragedy. If we can figure out how to blacklist it, and move on,

Hey, in this particular case, if this user worked around the problem
by buying new hardware, it was probably the right solution. As far as
we know we don't have a systematic problem where huge numbers of DVD
drives aren't working, so if there are a few odd ball ones that are
out there, we just CAN'T self-flagellate ourselves that we're not

... and maybe we can't solve hardware bugs. Or that crappy hardware
isn't worth holding back Linux development. And I'm not sure ignoring
it is that horrible of a thing. And in practice, if it's a hardware
bug in something which is very common, it *will* get noticed very
quickly and fixed. But if it's a hardware bug in some rare piece
of hardware, the user is going to have to either (a) help us fix it,
or (b) decide that his time is more ...

To: Theodore Tso <tytso@...>
Cc: <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 3:26 pm

On Thu, 1 May 2008 13:24:34 -0400

Many, many of these are regressions. If old-linux works on that
hardware then new-linux can too.

(still wants to know what we did 2-3 years ago which caused thousands of
people to have to resort to using noapic and other apic-related boot option
workarounds)

--

To: Andrew Morton <akpm@...>
Cc: Theodore Tso <tytso@...>, <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Friday, May 2, 2008 - 6:23 am

Forcing APIC even when the BIOS didn't support them.

-Andi

--

To: Andrew Morton <akpm@...>
Cc: Theodore Tso <tytso@...>, <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:39 pm

Perhaps 2-3 years ago more people started using more hardware that
implements APIC. ;-)

-- Steve

--

To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Arjan van de Ven <arjan@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:38 pm

And actually, core kernel developers are best for writing new bugs.

Really, the way I started out learning how the kernel ticks was to go and
try to solve some bugs that I was seeing (this was years ago). I get
people saying they want to learn to be a kernel developer and asking
what new feature they should work on. Well, honestly, the last thing
a newbie kernel developer should be doing is writing new bugs. We need to
send them to a URL that lists all the known bugs and have them pick one,
any one, and have them solve it. This would be the best way to learn part
of the kernel.

I even find that I understand my own code better when I'm in the debugging
phase.

People here mention different places to look at code, and besides the
kerneloops.org I really don't even know where to look for bugs, because I
haven't seen a URL to point me to.

The next time someone asks me how to get started in kernel programming, I
would love to tell them to go and look here, and solve the bugs. I'm
guessing that I should just point them to:

http://janitor.kernelnewbies.org/

and tell them to focus on real bugs (not just comments and such) to get
fixed if they really want to learn the kernel.

-- Steve

--

To: Steven Rostedt <rostedt@...>
Cc: <bunk@...>, <arjan@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:18 pm

On Thu, 1 May 2008 12:38:23 -0400 (EDT)

bugzilla.kernel.org is, umm, improving.

It would be an interesting exercise for someone to spend a few days seeing
how many of the bugzilla reports they personally can reproduce. I'd guess
"zero". There's a lesson in that.

The problem with bugzilla will be that it will be hard to find reports
where the reporter will be able to work with you on the fix - we've let
them go cold.

The most fruitful place to find fixable bugs is linux-kernel. People who
report bugs there are sufficiently motivated to have actually sent the
email and the bug is still recent, so they probably haven't done the
Solaris install yet.

--

To: Arjan van de Ven <arjan@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 8:53 am

Agreed.

Thanks,
Rafael
--

To: Arjan van de Ven <arjan@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 4:13 am

<boggle>

How about "a bug which we just added"? One which is repeatable.
Repeatable by a tester who is prepared to work with us on resolving it.
Those bugs.

Rafael has a list of them. We release kernels when that list still has tens of
unfixed regressions dating back up to a couple of months.

--

To: Andrew Morton <akpm@...>
Cc: Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 10:15 am

On Thu, 1 May 2008 01:13:46 -0700

I know he does. But I will still argue that if that is all we work from, and treat
all of those equally, we're doing the wrong thing.
I'm sorry, but I really do not consider "ext4 doesn't compile on m68k" which is
on that list to be as relevant as an "i915 drm driver crashes" bug which has been among
us for a while and is not on that list, just based on the total user base for either of those.

Does that mean nobody should fix the m68k bug?
Someone who cares about m68k for sure should work on it, or if it's easy for an ext4 developer,
sure. But if the ext4 person has to spend 8 hours on it figuring out cross compilers, I say
we're doing something very wrong here. (no offense to the m68k people, but there's just
a few of you; maybe I should have picked voyager instead)

Maybe that's a "boggle" for you; but for me that's symptomatic of where we are today:
We don't make (effective) prioritization decisions. Such decisions are hard, because it
effectively means telling people "I'm sorry but your bug is not yet important". That's
unpopular, especially if the reporter is very motivated on lkml. And it will involve a
certain amount of non-quantifiable judgement calls, which also means we won't always be
right. Another hard thing is that lkml is a very self-selective audience. A bug may be
reported three times there, but never hit otherwise, while another bug might not be reported
at all (or only once) while thousands and thousands of people are hitting it.

Not that we're doing all that bad, we ARE fixing the bugs (at least the oopses/warnings) that
are frequently hit. So I wouldn't blindly say we're doing a bad job at prioritizing. I would
rather say that if we focus only on what is left afterwards without doing a reality check,
we'll *always* have a negative view of quality, since there will *always* be bugs we don't
fix. Linux has well over ten million users (much more if you count embedded devices).
A lot of them will have "standard" hardware, and a bunch of the...

To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Sunday, May 4, 2008 - 8:45 am

On that note, I'd really like to see better binary availability of cross
compilers. While it's improved over the last few years mostly due to the
crossgcc stuff it's still a pain. Ideally, they would even be available
through the distribution package manager, but failing that, some dedicated
place on kernel.org with x86->lots and some of the more widely used other
combinations would quite definitely be good. Perhaps not really directly
relevant to this thread as such, but still good.
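
Once a cross toolchain is installed, the build step itself is short. A
minimal sketch, assuming a toolchain whose binaries carry the prefix
m68k-linux-gnu- (the prefix depends on how the toolchain was built) and
using a made-up helper name, cross_build_cmds. ARCH and CROSS_COMPILE are
the standard kbuild variables; the helper only prints the commands so they
can be reviewed before being run from the top of a kernel tree.

```shell
# Print the kbuild commands for cross-compiling one directory of the tree.
# Usage: cross_build_cmds ARCH TOOLCHAIN_PREFIX TARGET
# (cross_build_cmds is a hypothetical helper, not an existing tool.)
cross_build_cmds() {
    arch=$1
    prefix=$2
    target=$3
    # Generate a configuration for the target architecture first.
    echo "make ARCH=$arch CROSS_COMPILE=$prefix allmodconfig"
    # Then build only the requested directory; the trailing slash in
    # TARGET tells kbuild to build that directory.
    echo "make ARCH=$arch CROSS_COMPILE=$prefix $target"
}
```

For the ext4-on-m68k case mentioned earlier in the thread, something like
"cross_build_cmds m68k m68k-linux-gnu- fs/ext4/ | sh" would run both steps.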

Andrew maintain{s,ed} a number of them at

http://userweb.kernel.org/~akpm/cross-compilers/

But as you see, most of the stuff there is really old again...

Rene
--

To: Rene Herman <rene.herman@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, Vegard Nossum <vegard.nossum@...>
Date: Sunday, May 4, 2008 - 9:00 am

You're most welcome to help out Vegard to do this:

http://www.kernel.org/pub/tools/crosstool/
--

To: linux kernel list <linux-kernel@...>
Date: Monday, May 5, 2008 - 9:13 am

You could also use ct-ng:

http://ymorin.is-a-geek.org/dokuwiki/projects/crosstool

Works excellently for me :)

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service - http://www.metux.de/
---------------------------------------------------------------------
Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
Patches / Fixes for a lot dozens of packages in dozens of versions:
http://patches.metux.de/
---------------------------------------------------------------------
--

To: Pekka Enberg <penberg@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>, Vegard Nossum <vegard.nossum@...>
Date: Sunday, May 4, 2008 - 9:19 am

Ah, thanks, lovely, just new I see (and yes, I meant s/grossgcc/crosstool/).
Good thing. I'll check it out and see if there's anything to add.

Rene.
--

To: Arjan van de Ven <arjan@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Thursday, May 1, 2008 - 8:42 am

It's not that clear-cut, either. Something which manifests itself as a
build failure or an immediate test failure on m68k alone, might actually
turn out to cause subtle data corruption on other platforms.

You can't always know that it isn't important, just because it only
shows up in some esoteric circumstances. You only really know how
important it was _after_ you've fixed it.

That obviously doesn't help us to prioritise.

--
dwmw2

--

To: David Woodhouse <dwmw2@...>
Cc: Arjan van de Ven <arjan@...>, Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Monday, May 5, 2008 - 6:03 am

Ideally, you'd do an analysis first and then prioritize, based
on the severity of the bug, its exposure, how easy it is to fix,
etc. If while doing that you already have a fix at hand, you're
almost done :)

Recursively, there's the problem of which bugs you analyze first.
I'm inclined to say that you want to analyze most if not all bug reports
at a higher priority than working on fixing non-critical bugs.

Benny
--

To: David Woodhouse <dwmw2@...>
Cc: Andrew Morton <akpm@...>, Adrian Bunk <bunk@...>, Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Steven Rostedt <rostedt@...>
Date: Wednesday, April 30, 2008 - 11:02 am

On Thu, 01 May 2008 13:42:44 +0100

absolutely. I'm not going to argue that prioritization is easy. Or
that we'll be able to get it right all the time.
--

To: Andrew Morton <akpm@...>
Cc: <arjan@...>, <bunk@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 5:16 am

And leave unfixed all the regressions introduced in earlier kernel versions
and known at the time of the release of that version but still present in
the current version? Not to mention all the other bugs reported by users of

That can be true for not-so-recently introduced bugs too.

There are so many bugs out there and developers tend to focus on new ones
leaving a lot of others unattended, both important and not so important
ones.

Which ones should someone focus on? Maybe on the ones that someone (helped)
introduce him/herself. Maybe that should even sometimes be prioritized over
introducing new bugs^W^W^Wdoing new development.
--

To: linux kernel list <linux-kernel@...>
Date: Thursday, May 1, 2008 - 6:30 am

<big_snip />

Hi folks,

what do you think about Gentoo's "bug-wrangler" concept?
Maybe we could do something similar:

A tester group (which e.g. should be the entry point for newbies),
is responsible for receiving bug reports from users (maybe even
distro maintainers who're not directly involved in kernel dev.).
They try to reproduce the bugs and find out as much as they can,
then file a report to the actual kernel devs (just critical bugs
are directly kicked to the devs with high priority). Maybe this
group could also keep users informed about fixes and give some
upgrade advice, etc.

This way we can build good technical support (independent
from distributors ;-P), newbies can learn on the job and the
load on kernel devs is reduced, so they can better concentrate
on their core competences.

What do you think about this ?

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service - http://www.metux.de/
---------------------------------------------------------------------
--

To: Enrico Weigelt <weigelt@...>
Cc: linux kernel list <linux-kernel@...>
Date: Thursday, May 1, 2008 - 9:02 am

Andrew already does more or less this.

The problems are:
- kernel bugs tend to very quickly reach the state where you need expert
knowledge in some area, and there's definitely not much room for
newbies in bug handling
- "try to reproduce the bugs" works for much software, but in the

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: <bunk@...>
Cc: <torvalds@...>, <akpm@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Wednesday, April 30, 2008 - 8:41 pm

From: Adrian Bunk <bunk@kernel.org>

kernel-testers@vger.kernel.org has been created, feel free to
use it
--

To: David Miller <davem@...>
Cc: <torvalds@...>, <akpm@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <rostedt@...>
Date: Thursday, May 1, 2008 - 9:23 am

Thanks :-)
Adrian

--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 5:52 pm

One thing is that we keep fragmenting the tester base by adding new
confidence levels: we now have -mm, -next, mainline -git, mainline -rc,
mainline release, stable, distro testing, and distro release (and some
distros even have aggressive versus conservative tracks.) Furthermore,
thanks to craniorectal immersion on the part of graphics vendors, a lot
of users have to run proprietary drivers on their "main work" systems,
which means they can't even test newer releases even if they would dare.

This fragmentation is largely intentional, of course -- everyone can
pick a risk level appropriate for them -- but it does mean:

a) The lag for a patch to ride through the pipeline is pretty long.
b) The section of people who are going to use the more aggressive trees
for "real work" testing is going to be small.

-hpa

--

To: H. Peter Anvin <hpa@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:39 pm

And another problem is that often, it's hard to get good "real work" coverage
over the whole tree. I just discovered an apparent borkage somewhere in
the networking/wireless area that seems to have gotten into Linus's tree
somewhere between 24-rc8 and 24-final, just because I haven't beaten on
my wireless card in the last few weeks, so I didn't notice a regression in
'ip link show' related to the rfkill switch...

To: H. Peter Anvin <hpa@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 11:24 pm

Since I poke my head out of the foxhole every once in a while with a
relatively late-breaking bug report, I thought I should chime in...
Mr. Anvin has pretty much nailed it...

As the kernel development process has evolved, which "confidence level"
I select has evolved as well. The thing that *hasn't* changed through
the years is, I tend to pick a "confidence level" that is appropriately
close to "mainline" and has an update release schedule roughly compatible
with my ability to keep up with it. Specifically, if it takes me several
hours to download a patch set, apply it, build the new kernel, and test
on multiple platforms/architectures, then the update release schedule is
probably going to have to be no more often than twice a week if I'm going
to be at all interested in even trying to keep up with it. In 2008, the
"-rcX" updates are a good fit. In the not-too-distant past, keeping up
with 2.5.X.Y was no problem.

Yes, I realize I don't *have* to test every revision level in every
major tree, but I don't have to think about which one to pick for testing
if I can keep up with the update release schedule :-).

--
------------------------------------------------------------------------
Bob Tracy | "I was a beta tester for dirt. They never did
rct@frus.com | get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--

To: Linus Torvalds <torvalds@...>
Cc: <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:54 pm

On Wed, 30 Apr 2008 13:31:08 -0700 (PDT)

Well. We'll see.

linux-next is more than another-tree-to-test. It is (or will be) a change
in our processes and culture. For a start, subsystem maintainers can no
longer whack away at their own tree as if the rest of us don't exist.
They now have to be more mindful of merge issues.

Secondly, linux-next is more accessible than -mm: more releases, more
stable, better tested by he-who-releases it, available via git:// etc. It
should be very easy for developers to do their weekly "does linux-next
boot" test.

Plus, of course, people who complain about merge-window breakage only to
find that the breakage was already in linux-next except they didn't test it
will not have a leg to stand on.

I feared that linux-next wouldn't work: that Stephen would stomp off in
disgust at all the crap people send at him. But in fact it seems to be
going very well from that POV.

I get the impression that we're seeing very little non-Stephen testing of
linux-next at this stage. I hope we can ramp that up a bit, initially by
having core developers doing at least some basic sanity testing.

linux-next does little to address our two largest (IMO) problems:
inadequate review and inadequate response to bug and regression reports.
But those problems are harder to fix..

--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 9, 2008 - 5:28 am

Probably it would make sense also for distro vendors to make linux-next
snapshots available in their development distro branches (redhat's
rawhide, opensuse's factory, etc), to make it easier to test by those
users who are willing to test if it works in their environment, but don't
want to compile kernels themselves.

--
Jiri Kosina

--

To: Jiri Kosina <jkosina@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 9, 2008 - 11:00 am

I try to test linux-next on a few SATA test boxes, but it's definitely

Agreed... any lead time on linux-next testing would be great.

Jeff

--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Ingo Molnar <mingo@...>
Date: Wednesday, April 30, 2008 - 5:42 pm

Andrew, the latter thing is a very good point. For me personally, the fact
that -mm is not available via git is the major obstacle for trying your
tree more frequently than just a few times per year. How difficult would
it be for you to switch to git? I guess there are good reasons for still
using the source code management system from the last century; please
correct me if I'm wrong, but I believe that using a modern SCM system could

For busy (or lazy) people like myself, the big problem with linux-next is
the frequent merge breakages, when pulling the tree stops with "you are in
the middle of a merge conflict". Perhaps, there is a better way to resolve
this without just removing the whole repo and cloning it once again - this
is what I'm doing, please flame me for stupidity or ignorance if I simply
am not aware of some git feature that could be useful in such cases.

Finally, while the list is at it, I'd like to make another technical comment.
My development zoo is a pretty fast 4-way Xeon server, where I keep a handful
of trees, a few cross-toolchains, Qemu, etc. The network setup in our
organization is such that I can use git only over http from that server. This
cannot be changed, it's the company policy. In view of that, it's a pity that
quite a few tree owners don't make sure that http access to their trees works
(I added Ingo to the Cc: list in the hope that this will be corrected soon for
the x86 tree, which I am using quite extensively), and I have to use a much
slower machine (a two and a half year old laptop) for these trees. Please see
this:

<<<<<<<

[dmitri.vorobiev@amber ~]$ git clone http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Initialized empty Git repository in /home/dmitri.vorobiev/linux-2.6-x86/.git/
Getting alternates list for http://www.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
Also look at http://www.kernel.org/home/ftp/pub/scm/linux/kernel/git/torvalds/linux-2...
Getting pack list for http:...

To: Dmitri Vorobiev <dmitri.vorobiev@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:10 pm

On Thu, 01 May 2008 01:42:59 +0400

Every -mm release is available via git://, as described in the release
announcements.

The scripts which do this are a bit cantankerous but I believe they do
work.

<tests it>

Fatal, I expect. A tool which manages source-code files is just the wrong

Really? Doesn't Stephen handle all those problems? It should be a clean

Don't know what to do about that, sorry. An off-site git->http proxy might
work, but I doubt if anyone has written the code.

--

To: Andrew Morton <akpm@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Thursday, May 1, 2008 - 2:15 am

Would you mind using stgit? That way you have the queue patch
functionality, yet a simple git-push -f will send the whole
patch stack over to a repo (without the stgit bits that is),
leaving what looks like a regular tree with just lots of
recent commits. Does not even need extra scripts to do a

Indeed, assuming the remote is set up and you have a local branch,
`git reset --hard mm/master` after a fetch is the thing.
But be sure not to have any changed files.
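
A minimal sketch of that update step with the warning built in, assuming
the remote has already been added under a name such as "mm";
sync_to_remote is a hypothetical helper name. It refuses to reset when
tracked files have uncommitted changes, because "git reset --hard" would
discard them without asking.

```shell
# Fetch a remote and move the current branch to the remote's tip.
# Usage: sync_to_remote REMOTE [BRANCH]     (BRANCH defaults to master)
# (sync_to_remote is a hypothetical helper, not part of git.)
sync_to_remote() {
    remote=$1
    branch=${2:-master}
    # Refresh cached stat information so the check below is accurate.
    git update-index -q --refresh
    # Stop if tracked files differ from HEAD, staged or unstaged.
    if ! git diff-index --quiet HEAD --; then
        echo "uncommitted changes present, not resetting" >&2
        return 1
    fi
    git fetch "$remote" || return 1
    # Local commits on the current branch are dropped by this step.
    git reset --hard "$remote/$branch"
}
```

Untracked files are not touched by the reset and are not checked here.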
--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 7:04 pm

Andrew Morton wrote:

But there is another solution, which I believe is straightforward: have the tree
maintainer set up his tree properly.

--

To: Andrew Morton <akpm@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:19 pm

It should indeed be a clean fetch, but I wonder if Dmitri perhaps does a
"git pull" - which will do the fetch, but then try to _merge_ that fetched
state into whatever the last base Dmitri happened to have.

Dmitri: you cannot just "git pull" on linux-next, because each version of
linux-next is independent of the next one. What you should do is basically

    # Set this up just once..
    git remote add linux-next git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git

and then after that, you keep on just doing

    git fetch linux-next
    git checkout linux-next/master

which will get you the actual objects and check out the state of that
remote (and then you'll normally never be on a local branch on that tree,
git will end up using a so-called "detached head" for this).

IOW, you should never need to do any merges, because Stephen did all those
in linux-next already.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Dmitri Vorobiev <dmitri.vorobiev@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Thursday, May 1, 2008 - 7:06 pm

Just to add some emphasis here - this is something that took me a long time to figure out, and since it is the pattern for dealing with the x86 trees and with the mm git tree and with linux-next, it would help if it were documented somewhere (not that I can imagine where). Once you know it, it becomes obvious, but try staring at a merge conflict for a while trying to figure out what to do, and it gets frustrating. I wonder if we can guess how many testers abandon the mm git tree or the linux-next tree because of this.

It might be nice if git supported a command like git-remote-help or something that would fetch a predefined help file from a remote tree that describes the workflow for that tree.

But at least with an extra reply to this mail, it might creep higher in the google search results when looking for merge conflicts with linux-next.

--
Kevin Winchester

--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Wednesday, April 30, 2008 - 6:28 pm

Linus, thanks a lot for the detailed explanation. Indeed, it seems that I foolishly
tried to duplicate Stephen's work. In the future I'll do as you suggest here.

--

To: Dmitri Vorobiev <dmitri.vorobiev@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>, Stephen Rothwell <sfr@...>
Date: Thursday, May 1, 2008 - 12:26 pm

That "howto" should probably be added to the linux-next announcements...
(CC'ing Stephen)
--

To: Diego Calleja <diegocg@...>
Cc: Dmitri Vorobiev <dmitri.vorobiev@...>, Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>
Date: Thursday, May 1, 2008 - 9:48 pm

This is already mentioned in the linux-next wiki
(http://linux.f-seidel.de/linux-next/pmwiki/) in the FAQ. I will add a
link to the wiki to the announcements.

--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

To: Diego Calleja <diegocg@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <mingo@...>, Stephen Rothwell <sfr@...>
Date: Thursday, May 1, 2008 - 12:31 pm

Excellent idea. Thanks, Diego!

Dmitri
--

To: Dmitri Vorobiev <dmitri.vorobiev@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Wednesday, April 30, 2008 - 6:06 pm

If this is still an issue with -next, I would say we won't get too many testers. I
gave up after the first time I was bitten by that and went back to pure -mm.

I think greg-kh asked why this happens (Stephen rebases?), if you search
archives, I'm sure you'll find it.
--

To: <akpm@...>
Cc: <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 5:21 pm

From: Andrew Morton <akpm@linux-foundation.org>

This is all about positive and negative reinforcement.

The people who sit and git bisect their lives away to get the
regressions fixed need more positive reinforcement. And the people
who stick these regressions into the tree need more negative
reinforcement.

The current way of dealing with folks who stick broken crud into the
tree results in zero change in behavior.

People who insert the bum changes into the tree only really have one
core thing that they are sensitive to, their reputation. That's why
there is an enormous reluctance to even suggest reverts, it looks bad
for them and it also makes more work for them in the end.

I guess what these folks are truly afraid of is that someone will
start tracking reverts and post their results in some presentation
at some big conference. I say that would be a good thing. To
be honest, hitting the revert button more aggressively and putting
the fear of being the "revert king" into everyone's minds might
really help with this problem.

Currently there is no sufficient negative pushback on people who
insert broken crud into the tree. So it should be no surprise that it
continues.
--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 2, 2008 - 9:37 am

David Miller wrote:
You will probably want to sort by "revert percentage" then.
The absolute number of reverts might make the biggest contributor
"revert king", even if his average patch quality is better than

Helge Hafting
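
A rough sketch of such a ranking by revert percentage, assuming reverts
are made with "git revert" so that the revert commit's message carries the
usual "This reverts commit <sha>" line; reverts done by hand are not
counted. revert_percentages is a hypothetical helper name. Each revert is
charged to the author of the reverted commit, not to whoever performed the
revert, and the revert commits themselves count toward the reverter's
total.

```shell
# Print "PERCENT<TAB>REVERTED<TAB>TOTAL<TAB>AUTHOR" for each author in a
# revision range, highest revert percentage first.
# Usage: revert_percentages [REVISION_RANGE]    (defaults to HEAD)
# (revert_percentages is a hypothetical helper, not part of git.)
revert_percentages() {
    range=${1:-HEAD}
    tab=$(printf '\t')
    {
        # One "C<TAB>author" line per non-merge commit in the range.
        git log --no-merges --format='C%x09%an' "$range"
        # One "R<TAB>author" line per reverted commit: take the sha from
        # the revert message and look up who wrote the original commit.
        git log --no-merges --format='%b' "$range" |
            sed -n 's/^This reverts commit \([0-9a-f]\{7,40\}\).*/\1/p' |
            while read -r sha; do
                git log -1 --format='R%x09%an' "$sha" 2>/dev/null
            done
    } | LC_ALL=C awk -F "$tab" '
        $1 == "C" { total[$2]++ }
        $1 == "R" { reverted[$2]++ }
        END {
            for (a in total)
                printf "%.1f\t%d\t%d\t%s\n", 100 * reverted[a] / total[a], reverted[a], total[a], a
        }' | LC_ALL=C sort -t "$tab" -k1,1nr -k4,4
}
```

Sorting on the first column rather than the second is the point: an
author with many commits and a few reverts ranks below one with few
commits and the same number of reverts.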
--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:19 pm

What we need is not 'negative reinforcement'. That is just nasty, open
warfare between isolated parties, expressed in a politically correct
way.

The core problem is that every maintainer has his own subjective,
asymmetric view and experience about this matter: to him his own tree is
almost problem-free and most problems are very easy to fix, while other
problems in other trees are a nuisance that should never have been put
upstream.

Also, people get defensive when their regressions get pointed out in
anything but the most respectful and casual manner.

For example, how on earth do i tell you that during the v2.6.24 merge
window, half of all x86 test-machines for me and others were broken
because they had no networking, for more than a week in a row? Are you
surprised about this (true) experience we had? Do you feel insulted? Do
you feel unfairly handled and slandered?

The same goes in the other direction as well - you were just hit by
scheduler tree related regressions that were only triggered on your
128-way sparc64, but not on our 64way x86 and smaller boxes.

The thing is, what we really need is more cooperation and earlier
integration - more people actually testing linux-next occasionally to
see what things will look like in the next merge window.

linux-next doing build tests is fine, but the nasty regressions that
will hit your box can only be solved if _you_ boot linux-next at least
once before the merge window opens. The regressions that will hit my box
can only be avoided if i test your tree.

hm? And can we please somehow talk about this without flaming each other
in the process?

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: David Miller <davem@...>, <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>
Date: Sunday, May 4, 2008 - 11:04 pm

Over time as patches succeed more I reduce testing so I can "get things done
faster". Eventually I screw up, and get more cautious on checking. It's a
dynamic balance.

With reduced review comes sloppier code. If we can't increase review, we can
at least increase the penalty for screwing up when I do get caught.

If vger dropped all my emails for a week after I broke the kernel, I'd be far
more careful OR I'd find efficient ways to avoid doing that (like increasing
review, or automated testing). Either way, it's a win.

But I'm sure everyone else is far more disciplined than I...
Rusty.
--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, Thomas Gleixner <tglx@...>
Date: Wednesday, April 30, 2008 - 6:35 pm

in more detail: any "negative reinforcement" should be on the
_technical_ level, i.e. when changes are handled - not at the broad tree
level.

Sure, there are exceptions, etc. - but by the time stuff goes upstream
it's too late and we've got to fix stuff instead of trying to push back
on each other.

by earlier integration (= linux-next) we can do the pushback much
earlier, in a much more granular, much more technical and much less
personal way: "hey Ingo, your new sched-dizzy-blah patch broke stuff
here, zap it" or "hey Dave, that socket-foo rewrite just broke things
here, zap it".

git-revert _kind of_ makes that possible too, but people still feel too
personal about reverts - they take it as intrusion into their subsystem
and regard it as an attack against their competence as a maintainer.

and this is all so typical btw.: the most effective measure against
human warfare is for people to see each other and to talk to each other.

[ That's one reason why i am so worried about mailing list isolation.
People get more distant, they mean less to each other, work less with
each other => Linux suffers. I do accept that for some people lkml is
simply too noisy - but i think the cure is worse than the disease. ]

Ingo
--

To: <mingo@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <tglx@...>
Date: Wednesday, April 30, 2008 - 6:51 pm

From: Ingo Molnar <mingo@elte.hu>

Sure, and I'll provide some right here.

Ingo, let me know what I need to do to change your behavior in
situations like the one I'm about to describe, ok?

Today, you merged in this bogus "regression fix".

commit ae3a0064e6d69068b1c9fd075095da062430bda9
Author: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 30 00:15:31 2008 +0200

inlining: do not allow gcc below version 4 to optimize inlining

fix the condition to match intention: always use the old inlining
behavior on all gcc versions below 4.

this should solve the UML build problem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Did you actually read the UML build failure report?

Adrian Bunk specifically stated that the UML build failure regression
occurs with GCC version 4.3

Next, did you test this regression fix?

Next, if you could not test this regression fix, did you wait
patiently for the bug reporter to validate your fix? Adrian
responded that it didn't fix the problem, but that was after
you queued this up to Linus already.

This proves my main beef with you Ingo. You're way too trigger happy,
you merge things in too quickly, without checks and without
verifications.

To an arbitrary person reading the commit logs, the above
looks like you fixed something, when you actually didn't fix
anything.

And let's address this specific inlining optimization and all the
fallout it's generating. You said you merged this thing in because
you didn't want to "wait a year for such a useful feature." In
hindsight, that's exactly what we should have done, waited until we
could sort out all of these issues. Yes, even if it would take a
year.

Now we're forced to sort it out somehow, unless you can get beyond
your pride and revert the original change.
--

To: David Miller <davem@...>
Cc: <mingo@...>, <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <tglx@...>
Date: Wednesday, April 30, 2008 - 10:48 pm

You got the facts wrong, it is even worse:

It was Ingo himself who reported this bug. [1]

Ingo managed to send an untested and not working patch for a bug he
reported himself...

cu
Adrian

BTW: I finally figured out what is behind the problems on UML, and this
is not related to any recent kernel changes.
Patch comes when I'm awake again.

[1] http://lkml.org/lkml/2008/4/26/151

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <tglx@...>
Date: Wednesday, April 30, 2008 - 9:40 pm

the motivation of that fix wasnt UML - that was just an (indeed
incorrect) after-thought when i wrote up the commit log. The fix is
obviously right - although it doesnt fix UML.

it is wrong that it "doesnt fix anything". Look at the change itself:

- * Force always-inline if the user requests it so via the .config:
+ * Force always-inline if the user requests it so via the .config,
+ * or if gcc is too old:
*/
#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
- !defined(CONFIG_OPTIMIZE_INLINING) && (__GNUC__ >= 4)
+ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)

before the change it was only possible to disable the optimization on
gcc 4 and above. The intended (and now implemented) condition is to only
change anything on gcc 4 and above. I.e. on gcc3x the config option has
no effect at all - and that's what we want.
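
The difference between the old and the new condition can be checked with a
small truth table. What follows is only a Python model of the preprocessor
logic quoted above, not kernel code; the function and parameter names are made
up for illustration, and the model assumes the usual C precedence, where &&
binds tighter than ||.

```python
from itertools import product


def forced_old(arch_supports, optimize_inlining, gnuc):
    """Old condition: !ARCH_SUPPORTS || (!OPTIMIZE_INLINING && __GNUC__ >= 4)."""
    return (not arch_supports) or ((not optimize_inlining) and gnuc >= 4)


def forced_new(arch_supports, optimize_inlining, gnuc):
    """New condition: !ARCH_SUPPORTS || !OPTIMIZE_INLINING || __GNUC__ < 4."""
    return (not arch_supports) or (not optimize_inlining) or gnuc < 4


def differing_cases():
    """All (arch_supports, optimize_inlining, gnuc) inputs where the two differ."""
    return [
        (arch, opt, gnuc)
        for arch, opt, gnuc in product((False, True), (False, True), (3, 4))
        if forced_old(arch, opt, gnuc) != forced_new(arch, opt, gnuc)
    ]
```

In this model the two conditions differ only when the architecture supports
optimized inlining and the compiler major version is 3: the old condition
never forces always-inline there, the new one always does, whatever the config
option says.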

Ingo
--

To: Ingo Molnar <mingo@...>
Cc: <davem@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, <tglx@...>
Date: Wednesday, April 30, 2008 - 6:49 pm

On Thu, 1 May 2008 00:35:09 +0200

I'd question this. People often seem pretty happy to yank their stuff out
of there - it relieves ongoing embarrassment and it relieves time pressure
- they can have another go and get it right at their leisure.

Of course, reverting is easy. The hard part is often finding the thing
which needs to be reverted.

--

To: <mingo@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:22 pm

From: Ingo Molnar <mingo@elte.hu>

You keep saying this over and over again, but the powerpc folks hit
this stuff too.
--

To: David Miller <davem@...>
Cc: <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:39 pm

Well, I think that some changes need some wider testing anyway.

They may be correct from the author's point of view and even from the knowledge
and point of view of the maintainer who takes them into his tree. That's
because no one knows everything and it'll always be like this.

Still, with the current process such "suspicious" changes go in as parts of
large series of commits and need to be "rediscovered" by the affected testers
with the help of bisection. Moreover, many changes of this kind may go in from
many different sources at the same time and that's really problematic.

In fact, so many changes go in at a time during a merge window, that we often
can't really say which of them causes the breakage observed by testers and
bisection, that IMO should really be a last-resort tool, is used as the main
debugging technique.
--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:12 pm

That's very true IMHO and is the thing which has been progressively
appearing since we merge large amounts of code at once. In the "good
old days", something did not work, the first one to discover it could
quickly report it on LKML : "hey, my 128-way sparc64 does not boot
anymore, anybody has any clue", and another one immediately found
this mail (better signal/noise ratio on LKML at this time) and say
"oops, I suspect that change, try to revert it".

Now, it's close to impossible. Maintainers frequently ask for bisection,
in part because nobody knows what code is merged, and they have to pull
Linus' tree to know when their changes have been pulled. That may be
part of the "fun" aspect that Davem is seeing going away in exchange
for more administrative relations. But if we agree that nobody knows
all the changes, we must agree that we need tools to track them, and

Maybe we could slightly improve the process by releasing more often, but
based on topics. Small sets of minimally-overlapping topics would get
merged in each release, and other topics would only be allowed to pull
fixes. That way everybody still gets some work merged, everybody tests
and problems are more easily spotted.

I know this is in part what Andrew tries to do when proposing to
integrate trees, but maybe some approximate rules should be proposed
in order for developers to organize their works. This would begin
with announcing topics to be considered for next branch very early.
This would also make it more natural for developers to have creation
and bug-tracking phases.

Willy

--

To: Willy Tarreau <w@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:15 pm

What would this look like, notionally? Say the releases were twice as
frequent with Stage A and Stage B. How could the topic be grouped
into the stages? Could bugfixes of any type be merged in either
window? Would this only apply to "new" features, API changes, etc? or
would maintenance-type changes have to be assigned to a stage, too?

-chris
--

To: Chris Shoemaker <c.shoemaker@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:09 am

bug fixes are of course always possible, just that we limit important
changes, i.e. the ones which randomly break and that take a lot of time

willy

--

To: Willy Tarreau <w@...>
Cc: David Miller <davem@...>, <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:59 pm

Yes, that's reasonable.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <mingo@...>, <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:54 pm

git makes it easy to have many branches that get merged upstream, would it
really help much if these changes were initially done as separate branches
and then merged in?

if so there are two ways to do this

have Ingo (and others) create a small forest of branches that get merged
into linux-next

have Ingo (and others) create a small forest of branches that get merged
into one 'please pull' branch that gets merged into linux-next

the second has the advantage that merge conflicts between the different
branches will be resolved before they go upstream, and there's less work
to be done upstream (as the upstream doesn't need to keep adding branches
to pull)

the first may have an advantage in terms of making the different branches

there are always going to be cases where the problem can only be found by
bisecting it, but I agree that there seems to be a little too much
reliance on bisecting (but that was a heated topic a few weeks ago, let's
not re-hash it now)

David Lang
--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <rjw@...>, <linux-kernel@...>, <jirislaby@...>, Ingo Molnar <mingo@...>
Date: Wednesday, April 30, 2008 - 6:02 pm

I'm not a frequent poster to this mailing list, but I do spend a good
portion of my life reading it. Please excuse me for expressing my very
personal opinion, but I thought you might probably be interested in a
detached view of the situation.

I think that many have guessed that I would like to talk about the attacks
on Ingo and back. Believe me, this fight looks childish, as it becomes
obvious that it went beyond purely technical disputes, which Linus is so
keen on rightfully writing about.

In no case am I implying any kind of offense, but I do believe that bad
emotions do hinder the community from advancing with the technical things.

Dmitri
--

To: David Miller <davem@...>
Cc: <akpm@...>, <torvalds@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 5:47 pm

... but that should also point at the trees through which the bugs are
introduced.

I mean, the maintainers should be more careful for what they take to their
trees and push upstream. If that happens, they'll (hopefully) put some more
pressure on patch submitters.
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Rafael J. Wysocki <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:47 pm

Perhaps we should be clear and simple about what potential testers
should be running at any given point in time. With -mm, linux-next,
linux-2.6, etc, as a newcomer I find it difficult to know where my
testing time and energy is best directed.

Is linux-next the right thing to be running at this point? Is there a
need for testing in a particular tree (netdev, x86, etc)?

Cheers,
Dan

--
/--------------- - - - - - -
| Dan Noe
| http://isomerica.net/~dpn/
--

To: Dan Noe <dpn@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:59 pm

On Wed, 30 Apr 2008 16:47:00 -0400

-mm consists of the sum of

a) the ~80 subsystem maintainers' trees (git and quilt)

b) the ~100 subsystem trees which are hosted only in -mm.

linux-next consists of only a)

Soon I shall remove a) from -mm and will replace it with linux-next (this
should be a no-op).

Later, I shall start feeding those 100 random subsystems into linux-next

yes. 85% of the code which goes into Linux goes via the ~80 subsystem
maintainers' trees and is (or should be) in linux-next. The other 15%

No, please test the sum-of-all-trees in linux-next. If you hit problems
then, as part of the problem resolving process a developer _might_ ask you
to test one tree specifically, but that would be a pretty unusual
circumstance.
--

To: Andrew Morton <akpm@...>
Cc: Dan Noe <dpn@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:53 pm

Speaking of energy and time of a tester. I'd like to know where these resources
should be directed from the arch point of view. Once I had a plan to buy as
many arches as I could get and run a farm of test boxes 8-) But that's hard
because of various reasons (money, time, room, energy). What arches need more
attention? Which are forgotten? Which are going away? For example does buying
an alphaserver DS 20 (hey - it's cheap) and running tests on it make sense
these days?

Mariusz
--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: Andrew Morton <akpm@...>, Dan Noe <dpn@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 2, 2008 - 6:20 am

A lot of bugs are not architecture specific. Or when they are architecture
specific they only affect some specific machines in that architecture.
But really a lot of bugs should happen on most architectures. Just focussing
on lots of boxes is not necessarily productive.

My recommendation would be to concentrate on deeper testing (more coverage)
on the architectures you have.

An interesting project for example would be to play with the kernel gcov patch that
was recently reposted (I hope it makes mainline eventually). Apply that patch,
run all the test suites and tests you usually run on your favourite test box
and check how much of the code that is compiled into your kernel was really tested
using the coverage information. Then think: what additional tests can you do to get
more coverage? Write tests then? Or just write descriptions on what is not tested
and send them to the list, as a project for others looking to contribute to the
kernel.
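
As a rough illustration of the "how much was really tested" step, here is a
minimal Python sketch. It assumes you already have annotated text output in
the usual gcov format (execution count, line number and source text separated
by colons, with "-" for non-executable lines and "#####" or "=====" for lines
that never ran). The function name and the sample data are invented for this
example and do not come from the gcov kernel patch.

```python
def coverage_from_gcov(text):
    """Return (executed_lines, executable_lines) for one gcov text report."""
    executed = 0
    total = 0
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # not a "count:lineno:source" line (e.g. branch summaries)
        count = parts[0].strip()
        if count == "-":
            continue  # line holds no executable code
        total += 1
        if count not in ("#####", "====="):
            executed += 1
    return executed, total


# Made-up sample report for a tiny function that was called three times.
sample = """\
        -:    0:Source:demo.c
        -:    1:#include <stdio.h>
        3:    2:int f(int x) {
        3:    3:    if (x > 0)
        3:    4:        return 1;
    #####:    5:    return 0;
        -:    6:}
"""

executed, total = coverage_from_gcov(sample)
```

Summing these pairs over all reports gives a crude line-coverage figure, and
the lines marked as never executed are the candidates for new tests or for a
"not tested" description sent to the list.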

-Andi
--

To: Andi Kleen <andi@...>
Cc: Andrew Morton <akpm@...>, Dan Noe <dpn@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 2, 2008 - 11:33 am

Yes, there is some amount of bugs that I see only on specific architecture.
These which are reproducible or have an easy test case I do report to LKML, but
there are also bugs I see rarely or just once and they never come back and sometimes
as a bonus leave no trace - and these I usually don't report. Providing a test case

What I meant was one box per architecture, preferably an SMP one where possible - so
the number of required boxes is limited. This way instead of just cross-compiling
I could actually _run_ the kernel. On the other hand if some arch is close to be dead
and has no foreseeable future then there is no point in testing it.

Also my thinking was that sometimes bugs from other (than x86) architectures can point to

Sounds like a plan - will look into that.

Mariusz aka arch'aeologist ;)
--

To: Mariusz Kozlowski <m.kozlowski@...>
Cc: <dpn@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:11 pm

On Thu, 1 May 2008 00:53:31 +0200

gee.

I think to a large extent this problem solves itself - the "more important"
architectures have more people using them, so they get more testing and
more immediate testing.

However there are gaps. I'd say that arm is one of the more important
architectures, but many people who are interested in arm tend to shy away
from bleeding-edge kernels for various reasons. Mainly because they have
real products to get out the door, rather than dinking around with mainline
kernel development. So testing bleeding-edge on some arm systems would be
good, I expect.

otoh, the platform we break most often is surely plain-old-PCs. If it's
bugs you're looking for, I expect that dumpster-diving for as many
different PCs as you can and trying to get them to boot (let alone suspend
and resume!) would keep you entertained ;)

--

To: Andrew Morton <akpm@...>
Cc: Mariusz Kozlowski <m.kozlowski@...>, <dpn@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Monday, May 12, 2008 - 5:27 am

As both personally, and the policy of my employer we try and ensure we
can offer our customers at least the previous 'stable' kernel release
and ensure that our development process tracks the kernel -rcX
candidates. We also run an autobuilder[1] which runs all -git releases
through an automated build (no auto-test yet) to ensure that we can
detect any build or configuration errors in the releases.

ARM is a fast moving area due to the amount of silicon vendors out
there who seem intent on doing their own thing, and often forking
hardware blocks they use during differing development branches. I
am currently looking at merging support for the S3C6400 (new) and
finishing S3C2443 (similar to 6400) and the S3C24A0... this means
that I have a lot of code to look through before each release and
having a stall will just keep the backlog building, making my job
a lot more difficult.

[1] http://armlinux.simtec.co.uk/kautobuild/

--
Ben (ben@fluff.org, http://www.fluff.org/)

'a smiley only costs 4 bytes'
--

To: Andrew Morton <akpm@...>
Cc: Dan Noe <dpn@...>, <torvalds@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Stephen Rothwell <sfr@...>
Date: Wednesday, April 30, 2008 - 5:30 pm

How bisectable is linux-next, BTW?
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, Dan Noe <dpn@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, Stephen Rothwell <sfr@...>
Date: Wednesday, April 30, 2008 - 6:08 pm

Each _individual_ release will be entirely bisectable, since it's all git
trees, and at no point does anything collapse individual commits together
like -mm does.

HOWEVER.

Due to the way linux-next works, each individual release will be basically
unrelated to the previous one, so it gets a bit more exciting indeed when
you say "the last linux-next version worked for me, but the current one
does not".

Git can actually do this - you can make the previous (good) linux-next
version be one branch, and the not-directly-related next linux-next build
be another, and then "git bisect" will _technically_ work, but:

- it will not necessarily be as efficient (because the linux-next trees
will have re-done all the merges, so there will be new commits and
patterns in between them)

- but much more distressingly, if the individual git trees that got
merged into linux-next were also using rebasing etc, now even all the
*base* commits will be different, and saying that the old release was
good tells you almost nothing about the new release!

(The good news is that if only a couple of trees do that, the bisection
information from the other trees that don't do it will still be valid
and useful and help bisection)

- also, while it's very easy for somebody who knows and understands git
branches, it's technically still quite a bit more challenging than just
following a single tree that never rebases (ie mine) and just bisecting
within that one.

So yes, git bisect will work in linux-next, and the fundamental nature of
git-bisect will not change at all, but it's going to be a bit weaker
"between different versions" of linux-next than it would be for the normal
git tree that doesn't do the "merge different trees all over again" thing
that linux-next does.
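
The effect of rebasing on bisection can be sketched with a toy model. The
Python below is not how git is implemented; it treats history as a flat list
of commit ids, and all the ids, names and helper functions are hypothetical.
It only illustrates two things described above: commits already in the good
release drop out of the search, while rebased commits come back under new ids
and have to be tested again.

```python
def suspects(good_history, bad_history):
    """Commits in the bad release that were not part of the good release.

    Both arguments are lists of commit ids, oldest first (a simplified,
    linear stand-in for real git history).
    """
    known_good = set(good_history)
    return [c for c in bad_history if c not in known_good]


def first_bad(candidates, is_bad):
    """Binary search for the first bad commit.

    Assumes candidates is non-empty and ordered oldest first, that the last
    one is bad, and that every commit after the first bad one is bad too.
    Returns (commit, number_of_tests).
    """
    if not candidates:
        raise ValueError("nothing to bisect")
    lo, hi = -1, len(candidates) - 1  # lo: last known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi], tests


# Hypothetical data: "base-*" and "net-*" keep their ids between releases,
# the "sched-*" tree was rebased, so its commits reappear as "sched-*-v2".
old_release = ["base-1", "base-2", "net-1", "sched-1", "sched-2"]
new_release = ["base-1", "base-2", "net-1", "net-2",
               "sched-1-v2", "sched-2-v2", "sched-3-v2"]

bad_from = new_release.index("sched-2-v2")  # pretend this commit broke things
todo = suspects(old_release, new_release)
culprit, tests = first_bad(todo, lambda c: new_release.index(c) >= bad_from)
```

Here the search still finds the culprit, but "sched-1-v2" stays in the suspect
list even though "sched-1" was fine in the old release, whereas the unrebased
"net-1" is excluded for free.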

Linus
--

To: Rafael J. Wysocki <rjw@...>
Cc: <dpn@...>, <torvalds@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>, <sfr@...>
Date: Wednesday, April 30, 2008 - 5:37 pm

On Wed, 30 Apr 2008 23:30:20 +0200

don't know. Fully, one hopes.

Laurent Riffard did a successful bisection last month; I don't see many
other signs on the linux-next list.

--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:05 pm

That is largely on purpose.

There's two choices:

- have a longer and calmer merge window, spread out the joy, and have
people test and fix their things during the merge window too. In other
words, less black-and-white.

- Really short merge window, and use the extra time *after* it to fix the
issues.

and I've obviously gone for the latter. In fact, I'd personally like to
make it even shorter, because the problem with the long merge window can
be summed up very simply:

Long merge windows don't work - because rather than test more, it just
means that people will use them to make more changes!

So one of the major things about the short merge window is that it's
hopefully encouraging people to have things ready by the time the merge
window opens, because it's too late to do anything later.

And yes, we could have some other way of enforcing that - allow the merge
window to be longer, but have some other mechanism to make sure that I
only merge old code.

In fact, I'd personally *love* to have a hard rule that says "I will only
pull from trees that were already 'done' by the time the window opened",
and we've been kind-of moving in that direction.

But that wish is counteracted by the fact that the merges themselves do
need some development, so expecting everything to be ready before-hand is
simply not realistic.

Also, while I'd like trees to be ready when the window opens, at the same
time I do think that it's good to spread out some of it, and get *some*
basic testing - even if it's just a nightly build and a few tens of

And really, that's all that we'd expect during the merge window. We want
to find the *obvious* problems - build issues, and the things that hit
everybody, but let's face it, the subtle ones will take time to find
regardless.

Then, the short merge window means that we have more time when we really
don't have big changes going in to find the subtle ones.

(And making the release cycle longer would *not*...

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:29 pm

Having things ready by the time the merge window opens is difficult
when you don't know when the merge window is going to open. OK, after
you release a -rc6 or -rc7, we know it's close, but it could still be
three weeks off at that point. Or it could be tomorrow.

That's mitigated at the moment by having the merge window be two weeks
long. So if you open the merge window at a point where I, or someone
downstream of me, thought we still had two weeks to go, we can hurry
up and try to get stuff finished within the first week and still get
it merged.

But if you made a really hard and fast rule that only stuff that is in
linux-next at the point where the merge window opens can be merged,
AND the point at which the merge window opens is unknown and
unpredictable within a period of about 4 weeks, then that makes it
really tough for those of us downstream of you to plan our work.

By the way, if you do want to make that rule, then there's a really
easy way to do it - just pull linux-next, and make that one pull be
the entire merge window. :) But please give us at least a week's
notice that you're going to do that.

Paul.
--

To: Paul Mackerras <paulus@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 11:47 pm

Well, if the tree is ready, you shouldn't need to care ;)

I'm not going to pull linux-next, because I hate how it gets rebuilt every
time it gets done, so I would basically have to pick one at random, and
then that would be it.

I also do actually try to spread the early pulls out a _bit_, so that
if/when problems happen, there's some amount of information in the fact
that something started showing up between -git2 and -git3.

HOWEVER.

One thing that was discussed when linux-next was starting up was whether I
would maintain a next branch myself, that people could actually depend on
(unlike linux-next, which gets rebuilt).

And while I could do that for really core infrastructure changes, I really
would hate to see something like that become part of the flow - because
I'd hope things that really require it should be so rare that it's not
worth it for me to maintain a separate branch for it.

But there could be some kind of carrot here - maybe I could maintain a
"next" branch myself, not for core infrastructure, but for stuff where the
maintainer says "hey, I'm ready early, you can pull me into 'next'
already".

In other words, it wouldn't be "core infrastructure", it would simply be
stuff that you already know you'd send to me on the first day of the merge
window. And if by maintaining a "next" branch I could encourage people to
go early, _and_ let others perhaps build on it and sort out merge
conflicts (which you can't do well on linux-next, exactly because it's a
bit of a quick-sand and you cannot depend on merging the same order or
even the same base in the end), maybe me having a 'next' branch would be
worth it.

But it would have to be low-maintenance. Something I might open after
-rc4, say, and something where I'd expect people to only ask me to pull
_once_ (because they really are mostly ready, and can sort out the rest
after the merge window), and if they have no open regressions (again, the
"carrot" for good behaviour).

I'm not say...

To: Linus Torvalds <torvalds@...>
Cc: Paul Mackerras <paulus@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:17 am

linux-next is _supposed_ to be solely the stuff that is ready to be sent
to you upon window-open.

The only thing that isn't reliable are the commit ids -- and that's at
the request of a large majority of maintainers, who noted to Stephen R
that the branch he was pulling from them might get rebased -- thus
necessitating the daily tree regeneration.

So, I think a 'next' branch from you would open cans o worms:

- one more tree to test, and judging from linux-next and -mm it's tough
to get developers to test more than just upstream

- is the value of holy penguin pee great enough to overcome this
another-tree-to-test obstacle?

- opens all the debates about running parallel branches, such as, would
it be better to /branch/ for 2.6.X-rc, and then keep going full steam on
the trunk? After all, the primary logic behind 2.6.X-rc is to only take
bug fixes, theoretically focusing developers more on that task. But now
we are slowly undoing that logic, or at least openly admitting that has
been the reality all along.

Jeff

--

To: Jeff Garzik <jeff@...>
Cc: Linus Torvalds <torvalds@...>, Paul Mackerras <paulus@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 5:17 am

That encourages developers to continue ignoring that stabilizing work.
The stall does have a side effect of refocussing them. A branch for -rc
and a monthly cycle would be interesting as it would mean that the
pushback for not fixing stability problems would be not getting your work
pulled for the main tree if you didn't fix the bugs first - and could be
both sufficient an incentive and not too vicious as it would be with a 2
month cycle.

Alan
--

To: Jeff Garzik <jeff@...>
Cc: Paul Mackerras <paulus@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 12:46 am

Yes, the "stuff" may be supposed to be stable. But the trees feeding it
certainly are not. People are rebasing them etc, and it doesn't matter

I do agree. And maybe I should have made it clear that I think it's worth
it to me only if it then means that the merge window can shrink.

If I'd have both a 'next' branch _and_ a full 2-week merge window, there's
no upside.

Btw, it wouldn't be another tree to test, since it would presumably be what
'linux-next' starts out from - so it would purely be something that
doesn't have the constant re-merging of the more wild-and-crazy
'linux-next' tree.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Jeff Garzik <jeff@...>, Paul Mackerras <paulus@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Sunday, May 4, 2008 - 9:47 am

Personally I think the current process works reasonably well, though
we should always try to improve it further...

I think you could branch at ~ rc3 (strictly critical fixes only from
this point). This way 'next' wouldn't be low-maintenance but the
release branch would be.

I.e., the merge window would open at ~ rc3. At 'final', the merge window
would probably be already closed :-)

Something like:
- 2.6.26-rc3: 2.6.27 merge window opens, 2.6.26 - fixes only
- 1 week later: no core changes for 2.6.27 except fixes (drivers only?)

2.6.26* would receive backports from 2.6.27 (cherry-picking? applying
on 2.6.26 and merging?).

The "no open regressions" rule would make sense certainly - unless in
a specific case agreed otherwise.

Perhaps if needed you could let other people do the final release

Shorter cycle is the big upside.

Perhaps we could start branching later at first - say at 2.6.26-rc5,
and see how it works.
--
Krzysztof Halasa
--

To: Krzysztof Halasa <khc@...>
Cc: Linus Torvalds <torvalds@...>, Jeff Garzik <jeff@...>, Paul Mackerras <paulus@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Sunday, May 4, 2008 - 11:05 am

Yep, that sounds pretty interesting. But it would be better to start something
like a "slow merge window" (explained below) around -rc4 where things really
slow down (or used to).

The idea of a "slow merge window" would look like:
- merge only *obvious* (long awaiting) changes;
- merge stuff (fixes) which comes to -rc releases;
- merge non-core changes from -mm;

-Jacek
--

To: Paul Mackerras <paulus@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:57 pm

That's a unique and interesting idea...

Jeff

--

To: Jeff Garzik <jeff@...>
Cc: <paulus@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <akpm@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:52 pm

Full ack.

Especially if there was some kind of "pre-merge linux-next freeze" where
people (arch maintainers, kernel testers) would be actively invited to do
pre-merge testing.

During that period only changes that fix reported issues (be it build issues
or regressions) would be allowed:
- either a revert of the problematic commit
- or a targeted fix

This could even hugely improve the bisectability of mainline after the merge
as such changes could be merged/rebased into the subsystem tree _before_
Linus pulls them into mainline.

Currently I avoid -next and -mm and I also don't do any merge window
testing. Why? Too much flux, too many issues, too much energy required.
But if there was some sort of pre-merge call for testing of an identifiable
and relatively stable tree, I would definitely participate in that and be
willing to spend time to bisect the hell out of any issues I'd find.

Cheers,
FJP
--

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:45 pm

And what do you think is happening _after_ the merge window closes, when
we're supposed to be fixing bugs? People work on new code. And, in fact, they

How about, instead, putting limits on the amount of stuff that's going to be

Well, and when's the time for fixing bugs? Surely not during the merge window
and also not after that, because otherwise people won't be ready for the next

Exactly. Moreover, the code is now being merged at a pace that makes it

Sorry to say that, but I don't think this is realistic. What happens after the merge
window is people go and develop new stuff. They look at the already merged
code only if they have to. Also, there are a _few_ people testing the kernel
carefully enough to see the more subtle problems, let alone debugging and

My point is, given the width of the merge window, there's too much stuff
going in during it. As far as I'm concerned, the window can be a week long
or whatever, but let's make fewer commits over a unit of time.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 9:54 am

That's not correct. People work on new code before, during, and after

To be ready for the next merge window just means to know which code is
sufficiently reviewed and tested, and to have it queued up and if
necessary synchronized with other pending code.
--
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/
--

To: Stefan Richter <stefanr@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 10:06 am

I'm not quite sure if really all of them do. Well, I should have said "some

Of course it _should_ mean that, but the fact is unreviewed and untested
patches are pushed to Linus, at least from time to time. [Even some known
broken patches were pushed to Linus in the past, but we can't prevent that from
happening by any process changes.]

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 5:37 pm

Oh, I agree. But at that point, the issue you brought up - of testing and
then having the code change under you wildly - has at least gone away.

From a testing standpoint, the *developers* aren't ever even the main
issue. Yes, we get test coverage that way too, but we should really aim
for getting most of the non-obvious issues from the user community, and
not primarily from developers.

So the whole point of the merge window is *not* to have developers testing
their code during the six subsequent weeks, but to have *users* able to
use -rc1 and report issues!

That's why the distro "testing" trees are so important. And that's why

I'm not following that logic.

A single merge will bring in easily thousands of commits. It doesn't
matter if the merge window is a day or a week or two weeks, the merge will
be one event.

And there's no way to avoid the fact that during the merge window, we will
get something on the order of ten thousand commits (eg 2.6.24->25-rc1 was
9629 commits).

So your "fewer commits over a unit of time" doesn't make sense. We have
those ten thousand commits. They need to go in. They cannot take forever.
Ergo, you *will* have a thousand commits a day during the merge window.

We can spread it out a bit (and I do to some degree), but in many ways
that is just going to be more painful. So it's actually easier if we can
get about half of the merges done early, so that people like Andrew then
have at least most of the base set for them by the first few days of the
merge window.

So here's the math: 3,500 commits per month. That's just the *average*
speed, it's sometimes more. And we *cannot* merge them continuously,
because we need to have a stabler period for testing. And remember: those
3,500 commits don't stop happening just because they aren't merged. You
should think of them as a constant pressure.

So 3,500 commits per month, but with a stable period (that is *longer*
than the merge window) that means that the merge wind...

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:23 pm

That's correct, but since developers are already working on new code at that
point, the bug reports in fact distract them and make them go back to the "old"
stuff, recall why they made those particular changes etc. As a result, the
developers often do not take the bug reports seriously enough, especially if
they do not finger the "guilty" change. That, in turn, makes the users believe

Well, do we _have_ _to_ take that much? I know we _can_, but is this really

Oh, yes it does. Equally well you could say that having brakes in a car
didn't make sense, even if you could drive it as fast as the engine allowed

Surely, they don't, but maybe they don't have to.

You can technically handle merging even more, but what about quality? Do we
have a quality assurance process in place? If we do, what is it? How is it
able to handle the 3500 commits a week? Assuming it is, will it be able to
handle more and what's the limit?

IMO, there has to be a limit somewhere, or we will end up in a spiral driving
everybody mad.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:40 pm

not really, if patches are produced at a rate of 1000/week and you decide
to only accept 2000 of them this month, a month later you have 6000
patches to deal with. history has shown that developers do not stop
developing if their patches are not accepted, they just fork and go their
own way.
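
The arithmetic above can be checked with a minimal sketch. It assumes four
weeks per month and a fixed monthly acceptance cap; the function name and
its parameters are made up for illustration and are not part of any kernel
tooling.

```python
def pending_each_month(months, produced_per_week=1000,
                       accepted_per_month=2000, weeks_per_month=4):
    """Return the queue size seen at merge time in each successive month."""
    pending = 0
    history = []
    for _ in range(months):
        # New patches keep arriving whether or not earlier ones were merged.
        pending += produced_per_week * weeks_per_month
        history.append(pending)
        # Only a capped number of them is accepted; the rest carries over.
        pending = max(0, pending - accepted_per_month)
    return history


print(pending_each_month(3))
```

Under these assumptions the queue at merge time grows by 2000 patches every
month for as long as production outpaces acceptance.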

David Lang

--

To: <david@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:45 pm

Well, I think you know how TCP works. The sender can only send as much
data as the receiver lets it, no matter how much data there is to send.
I'm thinking about an analogous approach.

If the developers who produce those patches know in advance about the rate
limit and are promised to be treated fairly, they should be able to organize

That's mostly when they feel that they are treated unfairly.

OTOH, insisting that your patches should be merged at the same rate that you're
able to develop them is unreasonable to me.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: <david@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:38 pm

We cannot control who develops what.

When someone wants some feature or wants to get Linux running on his
hardware he will always develop the code.

We can only control what we merge.

And the main rationale for the 2.6 development model was that we no
longer want distributions to ship kernels with insane amounts of

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Adrian Bunk <bunk@...>
Cc: <david@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:56 pm

To be exact, we control what we merge and when. There's no rule saying that
every patch has to be merged as soon as it appears to be ready for merging,

This was an argument against starting a separate development branch in analogy
with 2.5, IIRC, and I agree with that.

Still, I think we don't need to merge patches at the current rate and it might
help improve their overall quality if we didn't. Of course, the latter is only
a speculation, although it's based on my experience.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: <david@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:25 pm

What currently gets applied to the kernel are between two and three
million lines changed per year.

We can discuss when and how to apply them.

But unless we want to create an ever-growing backlog we have to change
roughly 200,000 lines per month on average.
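
As a rough check of that figure, here is a small sketch that simply divides
the yearly range quoted above by twelve. The helper name is hypothetical and
the even spread over twelve months is an assumption.

```python
def lines_per_month(low_per_year, high_per_year):
    """Spread a yearly range of changed lines evenly over twelve months."""
    return low_per_year // 12, high_per_year // 12


low, high = lines_per_month(2_000_000, 3_000_000)
print(low, high)
```

The monthly range this yields brackets the "roughly 200,000" estimate.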

Even with higher quality criteria that might result in some code not

See above - what do you want to do if we'd merge less and have a backlog
of let's say one million lines to change after one year, much of it
already in distribution kernels?

cu
Adrian

--

To: Adrian Bunk <bunk@...>
Cc: <david@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 8:05 am

Well, I'm feeling that's what Linus is trying to say too. :-)

I, for one, don't really want to cope with a situation I don't feel comfortable
in, because in the long run that leads to growing frustration. It seems pretty
obvious to me that people generally get more and more frustrated with the
current development process and it will have to be addressed somehow anyway.

If there's a problem, and I think that there really _is_ one, we should at
least try to _address_ it instead of just trying to duck it.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:57 pm

they will make the patches bigger to get the changes in a smaller number

it's not necessarily the individuals that fork, it's the distros, who want
to include the individuals' fixes and other changes, that create
the fork.

David Lang
--

To: <david@...>
Cc: Rafael J. Wysocki <rjw@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:01 pm

Is that really bad? Isn't that effectively equivalent to "increased testing of
earlier integrations"?

-chris
--

To: Chris Shoemaker <c.shoemaker@...>
Cc: Rafael J. Wysocki <rjw@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:14 pm

not if there are so many changes that the testing isn't really relevant to
mainline.

not if the changes don't get into mainline.

look at the mess of the distro kernels in the 2.5 and earlier days. having
them maintain a large body of patches didn't work for them or for the
mainline kernel.

David Lang
--

To: <david@...>
Cc: Chris Shoemaker <c.shoemaker@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:38 pm

Exactly.

I do think Rafael's TCP analogy is somewhat germane, but it misses the
point that the longer the queue gets, the *worse* the quality gets. It
gets worse because the queued-up patches don't actually get tested any
more during their queueing, and because everybody else who isn't
intimately involved with production of said patches just gets *less*
inclined to look at a big patch-queue than a small one.

So having a long queue and trying to manage it (by some kind of negative
feedback) is counter-productive, because by the time that situation
happens, you're basically screwed already.

That's what we largely had with the Xen merge, for example. A lot of the
code had been around for basically _forever_, and the people involved in
reviewing it got really tired of it, and there was no way in *hell* a new
person would ever start reviewing the huge backlog. Once it is massive,
it's just too massive.

So trying to push back from the destination is really painful. It's also
aggravating for everybody else. When people were complaining about me not
scaling (remember those flame-wars? Now the complaint is basically the
reverse), it was very painful for everybody, and most of all me.

So I really really hope that if we need throttling (and I do want to point
out that I'm not entirely sure we do - I think the issue is not "number of
commits", but "quality of code", and I do _not_ agree that the two are
directly related in any way), it should be source-based.

Trying to make sure that the source throttles, and not by making
developers feel unproductive. And quite frankly, most things that throttle
the source are of the annoying and non-productive kind. The classic source
throttle tends to be to make it very "expensive" to do development, by
introducing various barriers.

The barriers are usually "you need to have <n> other people look at it",
or "you need to pass this five-hour test-suite", and almost invariably,
the big issue is not code quality, b...

To: Linus Torvalds <torvalds@...>
Cc: <david@...>, Chris Shoemaker <c.shoemaker@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:39 pm

Heh. The Xen code in the kernel now is a complete rewrite, with only
trace elements from the original patchset. And yes, that's partly
because the original patches were unreviewable.

J
--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:31 pm

Do you want me to stop merging your code?

Do you think anybody else does?

Any suggestions on how to convince people that their code is not worth
merging?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>, Greg KH <greg@...>
Date: Wednesday, April 30, 2008 - 7:03 pm

Well, no, but actually there are only a few of my patches in this merge
window. :-)

Moreover, if the maintainers who took them told me they would be scheduled for
the next merge window, I wouldn't mind. That actually happened to some of my
patches that are in Greg's tree at the moment and that's fine (although I
consider the patches as important).

IMO, this is a question of balance. Of course, a maintainer can take
everything from everyone, but at the same time he can have a look at the
patches and say "Well, I have lots of stuff scheduled for this merge window
already, this stuff of yours will wait for the next merge window. Please
improve the code or review the others' patches in the meantime". The only

I think the majority of developers would understand if you told them you could
only merge a limited amount of changes in a single merge window, provided that
they would be treated fairly.

When you take everything from everyone, you actually reward people who are
able to develop more code between merge windows. Not necessarily those who
spend time on different important activities, such as reviewing the others'

That shouldn't be necessary. :-)

The point is to tell people to develop the code less rapidly, so to speak.
Or maybe more carefully.

Thanks,
Rafael
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:46 pm

I think you're approaching a solution Linus. If developers take a refusal
as a punishment, maybe you can use that for trees which have too many
unresolved regressions. This would be really unfair to subsystem maintainers
which themselves merge a lot of work, but recursively they may apply the
same principle to their own developers, so that everybody knows that it's
not worth working on next code past a point where too many regressions are
reported.

Willy

--

To: Willy Tarreau <w@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:20 pm

Heh. It's been done. In fact, it's done all the time on a smaller scale.
It's how I've enforced some cleanliness or process issues ("I won't pull
that because it's too ugly"). I see similar messages floating around about
individual patches.

That said, I don't think it really works that well as "the solution": it
works as a small part of the bigger picture, but no, we can't see
punishment as the primary model for encouraging better behaviour.

First off, and maybe this is not true, but I don't think it is a very
healthy way to handle issues in general. I may come off as an opinionated
bastard in discussions like these, and I am, but when it actually comes to
maintaining code, I really prefer a much softer approach.

I want to _trust_ people, and I really don't want to be a "you need to do
'xyz' or else" kind of guy.

So I'll happily say "I can't merge this, because xyz", where 'xyz' is
something that is related to the particular code that is actually merged.
But quite frankly, holding up _unrelated_ fixes, because some other issue
hasn't been resolved, I really try to not do that.

So I'll say "I don't want to merge this, because quite frankly, we've had
enough code for this merge window already, it can wait". That tends to
happen at the end of the merge window, but it's not a threat, it's just me
being tired of the worries of inevitable new issues at the end of the
window.

And I personally feel that this is important to keep people motivated.
Being too stick-oriented isn't healthy.

The other reason I don't believe in the "won't merge until you do 'xyz'"
kind of thing as a main development model is that it traditionally hasn't
worked. People simply disagree, the vendors will take the code that their
customers need, the users will get the tree that works for them, and
saying "I won't merge it" won't help anybody if it's actually useful.

Finally, the people I work with may not be perfect, but most maintainers
are pretty much experts within their ...

To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:30 pm

And, ideally, they would have posted the changes as patches to the list
for review anyway, so there shouldn't be anything surprising in that pull...

J
--

To: Jeremy Fitzhardinge <jeremy@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:35 am

yes, it's something which has been disappearing since use of bk then git.
It would be impractical and useless to post everything during the merge
window now, but if we can get everyone to pass through linux-next, the
posts will be evenly distributed and it would make sense to require
everyone to post their changes to the list at the same time. Right now,
some developers already always post their changes. Jeff, Greg and
Bartlomiej come to mind, and I must say that I'm always interested in
performing a quick look, just in case something really obvious catches
my attention (which never happens).

Willy

--

To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:42 pm

It may help directly, for example when people realize that they work on

I totally agree with that.

Still, the issue at hand is that
(1) The code merged during a merge window is somewhat opaque from the tester's
point of view and if a regression is found, the only practical means to
figure out what caused it is to carry out a bisection (which generally is
unpleasant, to put it lightly).
(2) Many regressions are introduced during merge windows (relative to the
total amount of code merged they are a few, but the raw numbers are
significant) and because of (1) the process of removing them is generally
painful for the affected people.
(3) The suspicion is that the number of regressions introduced during merge
windows has something to do with the quality of code being below
expectations, that in turn may be related to the fact that it's being
developed very rapidly.

My opinion is that we need to solve this issue sooner rather than later and so
the question is how we are going to approach that.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:19 pm

Hey, guv, do you _honestly_ believe that some kind of ISO-9000-like
process generates quality?

And I dislike how people try to conflate "quality" and "merging speed" as
if there was any reason what-so-ever to believe that they are related.

You (and Andrew) have tried to argue that slowing things down results in
better quality, and I simply don't for a moment believe that. I believe
the exact opposite.

The way to get good quality is not to put barriers up in front of
developers, but totally the reverse - by helping them. And yes, that help
can quite possibly be in the form of "process" - by making things more
streamlined, and by having people not have to waste time on wondering
where they should send things etc.

But the notion that we should even _try_ to aim to slow things down, that
one I find unlikely to be true, and I don't even understand why anybody
would find it a logical goal?

Of course, you will have fewer new bugs if you have fewer changes. But
that's not a goal, that's a tautology and totally uninteresting. A small
program is likely to have fewer bugs, but that doesn't make something
small "better" than something large that does more.

Similarly, a stagnant development community will introduce new bugs more
seldom. But does that make a stagnant one better than a vibrant one? Hell
no.

So what I'm arguing against here is not that we should aim for worse
quality, but I'm arguing against the false dichotomy of believing that
quality is incompatible with lots of change.

So if we can get the discussion *away* from the "let's slow things down",
then I'm interested. Because at that point we don't have to fight made-up
arguments about something irrelevant.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:50 am

Note that I'm not necessarily arguing for slowing down, but for reduced
functional conflicts (which slow down may help but it's not the only
solution). I think that refining the time resolution might achieve the
same goal. Instead of merging 10000 changes which each have 1% chance
of breaking any other area, and have all developers try to hunt bugs
caused by unrelated changes, I think we could do that in steps.

To illustrate, instead of changing 100 areas with one of them causing
breakage in the other ones, and having 100 victims try to hunt the
bug in 99 other areas, then theirs, and finally insult the faulty
author, we could merge 50 areas in version X and 50 in X+1 (or 3*33
or 4*25, etc...). That way, we would only have 50 victims trying to
find the bug in 49 other areas (or 32 or 24). Fewer people wasting
their time will mean faster validation of changes, and possibly
faster release cycle with better quality.
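
A minimal sketch of that arithmetic, assuming the areas are split into
equal batches and that every area merged in a batch has one affected
developer; the function name is invented for this illustration.

```python
def hunt_cost(total_areas=100, batches=1):
    """Return (victims, other areas each victim must search) for one batch."""
    per_batch = total_areas // batches
    # Everyone merged in the same batch is a potential victim, and each
    # of them has to rule out every other area in that batch.
    return per_batch, per_batch - 1


for n in (1, 2, 3, 4):
    print(n, hunt_cost(100, n))
```

Splitting into more batches shrinks both numbers at once, which is the
point being made: fewer people searching through fewer suspects.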

People send you their crap every two months. If you accept half of
it every month, they don't have to sleep on their code, and at the
same time at most half of them are in trouble during half the time

Willy

--

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 7:53 am

How about:

(1) Merge a couple of trees at a time (one tree at a time would be ideal, but
that's impossible due to the total number of trees).
(2) After (1) give testers some time to report problems introduced by the
merge.
(3) Wait until the most urgent problems are resolved. Revert the offending
changes if there's no solution within given time.
(4) Repeat for another couple of trees.
(5) Arrange things so that every tree gets merged once every two months.

This would also give us an idea of which trees introduce more problems.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:36 pm

You can't get there from here (at least not very easily).

If you have 60 trees, and want a merge for each one every 2 months, you have to
average 1 tree a day. How big a delay you want in step (2) directly impacts
how many trees you merge at once - if you want a week of cook time, you have to
merge 7 trees every Monday, and so on...
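
The same calculation as a small sketch, assuming a two-month cycle of 60
days; the function and parameter names are hypothetical.

```python
def trees_per_batch(total_trees=60, cycle_days=60, cook_days=7):
    """Trees to merge together so that every tree lands once per cycle."""
    # Ceiling division in integers: a fractional tree still needs a slot.
    return -((-total_trees * cook_days) // cycle_days)


print(trees_per_batch())              # one week of cook time
print(trees_per_batch(cook_days=1))   # no cook time beyond a single day
```

Lengthening the cook time in step (2) scales the batch size linearly, so a
longer test period means more trees landing together.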

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 9:16 am

...and what would you do with such information?

I'm not actually worried about my tree but if (theoretically) it happens to
be amongst the "problematic" ones I would be a bit pissed by blame shifting,
especially given that it is very difficult to compare different trees as
they (usually) deal with quite different areas of the code (some are messy
and problematic, yet critical while others can be more forgiving).

Also slowing down things to focus on quality is really a bad idea. You can
trust me on this one, I've tried it once on the smaller scale and it was a
big disaster because people won't focus on quality just because you want them
to. They'll continue to operate in the usual way and try to workaround you
instead (which in turn causes extra tensions which may become quiet warfare).
In the end you will have a lot more problems to deal with...

Same goes for any other kind of improvement by incorporating "punishment" as
part of the process. You are much better off helping people and getting them
to understand that they should apply some changes to their way of work because
it would be also beneficial for _them_, not only for _you_.

Now regarding the development model - I think that there is really no need
for a revolution yet, instead we should focus on refining the current process
(which works great IMO), just to summarize various ideas given by people:

- try to persuade the few black sheep that skipping linux-next completely for
whole patch series is a really bad idea and that they should try to spend
a bit more time on planning for merge instead of LastMinute assembly+push
(by doing it right they could spend more time after merge to prepare for
the next one or fixing old bugs instead of chasing new regressions, overall
they should have _more_ time for development by doing it right)

- encourage flattening of merges during the merge window so instead of 1-2 big
merges per tree at the beginning of the merge you have a few smaller ones
(majority of maintainers do it...

To: Bartlomiej Zolnierkiewicz <bzolnier@...>
Cc: Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 11:29 am

On Thu, May 1, 2008 at 6:16 AM, Bartlomiej Zolnierkiewicz

When a teacher assigns grades in a class, it's not punishment, it's feedback.

I don't think anyone *intends* to push crap into the tree. However,
with the barrier to getting things into the tree so low, some may feel
there's less incentive to try to get things right the first (or
second) time. It would be nice to provide that incentive.

Normally, it'd be peer-review of the uncommitted patches. We don't
have a lot of that going on here, though. So,
peer-review-after-the-fact, ie, who placed this massive turd in the
tree, and everyone swivels an eye over there and asks what went wrong,
and how do we prevent it in the future. Those conversations seem to be
happening already, time to time.

And as a policy suggestion, if we're past rc1 and someone has
identified a commit as the root of a regression/bug, then the policy
should be just to revert it immediately, no questions asked. Let the
original author work with the person who identified the problem and
resend a fixed commit later. We lose testers in the meantime, and
perhaps the extra effort involved in having the author work out the
issues and redo the patch will help prevent drive-by patching in the
future.
--

To: Ray Lee <ray-lk@...>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@...>, Rafael J. Wysocki <rjw@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:03 pm

you make a valid point here : "we lose testers in the meantime". Maybe
it would help if -rc2 would be released a few days after -rc1 with
the first most obvious showstoppers (often build issues). The most
problematic ones are often fixed within an hour or so, but for most
testers, it still means they have to wait for -rc2.

Most external testers might then only try -rc2 first, but that's not
a problem. What we really want is them to test widely and not revert
back at the first problem. If only 20% of testers try -rc1, and the
remaining 80% actively wait for -rc2 3 days after, then we'll get
broader testing in the first two weeks.

Willy

--

To: Bartlomiej Zolnierkiewicz <bzolnier@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 9:53 am

There still are too many bugs of this kind that make it to Linus' tree and

Well, I'm not sure what that's supposed to mean, so I won't comment.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 10:35 am

Agreed but if you trace the way of these bugs into Linus' tree many of
them follow one of two patterns:

* -mm / -next skipped completely

* short time in -mm / -next (< 2 weeks)

[ disclaimer: this is based on my observations, no hard data to prove it ]

Please also remember that the linux-next concept is still quite _fresh_ with
_plenty_ of room for enhancements like having kernel-du-jour packages for
the most popular distros, doing more automated testing + searching for

This was not directed at you (you are doing great work BTW) but rather
at some people trolling the thread.

Thanks,
Bart
--

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 8:11 am

Perhaps it would make sense to split the merge window into 2 - first
week kernel/net/mm/lib etc., second week arch/drivers/fs? Obviously
some changes are going to span those two areas but it might help in
pinpointing where breakage was introduced as well as quietening the
thundering herd of pull requests at the start of a merge window and
thereby allow review to happen over a longer period.

Or I could just be dreaming...
--

To: Rafael J. Wysocki <rjw@...>
Cc: Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:40 pm

Sorry, not Andrew. DavidN.

Andrew argued the other way (quality->slower), which I also happen to not
necessarily believe in, but that's a separate argument.

Nobody should ever argue against raising quality.

The question could be about "at what cost"? (although I think that's not
necessarily a good argument, since I personally suspect that good quality
code comes from _lowering_ costs, not raising them).

But what's really relevant is "how?"

Now, we do know that open-source code tends to be higher quality (along a
number of metrics) than closed source code, and my argument is that it's
not because of bike-shedding (aka code review), but simply because the
code is out there and available and visible.

And as a result of that, my personal belief is that the best way to raise
quality of code is to distribute it. Yes, as patches for discussion, but
even more so as a part of a cohesive whole - as _merged_ patches!

The thing is, the quality of individual patches isn't what matters! What
matters is the quality of the end result. And people are going to be a lot
more involved in looking at, testing, and working with code that is
merged, rather than code that isn't.

So _my_ answer to the "how do we raise quality" is actually the exact
reverse of what you guys seem to be arguing.

IOW, I argue that the high speed of merging very much is a big part of
what gives us quality in the end. It may result in bugs along the way, but
it also results in fixes, and lots of people looking at the result (and
looking at it in *context*, not just as a patch flying around).

And yes, maybe that sounds counter-intuitive. But hey, people thought open
source was counter-intuitive. I spent years explaining why it should work
at all!

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 7:38 am

And we introduce bugs that nobody sees until they appear in a CERT advisory.

IMnsHO, the quick merging results in lots of code that nobody looked at,
except for the author, nobody is looking at and nobody will _ever_ look at.
Simply, because there's no time for looking at that code, since we're
supposed to be working on preparing new code for the next merge window, testing
the already merged code etc., around the clock. Now, you may hope that this
not-looked-at-by-anyone code is of high quality nevertheless, but I somehow
doubt it.

[Note that it's not directly related to the issue at hand, which is the fact
that people affected by regressions are heavily punished by our current
process. Never mind, though.]

And that's not to mention bugs that appear in the code everybody looked at
and happily reach the mainline because that code has not been tested well
enough before merging. Take SLUB as an example, if you wish.

The fact is, we're merging stuff with minimal-to-no review and with minimal
testing reasonably possible. Is _that_ supposed to produce the high quality?

Also, I'm not buying the argument that the quality of code improves over time
just because it's open and available to everyone. That only happens to the
code which is actually looked at by someone or attempted to modify. This
obviously doesn't apply to the whole kernel code.

For this reason, IMO, we should do our best to ensure that the code being
merged is of high quality already at the moment we merge it. How to achieve
that is a separate issue.

BTW, we seem to underestimate testing in this discussion. In fact, the vast
majority of kernel bugs are discovered by testing, so perhaps the way to go
is to make regular testing of the new code a part of the process.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:28 am

On Thu, 1 May 2008 13:38:33 +0200

well.. -rc1 to -rc8 are doing that already, somewhat.
Can we do better? Always. The more testing the better, and the more
testers the better.
--

To: Arjan van de Ven <arjan@...>
Cc: Linus Torvalds <torvalds@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 8:41 am

The testing is not really a part of the process right now, though. We somehow
hope that the kernel will be tested sufficiently before a major release, but
we don't measure the testing coverage, for example. Of course, that will
involve more work independent of the code writing, but at one point it'll
just become a necessity.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 11:06 am

On Thu, 1 May 2008 14:41:05 +0200

Well. Take 2.6.25.. we know Fedora shipped it in their alphas and
betas (and in rawhide). Those are used by a lot of people; so for me
that's a whole bunch of coverage right there. Is it perfect? No.
But in a way it's in the spirit of open source: the people who care
about a stable release the most (distros) [1], helped us get this
tested. The other people on this thread who care greatly at least also
help us test in general.

[1] Not trying to say no single person wouldn't care; but a distro
tends to care more due to the sheer number of users...
--

To: Linus Torvalds <torvalds@...>
Cc: <rjw@...>, <w@...>, <davem@...>, <linux-kernel@...>, <akpm@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 11:53 pm

The main problem as I see it is with the huge number of hard, confirmed bugs
that are *not* getting fixed.

With the current development model, developers only really care about
current regressions. In a large part this is due to the excellent work of
Rafael with his tracking of regressions since the previous release.
But it does mean older regressions fall by the wayside, even if they've been
confirmed, bisected and the submitter is responsive.
For a while Natalie Protasevich did some work on trying to get attention for
older regressions, but that effort seems to have died out.

Two concrete examples from my personal experience:
- http://bugzilla.kernel.org/show_bug.cgi?id=9749; the error:
sysctl table check failed:
/dev/parport/parport0/devices/ppdev0/timeslice Sysctl already exists
First reported for 2.6.24-rc5, just now confirmed with 2.6.25
Acknowledged by maintainer, but no follow-up [1].

- http://bugzilla.kernel.org/show_bug.cgi?id=9310; the error:
completely blank console with FRAMEBUFFER_CONSOLE_DETECT_PRIMARY set when
framebuffer is active, but no VGA=xxx parameter is passed
First reported for 2.6.23, confirmed for 2.6.24-rc6, almost certainly
still present in 2.6.25
Acknowledged by maintainer, but no follow-up despite later pings.

Another issue is that sometimes developers really are too eager to get their
changes into mainline even when there are known issues or when they know in
their heart that the changes have not received enough testing.

One example is a scheduler change [2] that causes a completely reproducible
regression (music skips and key repeats) on my box with one specific
workload. Ingo and Peter have been great doing debugging after I reported
it for 2.6.25-rc8 and it was reverted just before the release, but I was
very surprised to see the patch resubmitted for 2.6.26 without the
regression being resolved first.
It is now confirmed to still be there and there has been additional effort
on it, but so far without resu...

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:31 pm

Hi.

No. People generally expect that code that has been merged does work, so
they don't look at it unless they're forced to (by a bug or the desire
to make further modifications in that code) and they don't explicitly
seek to test it. They just seek to use it.

When it doesn't work, some of us will go and seek to find the cause,
others (most?) will simply roll back to whatever they last found to be
reliable.

Out of tree code has the same issues.

The only time code really gets looked at and tested is when there's a
problem, or when people are explicitly choosing to inspect it (pre-merge
reviews, eg).

So my answer to the "how do we raise quality" question would be that
when writing the code, we put time and effort into properly analysing
the problem and developing a solution, we put time and effort into
carefully testing the solution, and we put code in that will help the
end-user help us to debug issues later (without them necessarily needing
to git-bisect). After all, good software isn't the result of random (or
semi-random), unconsidered modifications, but of planning, thought and
attention to detail.

In other words, I'm arguing that the speed of merging should be
irrelevant. What's relevant is the quality of the work done in the first
place.

If you want better quality code, penalise the people who get buggy code
merged. Give them a reason to get it in a better state before they try
to merge. Of course Linus alone can't do that.

Nigel

--

To: Nigel Cunningham <ncunningham@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:32 pm

Amen!

--

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)

--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:21 pm

Really? And how, pray tell, being out there will magically improve the
code? "With enough eyes all bugs are shallow" stuff out of ESR's arse?

FWIW, after the last month's flamefests I decided to actually do something
about review density of code in the areas I'm theoretically responsible
for. Namely, do systematic review of core data structure handling (starting
with the place where most of the codepaths get into VFS - descriptor tables
and struct file), doing both blow-by-blow writeup on how that sort of things
is done and documentation of the life cycle/locking rules/assertions made
by code/etc. I made one bad mistake that held the things back for quite
a while - sending heads-up for one of the worse bugs found in process to
never-sufficiently-damned vendor-sec. The last time I'm doing that, TYVM...

Anyway, I'm going to get the notes on that stuff in order and put them in
the open. I really hope that other folks will join the fun afterwards.
The goal is to get a coherent braindump that would be sufficient for
people new to the area wanting to understand and review VFS-related code -
both in the tree and in new patches.

files_struct/fdtable handling is mostly dealt with, struct file is only
partially done - unfortunately, struct file_lock has to be dealt with
before that and it's a (predictable) nightmare. On the other end of
things, fs_struct is not really started, vfsmount review is partially
done, dentry/superblock/inode not even touched.

Even with what little had been covered... well, let's just say that it
caught quite a few fun turds. With typical age around 3-4 years. And
VFS is not the messiest part of the tree...
--

To: Al Viro <viro@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Saturday, May 3, 2008 - 11:26 pm

In the same way that ESR's arse would improve if he'd not wear pants: by him
going to the gym more to avoid at least a few of the many disgusted stares.

ie, the magic would be in the quality of the code being greater simply due
to the developer being aware of the openness. The effect probably wears off after
enough time though...

Rene.
--

To: Al Viro <viro@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:19 am

thank you, the lack of good documentation on the intent of the code has
been a significant barrier for new people. it's (relatively) easy for a
good programmer to look at the code and figure out how it does things, a
bit harder to figure out what it does, but why it does it (and what it was

it may not be the messiest part of the tree, but it's definitely one of
the hardest to figure out the intent of.

David Lang
--

To: <torvalds@...>
Cc: <rjw@...>, <w@...>, <linux-kernel@...>, <akpm@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:51 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

This is a huge burden to put on people.

The more broken stuff you merge, the more people are forced to track
these problems down so that they can get their own work done.

It punishes people who do put forth the effort to let new changes cook
properly, before pushing, and thus avoid putting turds into the tree.

You really have to think about the ramifications of this system.
--

To: David Miller <davem@...>
Cc: <rjw@...>, <w@...>, <linux-kernel@...>, <akpm@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:01 pm

I'm not saying we should merge crap.

You can take any argument too far, and clearly it doesn't mean that we
should just accept *anything*, because it will magically be gilded by its
mere inclusion into the kernel. No, I'm not going to argue that.

But I do want to argue against the notion that the only way to raise
quality is to do it before it gets merged. It's often better to merge
early, and fix the issues the merge brings up early too!

Release early, release often. That was the watch-word early in Linux
kernel development, and there was a reason for it. And it _worked_. Did it
mean "release crap, release anything"? No. But it did mean that things got
lots more exposure - even if those "things" were sometimes bugs.

Linus
--

To: <torvalds@...>
Cc: <rjw@...>, <w@...>, <linux-kernel@...>, <akpm@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:17 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

That's exactly what's been happening this merge window though.

And throughout this, Andrew Morton has been the only person with the
balls and lack of ego problems to revert regression causing changes he
introduced.
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:31 pm

eh? I argued the opposite: that increasing quality will as a side-effect
slow things down.

If we simply throttled things, people would spend more time watching the
shopping channel while merging smaller amounts of the same old crap.

--

To: Andrew Morton <akpm@...>
Cc: Rafael J. Wysocki <rjw@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:43 pm

Yes, my bad, I realized that when I read through my message and already

I agree totally. And although some of the time would probably _also_ be
spent on the frustrating crap that was designed to do the throttling, that
isn't much more productive than watching the shopping channel would be ...

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 6:59 am

Okay, so what exactly are we going to do to address the issue that I described
in the part of my last message that you skipped?

Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 11:26 am

And quite frankly, (2) and (3) are both: "merge windows introduce new
bugs", and that's such an uninteresting tautology that I'm left
wordless. And (1) is just a result of merging lots of stuff.

Of course the new bugs / regressions are introduced during the merge
window. That's when we merge new code. New bugs don't generally happen
when you don't get new code.

And of course finding bugs is always painful to everybody involved.

And of course the bugs indicate something about the quality of code
being merged. Perfect code wouldn't have bugs.

So what you are stating isn't interesting, and isn't even worthy of
discussion. The way you state it, the only answer is: don't take new
code, then. That's what your whole argument always seems to boil down
to, and excuse me for (yet again) finding that argument totally
pointless.

So let me repeat:

(1) we have new code. We always *will* have new code, hopefully. A few
million lines per year.

If you don't accept this, I don't have anything to say.

(2) we need a merge window. That is a direct result not of wanting to
have lots of code at the same time, but of the _reverse_ issue: we
want to have times of relative calm.

And again, if you continue to see the merge window as the
"problem", rather than as the INEVITABLE result of wanting to have
a calm period, there's no point in talking to you.

(3) Ergo, there's a very fundamental and basic and inescapable result:
we absolutely _will_ have times when we get lots and lots of new
code.

So these are not "problems". They are *facts*. Stating them as
problems is stupid and pointless. I'm not going to discuss this with
you if you cannot get over this.

So please accept the facts.

Once you accept the facts, you can state the things you can change. But
the things you cannot change are the merge window, and the fact that we
get a lot of new code at a high rate (where the merge window will
inevitably compress tha...

To: <linux-kernel@...>
Date: Thursday, May 1, 2008 - 2:35 pm

Pardon this comment from an inexperienced kernel hacker, but it seems to
me that one of the main problems is subsystems stomping on each other
during the merge window, and a general confusion as to who is responsible
for what bugs that appear.

Perhaps a shorter merge window, using a round-robin approach, based on
subsystem, would help alleviate these issues?

This would:

- give people a "known" tree to base their subsystem patches on,
when their turn comes around

- give a rough schedule if the round-robin was always consistent
in order, or made known in advance

- a shorter window would keep people from waiting too long for
their turn

- give those responsible for the currently merged subsystem
motivation and clarity to fix bugs that do appear during
their merge window

Problems I see with this approach:

- those at the end of the cycle get the shaft, if previous changes
affect their work

- political issues with determining the order of the round-robin
schedule

If I'm overlooking something, I'm sure someone will correct me. :-)

- Chris
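
[Illustrative sketch, not part of Chris's message. The round-robin idea
above can be pictured as a fixed, repeating order of short per-subsystem
merge windows. The subsystem names, the one-week window length and the
function name below are assumptions made up for this example; nothing
like this exists in the actual kernel process.]

```python
from itertools import cycle, islice


def round_robin_schedule(subsystems, windows, start_week=1, weeks_per_window=1):
    """Assign consecutive short merge windows to subsystems in a fixed,
    repeating order. Returns (first_week, last_week, subsystem) tuples."""
    schedule = []
    week = start_week
    # cycle() repeats the order forever; islice() takes the first `windows` turns.
    for name in islice(cycle(subsystems), windows):
        schedule.append((week, week + weeks_per_window - 1, name))
        week += weeks_per_window
    return schedule


if __name__ == "__main__":
    # Hypothetical ordering; choosing it is the "political issue" noted above.
    order = ["net", "fs", "drivers", "arch"]
    for first, last, name in round_robin_schedule(order, windows=6):
        print(f"weeks {first}-{last}: {name}")
```

[Because the order is fixed, every subsystem can work out in advance when
its next turn comes, which is the "rough schedule" benefit listed above.
The sketch does nothing about the first listed drawback: whoever comes
late in the order still has to rebase onto everything merged before.]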

--

To: linux kernel list <linux-kernel@...>
Date: Friday, May 2, 2008 - 9:22 am

Hi folks,

<big_snip>

Just a few naive thoughts:

a) What about reducing code size ?

Some parts, IMHO, don't necessarily need to be in the kernel,
eg. certain filesystems. Less code, fewer patches to review, less
chance of kernel bugs. Of course this might also cause other
impacts (eg. performance), so those decisions require great care.

b) Multi-tier trees / patchlines

IMHO, a major problem is conflicting patches (eg. a core change
causes some driver to break). In measurement instrumentation
(eg. timesync), there's typically one primary reference point
(eg. atomic clock) as tier-0, where (a limited set of) tier-1's
are synchronized against, tier-2 syncs against tier-1 and so on.

So for the linux kernel, we perhaps could have something like:

* tier-0: core
* tier-1: arch
* tier-2: hw drivers
* tier-3: sw drivers
* tier-4: userland interfaces

If a change from a lower tier wants to go to its upper tier, it first
MUST fit its current mainline and be carefully checked. Of course
this introduces longer times for an individual change to go
into release (since it has to pass several tiers), but IMHO the
chance of new bugs in release should be reduced this way.

Of course there might be changes in a lower tier, which obviously
won't affect several intermediate tiers. Those could skip some tiers.

For example, I'm currently working on an /proc interface for
changing process privileges. In my model, this had to be settled
in #4, but shouldn't touch drivers (#2,#3), but maybe arch (#1).
So these changes could be kicked directly to #2.

What do you think about this ?

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service - http://www.metux.de/
---------------------------------------------------------------------
Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
Patches / Fixes for a lot dozens of packages in dozens of versions:
http://pa...

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:09 pm

Perhaps if they introduced fewer bugs, all of that would be less frustrating to
people who get hit by them, especially by two or more at a time. Everyone
seems to be fine with that until it happens to him personally (like it happened

I obviously agree with that. The question is, however, if we can decrease the
number of bugs introduced during merge windows and you seem to be saying

I have never said you shouldn't take new code at all. That's not what I'm
saying and please don't paint me this way.

I see a problem in that you get patches that you shouldn't have got because
they are unfinished and not well thought through. They introduce regressions
which are only possible to find using bisection because of the amount of code
merged at a time and that's frustrating.
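
[Illustrative sketch, not part of Rafael's message. Bisection halves the
suspect range at every step, so isolating one bad commit among N
candidates takes roughly log2(N) build-and-boot cycles. The commit counts
below are assumptions chosen for illustration, not figures from this
thread, and a real `git bisect` run on non-linear history may need a step
more or less.]

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of build/test cycles needed to isolate a single
    bad commit among num_commits candidates by repeated halving.

    Equivalent to ceil(log2(num_commits)), computed with integers only.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical sizes of the range between a known-good and a known-bad tree.
    for n in (100, 1000, 8000):
        print(f"{n} commits: about {bisect_steps(n)} build/boot cycles")
```

[The step count grows slowly with the number of commits, but every step
is a full rebuild and reboot, and a second regression landing inside the
same range means the whole exercise has to be repeated.]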

You seem to be regarding this as a necessity, but I'm really not convinced

However, the width of the merge window is not a predetermined thing and might

The problem is the (relatively small) fraction of patches pushed to you that
is broken. Some patches are obviously broken, some of them are just not
tested well enough. The result is pretty much the same in either case.

Now, the question is if we can get rid of that fraction by adjusting the
process somehow. You're arguing that we can't and so be it. [This is your
opinion and BTW there's nothing allowing me to call that unreasonable or saying
that you use made up arguments or something like this.]

My opinion is that we could at least try to do something about it. linux-next
is probably a step in the right direction, though time will tell. I'm afraid,
though, that I personally can't do much more than I've been doing already to

The message that started this whole thread was not from me and I believe
it was sent for a reason. So the fact is that at least some people lose their
patience over the current handling of merge windows. And I'm not sure that's
necessary.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 1:41 pm

No, that's not what I'm saying.

What I *am* saying is that as long as you concentrate on "merge window"
and "lots of code", you're concentrating not on the problems, but on the
facts of life. You can't change facts, and even trying is pointless.

What you should concentrate on is not how many patches there are during
the merge window (because we can't do anything about that) or the fact
that they all happen in a short timeframe, but about quality of patches
_regardless_ of merge window.

So if you can make an argument that does not even *try* to change the fact
that
- we have lots of patches
and
- we have a merge window
and
- merging patches causes bugs

but argues about quality from some other standpoint, then I can start to
believe that you have a point.

But as long as you argue about the fact that we merge a lot of stuff, and
that bugs come in during the merge window, I'm not interested. Arguing
about facts is totally non-productive.

And as long as people keep saying "let's not merge broken patches" or "we
should never have bugs", I'll just ignore those kinds of idiotic
statements. They aren't even arguments, they are wishes, and they are
unrealistic. If we knew they were broken and had bugs, of course we
wouldn't merge them.

In short - I'm simply not interested in what you _wish_ reality was.
People need to first acknowledge reality, and _then_ they may have
solutions.

So the reality is:
- we do have tons of patches, and they need to be merged (and furiously)

- there *will* be bugs. And the number of bugs will inevitably be
relative to the number of patches. There is no "perfect", and anybody
who argues for a lower number of bugs by lowering the number of patches
is an idiot in my book.

- there *will* be releases, even in the presence of bugs, because holding
everything up is simply not an option.

Those are the things that we have to accept. Anything else is just
dreaming.

Now, what part _can_ we improve an...

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 5:59 pm

No, I don't. I've never said we can _eliminate_ bugs and please don't make

Not necessarily trying to find bugs in them, but trying to understand how the
patched code is supposed to work and if that's really what we want.

I really think we should review each other's code more, but I do realize that

I'm not sure if you find it productive, but whatever.

A general rule that the trees people want you to pull during a merge window
should be tested in linux-next before, with no additional last minute changes,
may help.

For this to work, though, the people will have to know in advance when the
merge window will start. Which may be helpful anyway.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Friday, May 2, 2008 - 8:17 am

If I only release into my tree's for-next branch what I would release
into my tree's for-linus if I was to send a merge request to Linus right
in this moment, then I won't need advance notice of a merge window.

IOW, treat -next everyday as if the merge window was open right now.

I'm sure it is not that easy for the larger subsystems or the
infrastructure trees. However, Linus' late -rc announcements are plenty
of advance notice, at least for a merge period as long as two weeks.
--
Stefan Richter
-=====-==--- -=-= ---=-
http://arcgraph.de/sr/
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:39 pm

But is it smarter to discourage people from doing code review, by saying
that they won't be doing it anyway,
or actively and publicly encourage people to do so, even on the chance
that it might not lead to everyone doing it?
It's kind of a self-fulfilling prophecy that way.

Trying to force it through the process is another matter entirely.

Cheers,

Friedrich Göpel
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:50 pm

"all" above is the wrong part. Encourage each other into reviewing code
will definitely *help* (and I did not say fix the problem, OK?). There
are persons who regularly spend some time to review code. I'm thinking
about Al, Andrew, Christoph, Arjan, and maybe many other ones I'm missing,
just that I regularly see them give advice to people who post their patches
on the list. And even if only for that, they deserve some respect, and their
efforts must not be dismissed.

Maybe they are more skilled than anyone else for this job. Maybe they're
so used to doing it that it just takes them a few minutes each time, I
don't know. I wish *more* people could be encouraged to do this work,
which is very likely painful but instructive. If the current reviewers
could give hints on how to save a lot of time to them, it may motivate
more to follow them. I suspect that insisting on developers to post their
less obvious work to the list(s) is a first step. Maybe at one point we're
all responsible when we see a mail entitled "[GIT] pull request for XXX",
we should all jump on it and ask "when and where was this code reviewed ?".

It's not much about reviewing each others' patches, it's about showing
one's work to others first. If our developers are encouraged to work
alone in a cave late at night with itching eyes, and send their work
at once every 2 months in a sealed envelope, we'll not solve anything.

I also proposed a more repressive method inciting the ones with really
bad scores to find crap in other's work in order to remain hidden behind
them. You explained why it would not work. Fine.

I also proposed to group merges by reduced overlapping areas, and to
shorten the merge window and make it (at least) twice as often. Rafael
also proposed to merge core first, then archs, which is a refined variation
on the same principle. I'm not sure I've seen your opinion on this.

Willy

--

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 6:17 pm

That wasn't me, but the idea is also worth considering IMO.

Thanks,
Rafael
--

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:07 pm

the problem with trying to make the cycle twice as fast is that it takes
time to hunt down the hard bugs, even when you have some idea where they
are.

go back through the last few kernels and look at the bugs that were fixed
in the last couple of -rc releases (and in final), would they have really
been fixed faster if other changes hadn't taken place?

I suspect that they would not have, and if I'm right the result of merging
half as much wouldn't be twice as many releases, but rather approximately
the same release schedule with more piling up for the next release.

even individual git trees that do get a fair bit of testing (like
networking for example) run into odd and hard to debug problems when
exposed to a wider set of hardware and loads. having the networking
changes go in every 4 months (with 4 months worth of changes) instead of
every 2 months (with 2 months worth of changes) will just mean that there
will be more problems in this area, and since they will be more
concentrated in that area it will be harder to fix them all fast as the
same group of people are needed for all of them.

if several maintainers think that you are correct that doing a merge with
far fewer changes will be a lot faster, they can test this in the real
world by skipping one release. just send Linus a 'no changes this time'
instead of a pull request. If you are right the stable release will happen
significantly faster and they can say 'I told you so' and in the next
release have a fair chance of convincing other maintainers to skip a
release.

it does worry me a bit that the release cycle seems to be slipping
slightly each release, but I don't see a good way to fix this.

David Lang
--

To: <david@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:28 pm

Of course, there'll always be bugs. They'll still slip past the release,

Don't know. However, I think that core bugs have more impact on the rest

no, this is exactly what *not* to do. Linus is right about the risk of
getting more stuff at once. If we merge less things, we *must* be able
to speed up the process. Half the patches to cross-check in half the
time should be easier than all patches in full time. The time to fix

You're perfectly right and that's exactly not what I'm proposing. BTW,
having two halves will also get more of the merge job done on the side of
developers, where testing is being done before submission. So in the

again, this cannot work because this would result in slowing them down,
and it's not what I'm proposing.

Willy

--

To: Willy Tarreau <w@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:46 pm

in general you are correct, however I don't think that it's the general
bugs that end up delaying the releases, I think it's the nasty, hard to
identify and understand bugs that delay the releases, and I don't think

Ok, I guess I don't understand what you are proposing then.

I thought that you were proposing going from 2 week merge + 6 week
stabilize = release to 1 week merge half + 3 week stabilize = release

it now sounds as if you are saying 1 week merge + x week stabilize + 1
week merge + x week stabilize = release

if merging fewer categories of stuff doesn't speed up the release cycle
then you are right, it would just slow things down. however I thought you
were arguing that if we merged fewer categories of stuff each cycle we
could speed up the cycle. I'm saying that maintainers can choose to test
this experimentally and see if it works. if it works we can shift to doing
more of it, if it doesn't they only delay things by a couple of months one
time.

you would need to have several maintainers decide to participate in the
experiment or the difference in cycle time may not be noticeable.

David Lang
--

To: <david@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:53 pm

Indirectly yes it should. Who do you think is chasing those nasty bugs ?
More people than should be. While those people spend time on bugs caused or
revealed by associating several trees, they don't work on fixing their

The latter: 1 week merge for core, 2-4 weeks to stabilize depending on the
amount of changes and complexity of some bugs, release or not at this point
(probably not), then 1 week merge for the rest, and 2-4 weeks stabilize.
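
The two schedules being compared can be added up in a few lines. This is only a sketch of the arithmetic: the "2 week merge + 6 week stabilize" figures are the ones quoted earlier in the thread, and it assumes both stabilization phases of the split cycle take the same time, which the proposal does not actually state.

```python
def cycle_length(phases):
    """Total length, in weeks, of a release cycle given (name, weeks) phases."""
    return sum(weeks for _name, weeks in phases)


# Current cycle as described in the thread: 2 week merge + 6 week stabilize.
current = [("merge", 2), ("stabilize", 6)]


def split_cycle(stabilize_weeks):
    """Proposed split cycle; equal stabilize phases are an assumption."""
    return [
        ("merge core", 1),
        ("stabilize", stabilize_weeks),
        ("merge rest", 1),
        ("stabilize", stabilize_weeks),
    ]


print(cycle_length(current))          # 8
print(cycle_length(split_cycle(2)))   # 6
print(cycle_length(split_cycle(4)))   # 10
```

So under these assumptions the split cycle runs between 6 and 10 weeks, against 8 weeks today; whether it is faster depends entirely on how long the two stabilization phases really take.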

Drivers are different. Maybe we'll find it's better to merge them with the

we should not delay too much IMHO, especially for core changes. We risk
getting huge piles of code which break a lot of other things. Also, core changes
sometimes involve adjustments in every driver or so. So they should not get
additional delay (unless we're really bored by the maintainer not respecting

But it would require Linus to drive it first.

Willy

--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:11 pm

As one of those obviously drug-addled freaks who _are_ looking for bugs...
Thank you so fucking much ;-/
--

To: Al Viro <viro@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:23 pm

That's not what I meant, and I think you know it.

Of course as many people as possible should look at other people's patches
and comment on them. But saying so won't _make_ it so. And it's also
something that we have done since day #1 _anyway_, so anybody who thinks
that it would improve code quality from where we already are, should
explain how he thinks the increase would be caused, and how it would
happen.

So when we're looking at improvement suggestions, they should be real
suggestions that have realistic goals, not just wishes. And they
shouldn't be the things we *already* do, because then they wouldn't
be improvements.

In other words: do people have realistic ideas for how to make others
spend _more_ time looking at patches? And not just _wishing_ people did
that?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:37 pm

FWIW, the way I'd read that had been "face it, normal folks don't *do*
that and if you hope for more people doing code review - put down your
pipe, it's not even worth talking about". Which managed to get under
my skin, and that's not something that happens often...

The obvious answer: amount of areas where one _can_ do that depends on
some things that can be changed. Namely:
* one needs to understand enough of the area or know where/how
to get the information needed for that. I've got some experience with
the latter and I suspect that most of the folks who do active reviews
have their own set of tricks for getting into the unfamiliar area fast.
Moreover, having such set of tricks is probably _the_ thing that makes
us able to do that kind of work.
Sharing such (i.e. "here's how one wades through unfamiliar
area and gets a sense of what's going on there; here's what one looks
out for; here's how to deal with data structures; here are the signs
of problematic lifetime logics; here's how one formulates hypothesis
about refcounting rules; here's how one verifies such and looks for
possible bugs in that area; etc.") is a Good Idea(tm).
Having the critical areas documented with easy to review in
mind is another thing that would probably help. And yes, it won't
happen overnight, it won't happen for all areas and it won't be mandatory
for maintainers, etc. Previous part (i.e. which questions to ask
about data structures, etc.) would help with that.
FWIW, I'm trying to do that - right now I'm flipping between
wading through Cthulhu-damned fs/locks.c and its friends and getting
the notes I've got from the last month work into edible form (which
includes translation into something that resembles normal English,
among other things - more than half of that is in... well, let's call
it idiom-rich Russian).
* patches should be visible *when* *they* *can* *be* *changed*.
If it's "Linus had pulled from linux-foo.git and that included a merge
from linux-foobar.git, which is developed o...

To: Al Viro <viro@...>
Cc: Linus Torvalds <torvalds@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 4:07 pm

I think you've just nailed one of the tricks right there. A
long time ago, I just sat down and wrote up a "how the locking works in
the vfs" document for myself and others. Wrote up the structures, what
each member is for, where the structure appears and disappears, and all
the call chains for all of the locks. When I was done, I had a pretty
good idea of how everything interacted.
I think this is a great trick for ramping up on a section of the
code - documentation is good, but you understand self-written
documentation better.

Joel

--

Life's Little Instruction Book #452

"Never compromise your integrity."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
--

To: Al Viro <viro@...>
Cc: <torvalds@...>, <rjw@...>, <w@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 3:58 pm

On Thu, 1 May 2008 20:37:14 +0100

Yup. I think the only sane+scalable way of making this happen is to
prevail upon the 100-odd subsystem maintainers to keep an eye out for code
which should be exposed to additional eyes.

There are of course many reasons _why_ such code needs the attention of
others, and those reasons have varying strengths. Off the top of my head:

- modifies stuff outside the designated subsystem (eg: lib/pcounter.c -
thanks Pavel)

- (having just spent an hour looking at drivers/net/sfc/ and having
boggled at its bitmap.h): adds generic-looking infrastructure which
should be in core kernel. Or already _is_ in core kernel.

- Adds any kernel<->user interface which is not of the most
trivial&standard form

- Futzes with memory management internals, adds pagefault handlers, etc.

- Ditto vfs things, I guess

- In any way attempts to work around _any_ shortcoming of any other part
of the kernel!

- Does anything RCU related. Every time I cc Paul on an rcu-using patch,
he finds holes in it.

- add your own here.

But we won't find such code by going out and looking for it - we do need
the recipients of that code to say "hey, others might want to see this".
That's very low-effort for the hey-sayer, so I expect we can do better here
quite easily.

--

To: Linus Torvalds <torvalds@...>
Cc: Al Viro <viro@...>, Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:58 pm

our mails have crossed each other. Just to follow up in this thread just
in case...

as explained in last mail, I think that we're doing that far less than

As explained, I have no problem hijacking pull requests asking for 1) code
and 2) review if it's not explicitly stated in the message that it has been
reviewed, or that it is an obvious fix. I have no problem trusting the poster,
he should just care not to lie too often or will get a bad reputation of being
a blatant liar.

The only limit is that if I'm alone doing those raids, I'll quickly get into
every developer's blacklist and nothing will change. *YOU* too have to enforce
this policy.

Willy

--

To: Al Viro <viro@...>
Cc: Rafael J. Wysocki <rjw@...>, Andrew Morton <akpm@...>, Willy Tarreau <w@...>, David Miller <davem@...>, <linux-kernel@...>, Jiri Slaby <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:30 pm

Just to throw out an example:

- make a "Random pending patch of the day" google gadget.

I know that's a bit out there, and I'm not sure the google gadget thing is
realistic, but I bet I'm not the only one who ends up using the google
homepage all the time. A button that says "this patch looks ok", "this
patch looks crap", or "I dunno, give me another one to look at" might be a
fun game that would encourage people to look at a couple of patches a day.

You get five thousand people doing that occasionally (not every day, but
maybe when they are bored and look for something more rewarding than
trying to find bad music videos on youtube), and maybe you'd actually get
feedback on patches.

Make it pick a random commit that is in linux-next but hasn't been merged
into main -git yet.
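
The selection step could look roughly like the sketch below. It only covers picking the commit, not the gadget itself; the function name and the idea of passing in plain lists of commit IDs are assumptions made for illustration.

```python
import random


def pick_pending_commit(next_commits, mainline_commits, rng=random):
    """Return a random commit ID that is in linux-next but not in mainline.

    Both arguments are iterables of commit IDs (hypothetical inputs).
    Returns None when nothing is pending.
    """
    merged = set(mainline_commits)
    # dict.fromkeys() drops duplicates while keeping the original order,
    # so a seeded rng gives a reproducible pick.
    pending = [c for c in dict.fromkeys(next_commits) if c not in merged]
    return rng.choice(pending) if pending else None
```

In a real checkout the pending list could come straight from `git rev-list <mainline>..<linux-next>`, which prints the commits reachable from the second ref but not from the first; the ref names depend on how the remotes are set up.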

Crazy? Probably. But at least it fits my notion of "let's not just wish
people did more patch commentary" thing.

IOW, if people are really serious about coming up with ways to improve
code quality, I really think it needs to be about _practical_ things that
can fit in our flow or can be extensions to it, not just wishing for
better quality.

"If wishes were horses, beggars would ride"

Linus
--

To: Willy Tarreau <w@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:52 pm

On Thu, 1 May 2008 00:46:10 +0200

Well. If we were good enough at tracking bug reports and regressions we
could look at the status of subsystem X and say "no new features for you".

That would be a drastic step even if we had the information to do it (which
we don't).

It would certainly put the pigeon amongst the cats tho.
--

To: Andrew Morton <akpm@...>
Cc: <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:21 pm

We already have some information, Rafael is tracking this info. But we need
other developers to look at others' bugs. If we considered that for each
release, the *worst* subsystem does not get any new features merged, maybe
the ones who really want to get theirs merged will quickly take a look at
their not-so-friend coworkers' work to try to get their score up and
avoid getting spotted.

After all, that's what we want to achieve : better cross-testing. For
2.6.27, we would probably have Davem happy to report one hundred
bugs brought by Ingo and ban him from next merge. But if that's the
only way to find 100 bugs in one release cycle, hey that's quite
efficient! And in turn, Ingo would have more time to fix (or deny)
bugs assigned to him, then take a look at his accuser's code for next
release.

Not very moral, but the kernel team has evolved from a small team of
buddies to a large enterprise. And to survive this evolution, we may
need to apply the immoral principles found in big companies.

Willy

--

To: Willy Tarreau <w@...>
Cc: Andrew Morton <akpm@...>, <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:38 pm

On the contrary, I call this "keeping everybody else honest".

-chris
--

To: Linus Torvalds <torvalds@...>
Cc: <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 6:41 pm

On Wed, 30 Apr 2008 15:31:22 -0700 (PDT)

Raise the quality. Then the volume will automatically decrease.

Which leads us to... the volume isn't a problem per-se. The problem is
quality. It's the fact that they vary inversely which makes us say "slow
down".

So David's Subject: should have been "Do Better, please". Slowing down is
just a side-effect. And, we expect, a tool.

We should be discussing how to raise the quality of our work.
--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 8:31 am

To improve the quality of kernel releases, maybe we can create a special
kernel testing tool.
This tool should:

- Check for known bugs, regressions, compile errors etc.

- Have a modular design (plug-in support), so that checks for these
regressions, known bugs etc. can be implemented easily.

- Have git support, so that when it hits a bug it can bisect the
commits, automating the search for the buggy commit.

- Have a console interface and an X interface, so that not just
developers but also users who want to help find issues can
contribute easily.
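
The bisect part of such a tool boils down to a binary search over the commit history. The following is a minimal sketch of that search, not of any existing tool: `first_bad_commit` and the `is_bad` callback are hypothetical names, and the sketch assumes a linear history in which every commit after the first bad one is also bad.

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit by binary search.

    commits: list ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad.
    is_bad:  callable that builds/boots/tests one commit and returns True
             when the failure shows up (hypothetical callback).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # failure already present: look earlier
        else:
            good = mid     # still fine here: look later
    return commits[bad]
```

git already provides this as `git bisect` (including `git bisect run <script>` for unattended runs), so a front end for less experienced users would mostly have to wrap that command and supply the test step.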

Just a few things that came to mind when I thought about it. Any more
ideas/suggestions are welcome :-) Also, we could create a web site for this
project and list the known regressions, bugs etc. there, so that anyone
who wants to contribute code to this tool can easily find these
issues. If anyone is interested in creating/leading such a tool, I can
host a website for the project on our system.

Cheers

Tarkan

--

To: Tarkan Erimer <tarkan@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 11:34 am

A variety of bugs cannot be caught by automated tests. Notably those
which happen with rare hardware, or due to very specific interaction
with hardware, or with very special workloads.

An interesting thing to investigate would be to start at the regression
meta bugs at bugzilla.kernel.org, go through all bugs which are
linked from there, and try to figure out
- if these bugs could have been found by automated or at least
semiautomatic tests on pre-merge code, and
- what those tests would have had to look like, e.g. what equipment
would have been necessary.

Let's look back at the posting at the thread start:
| On Wed, Apr 30, 2008 at 10:03 AM, David Miller <davem@davemloft.net> wrote:
| > Yesterday, I spent the whole day bisecting boot failures
| > on my system due to the totally untested linux/bitops.h
| > optimization, which I fully analyzed and debugged.
...
| > Yet another bootup regression got added within the last 24
| > hours.

Bootup regressions can be automatically caught if the necessary machines
are available, and candidate code gets exposure to test parks of those
machines. I hear this is already being done, and increasingly so. But
those test parks will only ever cover a tiny fraction of existing
hardware and cannot be subjected to all code iterations and all possible
.config permutations, hence will have limited coverage of bugs.

And things like the bitops issue depend on review much more than on
tests, AFAIU.
--
Stefan Richter
-=====-==--- -=-= ----=
http://arcgraph.de/sr/
--

To: Stefan Richter <stefanr@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Friday, May 2, 2008 - 10:05 am

Of course, it's impossible to test all the things/scenarios. My idea is
just to also make hunting bugs easier via a tool like this that
has a console/X interface and the ability to bisect, so that users who have
little or no knowledge of git/bisect can easily try to find the
problematic commits/bugs.

Tarkan

--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 8:57 pm

One big problem I see is Linus wanting to merge all drivers regardless
of the quality.

Linus said in [1]:
"I'd really rather have the driver merged, and then *other* people can
send patches!"

The problem is that such "other people" do not exist (except perhaps Al)
for non-trivial stuff.

My favorite gem from this driver we merged in 2.6.25 is:
grep -C4 volatile drivers/infiniband/hw/nes/nes_nic.c

Fixing such stuff isn't a "janitorial kind of thing", and people are
actually more motivated to fix their code for getting it into the kernel
than to fix their code after it went into the kernel.

I am not saying we shouldn't merge such a driver at all or set
unrealistic high quality goals - I'm for merging all code of good
quality that provides functionality not yet in the kernel.

But we need some minimum quality level.

cu
Adrian

[1] http://lkml.org/lkml/2008/2/21/334

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:35 pm

Sure, but that's not the cause of the problems that people like DavidN
whine about, or problems that frustrate David Miller and/or Ingo
Molnar. The problems that cause whining and/or frustration are when
changes in core code break things for other maintainers. That is a TOTALLY
DIFFERENT problem from lower-quality device drivers getting merged.
In general, those device drivers don't cause problems for people who don't
have the relevant hardware, and worst case, the device driver can just be
CONFIG'ed out.

So this is a totally different issue, and whether or not we merge new
device drivers, and at what quality level (from "it compiles, ship
it!", to every single checkpatch, sparse, and Christoph Hellwig nitpick
has to be addressed *AND* then the submitter has to give a bottle of
high-quality alcohol to a Maintainer :-) is completely orthogonal to
the question of whether we can, in a King Canute fashion, compel
developers to stop developing by commanding them not to send pull
requests or by refusing to merge their work into mainline.

If we don't merge their work, and it's really cool features that our
end users are demanding, it will just flow into the distros via
out-of-tree patches, much like it did during the 2.4/2.5 era. And
maybe the current enterprise distros will try to hold it back, but if
end users start saying things like "We want containers!!" and start
voting with their feet to a distro that is willing to merge OpenVZ
patches, it doesn't matter how much we try to tell the tide to stop flowing
in. So yes, we can apply some amount of backpressure, but the real
challenge is to figure out how we can work smarter and flush out the
bugs faster.

- Ted
--

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 9:25 pm

That's not what I said.

What I said was that I think we get *better* quality by merging early.

In other words, you're turning the whole argument on its head, and
incorrectly so.

I claim that you are the one that is arguing for *worse* quality, by
arguing for a process that is KNOWN to tend to generate bad code
(out-of-tree drivers) as opposed to one that tends to fix things over time
(and note the "tends" in both cases - there are counter-examples, but
the trend is so clear that anybody who disputes it would seem to be either
blind or lying).

So here's my challenge: give me *one* reason to believe that quality
improves more out-of-tree than it does in-tree, and then you'll have a
point. But you'd better be able to explain the ton of historical data we
have that proves otherwise.

Until you do that, your blathering is just that - total blathering. The
process I advocate is the one that has historical data on its side. Yours
is just a failed theory.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:13 pm

I am *not* saying it should have stayed out-of-tree.

I am saying that it was merged too early, and that there are points that
should have been addressed before the driver got merged.

Get it submitted for review to linux-kernel.
Give the maintainers some time to incorporate all comments.
Even one month later it could still have made it into 2.6.25.

The only problem with my suggestion is that it's currently pretty random
whether someone takes the time to review such a driver on linux-kernel.

And even if I'm getting fire for this again (and different from newbies
running checkpatch on the kernel) for driver submissions it actually
makes sense to tell the submitter to fix the checkpatch errors [1], and
it would have made the driver better in this case (again, it could still
have made it into 2.6.25).

People are actually more motivated to fix their code for getting it into
the kernel than to fix their code after it went into the kernel, so we

cu
Adrian

[1] not necessarily all checkpatch warnings

--

To: Adrian Bunk <bunk@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 10:30 pm

Now, I do agree that we could/should have some more process in general. I
really _would_ like to have a process in place that basically says:

- everything must have gone through lkml at least once

- after that point, it should have been in linux-next or the -mm queue

- and then it can get merged (and if it didn't get any review by then,
maybe it was because nobody was interested, and it simply won't be
getting any until it oopses or catches people's interest some other way)

HOWEVER.

That process doesn't actually work for everything anyway (a lot of trivial
fixes are really best not being so noisy, and various patches that are
specific to some subsystem really _are_ better off just discussed on that
subsystem mailing lists).

And perhaps more pertinently, right now that kind of process is very
inconvenient (to the point of effectively being impossible) for me to
check. Obviously, if the patch comes from Andrew, I know it was in -mm,
and I seldom drop those patches for obvious reasons anyway, but the last
thing we want is some process that depends even _more_ on Andrew being a
burnt-out-excuse-for-a-man in a few years (*).

So I could ask for people to always have pointers to "it was discussed
here" on patches they send (and I'd likely mostly trust them without even
bothering to verify), the same way -git maintainers often talk about "most
of this has been in -mm for the last two months".

That might work. But then there would still be the patches that are
obvious and don't need them.

And then even the obvious patches do break. And people will complain. Even
though requiring that kind of process for the stupid stuff would just slow
everybody down, and would be really painful.

So one of my _personal_ reasons I don't want to put too much process in
place is that I don't think process is appropriate for everything, and yet
even the stuff that obviously doesn't need or want process (speling fixes
and build failures) _will_ cause problems, ...

To: Linus Torvalds <torvalds@...>
Cc: Adrian Bunk <bunk@...>, Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, May 14, 2008 - 10:55 am

What about 'must go through lkml at least once *outside the merge
window*'. Or is it just me?

During the merge window, I'm totally overloaded by all those patches
going in and related lkml traffic...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: Linus Torvalds <torvalds@...>
Cc: Andrew Morton <akpm@...>, <rjw@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Thursday, May 1, 2008 - 2:54 pm

Cc linux-kernel on 3 patches "specific to some subsystem" that add the
word "select" to Kconfig files and I'll catch at least one bug before it

It should be enough to trust maintainers that they follow the rules.

And in the unlikely case someone didn't follow them you know whom you

There's a middle way.

Requiring the submission of bigger changes and new drivers to be Cc'ed
to linux-kernel can help and shouldn't cause real problems.

And requiring this kind of patches to be in linux-next for some time
should also be possible.

Both can improve the quality of the kernel.

Trivial patches and bugfixes might not have to follow these rules, but
that's similar to e.g. the current merge window process also having

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

--

To: Andrew Morton <akpm@...>
Cc: Linus Torvalds <torvalds@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:23 pm

I violently agree.

One of the (obvious?) ways in which we can raise the quality of the code
overall is to spend more time on reviewing the others' code and discussing that
code. It follows from my experience that the quality of patches improves
dramatically if they are discussed while being developed. Of course, that
requires time, but it's time well spent.

For this reason, there should be a mechanism in place that will encourage
people to review the existing code, even the code that hasn't changed for
a long time, and to review and discuss patches submitted by the other people
instead of producing new code.

Also, the patches that were thoroughly discussed during their development
should be regarded as more trustworthy than the ones that were not discussed
at all.
--

To: Rafael J. Wysocki <rjw@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:41 pm

but you don't have any way of knowing how much discussion took place on
any particular patch. that discussion could have taken place in many
different places, and you don't have the ability to monitor them all.

David Lang
--

To: <david@...>
Cc: Andrew Morton <akpm@...>, Linus Torvalds <torvalds@...>, <davem@...>, <linux-kernel@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:51 pm

Not at the moment, but there may be a way to do that if we think of it more
thoroughly.

One idea may be to add a "Commented-by:" tag in which to place people who
provided valuable comments to the patch author and/or maintainer (as a comma
separated list, for example, in analogy with the email Cc lists), especially if
the patch has been changed as a result of the comments.
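
To make the proposal concrete, here is a small sketch of how such a tag could be read back out of a commit message. "Commented-by:" is the tag proposed above, not an existing kernel convention, and the function name and the example addresses are made up for illustration.

```python
def commented_by(commit_message):
    """Collect the names listed on 'Commented-by:' lines of a commit message.

    Each tag line holds a comma-separated list, as suggested in the
    proposal; several tag lines are simply concatenated.
    """
    names = []
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "commented-by":
            names.extend(n.strip() for n in value.split(",") if n.strip())
    return names


message = """foo: fix refcount leak

Signed-off-by: A. Author <author@example.org>
Commented-by: B. Reviewer <b@example.org>, C. Reviewer <c@example.org>
"""
print(commented_by(message))
```

A count of such names per patch would give a rough, and easily gamed, measure of how much discussion a patch has seen.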

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:14 pm

Just to clarify: I'd actually like to make the merge window be just a
week. If even that.

With linux-next hopefully stepping up to be a place where the actual
_conflicts_ (which are usually not the big problem, they are just
inconvenient from a timing standpoint) can get found and handled early, a
shorter merge window should be technically possible.

HOWEVER. Even now, at two weeks, we do have issues where timing just
doesn't fit some developer, because of conferences or vacations or just
random personal issues or whatever. There are always people who grumble
because the window didn't work for them.

Of course, they should have had it all ready, but somehow that simply
doesn't happen. I think it's against most human nature to be quite _that_
forward-looking.

And maybe everything would be ok if we could also shorten the actual
release cycle, so that if you miss one merge window for some random
conference or other (or just a *really* bad hair-day and you didn't get
your act together), you wouldn't mind too much and you'd just hit the next
one instead.

But that, in turn, is unrealistic because when bugs do happen, the latency
you get between testers and developers is long enough that I really don't
think we can shorten the after-merge-window thing much. Six weeks seems to
be already pushing it.

And as mentioned, a longer after-merge-window-stabilization phase is just
going to aggravate the problem next time around.

We could have staggered releases, but let's face it, that's what -mm and
linux-next and stable is all about.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Rafael J. Wysocki <rjw@...>, David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 7:34 pm

I'd go for that. The only one with a possible problem might be Andrew
due to his need to rebase his 1000+ individual patches before he sends
them to you :)

Everyone else should have things queued up and ready to go for you as
it's not like we don't have some warning that the window is about to
open up...

thanks,

greg k-h
--

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Andrew Morton <akpm@...>, Jiri Slaby <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:56 pm

Well, where's it stated that you have to develop new code for each merge
window? By making shorter merge windows with less code merged in each of

That depends on the amount of bugs introduced during the merge window. With
shorter merge windows we may introduce fewer bugs per merge window and

Well, that's assuming that people test linux-next and -mm etc., but frankly I'm
not seeing that happening. Hopefully, things are going to improve.

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: <davem@...>, <linux-kernel@...>, <torvalds@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:00 pm

On Wed, 30 Apr 2008 21:36:57 +0200

ooh, fun thread.

One of the main reasons for -mm (probably _the_ main reason) is to weed out
other-developer-impacting regressions before they hit mainline and, umm,
affect developers.

But there are implementation problems:

a) developers aren't testing -mm enough

b) -mm releases have become too slow, and (hence) too unstable

c) people are slamming changes into mainline which have never been seen
in -mm. Lots of changes.

So here's how we're going to fix David's problem:

- Everyone gets their stuff into linux-next.

- Lots of people _test_ linux-next. Just once a week.

Those two steps will improve the merge-window chaos a lot. Things will get
better.

The remaining open problem is what do we do about the shiny new code which
is getting slammed into the merge window?

Well, it's very easy to tell whether code which appears in the merge window
was present in linux-next.
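
A minimal sketch of such a check, assuming each commit is represented by a
stable patch identifier (for example the kind `git patch-id` computes, since
commit hashes change when a tree is rebased). The function name and the sample
data are hypothetical, not taken from any existing tool:

```python
def unseen_in_next(merged, next_patch_ids):
    """Return the subjects of merged commits that never appeared in linux-next.

    merged         -- list of (patch_id, subject) pairs seen in the merge window
    next_patch_ids -- set of patch ids collected from linux-next beforehand
    """
    return [subject for patch_id, subject in merged
            if patch_id not in next_patch_ids]


if __name__ == "__main__":
    # Hypothetical data: two commits were in linux-next, one was not.
    next_ids = {"a1", "c3"}
    merge_window = [
        ("a1", "driver: fix probe ordering"),
        ("b2", "shiny new subsystem"),
        ("c3", "docs: update README"),
    ]
    for subject in unseen_in_next(merge_window, next_ids):
        print("never seen in linux-next:", subject)
```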

Our first way of preventing people from shoving inadequately-cooked code
into the merge window is suasion (aka flaming their titties off). If that
proves insufficient and if we still have a sufficiently large problem that
we need to do something about it then sure, let's reevaluate.

But one thing at a time. For the 2.6.27 release let us concentrate on two
things

- get your stuff into linux-next

- test linux-next.

If merge-window stability is still a problem after that then let's revisit?
--

To: Andrew Morton <akpm@...>
Cc: <davem@...>, <linux-kernel@...>, <torvalds@...>, <jirislaby@...>
Date: Wednesday, April 30, 2008 - 4:20 pm

For this to happen, let's make the mainline change slower than once a day

Not until we make a rule that nothing that didn't go through linux-next is

I'll see you in the analogous thread during the next merge window. ;-)
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 10:48 am

Just some comments:

As on a football team, everyone has an important role to
play. And you had better let go of the ball as fast as you can, otherwise
you are going to tire yourself out easily.

So, in a development team, if you think the workload is unequally
distributed, make noise. Or think of some means of balancing the
workload automatically, specifically in the area of change
review. (At other times it is not easy to pass the load
around... e.g., what if the bug happens only on your machines and not on
others?)

1. Generally, the more people have reviewed a piece of work, the higher
the chance that it is OK.

2. The more varied the real-world testing, the better.
"Varied" here means testing by users with different backgrounds and
skills, running different applications, and, most importantly, on
different base kernel versions to which the patch is applied.

3. Based on these two numbers alone, we can immediately have
some measure of confidence in the patch, correct?

4. So we could put all of this on a web page: the patches themselves and
the reviewers/testers who have worked on them.

When someone comes in and reviews a patch, its review counter increases
by one. Likewise, its tester counter increases by one after testing.

And I suppose everyone will try to cover the patches that are less
covered by others: automatic balancing of the workload, done in a
distributed manner.

Avoid requiring too much information to be filled in, though... that
would discourage people from taking up the work. Let participants spend
their precious time on reviewing instead.

So prior to consolidation of sources, just by looking at the numbers,
you can see how successful the consolidation will be. If a patch is less
tested, then avoid including it in the consolidation...
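
The counters described above could be sketched roughly as follows. This is
only an illustration under my own assumptions: the `PatchRecord` class, the
helper names, and the thresholds of two reviews and two tests are all
hypothetical, not part of any existing system:

```python
from dataclasses import dataclass


@dataclass
class PatchRecord:
    """One patch on the (hypothetical) tracking page."""
    title: str
    reviews: int = 0   # incremented each time someone reviews the patch
    tests: int = 0     # incremented each time someone tests the patch

    def coverage(self) -> int:
        # Simple combined measure of how much attention the patch has had.
        return self.reviews + self.tests


def least_covered(patches):
    """Pick the patch that most needs attention (lowest combined count)."""
    return min(patches, key=lambda p: p.coverage())


def ready_to_merge(patch, min_reviews=2, min_tests=2):
    """Assumed rule: include a patch only once both counters reach a threshold."""
    return patch.reviews >= min_reviews and patch.tests >= min_tests


if __name__ == "__main__":
    a = PatchRecord("fix bitops", reviews=3, tests=2)
    b = PatchRecord("new scheduler knob", reviews=1)
    print("review next:", least_covered([a, b]).title)
    print("merge candidates:", [p.title for p in (a, b) if ready_to_merge(p)])
```

A volunteer would pick whatever `least_covered` returns, which is how the load
spreads itself without anyone assigning work.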

Please comment.
--
Regards,
Peter Teoh
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, April 30, 2008 - 12:03 am

Yes. The Linux process is becoming unreliable. Newly "stable" versions
have stability problems. The development process looks childish.
Seasoned developers say not to worry, that the process works. I do
worry. BSD seems more attractive, and it may even be worth the
considerable effort to switch my entire client-base. Linux was lucky to
gain the foothold that it did: traditionally, BSD had a better system
with a less restrictive licence, so it is surprising that manufacturers
chose to go with Linux. BSD still has a less restrictive licence and
when mainstream press becomes interested in Linux's quality problems
its adoption will fall. BSD is still a good, maybe even better, option.

Linus, this is your baby and so it's your problem. Only you have the
influence to change things.
--

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, April 30, 2008 - 3:11 am

I completely disagree with your foolish and nonsensical comments about the
Linux kernel and the Linux OS. It's perfectly clear that you don't
understand well enough how the Linux development process works. If you
think that the recently released kernels are not stable, then you can
wait for the 2.6.x.y series or use the distro kernels. All of
your comments are pointless and baseless. You are free to choose BSD or
whatever you want to use. No one is putting a gun to your head to use
Linux :-)

I can very easily say, from my own experience, that the Linux kernel is
PERFECTLY STABLE! I work at one of the largest ISPs in my country and
I use Linux very intensively under very high loads, and I have NEVER, NEVER
faced any problem in my environments that was the fault of the Linux
kernel. For example, many of our mail servers run on Linux, and all
day long they process hundreds of thousands of emails without any downtime
or trouble!

Manufacturers mostly choose Linux over the BSD flavors simply
because the Linux kernel is technically superior to the BSDs and
others. When it comes to licenses: if the GPL is bad, the BSD license is
far worse. The GPL protects your freedom and the openness of the
code by requiring that changes to the source code be returned in open
form. The BSD license is the opposite. You are free to take someone else's
code, and there is NO PROTECTION to prevent your code from becoming closed
(proprietary) source. Can you imagine one company (like Microsoft)
taking your whole kernel source code and creating a PROPRIETARY OS (like
Windows!), making a fool of you? Why? Simply because the BSD license
allows it! No need to return the code! Do you really think that a
BSD license that makes a fool of you is more free?

--

To: Tarkan Erimer <tarkan@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, April 30, 2008 - 9:28 am

The problem is not exactly faults in recently released kernels, rather
that introduction of faults is common when it should be rare, and
kernels are released as stable when they are fragile. Ignoring a
problem, and not caring if they migrate to BSD, is foolishness. Of
course you don't want people to migrate to BSD, so don't pretend that

It is a matter of transparent fact that BSD's licence is less
restrictive than Linux's. Whether that is desirable is not something
that need be discussed at this juncture. My point in raising BSD was
that, from a commercial point of view, BSD is attractive in a way that
Linux is not. The many commercial vendors who have been taken to task
for not honouring their GPL obligations are strong demonstrations of
that. Do not pretend that Linux is sacrosanct. BSD would be an easy
swap for vendors should Linux gain a reputation for poor quality (and it
already runs Linux applications.)

Reputations snowball. By the time anybody notices that a good one has
become tarnished it could be too late, and take too long, to rectify.
I'm sure somebody else observed approximately this just yesterday, so
it's not just me, is it?

I won't champion this because it's unimportant to me. Linux's quality
problems are not my problems. I do what I can to help Linux, but I'm
not religious about operating systems and I know that good, free
operating systems will continue to thrive, even if Linux dies, just as
they did before Linux was born.

Ignore the problem, even shoot the messenger, if you like; or be adult,
consider the proposition dispassionately, and take steps from there.

I've said my bit, in fact more than I wanted to, so I choose to stop here.
--

To: David Newall <davidn@...>
Cc: Tarkan Erimer <tarkan@...>, David Miller <davem@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, April 30, 2008 - 10:41 am

In all the time you have spent here discussing nonsense (from my POV it is),
several bugfixes, regression fixes, new drivers, ... have been done or started.

Let's concentrate on what counts again, shall we?

my 2ct
marcel

--

To: David Newall <davidn@...>
Cc: Tarkan Erimer <tarkan@...>, David Miller <davem@...>, <linux-kernel@...>, Linus Torvalds <torvalds@...>
Date: Wednesday, April 30, 2008 - 9:38 am

BANG!

#include <chicken_little_headstone.h>

--

To: <davidn@...>
Cc: <linux-kernel@...>, <torvalds@...>
Date: Wednesday, April 30, 2008 - 12:18 am

From: David Newall <davidn@davidnewall.com>

Please don't use my posting as an opportunity to portray
BSD as the best thing since sliced bread.

We're having ONE bad merge window, we're facing the problem
head on, RIGHT NOW, to prevent it in the future. It's
not a severe ongoing issue as you portray it to be.
--

To: David Miller <davem@...>
Cc: <linux-kernel@...>, <torvalds@...>
Date: Wednesday, April 30, 2008 - 9:04 am

No. The problem is more than just a bad merge window. There is poor or
non-existent review; frequent "regressions"; release of kernels as
stable when they are not. There is resentment and resistance to even
acknowledging these problems. Take, as an example, the desire to NOT
record who gives good code and who gives bugs: that one clearly hit a
nerve, which it should not have except from people who feel guilty.

I don't claim BSD to be perfect, but it appears to have a consistently
good quality. Old Linux kernels also have that; new ones not so.
--

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 10:51 am

Lol. You should try VMS. Now *there* was a stable system.

Oh, but it didn't actually make any progress, did it?

The fact is, we're merging a lot. It comes from having a lot of
development. If you don't want that, then you're a fool - because you

Can you point to any actual stability problem?

The problem under discussion is the fact that some people are unhappy
because we had some merge trouble. The fact is, the problems got fixed in
a few days. And yes, we will probably have to make Ingo follow the
rules that pretty much everybody else also follows, and no, it's not going
to solve all problems either - the fundamental issue is that we are just
too damn good at development.

And that's not a big problem in my view, as long as we are also able
to handle the _result_ of that flood of patches. Which, quite frankly, we
are.

DavidN, you just have an agenda, and you think that mentioning BSD as some
kind of shining example of goodness is a good way to reach that agenda. It
isn't. It just shows that you don't understand the issue, and that you
think that "threatening" developers by saying you'll switch is a great way
to make PR.

But you know what? I really don't care one _whit_ what you do. You can
switch to Vista for all I care, and I really don't mind. All I care about
is doing a good job technically.

And you just show that you don't have a clue what you are talking about.
If you want stable kernel, don't follow the current -git tree. Don't mind
the fact that in two weeks we merge

6672 files changed, 373817 insertions(+), 285901 deletions(-)

and instead look at something like the enterprise kernels or other tree
that lags the development tree by half a year or more exactly _because_
they care about stable, not development.

In short: what do you think the git tree is? Is it something that should
prioritize good development, or is it something that should worry about
you making inane arguments? Ask yourself that.

Linus

...

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 2:21 pm

Well of course. So could you because they are a matter of public record
on the list. Don't pretend otherwise. Just to give you some recent,
personal bugaboos, and not even drawing on the many hundreds of relevant
messages on LKML each month:

1. Out of memory, caused by apparent leak somewhere, resulting in
machine effectively hanging for a minute or two (massive disk i/o)
culminating in termination of one or more processes. (For what it's
worth: 512MB, no swap.) Problem takes a couple of days to develop
(hence I suspect a leak.) This is running only Firefox, Thunderbird and
Evince, plus whatever xubuntu wants. Restarting the killed
application(s) causes the problem to recur. Restarting X doesn't help.
Killing almost all processes also doesn't help. Reboot is required.
This problem seems not to be in 2.6.17, but is in 2.6.22 (plus whatever
patches xubuntu use) and 2.6.23. I'm still testing 2.6.25, but probably
going to have to abandon it and go backwards, because...

2. Suspend to disk doesn't resume properly (two out of three times.)
System comes back but X has severe weirdness. Draws frames and title
bar, but not window contents. Text-mode is just as bad: Screen is blank
(erased font table, perhaps?) Subsequent suspend to disk doesn't resume
at all.

Note the wide range of kernels exhibiting problem 1. I don't even want
to think about problem 2 at this stage; I just want to stop having to
reboot to reclaim memory, especially when a mate who does Windows

Not so good. The process is flawed. Inadequate testing. Inadequate
review. This has been mentioned by others, so you know I'm not making
it up. The real fundamental issue is that people are too keen to

Yes, BSD does seem to be a shining example of goodness, but I didn't
mention it because I think people should switch. I did so to warn of
competition, to say that the world does not owe Linux a second chance
and isn't going to give it one. It's pointless to debate the relative
merits of the two systems...

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 2:27 pm

Ok, *PLONK*.

You're on an old kernel, don't know if your problem is fixed, and ask us
to slow down development.

That makes sense.

Go away.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: David Newall <davidn@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:06 pm

He did say that he was testing 2.6.25, and that suspend-to-disk was
broken in 2.6.25.

Chris
--

To: Chris Friesen <cfriesen@...>
Cc: David Newall <davidn@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:13 pm

Neither of which had anything to do with the whole "slow down" argument.

If you have a bug, make a bug report, and push it, and make people aware
of it. But don't make it an argument for development to slow down.

Should we all stand around with our thumbs up our *ss because somebody has
a bug? Should the other developers just stop, because suspend-to-disk is
broken for somebody? Should everything come to a standstill because David
Newall doesn't like how there are other things going on that are
independent of _his_ problems?

Do you really believe that?

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: Chris Friesen <cfriesen@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:22 pm

You're being a nasty piece of work this day, Linus, and you're fibbing
by mischaracterising what I said which, by the way, included, "it's not
the specifics of the problem I'm having that matters". You're taking
this far too personally. Get a grip.
--

To: David Newall <davidn@...>
Cc: Chris Friesen <cfriesen@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:42 pm

Umm. If you didn't want a personal opinion, why did you Cc me in the first
place then, and ask for my input?

I gave my input to you. I think your arguments are ludicrous, to the point
of being totally idiotic. You complain how I don't release kernels that
are stable, but without any suggestions on what the issue might be, apart
from apparently me merging too much and making too many releases.

But do you really expect me to stop merging, or hold up releases that fix
hundreds of issues, just because there are other issues pending? Do you
really think development can be stopped? Trust me, we've tried. Every
time, it just leads to worse problems when the floodgates are then opened.

And yes, there is a solution: don't develop so much. Don't allow thousands
of developers to be involved. Do a small core group, and make development
so hard or inconvenient that you only have a few tens of people who write
code, and vet them and force them to jump through hoops when adding new
features (or fixing old ones, for that matter).

And yes, that *does* result in a "stable" system. Never mind that it's
stable for all the wrong reasons, and generally doesn't actually work well
across a dynamic environment (whether the hardware base below or user
space above).

See? This is why I think your arguments are so silly and misguided.

But if you actually have real constructive ideas on things to actually
*do*, please do mention them. We've changed our models over time, several
times, exactly because we've searched for better ways to do thigns. But do
realize that

(a) we can't just stop, or even really slow down. We can only try to
regulate and to some degree direct the flood, not hold it up for any
particular issue.

(b) We do have process in place, and it may not be perfect, but I doubt
anything is, and what we do have actually has evolved over the years.

And that's not just my process (ie "two-week merge window, followed
by about 6-8 week...

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 2:55 pm

I just finished telling you that I'm currently trying 2.6.25. But you
couldn't have read that with any care at all, because I also just
finished telling you that it's not the specifics of the problem I'm
having that matters, it's the systemic problems in Linux's development
process.
--

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:08 pm

No. What you told us was nothing like that at all. What you told us was
that you totally ignored the issue I brought up, namely that development
happens, and that you have the choice of stagnating or accepting it.

You point to it as some "systemic problem", and I told you that it's a
sign of fast development. Things change. You didn't listen, or understand.

If you want systemic problems, it is your kind of "bug report" that isn't
anything like a bug report. Make a real report, don't whine. Push the
_report_, not your inane agenda. Talk about *technology*, not about how
you wish everything revolved around you and your wishes.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:16 pm

Don't be foolish, Linus. It was exactly like that, almost to the point
of quoting myself.
--

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Wednesday, April 30, 2008 - 3:25 pm

You misunderstand.

I object to your _idiotic_ claim that there are "systemic problems", where
your "solution" to them is apparently to stop making releases and stop
making forward progress.

That's why I said what you told us was nothing like that. What you told us were
your personal problems, no "systemic" issues.

Linus
--

To: Linus Torvalds <torvalds@...>
Cc: David Miller <davem@...>, <linux-kernel@...>
Date: Thursday, May 1, 2008 - 12:31 am

I did not say to stop making releases or forward progress. You
completely made that up! I said there are systemic problems, namely
inadequate testing and review. Slow down; don't snatch up crap changes.

You asked me to give a specific problem, so I did, but I also said that
the particulars of those problems weren't the point. You have ignored or
twisted everything I said. Did you ask me for a specific problem purely
to attack me with it? Perhaps you did.

You do release kernels that are unstable, and you call them "stable",
but I'm sure I said that inadequate review and testing are causes, which
I think counts as a suggestion on what the issue might be. It's been a
repeating theme in this thread, and I'm talking about what everybody
else is saying, not what I'm saying, so again, you know that I'm not
making this up.

Stop telling the world that 2.6.25 is ready for them when you know it's
not. It's now ready for beta testing, and no more. Is 2.6.24 ready for

You're being absurd, even hysterical. How about you require test plans
and test results? Is it possible to require serious, independent code
review?

And let me talk about code review. When one puts one's name to a
reviewed-by tag one takes joint responsibility for the result. There
needs to be some sort of balanced accounting. Presently it's all glory,
where the records show who has contributed code that made it to
mainline, but nobody counts who broke the system. There's no motive to
do a good job, in fact the opposite is true. The more crap you can sneak
in, the more glory you get.

Don't you go and twist this into some sort of, "David wants to point
fingers at people who regularly introduce bugs, which we don't want to
do" and ignore the problem. There is a problem; this entire thread is
testimony to that. You, Linus, are ultimately responsible for what goes
in so you have to acknowledge that there is a problem, you have to stop
shooting the messenger, and you have to shepherd a solution.
--

To: David Newall <davidn@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Thursday, May 1, 2008 - 11:28 am

On Thu, 2008-05-01 at 14:01 +0930, David Newall wrote:

This is kind of bullshit. You can never be sure that something works
perfectly for everyone. If there were testing so extensive that
you would be willing to make such a bold claim, any "stable" kernel
would be years in testing. Linux stability also seems to be okay, and
people who want to lower the risk of problems can simply choose to use
slightly older versions.

What I find more of a problem is the long-term effects of
changes.

For instance, Linux has slowly and steadily been getting a lot more
sensitive to IO, and A LOT more memory hungry.

I recently found a system with a 2.6.4 kernel, and when I upgraded to
2.6.23, I saw memory usage increase from ~250 MB to around 500 MB. I
upgraded to .25 to see if it was some weird bug, but it is the same.

Unfortunately I cannot investigate more, as I only had the box for a
very short time, but this is a lot more concerning to me.

Unfortunately I don't think I can easily reproduce this as I am unsure

Well... it's doing quite a nice job on my new workstation :)

<snip>

--

To: David Newall <davidn@...>
Cc: Linus Torvalds <torvalds@...>, David Miller <davem@...>, <linux-kernel@...>
Date: Thursday, May 1, 2008 - 9:49 am

On Thu, May 01, 2008 at 02:01:43PM +0930, David Newall wrote:

If a kernel release works without problems on 9999 out of 10000
machines, is it stable? How few specific combinations of hardware are
there allowed to be with any problems before you can call it stable?
How do you know a problem you see wasn't tested by 500 people none of
whom had any problems because none of them had the hardware you do?
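
To put rough numbers on that last question, here is a small sketch. It
assumes (a simplification of mine) that each tester's machine is an
independent random draw from the whole population and that the bug affects a
fixed fraction of machines; the function name is hypothetical:

```python
def miss_probability(affected_fraction: float, testers: int) -> float:
    """Chance that none of `testers` independent machines is affected by the bug."""
    return (1.0 - affected_fraction) ** testers


if __name__ == "__main__":
    # A bug hitting 1 machine in 10000, tested by 500 people.
    p = miss_probability(1 / 10000, 500)
    print(f"chance that no tester sees the bug: {p:.3f}")
```

Under those assumptions the 500 testers all miss the bug about 95% of the
time, so a clean test run says little about that one machine in 10000.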

--
Len Sorensen
--

To: <davidn@...>
Cc: <torvalds@...>, <linux-kernel@...>
Date: Thursday, May 1, 2008 - 12:37 am

From: David Newall <davidn@davidnewall.com>

This has an absurd presumption that something is only stable when
there are zero problems with it.

Fault free software, except in extremely trivial examples, does not
exist in nature.

BTW, this points out another BS aspect of your BSD fan-boy crap,
the BSD userbase is only a tiny fraction of how many people use
Linux. So you can't even compare the number of outstanding problem
reports between the two.
--

To: David Newall <davidn@...>
Cc: David Miller <davem@...>, <linux-kernel@...>, <torvalds@...>
Date: Wednesday, April 30, 2008 - 9:18 am

Speaking as someone who has found quite a few kernel bugs, but written
few (because I've written little kernel code ;-))...

No. It hit a nerve because it's simply the wrong way of going about
things. There is no use in assigning blame.

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
