Re: [GIT]: Networking

Previous thread: [PATCH 1/6] sched: rt-bandwidth for user grouping interface by Peter Zijlstra on Tuesday, August 19, 2008 - 3:33 am. (1 message)

Next thread: ppm conference by Steven Michaels on Tuesday, August 19, 2008 - 4:43 am. (1 message)
From: David Miller
Date: Tuesday, August 19, 2008 - 4:17 am

We're still chipping away at the packet scheduler layer locking
issues, but I feel that this is mostly sorted at this point.

Other highlights:

1) Fix for NAT via loopback per regression with GSO by Herbert
   Xu.

2) Merge in wired driver fixes via Jeff Garzik.

3) Bluetooth updates via Marcel Holtmann.

4) Wireless driver updates via John Linville.

5) Fix to namespace handling in ipv6 from Brian Haley.

6) DCCP panic fix from Gerrit Renker.

7) Packet scheduler qdisc return value handling fix which can
   cause TCP crashes.

8) Netfilter bug fixes from Patrick McHardy and co.

Please pull, thanks a lot!

The following changes since commit a7f5aaf36ded825477c4d7167cc6eb1bcdc63191:
  Linus Torvalds (1):
        Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/.../tip/linux-2.6-tip

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Adrian Bunk (2):
      [netdrvr] uninline atl1e_setup_mac_ctrl()
      ath9k: work around gcc ICEs (again)

Anders Grafström (1):
      netfilter: ipt_addrtype: Fix matching of inverted destination address type

Atsushi Nemoto (1):
      [netdrvr] ne: Use CONFIG_MACH_TX49XX

Ben Dooks (1):
      AX88796: Fix locking in ethtool support

Brian Haley (1):
      netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr()

Brice Goglin (1):
      myri10ge: myri10ge_fw_name also overrides the rss firmware

Bruce Allan (7):
      e1000e: Return 1 instead of a non-zero value for link up indication
      e1000e: Set InterruptThrottleRate to default when invalid value used
      e1000e: Use skb_copy_to_linear_data_offset introduced in 2.6.22
      e1000e: Increase Tx timeout factor for 10Mbps
      e1000e: increase minimum frame size allowed
      e1000e: test for unusable MSI support
      e1000e: remove unnecessary snippet missed in prior check_options update

Christian Lamparter (3):
      p54: Fix regression due to "net: Delete ...
From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 10:03 am

David, this absolutely _has_ to stop.

We're after -rc3. Your network merges continue to be too f*cking large, 
and this has been going on for many months now. If you cannot throttle 
people, I will have to throttle you and stop pulling things.

I'm going to take this, but really - this isn't just new drivers or 
something like that that you've used as an excuse for big pulls before, 
this is a _lot_ of changes to existing code.

Tell your people to look at the regression list, and if it's not there, 
they should stop.

I realize that this problem is partly because when I see the pull requests 
from you, I effectively see a combined pull from multiple different 
sources, and in that sense it's not quite as big. But the networking pulls 
have _consistently_ had the problem that they keep on being big not just 
after -rc3, but after -rc4 and on, and I get the distinct feeling that 
you're not moving the pain downwards, and aren't telling the people under 
you to keep it clean and minimal and regressions only.

For example, those BT updates looked in no way like regression fixes. So 
what the f*ck were they doing there? And why do you think all those driver 
updates cannot cause new regressions? 

If it's not a regression fix, it shouldn't be there. It should be in the 
queue for the next version. Why is that apparently so hard for the network 
people?

		Linus
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 11:27 am

the Bluetooth fixes do fix one regression that broke user space
assumptions.

I included additional support for one new driver since I was under the
assumption that new driver support is fine since it can't introduce a
regression. If that has changed then please spell this out and we have
to apply this rule to all subsystems.

Also I cleaned up the MAINTAINERS file entries for Bluetooth. Are these
considered harmful now and should be postponed to the next merge window?
They can obviously not introduce any regressions?

Regards

Marcel


--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 1:54 pm

What I consider harmful is not any individual commit per se, but the 
mindset that clearly says "hey, this particular commit is good, let's 
push it up".

And all of the commits are _individually_ fine and the likelihood for 
breakage is probably damn low, but when you have lots of them, that 
doesn't work any more. 

The whole point of the merge window is that you should be sending good, 
tested commits _then_. And if you miss the merge window, then you queue 
them up for the next one.

As it is, it seems like some people think that the merge window is when 
you send any random crap that hasn't even been tested, and then after the 
merge window you send the stuff that looks "obviously good".

How about raising your quality control a bit, so that I don't have to 
berate you? Send the _obviously good_ stuff during the merge window, and 
don't send the "random crap" AT ALL. And then, during the -rc series, you 
don't do any "obviously good" stuff at all, but you do the "absolutely 
required" stuff. 

The rule should be that if you have any doubt _what-so-ever_ that 
something is absolutely required, you simply don't send it during the -rc 
phase. And if you have any doubt at all about something not working, you 
don't send it during the merge window either!

The merge window is not for "let's get this tested, so that we can fix it 
during the -rc". And the stabilization phase is not for "this one looks 
obviously correct and safe".

		Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:21 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

We are perpetuating this mind set, aren't we?  I could be wrong
but this is how I see things currently.

I agree, we should be working on regressions fixes now.  And we should
essentially be doing so up until the merge window opens up again,
right?

When do people following those rules have time to work on new stuff?
Especially people like me who have to review and merge everyone else's
work as well as help fix bugs.

And not just subsystem maintainers like me, it's also the same for
people who are experienced, dilligent, and work on fixing bugs.
That kind of work is very time consuming.

So given that, who spends a decent amount of time working on features?
People who aren't dilligent working on bugs before the merge window,
and new developers, that's who.

linux-next is great, I love it, it solves all the merge hassles that
used to knock us out during the merge window and make life hell.

But it doesn't fix the time delegation problem.

There is always this "oh crap, I just spent 3 months doing nothing
but fixing bugs" feeling a lot of us core folks get right before
the merge window opens up.

So instead of getting the best work from the best people we have,
we get this last minute flurry of development in the days leading
up to the merge window openning up.

Maybe it's just a longing for the golden era of 2.${ODD}.x style
development, who knows :-)
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:31 pm

Yes. But any new code should go into another branch (or delayed entirely, 
but that probably doesn't work wekk for you guys) so that by the time the 
merge window opens up, it's already ready and rearing to go, and 
preferably pretty well tested too.

The problem is, you guys end up accepting a lot of stuff even after the 
merge window. I know why - it's easy to do. It looks obviously fine. And 
yeah, I let things slide.

The problem is, I've let things slide for a long time, and you guys don't 

You can work on the new stuff too, but DON'T F*CKING SEND IT TO ME!

What's so hard to understand about that?

			Linus
--

From: Josh Boyer
Date: Tuesday, August 19, 2008 - 7:25 pm

So, not to add fuel to a fire that seems to be calming down, but what is
so wrong with having that feeling?  If the core people are spending 3
months doing nothing but fixing bugs, the I consider that to _be_ the
best work from the best people.

Bugs happen and it's not really worth debating over how they got into
the tree to begin with because it happens in a number of different ways.
However, if it takes 3 months of bug fixing to get a tree in shape then
I don't see how that's a problem at all.  Personally, I'd rather take a
relatively bug free tree over new shiny features on top of a buggy as
hell tree.

Call me old fashioned.

josh

--

From: David Miller
Date: Tuesday, August 19, 2008 - 7:51 pm

From: Josh Boyer <jwboyer@gmail.com>

All work and no play makes Dave a dull boy, that's the
problem. :-)

I don't care how much someone claims they enjoy bug fixing
and gathering up other people's patches, you will go out
of your mind or become bored to death if you don't get to
spend real time implementing something significant from
time to time.
--

From: Willy Tarreau
Date: Tuesday, August 19, 2008 - 9:20 pm

That's true and I would also add that it's very common for bugs to
be discovered and fixed while implementing new features. However,
it's so convenient to manage several branches with git that it should
not be a problem to "play" in one branch and push all the stuff during
the merge window only. One of the problems with networking is that you
need a lot of testers. I don't think it's too hard for them to pull
from your development tree. And if it is, maybe you can incite them
from time to time by releasing snapshots as plain patches.

BTW, it also helps testers a lot to be able to play with topic trees
provided as patches against last release, because they generally can
apply them to stable kernels without the fear of losing their data.
I'm sure that many people already run stable kernels with not-yet-merged
patches on top of them and are happy that way.

Regards,
Willy

--

From: Grant Coady
Date: Tuesday, August 19, 2008 - 11:35 pm

On Wed, 20 Aug 2008 06:20:29 +0200, Willy Tarreau <w@1wt.eu> wrote:


Count me as a patch tester -- trying to learn git but I break the thing 
too often at the moment :(  

Grant.
--

From: John W. Linville
Date: Wednesday, August 20, 2008 - 7:35 am

Are you talking to me??? :-)

John
-- 
John W. Linville
linville@tuxdriver.com
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 8:42 pm

again, my current understanding was that updates to the documentation
that would help people to navigate and understand the kernel better and
make it easier for them to do bug reports etc. are always welcome and
should be pushed immediately. Same goes for new drivers that would
enable people to use their hardware.

If you don't want these patches during the -rc phase, then this is fine
by me. It is no extra work for me to queue these up until the next merge
window. Actually GIT makes merges for me so simple that I couldn't care
less. I was mislead that you want these kind of fixes to go in quickly
and I apologize for the trouble. I will stop bothering Dave with these

I get your point! And I was never using the merge window for "random
crap". All my stuff is heavily tested and even on non-x86 systems.

So why does it happen that I touched the MAINTAINERS file outside the
merge window? Simply because I ran into it looking what it says and then
fixed it. And when I had the regression fix for Dave to pull, I picked
the other two patches that couldn't introduce any regression and send it
with it. You don't want these. I get it and from now on they will stay
in my queue until the next merge window.

Regards

Marcel


--

From: John W. Linville
Date: Wednesday, August 20, 2008 - 7:37 am

This is my (current) understanding as well.  If that is not the case,
then someone should clarify.

John
-- 
John W. Linville
linville@tuxdriver.com
--

From: Linus Torvalds
Date: Wednesday, August 20, 2008 - 8:30 am

Guys, which part of "it wasn't any individual commit" didn' you 
understand?

Why are you concentrating on one documentation commit that I didn't even 
point to?

But that said - no, I don't think there is any reason to even push 
documentation commits, unless there is a real and pressing reason (ie the 
documentation is really important or will really matter from a future 
merge standpoint). I generally won't complain about them, but I also don't 
see the point.

I'd _much_ rather see you guys queue it up for future merges. 

		Linus
--

From: Marcel Holtmann
Date: Wednesday, August 20, 2008 - 8:40 am

John was just pointing out (like myself before) that a lot of people are
under the impression that documentation updates and new drivers should
not be queued up and merged as soon as possible.

I am really fine either way. Queuing them up is not a problem and you
made your point that you don't wanna see them after -rc1. That is fair
enough, but it really needs to be spelled our here since the overall
consensus is different. I will not send any of these do Dave anymore
after the merge window.

Regards

Marcel


--

From: Linus Torvalds
Date: Wednesday, August 20, 2008 - 9:09 am

I think (and hey, I'm flexible, and we can discuss this) that the rules 
should be:

 - by default, the answer should always be "don't push anything after the 
   merge window unless it fixes a regression or a nasty bug".

   Here "nasty bug" is something that is a problem in practice, and not 
   something theoretical that people haven't really reported.

 - but as a special case, we relax that for totally new drivers (and that 
   includes things like just adding a new PCI or USB ID's to old drivers), 
   because (a) it can't really regress and (b) support for a specific 
   piece of hardware can often be critical.

With regard to that second case, I'd like to note that obviously even a 
totally new driver _can_ regress, in the sense that it can cause build 
errors, or problems that simply wouldn't have happened without that 
driver. So the "cannot regress" obviously isn't strictly true, but I 
think everybody understands what I really mean.

It should also be noted that the "new driver" exception should only be an 
issue for things that _matter_.

For example, a machine without networking support (or without suppoort for 
a some other really core driver that provides basic functionality) is 
practically useless. But a machine without support for some particular 
webcam or support for some special keys on a particular keyboard? That 
really doesn't matter, and might as well wait for the next release.

So the "merge drivers early" is for drivers that reasonably _matter_ in 
the sense that it allows people to test Linux AT ALL on the platform. It 
shouldn't be "any possible random driver".

IOW, think about the drivers a bit like a distro would think about 
backporting drivers to a stable kernel. Which ones are really needed? 

Also, note that "new driver" really should be that. If it's an older 
driver, and you need to touch _any_ old code to add a new PCI ID or 
something, the whole argument about it not breaking falls away. Don't do 
it. I think, for example, ...
From: Jiri Kosina
Date: Wednesday, August 20, 2008 - 9:46 am

It in fact depends on your definition of regression really :)

If we merge a buggy driver that hangs the user's machine when loaded, well 
... before the driver has been merged, the machine had been booting well, 
just some hardware was not functioning at all. After this late driver 
merge, the driver gets autoloaded upon boot and crashes the machine. Users 
will probably see this as a regression.

This doesn't mean that I am against merging new drivers as aggressively as 
possible, I just wanted to point out that it might bring actual 
regressions to users.

-- 
Jiri Kosina
SUSE Labs

--

From: Marcel Holtmann
Date: Wednesday, August 20, 2008 - 10:33 am

I am perfectly fine with these rules. You only had to spell them out :)

Regards

Marcel


--

From: Paolo Ciarrocchi
Date: Wednesday, August 20, 2008 - 11:50 am

I wonder whether if it would be a good idea to periodically send out an email
with the basic rules to be followed in each phase of the project.


regards,
-- 
Paolo
http://paolo.ciarrocchi.googlepages.com/
--

From: David Miller
Date: Tuesday, August 19, 2008 - 1:14 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

The BT bits were the only part I really considered borderline,
and I was going to push back on Marcel.

But to be honest, I haven't seen bluetooth updates from him
for such a long time I felt that being strict here would just
exacerbate the problem.

Guess I was wrong.
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 1:47 pm

I really don't see the e1000 and netxen updates as being critical either. 
Sure, they look like driver improvement, but "improvement" is not what the 
-rc3+ series is about. 

Same goes for all the loopback changes. They look like cleanups or feature 
enables.

IOW, it all looks like good commits, but quite a _lot_ of that queue looks 
like good commits that should happen during the merge window, not during 
the stabilization phase.

And this is by no means unique to _this_ pull request. It's been a very 
clear pattern for a long time now. The networking area tends to be one of 
the absolutely *most* active ones during the post-rc1 phase.

[ Yeah, in all fairness some architectures also do that, but at least I 
  feel like I _really_ don't need to care when I get a diffstat that only 

I pointed out the BT ones as standing out (they were larger than some of 
the other patches too), but I really don't think this was in any way 
limited to BT in any shape, form or color. Quite frankly, looking through 
the thing, my gut feel is that about _half_ the commits over-all should 
probably have been in the queue for the next release.

		Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:04 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

Those fix a performance regression reported by a real user.

Sure I did a cleanup of dead code in one of those commits,
but if you look at the commit beforehand from Herbert, the
context, you can see that it made no sense to leave that
in there any longer as half of what it was standing there
as "documenting" was removed by Herbert's commit.
--

From: Rafael J. Wysocki
Date: Tuesday, August 19, 2008 - 2:15 pm

FWIW, they fix the recent regression tracked as
http://bugzilla.kernel.org/show_bug.cgi?id=11316 .

Thanks,
Rafael
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:28 pm

Yeah, and the real cause was apparently another commit that *ALSO* 
happened after the merge window!

Guys, you're making excuses for the problem.

The problem that triggered this bugus loopback change was commit 
e5a4a72d4f88f4389e9340d383ca67031d1b8536. Look at when that one was done. 

This is my whole _point_. The networking layer is doing development during 
the -rc window. And you guys are making excuses for it. Wake up, guys!

			Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:35 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

That change was made under the pretext that it was tested heavily and
that if we hit any problem whatsoever with it that we couldn't fix
quickly it would be reverted.

If you look at the pull request I sent you that contained that change,
I pointed this change out as "highlight" and explained the situation,
in detail.  Here it is:

	4) Lennert Buytenhek did some really nice analysis on a network
	   device that cannot do TSO offloading in hardware.  He checked
	   out what happens if you enable the software TSO mechanism fallback
	   we have in the kernel, and it improves CPU utilization tremendously.

	   It is safe to do this as long as the device in question can
	   support scatter-gather.

	   Herbert and I are discussing a way to do this even more efficiently
	   with some help from the device (currently the code has to allocate
	   extra sk_buff objects as we split up the TSO frame, and then do
	   a bunch of extra page ref counting, when all we need is some headers
	   and some way to say where the data portion split points are).

	   If this causes any problems whatsoever, it's trivial to revert this.

Did you read it?  I write those for you specifically, so that you know
what changes in there are "of note" and you may want to be aware of.

But anyways, let's chalk this one up as inappropriate.

Looking through the rest of the networking changes in this
pull I see real bug fixes in all of the netxen and e1000
changes that seemed to stand out.  All of the wireless stuff
looks like real bug fixes for things reported by real users.

And then there are 2 or 3 cleanups that probably could have
waited.

And then there is the Bluetooth SCO change which I agree was
borderline and I should have pushed back on.

There are simply a lot of people fixing a lot of bugs.  And I have to
stay on top of it all.  And I also have to be able to trust John
Linville, Jeff Garzik, Marcel, and others so that I don't have to ...
From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:46 pm

David, I will say this one more time:

 - as long as you concentrate on individual commits, you're missing the 
   big picture.

you can _always_ make excuses for individual commits. That's not my point. 
Or rather, it actually verry much _is_ my point. If you have the mindset 
that you're looking for excuses why any individual commit is ok to merge, 
then you don't end up with a coupld of individual commits any more: you 
end up with a LOT OF CHURN.

It's not the individual commits. You're looking at the individual trees, 
and you're missing the forest. The problem isn't the individual trees. The 
problem is that there's a metric sh*tload of individual trees, what we in 
the tree industry call a 'forest'. You're not seeing it.

And btw, don't get me wrong - you're not the only problem spot. During the 
-rc's leading up to 2.6.26, drivers/merdia was actually a _bigger_ 
problem. I happen to care less about that (the same way I care less about 
some odd-ball architectures), but I have to admit that drivers/media was a 
total disaster last time around.

So if it makes you feel any better, others have been even worse.  But this 
networking problem ha been going on for quite a while.

So the problem here really is that you seem overly eager to make excuses 
for individual patches. And if they _stayed_ "individual" it would all be 
good. But they don't seem to.

			Linus
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 10:22 pm

to be quite honest, don't you think this is a little bit unfair. So if
it is drivers/media/ or arch/sh/ or whatever you don't care. Do you
actually care what I do in drivers/bluetooth/?

I think that networking and USB are the two most biggest trees of the
kernel and the number of changes they produce are big. And most things
are drivers. I do think that we are doing pretty well in holding back
architectural changes of a subsystem and testing them properly in -mm or
-next before merging them.

The actual drivers make a big portion of Linux and we need them and we
want them. However drivers are the biggest problem for most subsystem
maintainers since they don't own all hardware. And just face it. Most
people that have certain hardware bits are not going to run -next or -mm
kernels. If we get them to test -rc kernels we are lucky.

So drivers/net/ alone is 41M in size. That is a quarter of all drivers/
directory and to be fair the others also contain the actual subsystem in
there while networking maintains it outside that directory.

Just look at your EeePC. Booting a 2.6.26 kernel on it and you have no
Ethernet and no WiFi drivers for the built-in hardware.

I personally think that we need to be conservative for the actual
subsystem changes and a little bit more open for driver changes.
Especially when it comes to new drivers (not a tg3 or e100 driver) since
we wanna ease the entry level to get Linux running. When it comes to
networking, there are just a lot of drivers. Plain simple as that.

And when it comes to architectural or subsystem changes, I think that
the merge window rule is followed quite literal. With minor things
during -rc1 because of merge conflicts, but eventually the -next tree
will solve all of these.

So what about the drivers? Should drivers for new hardware go in? Even
if the maintainers don't think they are stable enough? The current
approach is that even an almost stable driver is better than no driver.
If this no longer applies, then please spell ...
From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 9:20 pm

so this is the statement, I sent Dave to explain why that change was in
there:

---
For the btusb driver this adds the promised SCO support. The btusb
driver is a new driver and will eventually replace hci_usb. Adding SCO
support was the last missing piece. All distributions are using the
hci_usb driver at the moment and you can only select one of them. So
this can't introduce any regression. With this change the distributions
are now able to select the new driver if they really want to.
---

Was this absolutely needed after -rc3. Of course not. No questions asked
about it. So why did it ended up in there?

Almost everybody is using the hci_usb driver and that one has issues
that are beyond fixable. So the btusb is its replacement and with this
change it became a real alternate solution. For me this is a new driver
that would allow people to use it in case hci_usb gives them a hard time
and falls over again. And fixing hci_usb is not an option. A lot of
people tried it and they failed. I think the last one was Pavel a month
ago. This is why I re-wrote the whole beast from scratch.

So that is my excuse why I thought this would be good choice to push it
to Dave. No more excuses and no new drivers after the merge window. At
least not from me.

Regards

Marcel


--

From: david
Date: Tuesday, August 19, 2008 - 9:47 pm

one of thr goals of the new release approach was to make releases 
frequently enough that it's not a big deal to miss a merge window, you 
only have to wait a couple of months (rather then a couple of years under 
the old model).

while I don't see a bit problem with drivers going in for previously 
unsupported hardware (at least since I custom compile my kernels with all 
unnessasary drivers disabled, so I wouldn't even try to compile them ;-) 
it doesn't hurt much, either as a user, or for you as a developer (as you 
note above) to go ahead and delay till the next merge window.

the benifits of delaying are that the changes in the -rc cycle are clearer 
and smaller. this should make the progress towards the release more 
obvious, and avoid distractions like the one that started this thread. 
yes, this will make the -rc1/-rc2 even bigger as there is more stuff going 
in, but it looks like that is being handled well (in part thanks to the 
preview that -next is providing)

so as a user/tester I want to thank you for being so willing to delay new 
stuff for the next merge window, may others learn to follow your example.

David Lang
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 10:38 pm

the downside is that users wanna use this hardware have to wait for the
next kernel release. The -next and -mm trees are simply not for
everybody. Even some of the -rc kernels are a pain if you happen to use
a non-x86 system. The kernel developers can fix them easily or know who
to ask for a fix. So decision to include certain driver updates or new
or an individual driver, I can go for many kernel releases without
running into major merge conflicts.

We do have the fast moving targets like wireless. And it is not always
the developers fault. The hardware manufactures are putting out new
chips so fast nowadays that keeping up with the drivers is a hard job.
Also laptop/desktop manufactures are a lot quicker in integrating these
chips and bringing them to market.

I made the EeePC 901 example. A 2.6.26 kernel has no support for the
Ethernet card in it. This happened that last time with a 2.2 kernel
where I bought an Ethernet card that was not supported.

So when it comes to new driver support, it is a judgment call. Some
times we make the wrong one.

Regards

Marcel


--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:21 pm

Since when?

The thing is, I can do a 

	gitk v2.6.24.. drivers/net/loopback.c

as well as anybody else, and TSO has not been enabled for loopback at 
least since 2.6.24. Going back to 2.6.23 (which has more changes that I 
won't comment on), it looks like that LOOPBACK_TSO thing you removed was 
there back then too.

So the performance regression if it happened must have been due to 
something else, no?

Oh, I'm sure that enabling TSO speeds things up, but apparently it also 
basically enables a code-path that hasn't been enabled since at least 
2.6.23, no?

Really, David. Was the performance regression due to something else, and 
then by enabling LOOPBACK_TSO it hid the problem? Or what? The thing is, 
-rc3 is _not_ the point to apparently change something that hasn't been 
changed in about a year (I didn't go any further back in history). 

So what's going on? Do you seriously think it's a good point in time to 
enable TSO for loopback after a long time of apparently _not_ being 
enabled? 

It smells like excuses to me. Was this really a "must be in 2.6.27" thing?

And no, it wouldn't bother me if this was a rare thing. Again, let me 
repeat: the problem is not any of the individual commits _per_se_. The 
problem is that the network layer stands out. And not in a good way. It 
stands out as being a layer that gets a _lot_ of churn late in the -rc 
game.

			Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:27 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

Check the regression list entry you were pointed to in another
reply.

But I'll save you some time and I'll explain the problem for you.

We enabled GSO segmentation offload, which is a software variant of
TSO we've had in the tree for ages, when a card can do scatter-gather
and checksumming offloading in HW.  We do this because Lennert
Buytenhek validated with many tests that this consistently decreases
cpu utilization.

However, a user reported that if they NAT'd a remote destination port
using netfilter to a loopback addr:port, then there was a performance
degradation.

Herbert discovered the cause, which was multi-fold.  And smashing the
SKB checksum and not indicating TSO capability in loopback was the end
cause.

Loopback should enable TSO for other reasons, not just to fix this
bug.  If loopback says it can do TSO then the TSO packet gets passed
straight through to the receive side, and our entire stack has been
able to handle that for years.
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:33 pm

yeah. And look at when that happened.

Dammit, all I ask is that you 

 - admit that you have a problem
 - work on fixing it.

Stop the incessant excuses already.

		Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:40 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

I happily consider this example as inappropriate, sure.

But I think you're throwing the baby out with the bath water, the
majority of that pull contained legitimate real bug regression fixes.
That's why some other developers are coming out of the woods and
defending me, they don't have to do that, but they do it because they
feel I'm being slighted at least a little bit.

I don't know what to say, because I spent most of my sunny weekend
working on bug fixes as well as integrating other people's work.
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 2:50 pm

.. and notice how I 

 (a) took it

and

 (b) am asking for you to be more careful?

In other words, I would be a lot happier if you didn't say "majority". I 
would be a ton happioer if you could HONESTLY say that every single one 
was a regression.

And the thing is, you cannot. Some of the ones I pointed you to were 
actually regressions due to _other_ patches you had much too happily sent 

Umm. The only defending I have seen was a F*CKING DISGRACE, since nobody 
apparently had the balls to stand up and admit that the whole problem 
happened after -rc1 in the first place!

In other words, the "defense" was just making excuses for EXACTLY the 
behaviour I'm trying to tell you shouldn't have happened in the first 
place.

Please. You're still making excuses for this, even after I pointed out 
that ALL of the problems with the whole loopback driver thing happened 
after the merge window. 

			Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 2:56 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

Fair enough.
--

From: Evgeniy Polyakov
Date: Tuesday, August 19, 2008 - 3:26 pm

I belive it was you who told that there is no black and white (another
guy told that there is no spoon, I frequntly confuse).

Any changes made no matter when can not be 100% tested in laboratory
environment, even fixes, which look obviously. Even changes which do fix
some problems can introduce another. And some fixes can introduce
problems, which are not immediately shown in majority of the tests, so
changes on top of them can look like introducing new bugs. If you have
multiple changes and result which produce an error, it does not mean
that the last one was wrong. Of course it can, but 'there is no black
and white', and in really complex system only trivial changes can be
thought of not touching others. The same applies to loopback. So this
particular note is just about the fact, that fixing regression means
either reverting a change or introducing a new change. The latter is
preferred (or at least should be), since it is a move forward.

According to other changes, which you believe are not suitable for the
post merge window releases... People do know that major changes are not
allowed to be made, but there are always last strikes in the head which
are supposed to fix problems, not to introduce new ones. Where did you
see experimental code in -rc cycle? Where did you see patches which
break things without bringing improvement at that time? Yes, there
changes which are not supposed to be in the tree at that time, but
that's just a development process, which even in a short run makes good
result. Regressions and bugs are fixed, and things are not getting worse
with time.

-- 
	Evgeniy Polyakov
--

From: Linus Torvalds
Date: Tuesday, August 19, 2008 - 3:40 pm

100% agreed.

Please note that I'm not against these things slipping in occasionally. 
The reason I brought this up in the first place really wasn't the loopback 
driver issue at all. The reason I brought it up was simply the fact that 
when I compare the size and frequency of changes, the networking pulls 
tend to be the worst of the lot of the "core" kernel changes.

I say "core" kernel changes, because things are usually worse for the 
outliers. As mentioned, networking is actually one of the _better_ guys if 
you start comparing to the DVB people, or to some of the architectures 
that often slip the merge window _entirely_, and *all* their changes come 
in during -rc2 or something.

So it's not that networking is especially bad on an absolute scale in this 
regard. And it's not like it doesn't happen all the time for everybody 
else too. But I think networking has ben a bit more cavalier about things 
than many other core areas.

So no, I'm not asking for black-and-white absolutes here. But I'm asking 
for a "tightening of the belts". Please don't let it all hang out, ok?

			Linus
--

From: David Miller
Date: Tuesday, August 19, 2008 - 3:52 pm

From: Linus Torvalds <torvalds@linux-foundation.org>

I'm in agreement :)
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 11:15 pm

I was always under the impression that Dave was quite strict when it
comes to merging things after the merge window. Yes, some things should
have better waiting for the next release, but we always had exceptions
for various things that fall out of the merge window anyway.

All the small "soldiers" like me make a call on what to pick to send to
Dave and it is not that we try to sneak anything in. It just made sense
to us and either he agrees or not. Some choices are better than others,
no questions asked, but I think what you are seeing is that networking
is getting huge. And this is mostly networking drivers. And I don't
expect this to slow down or anything. The Linux WiFi support is just at
the edge to really become competitive and show real leading across all
other operating systems.

When looking at the actual split between net/ and drivers/, then the
drivers part is the big one. And I don't expect this to go down any time
soon. We are about to get Ultra-Wideband support merged. After that we
will have plain networking over UWB, Wireless USB using UWB and soon
Bluetooth over UWB. Also we will see Bluetooth over 802.11 and then
another Ultra-Low-Power thing for sensor devices. And I forget WiMAX.
That is just the short term wireless future. Every of these come with
new networking drivers and then you have new Ethernet drivers and so on.
It is just a lot of stuff.

While looking at the actual diffstat, I realized that I really was the
biggest offender:

net/bluetooth/hci_sysfs.c                   |  376 ++++++++++++++-------------

This is actually the regression fix. It is big, because thanks to sysfs,
I had to move some code around in that file and rename things. I should
have seen that earlier and gave Dave an extra comment why it is so big.
That was my bad. Sorry for that.

Regards

Marcel


--

From: Denys Fedoryshchenko
Date: Tuesday, August 19, 2008 - 2:39 pm

Well i report about performance regression before too, seems related case,
but my report was unclear.

And the bug was terrible for me, it was causing very bad performance on 
REDIRECT and loopback transfers. I am testing also recent net-2.6 with this 
loopback changes, it seems improve things for me much. So it is really 
important bugfix for me too.

There is always chances that some fix will cause regression, i will try to 
test this changes on intensive real-life workloads to make sure all fine.
--

From: Evgeniy Polyakov
Date: Tuesday, August 19, 2008 - 2:15 pm

Netxen driver update contains bug fixes (leak and races) and hardware
workaround. Well, it has driver version bump either, I agress, that
one was an error. E1000 contains number of regression fixes and

It fixes performance regression.

-- 
	Evgeniy Polyakov
--

From: Marcel Holtmann
Date: Tuesday, August 19, 2008 - 8:43 pm

as I explained to Linus, my current assumption was that documentation
updates and new driver stuff should go in quickly. You will not get any
of these from me anymore. Next time you only get the one regression fix
and all my queued up stuff in the next merge window.

Don't hold back if you think that a patch is not acceptable. Really, I
am using GIT for everything. Merging is no problem for me. I also do
backports in my -mh patch for the latest stable kernel. So there is no
extra work for me. It is just sending you a new email with another tree
to pull from :)

Regards

Marcel


--

Previous thread: [PATCH 1/6] sched: rt-bandwidth for user grouping interface by Peter Zijlstra on Tuesday, August 19, 2008 - 3:33 am. (1 message)

Next thread: ppm conference by Steven Michaels on Tuesday, August 19, 2008 - 4:43 am. (1 message)