Linux: Enforcing the Merge Window

Submitted by Jeremy
on August 8, 2007 - 4:37pm

Following a recent merge request, Linus Torvalds stressed that he was serious about not wanting to merge any big changes after the merge window closes: "Get the changes in before -rc1, or just *wait*. If they aren't ready before the merge window opens, they simply shouldn't be merged at all." Jeff Garzik reiterated, "once -rc1 is out there, that means the focus should be on stabilizing the existing codebase. Pushing a big driver update means that effort must restart from scratch. We just don't want to go down that road, which a big reason for the merge window in general." Further, when it was noted that the recent changes had been heavily tested by the vendor, Jeff stressed the importance of community testing:

"Take a lesson from when I was on Linus's shit-list... twice: Twice, Intel submitted an e1000 update after the merge window closed. Twice, they claimed the driver passed their quite-exhaustive internal testing. And twice, the most popular network driver broke for large masses of users because I took a hardware vendor's word on testing rather than rely on the testing PROVEN to flush out problems: public linux kernel testing.

"I'm not singling out Intel, there are plenty of other hardware vendors that repeat the exact same pattern."


From:	James Bottomley [email blocked]
Subject: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date:	Sat, 04 Aug 2007 12:31:43 -0500

This is mainly bug fixes ... there's one or two features completions
that have been delayed pending ack and review to do with bsg (headers
and passthrough) but these are really required to complete already
upstream code.

The patch is available here:

master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git

The short changelog is:

Boaz Harrosh (6):
      aha152x: use data accessors and !use_sg cleanup
      aha152x: Fix check_condition code-path
      aha152x: Clean Reset path
      aha152x: preliminary fixes and some comments
      aha152x: use bounce buffer
      aha152x: fix debug mode symbol conflict

David Miller (1):
      ESP: Revert ESP_BUS_TIMEOUT back down to 250

FUJITA Tomonori (6):
      initialize shost_data to zero
      mptsas: add SMP passthrough support via bsg
      bsg: update sg_io_v4 structure
      ibmvscsi: use shost_priv
      ibmvscsi: remove unnecessary map_sg check
      zfcp: convert to use the data buffer accessors

James Bottomley (4):
      sd: disentangle barriers in SCSI
      aic7xxx: cap maxsync according to correct card limits
      mpt fusion: make logging a global sysfs parameter
      libsas: fix build dependencies on libata

James Smart (9):
      lpfc : scsi command accessor fix for 8.2.2
      lpfc 8.2.2 : Change version number to 8.2.2
      lpfc 8.2.2 : Style cleanups
      lpfc 8.2.2 : Miscellaneous Bug Fixes
      lpfc 8.2.2 : Miscellaneous management and logging mods
      lpfc 8.2.2 : Rework the lpfc_printf_log() macro
      lpfc 8.2.2 : Attribute and Parameter splits for vport and physical port
      lpfc 8.2.2 : Fix locking around HBA's port_list
      lpfc 8.2.2 : Error messages and debugfs updates

Jeff Garzik (1):
      gdth: remove redundant PCI stuff

Mark Fortescue (1):
      qlogicpti: Some cosmetic changes

Matthew Wilcox (1):
      dpt_i2o: convert to SCSI hotplug model

Matthias Kaehlcke (1):
      st: Use mutex instead of semaphore

Salyzyn, Mark (1):
      aacraid: prevent panic on adapter resource failure

Seokmann Ju (1):
      qla2xxx: fix panic caused by previous patch


and the diffstat:

 block/bsg.c                         |   10 
 drivers/message/fusion/mptbase.c    |   17 
 drivers/message/fusion/mptsas.c     |  126 ++++++
 drivers/s390/scsi/zfcp_fsf.c        |    5 
 drivers/s390/scsi/zfcp_qdio.c       |   41 --
 drivers/scsi/aacraid/linit.c        |    4 
 drivers/scsi/aha152x.c              |  169 ++++----
 drivers/scsi/aha152x.h              |    2 
 drivers/scsi/aic7xxx/aic7xxx_core.c |   22 +
 drivers/scsi/dpt_i2o.c              |  132 +++---
 drivers/scsi/dpti.h                 |    9 
 drivers/scsi/esp_scsi.h             |    2 
 drivers/scsi/gdth.c                 |   48 +-
 drivers/scsi/gdth.h                 |    6 
 drivers/scsi/hosts.c                |    2 
 drivers/scsi/ibmvscsi/ibmvscsi.c    |   39 --
 drivers/scsi/libsas/Kconfig         |    3 
 drivers/scsi/lpfc/lpfc.h            |   72 ++-
 drivers/scsi/lpfc/lpfc_attr.c       |  423 +++++++++++++++-------
 drivers/scsi/lpfc/lpfc_crtn.h       |   28 -
 drivers/scsi/lpfc/lpfc_ct.c         |  243 ++++++------
 drivers/scsi/lpfc/lpfc_debugfs.c    |  595 ++++++++++++++++++++++++++++---
 drivers/scsi/lpfc/lpfc_debugfs.h    |    2 
 drivers/scsi/lpfc/lpfc_els.c        |  679 ++++++++++++++++--------------------
 drivers/scsi/lpfc/lpfc_hbadisc.c    |  539 ++++++++++++----------------
 drivers/scsi/lpfc/lpfc_hw.h         |   14 
 drivers/scsi/lpfc/lpfc_init.c       |  284 +++++++--------
 drivers/scsi/lpfc/lpfc_logmsg.h     |   10 
 drivers/scsi/lpfc/lpfc_mbox.c       |   20 -
 drivers/scsi/lpfc/lpfc_mem.c        |   32 +
 drivers/scsi/lpfc/lpfc_nportdisc.c  |  162 +++-----
 drivers/scsi/lpfc/lpfc_scsi.c       |  413 ++++++++++-----------
 drivers/scsi/lpfc/lpfc_sli.c        |  423 +++++++++++-----------
 drivers/scsi/lpfc/lpfc_sli.h        |   10 
 drivers/scsi/lpfc/lpfc_version.h    |    4 
 drivers/scsi/lpfc/lpfc_vport.c      |  164 +++++---
 drivers/scsi/lpfc/lpfc_vport.h      |    2 
 drivers/scsi/qla2xxx/qla_os.c       |   14 
 drivers/scsi/qlogicpti.c            |   50 +-
 drivers/scsi/scsi_lib.c             |   17 
 drivers/scsi/sd.c                   |   14 
 drivers/scsi/st.c                   |   16 
 drivers/scsi/st.h                   |    3 
 include/linux/bsg.h                 |   13 
 include/scsi/scsi_driver.h          |    2 
 include/scsi/sd.h                   |    2 
 46 files changed, 2837 insertions(+), 2050 deletions(-)

James


From: Linus Torvalds [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Mon, 6 Aug 2007 17:51:57 -0700 (PDT)

On Sat, 4 Aug 2007, James Bottomley wrote:
>
> This is mainly bug fixes ... there's one or two features completions
> that have been delayed pending ack and review to do with bsg (headers
> and passthrough) but these are really required to complete already
> upstream code.

James, this is the last time *ever* I apply patches from you after -rc1.

You used to have serious problems with the merge window, but for a few
releases you then seemed to "get it" and got on with the program.

But now it's back to "anythign goes", apparently. And I'm going to take a
hard-line approach with you now.

For SCSI merges, if I don't get the first pull request in the FIRST week
of the merge window, don't bother sending one later, unless it's pure
fixes and regressions.

And after -rc1, I don't want to see crap like this:

46 files changed, 2837 insertions(+), 2050 deletions(-)

because that simply is *not* appropriate after -rc1, much less -rc2.

So I pulled, but I wanted to make it very clear that I'm very unhappy with
you right now, and you're on my shit-list for the next few releases. Get
the changes in before -rc1, or just *wait*. If they aren't ready before
the merge window opens, they simply shouldn't be merged at all.

Linus

From: James Bottomley [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Mon, 06 Aug 2007 22:55:41 -0500

On Mon, 2007-08-06 at 17:51 -0700, Linus Torvalds wrote:
>
> On Sat, 4 Aug 2007, James Bottomley wrote:
> >
> > This is mainly bug fixes ... there's one or two features completions
> > that have been delayed pending ack and review to do with bsg (headers
> > and passthrough) but these are really required to complete already
> > upstream code.
>
> James, this is the last time *ever* I apply patches from you after -rc1.
>
> You used to have serious problems with the merge window, but for a few
> releases you then seemed to "get it" and got on with the program.
>
> But now it's back to "anythign goes", apparently. And I'm going to take a
> hard-line approach with you now.
>
> For SCSI merges, if I don't get the first pull request in the FIRST week
> of the merge window, don't bother sending one later, unless it's pure
> fixes and regressions.

Confused ... you did get the first pull request in the first week. That
was this:

Subject: [GIT PATCH] first SCSI merge for 2.6.22
Date: Sun, 15 Jul 2007 10:24:17 -0500

190 files changed, 21725 insertions(+), 26337 deletions(-)

Then there was the last piece before the merge window closed:

Subject: [GIT PATCH] final piece of the SCSI merge for 2.6.22
Date: Sun, 22 Jul 2007 13:28:53 -0500

74 files changed, 3649 insertions(+), 1295 deletions(-)

> And after -rc1, I don't want to see crap like this:
>
> 46 files changed, 2837 insertions(+), 2050 deletions(-)
>
> because that simply is *not* appropriate after -rc1, much less -rc2.
>
> So I pulled, but I wanted to make it very clear that I'm very unhappy with
> you right now, and you're on my shit-list for the next few releases. Get
> the changes in before -rc1, or just *wait*. If they aren't ready before
> the merge window opens, they simply shouldn't be merged at all.

OK ... that's arguable. This one is larger than I like because of the
lpfc bug fix patch ... I accept I need to do a better job getting these
into the merge window via the scsi-misc tree. So I will accept the "too
big" criticism and try to manage the driver maintainers better.

However, I won't accept the "not bug fixes only" criticism at -rc1. The
problem is that we're trying to stabilise a new feature: bsg.
Unfortunately, the closure of the merge window was really the first time
anyone got to play with all of these features together. The non-bug fix
changes around bsg have been trying to achieve stability. The problem is
that there were a few fairly problematic pieces: dependence on
non-modular SCSI; SG header layout and driver implementation. What we
really don't want is to have a problematic API baked in stone because we
can't do anything other than bug fix updates once the merge window
closes.

The real root cause of all of this is that there's no tree I can
persuade all the interested parties to test that includes all of these
features. In spite of the fact they've all been incubating in -mm for
at least 3 months, no-one apparently tested all the features together
until 2.6.23-rc1 was released, so then we're scrambling to address the
issues as they arise.

I really, *really* think we need a pre-release tree that consists of all
the upstream targetted features (i.e. all of the for the next merge
window git trees) and nothing else. -mm doesn't really satisfy this,
because it has so much other stuff that the people I need to get testing
this don't trust it. The lack of a tree like this that we could have
persuaded people to test for the last month is what's causing us to
scramble like this at the closure of the merge window.

James

From: Linus Torvalds [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Mon, 6 Aug 2007 21:01:46 -0700 (PDT)

On Mon, 6 Aug 2007, James Bottomley wrote:
>
> Confused ... you did get the first pull request in the first week.

Here's the problem. Let me repeat it again:

> > And after -rc1, I don't want to see crap like this:
> >
> > 46 files changed, 2837 insertions(+), 2050 deletions(-)

It DOES NOT MATTER if I get a first pull request in the first week, if
that pull request is purely cosmetic, and is followed by stuff that
*should* have been in the merge window four weeks afterwards.

> OK ... that's arguable.

There's nothing arguable at all about it. If you have 5000 lines of
changes, that's not a "bugfix" any more. That's a big damn change, and it
should have happened in the merge window. Or if it doesn't make it in
time, in the *next* merge window.

Linus

From: James Smart [email blocked]
To: Linus Torvalds [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 09:12:21 -0400

In defense of my maintainer, who was working on my behalf! ...

The lpfc mods were the bulk of the +/- counts. We batch our bug fixes
together and then push to James as a large lump. Unfortunately, we had a
change that changed logging from a base object to a subobject. Although
not risky, it did account for a lot of +/- changes. The way we pushed to
James, did not allow for him to easily segment one set of changes from
the other. Emulex will change this behavior, hopefully making this
easier on James to keep you happy.

However, I take issue with looking at line counts as the sole basis
for what's appropriate or not. It can be argued that some bug fixes may
be larger in scope than others, or patch batching so that the bug fix
count is higher will skew this perception. I also believe that more
"lesser" bugfixes should be allowed in an earlier -rc? than later, so a
hard-and-fast rule for line counts seem odd. Also - what's a bug fix ?
There are many things which are not "features" but are necessities for
diagnosis or support of the larger change. Some of these you simply
don't find in time to make sure they are in place for the -rc1 merge.
Do you hold off on them, or do you make a choice based risk/reward based
on where the -rc is ? I vote for the latter. I realize that the Linux
kernel is such a beast overall that you must have some simple
guidelines, but basing it solely on numbers is a very bad pitfall.

-- james s

From: Jeff Garzik [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 12:13:36 -0400

James Smart wrote:
> However, I take issue with looking at line counts as the sole basis
> for what's appropriate or not. It can be argued that some bug fixes may be
> larger in scope than others, or patch batching so that the bug fix count is
> higher will skew this perception. I also believe that more "lesser"
> bugfixes
> should be allowed in an earlier -rc? than later, so a hard-and-fast rule
> for
> line counts seem odd. Also - what's a bug fix ? There are many things
> which are not "features" but are necessities for diagnosis or support of
> the
> larger change. Some of these you simply don't find in time to make sure
> they
> are in place for the -rc1 merge. Do you hold off on them, or do you make a
> choice based risk/reward based on where the -rc is ? I vote for the latter.
> I realize that the Linux kernel is such a beast overall that you must have
> some simple guidelines, but basing it solely on numbers is a very bad
> pitfall.

It's straightforward engineering math: the more LOC that changed, the
more important it is to /not/ stuff it into a stabilization release,
because of the greater potential for breaking stuff and negating all the
existing testing so far.

Once -rc1 is out there, that means the focus should be on stabilizing
the existing codebase. Pushing a big driver update means that effort
must restart from scratch. We just don't want to go down that road,
which a big reason for the merge window in general.

If you miss the merge window, tough cookies :) You gotta deal with it
just like I do, and everyone else does.

Remember -- the more disciplined we all are with the merge window, the
more likely it is that a release can be stabilized quickly, and thus,
the more quickly we will reach the next merge window. In contrast,
increasing violations of the merge window mean increasing time between
releases.

Jeff

From: Andrew Morton [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 7 Aug 2007 00:14:29 -0700

On Mon, 06 Aug 2007 22:55:41 -0500 James Bottomley wrote:

> The real root cause of all of this is that there's no tree I can
> persuade all the interested parties to test that includes all of these
> features. In spite of the fact they've all been incubating in -mm for
> at least 3 months, no-one apparently tested all the features together
> until 2.6.23-rc1 was released, so then we're scrambling to address the
> issues as they arise.

I pulled git-scsi-misc on July 19 and there was no bsg code in there at
all. I pulled again on July 20 and all the bsg code was in mainline.

So it appears that the bsg code went mailing-list -> mainline in less
than 24 hours, so there wasn't a lot of opportunity for -mm testing
there.

A lot of the stupid it-doesn't-compile stuff would have been fixed in
-mm, but more substantial problems might not have been picked up. But
one can say that about anything.

> I really, *really* think we need a pre-release tree that consists of all
> the upstream targetted features (i.e. all of the for the next merge
> window git trees) and nothing else.

That *is* -mm. The vast majority of -mm is the 75-odd subsystem trees.

What you're suggesting amounts to omitting some of those trees for test
purposes (I think). If so, which ones?

Now it coud be argued that subsystem maintainers should run two trees in
the last 2.6.x-rcN phase: one tree for 2.6.x+1 and one tree for 2.6.x+2.
Then someone could pull all that together as the "Linus tree in a month,
minus insufficiently baked stuff" tree. But frankly, I don't expect that
people will want to do that, nor will they be able to do it reliably.

Plus, an *amazing* amount of stuff turns up in the git trees which was
committed just a few days prior to the merge window opening, or even after
it opening. eg, bsg which was, afaict, first committed to the scsi tree
eleven days after the 2.6.22 release.

> -mm doesn't really satisfy this,
> because it has so much other stuff that the people I need to get testing
> this don't trust it.

Right. 75-odd developers need to stop committing bugs to their devel
trees. Interesting project ;)

> The lack of a tree like this that we could have
> persuaded people to test for the last month is what's causing us to
> scramble like this at the closure of the merge window.

Nope. The scramble is caused by subsystem maintainers jamming stuff into
mainline at the last minute so they don't have to sit on it for the next
two months.

Look. If we're serious about this then the rule needs to be something like

If it wasn't committed to your tree *at least* two weeks prior to the
2.6.x merge window opening, it shouldn't go into 2.6.x.

People are not presently observing this sort of discipline by a metric
mile. And I'm not sure that we should, really.

I don't think it's terribly bad to whack half-baked things (bsg ;)) into
mainline during the merge window, as long as a) we're sure that we want
the feature in Linux and b) we're confident that we can get it fixed up
within a couple of months. Two months is a long time.

But that's just me, and it is not the approach which Linus wants taken.

From: Jeff Garzik [email blocked]
To: Andrew Morton [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 11:24:47 -0400

Andrew Morton wrote:
> On Mon, 06 Aug 2007 22:55:41 -0500 James Bottomley wrote:
>> I really, *really* think we need a pre-release tree that consists of all
>> the upstream targetted features (i.e. all of the for the next merge
>> window git trees) and nothing else.
>
> That *is* -mm. The vast majority of -mm is the 75-odd subsystem trees.

Not quite. -mm is git trees plus an amazing amount of random patches
that turned up on LKML, a lot of which is not destined for kernel
release 2.6.(X+1) or 2.6.(X+2),

> Plus, an *amazing* amount of stuff turns up in the git trees which was
> committed just a few days prior to the merge window opening, or even after
> it opening.

Yes :( That's a tough problem to solve, too.

Deadlines always motivate people, and so -- as in almost every other
software project I've worked with -- everybody seems to submit their
work on the day of the deadline.

Realistically, for the merge window to work perfectly, each step down
the maintainership ladder needs to have time to review and integrate the
changes destined for that merge window. Ideally, people would do all
this work beforehand, so that each step up the ladder has time prior to
merge window for review and testing. But that's just not software
engineers as we know them ;-)

>> The lack of a tree like this that we could have
>> persuaded people to test for the last month is what's causing us to
>> scramble like this at the closure of the merge window.
>
> Nope. The scramble is caused by subsystem maintainers jamming stuff into
> mainline at the last minute so they don't have to sit on it for the next
> two months.

Indeed. Particularly in this case, where bsg didn't really grace -mm at
all.

> Look. If we're serious about this then the rule needs to be something like
>
> If it wasn't committed to your tree *at least* two weeks prior to the
> 2.6.x merge window opening, it shouldn't go into 2.6.x.
>
> People are not presently observing this sort of discipline by a metric
> mile. And I'm not sure that we should, really.

My goal AT A MINIMUM with netdev and libata is to get stuff in at least
one -mm release prior to merge window opening (though preferably a
longer lead time than that). Of course, reality intrudes, but that's my
goal. And I think it's a reasonable goal to push upon others (but I'm
biased:))

Jeff

From: Jeff Garzik [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 12:06:49 -0400

James Bottomley wrote:
> OK ... that's arguable. This one is larger than I like because of the
> lpfc bug fix patch ... I accept I need to do a better job getting these
> into the merge window via the scsi-misc tree. So I will accept the "too
> big" criticism and try to manage the driver maintainers better.
>
> However, I won't accept the "not bug fixes only" criticism at -rc1. The
> problem is that we're trying to stabilise a new feature: bsg.

Just so we don't lose the forest for the trees... Not trying to put
words in Linus's mouth, but it seems to me he wasn't complaining
specifically about bsg.

"style cleanups", "cosmetic cleanups", ancient ISA driver polishing
(1542, my gdth patch) are definitely not "bug fix only" material.

The lpfc update was probably the biggest thing, LOC-wise. And even
though that was mostly bug fixes -- and notably NOT 100% fixes -- it is
big enough to warrant integration testing and exposure prior to
mainline. Definitely merge-window-open material AFAICS.

Jeff

From: James Smart [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 12:27:35 -0400

Jeff Garzik wrote:
> The lpfc update was probably the biggest thing, LOC-wise. And even
> though that was mostly bug fixes -- and notably NOT 100% fixes -- it is
> big enough to warrant integration testing and exposure prior to
> mainline. Definitely merge-window-open material AFAICS.

FYI - it is integrated and tested prior to mainline, by Emulex (and who
else *really* tests it close to the degree we do ?). We do so, as a whole,
weeks ahead of the submit to the maintainer. Usually, there's only a couple
of small api changes that are picked up when we merge into the maintainers
pool. And most of these are caught by us prior anyway as we package the
patchsets and ensure the integration into the maintainers pool is smooth.

-- james s

From: Jeff Garzik [email blocked]
Subject: Re: [GIT PATCH] scsi bug fixes for 2.6.23-rc2
Date: Tue, 07 Aug 2007 12:34:54 -0400

James Smart wrote:
> Jeff Garzik wrote:
>> The lpfc update was probably the biggest thing, LOC-wise. And even
>> though that was mostly bug fixes -- and notably NOT 100% fixes -- it
>> is big enough to warrant integration testing and exposure prior to
>> mainline. Definitely merge-window-open material AFAICS.
>
> FYI - it is integrated and tested prior to mainline, by Emulex (and who
> else *really* tests it close to the degree we do ?). We do so, as a whole,
> weeks ahead of the submit to the maintainer. Usually, there's only a couple
> of small api changes that are picked up when we merge into the maintainers
> pool. And most of these are caught by us prior anyway as we package the
> patchsets and ensure the integration into the maintainers pool is smooth.

This is a highly common pattern, and unfortunately you get the highly
common Linux response:

In Linux we never ever assume a driver is working simply because the
hardware vendor tested it. A decade of real world experience PROVES
precisely the opposite -- getting code out into the world early and
often repeatedly turned up problems not seen in hardware vendor's
testing.

Take a lesson from when I was on Linus's shit-list... twice: Twice,
Intel submitted an e1000 update after the merge window closed. Twice,
they claimed the driver passed their quite-exhaustive internal testing.
And twice, the most popular network driver broke for large masses of
users because I took a hardware vendor's word on testing rather than
rely on the testing PROVEN to flush out problems: public linux kernel
testing.

I'm not singling out Intel, there are plenty of other hardware vendors
that repeat the exact same pattern.

It's quite simply impossible for a hardware vendor to test all the weird
combinations in the field. Our test lab -- the Internet -- is the one we
trust.

Jeff

The most popular network driver?

Yenya
on
August 9, 2007 - 1:36am

Is really e1000 the most popular network driver? From more than a hundred of computers here I can remember only two or three with e1000. For servers from my point of view it seems that the most widely used is tg3, not e1000.

-Yenya

pretty popular

Anonymous (not verified)
on
August 9, 2007 - 5:17am

The tg3 was pretty popular (do you have a lot of older dells?), but the e1000 is definitely a better chipset (and thus more likely to be added on--most of the add-in NICs I've seen and used are Intels). And the e1000 was (and is) embedded in a lot of Intel motherboards. At any rate, systems that ship with a broadcom chipset are far more likely to use the bnx2 driver than the tg3 driver these days.

The fact that intel has kept a stable interface for so long is an advantage not to be underestimated, nor is their commitment to maintaining their driver. I've got some older syskonnect cards (which actually outperform the intels on small packets) that just don't work anymore due to driver rot. (The older sk98lin driver is essentially unmaintained and doesn't integrate well in 2.6 kernels, and the new skge driver is focused on newer syskonnect embedded chipsets and has issues with the older cards.) Similarly, I've had major issues with the tg3 driver on older embedded NICs (slow, high-latency performance). Intel just does a better job of supporting their stuff, and they've been making cards that use that driver for what, a decade now? I wouldn't be surprised if it was the most popular NIC out there on linux systems.
