this is a simple bzImage kernel, no modules at all. Here's the full regression report:

kernel used: latest -git, head 7180c4c9e09888db0a188f729c96c6d7bd61fa83.

Regression seems to have been introduced into v2.6.25 by this commit:

| commit 040babf9d84e7010c457e9ce69e9eb1c27927c9e
| Author: Auke Kok <auke-jan.h.kok@intel.com>
| Date: Wed Oct 31 15:22:05 2007 -0700
|
| e1000/e1000e: Move PCI-Express device IDs over to e1000e

v2.6.25-rc8 regresses relative to v2.6.24, with the following config, which config works fine in v2.6.24:

  http://redhat.com/~mingo/misc/config.e1000.bad

the eth0 interface is not detected at all:

  http://redhat.com/~mingo/misc/dmesg.e1000.bad

after more than an hour of experimenting around and bisecting the .config variances it turned out that turning off E1000E driver _module_ completely (which isnt even loaded, nor attempted to be loaded) made the kernel boot again:

  http://redhat.com/~mingo/misc/config.e1000.good

and the e1000 interface is detected fine just like it was in v2.6.24:

  http://redhat.com/~mingo/misc/dmesg.e1000.good

the difference in the config is:

--- config.e1000.good 2008-04-08 20:24:30.000000000 +0200
+++ config.e1000.bad 2008-04-08 20:20:53.000000000 +0200
@@ -1400,8 +1400,8 @@ CONFIG_DL2K=m
 CONFIG_E1000=y
 CONFIG_E1000_NAPI=y
 # CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
-# CONFIG_E1000E is not set
-# CONFIG_E1000E_ENABLED is not set
+CONFIG_E1000E=m
+CONFIG_E1000E_ENABLED=y
 # CONFIG_IP1000 is not set
 # CONFIG_IGB is not set
 CONFIG_NS83820=m

it results in the following bootup difference:

--- dmesg.e1000.good 2008-04-08 20:27:20.000000000 +0200
+++ dmesg.e1000.bad 2008-04-08 20:27:20.000000000 +0200
@@ -1269,14 +1269,8 @@ initcall 0xc06b7ce9 ran for 0 msecs: cpq
 Calling initcall 0xc06b81e1: e1000_init_module+0x0/0x6e()
 Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
 Copyright (c) 1999-2006 Intel Corporation.
-ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 ...
Auke is out sick, so I'm responding...

then why are you compiling e1000e as a module? no "=y" in your kernel means no support, and this kernel .config has e1000e supporting your hardware.

your expectation is that e1000 once loaded on this device in a previous kernel (2.6.24) so it should continue to work, right? I see your point but we are trying to make general improvements to both drivers, and the best way forward was a split, in order to make the user experience

if you're running a no module kernel, you'll need to set CONFIG_E1000E=y

The device IDs moved to e1000e, we don't want collisions between drivers that support the same IDs, so to avoid those user support issues, we're trying to make the process as painless as possible with announcements and time.

The distros are already including the e1000e driver in their builds and new installs with the new ID layout will automatically select the correct driver for their hardware.

Users that take an upgrade to their kernel (with e1000e enabled) might benefit if the distro upgrading that kernel included a post upgrade script that migrated e1000e devices previously using e1000 in modprobe.conf to

  alias ethX e1000e

If there is a more reasonable solution you can come up with I am interested.

Jesse
--
there should be no need for me to set something that the kernel can do

i think the solution is obvious and simple: if e1000 is built-in then e1000e should not be allowed to be a module. (i.e. it should either be built-in in which case it will handle the PCI IDs, or it should be disabled - in which case e1000 will handle them.)

that way e1000e can take over the PCI IDs but we'll never get a non-working system, which takes an hour for a kernel hacker to figure out. The failure was totally silent. eth0 didnt show up at all.

Btw., a sidenote: this is another generally annoying property of Linux: there's no easy and user-visible enumeration of PCI IDs (devices) that we _could_ support but dont enable for some reason.

It is a royal PITA to track down when some driver decides to (silently) ignore a piece of hardware. Having a seemingly dead piece of hardware component is one of the most frustrating user experiences possible - the first instinctive reaction is "did my hw break???".

The kernel should proactively know about all inactive pieces of hardware and should have a one-stop-shop for users where they can reassure themselves which devices are not active and why.

Ingo
--
It's almost trivial to add new string attributes to sysfs. We could have a file, say, /sys/bus/pci/devices/0000:07:03.0/broken which lspci could read to see if anything's left a message for us. Is that the kind of thing you had in mind? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
yep, that would be fantastic. i guess more could be done as well - this was just the result of 10 seconds of thinking - please try to think all such scenarios through with the mindset of the user who is faced with a non-working device. Our failure diagnostics are rather ad-hoc in general. Say a USB stick did not come up. Or some card isnt working. Or the mouse is dead. Plain everyday annoyances - we need good, unified, understandable interfaces for users to get reassurances and vectors of action from. Maybe even a WARN_ON() for kerneloops.org to pick up automatically. _Anything_ that is actionable by plain users. Because failures in hw functionality are among the most serious failures an OS can impose on users (it's only slightly better than say data loss, and clearly worse to most users than say sporadic crashes), and it is the main area where we _lose_ users every day. Ingo --
FWIW, this is what a command on "another OS" does with an unclaimed card:

# ioscan -fk -C lan
Class  I  H/W Path  Driver   S/W State  H/W Type   Description
====================================================================
lan    0  0/0/3/0   intl100  CLAIMED    INTERFACE  Intel PCI Pro 10/100Tx Server Adapter
lan    1  0/1/2/0   igelan   CLAIMED    INTERFACE  HP PCI 1000Base-T Core
lan    2  0/2/1/0   iether   CLAIMED    INTERFACE  HP A7012-60001 PCI/PCI-X 1000Base-T Dual-port Adapter
lan    3  0/2/1/1   iether   CLAIMED    INTERFACE  HP A7012-60001 PCI/PCI-X 1000Base-T Dual-port Adapter
lan    4  0/3/1/0   ixgbe    UNCLAIMED  UNKNOWN    PCI-X Ethernet (17d55831)

I'd probably call that "unclaimed" rather than "broken" but that may just be a preference thing.

rick jones
--
`lspci -k' already reports which devices are claimed by which driver. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth MS has designed a perfect copy protection scheme: There is no reason to pirate Vista. --
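[Editorial sketch, not part of the original thread.] The claimed/unclaimed distinction discussed above can already be read from sysfs: a bound PCI device has a `driver` symlink in its device directory, and an unbound one does not. The following is a minimal sketch under that assumption; the function name `pci_driver_status` and its `sysfs_root` parameter are made up for illustration. It reports only whether a device is claimed, not why it is unclaimed, which is the part the thread says is still missing.

```python
import os

def pci_driver_status(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: driver_name or None} for every PCI device.

    Hypothetical helper: 'sysfs_root' is a parameter only so the function
    can be pointed at a test directory instead of the real sysfs.
    """
    status = {}
    for addr in sorted(os.listdir(sysfs_root)):
        driver_link = os.path.join(sysfs_root, addr, "driver")
        if os.path.islink(driver_link):
            # The last path component of the symlink target is the driver name.
            status[addr] = os.path.basename(os.readlink(driver_link))
        else:
            # No 'driver' symlink: no driver is bound to this device.
            status[addr] = None
    return status

if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        for addr, driver in pci_driver_status().items():
            print(addr, driver if driver else "UNCLAIMED")
```

On a machine in the state Ingo describes, the NIC would show up here as UNCLAIMED, with no hint that a module which is built but not loaded would have claimed it.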
I would consider this a worthwhile addition to the kernel (and lspci).
It would be nice if lspci could display what driver had claimed a
particular device, and which devices were unclaimed by any driver or
otherwise had an error that prevented initialization.
I don't have enough experience to gauge how invasive this would be, but
I'd be happy to contribute towards it if practical.
Cheers,
Dan
--
/--------------- - - - - - -
| Daniel Noe
| http://isomerica.net/~dpn/
--
You need to upgrade to a more recent version of lspci -- it already does this ;-)

Maybe 'status' would be a better name than 'broken'. We could even default it to 'unclaimed' then. Or 'driver_status' to avoid conflicting with some bus that might have a 'status' bit we try to report through sysfs.

-- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Hah, thanks. That is useful and very new :) I built a newer lspci and
I agree however that the opportunity for more status would be good. And
status is a better name than "broken". This way it is easy to scan all
devices on the system via sysfs and easily visualize via lspci or some
other tool:
1) Unclaimed devices
2) Devices that aren't working properly - and why (please something more
than "This device is not working properly" :)
3) Devices that are claimed and working properly
Cheers,
Dan
--
/--------------- - - - - - -
| Daniel Noe
| http://isomerica.net/~dpn/
--
yes, but it does not (yet) display the negative condition and the reason for that. (if the kernel knows the reason - and in most cases it knows it) Ingo --
Yes, it would be nice to have. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Noli tangere fila metalica, ne in solum incasa quidem. --
I think you've found the wrong problem ... it looks deliberate to me that enabling e1000e disables e1000 from claiming the PCI IDs (see the PCIE() macro right before the e1000_pci_tbl in drivers/net/e1000/e1000_main.c). The question is why e1000e isn't claiming the device ... -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
because i have e1000 built-in and dont load the e1000e module at all. That worked before and doesnt work now. the solution is rather straightforward: if E1000 is built-in then E1000E should be built-in as well or disabled (i.e. it should not be possible to build it as a module in that case) - because the PCI ID stealing trick now connects the two drivers unconditionally. [ If e1000 is a module then e1000e can be a module (or disabled) - this would be the most common configuration. ] Ingo --
And this would seem to break the most common means of testing a new driver for existing (and working!) hardware, which is to build both drivers as modules, install the new one, and if it appears to have problems either remove and insert the old driver by hand, or boot forcing the old driver. I can't be the only person who tests kernels on machines I wouldn't use to build a kernel, and uses modprobe.conf to test new driver functionality. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot --
yes, but note that the breakage you are talking about is not caused by my patch, it is caused by the planned change to remove those PCI IDs from e1000. my suggested change only solves part of the more general problem you touch upon. (and it does not make it worse in any way) Ingo --
Then disable E1000E in your kernel config, and the PCIE() macro will do the right thing... Have you reviewed the discussion that led to PCIE()? Jeff --
it is an obvious regression that could and should be solved in the Kconfig space: do not allow E1000=y && E1000E=m. i repeat, it took me more than an hour to figure out why there's no networking on my laptop. Guess how much it takes for a plain user to figure out the same problem. Ingo --
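[Editorial sketch, not part of the original thread.] The combination under dispute, E1000=y together with E1000E=m, can be detected mechanically from a .config file. The following is a minimal sketch of such a check; the helper names `parse_config` and `e1000_split_hazard` are hypothetical, and it only recognises the two standard .config line forms (`CONFIG_X=value` and `# CONFIG_X is not set`).

```python
def parse_config(text):
    """Map CONFIG_ symbols to their values from .config-style text.

    Symbols written as '# CONFIG_X is not set' are recorded as 'n'.
    """
    suffix = " is not set"
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            values[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
    return values

def e1000_split_hazard(text):
    """True if e1000 is built in while e1000e is built only as a module."""
    cfg = parse_config(text)
    return (cfg.get("CONFIG_E1000", "n") == "y"
            and cfg.get("CONFIG_E1000E", "n") == "m")

if __name__ == "__main__":
    bad = "CONFIG_E1000=y\nCONFIG_E1000E=m\nCONFIG_E1000E_ENABLED=y\n"
    good = "CONFIG_E1000=y\n# CONFIG_E1000E is not set\n"
    print("bad config flagged: ", e1000_split_hazard(bad))
    print("good config flagged:", e1000_split_hazard(good))
```

A check like this only warns after the fact. The proposal in the thread is to make the combination impossible to select in Kconfig in the first place.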
this is really not the solution imho, having e1000 builtin and e1000e as a module is a perfectly viable choice. They are two separate drivers that are completely independent. I also think that the word "regression" is way out of proportion. I did not complain myself when IDE/ATA->SATA driver merges broke all my systems and I was pleasantly provided with the 'cannot find root vfs' message. (that's what I get a plain user uses a distro which is aware of the issue and that will load e1000e automatically, because it was tested by the distro. that is just pushing the discussion to the wrong point. The decision has been made a long time ago to split e1000 in two. now that we have two drivers, we have a migration issue. you can't fix this migration issue by forcing a specific .config endresult on the user. I've seen suggestions that will alleviate the issue by adding a 'default E1000' to the e1000e Kconfig section, and something like that makes sense to me and I still would be happy to merge something like that. I do not think that hiding the existence of e1000e for any user by always enabling it will fix things at all however, and will just lead to a lot of other issues later on. Auke --
you try to argue against a strong and established concept that Linux always had from day one on: DRIVER_X=y means the user prefers that driver so strongly that he has selected it built-in. Such drivers are special in every sense: they run first before any of the module init, they cannot be disabled, etc. etc. The only case where that should be overridden as the primary driver for that piece of hardware is if _another_ driver is built-in _too_. ... which is exactly the E1000=y && E1000E=m regression that bit me and the simple solution of forcing E1000E to follow the mode of the E1000 driver solves it. The most common distro setup is E1000=m and E1000E=m. The most common embedded setup is _one_ of the two drivers as =y. So i'm not sure why you are arguing about all this. Please just fix this bug, simple as that. Ingo --
I haven't said NAK, but I think the suggested fix is a waste of time because

1) it breaks (by disallowing) a valid setup based on one report
2) it only happens to experienced kernel hackers with weird configs
3) the suggested fix binds together more tightly two drivers we are trying to keep separate
4) it is a temporary situation that will go away in 2.6.26 anyway

So from my point of view, your request is to pick the breakage you don't care about (#1, above) to fix the breakage you do care about. It's a "pick your poison" choice, from my POV.

Given that POV, that's why I lean towards avoiding your Kconfig fix -- viewing this as a transition issue, and not something to be fixed by limiting the choices of others.

But if everyone strongly agrees with you... go ahead and patch, I won't NAK it. I dislike the Kconfig system growing "temporary" hacks, which tend to accumulate false dependencies over time. But I readily admit that's a general principle and not a hard rule...

Jeff
--
well, your 2.6.26 plan, if i understand it correctly, is to move currently working PCI IDs from e1000 to e1000e, like you attempted to do it in v2.6.24, which Linus reverted - correct? I.e. e1000 simply wont support eth0 on my T60 from 2.6.26 on? That is still an incredibly stupid plan, and no amount of announcement on lkml will make it any less stupid. ... which pretty much pulls the rug from under your argument, no? Ingo --
It seems like you're saying that once hardware is supported by a particular config option, it can never ever be split out to another config option, even if it makes both drivers cleaner. A similar situation happened when the sk98lin driver was split into skge and sky2...I don't remember a big fuss back then. Is it just that no major developers were using the hardware so they didn't notice? Chris --
The difference is that :
1) either could be used for a long time
2) the old worked so bad that the word has spread among people in forums
to try the new driver instead.
I think that splitting drivers should be something accepted in the kernel's
lifetime, but users must not be left confused. It's clearly easier to insert
ourselves in their common process to wave hands indicating that their setup
will soon not work anymore (eg: by having e1000 indicate what driver must be
loaded for unsupported devices).
Willy
--
A plain user would have obtained a working distro kernel, putting this rare problem purely in the laps of people with highly unusual kernel configs... But as noted in the announcement, the fix for this is to continue to the next step in the transition; then you only have one driver claiming those IDs, for any given config. No need for further Kconfig tweakage. Jeff --
i find it mindboggling and rather sad that you are still in denial :-( this is an obvious regression to me, with a very simple fix. No other PCI driver breaks like this. We've got three thousand Kconfig options - it is clearly not realistic for users to keep such details in mind to avoid pitfalls. E1000=y && E1000E=m is uncommon but can easily happen. E1000=y && E1000E=m simply makes no sense in light of the PCI ID stealing that occurs if E1000E is enabled. Ingo --
That's because PCI ID transitions are extremely, extremely rare. Agreed -- hence the multiple announcements, including in this thread, to put said details into mind. It's an unusual situation, thus it received announcements so that people "makes no sense"... for your personal situation. They are two independent drivers, and that's a valid mix of Kconfig settings. Someone could easily have an e1000 card in an e1000e machine, and come up with that specific config mix. It is not denial to say "people other than Ingo might validly choose that config mix." Overall, fundamentally, any transition of a user base from one driver to another is going to be an ad-hoc process. That's the nature of the problem -- each case is different. So we avoid driver transitions if at all possible, as a general rule. In this case, a rare exception to that rule, you have to hammer out a user-education process as best you can. Jeff --
which part of "it took a kernel developer more than an hour to figure out why his laptop had a dead network interface" did you not understand? Whatever you did, it was not apparent to me. I dont follow every tiny detail of the e1000 driver family, nor do 99%+ [*] of our users.

find the fix below, against current -git. the current upstream behavior is the worst possible one and is just a plain bug, and the solution is dead-simple.

Ingo

[*] guesstimate

--------------->
Subject: e1000=y && e1000e=m regression fix
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 09 21:09:35 CEST 2008

fix a regression from v2.6.24: do not transfer the e1000e PCI IDs from e1000 to e1000e if e1000 is built-in and e1000e is a module.

Built-in drivers take precedence over modules in many ways - and in this case it's clear that the user intended the e1000 driver to be the primary one. "Silently change behavior and break existing configs" is never a good migration strategy.

Most users will use distro kernels that are not affected by this problem at all - nor are they affected by this patch - but this problem can hit users and developers who build their kernels themselves and migrate from v2.6.24 to v2.6.25.

this fixes:

  http://bugzilla.kernel.org/show_bug.cgi?id=10427

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-x86.q/drivers/net/Kconfig
===================================================================
--- linux-x86.q.orig/drivers/net/Kconfig
+++ linux-x86.q/drivers/net/Kconfig
@@ -2022,7 +2022,7 @@ config E1000E
 	  will be called e1000e.
 
 config E1000E_ENABLED
-	def_bool E1000E != n
+	def_bool E1000E = y || ((E1000E != n) && (E1000 = E1000E))
 
 config IP1000
 	tristate "IP1000 Gigabit Ethernet support"
--
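[Editorial sketch, not part of the original thread.] The effect of the one-line Kconfig change is easiest to see as a truth table over the two tristate symbols. The following is a minimal sketch that models Kconfig expression evaluation (n < m < y, with `||` as maximum, `&&` as minimum, and comparisons yielding y or n) and evaluates the old and new `def_bool` expressions for E1000E_ENABLED. The function names are hypothetical; the expressions are copied from the patch.

```python
# Kconfig tristate values, ordered n < m < y.
N, M, Y = 0, 1, 2
NAMES = {N: "n", M: "m", Y: "y"}

def eq(a, b):
    """Kconfig '=' comparison: yields y or n."""
    return Y if a == b else N

def ne(a, b):
    """Kconfig '!=' comparison: yields y or n."""
    return Y if a != b else N

def enabled_old(e1000, e1000e):
    # def_bool E1000E != n
    return ne(e1000e, N) == Y

def enabled_new(e1000, e1000e):
    # def_bool E1000E = y || ((E1000E != n) && (E1000 = E1000E))
    # In Kconfig, '||' takes the maximum and '&&' the minimum.
    return max(eq(e1000e, Y), min(ne(e1000e, N), eq(e1000, e1000e))) == Y

if __name__ == "__main__":
    print("E1000 E1000E    old    new")
    for e1000 in (N, M, Y):
        for e1000e in (N, M, Y):
            print(f"{NAMES[e1000]:>5} {NAMES[e1000e]:>6}  "
                  f"{enabled_old(e1000, e1000e)!s:>5}  "
                  f"{enabled_new(e1000, e1000e)!s:>5}")
```

Under this model the two expressions differ only when E1000E=m and E1000 is not also m. For E1000=y, E1000E=m the symbol becomes unset, so (per the PCIE() macro mentioned earlier in the thread) the built-in e1000 keeps the PCI-Express IDs. The E1000=n, E1000E=m row also changes, but e1000 is not built in that case.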
Speaking as someone who's mostly (guestimated at 97% ;-) a user, I'd be in favor of a patch like this. The scenario I have in mind that would lead to exactly the situation Ingo is trying to solve is this: Fairly experienced user wants a kernel which supports his hardware without having to load modules, but wants other modules available "just in case". So he takes his distro kernel and selectively changes some modules to built-ins, including e1000. Next he upgrades to 2.6.25 and finds his NIC no longer works. Files bug reports all over the place and loads of people waste valuable time trying to help him. I doubt any of the people trying to help (who are trying to do so without access to the hardware) will soon think of this scenario. It's much more likely they'll get stuck on "but e1000e is available as a module, so it should get loaded, right". Maybe, just very maybe, someone, in an act of desperation will say "ok, try compiling in the module, see if that works". Or maybe they will stumble on this thread at some point. Anyway, I completely agree with Ingo that it would be really nice if all the wasted time and frustration could just be avoided. Cheers, FJP --
Nope, udev with normal distro config will load e1000e just fine. A custom system, without automatic loading of modules (or with unavailable module) could have this problem. A root fs on NFS or something like that, too. But while it can be changed for 2.6.25 it will break with 2.6.26 again, definitely. -- Krzysztof Halasa --
Not for us that don't compile modules at all, or only modules for the devices we actually have. Which should be about 99% of all kernel developers, because otherwise you're just wasting your time. I certainly want the configurations to be sane by default, because I'm not in the insane camp that has every single module and depend on udev picking the right one out. Linus --
That would work, too. -- Krzysztof Halasa --
No it wouldn't, not when the driver we used forever suddenly stopped supporting them. The fact that some *other* driver that I'd never ever enabled in my life suddenly supports them is irrelevant - it's not in my list of "hardware I have", and it's not even getting compiled. And no, I'm not talking about some theoretical "this could happen" thing. I hit exactly that with commit 040babf9d84e7010c457e9ce69e9eb1c27927c9e (I then thought that the new driver didn't even work for me, but that turned out to be an unrelated bug). It's very irritating when a working machine suddenly just stops working because some config option just changed its meaning. VERY irritating. Linus --
In this case e1000 stops supporting them only if you enable e1000e too. No e1000e, e1000 still does those IDs. For now, if I understand it correctly. And it seems to print a warning. Right. Now it's a different situation, though. -- Krzysztof Halasa --
On Thu, Apr 10, 2008 at 07:30:34AM -0700, Linus Torvalds wrote: If e1000e is not getting compiled, my understanding was the original e1000 Agreed. I like Ingo's Kconfig patch which forces both drivers (e1000 and e1000e) to be built the same way (ie both modules or both builtin). grant --
Yes. And the patch to do so was done by yours truly, exactly because I hit this thing ;) But it only works when the e1000e driver isn't enabled at all, which is why the "e1000e=m" case ends up being different, and Ingo then hit that one. Linus --
Uh, that's /not/ what Ingo's patch does. His patch makes e1000 claim the e1000e IDs if e1000 is built-in and e1000e is a module. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." --
Obviously e1000e hasn't been out for long enough to become common knowledge. (Both Ingo and Linus running into the problem is probably a sign...) Maybe it would make sense to have "e1000 implies setting e1000e to the same as e1000" for a couple releases, so that word gets around a bit more. Then you can remove the auto-select of e1000e and anyone that hasn't updated by then will get bit. Chris --
If this makes people happy then I am happy to ack this. --
so that's definitely _not_ what I would like to see at all. Matthew points out that this will just prolong users' use of e1000 instead of e1000e (which is what they should be encouraged to switch to in those cases). so I'm dropping my ACK. this patch doesn't solve anything and we'll have the same issue back when 2.6.26 ships, which will have no PCI Express device IDs in e1000. Auke --
why you want to cripple an existing, rather well working and popular Linux driver is beyond me. You have a wide array of measures if you want to migrate users to the new and shiny e1000e driver: you can stop adding _new_ IDs to the old driver, you can unsupport it, you can claim that it wont work in certain situations, you can print out messages to the user in the dmesg (if those messages are true), you can even remove IDs from it if the user has the new driver enabled. But what you cannot do is to intentionally cripple a popular driver. It's plain stupid. It does not matter how many times you've announced it, it's still madness. Unless your goal is to reduce the Linux userbase as quickly as possible that is ... ;-) And please understand: _you_ are the maintainer of this code so _please_, if you wish to do so, solve the problem differently, but dont just stand there _talking_. I gave you ample feedback about what the problem is (which you initially denied to even exist) and i even wrote a patch. You might never use e1000=y && e1000e=m or e1000=y && e1000e=n huh? How can you claim that?? It definitely solved my problem. Did you ... and not changing existing behavior for a perfectly well working system is exactly what compatibility and smooth migration is about. New drivers need several kernel releases to be fully known, to be fully trusted and to be fully accepted and integrated - and not the least, to be fully tested ... These are all well-known principles. It's nothing new at all and there's nothing special about it: dont break existing drivers and setups and dont create silent side-effects between drivers. Ingo --
Because we decided a long time ago to do this driver split. And everyone at that time agreed with that, and we set out to do this. And part of that plan was to move (not copy) the device IDs over. We accepted that that might break some kernel developers' systems in the process and consulted several vendors and distros if they were OK with the change and they all agreed with the plan.

I do not want people with PCI Express e1000 cards to use e1000 for any day longer than is strictly needed, and I certainly do not want to prolong the period where both drivers could work on their adapters. That will be a far bigger nightmare for me than just a few kernel developers having a bad day. I guarantee, I will get e-mails about 2.6.25+e1000(e) issues for far longer than you guys :)

Users will outnumber us kernel developers in complaints if we keep the situation unclear to them, and we already told them that they need to switch to e1000e for their PCI Express devices. If we now do stuff like what you proposed in that patch, we just prolong this confusion. That cannot be good for anyone. Imagine if distros start picking random device IDs or worse. Stuff like that is already happening, and discussions like these just add to the confusion.

Again - If there is a way to auto-enable e1000e in the right way so that more systems migrate better then I'm all for it (even if forcing E1000E=y). But it seems that the various patches proposed don't cut it and frankly Kconfig is completely inadequate as a hardware enabling script since it knows absolutely nothing about the hardware in the first place. And it wasn't meant for that either. `make oldconfig` is not the answer ;).

Again - this has happened before, I remember many of my boxes not booting because SATA Kconfig options changed and all my boxes failed to move the proper Kconfig symbols over when I ran `make oldconfig` myself. Somewhere around 2.6.20 or so.

Auke
--
that's an insane argument ... because we messed up in the past and have hurt users (and probably lost users) you feel like it gives you a free card to mess up again???

The IDE -> SATA migration, while i like the new SATA code and find it excellent and well-maintained (many kudos for that to Tejun, Jeff, Alan & co), caused a lot of trouble for users in one specific area, for no good reasons other than stupid personality conflicts: /dev/hda worked just as well as /dev/sda, the _name_ of the device should never have been changed.

So if you use _that_ aspect of the (otherwise cool) SATA/PATA code as a blueprint for the e1000 -> e1000e migration then you are on the worst possible track in terms of picking a role model ;-) It's as if you adored Sylvester Stallone for his vivid mimics, Jean-Claude Van Damme for his excellent acting skills and Paris Hilton for her brilliant brain.

really, just because you do exceedingly good things to Linux does not give you a free card to do something bad to Linux in exchange. The two do not cancel out each other - because the bad things _add up_ and drive away users, irreversibly.

To you e1000 is the center of the universe so you feel the price is worth paying. For others it is not. We want the good things from you and we'll say no thanks to the bad ideas. Kernel developers, especially old-timers, regularly forget about that.

Ingo
--
Hey, hey calm down. The devices moving over to e1000e should never have been added to e1000. They're totally different and the only reason they got added in the first place was because someone talked intel into it. We discussed this a long time and came to a wide agreement it should move out. Now the actual transition could and should have been handled better, but with all the pci-e hardware in a separate driver we're all better off in the long term. And this is not really comparable to the libata transition at all, there's no user-visible change. For every distro kernel that just builds both drivers it's a completely seamless transition, and for people who build their own kernel we should find some Kconfig trickery to make the transition easier. For example we could just build e1000e when CONFIG_E1000 is set and spill a warning that starting from 1.1.2009 ---end quoted text--- --
firstly, a good deal of our alpha testers use =y drivers. Secondly, your
kind of constructive email is exactly what i wanted to see in the first
place...
i dont really care _how_ this gets solved - i'm not maintaining this
code. What forced me to deal with it was this outright denial of my
problem, the ridiculing and NACK-ing of it and general stonewalling.
I'd have preferred to have sent only my first report. The networking
driver guys on the other hand:
1) forced me to send a full bugreport about something that i described
adequately in my very first mail already, and which they should have
immediately recognized, based on the trouble they had with Linus. (i
wasnt aware of that back when i made my report)
2) repeatedly denied that there is any problem. Claimed that "this is a
careful migration balance we decided" and other babbling.
3) forced me to write a patch for code they are supposed to be
maintaining to actually get things moving.
4) moved the regression bugzilla entry to REJECTED+INVALID without
actually resolving the bug and forced me to write several comments
there too. (See http://bugzilla.kernel.org/show_bug.cgi?id=10427)
5) forced me to write 20 mails with still no clear resolution yet at
this point.
it's insane and i'm really curious what kind of language you'd use in
your replies if i ever forced you through such an exercise in arch/x86
or the scheduler ;-)
and no, it wasnt a case of miscommunication. My bug was i think
well-understood in the very first mailings already, but it was
discounted as unimportant and resolution was delayed all the way up
until this point. That shows fundamental insensitivity to bug reporters
which is more worrisome than the bug itself (the bug is fairly minor and
i never claimed otherwise).
Hours of my time wasted on something that should have been a 2 minutes
matter - and yes, as i go through these chores i do get increasingly
annoyed about it, and rightful...

this is a gross misrepresentation and misunderstanding. You're completely ignoring the fact that:

(1) I debated whether it was a "regression" - in my opinion design changes that deliberately break things are hardly worth this incredibly negative stamp
(2) I never NACK'ed your patch. I just withdrew my ack.
(3) You're stonewalling me by pretty much forcing me to completely drop the driver split and not showing any understanding for the reason behind the split at all.

You don't provide a solution, nor does anyone, and I don't see any solution to what you want but to completely cancel this driver split. And I'm _not_ going to do that.
--
Maybe you could rename both configuration switches, so that you bring it to the attention of everybody who does make oldconfig?

Have a nice fortnight
--
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
main(){char *s="main(){char *s=%c%s%c;printf(s,34,s,34);}";printf(s,34,s,34);}
--
As a start we could do two drivers keyed off a single Kconfig variable, and then find a way to inform users that they might need to enable the other one.
--
I think that's a great solution. Here's a suggested patch. Not much tested, but it's fairly obvious.

It basically makes one top-level config option (E1000) to pick the driver at all, and two sub-options (E1000_PCI and E1000_PCIE) that you can choose between. If you pick E1000 support, you're given the choice between "PCI only", "PCI-E only" or "support both", and that will then pick the right combination of support for E1000_PCI and E1000_PCIE.

This also does imply that you cannot mix the "module-ness" of the two drivers, because you choose whether the E1000 support (in general) is going to be a module or built-in, and that choice will automatically affect the sub-choices.

I do think that this makes the whole driver status much more obvious.

(It does mean that if you chose E1000E before, and chose _not_ to support E1000 at all, you will now not even be asked about PCI-E support, because you've effectively said "no" to E1000 support in the first place. If we want to avoid that, then the top-level E1000 config variable should probably be renamed to E1000_SUPPORT or something like that).

Linus
---
 drivers/net/Kconfig         |   52 +++++++++++++++++++++++-------------------
 drivers/net/Makefile        |    4 +-
 drivers/net/e1000/Makefile  |    2 +-
 drivers/net/e1000e/Makefile |    2 +-
 4 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 3a0b20a..6968e20 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1979,9 +1979,35 @@ config E1000
 	  To compile this driver as a module, choose M here. The module
 	  will be called e1000.
 
+choice
+	prompt "E1000 bus type support"
+	depends on E1000
+	default E1000_BOTH
+	help
+	  Choose PCI or PCI-E support for E1000 driver
+
+config E1000_PCI_ONLY
+	bool "Support only older E1000 PCI cards"
+
+config E1000_PCIE_ONLY
+	bool "Support newer E1000 PCI-E cards"
+
+config E1000_BOTH
+	bool "Support all E1000 cards"
+
+endc...
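
For readers following the thread, the scheme described in this mail (one top-level E1000 tristate, a bus-type choice, and two derived sub-options) can be modelled roughly in Python. This is only an illustrative sketch and not part of the patch: the function name `resolve` and the choice labels `pci_only`, `pcie_only` and `both` are made up for the example, and the mapping is taken from the prose description alone, since the diff is cut off before the sub-options appear.

```python
# Hypothetical model of the proposed scheme; names are illustrative only.
# Tristate values follow Kconfig conventions: "n" (off), "m" (module), "y" (built-in).

VALID_TRISTATE = ("n", "m", "y")
VALID_CHOICE = ("pci_only", "pcie_only", "both")


def resolve(e1000, bus_choice="both"):
    """Return (E1000_PCI, E1000_PCIE) for a top-level value and a bus choice.

    Both sub-options inherit the top-level value, so the two drivers
    cannot differ in "module-ness", as the mail points out.
    """
    if e1000 not in VALID_TRISTATE:
        raise ValueError("e1000 must be one of n/m/y")
    if bus_choice not in VALID_CHOICE:
        raise ValueError("unknown bus choice")
    if e1000 == "n":
        # Saying "no" at the top level hides the choice entirely.
        return ("n", "n")
    pci = e1000 if bus_choice in ("pci_only", "both") else "n"
    pcie = e1000 if bus_choice in ("pcie_only", "both") else "n"
    return (pci, pcie)


if __name__ == "__main__":
    for top in VALID_TRISTATE:
        for choice in VALID_CHOICE:
            print(top, choice, resolve(top, choice))
```

The `e1000 == "n"` branch corresponds to the caveat in the parenthetical above: someone who previously enabled only E1000E would not be asked about PCI-E support at all.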
Wouldn't it make more sense to turn E1000 into an option that does nothing except select both E1000E and E1000_PCI, and have those two be the options that build drivers? Then, after a while, we drop the E1000 option entirely, and people are fine as long as they used any of the kernels in between (since the system will have forgotten that E1000E was only set by an option that has disappeared).

Right now, E1000 means "support both PCI and PCI-E E1000" and E1000E means "support PCI-E E1000". I don't see any reason not to add a "support PCI E1000" option and keep the semantics of existing options the same and just change the implementation. AFAICT, this makes "make oldconfig" always give the same support that the earlier kernel had, and people can set it to what they actually want if they notice.

I.e., something like this (plus removing the ID-stealing in e1000):

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f337800..9078bde 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1955,6 +1955,11 @@ config DL2K
 
 config E1000
 	tristate "Intel(R) PRO/1000 Gigabit Ethernet support"
+	select E1000_PCI
+	select E1000E
+
+config E1000_PCI
+	tristate "Intel(R) PRO/1000 PCI Gigabit Ethernet support"
 	depends on PCI
 	---help---
 	  This driver supports Intel(R) PRO/1000 gigabit ethernet family of
@@ -1976,7 +1981,7 @@ config E1000
 
 config E1000_NAPI
 	bool "Use Rx Polling (NAPI)"
-	depends on E1000
+	depends on E1000_PCI
 	help
 	  NAPI is a new driver API designed to reduce CPU and interrupt load
 	  when the driver is receiving lots of packets from the card. It is
@@ -1990,7 +1995,7 @@ config E1000_NAPI
 
 config E1000_DISABLE_PACKET_SPLIT
 	bool "Disable Packet Split for PCI express adapters"
-	depends on E1000
+	depends on E1000_PCI
 	help
 	  Say Y here if you want to use the legacy receive path for PCI express
 	  hardware.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3b1ea32..8026e63 100644
--- a/driver...
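
The effect of the two `select` lines on an existing configuration can be sketched in Python. This is a simplified model, not kernel code: it relies only on the documented Kconfig rule that `select` raises the selected symbol to at least the value of the selecting symbol, and it ignores other dependencies such as `depends on PCI`. The function names and the `e1000_pci` parameter (standing in for whatever the user answers at the new prompt) are assumptions made for the example.

```python
# Simplified, hypothetical model of "E1000 selects E1000_PCI and E1000E".
# Tristate ordering in Kconfig: n < m < y.

ORDER = {"n": 0, "m": 1, "y": 2}


def tri_max(a, b):
    """Return the larger of two tristate values."""
    return a if ORDER[a] >= ORDER[b] else b


def apply_select(e1000, e1000e, e1000_pci="n"):
    """Return the effective driver options for an old config.

    e1000 and e1000e are the values carried over from the old .config;
    e1000_pci is the (assumed) answer to the new E1000_PCI prompt.
    """
    return {
        "E1000_PCI": tri_max(e1000_pci, e1000),
        "E1000E": tri_max(e1000e, e1000),
    }


if __name__ == "__main__":
    # Old config with only the original driver built in.
    print(apply_select("y", "n"))
    # Config from the regression report: E1000=y, E1000E=m.
    print(apply_select("y", "m"))
    # Config that enabled only the new driver as a module.
    print(apply_select("n", "m"))
```

Under this model a configuration with E1000=y always ends up with both drivers built in, which is the "same support as the earlier kernel" property the mail argues for.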
