From: Jesse Brandeburg <jesse.brandeburg@intel.com>
Test the MSI interrupt physically once before assuming that it
actually works. Several platforms have already come across that
have non-functional MSI interrupts and this code will attempt
to detect those safely. Once the test succeeds MSI interrupts
will be enabled.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---
drivers/net/e1000e/defines.h | 1
drivers/net/e1000e/e1000.h | 1
drivers/net/e1000e/netdev.c | 161 ++++++++++++++++++++++++++++++++++++++++--
3 files changed, 155 insertions(+), 8 deletions(-)
diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h
index a4f511f..c3593f3 100644
--- a/drivers/net/e1000e/defines.h
+++ b/drivers/net/e1000e/defines.h
@@ -385,6 +385,7 @@
/* Interrupt Cause Set */
#define E1000_ICS_LSC E1000_ICR_LSC /* Link Status Change */
+#define E1000_ICS_RXSEQ E1000_ICR_RXSEQ /* rx sequence error */
#define E1000_ICS_RXDMT0 E1000_ICR_RXDMT0 /* rx desc min. threshold */
/* Transmit Descriptor Control */
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 4bf0c6c..556b258 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -305,6 +305,7 @@ struct e1000_info {
#define FLAG_MSI_ENABLED (1 << 27)
#define FLAG_RX_CSUM_ENABLED (1 << 28)
#define FLAG_TSO_FORCE (1 << 29)
+#define FLAG_MSI_TEST_FAILED (1 << 30)
#define E1000_RX_DESC_PS(R, i) \
(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index f501dd5..41d57da 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -932,25 +932,30 @@ static irqreturn_t e1000_intr(int irq, void *data)
static int e1000_request_irq(struct e1000_adapter *adapter)
{
struct net_device *netdev = ...Ah, the perennial add-same-test-to-every-driver conundrum. I think we are far enough along with MSI to _not_ do this anymore in drivers. The platforms with MSI problems should be discovered, made public, and worked around. Otherwise you hide the same problem, just for someone else to discover with another component of the machine. Jeff --
ok, that's reasonable as I have not recently seen any "new" reports against the current trees. We have to keep this patch in our out-of-tree version around though, since a lot of distros still ship kernels without all the new MSI quirks. But that's not for this forum. Thanks, Auke --
Actually, I'm hoping you'll allow this Jeff, we have a production system (see below) we know about that doesn't like the way 82571 formats MSI interrupt messages. All other systems seem to be okay with this format of MSI messages, but this system implemented a stricter interpretation of the spec, and so even though that system doesn't need a quirk for MSI because MSI works in general, we still MUST test the MSI vector to make sure it works *for us* In this case it comes down to being an errata workaround. Since there is no way to "test" generation of an interrupt from any specific hardware device without internal knowledge of said device, there isn't a way for us to help the kernel by writing a generic "test MSI" routine. I would prefer this "generic test" code be in the driver rather than having to identify all the chipsets that fail and have the driver do This is our workaround, it is our fault, the incompatible chipset is in We had avoided this test in the past based on your recommendation but in this case I don't see a way around it, do you? I'm open to other suggestions but we need to solve the problem somehow in an *automated* way. IBM requested we do it this way. Jesse --
Well if it's a problem with the networking chip rather than the platform, then absolutely stick it in the driver (unless its so severe it needs to be in pci/quirks.c before the driver even loads). But that seems like a quick id test, with no need for all that generic MSI test code. Certainly the work is scoped based on where the problem lies, either platform, device, or platform+device. Jeff --
From: Jeff Garzik <jeff@garzik.org> Yes, that is the way to do it. --
I get your point, but this seems a maintainance problem due to not being able to "future proof" the solution. I know what is (IDs) available now, but I don't know how many systems in the future IBM will release with a similar bridge but a different device ID that causes the same issue. Should we take on the maintenance of continually having to add every new bridge device that has this issue to our driver? Users just want this stuff to work when they plug it in. tg3.c has exactly this kind of test and workaround (in fact its where I got the code) to work around the same kind of issues. --
From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com> This is exactly what the PCI quirks list is and it works just fine. I know the truth is that Intel as a vendor frowns upon putting into it's driver a list of another vendor's PCI IDs as errata items because it looks bad. So just be honest about that. That is from an era when the situation was much different. --
Future-proofing in that way is a pipe dream. You hope to predict what _errata_, what out-of-spec behavior future hardware will have. Trying to code for such N^M possible futures will As David noted, we touch quirks.c all the time for various platform eccentricities. Adding a new id is easy and takes two seconds. The same ease of change applies to any driver-local list of ids, too. Anyway, I think a better question to ask is: should we bloat up every driver testing for platform quirks found on a minority of platforms? Moreover, "it doesn't work" type errata is typically fixed in future chip generations -- making any such generic test /less/ valuable, because of the lower likelihood that IBM will continue to release this buggy hardware for decades. We have an existing "this bridge and MSI don't get along" list. Adding an id is a one-line patch. Jeff --
well it's just slightly more complex than just that... the bridge works OK with MSI for every device except 82571 so 82572, 82573 e1000's work fine as well as all the newer pci-e e1000's, or any of the non-e1000 hardware that uses MSI interrupts. so disabling MSI entirely for that chipset by quirking it is not what we want to do in the first place. hence the test :) the only alternative I see is that we have a quirk in the e1000 driver that acts when the combination of an 82571 and this particular chipset is found, but I have no information ATM on the chipset id's yet and on how to identify this configuration. Auke --
From: "Kok, Auke" <auke-jan.h.kok@intel.com> And this is how you will fix this bug, once you know these IDs. --
From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com> Then you look for that system chipset variant in the e1000e driver and not use MSI when it is found. A generic test really is not warranted. --
I've seem similar problems on other systems, so this would be a nice bit of code to have somewhere. I can see Jeff's argument for having this outside of drivers, but to make my life easier I'd like to see this in e1000e (and e1000!). :-) I suppose an more general alternative would be to make this code run as an ethtool test (current or new one) and then have a module option to disable/enable MSI so all could be done in userspace, but something automatic like this sure would be great. --
So you plan to drop the bits around the device additions that are related to e1000e config sometime soon? --
yes I was hoping to do this for 2.6.26 already, if not 2.6.27. I definately don't want to have that linger. Auke --
