Re: [PATCH 1/2] pci: add misrouted interrupt error handling

Previous thread: [PATCH 2/3] posix-timers: always do get_task_struct(timer->it_process) by Oleg Nesterov on Sunday, August 3, 2008 - 10:49 am. (1 message)

Next thread: [PATCH 2/2] fusion: Implement generic interrupt misroute handling by James Bottomley on Sunday, August 3, 2008 - 11:05 am. (7 messages)
From: James Bottomley
Date: Sunday, August 3, 2008 - 11:02 am

We're getting a lot of storage drivers blamed for interrupt misrouting
issues.  This patch provides a standard way of reporting the problem
... and, if possible, correcting it.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 drivers/pci/Makefile |    3 +-
 drivers/pci/irq.c    |   60 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |    7 +++++
 3 files changed, 69 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/irq.c

diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 7d63f8c..19dacb8 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -3,7 +3,8 @@
 #
 
 obj-y		+= access.o bus.o probe.o remove.o pci.o quirks.o slot.o \
-			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o
+			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
+			irq.o
 obj-$(CONFIG_PROC_FS) += proc.o
 
 # Build PCI Express stuff if needed
diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
new file mode 100644
index 0000000..6441dfa
--- /dev/null
+++ b/drivers/pci/irq.c
@@ -0,0 +1,60 @@
+/*
+ * PCI IRQ failure handing code
+ *
+ * Copyright (c) 2008 James Bottomley <James.Bottomley@HansenPartnership.com>
+ */
+
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
+{
+	struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
+
+	dev_printk(KERN_ERR, &pdev->dev,
+		   "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
+		   parent->dev.bus_id, parent->vendor, parent->device);
+	dev_printk(KERN_ERR, &pdev->dev, "%s\n", reason);
+	dev_printk(KERN_ERR, &pdev->dev, "Please report to linux-kernel@vger.kernel.org\n");
+	WARN_ON(1);
+}
+
+/**
+ * pci_lost_interrupt - reports a lost PCI interrupt
+ * @pdev:	device whose interrupt is lost
+ * 
+ * The primary function of this routine is to report a lost interrupt
+ * in a standard way which users can recognise (instead of ...
From: Matthew Wilcox
Date: Sunday, August 3, 2008 - 7:51 pm

Will the dev_printk() strings be included in the kerneloops report?  And
what if there is no parent of the device?  Consider device 00:02.0 on my
laptop:

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
        Subsystem: Fujitsu Limited. Device 13fe
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0

Couldn't this be written more concisely as:

	if (pdev->msix_enabled) {
		pci_note_irq_problem(pdev, "MSIX routing failure");
		return PCI_LOST_IRQ_DISABLE_MSIX;
	}
	if (pdev->msi_enabled) {
		pci_note_irq_problem(pdev, "MSI routing failure");
		return PCI_LOST_IRQ_DISABLE_MSI;
--

From: James Bottomley
Date: Sunday, August 3, 2008 - 8:46 pm

No, but some of them take the prior strings (depending on the

There is always a parent device ... it might be the PCI root device, in

The idea was to separate the cases in case something extra needs be
done.  I think it's pretty much identical as far as the compiler
optimises, and therefore probably not worth worrying about much.

James


--

From: Grant Grundler
Date: Sunday, August 3, 2008 - 9:30 pm

James,
Can you add a paragraph to Documentation/pci.txt about the usage
of the new API?

thanks,
--

From: James Bottomley
Date: Monday, August 4, 2008 - 6:31 am

When there actually is a new API, sure.

James


--

From: Bjorn Helgaas
Date: Monday, August 4, 2008 - 1:43 pm

Do you prefer "dev_printk(KERN_ERR, ...)" over "dev_err(...)"?  Easier
to grep for the former, maybe?  If so, should we deprecate "dev_err()"
and friends?  When I converted most of the PCI core to use dev_printk(),
(80ccba1186d48f ...) I used dev_err(), but I don't really care one way
or the other.

Maybe use pci_name(parent)?

I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.

Bjorn

--

From: Matthew Wilcox
Date: Monday, August 4, 2008 - 2:35 pm

That's unfortunately different from lspci -nn:

00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960
Memory Controller Hub [8086:2a00] (rev 03)

Sorry, I should have reviewed your patches ;-(

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--

From: Bjorn Helgaas
Date: Monday, August 4, 2008 - 3:20 pm

I can change those.  I just copied what seemed to be the most
prevalent.  But lspci is harder to change, so we should probably
follow that.
--

From: James Bottomley
Date: Monday, August 4, 2008 - 5:02 pm

To be honest I'm not really interested too much in the various API
preferences ... they can be fixed up later by the people who care.

What I am interested in is whether we can get this interface to give
sufficient information to blacklist the offending motherboard if we can
identify it as the source of the problem.  Since the usual culprit is
the bridge, that's why I'm doing parent vendor/device pairs.  However,
better suggestions for the information that should be displayed would be
gratefully received.

James


--

From: Bjorn Helgaas
Date: Tuesday, August 5, 2008 - 8:42 am

I'm happy to fix it up later if you prefer.  I only mentioned it
because doing it later adds churn and risk of breakage.

Bjorn
--

From: Jesse Barnes
Date: Tuesday, August 5, 2008 - 10:03 am

This seems to be a function that just returns what type of IRQ you're using or 
how it's routed and it isn't necessarily "lost interrupt" specific.

This information is clearly useful to drivers both for informational purposes 
and for debugging problems, so on that front it looks good.  However, I think 
it should probably be called pci_interrupt_type(struct pci_dev *dev) or 
something instead, and just return an enum of either MSIX, MSI, or LINE 
(though I'm open to better names).  From that, the driver can infer what 
might be going wrong, though in the case of a LINE interrupt, all you can 
really do is report that there's probably a platform IRQ routing problem.

Jesse
--

From: James Bottomley
Date: Tuesday, August 5, 2008 - 1:44 pm

So perhaps this routine should only note but not advise?  The drivers

The only thing that this can't do is ACPI ... but on the other hand once
the IRQ routing is set by ACPI I'm not sure the driver can do anything
to fix it.

So ... depending on how the feedback goes, I'll plan to reroll this as a
note but not advise function.

James



--

From: Jesse Barnes
Date: Tuesday, August 5, 2008 - 1:53 pm

If it's just a pci_irq_type function then it probably shouldn't print 
anything, and leave that to the caller, since it might be used for other 
purposes too (e.g. a driver load printk or something).  In the lost interrupt 
case you already have to disable MSI or MSIX in the Fusion driver, so you may 
as well put the printk there, right?  I guess I'm saying it should neither 

Yeah, ACPI may or may not be the problem, all you'll really know is that 
you've got a line interrupt that you failed to get...  The driver won't be 
able to do much in that case aside from complain loudly.

Thanks,
Jesse
--

From: James Bottomley
Date: Tuesday, August 5, 2008 - 1:56 pm

Well, no; the object was to have the layer that knew (PCI) print
information which could be used to identify the problem.  Likely what
the driver will say is something like "MSI isn't working and it's not my
fault".  What I want is for PCI to print something that may be helpful
to people trying to diagnose the problem.  Driver writers aren't going

Yes ... it's just that when line interrupts fail (especially if they
worked previously) it's usually ACPI to blame.

James


--

From: Jesse Barnes
Date: Tuesday, August 5, 2008 - 2:15 pm

Yeah, I understand what you're trying to do, and given that there's only one 
user of this function (your fusion patch), I don't have a strong preference.

However, the function is really just telling the driver what type of IRQ a 
given PCI device currently has; it's not really a "lost interrupt" function 
at all.  So to me, it makes sense to just generalize it into a pci_irq_type 
function and let drivers do what they will with it.  It looks like this is 
the real lost interrupt detection:

+               /* May fail becuase of IRQ misrouting */
+               rc = mpt_get_manufacturing_pg_0(ioc);
+               if (rc) {
+			...
+                       printk(MYIOC_s_ERR_FMT "Cannot recover IRQ routing\n",
+                              ioc->name);
+                       return -1;
+               }

not the pci_lost_interrupt() function.  So it could just as easily be written 
as such:

+               /* May fail becuase of IRQ misrouting */
+               rc = mpt_get_manufacturing_pg_0(ioc);
+               if (rc) {
+			/* Lost an IRQ, see what type... */
+			int irq_type = pci_irq_type(dev);
+
+                       if (type == PCI_IRQ_MSI) {
+				/* Lost an MSI interrupt, try re-config w/o MSI */
+                               free_irq(ioc->pci_irq, ioc);
+                               ioc->msi_enable = 0;
+                               pci_disable_msi(ioc->pcidev);
+                               goto retry;
+                       }
+			/* This platform is just broken... */
+                       printk(MYIOC_s_ERR_FMT "Cannot recover from %s IRQ
+			       routing error\n", pci_irq_type_name(type), ioc->name);
+                       return -1;
+               }

or somesuch.  That seems just as simple for driver writers as your initial 
patch, and the function is named in accordance with what it actually does, 
rather than what it's used for...

Jesse
--

From: James Bottomley
Date: Tuesday, August 5, 2008 - 2:54 pm

It could, but if the bridge is the culprit (as it usually is for MSI
problems), this print won't help identify it.

Therefore, rather than give driver writers a recipe for "print this and
this and go to the bridge and print this", I'd rather have a single PCI
callback that prints all the (hopefully) relevant information that will
allow either fixing or blacklisting.

James

--

From: Jesse Barnes
Date: Thursday, August 7, 2008 - 9:03 am

So in addition to the IRQ type check we need to dump some device topology 
information... yeah that makes sense.  I wonder if the driver core should 
provide something like this.  Greg?  In the meantime we can definitely add 
the IRQ type function.

Thanks,
Jesse
--

From: Greg KH
Date: Thursday, August 7, 2008 - 10:20 am

What kind of topology do you need that is not already provided?

thanks,

greg k-h
--

From: James Bottomley
Date: Thursday, August 7, 2008 - 10:36 am

We really need to print how the device interrupt is routed.  But,
unfortunately, we need to identify the devices in the layout (bridges
and the like by say device/vendor numbers) so we know what they are ...
this is the bit that the generic device model can't do because it's
embedded in the bus specific part.

James


--

From: Jesse Barnes
Date: Thursday, October 23, 2008 - 2:55 pm

Ok, per our discussion at the PCI BoF I applied this, so now you can apply the 
fusion and other bits to help debug issues.

Thanks,
Jesse
--

Previous thread: [PATCH 2/3] posix-timers: always do get_task_struct(timer->it_process) by Oleg Nesterov on Sunday, August 3, 2008 - 10:49 am. (1 message)

Next thread: [PATCH 2/2] fusion: Implement generic interrupt misroute handling by James Bottomley on Sunday, August 3, 2008 - 11:05 am. (7 messages)