Yes, you are right, finding the root cause is better. Thank you
for all your suggestions and information.
Since we have little experience with PCI and MSI here, we had to try
disabling MSI until we found a better solution. But as you are giving
I'm using kernel 2.6.23-rc5 to debug this MSI problem. It can NOT
boot on our Trevally board (RS690+SB700) without kernel modification.
But if I comment out all the pci_intx() function calls in
drivers/pci/msi.c, it boots with MSI enabled, as you expected!
# cat /proc/interrupts
CPU0 CPU1
0: 318 174060 IO-APIC-edge timer
8: 0 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
16: 0 204 IO-APIC-fasteoi HDA Intel
17: 0 479 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb6
18: 1 2 IO-APIC-fasteoi ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
19: 0 0 IO-APIC-fasteoi ehci_hcd:usb7
22: 4 1 IO-APIC-fasteoi yenta
8412: 0 1315 PCI-MSI-edge eth0
8413: 381 4858 PCI-MSI-edge ahci
NMI: 0 0
LOC: 174285 174210
ERR: 0
Also, if I keep the pci_intx() calls in drivers/pci/msi.c and ONLY
comment out the pci_intx() call in drivers/ata/ahci.c,
my system can boot with MSI enabled too!
So does it mean that the root cause is our SB700 SATA controller
has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
register masks MSI interrupts too?
And what is the software solution or workaround?
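For reference, the pci_intx() calls being commented out amount to a
read-modify-write of bit 10 (INTX_DISABLE) in the PCI COMMAND register.
What follows is only a minimal userspace sketch of that logic, not the
kernel's code: struct fake_dev and sketch_intx() are hypothetical names,
and the config-space access is replaced by a plain struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 10 of the PCI COMMAND register (INTX_DISABLE). */
#define CMD_INTX_DISABLE 0x0400u

/* Hypothetical stand-in for a device's 16-bit COMMAND register. */
struct fake_dev {
	uint16_t command;
	unsigned writes;	/* number of register writes issued */
};

/*
 * Sketch of a pci_intx()-style helper (assumed behaviour):
 * enable != 0 clears INTX_DISABLE, enable == 0 sets it.
 * The register is written back only when its value changes.
 */
static void sketch_intx(struct fake_dev *dev, int enable)
{
	uint16_t old_cmd, new_cmd;

	assert(dev != NULL);
	old_cmd = dev->command;
	if (enable)
		new_cmd = (uint16_t)(old_cmd & ~CMD_INTX_DISABLE);
	else
		new_cmd = (uint16_t)(old_cmd | CMD_INTX_DISABLE);

	if (new_cmd != old_cmd) {
		dev->command = new_cmd;
		dev->writes++;
	}
}
```

Calling sketch_intx(&dev, 0) is the step that, on the controller described
above, appears to silence MSI as well as INTx#.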
I will continue debugging this MSI problem next week. If you have any
suggestions, please don't hesitate to tell us.
Thanks
Best Regards
Shane
-
As someone else pointed out, AMD should have *lots* of people with pci and msi experience on the payroll. (Folks here buy AMD-designed Not sure. Sounds like the device driver needs a quirk for this part. The over-worked Jeff Garzik is the maintainer for that driver. You should probably provide the pci device id for this beast. --linas -
Take a look at tg3.c net driver change 2fbe43f6f631dd7ce19fb1499d6164a5bdb34568 which is a similar situation. However, it may turn out that removing the pci_intx() stuff as a general rule is easier than quirking these devices, if enough of them turn out to have this hardware bug. The tg3.c change should illustrate how to fix immediately, though. Jeff -
We'd have to count how many have this bug vs. how many will emit both intx and msi unless pci_intx is cleared, and then how many do that regardless of pci_intx :-) yuck Ben. -
At a first approximation, ATI/AMD devices don't send any interrupts if intx is disabled, nVidia devices send legacy interrupts in addition to MSI ones if intx isn't disabled, and Intel devices actually work correctly. So we need at least one kind of device quirk for intx and msi. (And doing it in the drivers doesn't work, since everybody is making things driven by snd_hda_intel and would like msi, afaict) -Daniel *This .sig left intentionally blank* -
Note that INTX_DISABLE is a recent addition to PCI. Older PCI devices support neither MSI nor INTX-disable, so make sure such devices don't creep into your sample. In general it is documented that INTX_DISABLE should apply only to INTx# so devices that disable MSI based on that bit are out of spec. But unfortunately that is rather irrelevant, since we see these out-of-spec devices in the field today. Jeff -
I have a device that supports MSI and INTX-disable, and, with MSI on (and delivering interrupts successfully) also sends legacy interrupts (on the IRQ that is no longer associated with the device) unless INTX is disabled. Without the intx_disable(), the kernel disables the IRQ entirely and breaks a random other device in my system. It's: 00:07.0 Bridge: nVidia Corporation MCP61 Ethernet (rev a2) I haven't tried MSI with the other devices in the system, but I expect that this: 00:05.0 Audio device: nVidia Corporation MCP61 High Definition Audio (rev a2) It's likewise documented (although maybe arguable in wording) that the device shouldn't send legacy interrupts if MSI is in use, regardless of INTX_DISABLE, but this also happens in the field. I think that the current Linux behavior with respect to INTX_DISABLE is simply due to which hardware bug was present in the device whose driver first got Linux support, but one or the other or both needs a quirk, since there's no behavior that works with everything. And it's still impossible to tell which bug is more common, since MSI isn't used most of the time, even if the hardware supports it, so it's pretty arbitrary which way Linux goes in the non-quirk case. -Daniel *This .sig left intentionally blank* -
I think MCP55 HDA did not have such problem, though I may be wrong (AFAIR it worked with shared IRQ and with MSI). -- Krzysztof Halasa -
From: Daniel Barkalow <barkalow@iabervon.org>

I think this pretty much sums up the situation accurately.

My suggestion is:

1) Leave the pci_intx() twiddling code in drivers/pci/msi.c
2) Add quirks for "INTX_DISABLE turns off MSI too", this sets a flag in the pci_dev.
3) The pci_intx() calls in drivers/pci/msi.c are skipped if this flag from #2 is set.
4) Add quirk entries for drivers/net/tg3.c chips and these SATA devices we are learning about here, as well as any others we are aware of right now.
5) Remove the pci_intx() workaround code from drivers/net/tg3.c and elsewhere.
-
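The quirk-flag flow suggested above can be sketched in plain userspace C.
This is an illustration under stated assumptions, not the patch that was
later posted: struct sketch_dev, the intx_disable_breaks_msi flag, the
quirk table and its IDs are all hypothetical placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_INTX_DISABLE 0x0400u	/* COMMAND register, bit 10 */

/* Hypothetical, cut-down device record; the real struct pci_dev differs. */
struct sketch_dev {
	uint16_t vendor, device;
	uint16_t command;
	bool msi_enabled;
	bool intx_disable_breaks_msi;	/* hypothetical quirk flag */
};

struct sketch_quirk {
	uint16_t vendor, device;
};

/* Placeholder IDs only; these are NOT real Tigon3 or SB700 IDs. */
static const struct sketch_quirk quirk_table[] = {
	{ 0xAAAA, 0x0001 },
	{ 0xBBBB, 0x0002 },
};

/* A quirk match sets the flag on the device. */
static void sketch_apply_quirks(struct sketch_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == dev->vendor &&
		    quirk_table[i].device == dev->device)
			dev->intx_disable_breaks_msi = true;
	}
}

/* The MSI enable path skips the INTx disable when the flag is set. */
static void sketch_enable_msi(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->msi_enabled = true;
	if (!dev->intx_disable_breaks_msi)
		dev->command |= CMD_INTX_DISABLE;
}
```

Keeping the decision in one flag means individual drivers would not need
their own workaround, which is the point of the last step in the list.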
Seems right to me, and pretty straightforward, except that I don't really understand the pm-related logic in there to know how that should work and whether intx will need to be enabled somewhere in addition to not disabling it in the msi enable code. -Daniel *This .sig left intentionally blank* -
This quirk seems good to me. Waiting for your final decision.... This SB700 SATA controller MSI/INTx problem has been reported to our hardware team. I will forward the update information or response to you when I get any from HW team. Thanks Shane -
From: Jeff Garzik <jeff@garzik.org> Ok, it seems I've sort-of self-nominated myself to implement this so I'll try to work on it tomorrow :-) -
From: David Miller <davem@davemloft.net> I have a working implementation, fully tested on a machine with Tigon3 ethernet chips that have the quirk in question. Patch set coming up next. -
That sort of behavior is an example of why I wrote pci_intx() in the first place, and employed it by default throughout the ATA drivers (before it migrated into PCI core). Jeff -
MSI has been introduced by PCI 2.2 (and thus PCI-X 1.0) so there may be devices with MSI but without INTx-disable bit. I guess I have some early PCI-X hardware with MSI but I don't know if they have INTx-disable bit and I can't currently test that.

The wording is:

10: This bit disables the device from asserting INTx#. A value of 0 enables the assertion of its INTx# signal. A value of 1 disables the assertion of its INTx# signal. This bit's state after RST# is 0. Refer to Section 6.8.1.3 for control of MSI.

So strictly speaking it mandates disabling/enabling INTx but says nothing about other things (e.g. MSI). Some common sense dictates it shouldn't disable MSI, I guess.

The "MSI Enable" description doesn't leave any doubt:

0: MSI Enable: If 1, the function is permitted to use MSI to request

Right.
-- Krzysztof Halasa -
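The two bits quoted above can be put side by side in a small model. This
is a sketch of one reading of the spec wording as discussed in this
thread (bit 10 governs only INTx#, and a function with MSI enabled should
not use INTx#), together with the out-of-spec behaviour reported for the
SB700 SATA controller. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CMD_INTX_DISABLE 0x0400u	/* COMMAND register, bit 10 */
#define MSI_CTRL_ENABLE  0x0001u	/* MSI Message Control, bit 0 */

/* Spec reading: INTx# only if bit 10 is clear and MSI is not enabled. */
static bool spec_may_assert_intx(uint16_t command, uint16_t msi_ctrl)
{
	return !(command & CMD_INTX_DISABLE) && !(msi_ctrl & MSI_CTRL_ENABLE);
}

/* Spec reading: MSI depends on MSI Enable alone, not on bit 10. */
static bool spec_may_send_msi(uint16_t command, uint16_t msi_ctrl)
{
	(void)command;
	return (msi_ctrl & MSI_CTRL_ENABLE) != 0;
}

/* Reported out-of-spec behaviour: INTX_DISABLE also masks MSI. */
static bool quirky_may_send_msi(uint16_t command, uint16_t msi_ctrl)
{
	return (msi_ctrl & MSI_CTRL_ENABLE) && !(command & CMD_INTX_DISABLE);
}

/* The state the MSI setup path leaves behind: both bits set. */
static void sketch_demo(void)
{
	uint16_t cmd = CMD_INTX_DISABLE;
	uint16_t msi = MSI_CTRL_ENABLE;

	assert(!spec_may_assert_intx(cmd, msi));
	assert(spec_may_send_msi(cmd, msi));	/* conformant: MSI flows */
	assert(!quirky_may_send_msi(cmd, msi));	/* reported: MSI is lost */
}
```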
From: Krzysztof Halasa <khc@pm.waw.pl> Right, and every vendor I've spoken to who had the INTX_DISABLE bug clearly acknowledged that it was a bug in their RTL design and that they considered the spec to be clear on this matter. Things get more complicated with PCI-Express because INTx# isn't an out-of-band "pin", but rather a message sent over the bus :-) -
Right. I was merely describing the end result, the union of that language as it applies to the kernel. Jeff -
When I used "here", I just meant our youthful Linux southbridge driver team, not the whole of AMD. Sorry for the confusion. Yes, AMD absolutely has lots of people with PCI and MSI experience, but this MSI issue is mainly being debugged by our team now. I think our team will cooperate more closely with other teams in the future to provide Linux chipset support, besides fixing the MSI problem. Thanks for your suggestion. I will continue to debug this problem next Monday when I'm back in the office. Thanks Best Regards Shane -
If the pci_intx change will be applied to the SATA driver, can it be applied for the ATI USB-HCDs too? See http://lkml.org/lkml/2006/12/21/47 for more details. That should help most of the ATI MSI quirks. It helped me (Acer Aspire 502x laptop with I am still somewhat confused: most people developing the Linux kernel are limited to look at this hardware from the outside. They had to find out by trial and error. I really hope your colleagues will be more actively helping your team and (other) Linux kernel developers. But I'm glad AMD has such a team. (So while you're at it, ask about the ATI USB HCDs and any other MSI quirk too) -
Well... I think the cooperation will become smoother as time goes by. In fact, other teams are always helping us. As to the other problems such as USB-HCD, we need some investigation before I can send you any response. Thanks Best Regards Shane -
I checked the USB MSI problem above (it has also been reported to our
hardware team); the cause seems to be the same as for our SB700 SATA
controller MSI problem.
I also did some brief USB MSI debugging on our SB700 board with 2.6.23-rc5:
The USB part of the second patch (attached at the end of this mail)
does not work for the SB700 USB EHCI/OHCI controllers, whether INTx is
enabled or disabled before entering MSI. The USB host controllers
always use IO-APIC, which is different from SB400. I don't know why.
[root@localhost ~]# cat /proc/interrupts
CPU0 CPU1
17: 0 133 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb6
18: 1 2 IO-APIC-fasteoi ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
19: 0 0 IO-APIC-fasteoi ehci_hcd:usb7
Also, I wonder why the USB MSI patch was never added to the kernel.
Would it lead to other bugs?
Thanks
Best Regards
Shane
======> USB part of the second patch in lkml.org/lkml/2006/12/21/47
diff -uprdN linux/drivers/usb/core/hcd-pci.c linux/drivers/usb/core/hcd-pci.c
--- linux/drivers/usb/core/hcd-pci.c 2006-12-16 13:34:57.000000000 -0800
+++ linux/drivers/usb/core/hcd-pci.c 2006-12-16 13:57:09.000000000 -0800
@@ -69,6 +69,7 @@ int usb_hcd_pci_probe (struct pci_dev *d
if (pci_enable_device (dev) < 0)
return -ENODEV;
+ pci_enable_msi(dev);
dev->current_state = PCI_D0;
dev->dev.power.power_state = PMSG_ON;
@@ -139,6 +140,7 @@ int usb_hcd_pci_probe (struct pci_dev *d
release_region (hcd->rsrc_start, hcd->rsrc_len);
err2:
usb_put_hcd (hcd);
+ pci_disable_msi (dev);
err1:
pci_disable_device (dev);
dev_err (&dev->dev, "init %s fail, %d\n", pci_name(dev), retval);
@@ -177,6 +179,7 @@ void usb_hcd_pci_remove (struct pci_dev
release_region (hcd->rsrc_start, hcd->rsrc_len);
}
usb_put_hcd (hcd);
+ pci_disable_msi(dev);
pci_disable_device(dev);
}
EXPORT_SYMBOL (usb_hcd_pci_remove);
@@ -391,6 +394,7 @@ int usb_hcd_pci_resume (struct ...

From: "Shane Huang" <Shane.Huang@amd.com>

Probably someone just needs to be more vocal and active in pushing it to the USB subsystem maintainer(s). I've had trouble getting even simple bug fixes integrated recently, so perhaps it will take a few retransmits and some patience to get it included. Anyways, thanks for bringing it to our attention. Greg, can you at least devote a few minutes to going over that USB MSI patch, giving it any obvious things it needs (perhaps some pci_enable_msi() return value checks, for example, but may not be needed at all in this case) and then stash it somewhere so it doesn't get lost in the void? Thanks. -
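The return value check mentioned above could look roughly like this. It
is a userspace sketch only: the enable_msi_fn callback is a hypothetical
stand-in for pci_enable_msi(), assumed to return 0 on success and a
negative value on failure, and the two stubs exist purely to exercise it.

```c
#include <assert.h>
#include <stddef.h>

enum irq_mode { IRQ_MODE_INTX, IRQ_MODE_MSI };

/* Hypothetical stand-in for pci_enable_msi(): 0 on success, <0 on error. */
typedef int (*enable_msi_fn)(void *dev);

/*
 * Try MSI first and remember the outcome, so that the teardown path
 * only disables MSI when it was actually enabled. On failure the
 * device simply stays on its legacy INTx interrupt.
 */
static enum irq_mode sketch_pick_irq_mode(enable_msi_fn enable_msi, void *dev)
{
	assert(enable_msi != NULL);
	if (enable_msi(dev) == 0)
		return IRQ_MODE_MSI;
	return IRQ_MODE_INTX;
}

/* Stubs for illustration: one device accepts MSI, one refuses it. */
static int stub_msi_ok(void *dev)
{
	(void)dev;
	return 0;
}

static int stub_msi_fail(void *dev)
{
	(void)dev;
	return -1;
}
```

Since a failed enable leaves the legacy interrupt in place, ignoring the
return value may still work for this patch, as noted above; the check
mainly keeps the enable and disable calls paired.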
Yeah, I apologize for some of our developers, they seem a bit grumpy at Can someone forward it to me so that I can see it? I can't seem to locate it at the moment, was I copied on it? thanks, greg k-h -
