On 10/3/2007 11:58 PM, Eric W. Biederman wrote:

The INTx pending and disable bits were only added starting with PCI 2.3, so in PCI-2.2 and PCI-X 1.0{,a,b} those bits don't exist at all, and there is still a significant number of such devices in use or on the market.

I agree it would look natural (and that probably happens for most devices) to transfer the interrupt state from INTx to MSI, but I don't think you can rely on it without making some assumptions about device interrupt management that are outside the scope of the spec.

The possibility of masking for MSI was only specified (and then only as something optional) starting with PCI-3.0. PCI Express 1.0 and 1.0a are based on the older PCI-2.3, and the corresponding devices are very unlikely to have it. So there might still be a majority of devices in the field with no MSI masking capability in the different PCI categories: conventional PCI, PCI-X, PCI Express.

I found this quote in PCI-3.0/6.8.3.5:

"For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined."

I haven't seen a caching possibility mentioned for the MSI case, so apart from the problem with multi-word changes, maybe changing the Address or Data can be done at any time for MSI.

It is indeed defined as MSI-enable, but that's not a contradiction with being equivalent to a "mode switch between INTx and MSI" (ignoring MSI-X in that context).
The spec seems to define the following "modes":

  MSI-enable = 1, INTx-disable = x : MSI mode
  MSI-enable = 0, INTx-disable = 0 : INTx mode with INTx-signal == INTx-pending
  MSI-enable = 0, INTx-disable = 1 : INTx "polling/diag" mode using the INTx pending bit

The only specificity of Myrinet is having relatively independent logic for the two modes, while at the same time requiring any pending INTx to be acked before starting any kind of new interrupt.

In our case it is true that the device can fire a bounded number of MSIs without acks (but not an infinite number: there is a limited number of interrupt tokens, and furthermore the interrupt rate is limited by a configurable minimum time between interrupts, which defaults to ~10us). I suspect a race with other interrupts was involved, because otherwise that minimum inter-interrupt delay would prevent entering that code path. I think even a more casual interrupt scheme (with an explicit ack required for each interrupt before generating a new one) can also exercise that code path, since between the return from the handler and the clearing of IRQ_INPROGRESS there is an opportunity for the next interrupt to happen.

To detect a crazy device generating storms of edge interrupts, I guess note_interrupt() could be called during this "reentrant detection" if masking was made conditional.

Loic

P.S.: just a little more context: in all Myrinet hardware, enough of the interrupt functionality is implemented in firmware that we can avoid losing interrupts whenever MSI-enable is toggled, and we already started distributing a firmware-based software update for users running Linux >= 2.6.21 and using MSI. So for Myrinet the problem is more or less already closed.
The only motivation for starting the thread was that it seemed possible that other non-Myrinet devices could be affected by that "use MSI-enable as a masking function":

- the first problem being a possible spurious INTx interrupt during the "MSI-disabled" window (and for most PCI-X 133MHz or earlier devices there might not be an INTx disable bit to avoid that).
- it does not seem far-fetched that other devices could also lose an interrupt during that toggling; at best this seems a grey area of the spec.
- the race to trigger any of those potential problems is small, so they would be hard to reproduce.
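
A minimal sketch of the three "modes" listed earlier, written as plain user-space C rather than kernel code. The bit values are the ones the PCI spec assigns to the Command register and to MSI Message Control (they correspond to the definitions in Linux's pci_regs.h); the macro, enum, and function names below are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names; only the bit positions come from the spec. */
#define CMD_INTX_DISABLE  0x0400u /* Command register bit 10, PCI 2.3 and later */
#define MSI_FLAGS_ENABLE  0x0001u /* MSI Message Control bit 0: MSI-enable */
#define MSI_FLAGS_MASKBIT 0x0100u /* MSI Message Control bit 8: per-vector
                                     masking capable, optional, PCI 3.0 and later */

enum irq_mode {
    MODE_MSI,       /* MSI-enable = 1, INTx-disable = x */
    MODE_INTX,      /* MSI-enable = 0, INTx-disable = 0 */
    MODE_INTX_POLL  /* MSI-enable = 0, INTx-disable = 1: only the pending bit is usable */
};

/* Decode the mode from raw Command and MSI Message Control register values. */
static inline enum irq_mode decode_irq_mode(uint16_t command, uint16_t msi_ctrl)
{
    if (msi_ctrl & MSI_FLAGS_ENABLE)
        return MODE_MSI;            /* INTx-disable is a don't-care here */
    if (command & CMD_INTX_DISABLE)
        return MODE_INTX_POLL;
    return MODE_INTX;
}

/* True if the function advertises optional per-vector MSI masking. */
static inline bool msi_can_mask(uint16_t msi_ctrl)
{
    return (msi_ctrl & MSI_FLAGS_MASKBIT) != 0;
}
```

With this decoding, clearing MSI-enable on a device whose Command register has no INTx-disable bit (anything before PCI 2.3) always lands in MODE_INTX, which is the window where a spurious INTx could be signalled. A device for which msi_can_mask() is false leaves MSI-enable as the only per-function knob, which is the situation discussed in this thread.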
