Re: MSI problem since 2.6.21 for devices not providing a mask in their MSI capability

To: Eric W. Biederman <ebiederm@...>
Cc: <linux-pci@...>, Linux Kernel Mailing List <linux-kernel@...>
Date: Wednesday, October 3, 2007 - 9:44 pm

On 10/3/2007 5:49 PM, Eric W. Biederman wrote:



Even if the INTx line is not raised, you cannot rely on the device to
remember an interrupt triggered while MSI is disabled and expect it to
fire that interrupt as an MSI later, once MSI is re-enabled. The PCI
spec does not provide any implicit or explicit guarantee about the MSI
enable flag that would allow it to be used for temporary masking
without running the risk of losing such interrupts. Moreover, even if
you eventually call the interrupt handler to recover a lost interrupt,
having switched the device to INTx mode (whether or not the INTx line
was forced down with the corresponding pci-command bit) without
informing the driver can (and will in our case) break interrupt
handshaking, because MSI and INTx interrupts are not acked in the same
way (INTx requires an extra step that we don't do for MSI, and that the
device will still expect unless we go through driver init again).
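
To make the lost-interrupt concern concrete, here is a minimal model in C. It is only a sketch: struct model_dev and both helper functions are hypothetical names, not kernel or driver code, and it assumes a device with no per-vector mask or pending bits that keeps no memory of events raised while the MSI enable flag is clear.

```c
#include <stdbool.h>

/* Hypothetical model of a device without MSI mask/pending bits. */
struct model_dev {
    bool msi_enabled;    /* models the MSI enable flag in the control word */
    unsigned delivered;  /* MSI messages that reached the host */
    unsigned lost;       /* events raised while MSI was disabled */
};

/* The device wants to signal an event to the host. */
static void dev_raise_event(struct model_dev *d)
{
    if (d->msi_enabled)
        d->delivered++;
    else
        d->lost++;  /* assumption: the device does not latch the event */
}

/* Toggle the enable flag; this model replays nothing on re-enable. */
static void dev_set_msi_enable(struct model_dev *d, bool on)
{
    d->msi_enabled = on;
}
```

In this model, an event raised between dev_set_msi_enable(&d, false) and dev_set_msi_enable(&d, true) is counted in lost and never shows up in delivered, which is the behaviour the enable flag cannot rule out.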





Indeed the masking case is well-defined by the spec (including the
operation of the pending bits). And my subject was definitely restricted
to devices without that masking capability.







OK, the no-op was a bug, but using the enable bit for temporary masking
still feels like a bug. I am afraid the only safe solution might be to
prohibit any operation that absolutely requires masking if real masking
is not available. Maybe the set_affinity method should simply be
disabled for devices that do not support masking (unless there is an
option of doing it without masking, for instance by guaranteeing that
only one word of the MSI capability is changed).
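
The "only one word changes" idea could be checked along these lines. This is a sketch under assumptions: struct msi_words and both functions are hypothetical, and it takes for granted that a single register write is safe without masking, which would still need to be confirmed against the spec and the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the message registers of an MSI capability. */
struct msi_words {
    uint32_t address_lo;
    uint32_t address_hi;
    uint16_t data;
};

/* Count how many registers must be rewritten to go from 'from' to 'to'. */
static int msi_words_changed(const struct msi_words *from,
                             const struct msi_words *to)
{
    int n = 0;

    if (from->address_lo != to->address_lo)
        n++;
    if (from->address_hi != to->address_hi)
        n++;
    if (from->data != to->data)
        n++;
    return n;
}

/* More than one register write could expose a half-updated message. */
static bool msi_update_needs_masking(const struct msi_words *from,
                                     const struct msi_words *to)
{
    return msi_words_changed(from, to) > 1;
}
```

A set_affinity path built on this would refuse (or defer) the change whenever msi_update_needs_masking() returns true and the device has no real mask bit.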






I don't think there is a problem here, no sane driver would depend on
receiving edge interrupts triggered while irqs were explicitly disabled.






Do you have a reference for that requirement? The spec only vaguely
associates MSI programming with "configuration", but I haven't found any
explicit indication that it should not work.





That's indeed a show-stopper.





I don't see how you can disable MSI through the control bit (which is
equivalent to switching the device to INTx whether or not the INTx
disable bit is set in PCI_COMMAND) in the middle of operations, not tell
the driver, and not risk losing interrupts (unless you rely on much
more than the spec).




The interrupt while doing set_affinity masking would certainly cause a
problem for the device we use (the MSI enable bit switches between INTx
and MSI mode, and the two kinds of interrupt are not acked the same way,
assuming they would even be delivered to the driver), but I have some
new data: upon further examination, the lost interrupts we have seen
seem in fact to be caused at a different time:
- the problem is the mask_ack_irq() done in handle_edge_irq() when a
new interrupt arrives before the IRQ_INPROGRESS bit is cleared at the
end of the function.

Again here, switching MSI off during hot operation breaks the interrupt
accounting and handshaking between our driver and device. At least this
case might be easier to handle: it seems safe not to mask there (when
no proven masking is available).
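
As an illustration of "not masking there", here is a simplified model of the edge flow. It is not the kernel's handle_edge_irq(): the ST_* flags, struct model_irq and both functions are hypothetical, and the model only shows a second edge being recorded as pending without touching the device when no real mask exists.

```c
#include <stdbool.h>

#define ST_INPROGRESS 0x1u  /* handler currently running */
#define ST_PENDING    0x2u  /* another edge arrived meanwhile */
#define ST_MASKED     0x4u  /* source masked at the device */

struct model_irq {
    unsigned status;
    bool has_real_mask;   /* device has a per-vector mask bit */
    unsigned mask_calls;  /* times the device-level mask was used */
};

/* A new edge arrives. Returns true if the handler should run now. */
static bool edge_irq_arrives(struct model_irq *irq)
{
    if (irq->status & ST_INPROGRESS) {
        irq->status |= ST_PENDING;      /* replay once the handler ends */
        if (irq->has_real_mask) {       /* mask only with a real mask bit */
            irq->status |= ST_MASKED;
            irq->mask_calls++;
        }
        return false;
    }
    irq->status |= ST_INPROGRESS;
    return true;
}

/* The handler finished. Returns true if it must run again. */
static bool edge_irq_done(struct model_irq *irq)
{
    bool rerun = (irq->status & ST_PENDING) != 0;

    irq->status &= ~(ST_PENDING | ST_MASKED);
    if (!rerun)
        irq->status &= ~ST_INPROGRESS;
    return rerun;
}
```

With has_real_mask false, the second edge only sets the pending flag, so the device never leaves MSI mode and the driver's ack protocol is not disturbed.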


Loic




Messages in current thread:
Re: MSI problem since 2.6.21 for devices not providing a mas..., Loic Prylli, (Wed Oct 3, 9:44 pm)
Re: MSI problem since 2.6.21 for devices not providing a mas..., Eric W. Biederman, (Wed Oct 3, 11:58 pm)
Re: MSI problem since 2.6.21 for devices not providing a mas..., Benjamin Herrenschmidt, (Wed Oct 3, 6:03 pm)