Re: MSI interrupts and disable_irq

Previous thread: regression in 2.6.23-rc8 - power off failed on old laptop by Wolfgang Erig on Friday, September 28, 2007 - 9:03 pm. (4 messages)

Next thread: [sdhci] mmc0: unrecognised SCR structure version 1 by Adam Wysocki on Friday, September 28, 2007 - 11:42 pm. (5 messages)
To: Ayaz Abdulla <aabdulla@...>
Cc: Manfred Spraul <manfred@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Friday, September 28, 2007 - 10:47 pm

(added linux-kernel to CC, since I think it's more of a general kernel
issue)

To be brutally frank, I always thought this disable_irq() mess was a
hack both ugly and fragile. This disable_irq() work that appeared in a
couple net drivers was correct at the time, so I didn't feel I had the
justification to reject it, but it still gave me a bad feeling.

I think the scenario you outline is an illustration of the approach's
fragility: disable_irq() is a heavy hammer that originated with INTx,
and it relies on a chip-specific disable method (kernel/irq/manage.c)
that practically guarantees behavior will vary across MSI/INTx/etc.

Practices like forcedeth's unique locking work for a time, but it should
be a warning sign any time you stray from the normal spin_lock_irqsave()
method of synchronization.

Based on your report, it is certainly possible that there is a problem
with MSI's desc->chip->disable() method... but I would actually
recommend working around the problem by making the forcedeth locking
more standardized by removing all those disable_irq() hacks.

Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX
drivers) has a high probability of both fixing your current problem, and
giving forcedeth a more stable foundation for the long term. In my
humble opinion :)

Jeff

-

To: Jeff Garzik <jgarzik@...>
Cc: Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Saturday, October 13, 2007 - 5:30 am

I checked the code: IRQ_DISABLE is implemented in software, i.e.
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
handler] if IRQ_DISABLE is not set.
OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev->irq? The irq is requested with
request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
and disabled with
disable_irq_lockdep(dev->irq);

Someone around with a MSI capable board? The forcedeth driver does
dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the
wrong interrupt....

--
Manfred
-

To: Manfred Spraul <manfred@...>
Cc: Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Monday, October 15, 2007 - 6:17 pm

Remember, fundamentally MSI-X is a one-to-many relationship, when you
consider a single PCI device might have multiple vectors.

Jeff

-

To: Jeff Garzik <jgarzik@...>
Cc: Manfred Spraul <manfred@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Tuesday, October 16, 2007 - 1:23 pm

msi-x is using other entry

if (np->msi_flags & NV_MSI_X_ENABLED)

enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Manfred Spraul <manfred@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Tuesday, October 16, 2007 - 1:39 pm

Correct, but the overall point was that MSI-X conceptually conflicts
with the existing "lockless" disable_irq() schedule, which was written
when there was a one-one relationship between irq, PCI device, and work
to be done.

Jeff

-

To: Jeff Garzik <jgarzik@...>
Cc: Manfred Spraul <manfred@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Tuesday, October 16, 2007 - 2:01 pm

at this point, nic in mcp55 is using msi or INTx.

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Wednesday, October 17, 2007 - 3:43 pm

Correct.
For msi-x, the driver does three disable_irq() calls to the correct
vectors. ugly, but nevertheless correct.
The bug only affected msi: The driver did disable_irq(<old INTx irq
num>) instead of disable_irq(<new msi-x irq num>).
The patch that I've attached to the bugzilla report 9047 seems to fix
the crash, thus I would propose to apply it to 2.6.23 and 2.6.24. I'll
send a seperate mail.

All other problem [reduce code duplication, less ugly locking, ...]
should be fixed with independant patches.

--

-

To: Jeff Garzik <jgarzik@...>
Cc: Manfred Spraul <manfred@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Tuesday, October 16, 2007 - 1:59 pm

Can I use your new driver with RHEL 5 or RHEL 5.1?

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Manfred Spraul <manfred@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Tuesday, October 16, 2007 - 3:44 pm

Not without modification, since it depends on the napi_struct work
currently in torvalds/linux-2.6.git.

But I am currently rewriting the fe-lock yet again, and most of those
changes can be applied to pre-napi_struct forcedeth.

Jeff

-

To: Manfred Spraul <manfred@...>
Cc: Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 1:59 am

the request_irq==>setup_irq will make dev->irq = pci_dev->irq.

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 3:15 am

Where is that?
Otherwise I would propose the attached patch. My board is not
MSI-capable, thus I can't test it myself.

--
Manfred

To: Manfred Spraul <manfred@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 5:47 pm

Why not just copy pcidev->irq to dev->irq once ?

Ben.

-

To: <benh@...>
Cc: Manfred Spraul <manfred@...>, Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 7:15 pm

it seems e1000 is using np->pci_dev->irq directly too.

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Manfred Spraul <manfred@...>, Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 7:36 pm

Heh, allright, doesn't matter, I was just proposing to avoid one more
indirection :-)

Ben.

-

To: Manfred Spraul <manfred@...>
Cc: Jeff Garzik <jgarzik@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 14, 2007 - 3:55 pm

in nv_request_irq
if ((ret = pci_enable_msi(np->pci_dev)) == 0) {
np->msi_flags |= NV_MSI_ENABLED;
if (request_irq(np->pci_dev->irq, handler,
IRQF_SHARED, dev->name, dev) != 0) {

in request_irq

int request_irq(unsigned int irq, irq_handler_t handler,
unsigned long irqflags, const char *devname, void *dev_id)
...
action->dev_id = dev_id;
...
retval = setup_irq(irq, action);

in setup_irq
int setup_irq(unsigned int irq, struct irqaction *new)
....
new->irq = irq;

it seems I missed that here new is irqaction instead of net_dev.

YH
-

To: Jeff Garzik <jgarzik@...>
Cc: Ayaz Abdulla <aabdulla@...>, Manfred Spraul <manfred@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Saturday, October 6, 2007 - 1:43 pm

I wonder if the race is between soft_timer for nv_do_nic_poll from
different CPUs

YH
-

To: Yinghai Lu <yhlu.kernel@...>
Cc: Ayaz Abdulla <aabdulla@...>, Manfred Spraul <manfred@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Saturday, October 6, 2007 - 1:59 pm

Interested parties should try the forcedeth patches I just posted :)

Jeff

-

To: Jeff Garzik <jgarzik@...>
Cc: Yinghai Lu <yhlu.kernel@...>, Ayaz Abdulla <aabdulla@...>, nedev <netdev@...>, Linux Kernel Mailing List <linux-kernel@...>, David Miller <davem@...>, Andrew Morton <akpm@...>
Date: Sunday, October 7, 2007 - 12:54 pm

The patches look good, but I would still prefer to understand why the
current implementation causes crashes before rewriting everything.

I've just noticed that netpoll_send_skb() calls ->hard_start_xmit() and
->poll() within local_irq_save().
Thus it seems that spin_lock_irqsave() must be used in the poll() and
hard_start_xmit() callbacks, at least in nics that support
POLL_CONTROLLER :-(

--
Manfred

-

To: <linux-kernel@...>
Cc: <netdev@...>
Date: Friday, September 28, 2007 - 11:08 pm

On Fri, 28 Sep 2007 22:47:16 -0400

I'll try and clean it up if the author doesn't get to it first.

--
Stephen Hemminger <shemminger@linux-foundation.org>

-

To: Stephen Hemminger <shemminger@...>
Cc: <netdev@...>, <linux-kernel@...>
Date: Friday, October 5, 2007 - 6:12 pm

I didn't see anything in disable_irq that would cause it to fail in
the suggested way. But I couldn't quite convince myself we were
race free either. I didn't see anything that was specific to MSI
that would cause something. But switching from level to edge
triggered, and to a lower latency delivery path may have caused

I took a look at the underlying side of this.

I don't know if the MSI capability for the forcedeth supports a mask
bit or not. Mine doesn't even have a msi capability. If it doesn't
support a mask bit the pci spec provides not valid way to mask the
interrupt, so what we do is actually disable the msi capability.
At which point we might get weird INTx interactions.

We have a similar case with ioapics and INTx that also turns
a hardware level disable into a reroute to another irq command.
So I'm going to take a look and see how infrequently we can use
hardware level disabled.

Since it looks like hardware level disables tend to be creatively
implemented I recommend using disable_irq as little as possible.

Eric
-

To: Eric W. Biederman <ebiederm@...>
Cc: Stephen Hemminger <shemminger@...>, <netdev@...>, <linux-kernel@...>
Date: Saturday, October 6, 2007 - 2:23 am

also the driver need to have seperate handlers for ioapic, msi, msi-x...
current nv_nic_irq_optimized keep checking msi_flag...

also nv_enable_hw_interrupts and nv_disable_hw_interrupts is not
corresponding in MSI side..

YH
-

Previous thread: regression in 2.6.23-rc8 - power off failed on old laptop by Wolfgang Erig on Friday, September 28, 2007 - 9:03 pm. (4 messages)

Next thread: [sdhci] mmc0: unrecognised SCR structure version 1 by Adam Wysocki on Friday, September 28, 2007 - 11:42 pm. (5 messages)