Only adjust the NAPI bits of the interruptmask, if netif_rx_schedule_prep()
returns true.
Without this, I see interrupt storms here using 2.6.29-rcx and 2.6.27.19 and
device needed rmmod r8169 + modprobe r8169 to start working.
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
---
drivers/net/r8169.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b347340..0c943ba 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3708,12 +3708,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
rtl8169_check_link_status(dev, tp, ioaddr);
if (status & tp->napi_event) {
- RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
- tp->intr_mask = ~tp->napi_event;
-
- if (likely(netif_rx_schedule_prep(&tp->napi)))
+ if (likely(netif_rx_schedule_prep(&tp->napi))) {
+ RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+ tp->intr_mask = ~tp->napi_event;
__netif_rx_schedule(&tp->napi);
- else if (netif_msg_intr(tp)) {
+ } else if (netif_msg_intr(tp)) {
printk(KERN_INFO "%s: interrupt %04x in poll\n",
dev->name, status);
}
--
1.6.0.6
--
How the patch is supposed to work is not completely clear to me. Can you elaborate a bit ? -- Ueimor --
Without it, if netif_rx_schedule_prep() returns 0, __netif_rx_schedule() is not called so nothing is there to care for the transfer. Nonetheless intr_mask is changed so that its not possible that __netif_rx_schedule() is ever called later. Patch works by letting rtl8169_interrupt() again call netif_rx_schedule_prep() until it succeeds and __netif_rx_schedule() is called. Thanks, Karsten --
Thanks for the explanation. If netif_rx_schedule fails after napi is enabled, there is a racing poll thread to care for the transfer. Stated differently the bug is noticeable because napi_enable() should be called before request_irq() in rtl8169_open, right ? -- Ueimor --
Propably yes. That would mean the first interrupt is received before rtl_hw_start() is called in rtl8169_open. The source of that interrupt would be something else then? Thanks, Karsten --
All we need is thus a shared interrupt racing on a different CPU with rtl8169_open and some IO versus memory reordering so that the hardware is started and a change in the status register can be seen in the IRQ handler before napi_enable/NAPI_STATE_SCHED escapes from the CPU where rtl8169_open is running... It does not look like a credible explanation. :o/ -- Ueimor --
From: Francois Romieu <romieu@fr.zoreil.com> Agreed, looking at how this part of the driver works I can't see how this condition can arise either. --
Found it: CONFIG_DEBUG_SHIRQ is set here causing a "test-call" to rtl8169_interrupt() from irq_request(). $SUBJECT patch is ok, me still thinks. You? Thanks, Karsten --
Karsten Wiese <fzu@wemgehoertderstaat.de> :
I will have to convince myself that nothing bad can happen in the
window between netif_rx_schedule_prep and RTL_W16. No big deal.
I do not understand why / how the patch works at all : AFAIU, the status
register must contain some napi-related event in order for the patch to
make a difference. That's exactly what rtl8169_irq_mask_and_ack() is
supposed to prevent. -> Does it fail ?
So I'd really like to understand what is going on here.
PS: while I agree on the motivation for CONFIG_DEBUG_SHIRQ, I wonder if
it is 100% safe to run an IRQ handler from a context which does not
adhere to the original semantic. If its {disable/enable}_irq pair races
with a similar pair due to a different driver on the same (shared) irq,
things could imho be unsafe.
--
Ueimor
--
RTL_W16 is a posted write, if it happens, theres no difference if its rtl8169_irq_mask_and_ack() resets status once and prevents interrupts being sent by device. Maybe status reset doesn't work because device is stopped or device isn't _completely_ stopped and updates status flags later. Well, without patch any further interrupt causing napi-status isn't reset _and_ netif_rx_schedule() isn't active. screaming interrupts. slooooowish pc. Wonder why it proceeded at all though. After rmmod/modprobe the status register contained 0x20 when in rtl8169_irq_mask_and_ack() during "test-call", letting device work From the comment of disable_irq(): "This function waits for any pending IRQ handlers for this interrupt to complete before returning." No race possible, in theory ;-) The pc I debugged this on is a UP with no other device active on the used int16. Thanks, Karsten --
Karsten Wiese <fzu@wemgehoertderstaat.de> : /me wonders... Does your network adapter support pxe or some boot time operation ? Btw the XID of the device would be welcome. -- Ueimor --
Karsten Wiese <fzu@wemgehoertderstaat.de> :
Ok, it starts to make some sense.
Does the patch below make a difference ?
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 4b7cb38..b5a7358 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2011,8 +2011,6 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
}
}
- pci_set_master(pdev);
-
/* ioremap MMIO region */
ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
if (!ioaddr) {
@@ -2039,6 +2037,8 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
msleep_interruptible(1);
}
+ pci_set_master(pdev);
+
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, ioaddr);
--
Ueimor
--
patch below helps.
Thanks,
Karsten
--------------------->
[PATCH] r8169: Reset IntrStatus after chip rest
On a MSI MS-6702E mainboard, when in rtl8169_init_one() for the first time
after BIOS has run, IntrStatus reads 5 after chip has been reset.
IntrStatus should equal 0 there, so patch changes IntrStatus reset to happen
after chip reset instead of before.
R8169_MSG_DEFAULT is changed to include NETIF_MSG_INTR, because without
an important error message from rtl8169_interrupt() isn't shown by default.
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
---
drivers/net/r8169.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 56a1eb8..b06e153 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -45,8 +45,9 @@
#define dprintk(fmt, args...) do {} while (0)
#endif /* RTL8169_DEBUG */
-#define R8169_MSG_DEFAULT \
- (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN)
+#define R8169_MSG_DEFAULT \
+ (NETIF_MSG_DRV | NETIF_MSG_PROBE | \
+ NETIF_MSG_IFUP | NETIF_MSG_IFDOWN | NETIF_MSG_INTR)
#define TX_BUFFS_AVAIL(tp) \
(tp->dirty_tx + NUM_TX_DESC - tp->cur_tx - 1)
@@ -2076,7 +2077,7 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev_info(&pdev->dev, "no PCI Express capability\n");
/* Unneeded ? Don't mess with Mrs. Murphy. */
- rtl8169_irq_mask_and_ack(ioaddr);
+ RTL_W16(IntrMask, 0x0000);
/* Soft reset the chip. */
RTL_W8(ChipCmd, CmdReset);
@@ -2088,6 +2089,11 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
msleep_interruptible(1);
}
+ /* On an MSI MS-6702E, when here for the first time after
+ BIOS has run, IntrStatus contains 5. Play safe.
+ */
+ RTL_W16(IntrStatus, 0xffff);
+
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, ioaddr);
--
1.6.0.6
--
Karsten Wiese <fzu@wemgehoertderstaat.de> : Thanks. I'll wrap it up without the change to R8169_MSG_DEFAULT: - it can be set through ethtool and a module option - the current default is not a clear nuisance -- Ueimor --
