Hi!
shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
/proc/interrupts shows. Removing USB support from the kernel makes it
work again. I wasn't able to do a full git bisect run yet, as v2.6.27
didn't produce a bootable kernel image for my machine. The machine is
an AmigaOne PowerPC G4 with an onboard 3c920 network chip.
Any idea?
best regards,
Gerhard
PS: Please put me on CC:, as I'm not subscribed to this mailing list.
/proc/interrupts:
           CPU0
  1:       1648   i8259     Level     i8042
  5:          0   i8259     Level     uhci_hcd:usb4, uhci_hcd:usb5
  6:          4   i8259     Level     floppy
  7:     236520   i8259     Level     ohci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3, eth0
  8:          2   i8259     Level     rtc0
  9:          0   i8259     Level     eth2
 12:        117   i8259     Level     i8042
 14:       8277   i8259     Level     ide0
 15:      17559   i8259     Level     ide1
BAD:          1
Kernel log:
Badness at net/sched/sch_generic.c:226
NIP: c0250118 LR: c0250118 CTR: c0013020
REGS: efffde90 TRAP: 0700 Not tainted (2.6.29-rc6)
MSR: 00029032 <EE,ME,CE,IR,DR> CR: 42024024 XER: 00000000
TASK = c03915a0[0] 'swapper' THREAD: c03b2000
GPR00: c0250118 efffdf40 c03915a0 00000035 00008a62 ffffffff ffffffff 00000000
GPR08: 00000000 c03c0000 00008a62 c0393104 22024042 00000000 0ffd5900 0080044c
GPR16: 00000001 ffffffff 00000000 007ffc00 0ffd3158 0f0689b0 0ffff220 007ffbc0
GPR24: 00000000 00000000 0000000a 00000004 efffc000 c024ffb0 00000100 ef847000
NIP [c0250118] dev_watchdog+0x168/0x244
LR [c0250118] dev_watchdog+0x168/0x244
Call Trace:
[efffdf40] [c0250118] dev_watchdog+0x168/0x244 (unreliable)
[efffdfa0] [c002f564] run_timer_softirq+0x12c/0x1b4
[efffdfd0] [c002ab0c] __do_softirq+0x6c/0x108
[efffdff0] [c0011ef0] call_do_softirq+0x14/0x24
[c03b3e90] [c0006c30] do_softirq+0x64/0x88
[c03b3eb0] [c002a968] irq_exit+0x38/0x7c
[c03b3ec0] [c000f634] timer_interrupt+0x138/0x150
[c03b3ee0] [c0012bd4] ...
--
On Mon, 09 Mar 2009 23:42:53 +0100
Does this help? It looks like boomerang_interrupt was not handling
shared IRQs correctly.
--- a/drivers/net/3c59x.c 2009-03-09 16:07:13.372670015 -0700
+++ b/drivers/net/3c59x.c 2009-03-09 16:08:50.214357441 -0700
@@ -2301,6 +2301,7 @@ boomerang_interrupt(int irq, void *dev_i
void __iomem *ioaddr;
int status;
int work_done = max_interrupt_work;
+ int handled = 0;
ioaddr = vp->ioaddr;
@@ -2323,6 +2324,7 @@ boomerang_interrupt(int irq, void *dev_i
printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
goto handler_exit;
}
+ handled = 1;
if (status & IntReq) {
status |= vp->deferred;
@@ -2417,7 +2419,7 @@ boomerang_interrupt(int irq, void *dev_i
dev->name, status);
handler_exit:
spin_unlock(&vp->lock);
- return IRQ_HANDLED;
+ return IRQ_RETVAL(handled);
}
static int vortex_rx(struct net_device *dev)
--
This basically reverts a patch from akpm (bitkeeper cset 1.1046.95.8).
That patch was a workaround for lots of "nobody cared" warnings
generated by boomerang_interrupt().
I added Andrew to the Cc, perhaps he can remember some details on this.
Steffen
--
On Tue, 10 Mar 2009 09:16:28 +0100
Beats me. Do you have a full copy of that patch, including changelog?
Thanks.
--
It was this one:
#### ChangeSet ####
2003-05-19 10:27:49-07:00, akpm@digeo.com
[PATCH] 3c59x irqreturn fix
Apparently boomerang_interrupt() is generating lots of "nobody cared"
warnings - one per packet it seems.
Frankly, I don't have a clue why. These are ancient cards and the
driver is otherwise stable, so just change it to return IRQ_HANDLED
and move on...
==== drivers/net/3c59x.c ====
2003-05-17 14:09:34-07:00, akpm@digeo.com +2 -7
3c59x irqreturn fix
--- 1.34/drivers/net/3c59x.c 2003-04-20 22:41:08 -07:00
+++ 1.35/drivers/net/3c59x.c 2003-05-17 14:09:34 -07:00
@@ -2321,7 +2321,6 @@ boomerang_interrupt(int irq, void *dev_i
long ioaddr;
int status;
int work_done = max_interrupt_work;
- int handled;
ioaddr = dev->base_addr;
@@ -2336,18 +2335,14 @@ boomerang_interrupt(int irq, void *dev_i
if (vortex_debug > 6)
printk(KERN_DEBUG "boomerang_interrupt. status=0x%4x\n", status);
- if ((status & IntLatch) == 0) {
- handled = 0;
+ if ((status & IntLatch) == 0)
goto handler_exit; /* No interrupt: shared IRQs can cause this */
- }
if (status == 0xffff) { /* h/w no longer present (hotplug)? */
if (vortex_debug > 1)
printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
- handled = 0;
goto handler_exit;
}
- handled = 1;
if (status & IntReq) {
status |= vp->deferred;
@@ -2442,7 +2437,7 @@ boomerang_interrupt(int irq, void *dev_i
dev->name, status);
handler_exit:
spin_unlock(&vp->lock);
- return IRQ_RETVAL(handled);
+ return IRQ_HANDLED;
}
static int vortex_rx(struct net_device *dev)
--
From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
So basically it's a band-aid because we didn't investigate why this
happens.
I think we should put the change in, and then look into things
properly if users report this issue again.
The code there right now is just completely wrong when the 3c59x
interrupt is shared with another device.
--
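As a rough illustration of why an unconditional IRQ_HANDLED is a problem
on a shared line, here is a small userspace model in C. It is not kernel
code: irqreturn_t, IRQ_RETVAL, struct fake_dev, fake_handler(),
dispatch_shared_irq() and count_unclaimed() are stand-ins defined in the
sketch itself, and the accounting is a simplified assumption about what
the IRQ core does with the handlers' return values.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's irqreturn_t values (assumption). */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
#define IRQ_RETVAL(x) ((x) ? IRQ_HANDLED : IRQ_NONE)

/* Hypothetical device sitting on a shared interrupt line. */
struct fake_dev {
	const char *name;
	int pending;       /* 1 if this device actually raised the line */
	int always_claim;  /* 1 = return IRQ_HANDLED unconditionally */
	int serviced;      /* number of times real work was done */
};

/* Model handler: service the device only if it has something pending. */
static irqreturn_t fake_handler(struct fake_dev *dev)
{
	int handled = 0;

	if (dev->pending) {
		dev->pending = 0;
		dev->serviced++;
		handled = 1;
	}
	if (dev->always_claim)
		return IRQ_HANDLED;     /* claims interrupts that are not ours */
	return IRQ_RETVAL(handled); /* honest answer for a shared line */
}

/* Call every handler on the line; return 1 if any of them claimed it. */
static int dispatch_shared_irq(struct fake_dev *devs, int n)
{
	int claimed = 0;

	assert(devs != NULL && n > 0);
	for (int i = 0; i < n; i++)
		if (fake_handler(&devs[i]) == IRQ_HANDLED)
			claimed = 1;
	return claimed;
}

/*
 * Fire 'rounds' interrupts for which no device has work pending and
 * count how many of them the dispatcher sees as unclaimed.
 */
static int count_unclaimed(int nic_always_claims, int rounds)
{
	struct fake_dev devs[2] = {
		{ "usb",  0, 0, 0 },
		{ "eth0", 0, nic_always_claims, 0 },
	};
	int unclaimed = 0;

	for (int i = 0; i < rounds; i++)
		if (!dispatch_shared_irq(devs, 2))
			unclaimed++;
	return unclaimed;
}
```

In this model count_unclaimed(1, 10) is 0 and count_unclaimed(0, 10) is
10: a handler that always claims the interrupt hides every unclaimed
interrupt on the line from the dispatcher, while the IRQ_RETVAL(handled)
variant lets them be counted.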
Gerhard reported at least one of these "nobody cared" messages at
shutdown after applying this change. He wanted to provide us with
further information about this issue next week. It would be best to put
it in together with a fix.
Indeed.
--
From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Fair enough.
--
Okay, here's a small status update in order to show that I'm really
doing something. :)
Increasing the value of vortex_debug doesn't really help. It slows down
the network transfer too much to trigger the bug. Does somebody know
where to insert some printks in the driver to get useful debug output?
Bisecting shows that even v2.6.26-rc1 fails. But I have a v2.6.27-rc7
image that works fine!? Looks like I have to put more effort into
bisecting.
Gerhard
--
Do you always see these messages when your network hangs, and does the
network recover after such a hang?
Could you please send the output of 'tc -s qdisc show' after a network
hang?
--
IIRC I only got this message once during shutdown. Normally only
"IRQ 7 nobody cared" messages with a stacktrace of the interrupt
handlers are printed out with newer kernels (>=2.6.26) (see the
screenshots I made). Older kernels don't print out any messages at all.
Also the network never recovers after a bad interrupt is reported in
/proc/interrupts.
I'm far away from my machine for the next three weeks, so I can't send
you the output until then.
So far I could narrow down the problem to kernel versions between
v2.6.19 and v2.6.23-rc9. Bisecting is getting harder now, because either
arch/ppc doesn't work anymore for PPC32 or my platform patches for
arch/powerpc do not apply.
regards,
Gerhard
--
Dipl. Ing. (FH) Gerhard Pircher
E-mail: gerhard_pircher@gmx.net
--
Sorry for the delay! Here's the output of tc after a network hang.
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 4633909 bytes 64766 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
regards,
Gerhard
--
