Re: 3c59x: shared interrupt problem

Previous thread: [net-next PATCH 01/12] qlge: Move reset logic into asic_reset_worker func. by Ron Mercer on Monday, March 9, 2009 - 1:59 pm. (14 messages)

Next thread: [PATCH] net: fix warning about non-const string by Stephen Hemminger on Monday, March 9, 2009 - 4:51 pm. (1 message)
From: Gerhard Pircher
Date: Monday, March 9, 2009 - 3:42 pm

Hi!

shares its IRQ line with 3 OHCI USB ports (IRQ 7), as the excerpt of
/proc/interrupts shows. Removing USB support from the kernel makes it
work again. I haven't been able to complete a git bisect run yet, because
v2.6.27 doesn't produce a bootable kernel image for my machine. The
machine is an AmigaOne PowerPC G4 with an onboard 3c920 network chip.

Any idea?

best regards,

Gerhard

PS: Please CC me, as I'm not subscribed to this mailing list.

/proc/interrupts:
           CPU0
  1:       1648   i8259     Level     i8042
  5:          0   i8259     Level     uhci_hcd:usb4, uhci_hcd:usb5
  6:          4   i8259     Level     floppy
  7:     236520   i8259     Level     ohci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3, eth0
  8:          2   i8259     Level     rtc0
  9:          0   i8259     Level     eth2
 12:        117   i8259     Level     i8042
 14:       8277   i8259     Level     ide0
 15:      17559   i8259     Level     ide1
BAD:          1

Kernel log:
Badness at net/sched/sch_generic.c:226
NIP: c0250118 LR: c0250118 CTR: c0013020
REGS: efffde90 TRAP: 0700   Not tainted  (2.6.29-rc6)
MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 42024024  XER: 00000000
TASK = c03915a0[0] 'swapper' THREAD: c03b2000
GPR00: c0250118 efffdf40 c03915a0 00000035 00008a62 ffffffff ffffffff 00000000 
GPR08: 00000000 c03c0000 00008a62 c0393104 22024042 00000000 0ffd5900 0080044c 
GPR16: 00000001 ffffffff 00000000 007ffc00 0ffd3158 0f0689b0 0ffff220 007ffbc0 
GPR24: 00000000 00000000 0000000a 00000004 efffc000 c024ffb0 00000100 ef847000 
NIP [c0250118] dev_watchdog+0x168/0x244
LR [c0250118] dev_watchdog+0x168/0x244
Call Trace:
[efffdf40] [c0250118] dev_watchdog+0x168/0x244 (unreliable)
[efffdfa0] [c002f564] run_timer_softirq+0x12c/0x1b4
[efffdfd0] [c002ab0c] __do_softirq+0x6c/0x108
[efffdff0] [c0011ef0] call_do_softirq+0x14/0x24
[c03b3e90] [c0006c30] do_softirq+0x64/0x88
[c03b3eb0] [c002a968] irq_exit+0x38/0x7c
[c03b3ec0] [c000f634] timer_interrupt+0x138/0x150
[c03b3ee0] [c0012bd4] ...
From: Stephen Hemminger
Date: Monday, March 9, 2009 - 4:49 pm

On Mon, 09 Mar 2009 23:42:53 +0100

Does this help? It looks like boomerang_interrupt() was not handling
shared IRQs correctly.

--- a/drivers/net/3c59x.c	2009-03-09 16:07:13.372670015 -0700
+++ b/drivers/net/3c59x.c	2009-03-09 16:08:50.214357441 -0700
@@ -2301,6 +2301,7 @@ boomerang_interrupt(int irq, void *dev_i
 	void __iomem *ioaddr;
 	int status;
 	int work_done = max_interrupt_work;
+	int handled = 0;
 
 	ioaddr = vp->ioaddr;
 
@@ -2323,6 +2324,7 @@ boomerang_interrupt(int irq, void *dev_i
 			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
 		goto handler_exit;
 	}
+	handled = 1;
 
 	if (status & IntReq) {
 		status |= vp->deferred;
@@ -2417,7 +2419,7 @@ boomerang_interrupt(int irq, void *dev_i
 			   dev->name, status);
 handler_exit:
 	spin_unlock(&vp->lock);
-	return IRQ_HANDLED;
+	return IRQ_RETVAL(handled);
 }
 
 static int vortex_rx(struct net_device *dev)
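
For readers following the patch, here is a minimal userspace sketch of the return-value logic it restores. This is not kernel code: IRQ_NONE, IRQ_HANDLED and IRQ_RETVAL are redefined locally, the IntLatch/IntReq bit values are assumed to match the driver's status enum, and model_boomerang_interrupt() is a hypothetical stand-in that keeps only the early-exit paths.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's return values (not the real header). */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } model_irqreturn_t;
#define IRQ_RETVAL(x) ((x) ? IRQ_HANDLED : IRQ_NONE)

/* Assumed status bits; check enum vortex_status in 3c59x.c. */
enum { IntLatch = 0x0001, IntReq = 0x0040 };

/* Only the early exits of the patched boomerang_interrupt() are modelled:
 * "handled" stays 0 on every path that leaves before servicing the card. */
static model_irqreturn_t model_boomerang_interrupt(uint16_t status)
{
	int handled = 0;

	if ((status & IntLatch) == 0)
		goto handler_exit;	/* not ours: another device on the line */

	if (status == 0xffff)		/* card gone (hotplug)? */
		goto handler_exit;

	handled = 1;
	/* Rx/Tx/error servicing would happen here. */

handler_exit:
	return IRQ_RETVAL(handled);
}
```

With this shape, a 3c59x that did not raise the interrupt answers IRQ_NONE, leaving the OHCI handlers on IRQ 7 to claim it.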

--

From: Steffen Klassert
Date: Tuesday, March 10, 2009 - 1:16 am

This basically reverts a patch from akpm (BitKeeper cset 1.1046.95.8).
That patch was a workaround for lots of "nobody cared" warnings generated
by boomerang_interrupt().
I've added Andrew to Cc; perhaps he remembers some details.
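
The "nobody cared" warning comes from the kernel's spurious-interrupt detection. Below is a simplified model, assuming the 2.6-era thresholds in note_interrupt() (kernel/irq/spurious.c): after 100000 interrupts on a line, if more than 99900 of them went unhandled, the line is reported and disabled. The struct and function names are hypothetical, and the real code's time-based reset of the unhandled counter and its polling of disabled lines are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-line bookkeeping, loosely mirroring struct irq_desc. */
struct model_irq_desc {
	unsigned int irq_count;		/* interrupts seen in this window */
	unsigned int irqs_unhandled;	/* of those, how many nobody claimed */
	bool disabled;			/* line shut off as "stuck" */
};

/* Feed one interrupt into the model; "handled" is the OR of every
 * handler's result on the line. */
static void model_note_interrupt(struct model_irq_desc *desc, bool handled)
{
	if (desc->disabled)
		return;			/* line already shut off */

	if (!handled)
		desc->irqs_unhandled++;

	if (++desc->irq_count < 100000)
		return;

	/* End of a 100000-interrupt window: judge it, then start over. */
	desc->irq_count = 0;
	if (desc->irqs_unhandled > 99900)
		desc->disabled = true;	/* "irq N: nobody cared" */
	desc->irqs_unhandled = 0;
}
```

Once a line is disabled this way, every device sharing it stops getting interrupts. Whether that is what happens on Gerhard's IRQ 7 is a guess, not something the thread establishes.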

Steffen
--

From: Andrew Morton
Date: Tuesday, March 10, 2009 - 2:55 pm

On Tue, 10 Mar 2009 09:16:28 +0100

Beats me.  Do you have a full copy of that patch, including the changelog?

Thanks.
--

From: Steffen Klassert
Date: Wednesday, March 11, 2009 - 4:38 am

It was this one:

#### ChangeSet ####
2003-05-19 10:27:49-07:00, akpm@digeo.com 
  [PATCH] 3c59x irqreturn fix
  
  Apparently boomerang_interrupt() is generating lots of "nobody cared"
  warnings - one per packet it seems.  Frankly, I don't have a clue why.
  
  These are ancient cards and the driver is otherwise stable, so just
  change it to return IRQ_HANDLED and move on...

==== drivers/net/3c59x.c ====
2003-05-17 14:09:34-07:00, akpm@digeo.com +2 -7
  3c59x irqreturn fix

--- 1.34/drivers/net/3c59x.c	2003-04-20 22:41:08 -07:00
+++ 1.35/drivers/net/3c59x.c	2003-05-17 14:09:34 -07:00
@@ -2321,7 +2321,6 @@ boomerang_interrupt(int irq, void *dev_i
 	long ioaddr;
 	int status;
 	int work_done = max_interrupt_work;
-	int handled;
 
 	ioaddr = dev->base_addr;
 
@@ -2336,18 +2335,14 @@ boomerang_interrupt(int irq, void *dev_i
 	if (vortex_debug > 6)
 		printk(KERN_DEBUG "boomerang_interrupt. status=0x%4x\n", status);
 
-	if ((status & IntLatch) == 0) {
-		handled = 0;
+	if ((status & IntLatch) == 0)
 		goto handler_exit;		/* No interrupt: shared IRQs can cause this */
-	}
 
 	if (status == 0xffff) {		/* h/w no longer present (hotplug)? */
 		if (vortex_debug > 1)
 			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
-		handled = 0;
 		goto handler_exit;
 	}
-	handled = 1;
 
 	if (status & IntReq) {
 		status |= vp->deferred;
@@ -2442,7 +2437,7 @@ boomerang_interrupt(int irq, void *dev_i
 			   dev->name, status);
 handler_exit:
 	spin_unlock(&vp->lock);
-	return IRQ_RETVAL(handled);
+	return IRQ_HANDLED;
 }
 
 static int vortex_rx(struct net_device *dev)
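
Why an unconditional IRQ_HANDLED can hide a problem: on a shared line the kernel runs every registered handler and ORs their results (roughly what handle_IRQ_event() does in kernels of this era). Here is a minimal userspace model with hypothetical names. honest_handler() stands in for a handler that checks its own device, and always_handled() for the 2003 workaround.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } model_irqreturn_t;

/* One handler registered on a shared line; "pending" stands in for
 * "this device is actually asserting the interrupt". */
struct model_action {
	model_irqreturn_t (*handler)(bool pending);
	bool pending;
};

/* Well-behaved handler: claims the IRQ only if its device raised it. */
static model_irqreturn_t honest_handler(bool pending)
{
	return pending ? IRQ_HANDLED : IRQ_NONE;
}

/* The 2003 workaround's behaviour: always claims the IRQ. */
static model_irqreturn_t always_handled(bool pending)
{
	(void)pending;
	return IRQ_HANDLED;
}

/* Run every handler on the line and OR the results together. */
static model_irqreturn_t model_dispatch(const struct model_action *actions,
					size_t n)
{
	int retval = IRQ_NONE;

	for (size_t i = 0; i < n; i++)
		retval |= actions[i].handler(actions[i].pending);
	return (model_irqreturn_t)retval;
}
```

With eth0 using always_handled(), the OR is IRQ_HANDLED even when no device on IRQ 7 is asserting the line, so the spurious-interrupt check never fires. With every handler honest, the same situation returns IRQ_NONE.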
--

From: David Miller
Date: Friday, March 13, 2009 - 3:51 pm

From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>

So basically it's a band-aid because we didn't investigate why
this happens.

I think we should put the change in, and then look into things
properly if users report this issue again.

The code there right now is just completely wrong when the
3c59x interrupt is shared with another device.
--

From: Steffen Klassert
Date: Saturday, March 14, 2009 - 7:08 am

Gerhard reported at least one of these "nobody cared" messages at shutdown
after applying this change. He plans to send us more information about
this issue next week. It would be best to merge it together with a fix.

Indeed.
--

From: David Miller
Date: Saturday, March 14, 2009 - 11:40 am

From: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>

Fair enough.
--

From: Gerhard Pircher
Date: Tuesday, March 17, 2009 - 2:37 am

Okay, here's a small status update to show that I'm really
doing something. :)
Increasing the value of vortex_debug doesn't really help: it slows
down network transfers too much to trigger the bug. Does anybody
know where to add printks in the driver to get useful debug
output?
Bisecting shows that even v2.6.26-rc1 fails, but I have a v2.6.27-rc7
image that works fine!? Looks like I have to put more effort into
bisecting.
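
One common alternative to per-interrupt printks (not something proposed in the thread) is to record the status word read in the interrupt handler into a small ring buffer and dump it only after the hang, so tracing costs a store rather than console output. Below is a userspace sketch under that assumption; struct status_trace, trace_status() and trace_recent() are hypothetical. In the driver, trace_status() would be called with vp->lock held, which boomerang_interrupt() already takes, so there is a single writer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debug aid: remember the last TRACE_SLOTS status words. */
#define TRACE_SLOTS 64u			/* power of two, so wrap is a mask */

struct status_trace {
	uint16_t status[TRACE_SLOTS];
	uint32_t next;			/* total entries ever recorded */
};

/* Cheap enough for the interrupt path: one store and one increment. */
static void trace_status(struct status_trace *t, uint16_t status)
{
	t->status[t->next & (TRACE_SLOTS - 1)] = status;
	t->next++;
}

/* Return the k-th most recent entry (k = 0 is the newest).
 * Caller must ensure k < TRACE_SLOTS and k < t->next. */
static uint16_t trace_recent(const struct status_trace *t, uint32_t k)
{
	assert(k < TRACE_SLOTS && k < t->next);
	return t->status[(t->next - 1 - k) & (TRACE_SLOTS - 1)];
}
```

Dumping the buffer once the network has hung would show the last status words the handler saw, without slowing transfers the way raising vortex_debug does.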

Gerhard
--

From: Steffen Klassert
Date: Friday, March 27, 2009 - 12:59 am

Do you always see these messages when your network hangs, and does the
network recover after such a hang? Could you please send the output of
'tc -s qdisc show' after a network hang?
--

From: Gerhard Pircher
Date: Saturday, March 28, 2009 - 7:17 am

IIRC I only got this message once, during shutdown. With newer kernels
(>= 2.6.26), normally only "IRQ 7 nobody cared" messages with a stack
trace of the interrupt handlers are printed (see the screenshots I
made). Older kernels don't print any messages at all. Also, the network
never recovers once a bad interrupt is reported in /proc/interrupts.
I'm away from my machine for the next three weeks, so I can't send
you the output until then.
So far I have narrowed the problem down to kernel versions between
v2.6.19 and v2.6.23-rc9. Bisecting is getting harder now, because either
arch/ppc no longer works for PPC32 or my platform patches for
arch/powerpc do not apply.

regards,

Gerhard
-- 
--
-- Dipl. Ing. (FH) Gerhard Pircher
-- E-mail : gerhard_pircher@gmx.net
--
--

From: Gerhard Pircher
Date: Tuesday, April 21, 2009 - 11:36 am

Sorry for the delay! Here's the output of tc after a network hang.

qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 4633909 bytes 64766 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 

regards,

Gerhard
--