Re: WARNING: at net/core/dev.c:2161 net_rx_action()

Previous thread: Re: What still uses the block layer? by Neil Brown on Monday, October 15, 2007 - 4:19 am. (4 messages)

Next thread: [patch 0/2] Protect crashkernel against BSS overlap by Bernhard Walle on Monday, October 15, 2007 - 4:50 am. (5 messages)
From: Ingo Molnar
Date: Monday, October 15, 2007 - 4:24 am

got this warning with Linus' latest -git tree:

  WARNING: at net/core/dev.c:2161 net_rx_action()
   [<80564db4>] net_rx_action+0xce/0x186
   [<8011ba98>] __do_softirq+0x6c/0xcf
   [<8011bb2d>] do_softirq+0x32/0x36
   [<8011bcae>] irq_exit+0x35/0x40
   [<80104fdb>] do_IRQ+0x5c/0x71
   [<801048cd>] do_nmi+0x8f/0x238
   [<801033a3>] common_interrupt+0x23/0x30
   =======================

box is operating fine otherwise. config and bootlog attached. This 
warning got introduced by:

 commit bea3348eef27e6044b6161fd04c3152215f96411
 Author: Stephen Hemminger <shemminger@linux-foundation.org>
 Date:   Wed Oct 3 16:41:36 2007 -0700

     [NET]: Make NAPI polling independent of struct net_device objects.

	Ingo
From: Ingo Molnar
Date: Monday, October 15, 2007 - 4:27 am

unfortunately the warning seems spurious - i just rebooted the box into 
the same kernel and the message did not trigger again.

	Ingo
-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 9:18 am

update: i got this warning for a second time now with another config - 
so i guess if someone sent me a fix-patch or debug-patch i would 
eventually trigger it within a day again.

	Ingo
-

From: David Miller
Date: Monday, October 15, 2007 - 12:57 pm

From: Ingo Molnar <mingo@elte.hu>

This is a driver bug, the work "budget" passed into a driver's
->poll() handler should never be exceeded.  That's what this
warning assertion is checking.

What ethernet card is in your system and what driver is being
used to drive it?

-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 3:03 pm

it's forcedeth.

i've checked nv_napi_poll(), and i dont see how it could return larger 
than 'limit' number of packets.

it could return packets == limit though:

                pkts = nv_rx_process_optimized(dev, budget);
...

        if (pkts < budget) {
                /* re-enable receive interrupts */
                spin_lock_irqsave(&np->lock, flags);

                __netif_rx_complete(dev, napi);

...
        return pkts;

shouldnt that be "pkts <= budget"? But even that shouldnt cause a larger 
than limit return. Weird.

there are two networking cards in the system, the other one is a:

  eth1: RealTek RTL8139 at 0xf080e000, 00:c0:df:03:68:5d, IRQ 11
  eth1:  Identified 8139 chip type 'RTL-8139B'

but this one should be inactive (not plugged into the network). Should i 
try to get a debug print out of the actual 'weight' and 'work' integers, 
and of the n->poll function address?

	Ingo
-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 3:07 pm

ok, i've added such a patch.

looking at the dev.c code - can napi_struct->weight be zero 
legitimately? If yes then the 0 gets passed to the driver and the driver 
would return 1 - violating the assertion.

	Ingo
-

From: David Miller
Date: Monday, October 15, 2007 - 3:21 pm

From: Ingo Molnar <mingo@elte.hu>

I touched upon this in another reply.

For forcedeth it should never be zero, it gets set in
netif_napi_add() to RX_WORK_PER_LOOP which is unconditionally
defined to 64.

Afterwards napi->weight should never be modified and that is
what is passed into n->poll().

Perhaps some memory scribble is causing it to be zero'd.
-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 3:30 pm

update:

[  186.635916] WARNING: at net/core/dev.c:2166 net_rx_action()
[  186.641351]  [<c060d9f5>] net_rx_action+0x145/0x1b0
[  186.646191]  [<c011d752>] __do_softirq+0x42/0x90
[  186.650784]  [<c011d7c6>] do_softirq+0x26/0x30
[  186.655202]  [<c011db48>] local_bh_enable+0x48/0xa0
[  186.660055]  [<c06023e0>] lock_sock_nested+0xa0/0xc0
[  186.664995]  [<c065da16>] tcp_recvmsg+0x16/0xbc0
[  186.669588]  [<c013e94b>] __generic_file_aio_write_nolock+0x27b/0x520
[  186.676001]  [<c0601d75>] sock_common_recvmsg+0x45/0x70
[  186.681202]  [<c05ff5df>] sock_aio_read+0x11f/0x140
[  186.686054]  [<c015c086>] do_sync_read+0xc6/0x110
[  186.690735]  [<c012b9b0>] autoremove_wake_function+0x0/0x40
[  186.696280]  [<c060dcfc>] net_tx_action+0x3c/0xe0
[  186.700961]  [<c015c9c2>] vfs_read+0x132/0x140
[  186.705378]  [<c015cd41>] sys_read+0x41/0x70
[  186.709625]  [<c0102b66>] sysenter_past_esp+0x5f/0x89
[  186.714651]  =======================
[  186.718210] work: 65, weight: 64
[  186.721414] ->poll: (nv_napi_poll+0x0/0x760)

so nv_napi_poll() returned with 65. How is that possible? Ah ...:

              (rx_processed_cnt++ < limit)) {

that should be:

              (++rx_processed_cnt < limit)) {

right? Find the fix below.

	Ingo

-------------------->
Subject: forcedeth: fix the NAPI poll function
From: Ingo Molnar <mingo@elte.hu>

fix the forcedeth NAPI poll function to not emit this warning:

[  186.635916] WARNING: at net/core/dev.c:2166 net_rx_action()
[  186.641351]  [<c060d9f5>] net_rx_action+0x145/0x1b0
[  186.646191]  [<c011d752>] __do_softirq+0x42/0x90
[  186.650784]  [<c011d7c6>] do_softirq+0x26/0x30
[  186.655202]  [<c011db48>] local_bh_enable+0x48/0xa0
[  186.660055]  [<c06023e0>] lock_sock_nested+0xa0/0xc0
[  186.664995]  [<c065da16>] tcp_recvmsg+0x16/0xbc0
[  186.669588]  [<c013e94b>] __generic_file_aio_write_nolock+0x27b/0x520
[  186.676001]  [<c0601d75>] sock_common_recvmsg+0x45/0x70
[  186.681202]  [<c05ff5df>] sock_aio_read+0x11f/0x140
[  ...
From: David Miller
Date: Monday, October 15, 2007 - 3:39 pm

From: Ingo Molnar <mingo@elte.hu>

Excellent catch:

Acked-by: David S. Miller <davem@davemloft.net>
-

From: Jeff Garzik
Date: Monday, October 15, 2007 - 3:40 pm

Two comments:

1) we have a vague definition of "RX work processed."  Due to error 
conditions and goto's in that function, rx_processed_cnt may or may not 
equal the number of packets actually processed.

2) man I dislike these inline C statement combinations (ranting at 
original code style, not you).  I would much rather waste a few extra 
lines of source code and make the conditions obvious:

	while (... && (rx_processed_cnt < limit)) {
		rx_processed_cnt++;

		...
	}

or even

	while (1) {
		...
		if (rx_processed_cnt == limit)
			break;
		rx_processed_cnt++;
	}

The compiler certainly doesn't care, and IMO it prevents bugs.

	Jeff


-

From: Jeff Garzik
Date: Monday, October 15, 2007 - 3:41 pm

Though it wasn't clear, that's an ACK

I'll push the patch if nobody else beats me to it.

-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 10:42 pm

agreed. Do you have an uptodate patch/git-URI for the forcedeth rewrite 
you did? I can throw it into the testbed.

	Ingo
-

From: Jeff Garzik
Date: Monday, October 15, 2007 - 10:47 pm

Branch 'fe-lock' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

It works here locally, but at this very minute I am rewriting those 
changesets yet again :)

	Jeff



-

From: Ingo Molnar
Date: Monday, October 15, 2007 - 11:40 pm

FYI, looks good so far, ontop of Linus' latest -git tree.

btw., one thing i always found weird about forcedeth is that it 
generates an extra ~100 irqs per second even when there's no network 
traffic:

 11:      89752    XT-PIC-XT        eth0
 11:      89854    XT-PIC-XT        eth0
 11:      89955    XT-PIC-XT        eth0
 11:      90056    XT-PIC-XT        eth0
 11:      90157    XT-PIC-XT        eth0

(irq count snapshot every second). Just in case it's easy to fix ...

	Ingo
-

From: Jeff Garzik
Date: Tuesday, October 16, 2007 - 12:17 am

May I presume that behavior occurs in both mainline and my fe-lock branch?

That is likely the NIC's timer irq, which is programmed to kick the irq 
handler every so often.  DEV_NEED_TIMERIRQ is a starting point, as well 
as "Known bugs" at the top of forcedeth.c, if you're curious.

I'm _betting_ that some of that can be eliminated, but we'll see...

	Jeff


-

From: Ingo Molnar
Date: Tuesday, October 16, 2007 - 12:49 am

ok. Was just a FYI - you were looking for feedback :-)

	Ingo
-

From: Jeff Garzik
Date: Tuesday, October 16, 2007 - 9:52 am

Any chance of getting a quick comparison of forcedeth CPU usage in your 
setup, between old driver and fe-lock driver?

I'm curious about the effects of fe-lock on CPU usage, which NAPI is 
known to perturb (in positive or negative directions, depending on 
conditions).

I presume both drivers can work at wire speed, sans any "too many 

Thanks!  :)

	Jeff


-

From: Jeff Garzik
Date: Tuesday, October 16, 2007 - 2:20 pm

Would the attached patch be ok with people?

It's basically the same thing, except that it aligns a bit more closely 
with forcedeth rework stuff I'm doing.

I'll send upstream today unless people scream...

	Jeff


From: Ingo Molnar
Date: Wednesday, October 17, 2007 - 12:25 am

looks good to me.

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo
-

From: Ingo Molnar
Date: Wednesday, October 17, 2007 - 3:18 am

albeit you forgot to merge half of my patch :-/ The (tested) patch below 
fixes that.

	Ingo

---------------->
Subject: forcedeth: fix rx-work condition in nv_rx_process_optimized() too
From: Ingo Molnar <mingo@elte.hu>

the merge of my previous fix to forcedeth.c lots an important hunk.

this patch fixes that.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/net/forcedeth.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux/drivers/net/forcedeth.c
===================================================================
--- linux.orig/drivers/net/forcedeth.c
+++ linux/drivers/net/forcedeth.c
@@ -2408,13 +2408,13 @@ static int nv_rx_process_optimized(struc
 	struct fe_priv *np = netdev_priv(dev);
 	u32 flags;
 	u32 vlanflags = 0;
-	u32 rx_processed_cnt = 0;
+	int rx_work = 0;
 	struct sk_buff *skb;
 	int len;
 
 	while((np->get_rx.ex != np->put_rx.ex) &&
 	      !((flags = le32_to_cpu(np->get_rx.ex->flaglen)) & NV_RX2_AVAIL) &&
-	      (rx_processed_cnt++ < limit)) {
+	      (rx_work < limit)) {
 
 		dprintk(KERN_DEBUG "%s: nv_rx_process_optimized: flags 0x%x.\n",
 					dev->name, flags);
@@ -2517,9 +2517,11 @@ next_pkt:
 			np->get_rx.ex = np->first_rx.ex;
 		if (unlikely(np->get_rx_ctx++ == np->last_rx_ctx))
 			np->get_rx_ctx = np->first_rx_ctx;
+
+		rx_work++;
 	}
 
-	return rx_processed_cnt;
+	return rx_work;
 }
 
 static void set_bufsize(struct net_device *dev)
-

From: Jeff Garzik
Date: Wednesday, October 17, 2007 - 5:21 pm

applied


-

From: David Miller
Date: Monday, October 15, 2007 - 3:18 pm

From: Ingo Molnar <mingo@elte.hu>

No, not in this case.  The driver must only netif_rx_complete()

That might help.

at least once, perhaps budget is being passed erroneously in as zero?
-

From: Jeff Garzik
Date: Monday, October 15, 2007 - 3:20 pm

FWIW, forcedeth has this funky disable_irq()-based locking mechanism 
that is currently the subject of some kernel.org bugzillas.  I would 
also check and make sure that isn't perturbing your results (even though 
it seems unrelated, based on what I read here).  That might include 
disabling MSI if it's enabled, and/or turning on spinlock debugging.

	Jeff




-

Previous thread: Re: What still uses the block layer? by Neil Brown on Monday, October 15, 2007 - 4:19 am. (4 messages)

Next thread: [patch 0/2] Protect crashkernel against BSS overlap by Bernhard Walle on Monday, October 15, 2007 - 4:50 am. (5 messages)