[PATCH] softirq: Fix warnings triggered by netconsole

Previous thread: forcedeth MCP55 msi problems by Andreas Petlund on Thursday, November 6, 2008 - 9:31 am. (3 messages)

Next thread: [PATCH] Add support for Marvell 88E1118 PHY by Ron Madrid on Thursday, November 6, 2008 - 12:05 pm. (1 message)
From: Ferenc Wagner
Date: Thursday, November 6, 2008 - 10:40 am

Hi,

Since loading netconsole a couple of days ago on the two machines of a
heartbeat cluster hosting Xen virtual machines, I got the attached
warn_on_slowpath messages, one on each.  It never happened before, so
I thought I could as well report them in case it's interesting for
you.  The netconsole module parameter is like

netconsole=@10.253.2.8/eth-gb1,6666@10.253.2.1/00:09:6b:8c:88:6b

with different port numbers on the two hosts.  The eth-gb[12] physical
interfaces are bonded, but I couldn't run netconsole on bond0, as it
has no netpoll support.  The source IP is on bond0.  There are also
several 802.1q subinterfaces over bond0, which are bridge ports.

The kernel is basically 2.6.26.6 with the SuSE Xen patch for dom0
support (the Debian Lenny kernel).
---8<---
From: Jarek Poplawski
Date: Monday, November 17, 2008 - 6:35 am

This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
shows local_bh_enable() is used in the wrong context (irqs disabled).
It happens when a usual network receive path is called by netconsole,
which simply turns off irqs around this all. Probably this is wrong,
but it worked like this long time, and it's not trivial to fix this.

Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
remove irqs_disabled warning from local_bh_enable" can break things
after changing from local_irq_save() to local_irq_disable(). Before
this commit there was only a warning, now a lockup is possible, so
it could be treated as a regression. This patch reverts the change
in irqs.

Reported-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 kernel/softirq.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index e7c69a7..756c928 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -135,9 +135,12 @@ EXPORT_SYMBOL(_local_bh_enable);
 
 static inline void _local_bh_enable_ip(unsigned long ip)
 {
+#ifdef CONFIG_TRACE_IRQFLAGS
+	unsigned long flags;
+#endif
 	WARN_ON_ONCE(in_irq() || irqs_disabled());
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_disable();
+	local_irq_save(flags);
 #endif
 	/*
 	 * Are softirqs going to be turned on now:
@@ -155,7 +158,7 @@ static inline void _local_bh_enable_ip(unsigned long ip)
 
 	dec_preempt_count();
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_enable();
+	local_irq_restore(flags);
 #endif
 	preempt_check_resched();
 }
--

From: Johannes Berg
Date: Monday, November 17, 2008 - 7:18 am

Unfortunately my brain lacks the magic to decrypt x86 stack traces, so
I'm unable to read much from that report other than that it hit the
WARN_ON. That looks more like the TX path to me? Anyway, my patch made
that trigger for everybody rather than just on NOPREEMPT/UP (or
something like that) and made the code easier to understand by removing
the flags that are pointless anyway if the API is used correctly.

You can find discussion around the patch at

Do we have evidence of this actually hitting often? This is the first
report of anything going wrong that I've seen ever since a single one
right after this commit went into testing five months ago.

IFF we want to add this back (and I'm not in favour) then please add a
big comment that this is only to accomodate broken users.

johannes
From: Jarek Poplawski
Date: Tuesday, November 18, 2008 - 12:49 am

Yes, it seems there should be more such reports from netconsole users.
But, I guess we kind of expect this if we still use WARN_ON and not
BUG_ON here?

Jarek P.
--

From: Ingo Molnar
Date: Monday, November 17, 2008 - 9:16 am

hm, but calling local_bh_enable() with hardirqs off is a bug. It might 
be a long-standing bug, but it can cause lockups even with that change 
reverted: when we process softirqs in local_bh_enable(). So why not 
fix the bug instead?

	Ingo
--

From: Jarek Poplawski
Date: Tuesday, November 18, 2008 - 12:38 am

I think it's what they call a regression: this is a long-standing bug,

It's not about instead: this bug could be fixed as well (if somebody
knows how to do it "properly" without hacks like:
	if (!in_irq())
		local_bh_disable();
etc.; but, I guess, the network code has more such bh disabling).

Jarek P.
--

From: Jarek Poplawski
Date: Wednesday, November 19, 2008 - 1:41 am

[Empty message]
From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 2:30 am

but netconsole can be triggered from printk - and printk can be called 
from hardirqs-off sections - so this doesnt really fix the bug. 
Netconsole should not do BH processing.

	Ingo
--

From: David Miller
Date: Wednesday, November 19, 2008 - 2:42 am

From: Ingo Molnar <mingo@elte.hu>

Well, it sort of "has to".  It calls the NAPI ->poll() method of the
driver to try and make forward progress with TX reclaim so it can
send new messages.

It is very careful not to recursively invoke into ->poll() and other
nasty situations.

Didn't you write some of this code Ingo a very long time ago? :-)))

Anyways, I'll look more closely at this and the original report,
this never was a problem before.
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 3:14 am

it is causing problems, so i'd like to distance myself from that 
careless mistake of my youth! [ And once it starts working again i'd 
like to take credit for having started that brilliant piece of 

Thanks! We strengthened/cleaned up the bh checks a tiny bit recently, 
maybe that caused this to pop up. It might explain some netconsole 
lockups perhaps? Or, the new warning might be bogus. It's a bit of a 
maze.

	Ingo
--

From: David Miller
Date: Wednesday, November 19, 2008 - 3:17 am

From: Ingo Molnar <mingo@elte.hu>

See my other reply to Jarek's patch posting, we're supposed to
ignore all RX packets in this situation, and all the logic appears
to be there, but it's not happening on that person's system for some
reason.
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 3:21 am

Saw that ... an insanely complex scenario.

So at least there's a partial result: it appears to not be a false 
positive warning in the generic softirq/lockdep code.

	Ingo
--

From: David Miller
Date: Wednesday, November 19, 2008 - 3:22 am

From: Ingo Molnar <mingo@elte.hu>

Right, it's totally legit.
--

From: David Miller
Date: Wednesday, November 19, 2008 - 3:10 am

From: Jarek Poplawski <jarkao2@gmail.com>

This backtrace call path is anything but "usual".

There is a UDP send, and a local_bh_enable() done there triggers
softirqs, which processes device (I would guess loopback, or
some non-NAPI device) backlog processing.

This triggers an input netfilter ipt_LOG rule, which triggers a
printk over netconsole.

netpoll finds the TX queue of the tg3 device (which is NAPI)
full, so it invokes ->poll() to try and free up some TX queue
space.

This in turn triggers a packet receive, which hits bridging netfilter,
which does a conntrack lookup, which does the BH disable/enable which
triggers the warning.

I mean, this is just a very crazy trace :-)

I see an easy way around this.  We should just punt on non-netpoll
RX packets that try to get processed during a netconsole ->poll()
run.

Wait a second, this is _supposed_ to be already happening.  Perhaps
the tests in netpoll_rx() are screwed up.

net/core/netpoll.c'd poll_one_napi() explicitly sets NETPOLL_RX_DROP
in npinfo->rx_flags as well as increment "trapped", and that is
supposed to make netpoll_rx() return non-zero which tells the caller
to ignore and drop all RX packets.

What isn't that happening?

netpoll_rx()'s test is:

	if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
		return 0;

which looks perfectly fine.  npinfo will be non-NULL by definition
because we are doing netpoll for netconsole on this device, and
->rx_flags are non-zero by the above mentioned code.

So we get to __netpoll_rx() by the next bit of code.

	if (npinfo->rx_flags && __netpoll_rx(skb))
		ret = 1;

And __netpoll_rx() says:

	struct netpoll *np = npi->rx_np;

	if (!np)
		goto out;

this should trigger because a netpoll RX agent shouldn't be attached,
then:

out:
	if (atomic_read(&trapped)) {
		kfree_skb(skb);
		return 1;
	}

	return 0;

which should trigger because trapped is non-zero, thus we free the SKB
and return 1.

We need to figure out why this isn't working ...
From: Jarek Poplawski
Date: Wednesday, November 19, 2008 - 1:41 am

Consider netconsole case as special in local_bh_enable()/_disable().
This patch skips in_irq() and irqs_disabled() warnings for NETPOLL
configs when it's safe wrt. do_softirq().

Reported-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
[apply on top of my first softirq patch in this thread]

diff -Nurp a/kernel/softirq.c b/kernel/softirq.c
--- a/kernel/softirq.c	2008-11-19 07:33:23.000000000 +0000
+++ b/kernel/softirq.c	2008-11-19 07:26:28.000000000 +0000
@@ -76,7 +76,12 @@ static void __local_bh_disable(unsigned 
 {
 	unsigned long flags;
 
+#ifdef CONFIG_NETPOLL
+	if (!softirq_count())
+		WARN_ON_ONCE(in_irq());
+#else
 	WARN_ON_ONCE(in_irq());
+#endif
 
 	raw_local_irq_save(flags);
 	add_preempt_count(SOFTIRQ_OFFSET);
@@ -138,7 +143,16 @@ static inline void _local_bh_enable_ip(u
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned long flags;
 #endif
+#ifdef CONFIG_NETPOLL
+	/*
+	 * Special-case - netconsole runs network code with all interrupts
+	 * disabled. Warn only if it can be really dangerous:
+	 */
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		WARN_ON_ONCE(in_irq() || irqs_disabled());
+#else
 	WARN_ON_ONCE(in_irq() || irqs_disabled());
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	local_irq_save(flags);
 #endif
--

From: Ingo Molnar
Date: Wednesday, November 19, 2008 - 2:32 am

this is a very ugly patch, not really acceptable.

printk methods should not be doing softirq processing - period.

	Ingo
--

From: Jarek Poplawski
Date: Wednesday, November 19, 2008 - 4:07 am

Well, it's a question of taste. Anyway, this patch is only about
warnings, so no big deal. But I still think the first patch reverting
local_irq_save() -> local_irq_disable() change should be applied.
There is no need to give users any additional lockups risk while we
know there are unsolved issues.

BTW, the current situation with: local_irq_disable() in
_local_bh_enable() and local_irq_save() in do_softirq() doesn't make
much sense. I know, there is local_irq_disable() in __do_softirq()
again, but it can be often skipped on this path because of
in_interrupt() test (and there is soon this local_irq_restore()
in do_softirq()).

Jarek P.
--

Previous thread: forcedeth MCP55 msi problems by Andreas Petlund on Thursday, November 6, 2008 - 9:31 am. (3 messages)

Next thread: [PATCH] Add support for Marvell 88E1118 PHY by Ron Madrid on Thursday, November 6, 2008 - 12:05 pm. (1 message)