Hi, Since loading netconsole a couple of days ago on the two machines of a heartbeat cluster hosting Xen virtual machines, I got the attached warn_on_slowpath messages, one on each. It never happened before, so I thought I could as well report them in case it's interesting for you. The netconsole module parameter is like netconsole=@10.253.2.8/eth-gb1,6666@10.253.2.1/00:09:6b:8c:88:6b with different port numbers on the two hosts. The eth-gb[12] physical interfaces are bonded, but I couldn't run netconsole on bond0, as it has no netpoll support. The source IP is on bond0. There are also several 802.1q subinterfaces over bond0, which are bridge ports. The kernel is basically 2.6.26.6 with the SuSE Xen patch for dom0 support (the Debian Lenny kernel). ---8<---
This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2 shows local_bh_enable() is used in the wrong context (irqs disabled). It happens when a usual network receive path is called by netconsole, which simply turns off irqs around this all. Probably this is wrong, but it worked like this long time, and it's not trivial to fix this. Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq: remove irqs_disabled warning from local_bh_enable" can break things after changing from local_irq_save() to local_irq_disable(). Before this commit there was only a warning, now a lockup is possible, so it could be treated as a regression. This patch reverts the change in irqs. Reported-by: Ferenc Wagner <wferi@niif.hu> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- kernel/softirq.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index e7c69a7..756c928 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -135,9 +135,12 @@ EXPORT_SYMBOL(_local_bh_enable); static inline void _local_bh_enable_ip(unsigned long ip) { +#ifdef CONFIG_TRACE_IRQFLAGS + unsigned long flags; +#endif WARN_ON_ONCE(in_irq() || irqs_disabled()); #ifdef CONFIG_TRACE_IRQFLAGS - local_irq_disable(); + local_irq_save(flags); #endif /* * Are softirqs going to be turned on now: @@ -155,7 +158,7 @@ static inline void _local_bh_enable_ip(unsigned long ip) dec_preempt_count(); #ifdef CONFIG_TRACE_IRQFLAGS - local_irq_enable(); + local_irq_restore(flags); #endif preempt_check_resched(); } --
Unfortunately my brain lacks the magic to decrypt x86 stack traces, so I'm unable to read much from that report other than that it hit the WARN_ON. That looks more like the TX path to me? Anyway, my patch made that trigger for everybody rather than just on NOPREEMPT/UP (or something like that) and made the code easier to understand by removing the flags that are pointless anyway if the API is used correctly. You can find discussion around the patch at Do we have evidence of this actually hitting often? This is the first report of anything going wrong that I've seen ever since a single one right after this commit went into testing five months ago. IFF we want to add this back (and I'm not in favour) then please add a big comment that this is only to accomodate broken users. johannes
Yes, it seems there should be more such reports from netconsole users. But, I guess we kind of expect this if we still use WARN_ON and not BUG_ON here? Jarek P. --
hm, but calling local_bh_enable() with hardirqs off is a bug. It might be a long-standing bug, but it can cause lockups even with that change reverted: when we process softirqs in local_bh_enable(). So why not fix the bug instead? Ingo --
I think it's what they call a regression: this is a long-standing bug, It's not about instead: this bug could be fixed as well (if somebody knows how to do it "properly" without hacks like: if (!in_irq()) local_bh_disable(); etc.; but, I guess, the network code has more such bh disabling). Jarek P. --
but netconsole can be triggered from printk - and printk can be called from hardirqs-off sections - so this doesnt really fix the bug. Netconsole should not do BH processing. Ingo --
From: Ingo Molnar <mingo@elte.hu> Well, it sort of "has to". It calls the NAPI ->poll() method of the driver to try and make forward progress with TX reclaim so it can send new messages. It is very careful not to recursively invoke into ->poll() and other nasty situations. Didn't you write some of this code Ingo a very long time ago? :-))) Anyways, I'll look more closely at this and the original report, this never was a problem before. --
it is causing problems, so i'd like to distance myself from that careless mistake of my youth! [ And once it starts working again i'd like to take credit for having started that brilliant piece of Thanks! We strengthened/cleaned up the bh checks a tiny bit recently, maybe that caused this to pop up. It might explain some netconsole lockups perhaps? Or, the new warning might be bogus. It's a bit of a maze. Ingo --
From: Ingo Molnar <mingo@elte.hu> See my other reply to Jarek's patch posting, we're supposed to ignore all RX packets in this situation, and all the logic appears to be there, but it's not happening on that person's system for some reason. --
Saw that ... an insanely complex scenario. So at least there's a partial result: it appears to not be a false positive warning in the generic softirq/lockdep code. Ingo --
From: Ingo Molnar <mingo@elte.hu> Right, it's totally legit. --
From: Jarek Poplawski <jarkao2@gmail.com>
This backtrace call path is anything but "usual".
There is a UDP send, and a local_bh_enable() done there triggers
softirqs, which processes device (I would guess loopback, or
some non-NAPI device) backlog processing.
This triggers an input netfilter ipt_LOG rule, which triggers a
printk over netconsole.
netpoll finds the TX queue of the tg3 device (which is NAPI)
full, so it invokes ->poll() to try and free up some TX queue
space.
This in turn triggers a packet receive, which hits bridging netfilter,
which does a conntrack lookup, which does the BH disable/enable which
triggers the warning.
I mean, this is just a very crazy trace :-)
I see an easy way around this. We should just punt on non-netpoll
RX packets that try to get processed during a netconsole ->poll()
run.
Wait a second, this is _supposed_ to be already happening. Perhaps
the tests in netpoll_rx() are screwed up.
net/core/netpoll.c'd poll_one_napi() explicitly sets NETPOLL_RX_DROP
in npinfo->rx_flags as well as increment "trapped", and that is
supposed to make netpoll_rx() return non-zero which tells the caller
to ignore and drop all RX packets.
What isn't that happening?
netpoll_rx()'s test is:
if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags))
return 0;
which looks perfectly fine. npinfo will be non-NULL by definition
because we are doing netpoll for netconsole on this device, and
->rx_flags are non-zero by the above mentioned code.
So we get to __netpoll_rx() by the next bit of code.
if (npinfo->rx_flags && __netpoll_rx(skb))
ret = 1;
And __netpoll_rx() says:
struct netpoll *np = npi->rx_np;
if (!np)
goto out;
this should trigger because a netpoll RX agent shouldn't be attached,
then:
out:
if (atomic_read(&trapped)) {
kfree_skb(skb);
return 1;
}
return 0;
which should trigger because trapped is non-zero, thus we free the SKB
and return 1.
We need to figure out why this isn't working ...Consider netconsole case as special in local_bh_enable()/_disable().
This patch skips in_irq() and irqs_disabled() warnings for NETPOLL
configs when it's safe wrt. do_softirq().
Reported-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
[apply on top of my first softirq patch in this thread]
diff -Nurp a/kernel/softirq.c b/kernel/softirq.c
--- a/kernel/softirq.c 2008-11-19 07:33:23.000000000 +0000
+++ b/kernel/softirq.c 2008-11-19 07:26:28.000000000 +0000
@@ -76,7 +76,12 @@ static void __local_bh_disable(unsigned
{
unsigned long flags;
+#ifdef CONFIG_NETPOLL
+ if (!softirq_count())
+ WARN_ON_ONCE(in_irq());
+#else
WARN_ON_ONCE(in_irq());
+#endif
raw_local_irq_save(flags);
add_preempt_count(SOFTIRQ_OFFSET);
@@ -138,7 +143,16 @@ static inline void _local_bh_enable_ip(u
#ifdef CONFIG_TRACE_IRQFLAGS
unsigned long flags;
#endif
+#ifdef CONFIG_NETPOLL
+ /*
+ * Special-case - netconsole runs network code with all interrupts
+ * disabled. Warn only if it can be really dangerous:
+ */
+ if (softirq_count() == SOFTIRQ_OFFSET)
+ WARN_ON_ONCE(in_irq() || irqs_disabled());
+#else
WARN_ON_ONCE(in_irq() || irqs_disabled());
+#endif
#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_save(flags);
#endif
--
this is a very ugly patch, not really acceptable. printk methods should not be doing softirq processing - period. Ingo --
Well, it's a question of taste. Anyway, this patch is only about warnings, so no big deal. But I still think the first patch reverting local_irq_save() -> local_irq_disable() change should be applied. There is no need to give users any additional lockups risk while we know there are unsolved issues. BTW, the current situation with: local_irq_disable() in _local_bh_enable() and local_irq_save() in do_softirq() doesn't make much sense. I know, there is local_irq_disable() in __do_softirq() again, but it can be often skipped on this path because of in_interrupt() test (and there is soon this local_irq_restore() in do_softirq()). Jarek P. --
