Re: [ofa-general] NetEffect, iw_nes and kernel warning

Previous thread: [net-2.6 PATCH] e1000: fix bug with shared interrupt during reset by Jeff Kirsher on Tuesday, January 27, 2009 - 4:27 pm. (2 messages)

Next thread: [RFT 1/4] netfilter: change elements in x_tables by Stephen Hemminger on Tuesday, January 27, 2009 - 4:53 pm. (1 message)
From: Roland Dreier
Date: Tuesday, January 27, 2009 - 4:53 pm

Interesting... looks like an unfortunate interaction with unclear
locking rules.  See below for full explanation.

BTW, what workload are you running to hit this?

I assume you have CONFIG_HIGHMEM set?

 > WARNING: at kernel/softirq.c:136 local_bh_enable+0x9b/0xa0()

I assume this is

	WARN_ON_ONCE(in_irq() || irqs_disabled());

The interesting parts of the stack trace seem to be (reversing the order
so the story makes sense):

 [<e8e3f815>] nes_netdev_start_xmit+0x815/0x8a0 [iw_nes]

nes_netdev_start_xmit() calls skb_linearize() for nonlinear skbs it
can't handle, which calls __pskb_pull_tail():

 [<c048982c>] __pskb_pull_tail+0x5c/0x2e0

__pskb_pull_tail() calls skb_copy_bits():

 [<c0489c05>] skb_copy_bits+0x155/0x290

At least in some cases, skb_copy_bits() calls kmap_skb_frag() and more
to the point kunmap_skb_frag(), which looks like:

	static inline void kunmap_skb_frag(void *vaddr)
	{
		kunmap_atomic(vaddr, KM_SKB_DATA_SOFTIRQ);
	#ifdef CONFIG_HIGHMEM
		local_bh_enable();
	#endif
	}

which leads to:

 [<c012a79b>] local_bh_enable+0x9b/0xa0

which hits the irqs_disabled() warning because iw_nes is using LLTX, and
nes_netdev_start_xmit() does:

	local_irq_save(flags);
	if (!spin_trylock(&nesnic->sq_lock)) {

at the very beginning.

The best solution is probably for iw_nes to stop using LLTX and use the
main netdev lock... but actually I still don't see how it's safe for a
net driver to call skb_linearize() from its transmit routine, since
there's a chance that that will unconditionally enable BHs?

 - R.
--

From: David Miller
Date: Tuesday, January 27, 2009 - 5:07 pm

From: Roland Dreier <rdreier@cisco.com>

It's simply not allowed.

dev_queue_xmit() at a higher level can do __skb_linearize()
because it does so before doing the rcu_read_lock_bh().
--

From: Stephen Hemminger
Date: Tuesday, January 27, 2009 - 5:17 pm

On Tue, 27 Jan 2009 16:07:50 -0800 (PST)

If the device driver can't handle non-linear SKB's then it should
not set NETIF_F_SG.
--

From: Tung, Chien Tin
Date: Wednesday, January 28, 2009 - 9:36 am

Thank you for all the feedback.  We are looking into the issue.

Chien--

From: Roland Dreier
Date: Wednesday, January 28, 2009 - 11:05 am

> > but actually I still don't see how it's safe for a net driver to
 > > call skb_linearize() from its transmit routine, since there's a
 > > chance that that will unconditionally enable BHs?
 > 
 > It's simply not allowed.
 > 
 > dev_queue_xmit() at a higher level can do __skb_linearize()
 > because it does so before doing the rcu_read_lock_bh().

OK, thanks... what confused me is that several other drivers also do
skb_linearize() in their hard_start_xmit method... eg bnx2x,
via-velocity, mv643xx_eth.  So there are several other lurking bugs to
deal with here I guess.

 - R.
--

From: Stephen Hemminger
Date: Wednesday, January 28, 2009 - 12:05 pm

On Wed, 28 Jan 2009 10:05:29 -0800

They all look like lurking (and untested) bug paths. mv643xx is especially
bad since it can leak skb. But it should be possible to call pull_tail
if bh is disabled (as long as irqs are enabled).
--

From: Roland Dreier
Date: Wednesday, January 28, 2009 - 2:52 pm

> > OK, thanks... what confused me is that several other drivers also do
 > > skb_linearize() in their hard_start_xmit method... eg bnx2x,
 > > via-velocity, mv643xx_eth.  So there are several other lurking bugs to
 > > deal with here I guess.

 > They all look like lurking (and untested) bug paths. mv643xx is especially
 > bad since it can leak skb. But it should be possible to call pull_tail
 > if bh is disabled (as long as irqs are enabled).

Yes.  The only obvious problem with __pskb_pull_tail() with BHs disabled
is that with CONFIG_HIGHMEM set, it goes into kmap_skb_frag(), which
then unconditionally does local_bh_disable()/local_bh_enable().  There's
no reason in principle that kmap_skb_frag() couldn't do
local_save_flags()/local_restore_flags() instead.

Just grepping around I see other potential issues related to this, for
example the (unused but exported) function fcoe_fc_crc() does
kmap_atomic(KM_SKB_DATA_SOFTIRQ) without any particular BH disabling,
which might run into trouble if used in the wrong context...

 - R.
--

From: Herbert Xu
Date: Thursday, January 29, 2009 - 11:57 pm

I don't know about the rest but bnx2x is certainly OK since it
only does so with IRQ enabled.  It is legal to call skb_linearize
as long as you're sure that IRQs are enabled, which is always the
case for hard_start_xmit upon entry.

So the only time you can't call it in hard_start_xmit is if you've
just disabled IRQs yourself.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Eilon Greenstein
Date: Friday, January 30, 2009 - 1:22 am

[Empty message]
From: Stephen Hemminger
Date: Friday, January 30, 2009 - 9:25 am

On Fri, 30 Jan 2009 17:57:21 +1100

Or netconsole.
netconsole calls start_xmit from IRQ 
but it is safe since netconsole doesn't send fragmented skb's
--

From: Roland Dreier
Date: Friday, January 30, 2009 - 10:35 am

> > OK, thanks... what confused me is that several other drivers also do
 > > skb_linearize() in their hard_start_xmit method... eg bnx2x,
 > > via-velocity, mv643xx_eth.  So there are several other lurking bugs to
 > > deal with here I guess.
 > 
 > I don't know about the rest but bnx2x is certainly OK since it
 > only does so with IRQ enabled.  It is legal to call skb_linearize
 > as long as you're sure that IRQs are enabled, which is always the
 > case for hard_start_xmit upon entry.

I don't believe this is accurate.  Calling skb_linearize() (on a kernel
with CONFIG_HIGHMEM set) can end up calling local_bh_enable() in
kunmap_skb_frag(), which can obviously cause problems if the initial
context relies on having BHs disabled (as hard_start_xmit does).

 - R.
--

From: David Miller
Date: Friday, January 30, 2009 - 2:51 pm

From: Roland Dreier <rdreier@cisco.com>

local_bh_{enable,disable}() nests, so this is not a problem

--

From: Roland Dreier
Date: Friday, January 30, 2009 - 8:54 pm

> > I don't believe this is accurate.  Calling skb_linearize() (on a kernel
 > > with CONFIG_HIGHMEM set) can end up calling local_bh_enable() in
 > > kunmap_skb_frag(), which can obviously cause problems if the initial
 > > context relies on having BHs disabled (as hard_start_xmit does).

 > local_bh_{enable,disable}() nests, so this is not a problem

Duh.  OK, then the only bugs seem to be that iw_nes does skb_linearize
with irqs off (due to being an LLTX driver), and mv643xx_eth leaks an
skb on its error path if skb_linearize fails.

 - R.
--

From: Lennert Buytenhek
Date: Tuesday, April 21, 2009 - 2:09 am

(Found this when deleting old netdev@ mail...)  mv643xx_eth returns
NETDEV_TX_BUSY if skb_linearize fails, so the qdisc will requeue the
skb, and we shouldn't free it.  Am I missing something?
--

From: Herbert Xu
Date: Tuesday, April 21, 2009 - 5:49 am

I don't think the issue here is the leak.  Calling skb_linearize is
simply illegal if you support netpoll because netpoll will call the
xmit routine with IRQs off.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

From: Herbert Xu
Date: Tuesday, April 21, 2009 - 5:50 am

On the other hand if netpoll never generates a packet that requires
linearisation, maybe it will work :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

Previous thread: [net-2.6 PATCH] e1000: fix bug with shared interrupt during reset by Jeff Kirsher on Tuesday, January 27, 2009 - 4:27 pm. (2 messages)

Next thread: [RFT 1/4] netfilter: change elements in x_tables by Stephen Hemminger on Tuesday, January 27, 2009 - 4:53 pm. (1 message)