Interesting... looks like an unfortunate interaction with unclear
locking rules. See below for full explanation.
BTW, what workload are you running to hit this?
I assume you have CONFIG_HIGHMEM set?
> WARNING: at kernel/softirq.c:136 local_bh_enable+0x9b/0xa0()
I assume this is
WARN_ON_ONCE(in_irq() || irqs_disabled());
The interesting parts of the stack trace seem to be (reversing the order
so the story makes sense):
[<e8e3f815>] nes_netdev_start_xmit+0x815/0x8a0 [iw_nes]
nes_netdev_start_xmit() calls skb_linearize() for nonlinear skbs it
can't handle, which calls __pskb_pull_tail():
[<c048982c>] __pskb_pull_tail+0x5c/0x2e0
__pskb_pull_tail() calls skb_copy_bits():
[<c0489c05>] skb_copy_bits+0x155/0x290
At least in some cases, skb_copy_bits() calls kmap_skb_frag() and more
to the point kunmap_skb_frag(), which looks like:
static inline void kunmap_skb_frag(void *vaddr)
{
kunmap_atomic(vaddr, KM_SKB_DATA_SOFTIRQ);
#ifdef CONFIG_HIGHMEM
local_bh_enable();
#endif
}
which leads to:
[<c012a79b>] local_bh_enable+0x9b/0xa0
which hits the irqs_disabled() warning because iw_nes is using LLTX, and
nes_netdev_start_xmit() does:
local_irq_save(flags);
if (!spin_trylock(&nesnic->sq_lock)) {
at the very beginning.
The best solution is probably for iw_nes to stop using LLTX and use the
main netdev lock... but actually I still don't see how it's safe for a
net driver to call skb_linearize() from its transmit routine, since
there's a chance that that will unconditionally enable BHs?
- R.
--
From: Roland Dreier <rdreier@cisco.com> It's simply not allowed. dev_queue_xmit() at a higher level can do __skb_linearize() because it does so before doing the rcu_read_lock_bh(). --
On Tue, 27 Jan 2009 16:07:50 -0800 (PST) If the device driver can't handle non-linear SKB's then it should not set NETIF_F_SG. --
Thank you for all the feedback. We are looking into the issue. Chien--
> > but actually I still don't see how it's safe for a net driver to > > call skb_linearize() from its transmit routine, since there's a > > chance that that will unconditionally enable BHs? > > It's simply not allowed. > > dev_queue_xmit() at a higher level can do __skb_linearize() > because it does so before doing the rcu_read_lock_bh(). OK, thanks... what confused me is that several other drivers also do skb_linearize() in their hard_start_xmit method... eg bnx2x, via-velocity, mv643xx_eth. So there are several other lurking bugs to deal with here I guess. - R. --
On Wed, 28 Jan 2009 10:05:29 -0800 They all look like lurking (and untested) bug paths. mv643xx is especially bad since it can leak skb. But it should be possible to call pull_tail if bh is disabled (as long as irqs are enabled). --
> > OK, thanks... what confused me is that several other drivers also do > > skb_linearize() in their hard_start_xmit method... eg bnx2x, > > via-velocity, mv643xx_eth. So there are several other lurking bugs to > > deal with here I guess. > They all look like lurking (and untested) bug paths. mv643xx is especially > bad since it can leak skb. But it should be possible to call pull_tail > if bh is disabled (as long as irqs are enabled). Yes. The only obvious problem with __pskb_pull_tail() with BHs disabled is that with CONFIG_HIGHMEM set, it goes into kmap_skb_frag(), which then unconditionally does local_bh_disable()/local_bh_enable(). There's no reason in principle that kmap_skb_frag() couldn't do local_save_flags()/local_restore_flags() instead. Just grepping around I see other potential issues related to this, for example the (unused but exported) function fcoe_fc_crc() does kmap_atomic(KM_SKB_DATA_SOFTIRQ) without any particular BH disabling, which might run into trouble if used in the wrong context... - R. --
I don't know about the rest but bnx2x is certainly OK since it only does so with IRQ enabled. It is legal to call skb_linearize as long as you're sure that IRQs are enabled, which is always the case for hard_start_xmit upon entry. So the only time you can't call it in hard_start_xmit is if you've just disabled IRQs yourself. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
On Fri, 30 Jan 2009 17:57:21 +1100 Or netconsole. netconsole calls start_xmit from IRQ but it is safe since netconsole doesn't send fragmented skb's --
> > OK, thanks... what confused me is that several other drivers also do > > skb_linearize() in their hard_start_xmit method... eg bnx2x, > > via-velocity, mv643xx_eth. So there are several other lurking bugs to > > deal with here I guess. > > I don't know about the rest but bnx2x is certainly OK since it > only does so with IRQ enabled. It is legal to call skb_linearize > as long as you're sure that IRQs are enabled, which is always the > case for hard_start_xmit upon entry. I don't believe this is accurate. Calling skb_linearize() (on a kernel with CONFIG_HIGHMEM set) can end up calling local_bh_enable() in kunmap_skb_frag(), which can obviously cause problems if the initial context relies on having BHs disabled (as hard_start_xmit does). - R. --
From: Roland Dreier <rdreier@cisco.com>
local_bh_{enable,disable}() nests, so this is not a problem
--
> > I don't believe this is accurate. Calling skb_linearize() (on a kernel
> > with CONFIG_HIGHMEM set) can end up calling local_bh_enable() in
> > kunmap_skb_frag(), which can obviously cause problems if the initial
> > context relies on having BHs disabled (as hard_start_xmit does).
> local_bh_{enable,disable}() nests, so this is not a problem
Duh. OK, then the only bugs seem to be that iw_nes does skb_linearize
with irqs off (due to being an LLTX driver), and mv643xx_eth leaks an
skb on its error path if skb_linearize fails.
- R.
--
(Found this when deleting old netdev@ mail...) mv643xx_eth returns NETDEV_TX_BUSY if skb_linearize fails, so the qdisc will requeue the skb, and we shouldn't free it. Am I missing something? --
I don't think the issue here is the leak. Calling skb_linearize is simply illegal if you support netpoll because netpoll will call the xmit routine with IRQs off. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
On the other hand if netpoll never generates a packet that requires linearisation, maybe it will work :) -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --
