(Cc'ing netdev) Ok, after decoding the lockdep output, it looks like that netif_receive_skb() should call rcu_read_lock_bh() instead of rcu_read_lock()? But I don't know if all callers of netif_receive_skb() are in softirq context. Paul, what do you think? Thank you. --
From: Américo Wang <xiyou.wangcong@gmail.com> Normally, netif_receive_skb() is invoked from softirq context. However, via netpoll it can be invoked essentially from any context. But, when this happens, the networking receive path makes amends such that this works fine. That's what the netpoll_receive_skb() check in netif_receive_skb() is for. That check makes it bail out early if the call to netif_receive_skb() is via a netpoll invocation. --
Oh, I see. This means we should call rcu_read_lock_bh() instead. If Paul has no objections, I will send a patch for this. Thanks much, David! --
Nope, its calling rcu_read_lock() from interrupt context and it should stay as is (we dont need to disable bh, this has a cpu cost) --
Oh, but lockdep complains about rcu_read_lock(), it said rcu_read_lock() can't be used in softirq context. Am I missing something? --
Well, lockdep might be dumb, I dont know... I suggest you read rcu_read_lock_bh kernel doc : /** * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section * * This is equivalent of rcu_read_lock(), but to be used when updates * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks * consider completion of a softirq handler to be a quiescent state, * a process in RCU read-side critical section must be protected by * disabling softirqs. Read-side critical sections in interrupt context * can use just rcu_read_lock(). * */ Last sentence being perfect : Read-side critical sections in interrupt context can use just rcu_read_lock(). --
Yeah, right, then it is more likely to be a bug of rcu lockdep. Paul is looking at it. Thanks! --
Except that it seems to be working correctly for me... Thanx, Paul --
Hmm, then I am confused. The only possibility here is that this is a lockdep bug... Thanks for your help! --
I believe so...
Peter, this looks odd:
kernel: (usbfs_mutex){+.?...}, at: [<ffffffff8146419f>]
netif_receive_skb+0xe7/0x819
netif_receive_skb() never has a chance to take usbfs_mutex. How can this
comes out?
--
Ok, I think I found what lockdep really complains about, it is that we took spin_lock in netpoll_poll_lock() which is in hardirq-enabled environment, later, we took another spin_lock with spin_lock_irqsave() in netpoll_rx(), so lockdep thought we broke the locking rule. I don't know why netpoll_rx() needs irq disabled, it looks like that no one takes rx_lock in hardirq context. So can we use spin_lock(&rx_lock) instead? Or am I missing something here? Eric? David? Thanks! --
I am a bit lost. Could you give the complete picture, because I cannot find it in my netdev archives. --
Sure, sorry for this. Here is the whole thread: http://lkml.org/lkml/2010/3/11/100 --
OK thanks netpoll_rx() can be called from hard irqs (netif_rx()), so rx_lock definitly needs irq care. netpoll_poll_lock() does take a spinlock with irq enabled, but its not rx_lock, its napi->poll_lock. I dont see what could be the problem, is it reproductible with vanilla kernel ? --
Yeah, I knew, but besides rcu locks, these two locks are the only locks that can be taken in the call chain. I suspect lockdep got No. I don't know why, my patch doesn't touch any function in the call chain. I already "fix" this in another way, so no need to worry this any more. Thanks for your help! --
Hmmm... It is supposed to be OK to use rcu_read_lock() in pretty much any context, even NMI. I will take a look. Thanx, Paul --
Thanks! Please let me know if you have new progress. --
