On Fri, 11 Jun 2010 12:12:48 +1000
Herbert Xu <herbert@gondor.hengli.com.au> wrote:
And there it is, an unbalanced rtnl_unlock().
This stupid little thing took me over a day's work to find - it's just
been awful.
The user-visible symptom was that a bug in the netpoll code caused the
machine to hang after loading ipv6 (!), because
addrconf_fixup_forwarding()'s rtnl_trylock() kept on failing, and the
syscall kept on getting restarted via restart_syscall(), so an
initscripts procfs write just kept banging its head against the
excessively-unlocked mutex.
The mutex code handles an excessively-unlocked mutex (mutex.count==2)
really badly. Some API functions say "it's locked", others say "it
isn't", etc.
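
As a rough user-space sketch of how that disagreement can arise: the
model below assumes a counter convention of 1 = unlocked, 0 = locked,
negative = locked with waiters, and assumes the three entry points are
implemented as a "not equal to 1" test, a compare-and-swap from 1 to 0,
and a decrement. All toy_* names are hypothetical stand-ins, not kernel
API, and the real (arch-specific) mutex code may well differ in detail.

```c
#include <stdbool.h>

/* Toy model only. Assumed convention: 1 = unlocked, 0 = locked,
 * negative = locked with waiters. */
struct toy_mutex {
	int count;
};

/* Modelled on an "is the count anything other than 1?" check. */
static bool toy_is_locked(const struct toy_mutex *m)
{
	return m->count != 1;
}

/* Modelled on a compare-and-swap of 1 -> 0: succeeds only from exactly 1. */
static bool toy_trylock(struct toy_mutex *m)
{
	if (m->count != 1)
		return false;
	m->count = 0;
	return true;
}

/* Modelled on a decrement fast path: the lock is considered taken
 * without sleeping unless the result goes negative. */
static bool toy_lock_fastpath(struct toy_mutex *m)
{
	return --m->count >= 0;
}

/* Unlock is a plain increment; nothing stops it going past 1. */
static void toy_unlock(struct toy_mutex *m)
{
	m->count++;
}

/* Bounded stand-in for the trylock-or-restart pattern. Returns the
 * attempt on which trylock succeeded, or -1 after max_restarts failures. */
static int toy_trylock_or_restart(struct toy_mutex *m, int max_restarts)
{
	for (int attempt = 1; attempt <= max_restarts; attempt++) {
		if (toy_trylock(m)) {
			toy_unlock(m);
			return attempt;
		}
		/* The real code would return restart_syscall() here and the
		 * syscall would be re-entered from the top. */
	}
	return -1;
}
```

Under those assumptions, one extra toy_unlock() on an unlocked mutex
leaves count at 2. From there toy_is_locked() reports "locked" and
toy_trylock() fails on every attempt, so the retry loop never makes
progress, while toy_lock_fastpath() happily succeeds and leaves the
count at 1, i.e. apparently unlocked.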
Maybe it's better with mutex debugging enabled - didn't try that.
Things get pretty user-unfriendly when there's a bug within the
netconsole code itself.
Enabling lockdep simply made the bug cure itself - I suspect the mutex
code's handling of mutexes is different if lockdep is enabled. That
would be pretty bad behaviour from the lockdep code.
I just removed the rtnl_unlock() - I couldn't see much in there which
needed rtnl locking.
Dave, the fixup should be folded into the original patch please -
otherwise we'll have a machine-hangs-up bisection hole which spans two
weeks' worth of commits.