Again ... i dont know this code well, but you yourself describe it
as "a_central_and_damn_rwlock" above.
"Central and damn" locks of any type, anywhere tend to cause such
trouble.
Often they are mini-BKL's in the making. Here's some historic
patterns:
1- First there's a convenient lock around a popular and useful
piece of data and code.
2- As popularity (and reach of code) grows, and some non-trivial
interaction ensues, a little bit of self-recursion is added to
the mix (this is often easier to do than to fix the root cause
of the problem) - making critical sections even easier to grow
in size.
3- Then attempts are made to make it scale all (it's a popular
piece of code), it's extended along a per-cpu axis, but
complexity of locking explode by taking a ton of locks all at
once. The solution: yell at the lock validator for not allowing
this (or for exploding due to the sheer mathematically
large complexity of the locking rules). Frequent requests for
an exclusion from those pesky validations are made, and various
hacks are done to work it around. It's all the fault of the lock
validator, of course.
4- At this point it's rarely a clean, tidy data lock anymore - it
tends to grow into a code lock nobody really knows how it works,
except that it better be taken more often than not, badness may
ensue otherwise. Nobody really knows when to take it, only that
it should be taken widely enough, that it should be recursive
enough to call even _more_ code from under it. Efforts to reduce
critical section length are rebuffed with: "this adds unlocking
and relocking overhead". And it should then all scale as well.
5- In the end it's a lock everyone curses but nobody is able to fix
anymore. "If it was easy to fix we'd have fixed it long ago
already" kind of fatalist thinking becomes widespread.
We saw many examples of this in the past: beyond the BKL (which is a
bit special as its more of an UP legacy - but which hurt us the
most), we had the tasklist_lock which was problematic for a long
time, then there's also the reiser3 locking which was extremely
monolithic. [ i'm only naming safe examples here, where nobody will
flame me =B-) ]
Whether this particular case applies is up to you. I see certain
matches up to phase 3 or so - but i also see certain dissimilarities
as well. Nor do i claim that it's easy to fix or improve.
Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html