Since the machine seems to be otherwise alive, can you do a sysrq-W? That
is most easily done by just doing
    echo w > /proc/sysrq-trigger
as root, so you don't actually need any console access or anything like that.
That should give you all the blocked process traces, and if it's a
deadlock on some semaphore or other, it should all stand out quite nicely.
In fact, things like the above are probably worth scripting for any
automated testing - if you auto-fail after some time, please make the
failure case do that sysrq-W by default.
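
As a rough sketch of what that scripting could look like (not taken from
any existing test harness): the wrapper below runs a test command under a
time limit and requests the sysrq-W dump only when the limit is hit. The
function names and the SYSRQ_TRIGGER variable are made up for illustration,
and it assumes GNU coreutils "timeout", which exits with status 124 when
the time limit expires. The traces themselves end up in the kernel log, so
read them with dmesg or from the console.

```shell
#!/bin/sh
# Hypothetical helper names; only /proc/sysrq-trigger and "w" come from
# the mail above. SYSRQ_TRIGGER is overridable so the logic can be tried
# without root by pointing it at an ordinary file.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

dump_blocked_tasks() {
    # Ask the kernel for sysrq-W (blocked task traces). Needs root when
    # writing to the real trigger file.
    echo w > "$SYSRQ_TRIGGER"
}

run_with_timeout() {
    # Usage: run_with_timeout SECONDS COMMAND [ARGS...]
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    # Assumption: GNU timeout reports an expired time limit as status 124.
    if [ "$status" -eq 124 ]; then
        dump_blocked_tasks
    fi
    return "$status"
}

# Example (hypothetical test command):
#   run_with_timeout 600 ./my-test || dmesg | tail -n 200
```

Keeping the dump in the timeout path only means a passing or normally
failing run does not spam the kernel log.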
(The other sysrq things can be useful too - "T" shows the same as "W",
except for _all_ tasks, which is often so verbose that it hides the
problem, but is sometimes the right thing to do).
Linus