the freezes that Miklos was seeing were hardirq contexts blocking in
task_rq_lock() - that is done with interrupts disabled. (i think Miklos
also tried !NOHZ kernels and older kernels, with a similar result.)

plus on the ptrace side, the wait_task_inactive() code had most of its
overhead in the atomic op, so if any timer IRQ hit _that_ core, it was
likely while we were still holding the runqueue lock!

i think the only thing that eventually got Miklos' laptop out of the
wedge was timer irqs hitting the ptrace CPU in exactly those
instructions where it was not holding the runqueue lock. (or perhaps an
asynchronous SMM event delaying it for a long time)
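
to put a rough number on that window, here is a toy user-space model (not kernel code, and not measured on Miklos' box). it assumes a polling loop of loop_cycles cycles that holds the lock for the first locked_cycles of them, and timer ticks arriving every tick_period cycles. all names and numbers are hypothetical, picked only to illustrate how few ticks land outside the locked region:

```c
#include <stdio.h>

/*
 * Hypothetical model: each iteration of the polling loop takes
 * loop_cycles cycles and holds the lock during cycles
 * [0, locked_cycles). A timer tick fires every tick_period cycles.
 * Returns how many of the first nticks ticks arrive while the lock
 * is NOT held.
 */
static unsigned ticks_outside_lock(unsigned loop_cycles,
                                   unsigned locked_cycles,
                                   unsigned tick_period,
                                   unsigned nticks)
{
    unsigned hits = 0;

    if (loop_cycles == 0)
        return 0;

    for (unsigned i = 1; i <= nticks; i++) {
        /* position of this tick within the current loop iteration */
        unsigned long long t = (unsigned long long)i * tick_period;
        unsigned phase = (unsigned)(t % loop_cycles);

        if (phase >= locked_cycles)
            hits++;
    }
    return hits;
}

/* Fraction of each loop iteration spent without the lock held. */
static double unlocked_fraction(unsigned loop_cycles, unsigned locked_cycles)
{
    if (loop_cycles == 0 || locked_cycles > loop_cycles)
        return 0.0;
    return (double)(loop_cycles - locked_cycles) / (double)loop_cycles;
}
```

with a made-up 100-cycle loop that holds the lock for 95 cycles, only about one tick in twenty would find the lock free in this model, which is the kind of narrow window described above.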
Ingo