On Thu, 2 Sep 2010, KAMEZAWA Hiroyuki wrote:
[...]
I have a system showing the failure case (but I still do not have a way to
reproduce it reliably).
Here are the two processes:
  23586 pts/0    RL+  5059:18 /net/homes/mhills/tmp/soaked-cgroup
  23685 pts/6    DL+     0:00 /net/homes/mhills/tmp/soaked-cgroup
23586 spends almost all of its time in 'RL+' state; occasionally it is
seen in 'DL+' state.
From my earlier analysis, both are blocked in rmdir(): one is spinning
while holding the lock on /cgroup, and the other is waiting for that lock.
If I attach strace to 23586, the rmdir() fails with EINTR.
What is the best way to capture information that might show why the
process spins?
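
One possible starting point, sketched here rather than tested on the
affected machine, is to sample the task's state and kernel stack from
/proc over time and tally what is seen. This assumes the kernel exposes
/proc/<pid>/stack (needs CONFIG_STACKTRACE and root), and the stack of a
task that is actually running on a CPU may be empty or unreliable. The
function names, sample count, and interval are illustrative only.

```python
#!/usr/bin/env python3
"""Sample a task's scheduler state and kernel stack from /proc."""
import sys
import time
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')' rather than on whitespace.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def read_proc(pid, name):
    """Read /proc/<pid>/<name>, returning a placeholder if unreadable."""
    try:
        with open(f"/proc/{pid}/{name}") as f:
            return f.read().strip()
    except OSError as exc:
        return f"<unreadable: {exc.strerror}>"


def sample(pid, count=100, interval=0.1):
    """Take `count` samples; tally states and distinct kernel stacks."""
    states, stacks = Counter(), Counter()
    for _ in range(count):
        stat = read_proc(pid, "stat")
        if ")" in stat:  # skip the placeholder if the task has gone away
            states[parse_state(stat)] += 1
        stacks[read_proc(pid, "stack")] += 1
        time.sleep(interval)
    return states, stacks


if __name__ == "__main__":
    # Usage (as root): ./sample.py 23586
    states, stacks = sample(int(sys.argv[1]))
    print("states:", dict(states))
    for stack, n in stacks.most_common(3):
        print(f"--- seen {n} times ---\n{stack}")
```

If the same few stacks dominate, that would at least narrow down where
the rmdir() path is looping. A SysRq task dump (for example
"echo w > /proc/sysrq-trigger" for blocked tasks) could complement this
for the task stuck in 'D' state.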
--
Mark