On Thu, Aug 21, 2008 at 08:07:43AM -0500, Christoph Lameter wrote:
How could we do this? It was a _HUGE_ problem on Altix boxes. When you
started a job with a large number of MPI ranks, they would all start
from the shepherd process on a single node and the children would
migrate to a different cpu. Unless subsequent jobs used enough memory
to flush those remote quicklists, we would end up with a depleted node
that never reclaimed.
Thanks,
Robin
--