On Tue, 19 Feb 2008 23:28:28 +0100
Pavel Machek <pavel@ucw.cz> wrote:
I suspect one problem could be that an HPC job scheduler does
not know exactly how much memory each job will use, so it can
sometimes make a mistake and overcommit the memory on one HPC
node.
In that case the user is better off having that job killed and
restarted elsewhere than having all of the jobs on that node
grind to a halt due to swapping.
Paul, is this guess correct? :)
--
All rights reversed.