> If you think only about target processes, yeah sure, you can cover
This is why I think it is important to define the limits of
which kernel state features are covered (or going to be
covered) by checkpoint/restart - and then list applications
that are supported (Oren mentioned MySQL server in this thread).
It will always be easy for someone to point at some application
like powertop and say "we can't migrate that, so checkpoint/restart
is therefore useless" ... this just is not true. This
can be useful without having to be complete (as long as the
limits are well defined).
See above - it may be enough to cover a significant number of
useful cases.
Okay - so "dbus" is in the list of "can't do that now, and will
never be able to checkpoint/restore that class" - big deal. I'm
getting repetitive now, but one last time: just because this can't
handle every conceivable case doesn't make it useless.
I don't think that you'll ever make virtualization good enough
to make the HPC people happy.
The CR Kool-Aid hasn't gotten far enough into my system for me to
accept this claim. If these "can't stop for more than a few milliseconds"
processes are HPC workloads, then I'm not seeing how you can do
much to help them. I think these applications are using almost
all of the RAM on the system, and most of the pages are anonymous.
Just how do you checkpoint several GB of dirty pages in a few
milliseconds (when there is almost no free memory on the system)?
If you have something else in mind, then please explain a little more.
-Tony