On Tue, 2008-08-12 at 09:32 -0700, Jeremy Fitzhardinge wrote:
All true. Hard stuff.
The IBM product works partly by restricting migrations to a single
physical Ethernet network. Each container gets its own IP and
MAC address. The socket state is checkpointed quite fully and moved
along with the IP.
Me, personally, I think I'd probably "re-link" the thing, mark it as
such, ship it across like a normal file, then unlink it after the
restore. I don't know what we'd choose when actually implementing it.
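
To make the "re-link, mark, ship, unlink" idea concrete, here is a rough
userspace sketch, not a proposal for how the kernel side would do it.
Everything named here (MARK, checkpoint_unlinked, restore_unlinked, the
record dict) is made up for illustration. It also cheats: userspace
generally can't link a file whose link count is already zero, so the
sketch copies the data through the open fd instead of truly re-linking
the inode.

```python
import os
import tempfile

MARK = ".ckpt-relinked-"  # hypothetical marker prefix for re-linked files


def checkpoint_unlinked(fd, staging_dir):
    """Give an unlinked-but-open file a name again so it ships like a normal file."""
    path = os.path.join(staging_dir, MARK + str(fd))
    offset = os.lseek(fd, 0, os.SEEK_CUR)  # remember the current file position
    size = os.fstat(fd).st_size
    with open(path, "wb") as out:
        pos = 0
        while pos < size:
            # pread() does not disturb the file position we just recorded
            chunk = os.pread(fd, min(65536, size - pos), pos)
            if not chunk:
                break
            out.write(chunk)
            pos += len(chunk)
    # The "mark": the restore side must unlink this file again.
    return {"path": path, "offset": offset, "marked_unlinked": True}


def restore_unlinked(record):
    """Reopen the shipped file, restore the offset, then drop its name again."""
    fd = os.open(record["path"], os.O_RDWR)
    os.lseek(fd, record["offset"], os.SEEK_SET)
    if record["marked_unlinked"]:
        os.unlink(record["path"])
    return fd


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "scratch")
    fd = os.open(p, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"some state")
    os.unlink(p)                      # file is now open but has no name
    rec = checkpoint_unlinked(fd, d)  # rec["path"] can be shipped as usual
    os.close(fd)
    new_fd = restore_unlinked(rec)    # open again, nameless again
    os.close(new_fd)
```

The point is only the ordering: the file has a name for exactly as long
as it takes to move it, and the mark tells the restore side to take the
name away again.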
I respectfully disagree. The number one prerequisite for
checkpoint/restart is isolation. Xen just happens to get this for free.
So, instead of saying that there's no explicit connection between the
process and its working set, ask yourself how we make a connection.
In this case, we can do it with a filesystem (mount) namespace. Each
container that we might want to checkpoint must have its writable
filesystems confined to a private set that is not shared with other
containers. Things like union mounts would help here, but aren't
necessarily required. They just make it more efficient.
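
As a sketch of that precondition, assuming a made-up data model where
each container is described by a list of (mount source, writable) pairs:
a set of containers is checkpointable only if no writable filesystem
shows up in more than one of them. The function names and the example
device paths are hypothetical. On a real system the pairs would have to
come from somewhere like each container's /proc/<pid>/mountinfo.

```python
from collections import defaultdict


def shared_writable_mounts(containers):
    """Map each writable mount source used by more than one container to its users."""
    users = defaultdict(set)
    for name, mounts in containers.items():
        for source, writable in mounts:
            if writable:
                users[source].add(name)
    return {src: sorted(names) for src, names in users.items() if len(names) > 1}


def checkpointable(containers):
    """True when every writable filesystem is private to a single container."""
    return not shared_writable_mounts(containers)


if __name__ == "__main__":
    containers = {
        # A shared read-only base is fine; only writable mounts must be private.
        "a": [("/dev/vg/base", False), ("/dev/vg/a", True)],
        "b": [("/dev/vg/base", False), ("/dev/vg/b", True)],
    }
    print(checkpointable(containers))
```

The shared read-only base in the example is where something like a
union mount would come in: it is the optimization, not the requirement.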
Right. We just start with "everybody has their own disk", which is slow
and crappy, and optimize it from there.
It's almost as big a problem as trying to virtualize entire machines
and expecting them to run as fast as native. :)
Cool! I didn't know you guys did the IRIX implementation. I'm sure you
guys got a lot farther than any of us have. Did you guys ever write any
papers or anything on it? I'd be interested in more information.
-- Dave