Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Oren Laadan
Date: Saturday, November 20, 2010 - 11:05 am

Hi,

Based on discussion with Gene, I'd like to clarify key points and
difference between kernel and userspace approaches (specifically
linux-cr and dmtcp): three parts to break the long post...

part I: perpsectice about the types of scopes of c/r in discussion
part II: linux-cr design adn objectives
part III: comparison kernel/userspace approaches

[now relax, grab (another) cup of coffee and read on...]

PART I:  ==PERSPECTIVE==

A rough classification of c/r categories:

* container-c/r: important use-case, e.g. c/r and migration of an
  application containers like VPS (virtual private server), VDI
  (desktop) or  other self-contained application (e.g. Oracle server).
  Here _all_ the relevant processes are included in the checkpoint.

* standalone-c/r: another use-case is standalone-c/r where a set of
  processes is checkpointed, but not the entire environment, and then
  those processes are restarted in a different "eco-system".

* distributed-c/r: meaning several sets of processes, each running
  on a different host. (Each set may be a separate container there).

In container-c/r, the main challenge is to be _reliable_ in the sense
that a restart from a successful checkpoint should always succeed.

In standalone-c/r, the main challenge is that an application resumes
execution after a restart in a possible _different_ eco-system. Some
application don't care (e.g 'bc'). Other applications do care, and to
different degrees; for these we need "glue" to pacify the application.

There are generally three types of "glue":

(1) Modify the application or selected libraries to be c/r-aware, and
  notify it when restart completes. (e.g. CoCheck MPI library).
(2) Add a userspace helper that will run post-restart to do necessary
  trickery (eg. send a SIGWINCH to 'screen'; mount proper filesystem
  at the new host after migration; reconnect a socket to a peer).
(3) Use interposition on selected library calls and add wrapper code
  that will glue in what's missing (e.g. dbus or nscd calls to
  reconnect an application to those services).

IMPORTANT: the glueing method is _orthogonal_ to how the c/r is done !
We are strictly discussion the core c/r functionality.

(next part: linux-cr philosophy...)

Thanks,

Oren.
--
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Christoph Hellwig, (Tue Nov 2, 2:47 pm)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Christoph Hellwig, (Thu Nov 4, 7:25 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Fri Nov 5, 10:17 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Sukadev Bhattiprolu, (Fri Nov 5, 10:31 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Sun Nov 7, 11:49 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Sun Nov 7, 12:42 pm)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Mon Nov 8, 11:37 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Serge E. Hallyn, (Wed Nov 17, 8:39 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Wed Nov 17, 10:04 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Pavel Emelyanov, (Thu Nov 18, 2:13 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Jose R. Santos, (Thu Nov 18, 1:13 pm)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Kirill Korotaev, (Fri Nov 19, 7:36 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:00 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:01 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:16 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:25 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:27 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:38 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Alexey Dobriyan, (Fri Nov 19, 9:55 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Oren Laadan, (Sat Nov 20, 11:05 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Sun Nov 21, 1:18 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Sun Nov 21, 1:21 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Sukadev Bhattiprolu, (Mon Nov 22, 11:02 am)
Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch, Gene Cooperman, (Sun Nov 28, 9:09 pm)