Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Oren Laadan
Date: Sunday, November 7, 2010 - 8:55 pm

On 11/07/2010 06:05 PM, Gene Cooperman wrote:

[snip]


Agreed - as long as we are considering the c/r-engine functionality
(and not the "glue" logic to keep apps outside their context after
the restart).

That said, I'm afraid we'll end up with more definitions of what is
"reasonable" than of what is "transparent"...


Distributed c/r is one of the proposed use-cases for linux-cr.

The technique in that paper, BTW, was userspace glue: during
restart, the glue re-establishes connectivity over new TCP
connections, and c/r uses those new sockets in lieu of restoring
the old ones.
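
A minimal sketch of that substitution idea, under stated assumptions: it is
not the code from the paper or from linux-cr. The helper names
(`reconnect_in_place`, `demo`) are hypothetical, and `socket.socketpair()`
stands in for real TCP connections so the example is self-contained. It only
shows how a fresh connection can take over the descriptor number the
application already holds.

```python
import os
import socket


def reconnect_in_place(old_sock, new_sock):
    """Make old_sock's descriptor number refer to new_sock's connection."""
    # dup2 closes the old descriptor and rebinds its number in one step,
    # so code that holds the old fd keeps working on the new connection.
    os.dup2(new_sock.fileno(), old_sock.fileno())
    new_sock.close()  # the duplicate keeps the new connection open


def demo():
    # Stand-in for the original connection held by the application.
    app_end, old_peer = socket.socketpair()
    old_peer.close()  # the old connection does not survive the restart

    # Stand-in for the new connection the glue establishes at restart.
    fresh_end, new_peer = socket.socketpair()
    reconnect_in_place(app_end, fresh_end)

    # The application keeps using its original socket object.
    app_end.sendall(b"ping")
    data = new_peer.recv(4)
    app_end.close()
    new_peer.close()
    return data
```

The application-side object never changes here; only what its descriptor
points to does.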

For that and other use-cases we designed linux-cr to be flexible
so that it is possible and easy to integrate any userspace glue.


I stand corrected.


[snip]


Wrappers are great (I did TA the w4118 class here...). They are
a powerful tool; however in _our_ context they have downsides:
(a) wrappers add visible overhead (less so for cpu-bound apps,
more so with server apps)
(b) wrappers that virtualize a "black-box" API (as opposed to
integrating with the API) are prone to races (see the paper that
I cited before)
(c) wrappers duplicate kernel logic, IMHO unnecessarily (and I
don't refer to the userspace "glue" from above)
(d) wrappers are hard to make hermetic (no escapes) to apps.

IMO, the one excellent reason to use wrappers is to support
the userspace glue that allows restarted apps to run out of
their original context.


I clearly failed to explain well. Lemme try again:

If you use PTRACE to checkpoint, then you ptrace the target tasks,
peek at and save their state, and then let them resume execution.
The target apps need not collaborate - they are forced by the kernel
into the ptraced state regardless of what they were doing, and resume
execution without knowing what happened.

In linux-cr it works similarly: checkpoint does not require that
the processes be scheduled to run - they don't participate; rather,
external process(es) do the work.
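
To illustrate the non-collaborating model in the two paragraphs above, here
is a rough Linux-only sketch. It is an analogy, not PTRACE and not linux-cr:
it assumes SIGSTOP as the freeze mechanism and `/proc/<pid>/status` as the
"state" being saved, and the function name `snapshot_stopped_child` is
hypothetical. The point is only that the target runs no checkpoint code of
its own.

```python
import os
import signal


def snapshot_stopped_child():
    """Freeze a child from outside, read its state from /proc, resume it."""
    pid = os.fork()
    if pid == 0:
        # Target: knows nothing about checkpointing, it just idles.
        while True:
            signal.pause()
    try:
        # SIGSTOP cannot be caught or ignored, so the target is frozen
        # regardless of what it was doing.
        os.kill(pid, signal.SIGSTOP)
        _, status = os.waitpid(pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # The external process does all the work of collecting state.
        with open(f"/proc/{pid}/status") as f:
            fields = dict(
                line.rstrip("\n").split(":\t", 1) for line in f if ":\t" in line
            )

        os.kill(pid, signal.SIGCONT)  # the target resumes, unaware
    finally:
        # Clean up the demo child.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    return stopped, fields
```

A real checkpoint would of course save far more than a few status fields;
the sketch only shows who does the work.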

In contrast, IIUC, dmtcp uses syscall wrappers and overloading of
signal(s) in order to make every checkpointed process/thread actively
execute the checkpoint logic. I refer to this as "collaborating"
with the checkpoint operation. (I mentioned the downside of this
requirement above).
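
For contrast, a small sketch of the "collaborating" model. This is not
dmtcp's implementation; it assumes a single overloaded signal (SIGUSR2) and
hypothetical names (`app_state`, `image`, `cooperative_checkpoint`) purely
for illustration. The checkpoint logic runs inside the target itself, so the
target has to be scheduled and has to reach its handler before anything is
saved.

```python
import os
import signal
import time

app_state = {"counter": 41}  # hypothetical application state
image = {}                   # where the "checkpoint" lands


def checkpoint_handler(signum, frame):
    # Runs in the target's own context: the process saves itself.
    image.update(pid=os.getpid(), counter=app_state["counter"])


def cooperative_checkpoint():
    previous = signal.signal(signal.SIGUSR2, checkpoint_handler)
    try:
        # Stand-in for a coordinator asking the process to checkpoint.
        os.kill(os.getpid(), signal.SIGUSR2)
        # Nothing is saved until the target actually runs its handler.
        while not image:
            time.sleep(0.001)
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(
            signal.SIGUSR2,
            previous if previous is not None else signal.SIG_DFL,
        )
    return dict(image)
```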


Again, I failed to deliver the message: syscall wrappers are not bad.
They have limitations as noted above. Some users won't care, others
may and do.

As for glibc - those wrappers have a set of well defined tasks,
e.g. set errno, hide underlying syscall, caching, threads etc. But
glibc does not try to virtualize pids, for example, nor "spy" on
the processes, so to speak.


Oh... that's not what I meant: 'ltrace skype' fails because skype
tries to protect itself from being reverse-engineered. It doesn't
like ltrace's interposition on some library calls (don't know the
details). (Note that PTRACE doesn't upset skype: 'strace skype'
does work). The point being - userspace wrapping is "escapable".
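
A toy illustration of "escapable", under loud assumptions: it says nothing
about how skype detects ltrace (those details are unknown here). It replaces
Python's `os.getpid` to play the role of a library-level wrapper that hands
out a virtual pid, then bypasses it by calling libc directly through
`ctypes`. `VIRTUAL_PID` and the function names are hypothetical.

```python
import ctypes
import os

VIRTUAL_PID = 1000  # hypothetical virtual pid handed to the app


def install_getpid_wrapper():
    """Interpose on the library-level call; return the original."""
    real_getpid = os.getpid
    os.getpid = lambda: VIRTUAL_PID
    return real_getpid


def escape_via_libc():
    # Goes around the interposed os.getpid by calling libc's getpid.
    return ctypes.CDLL(None).getpid()


def demo():
    real_getpid = install_getpid_wrapper()
    try:
        wrapped = os.getpid()        # what a well-behaved app sees
        escaped = escape_via_libc()  # what an app sees if it goes around
    finally:
        os.getpid = real_getpid      # undo the interposition
    return wrapped, escaped, real_getpid()
```

Any path the wrapper does not cover leaks the real value, which is why
making wrappers hermetic is hard.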


No tricks - I once tried after a colleague mentioned that skype is
hard to reverse engineer (I thought I could prove him wrong...).


Linux-cr can do live migration - e.g. VDI, move the desktop - in
which case skype's sockets' network stacks are reconstructed,
transparently to both skype (local apps) and the peer (remote apps).
Then, at the destination host, skype continues to work.


I'd assume that if the c/r engine can do the former, then it
will also do the latter. It might even be useful for dmtcp
to be able to use a couple of syscalls (checkpoint, restart) to
do the base c/r work :p

Oren.