Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Tejun Heo
Date: Friday, November 5, 2010 - 2:28 am

Hello,

On 11/04/2010 05:44 PM, Gene Cooperman wrote:

Call me skeptical, but I don't yet see it becoming a mainstream
thing (for average sysadmin John and proverbial aunt Tilly).  It
definitely is useful for many different use cases, though.  Hey, but
let's see.


I don't think gdb seeing it is a big deal as long as it's hidden from
the application itself.


I'm probably missing something but can't you stop the application
using PTRACE_ATTACH?  You wouldn't need to hijack a signal or worry
about -EINTR failures (there are some exceptions but nothing really to
worry about).  Also, unless the manager thread needs to be always
online, you can inject the manager thread by manipulating the target
process's state while taking a snapshot.
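
To illustrate the PTRACE_ATTACH suggestion, here is a minimal sketch of
stopping a process from the outside, without installing any signal
handler in it.  It is not taken from dmtcp or any other CR tool.  Python
with ctypes is used purely for illustration, and the helper names
(stop_for_snapshot, resume, proc_state) are made up for this example.  It
assumes Linux, a single-threaded target, and permission to trace it.

```python
import ctypes
import os

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_ATTACH = 16
PTRACE_DETACH = 17

# CDLL(None) exposes the symbols already loaded into this process (libc).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.ptrace.restype = ctypes.c_long
_libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long,
                         ctypes.c_void_p, ctypes.c_void_p]


def _ptrace(request, pid):
    """Call ptrace(2) and raise OSError on failure."""
    ctypes.set_errno(0)
    if _libc.ptrace(request, pid, None, None) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        # The comm field may contain spaces, so split after the last ')'.
        return f.read().rsplit(")", 1)[1].split()[0]


def stop_for_snapshot(pid):
    """Attach to pid and wait until it has actually stopped."""
    _ptrace(PTRACE_ATTACH, pid)
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        raise RuntimeError(f"pid {pid} did not stop (status {status})")


def resume(pid):
    """Detach from pid and let it continue running."""
    _ptrace(PTRACE_DETACH, pid)
```

While the target is stopped, its state can be read from /proc (and,
with further ptrace requests, its registers and memory).  A multi-threaded
target would need every thread attached separately, which this sketch
does not do.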


Can you please elaborate a bit?  What do you want to see changed?


I see.  I just thought that it would be helpful to have the core part
- which does per-process checkpointing and restoring and corresponds
to the features implemented by in-kernel CR - as a separate thing.  It
already sounds like that is mostly the case.

I don't have much idea about the scope of the whole thing, so please
feel free to hammer some sense into me if I go off track.  From what I
read, it seems that once the target process is stopped, dmtcp is able
to get most of the necessary information from the kernel via /proc and
other methods, but the paper says that it needs to intercept
socket-related calls to gather enough information to recreate the
sockets later.  I'm curious what's missing from the current /proc.
You can map a socket to its inode from /proc/*/fd, which can be matched
to an entry in /proc/*/net/PROTO to find out the addresses, and most
socket options should be readable via getsockopt.  Am I missing
something?
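
The fd-to-inode-to-address lookup described above can be sketched roughly
as follows.  This is only an illustration under stated assumptions: IPv4
tables only (tcp or udp), Linux's /proc/net text format, and hypothetical
helper names (socket_inode, decode_ipv4, parse_net_table, ipv4_sockets).
It does not cover socket options, which would still have to be queried
with getsockopt from inside the target.

```python
import os
import re
import socket
import struct

_SOCK_LINK = re.compile(r"socket:\[(\d+)\]")


def socket_inode(link_target):
    """Return the inode from a 'socket:[N]' fd link, or None otherwise."""
    m = _SOCK_LINK.fullmatch(link_target)
    return int(m.group(1)) if m else None


def decode_ipv4(hex_addr):
    """Decode 'AABBCCDD:PPPP' as printed in /proc/net/{tcp,udp}."""
    ip_hex, port_hex = hex_addr.split(":")
    # The address is the in-memory 32-bit value printed as a native
    # integer, so repack it in native byte order to recover the bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_net_table(text):
    """Map inode -> (local, remote) for each row of a /proc/net table."""
    table = {}
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        # Columns: sl, local, remote, st, queues, timer, retrnsmt,
        # uid, timeout, inode, ...
        table[int(fields[9])] = (decode_ipv4(fields[1]),
                                 decode_ipv4(fields[2]))
    return table


def ipv4_sockets(pid, proto="tcp"):
    """Map fd number -> (local, remote) for pid's IPv4 sockets."""
    with open(f"/proc/{pid}/net/{proto}") as fh:
        table = parse_net_table(fh.read())
    found = {}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            inode = socket_inode(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue                        # fd closed while scanning
        if inode in table:
            found[int(name)] = table[inode]
    return found
```

Calling ipv4_sockets(pid) for a stopped process should give the local and
remote endpoints of each of its TCP sockets, keyed by fd.  Reading another
user's /proc/<pid>/fd requires appropriate privileges.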

I think this is why a userland CR implementation makes much more sense.
Most of the state visible to a userland process is rather rigidly
defined by standards and, ultimately, the ABI, and the kernel exports
most of that information to userland one way or another.  Given the
right set of needed features, most of which are probably already
implemented, a userland implementation should have access to most of
the information necessary to checkpoint without resorting to overly
messy methods.  And then there inevitably need to be some workarounds
to make CR'd processes behave properly w.r.t. other state on the
system, so userland workarounds are inevitable anyway unless it
resorts to preemptive separation using namespaces and containers,
which I frankly think isn't of much value now and will be even less so
going forward.

Thanks.

-- 
tejun