Cc: Benjamin Herrenschmidt <benh@...>, Pavel Machek <pavel@...>, Rafael J. Wysocki <rjw@...>, Matthew Garrett <mjg59@...>, <linux-kernel@...>, <linux-pm@...>, Alan Stern <stern@...>
Thanks for the detailed reply!
On Jul 09, 2007, at 22:07:15, Nigel Cunningham wrote:
The reason it's _required_ to be done from userspace is that
userspace is the only one which can figure out "These processes need
to run for suspend to work", and then let those processes continue
running after the freeze. The *ONLY* reason this even stops
processes at all is so we can do the post-device-mapper-snapshot code
with very little usably-free RAM (IE: only about 1MB for a standard
desktop system).
Well, you *do* want it to have semi-signal semantics: processes which
receive it must not get back to userspace code, so that they don't
start allocating more memory while we're trying to do the freeze. You
also don't want a process to be able to trap it (IE: like SIGSTOP or
SIGKILL).
On the other hand, it should be delivered asynchronously (IE: It
doesn't break an interruptible sleep or respond to most
is-a-signal-present checks). You don't actually care if it's sleeping
in the kernel somewhere, just as long as it doesn't allocate much
memory.
You would probably need a new signal "SIGFREEZE" which causes a
process to be treated as not runnable the next time it schedules, but
which never actually gets delivered, and a "SIGUNFREEZE" which does
the reverse. That way userspace could selectively resume processes
based on its policy of "this needs to run for hibernation".
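
As a rough illustration of those semantics, here is a toy model in
Python (not kernel code). The Task class, the "frozen" flag, and
pick_next() are all made up for this sketch; the only point is that a
frozen task is never selected to run and nothing is ever delivered to
it.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    comm: str
    frozen: bool = False  # set by the hypothetical SIGFREEZE


def sigfreeze(task):
    # Nothing is queued or delivered; the task simply stops being picked.
    task.frozen = True


def sigunfreeze(task):
    task.frozen = False


def pick_next(runqueue):
    """Return the first task that is not frozen, or None if all are frozen."""
    for task in runqueue:
        if not task.frozen:
            return task
    return None


runqueue = [Task(1, "init"), Task(200, "firefox"), Task(300, "rpciod")]
for t in runqueue:
    sigfreeze(t)
all_frozen_pick = pick_next(runqueue)   # None: nothing may run

sigunfreeze(runqueue[2])                # userspace policy thaws one task
print(pick_next(runqueue).comm)         # rpciod
```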
It's userspace's job to know which ones are needed. For example, if
you are hibernating over NFS then you need to resume the various NFS/
RPC daemons and threads.
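
A minimal sketch of such a userspace policy, again in toy Python. The
NEEDED table and the task names in it are illustrative assumptions,
not an authoritative list of what NFS actually requires.

```python
# Hypothetical policy table: task names that must keep running for
# each kind of image target. The entries are examples only.
NEEDED = {
    "nfs": {"rpciod", "nfsiod", "lockd", "rpc.statd"},
    "local-disk": set(),
}


def tasks_to_unfreeze(frozen_tasks, target):
    """Return the pids of frozen (pid, comm) tasks that the target needs."""
    needed = NEEDED[target]
    return [pid for pid, comm in frozen_tasks if comm in needed]


frozen = [(1, "init"), (412, "rpciod"), (413, "lockd"), (900, "firefox")]
print(tasks_to_unfreeze(frozen, "nfs"))  # [412, 413]
```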
So they aren't allocating memory when we are doing the device-mapper
snapshot.
It may be simpler, but it really screws up things like cpusets,
processor affinity, etc. It also ties hibernation to the presently
very flaky CPU-hotplug support, which is probably not what we want.
IMHO if the user pulls a CPU while the box is hibernated, then he/she
gets what he/she deserves. If you really want to support that, then
the user must do the hotplug operation *manually* before suspending.
Anything else is just going to be shooting ourselves in the foot
repeatedly.
You could pretty easily have a spare 128MB swap partition somewhere
which is not used during system operation but is "swapon"ed by
userspace after the COW snapshot to provide extra backing store.
Who says we have to use this swap to write the image? That may be
the default use-case, but it certainly shouldn't be mandatory.
Really, for the write-image-to-swap case you would just need to
preallocate sufficient memory for the bmap tables beforehand, then
populate them at this phase.
Well, each page is copy-on-write, so the FD reference would always
provide access to the original page data, whereas the processes may
end up copying the page so they can write to it. The trick would be
that shared pages need to remain shared between processes even after
the copy-on-write. This is likely to be the trickiest part.
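
To make the shared-page case concrete, here is a toy Python model (not
how the kernel's mm code is structured; Frame and Mapping are
hypothetical names). The snapshot pins the original frame, and the
first write after the snapshot makes exactly one copy that all sharers
move to together, so the page stays shared between them.

```python
class Frame:
    """One physical page; 'data' stands in for its contents."""

    def __init__(self, data):
        self.data = data


class Mapping:
    """A page shared by several processes, plus the snapshot's reference."""

    def __init__(self, frame, sharers):
        self.current = frame         # frame every sharer sees
        self.sharers = set(sharers)  # pids mapping this page
        self.snapshot = None         # frame pinned for the image, if any

    def take_snapshot(self):
        self.snapshot = self.current  # the image keeps the original frame

    def write(self, pid, data):
        if pid not in self.sharers:
            raise ValueError("pid does not map this page")
        if self.snapshot is self.current:
            # First write after the snapshot: copy once and switch *all*
            # sharers to the copy, so they keep sharing a single frame.
            self.current = Frame(self.snapshot.data)
        self.current.data = data


page = Mapping(Frame("old"), sharers={10, 11})
page.take_snapshot()
page.write(10, "new")
print(page.snapshot.data, page.current.data)  # old new
```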
Yes, but at this point we're basically running a segment of userspace
with full kernel services available. Like I said above, we can just
add a dedicated 128MB swap device to provide some spare backing
store. When you start running low on memory it might even page part
of the atomic copy of kernel memory out to the swap device.
No, but a kernel write to a DM-snapshot calls the DM-snapshot code
which copies the segment(s) to the snapshot device and modifies them
there. Basically disk-filesystem pagecache pages would be synced and
protected by the DM snapshot, while anonymous memory pages would be
CoW-ed. And it might not be an unreasonable requirement to state
that the disk-based filesystems must all be mapped through DM-
snapshot devices (even just straight 1:1 linear mappings), so that
they can be trivially snapshotted.
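
The block-level behaviour described above can be sketched with a toy
Python model (this is not the device-mapper API; SnapshotDevice and
its "exceptions" table are names invented for the illustration). A
write through the snapshot first copies the affected chunk into the
snapshot's own store and modifies it there, so the origin is never
touched.

```python
class SnapshotDevice:
    """Toy block-level snapshot: the origin is never modified through it."""

    def __init__(self, origin):
        self.origin = origin   # list of bytes chunks, read-only here
        self.exceptions = {}   # chunk index -> private, modified copy

    def read(self, idx):
        return bytes(self.exceptions.get(idx, self.origin[idx]))

    def write(self, idx, offset, data):
        if idx not in self.exceptions:
            # Copy the whole chunk into the snapshot store first ...
            self.exceptions[idx] = bytearray(self.origin[idx])
        # ... then apply the write to the copy only.
        self.exceptions[idx][offset:offset + len(data)] = data


origin = [b"aaaa", b"bbbb"]
snap = SnapshotDevice(origin)
snap.write(1, 1, b"XY")
print(origin[1], snap.read(1))  # b'bbbb' b'bXYb'
```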
I suppose you could record swap-outs done to SIGFREEZEd processes
specially, so that they would be swapped in again before resuming
userspace. That would effectively result in the same thing.
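
Roughly, as a toy Python sketch (the log and both helper functions are
hypothetical, and "resident" is just a dict of page sets standing in
for real memory state): swap-outs that hit frozen processes are
logged, and the log is replayed before userspace resumes.

```python
swapped_while_frozen = []  # (pid, page) pairs, in swap-out order


def swap_out(pid, page, frozen_pids, resident):
    """Evict a page; remember it if its owner is currently frozen."""
    resident[pid].discard(page)
    if pid in frozen_pids:
        swapped_while_frozen.append((pid, page))


def swap_in_before_resume(resident):
    """Bring back every page evicted from a frozen process."""
    while swapped_while_frozen:
        pid, page = swapped_while_frozen.pop()
        resident[pid].add(page)


resident = {200: {1, 2, 3}, 300: {7}}
frozen_pids = {200}
swap_out(200, 2, frozen_pids, resident)  # logged: owner is frozen
swap_out(300, 7, frozen_pids, resident)  # not logged: owner is running
after_swap_out = {pid: set(pages) for pid, pages in resident.items()}
swap_in_before_resume(resident)
print(resident[200] == {1, 2, 3})        # True
```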
Cheers,
Kyle Moffett