Offering a potential alternative to the existing suspend and restore implementations in the Linux kernel, Ying Huang posted a patch utilizing kexec, explaining, "kexec based hibernation has some potential advantages over uswsusp and suspend2." He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel", allowing someone to first boot one kernel, then kexec a second kernel that runs as a crashdump kernel in reserved memory, work under it for a while, and finally "jump back" to the original kernel.
Andrew Morton responded very positively to the idea: "This sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "How close do you think all this is to being a viable thing?" Ying replied, "The kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list the work that still has to be done before kexec based hibernation would be a viable option.
From: Huang, Ying [email blocked]
Subject: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Wed, 11 Jul 2007 15:30:31 +0000

Kexec base hibernation has some potential advantages over uswsusp and
suspend2. Some most obvious advantages are:

1. The hibernation image size can exceed half of memory size easily.
2. The hibernation image can be written to and read from almost
   anywhere, such as USB disk, NFS.

This patch implements the functionality of "jumping from kexeced
kernel to original kernel". That is, the following sequence is
possible:

1. Boot a kernel A
2. Work under kernel A
3. Kexec another kernel B in kernel A
4. Work under kernel B
5. Jump from kernel B to kernel A
6. Continue work under kernel A

This is the first step to implement kexec based hibernation. If the
memory image of kernel A is written to or read from a permanent media
in step 4, a preliminary version of kexec based hibernation can be
implemented.

The kernel B is run as a crashdump kernel in reserved memory
region. This is the biggest constrains of the patch. It is planed to
be eliminated in the next version. That is, instead of reserving memory
region previously, the needed memory region is backuped before kexec
and restored after jumping back.

Another constrains of the patch is that the CONFIG_ACPI must be turned
off to make kexec jump work. Because ACPI will put devices into low
power state, the kexeced kernel can not be booted properly under
it. This constrains can be eliminated by separating the suspend method
and hibernation method of the devices as proposed earlier in the LKML.

The kexec jump is implemented in the framework of software suspend. In
fact, the kexec based hibernation can be seen as just implementing the
image writing and reading method of software suspend with a kexeced
Linux kernel.

Now, only the i386 architecture is supported. The patch is based on
Linux kernel 2.6.22, and has been tested on my IBM T42.

Usage:

1. Compile kernel with following options selected:

   CONFIG_X86_32=y
   CONFIG_RELOCATABLE=y # not needed strictly, but it is more convenient with it
   CONFIG_KEXEC=y
   CONFIG_SOFTWARE_SUSPEND=y
   CONFIG_KEXEC_HIBERNATION=y

2. Compile the kexec-tools with kdump and kjump patches added, the
   kdump patch can be found at:

   http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch

   While, the kexec-tools kjump patch is appended with the mail.

3. Boot compiled kernel, the reserved crash kernel memory region must
   be added to kernel command line as following:

   crashkernel=<XX>M@<XX>M

   Where, <XX> should be replaced by the real memory size and position.

4. Switch hibernation image operations, through shell command as follow:

   echo kexec > /sys/power/hibernation_image_ops

5. Boot the kexeced kernel as a crashdump kernel, the same kernel can
   be used if CONFIG_RELOCATABLE=y is selected. The kernel command
   line option as following must be appended to kernel command line.

   kexec_jump_buf_pfn=`cat /sys/kernel/kexec_jump_buf_pfn`

6. In the kexec booted kernel, switch hibernation image operations, as
   in 4.

7. In the kexec booted kernel, trigger the jumping back with following
   shell command.

   echo <a>:<b> > /sys/power/resume

   Where <a> and <b> is non-negative integer, at least one of them must
   be non-zero.

Hibernation image operations

This patch make it possible to have multiple implementations of
hibernation image operations such as write, read, check, etc, and they
can be switched at run time through writing the
"/sys/power/hibernation_image_ops". The uswsusp is the default
implementation.

Signed-off-by: Huang Ying [email blocked]

Kexec jump

This patch provide the kexec based implementation of hibernation image
operation. Now, only jumping between original kernel and kexeced
kernel is supported, real image write/read/check will be provided in
next patches.

Signed-off-by: Huang Ying [email blocked]
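Step 3 of the usage notes passes the reserved region to the kernel as crashkernel=<XX>M@<XX>M, a size followed by a start position. As a minimal sketch of how that one form breaks down into its two numbers, the C fragment below parses it in userspace. The helper name parse_crashkernel, the struct crash_region, and the 16M@16M values in the usage note are illustrative assumptions and are not taken from the posted patch; the kernel's own parser accepts more forms than this sketch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of a "crashkernel=<size>M@<offset>M" option, both in MiB. */
struct crash_region {
    unsigned long size_mb;    /* how much memory is reserved */
    unsigned long offset_mb;  /* where the reserved region starts */
};

/*
 * Hypothetical helper: locate "crashkernel=" in a command line string and
 * parse the size and start offset. Only the "<n>M@<n>M" form shown in the
 * usage notes is handled. Returns 0 on success, -1 if the option is
 * missing or does not match that form.
 */
int parse_crashkernel(const char *cmdline, struct crash_region *out)
{
    const char *opt;
    char size_unit, offset_unit;

    assert(cmdline != NULL && out != NULL);

    opt = strstr(cmdline, "crashkernel=");
    if (opt == NULL)
        return -1;

    /* Expect: number, unit letter, '@', number, unit letter. */
    if (sscanf(opt, "crashkernel=%lu%c@%lu%c",
               &out->size_mb, &size_unit,
               &out->offset_mb, &offset_unit) != 4)
        return -1;

    if (size_unit != 'M' || offset_unit != 'M')
        return -1;

    return 0;
}
```

Called on a command line such as "root=/dev/hda1 crashkernel=16M@16M" (hypothetical values), the helper fills in a size of 16 and an offset of 16; a command line without the option, or with a different unit letter, is rejected with -1.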
From: Pavel Machek [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Wed, 11 Jul 2007 13:13:50 +0200

Hi!

Looks interesting... but I was feeling strange dejavu reading
this... and that's because you pasted the changelog twice :-).

> Kexec base hibernation has some potential advantages over uswsusp and
> suspend2. Some most obvious advantages are:
>
> 1. The hibernation image size can exceed half of memory size easily.

Yes.

> 2. The hibernation image can be written to and read from almost
> anywhere, such as USB disk, NFS.

We could do USB disk with uswsusp... NFS would be harder.

How fast can kexec boot secondary kernel?

> This patch implements the functionality of "jumping from kexeced
> kernel to original kernel". That is, the following sequence is
> possible:
>
> 1. Boot a kernel A
> 2. Work under kernel A
> 3. Kexec another kernel B in kernel A
> 4. Work under kernel B
> 5. Jump from kernel B to kernel A
> 6. Continue work under kernel A

Nice!

> 2. Compile the kexec-tools with kdump and kjump patches added, the
> kdump patch can be found at:
>
> http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch

I got 404 error :-(.

> 3. Boot compiled kernel, the reserved crash kernel memory region must
> be added to kernel command line as following:
>
> crashkernel=<XX>M@<XX>M
>
> Where, <XX> should be replaced by the real memory size and position.

How much memory do you suggest to reserve? 64M?

> 7. In the kexec booted kernel, trigger the jumping back with following
> shell command.
>
> echo <a>:<b> > /sys/power/resume
>
> Where <a> and <b> is non-negative integer, at least one of them must
> be non-zero.

What does a and b mean?

[Was it more than three copies? If they were non-identical, assume I
read some random one].

> +/* Adds the kexec_backup= command line parameter to command line. */
> +static int cmdline_add_backup(char *cmdline, unsigned long addr)
> +{
> +        int cmdlen, len, align = 1024;
> +        char str[30], *ptr;
> +
> +        /* Passing in kexec_backup=xxxK format. Saves space required in cmdline.
> +         * Ensure 1K alignment*/
> +        if (addr%align)
> +                return -1;
> +        addr = addr/align;
> +        ptr = str;
> +        strcpy(str, " kexec_backup=");
> +        ptr += strlen(str);
> +        ultoa(addr, ptr);
> +        strcat(str, "K");
> +        len = strlen(str);
> +        cmdlen = strlen(cmdline) + len;
> +        if (cmdlen > (COMMAND_LINE_SIZE - 1))
> +                die("Command line overflow\n");
> +        strcat(cmdline, str);
> +#if 0
> +        printf("Command line after adding backup\n");
> +        printf("%s\n", cmdline);
> +#endif
> +        return 0;
> +}

printf()? ...and please remove out commented code.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
From: Huang, Ying [email blocked]
To: Pavel Machek [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 16:28:54 +0000

On Wed, 2007-07-11 at 13:13 +0200, Pavel Machek wrote:
> Hi!
>
> Looks interesting... but I was feeling strange dejavu reading
> this... and that's because you pasted the changelog twice :-).
>

Sorry, I should have re-checked the mail before sending out.

> How fast can kexec boot secondary kernel?

I measure it on my IBM T42 with CONFIG_PRINTK_TIME=y. The boot-up time
is about 4.35s from kexec is issued to root mounted in kexec kernel.

I think it is possible to optimize. Maybe the kexec kernel can be
hibernate/resume by the normal kernel too. This way, a real
kexec/boot-up is only needed for the first time.

> > 2. Compile the kexec-tools with kdump and kjump patches added, the
> > kdump patch can be found at:
> >
> > http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch
>
> I got 404 error :-(.

Sorry, typo problem. The URL should be:

http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump10.patch

> > 3. Boot compiled kernel, the reserved crash kernel memory region must
> > be added to kernel command line as following:
> >
> > crashkernel=<XX>M@<XX>M
> >
> > Where, <XX> should be replaced by the real memory size and position.
>
> How much memory do you suggest to reserve? 64M?

I reserved 16M RAM. I think this is sufficient for a simple disk based
hibernation.

> > 7. In the kexec booted kernel, trigger the jumping back with following
> > shell command.
> >
> > echo <a>:<b> > /sys/power/resume
> >
> > Where <a> and <b> is non-negative integer, at least one of them must
> > be non-zero.
>
> What does a and b mean?
>

a and b has no meaning. They are only used to trigger the resume
process. This will be fixed in the future version.

> > +/* Adds the kexec_backup= command line parameter to command line. */
> > +static int cmdline_add_backup(char *cmdline, unsigned long addr)
> > +{
> > +        int cmdlen, len, align = 1024;
> > +        char str[30], *ptr;
> > +
> > +        /* Passing in kexec_backup=xxxK format. Saves space required in cmdline.
> > +         * Ensure 1K alignment*/
> > +        if (addr%align)
> > +                return -1;
> > +        addr = addr/align;
> > +        ptr = str;
> > +        strcpy(str, " kexec_backup=");
> > +        ptr += strlen(str);
> > +        ultoa(addr, ptr);
> > +        strcat(str, "K");
> > +        len = strlen(str);
> > +        cmdlen = strlen(cmdline) + len;
> > +        if (cmdlen > (COMMAND_LINE_SIZE - 1))
> > +                die("Command line overflow\n");
> > +        strcat(cmdline, str);
> > +#if 0
> > +        printf("Command line after adding backup\n");
> > +        printf("%s\n", cmdline);
> > +#endif
> > +        return 0;
> > +}
>
> printf()? ...and please remove out commented code.

This is patch against kexec-tools, which is a userspace tool. So printf
is used. The commented code is just as other cmdline_add_xxx functions
of kexec-tools. But it is useless, and should be removed.

Best Regards,
Huang Ying
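The quoted cmdline_add_backup() builds " kexec_backup=<n>K" in a fixed 30-byte buffer using ultoa() and die(), helpers that live in kexec-tools. As a rough standalone sketch of the same steps (reject an address that is not 1K aligned, convert it to kilobytes, check the length, then append), the variant below uses only the C standard library. The function name cmdline_add_backup_checked and the CMDLINE_MAX value are assumptions made for this illustration; unlike the original, it returns -1 on overflow instead of calling die(), and it drops the disabled debug printf() calls that Pavel asked to have removed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed stand-in for COMMAND_LINE_SIZE; the real value is architecture specific. */
#define CMDLINE_MAX 256

/*
 * Append " kexec_backup=<addr in KiB>K" to cmdline, which must point to a
 * buffer of at least CMDLINE_MAX bytes holding a NUL-terminated string.
 * Returns 0 on success, -1 if addr is not 1K aligned or the result would
 * not fit. This is a sketch, not the kexec-tools code.
 */
int cmdline_add_backup_checked(char *cmdline, unsigned long addr)
{
    const unsigned long align = 1024;
    char opt[48];   /* 14 chars of prefix + up to 20 digits + 'K' + NUL */
    int len;

    assert(cmdline != NULL);

    /* Same 1K alignment rule as the quoted function. */
    if (addr % align)
        return -1;

    /* snprintf never writes past opt and reports the length it needed. */
    len = snprintf(opt, sizeof(opt), " kexec_backup=%luK", addr / align);
    if (len < 0 || (size_t)len >= sizeof(opt))
        return -1;

    /* Refuse to append if the combined string would not fit. */
    if (strlen(cmdline) + (size_t)len > CMDLINE_MAX - 1)
        return -1;

    strcat(cmdline, opt);
    return 0;
}
```

With a command line of "root=/dev/hda1" and an address of 0x1000000 (both hypothetical), the buffer becomes "root=/dev/hda1 kexec_backup=16384K". Sizing the scratch buffer for a 64-bit unsigned long is a design choice of the sketch; the posted patch only targets i386.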
From: Andrew Morton [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Wed, 11 Jul 2007 17:22:43 -0700

On Wed, 11 Jul 2007 15:30:31 +0000
"Huang, Ying" wrote:

> Kexec base hibernation has some potential advantages over uswsusp and
> suspend2. Some most obvious advantages are:
>
> 1. The hibernation image size can exceed half of memory size easily.
> 2. The hibernation image can be written to and read from almost
> anywhere, such as USB disk, NFS.
>
> This patch implements the functionality of "jumping from kexeced
> kernel to original kernel". That is, the following sequence is
> possible:
>
> 1. Boot a kernel A
> 2. Work under kernel A
> 3. Kexec another kernel B in kernel A
> 4. Work under kernel B
> 5. Jump from kernel B to kernel A
> 6. Continue work under kernel A
>
> This is the first step to implement kexec based hibernation. If the
> memory image of kernel A is written to or read from a permanent media
> in step 4, a preliminary version of kexec based hibernation can be
> implemented.
>
> The kernel B is run as a crashdump kernel in reserved memory
> region. This is the biggest constrains of the patch. It is planed to
> be eliminated in the next version. That is, instead of reserving memory
> region previously, the needed memory region is backuped before kexec
> and restored after jumping back.
>
> Another constrains of the patch is that the CONFIG_ACPI must be turned
> off to make kexec jump work. Because ACPI will put devices into low
> power state, the kexeced kernel can not be booted properly under
> it. This constrains can be eliminated by separating the suspend method
> and hibernation method of the devices as proposed earlier in the LKML.
>
> The kexec jump is implemented in the framework of software suspend. In
> fact, the kexec based hibernation can be seen as just implementing the
> image writing and reading method of software suspend with a kexeced
> Linux kernel.
>
> Now, only the i386 architecture is supported. The patch is based on
> Linux kernel 2.6.22, and has been tested on my IBM T42.

This sounds awesome. Am I correct in expecting that ultimately the
existing hibernation implementation just goes away and we reuse (and hence
strengthen) the existing kexec (and kdump?) infrastructure?

And that we get hibernation support almost for free on all kexec (and
relocatable-kernel?) capable architectures?

And that all the management of hibernation and resume happens in userspace?

I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
be disabled in the to-be-hibernated kernel, or in the little transient
kexec kernel?

How close do you think all this is to being a viable thing?
From: Huang, Ying [email blocked]
To: Andrew Morton [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 14:43:43 +0000

On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
> This sounds awesome. Am I correct in expecting that ultimately the
> existing hibernation implementation just goes away and we reuse (and hence
> strengthen) the existing kexec (and kdump?) infrastructure?
> And that we get hibernation support almost for free on all kexec (and
> relocatable-kernel?) capable architectures?
> And that all the management of hibernation and resume happens in userspace?

Yes. Ultimately, most of the hibernation code such as process freezer,
memory shrinking, memory snapshot (atomic copy), image reading/writing
can go away, because kexec based hibernation doesn't depend on them.
Just the device/CPU state quiescent/save/restore is necessary to remain.
And, the management of hibernation and resume will happen in userspace.

>
> I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
> be disabled in the to-be-hibernated kernel, or in the little transient
> kexec kernel?

Under current implementation of device state quiescent/save/restore, the
CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
transient kexec kernel.

But the hibernation people are going to separate the device suspend from
device hibernate. The device hibernate will put device in quiescent
state but not in low power state. When this is done, it is not necessary
to disable CONFIG_ACPI at all. It is just a workaround for current
implementation that disabling CONFIG_ACPI.

> How close do you think all this is to being a viable thing?

The kexec jump is the first step, maybe the simplest step. There are
many other issues to be resolved, at least the following ones.

1. Separate device suspend from device hibernate.
2. Do not reserve memory for kexec kernel. That is, backup needed memory
before kexec and restore them after kexec.
3. Support the in-place kexec? The relocatable kernel is not necessary
if this can be implemented.
4. Image writing/reading. (Only user space application is needed).
5. A smooth resume process. Maybe it is not needed to kexec a new kernel
for resume. For example, in the first stage of kernel boot, just first
16M (or a little more) RAM is used, if the resume image is found, the
saved kernel image is resumed; if the resume image is not found, turn on
the remaining RAM. This will depends on 3.
6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
be hibernate/resume by the normal kernel too. This way, a real
kexec/boot-up is only needed for the first time.

Best Regards,
Huang, Ying
From: [email blocked] (Eric W. Biederman)
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 10:32:45 -0600

I like the concept, but I completely disagree with your current
implementation. I think it will be much easier if you start with a
completely independent code path and then just reuse the pieces of the
existing code path that you need. More details below.

"Huang, Ying" writes:

> On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
>> This sounds awesome. Am I correct in expecting that ultimately the
>> existing hibernation implementation just goes away and we reuse (and hence
>> strengthen) the existing kexec (and kdump?) infrastructure?
>> And that we get hibernation support almost for free on all kexec (and
>> relocatable-kernel?) capable architectures?
>> And that all the management of hibernation and resume happens in userspace?
>
> Yes. Ultimately, most of the hibernation code such as process freezer,
> memory shrinking, memory snapshot (atomic copy), image reading/writing
> can go away, because kexec based hibernation doesn't depend on them.
> Just the device/CPU state quiescent/save/restore is necessary to remain.
> And, the management of hibernation and resume will happen in userspace.
>
>> I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
>> be disabled in the to-be-hibernated kernel, or in the little transient
>> kexec kernel?
>
> Under current implementation of device state quiescent/save/restore, the
> CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
> transient kexec kernel.
>
> But the hibernation people are going to separate the device suspend from
> device hibernate. The device hibernate will put device in quiescent
> state but not in low power state. When this is done, it is not necessary
> to disable CONFIG_ACPI at all. It is just a workaround for current
> implementation that disabling CONFIG_ACPI.
>
>> How close do you think all this is to being a viable thing?
>
> The kexec jump is the first step, maybe the simplest step. There are
> many other issues to be resolved, at least the following ones.
>
> 1. Separate device suspend from device hibernate.

Actually in some very practical sense we already have two copies of
this in the kernel. device_shutdown and the hotunplug/module remove
code. So it is should be mostly a matter of using what we have.

Basically all this entails is to modify sys_reboot() and adding a
LINUX_REBOOT_CMD_KSPAWN and have that command enter the kexec path
with the appropriate set of calls. I would be really surprised if
this winds up with much more code then the current kernel_kexec
function.

This might wind up exactly the same as the current
LINUX_REBOOT_CMD_KEXEC but at least until we have a working prototype
it makes sense to allow for differences.

This should allow the kexec based implementation to coincide with the
existing software suspend to disk code until it is proven out and then
we can just remove all of the software suspend code to disk code.

> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> before kexec and restore them after kexec.
> 3. Support the in-place kexec? The relocatable kernel is not necessary
> if this can be implemented.

It sounds like what you really want is the normal kexec path enhanced
so that you can return to the kernel you started with.

The normal kexec path already knows how to do the memory shuffle so
it can do on demand memory allocation. That code just needs to
enhanced slightly so that you allocate an extra page, setup an inverse
scatter gather list for restoring the pages, and teach relocate_kernel.S
to preserve it's destination pages by using the inverse scatter gather
list.

The normal kexec path already calls device_shutdown and the like to
stop devices from running. Although again that code path is not
prepared to restore the devices.

...

For prototyping I would:
- reserve a chunk of memory (possibly with the crashkernel= option)
  and run a relocatable kernel out of it.

  By using the normal kexec you can boot a relocatable restore kernel
  in that reserved region. It is an extra step but it makes things
  work today.

- I would use the normal sys_kexec_load.

- I would debug/tweak the user space and the code to reenter the
  old kernel. I.e. the device driver stop/start code.

Once it was basically working I would the update normal kexec
memory copy code in relocate.S to preserve the destination pages.

> 4. Image writing/reading. (Only user space application is needed).

And possibly a few fixes to /dev/mem. This is pretty much the same
process as generating a core dump so there should be some synergy with that.

We probably want to use something like the ELF header the crashdump
path uses to communicate to the kernel saving memory which memory
regions need to be saved. Which probably means that we you can use
the exact same method as the kexec on panic kernel uses to save
memory.

> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> for resume. For example, in the first stage of kernel boot, just first
> 16M (or a little more) RAM is used, if the resume image is found, the
> saved kernel image is resumed; if the resume image is not found, turn on
> the remaining RAM. This will depends on 3.

Well I expect the resume will be load the resumed kernel into reserved
memory. And kexec a very small assembly stub that will jump back to
the code in relocate_kernel.S which will call ret. Then either hot
add the rest of our memory or kexec to a kernel without restrictions.

> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> be hibernate/resume by the normal kernel too. This way, a real
> kexec/boot-up is only needed for the first time.

Well just not loading drivers you aren't going to use and generally
avoiding long disk probing times will help here. We control all of
the code so it should be relatively straight forward.

Eric
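Eric's suggestion of an extra page plus an inverse scatter gather list can be pictured with a small userspace model. The sketch below is one possible reading of that idea and not the kernel's relocate_kernel.S: each list entry names a source page and a destination page, the pages are exchanged through a single scratch page rather than copied one way, and walking the same list backwards puts the old contents back. The names page_move, swap_in, swap_back, and SKETCH_PAGE_SIZE are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for the model */

/* One entry of the (hypothetical) move list: new image page and its target. */
struct page_move {
    unsigned char *src;   /* page holding the new kernel's data */
    unsigned char *dst;   /* page the new kernel expects it at  */
};

/* Exchange two distinct pages using one scratch page as temporary storage. */
static void swap_page(unsigned char *a, unsigned char *b, unsigned char *scratch)
{
    assert(a != b && scratch != NULL);
    memcpy(scratch, a, SKETCH_PAGE_SIZE);
    memcpy(a, b, SKETCH_PAGE_SIZE);
    memcpy(b, scratch, SKETCH_PAGE_SIZE);
}

/*
 * Move the new image into place. Because pages are swapped instead of
 * overwritten, the previous contents of every destination page end up in
 * the corresponding source page and are preserved.
 */
void swap_in(const struct page_move *list, size_t n, unsigned char *scratch)
{
    for (size_t i = 0; i < n; i++)
        swap_page(list[i].src, list[i].dst, scratch);
}

/* Undo swap_in by replaying the same list in reverse order. */
void swap_back(const struct page_move *list, size_t n, unsigned char *scratch)
{
    for (size_t i = n; i-- > 0; )
        swap_page(list[i].src, list[i].dst, scratch);
}
```

In this model the list serves as its own inverse, so the only extra memory is the scratch page. Whether the real implementation would swap pages or keep a separate restore list is left open in the thread.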
From: [email blocked]
To: "Eric W. Biederman" [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 12:09:31 -0700 (PDT)

On Thu, 12 Jul 2007, Eric W. Biederman wrote:

>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>> before kexec and restore them after kexec.
>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>> if this can be implemented.
>
> It sounds like what you really want is the normal kexec path enhanced
> so that you can return to the kernel you started with.
>
> The normal kexec path already knows how to do the memory shuffle so
> it can do on demand memory allocation. That code just needs to
> enhanced slightly so that you allocate an extra page, setup an inverse
> scatter gather list for restoring the pages, and teach relocate_kernel.S
> to preserve it's destination pages by using the inverse scatter gather
> list.
>
> The normal kexec path already calls device_shutdown and the like to
> stop devices from running. Although again that code path is not
> prepared to restore the devices.

we shouldn't need a restore code path if the new kernel re-detects
everything. if kexec already shuts down all the devices we may not need to
implement anything new here (although there may be room for future
performance optimization)

> ...
>
> For prototyping I would:
> - reserve a chunk of memory (possibly with the crashkernel= option)
> and run a relocatable kernel out of it.
>
> By using the normal kexec you can boot a relocatable restore kernel
> in that reserved region. It is an extra step but it makes things
> work today.
>
> - I would use the normal sys_kexec_load.
>
> - I would debug/tweak the user space and the code to reenter the
> old kernel. I.e. the device driver stop/start code.
>
> Once it was basically working I would the update normal kexec
> memory copy code in relocate.S to preserve the destination pages.

for prototyping there's no need to use the same kernel.

>> 4. Image writing/reading. (Only user space application is needed).
>
> And possibly a few fixes to /dev/mem. This is pretty much the same
> process as generating a core dump so there should be some synergy with that.

what fixes are you thinking of?

you are makeing this sound very simple ;-)

David Lang
From: [email blocked] (Eric W. Biederman)
To: david
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 13:49:59 -0600

david lang writes:

> On Thu, 12 Jul 2007, Eric W. Biederman wrote:
>
>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>>> before kexec and restore them after kexec.
>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>>> if this can be implemented.
>>
>> It sounds like what you really want is the normal kexec path enhanced
>> so that you can return to the kernel you started with.
>>
>> The normal kexec path already knows how to do the memory shuffle so
>> it can do on demand memory allocation. That code just needs to
>> enhanced slightly so that you allocate an extra page, setup an inverse
>> scatter gather list for restoring the pages, and teach relocate_kernel.S
>> to preserve it's destination pages by using the inverse scatter gather
>> list.
>>
>> The normal kexec path already calls device_shutdown and the like to
>> stop devices from running. Although again that code path is not
>> prepared to restore the devices.
>
> we shouldn't need a restore code path if the new kernel re-detects
> everything. if kexec already shuts down all the devices we may not need to
> implement anything new here (although there may be room for future performance
> optimization)

Yes, reusing device hotplug... You still need the code path for
little things and to kick of the device redetection but if you get
lucky it won't have to do much. Of course speed is important.

>> ...
>>
>> For prototyping I would:
>> - reserve a chunk of memory (possibly with the crashkernel= option)
>> and run a relocatable kernel out of it.
>>
>> By using the normal kexec you can boot a relocatable restore kernel
>> in that reserved region. It is an extra step but it makes things
>> work today.
>>
>> - I would use the normal sys_kexec_load.
>>
>> - I would debug/tweak the user space and the code to reenter the
>> old kernel. I.e. the device driver stop/start code.
>>
>> Once it was basically working I would the update normal kexec
>> memory copy code in relocate.S to preserve the destination pages.
>
> for prototyping there's no need to use the same kernel.
>
>>> 4. Image writing/reading. (Only user space application is needed).
>>
>> And possibly a few fixes to /dev/mem. This is pretty much the same
>> process as generating a core dump so there should be some synergy with that.
>
> what fixes are you thinking of?

Don't really know. I figured /dev/mem was sufficient but the kexec on
panic folks tell me it doesn't work for areas we have told the kernel
isn't memory, I haven't had time so I haven't pushed it.

> you are makeing this sound very simple ;-)

Which is the primary point of using kexec. If it isn't simple then we
are doing something wrong...

Eric
From: Jeremy Fitzhardinge [email blocked]
To: Andrew Morton [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Wed, 11 Jul 2007 22:48:06 -0700

Andrew Morton wrote:
> On Wed, 11 Jul 2007 15:30:31 +0000
> "Huang, Ying" wrote:
>
>> 1. Boot a kernel A
>> 2. Work under kernel A
>> 3. Kexec another kernel B in kernel A
>> 4. Work under kernel B
>> 5. Jump from kernel B to kernel A
>> 6. Continue work under kernel A
>>
>> This is the first step to implement kexec based hibernation. If the
>> memory image of kernel A is written to or read from a permanent media
>> in step 4, a preliminary version of kexec based hibernation can be
>> implemented.
>>
>> The kernel B is run as a crashdump kernel in reserved memory
>> region. This is the biggest constrains of the patch. It is planed to
>> be eliminated in the next version. That is, instead of reserving memory
>> region previously, the needed memory region is backuped before kexec
>> and restored after jumping back.
>>
>> Another constrains of the patch is that the CONFIG_ACPI must be turned
>> off to make kexec jump work. Because ACPI will put devices into low
>> power state, the kexeced kernel can not be booted properly under
>> it. This constrains can be eliminated by separating the suspend method
>> and hibernation method of the devices as proposed earlier in the LKML.
>>
>> The kexec jump is implemented in the framework of software suspend. In
>> fact, the kexec based hibernation can be seen as just implementing the
>> image writing and reading method of software suspend with a kexeced
>> Linux kernel.
>>

I guess I'm (still) confused by the terminology here. Do you mean that it
fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram
as a way of going to sleep?

>> Now, only the i386 architecture is supported. The patch is based on
>> Linux kernel 2.6.22, and has been tested on my IBM T42.
>>
>
> This sounds awesome. Am I correct in expecting that ultimately the
> existing hibernation implementation just goes away and we reuse (and hence
> strengthen) the existing kexec (and kdump?) infrastructure?
>
> And that we get hibernation support almost for free on all kexec (and
> relocatable-kernel?) capable architectures?
>
> And that all the management of hibernation and resume happens in userspace?
>
> I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
> be disabled in the to-be-hibernated kernel, or in the little transient
> kexec kernel?
>

I think the point is that if kernel A says "I'm suspending" and calls
the suspend method on all its devices, then kernel B finds that it has
no powered on devices to work with. But then couldn't it turn on the
ones it wants anyway? And don't you want to suspend them, to make sure
they're not still DMAing memory while B is trying to shuffle everything
off to disk?

It does sound pretty cool.

J
From: [email blocked]
To: Jeremy Fitzhardinge [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Wed, 11 Jul 2007 23:43:39 -0700 (PDT)

On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:

> Andrew Morton wrote:
>> On Wed, 11 Jul 2007 15:30:31 +0000
>> "Huang, Ying" wrote:
>>
>> > 1. Boot a kernel A
>> > 2. Work under kernel A
>> > 3. Kexec another kernel B in kernel A
>> > 4. Work under kernel B
>> > 5. Jump from kernel B to kernel A
>> > 6. Continue work under kernel A
>> >
>> > This is the first step to implement kexec based hibernation. If the
>> > memory image of kernel A is written to or read from a permanent media
>> > in step 4, a preliminary version of kexec based hibernation can be
>> > implemented.
>> >
>> > The kernel B is run as a crashdump kernel in reserved memory
>> > region. This is the biggest constrains of the patch. It is planed to
>> > be eliminated in the next version. That is, instead of reserving memory
>> > region previously, the needed memory region is backuped before kexec
>> > and restored after jumping back.
>> >
>> > Another constrains of the patch is that the CONFIG_ACPI must be turned
>> > off to make kexec jump work. Because ACPI will put devices into low
>> > power state, the kexeced kernel can not be booted properly under
>> > it. This constrains can be eliminated by separating the suspend method
>> > and hibernation method of the devices as proposed earlier in the LKML.
>> >
>> > The kexec jump is implemented in the framework of software suspend. In
>> > fact, the kexec based hibernation can be seen as just implementing the
>> > image writing and reading method of software suspend with a kexeced
>> > Linux kernel.
>> >
>
> I guess I'm (still) confused by the terminology here. Do you mean that it
> fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram
> as a way of going to sleep?

Suspend-to-ram involves stopping the system and shutting down devices to go into low-power mode, then on wakeup restarting devices and resuming operation

so the steps would be.

1. stop userspace
2. walk the system device tree and put devices to sleep
3. go into the lowest power mode available and wait for a wakeup signal

later

4. walk the system device tree and wake up devices
5. resume userspace scheduling.

note that what devices get put to sleep could be configurable, potentially to the extreme of things like the OLPC (that have hardware designed for cheap sleeping) going into a light suspend-to-ram state between keystrokes if nothing else has a timer event scheduled before that.

Suspend-to-disk (Hibernate) involves stopping the system, making a snapshot of ram, writing the snapshot to somewhere and powering off the box. on wakeup (power-on) a helper kernel boots, loads the snapshot into ram and jumps to the kernel in the snapshot to resume operation.

as I understand the proposal the thought is to do the following

1. system kernel does suspend-to-ram to put the devices into a known safe state.
2. system kernel uses kexec to start hibernate kernel
3. hibernate kernel wakes up devices it needs as if it was doing a resume-from-ram
4. hibernate kernel copies ram image somewhere
5. hibernate kernel shuts down the box

later

6. hibernate kernel boots
7. hibernate kernel copies ram image from somewhere
8. hibernate kernel does suspend-to-ram to put the devices into a known safe state.
9. hibernate kernel uses kexec to start system kernel
10. system kernel wakes up devices it needs as if it was doing a resume-from-ram.

>> > Now, only the i386 architecture is supported. The patch is based on
>> > Linux kernel 2.6.22, and has been tested on my IBM T42.
>> >
>>
>> This sounds awesome. Am I correct in expecting that ultimately the
>> existing hibernation implementation just goes away and we reuse (and hence
>> strengthen) the existing kexec (and kdump?) infrastructure?
>>
>> And that we get hibernation support almost for free on all kexec (and
>> relocatable-kernel?) capable architectures?
>>
>> And that all the management of hibernation and resume happens in
>> userspace?

this is the thought.

>> I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
>> be disabled in the to-be-hibernated kernel, or in the little transient
>> kexec kernel?
>>
>
> I think the point is that if kernel A says "I'm suspending" and calls the
> suspend method on all its devices, then kernel B finds that it has no powered
> on devices to work with. But then couldn't it turn on the ones it wants
> anyway? And don't you want to suspend them, to make sure they're not still
> DMAing memory while B is trying to shuffle everything off to disk?

I don't understand the ACPI problem so I can't try to clarify it.

> It does sound pretty cool.

re-using existing components in new ways, making it so that particular problems only have to be solved once and that solution is used repeatedly. there's a lot to like about this approach.

very cool.

David Lang

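As a rough illustration of the ten-step hibernate and resume sequence David Lang describes, the following toy Python model walks a simulated machine through both halves. It is only a sketch: the `Machine` class, the `hibernate` and `resume` functions, and the `"image"` key are all hypothetical names invented for this example, and nothing here corresponds to an actual kernel interface.

```python
# Toy model of the ten-step sequence described in the email above.
# All names (Machine, hibernate, resume, the "image" key) are hypothetical
# and do not correspond to any kernel interface.
from dataclasses import dataclass, field


@dataclass
class Machine:
    ram: dict = field(default_factory=dict)      # memory of the system kernel
    storage: dict = field(default_factory=dict)  # disk, USB stick, NFS, ...
    running: str = "system"                      # kernel currently in control
    devices_awake: bool = True
    log: list = field(default_factory=list)


def hibernate(m: Machine) -> None:
    m.devices_awake = False
    m.log.append("1. system kernel puts devices into a known safe state")
    m.running = "hibernate"
    m.log.append("2. system kernel kexecs the hibernate kernel")
    m.devices_awake = True
    m.log.append("3. hibernate kernel wakes the devices it needs")
    m.storage["image"] = dict(m.ram)
    m.log.append("4. hibernate kernel copies the RAM image somewhere")
    m.ram.clear()  # RAM contents are lost once power is removed
    m.running = "off"
    m.devices_awake = False
    m.log.append("5. hibernate kernel shuts down the box")


def resume(m: Machine) -> None:
    m.running = "hibernate"
    m.devices_awake = True
    m.log.append("6. hibernate kernel boots")
    m.ram = dict(m.storage["image"])
    m.log.append("7. hibernate kernel copies the RAM image back")
    m.devices_awake = False
    m.log.append("8. hibernate kernel puts devices into a known safe state")
    m.running = "system"
    m.log.append("9. hibernate kernel kexecs the system kernel")
    m.devices_awake = True
    m.log.append("10. system kernel wakes the devices it needs")
```

Calling `hibernate(m)` followed by `resume(m)` on a `Machine` whose `ram` holds some state should leave `ram` unchanged and control back with the system kernel, which is the property the proposal is after.
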
From: "Rafael J. Wysocki" [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 14:46:13 +0200

On Thursday, 12 July 2007 08:43, [email blocked] wrote:
> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
>
> > Andrew Morton wrote:
> >> On Wed, 11 Jul 2007 15:30:31 +0000
> >> "Huang, Ying" wrote:
> >>
> >> > 1. Boot a kernel A
> >> > 2. Work under kernel A
> >> > 3. Kexec another kernel B in kernel A
> >> > 4. Work under kernel B
> >> > 5. Jump from kernel B to kernel A
> >> > 6. Continue work under kernel A
> >> >
> >> > This is the first step to implement kexec based hibernation. If the
> >> > memory image of kernel A is written to or read from a permanent media
> >> > in step 4, a preliminary version of kexec based hibernation can be
> >> > implemented.
> >> >
> >> > The kernel B is run as a crashdump kernel in reserved memory
> >> > region. This is the biggest constrains of the patch. It is planed to
> >> > be eliminated in the next version. That is, instead of reserving memory
> >> > region previously, the needed memory region is backuped before kexec
> >> > and restored after jumping back.
> >> >
> >> > Another constrains of the patch is that the CONFIG_ACPI must be turned
> >> > off to make kexec jump work. Because ACPI will put devices into low
> >> > power state, the kexeced kernel can not be booted properly under
> >> > it. This constrains can be eliminated by separating the suspend method
> >> > and hibernation method of the devices as proposed earlier in the LKML.
> >> >
> >> > The kexec jump is implemented in the framework of software suspend. In
> >> > fact, the kexec based hibernation can be seen as just implementing the
> >> > image writing and reading method of software suspend with a kexeced
> >> > Linux kernel.
> >> >
> >
> > I guess I'm (still) confused by the terminology here. Do you mean that it
> > fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram
> > as a way of going to sleep?
>
> Suspend-to-ram involves stopping the system and shutting down devices to
> go into low-power mode, then on wakeup restarting devices and resuming
> operation
>
> so the steps would be.
>
> 1. stop userspace
>
> 2. walk the system device tree and put devices to sleep
>
> 3. go into the lowest power mode available and wait for a wakeup signal
>
> later
>
> 4. walk the system device tree and wake up devices
>
> 5. resume userspace scheduling.

Note that we are going to phase out steps 1 and 5.

> note that what devices get put to sleep could be configurable, potentially
> to the extreme of things like the OLPC (that have hardware designed for
> cheap sleeping) going into a light suspend-to-ram state between keystrokes
> if nothing else has a timer event scheduled before that.
>
> Suspend-to-disk (Hibernate) involves stopping the system, making a
> snapshot of ram, writing the snapshot to somewhere and powering off the
> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
> ram and jumps to the kernel in the snapshot to resume operation.
>
> as I understand the proposal the thought is to do the following
>
> 1. system kernel does suspend-to-ram to put the devices into a known safe
> state.

Not necessarily suspend-to-RAM. I'd much prefer it if devices were not put into low power states but quiesced (i.e. no DMA, no interrupts).

> 2. system kernel uses kexec to start hibernate kernel
>
> 3. hibernate kernel wakes up devices it needs as if it was doing a
> resume-from-ram

I think that the devices should be initialized from scratch in this step.

> 4. hibernate kernel copies ram image somewhere

In this step some userland may be involved (started from the "hibernate" kernel).

> 5. hibernate kernel shuts down the box
>
> later
>
> 6. hibernate kernel boots
>
> 7. hibernate kernel copies ram image from somewhere
>
> 8. hibernate kernel does suspend-to-ram to put the devices into a known
> safe state.

Again, the devices should be quiesced rather than suspended in this step.

> 9. hibernate kernel uses kexec to start system kernel
>
> 10. system kernel wakes up devices it needs as if it was doing a
> resume-from-ram.

I think it should reconfigure devices from scratch (i.e. reprobe).

> >> > Now, only the i386 architecture is supported. The patch is based on
> >> > Linux kernel 2.6.22, and has been tested on my IBM T42.
> >> >
> >>
> >> This sounds awesome. Am I correct in expecting that ultimately the
> >> existing hibernation implementation just goes away and we reuse (and hence
> >> strengthen) the existing kexec (and kdump?) infrastructure?
> >>
> >> And that we get hibernation support almost for free on all kexec (and
> >> relocatable-kernel?) capable architectures?
> >>
> >> And that all the management of hibernation and resume happens in
> >> userspace?
>
> this is the thought.
>
> >> I didn't understand the ACPI problem. Does this mean that CONFIG_ACPI must
> >> be disabled in the to-be-hibernated kernel, or in the little transient
> >> kexec kernel?
> >>
> >
> > I think the point is that if kernel A says "I'm suspending" and calls the
> > suspend method on all its devices, then kernel B finds that it has no powered
> > on devices to work with. But then couldn't it turn on the ones it wants
> > anyway? And don't you want to suspend them, to make sure they're not still
> > DMAing memory while B is trying to shuffle everything off to disk?
>
> I don't understand the ACPI problem so I can't try to clarify it.
>
> > It does sound pretty cool.
>
> re-using existing components in new ways, making it so that particular
> problems only have to be solved once and that solution is used repeatedly.
> there's a lot to like about this approach.
>
> very cool.

Well, I'm not a big fan of it right now, but well, it looks doable in general.

Greetings,
Rafael

--
"Premature optimization is the root of all evil." - Donald Knuth

From: Mark Lord [email blocked]
To: "Rafael J. Wysocki" [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 09:51:54 -0400

Rafael J. Wysocki wrote:
> On Thursday, 12 July 2007 08:43, [email blocked] wrote:
>> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
>>
>>> Andrew Morton wrote:
..
>> 8. hibernate kernel does suspend-to-ram to put the devices into a known
>> safe state.
> Again, the devices should be quiesced rather than suspended in this step.

That's just not possible. The Hibernate kernel will not have all of the same device drivers as the mainline kernel. Or at least that's what people have previously stated here.

..
>>>> This sounds awesome. Am I correct in expecting that ultimately the
>>>> existing hibernation implementation just goes away and we reuse (and hence
>>>> strengthen) the existing kexec (and kdump?) infrastructure?

No, not so simple. We still need much of the code to sanitize devices upon wakeup from hibernation.

And adding this extra reboot-kernel step in the midst of hibernate will double the time it takes (ugh).

Currently, TuxOnIce(suspend2) takes about 10 seconds to suspend my notebook. Switching to this new scheme would double that to 10 seconds to boot/probe, plus the original 10 seconds to hibernate. Assuming the new implementation even comes close to suspend2 speed.

And the complexity and difficulty of setup really scares me.
Right now, we've got a pretty good/fast in-kernel (well, external patch) that allows my machines to hibernate very quickly, wake up even faster, and not swap like mad afterwards. Without any external programs, initramfs, or extra kernels required.

And we want to replace this with an ultra-complex setup because.. ????

Cheers

From: Pavel Machek [email blocked]
To: Mark Lord [email blocked]
Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Date: Thu, 12 Jul 2007 14:49:12 +0000

Hi!

> And the complexity and difficulty of setup really scares me.
> Right now, we've got a pretty good/fast in-kernel (well, external patch)
> that allows my machines to hibernate very quickly, wake up even faster,
> and not swap like mad afterwards. Without any external programs,
> initramfs, or extra kernels required.
>
> And we want to replace this with an ultra-complex setup because.. ????

...freezer does not work with fuse :-). Or more exactly because freezer is ugly, and we don't know how to get rid of it...

(Not that I advocate kexec-based hibernation. I think it is going to suck. But it might allow kdump-but-keep-running, so work is not wasted).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

ETOOCOMPLEX
Okay. Admittedly I am far from being a kernel guru; however, I fail to see the benefits of this added complexity. From what I understand, tasks running on one kernel (the initially booted one) are moved over to a smaller secondary kernel, and the first kernel is then put to sleep. However, basic things like ACPI have to be shut off for this to work, and the second kernel is using the suspend code you would be using under the main kernel anyway?
It just seems like you are jumping through more hoops and gaining very little in return.
This is either so incredibly ingenious that my untrained little mind cannot see the beauty of it, or you have to be a functional retard for it to make any sense.
Not.
You didn't get it. The initially booted kernel keeps running the whole time; tasks aren't moved. When it is time to suspend, the second kernel is run. The second kernel saves all of memory (including the first kernel) and shuts down the system. On resume, memory is restored and execution returns to the point where the first kernel jumped to the second kernel. It's simpler: no stopping tasks, no freezing.
ACPI gets in the way because when the first kernel prepares the hardware for suspend, it prepares it too much: devices are powered off. And with a powered-off disk controller you can't save the memory image to disk :) The ACPI methods for preparing to suspend will be split: the first half will only prepare the hardware, and the second half will power it off. When suspending with two kernels, only the first half will be called.
--
:wq
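The split described in the comment above, between preparing a device and actually powering it off, can be pictured with a small Python sketch. This is purely illustrative: the `Device` class, its `quiesce` and `power_off` methods, and the `write_image` helper are hypothetical names, not ACPI or kernel interfaces, and a real second kernel would also have to initialize the devices it uses.

```python
# Toy model of splitting device suspend into "prepare" and "power off".
# Device, quiesce, power_off and write_image are hypothetical names,
# not ACPI or kernel interfaces.


class Device:
    def __init__(self, name):
        self.name = name
        self.dma_active = True  # device may still be touching memory
        self.powered = True

    def quiesce(self):
        """First half: stop DMA and interrupts, but stay powered."""
        self.dma_active = False

    def power_off(self):
        """Second half: quiesce, then enter a low-power state."""
        self.quiesce()
        self.powered = False


def write_image(disk_controller, image):
    """Stand-in for the second kernel saving the memory image to disk."""
    if not disk_controller.powered:
        raise RuntimeError("cannot write image: disk controller is powered off")
    return len(image)  # pretend every byte was written


def full_suspend(devices):
    """What a complete suspend does: both halves for every device."""
    for dev in devices:
        dev.power_off()


def prepare_for_second_kernel(devices):
    """What the two-kernel scheme wants: only the first half."""
    for dev in devices:
        dev.quiesce()
```

With `prepare_for_second_kernel`, the disk controller is quiet but still powered, so `write_image` succeeds; after `full_suspend`, the same call raises an error, which is the "prepared too much" situation the comment describes.
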
What Not.
I have to say I love this explanation. It would explain, or seem to explain, why my laptop doesn't wake up from sleep or hibernate very well.
"ACPI gets in the way because when the first kernel prepares the hardware for suspend, it prepares it too much: devices are powered off. And with a powered-off disk controller you can't save the memory image to disk :)"
That is one of those, "OK, lift up your right leg. OK, now, while still holding your right leg up, lift up your left leg. Remain steadily in this position until I give the word."
Are logical errors like this what cause some of these suspend and hibernate problems? Do they get fixed on a laptop-by-laptop basis by patches that work around the basic logical error?
I don't know, it is just that the thought of the software first shutting the disk controller down and then trying to write to the disk makes a person laugh. That software tries the same thing every time and never learns anything. I guess it could if someone knew how to make it learn as it goes along.
OK, so is this saying that you have a second kernel all fixed up to hibernate or suspend properly, and that is the one you use to do the suspending or hibernating, and you have your other kernel that is optimized for doing regular Linux tasks? So this is a system where you use a kernel that will actually suspend and wake up well, and use that to avoid the problems with suspending that exist in the kernel that is more set up for regular Linux work? I don't care; if it works, do it.