This patch implements jumping between the kexeced kernel and the original kernel.

To support jumping between two kernels, the devices are put into a quiescent state and the state of the devices and CPU is saved before jumping to (executing) the new kernel and before jumping back to the original kernel. After jumping back from the kexeced kernel and after jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is called to implement the corresponding functions.

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the new (kexeced) kernel (destination page). During kexec_load, the image of the new kernel is loaded into the source pages; before it is executed, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. Before jumping to the new (kexeced) kernel and after jumping back to the original kernel, the destination pages and the source pages are swapped too.

A jump back protocol for kexec is defined and documented. It is an extension to the ordinary function calling protocol, so the facility provided by this patch can be used to call an ordinary C function in real mode.

A set of flags for sys_kexec_load is added to control which state is saved/restored before/after the real mode code executes. For example, you can specify that the device state and FPU state are saved/restored before/after the real mode code executes.

The state save/restore code (excluding CPU state) can be overridden based on the "command" parameter of kexec jump, because hibernating/resuming needs to save/restore more state.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
Documentation/i386/jump_back_protocol.txt | 103 ++++++++++++++
arch/powerpc/kernel/machine_kexec.c | 2
arch/ppc/kernel/machine_kexec.c | 2
arch/sh/kernel/machine_kexec.c | 2
arch/x86/kernel/machine_ke...
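The shadow-page swapping in the patch description above can be illustrated with a small user-space sketch. This is not the patch's code: `SKETCH_PAGE_SIZE`, `struct page_pair`, `swap_page()` and `swap_all_pages()` are hypothetical names, and the fixed 4096-byte page size is an assumption. It only shows why swapping once before the jump and once after jumping back restores the original contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, for illustration only */

/* Hypothetical pairing of one destination page with its shadow (source) page. */
struct page_pair {
    unsigned char *dest; /* page the new kernel will run from */
    unsigned char *src;  /* shadow page that holds the loaded image */
};

/* Exchange the contents of two distinct pages through a bounce buffer. */
static void swap_page(unsigned char *a, unsigned char *b)
{
    unsigned char tmp[SKETCH_PAGE_SIZE];

    assert(a != NULL && b != NULL && a != b);
    memcpy(tmp, a, SKETCH_PAGE_SIZE);
    memcpy(a, b, SKETCH_PAGE_SIZE);
    memcpy(b, tmp, SKETCH_PAGE_SIZE);
}

/*
 * Swap every pair. Called once before jumping, this moves the image into
 * the destination pages and backs up their old contents in the source
 * pages. Called again after jumping back, it undoes the exchange.
 */
static void swap_all_pages(struct page_pair *pairs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        swap_page(pairs[i].dest, pairs[i].src);
}
```

Because the swap is its own inverse, the same routine can serve both directions of the jump, which is presumably why no memory needs to be reserved up front.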
Why do we need var arg support?

We can't keep the same idt and gdt, as the pages they are on will be overwritten/reused. So explicitly stomping on them sounds better.

Why rename relocate_kernel? Ah. I see. You need to make it into a pointer again. The crazy don't stop the pgd support strikes again. It used to be named rnk.

More later.

Eric
--
If all parameters are provided in user space, the usage model may be as follows:

- sys_kexec_load() /* with executable/data/parameters(A) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(A) */
- /* jump back */
- sys_kexec_load() /* with executable/data/parameters(B) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(B) */
- /* jump back */

That is, the kexec image must be re-loaded if the parameters are different, and no state can be preserved in the kexec image. This is OK for the original kexec implementation, because there is no jumping back. But for kexec with jumping back, another usage model may be useful too:

- sys_kexec_load() /* with executable/data loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical mode code with parameters(A) */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical mode code with parameters(B) */

This way the kexec image need not be re-loaded, and the state of the kexec image can be preserved across several invocations.

Another usage model that may be useful is invoking the kexec image (such as firmware) from kernel space:

- kmalloc the needed memory and load the firmware image (if needed)
- sys_kexec_load() with a fake image (one segment with size 0); the entry point of the fake image is the entry point of the firmware image.
- kexec_call(fake_image, ...) /* maybe change entry point if needed */

This way, some kernel code can invoke the firmware in physical mode just like invoking an ordinary function.

The original idea behind this code is: if the kexec image claims that it does not need "preserving extensive CPU state" (such as FPU/MMX/GDT/LDT/IDT/CS/DS/ES/FS/GS/SS etc.), then the IDT/GDT/CS/DS/ES/FS/GS/SS are not touched in the kexec image code, so the segment registers need not be set. But this is not clear. At least more description should be provided for

You mean I should change the function pointer...
Interesting. We wind up preserving the code in between invocations.

I don't know about your particular issue, but I can see that clearly we need a way to read values back from our target image. And if we can read everything back one way to proceed is to read everything out modify it and then write it back. Amending a kexec image that is already stored may also make sense.

I'm not convinced that the var arg parameters make sense, but you added them because of a real need.

The kexec function is split into two separate calls so that we can unmount the filesystem the kexec image comes from before actually doing the kexec.

If extensive user space shutdown or startup is needed I will argue that doing the work in the sys_reboot call is the wrong place to do it. Although if a jump back is happening we should not need much restart.

Can you generate a minimal patch with just the minimal necessary

That certainly seems interesting. But that doesn't justify the vararg

You were changing something that used to be a pointer back to a pointer and I found that confusing. See the last one or two commits to machine_kexec_32.c for when this happened.

I get the feeling that we need to put the page table creation logic into machine_kexec_prepare, instead of in assembly.

Eric
--
Yes. Reading/modifying the loaded kexec image is another way to do the necessary communication between the first kernel and the second kernel. In fact, patch [4/4] of this series, with the title:

[PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore

provides an ELF core file in /proc (/proc/kimgcore) to read the loaded kexec image. The writing function can be added easily.

But I think communication between the first kernel and the second kernel via reading/modifying the loaded kernel image is not a very convenient way. The usage model may be as follows:

- sys_kexec_load() /* with executable/data loaded */
- modify the loaded kexec image to set the parameters (A)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(A) */
- In physical mode code, check the parameters A and execute accordingly
- modify the loaded kexec image to set the parameters (B)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(B) */
- In physical mode code, check the parameters B and execute accordingly

There are some issues with this usage model:

- Some parameters in the kernel need to be exported (such as kimage->head, to let the second kernel read the contents of the backed-up memory).
- The physical mode code invoker (the first kernel) needs to know where to write the parameters. A common protocol, or a case-by-case protocol, should be defined. For example, the memory address after the entry point of the kexec image is a good candidate. But for the Linux kernel there are two types of entry point, the "jump back entry" and "purgatory". Maybe a different protocol should be defined for each of these two types of entry point.
- For the user space of the second kernel to get the parameters, an interface (maybe a file in /proc or /sys) should be provided to export the parameters to user space.

So I think the current parameter passing mechanism may be simpler and more convenient (defined in Documentation/i386/jump_back_protocol.txt in the patch). The...
Hi,

Why do we need so many different flags for preserving different types of state (CPU, CPU_EXT, Device, console)? To keep things simple, can't we create just one flag, KEXEC_PRESERVE_CONTEXT, which will indicate any special action required for preserving the previous kernel's context so that one can switch back to the old kernel?

Thanks
Vivek
--
Yes. There are too many flags, especially when we have no users of these flags now. It is better to use one flag such as KEXEC_PRESERVE_CONTEXT now, and create the other required flags when really needed.

Best Regards,
Huang Ying
--
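The single-flag approach agreed on above might be checked along these lines. This is only a sketch: the thread does not give flag values, so the `SKETCH_*` constants, their numeric values, and both helper functions are assumptions made for illustration, not the patch's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit assignments for illustration; the thread does not specify them. */
#define SKETCH_KEXEC_OTHER_FLAG        0x00000001UL /* hypothetical existing flag */
#define SKETCH_KEXEC_PRESERVE_CONTEXT  0x00000002UL /* the one proposed flag */
#define SKETCH_KEXEC_FLAGS \
    (SKETCH_KEXEC_OTHER_FLAG | SKETCH_KEXEC_PRESERVE_CONTEXT)

/* Reject any bit outside the known set, so new flags can be added later. */
static bool sketch_flags_valid(unsigned long flags)
{
    return (flags & ~SKETCH_KEXEC_FLAGS) == 0;
}

/* One bit decides whether all context (devices, CPU, ...) is saved/restored. */
static bool sketch_should_preserve_context(unsigned long flags)
{
    assert(sketch_flags_valid(flags));
    return (flags & SKETCH_KEXEC_PRESERVE_CONTEXT) != 0;
}
```

Rejecting unknown bits up front keeps the remaining bits free, so finer-grained flags could still be introduced later if a real user appears.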
Hi,

I am just going through your patches and trying to understand them.

Don't I need jumping back to restore an already hibernated kernel image? Can

Ok, so due to swapping of source and destination pages the first kernel's data is still preserved. How do I get the dynamic memory required for second

Is 2K sufficient for all the code in relocate_kernel_32.S? What's the

Who fills the entry point at offset 0x200?

Who is using kexec_call()? I can't seem to locate the caller of it.

Thanks
Vivek
--
Now, jumping back is used to implement "kexec based hibernation", which uses kexec/kdump to save the memory image of the hibernated kernel during hibernation, and uses /dev/oldmem to restore the memory image of the hibernated kernel and jump back to the hibernated kernel to continue running.

Other usage models may include:

- Dump the system memory image and then continue to run, that is, take a memory snapshot of the system while it is running.
- Cooperative multi-tasking of different OSes. You can load another OS (B) from the current OS (A), and jump between the two OSes as needed.

All dynamic memory required for the second kernel should be "loaded" by sys_kexec_load in the first kernel. For example, not only should the Linux kernel be loaded at 1M, the memory 0~16M (excluding the kernel) should be

The current size is 0x2d7 (727). I got it through objdump,

The entry point is filled by assembler code in relocate_kernel_32.S upon

There is no user of kexec_call() now. But I think it may be useful as a physical mode caller for some firmware code.

Best Regards,
Huang Ying
--
I'm not a kexec hacker... but maybe this is in good enough state to be merged? It is useful on its own: kexec jump and back means we can dump the system then continue running, for example...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
As far as I'm concerned, patches [1/4] and [2/4] can go. The other two are not in that shape yet (especially the [3/4] patch). Greetings, Rafael --
Ok. Then I will see if I can review these in the next couple days and give some feedback.

At a quick skim through the code it appears there is some more infrastructure than we need and things can still be simplified. Since this applies in particular to the user space interface I'm not comfortable with these patches going in just yet.

The unused KEXEC_PRESERVE_ flags especially give me pause. Having something like that, that isn't currently wired up, sounds like a bad place to start.

Eric
--
