Re: [linux-pm] [PATCH -mm] kexec jump -v9

From: Pavel Machek
Date: Thursday, March 20, 2008 - 3:40 am

Feel free to help with testing.

I believe ACPI is simply getting confused by us overwriting memory
with that from old image. I don't see how you can emulate it with
shutdown.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
help save the Klanovice forest:  http://www.ujezdskystrom.info/
--

From: Rafael J. Wysocki
Date: Thursday, March 20, 2008 - 3:45 pm

Well, in fact ACPI has something called the NVS memory, which we're supposed
to restore during the resume and which we're not doing.  The problem may be
related to this.

Fixing that is on my todo list, but frankly there are many different things
on it. :-)

Thanks,
Rafael
--

From: Alan Stern
Date: Thursday, March 20, 2008 - 4:01 pm

No, it can't be.  ACPI won't expect the NVS memory to be restored 
following an S5-shutdown.  In fact, as far as ACPI is concerned, 
resuming from an S5-type hibernation should not be considered a resume 
at all but just an ordinary reboot.  All ACPI-related memory areas 
in the boot kernel should be passed directly through to the image 
kernel.

Alan Stern

--

From: Pavel Machek
Date: Thursday, March 20, 2008 - 4:22 pm

How can we pass the interpreter state? I do not think we do this kind of
passing.

If it was enough to pass some static area, we could just mark it
nosave...

Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
something)? Could we easily identify BIOS data so we could mark them
nosave?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
help save the Klanovice forest:  http://www.ujezdskystrom.info/
--

From: Rafael J. Wysocki
Date: Thursday, March 20, 2008 - 4:40 pm

However, the image kernel is supposed to restore the NVS area (from the


This wouldn't work even if we could (at least on x86-64).

In fact I'm going to remove the 'nosave' section in the future (another
thing on the todo list).

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Thursday, March 20, 2008 - 5:36 pm

Ah, I misunderstood your comment, sorry.

The regions used by ACPI are registered as 'nosave' by the arch code and
we don't save them.  However, the ACPI NVS area is exceptional in that we
are supposed to save and restore it.  The problem is restoring it at the right
time, and it's quite hard to figure out from the spec what the right time is
(the only thing it says is that we should do it before calling _WAK).

Thanks,
Rafael
--

From: Alan Stern
Date: Thursday, March 20, 2008 - 5:52 pm

It's supposed to do that when resuming from an S4 hibernation, not 

For an S5 hibernation, the interpreter state within the image is wrong.  
The image kernel needs to have the interpreter state from the boot 
kernel -- I don't know if this is possible.

Alan Stern

--

From: Nigel Cunningham
Date: Friday, March 21, 2008 - 3:05 pm

Hi.


It's possible.

1) When hibernating, allocate a page (or pages if one isn't enough) for
the data to end up in after the atomic restore.
2) Put the location(s) in the image header.
3) At resume time, allocate an equivalent number of extra 'safe' pages
and set up extra pbes for the atomic restore to copy data from the extra
pages to the ones allocated when hibernating.
4) At the appropriate point in time, copy the NVS data to the extra
'safe' pages allocated in step 3.

The data will then be available to the resumed kernel post-resume.

I've been using this method to pass data from the boot kernel to the
resumed kernel for a while now. (I'm using it for I/O speed statistics
and state preservation).
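
The four steps above can be sketched in plain user-space C.  This is only
an illustration under stated assumptions, not the actual hibernation code:
struct image_header_sketch, struct pbe_sketch, setup_extra_pbes(),
fill_safe_pages() and atomic_restore_sketch() are hypothetical names, the
4096-byte page size is assumed, and ordinary buffers stand in for pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size */
#define MAX_EXTRA_PAGES  4      /* arbitrary limit for this sketch */

/* Steps 1-2: destination pages allocated while hibernating, recorded
 * in a (hypothetical) image header field. */
struct image_header_sketch {
	size_t nr_extra;
	void *extra_dest[MAX_EXTRA_PAGES];
};

/* Simplified page backup entry: copy one page from src to dst. */
struct pbe_sketch {
	void *dst;
	const void *src;
};

/* Step 3: pair every destination from the header with a 'safe' page
 * allocated by the boot kernel.  Returns the number of entries set up. */
static size_t setup_extra_pbes(const struct image_header_sketch *hdr,
			       void *safe_pages[],
			       struct pbe_sketch pbes[])
{
	assert(hdr->nr_extra <= MAX_EXTRA_PAGES);
	for (size_t i = 0; i < hdr->nr_extra; i++) {
		pbes[i].dst = hdr->extra_dest[i];
		pbes[i].src = safe_pages[i];
	}
	return hdr->nr_extra;
}

/* Step 4: the boot kernel copies its data into the 'safe' pages,
 * one page-sized chunk at a time. */
static void fill_safe_pages(void *safe_pages[], size_t nr,
			    const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < nr && len > 0; i++) {
		size_t chunk = len < SKETCH_PAGE_SIZE ? len : SKETCH_PAGE_SIZE;

		memcpy(safe_pages[i], p, chunk);
		p += chunk;
		len -= chunk;
	}
}

/* Stand-in for the atomic restore: run every entry, copying whole pages. */
static void atomic_restore_sketch(const struct pbe_sketch pbes[], size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		memcpy(pbes[i].dst, pbes[i].src, SKETCH_PAGE_SIZE);
}
```

After atomic_restore_sketch() runs, the data sits in the pages whose
addresses the hibernated kernel recorded in the header, so that kernel
can find it without knowing where the boot kernel kept it.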

Regards,

Nigel

--

From: Pavel Machek
Date: Saturday, March 22, 2008 - 9:21 am

Yes, nosave pages could be used to do this passing -- if we can put the
interpreter state into a pre-allocated memory block.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

From: Rafael J. Wysocki
Date: Saturday, March 22, 2008 - 10:45 am

On x86-64 there's no guarantee that the "nosave" pages will be at the same
locations in both the image kernel and the boot kernel.  What we could do
is to pass the data in the image header, preallocate some "safe" pages from
the boot kernel, put the data in there and pass a pointer to them to the
image kernel.

However, as far as the ACPI NVS area is concerned, this is probably not
necessary, because the spec wants us to restore the ACPI NVS before calling
_WAK, which is just after the image kernel gets the control back.  So, in
theory, the ACPI NVS data could be stored in the image and restored by
the image kernel from a location known to it (the procedure may be to copy
the ACPI NVS data into a region of regular RAM before creating the image and
copy them back into the ACPI NVS area in platform->leave(), for example), but
I suspect that for this to work we'll have to switch ACPI off in the boot
kernel, just prior to passing control back to the image kernel.
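
A minimal user-space sketch of the copy-out/copy-back procedure described
above, assuming the NVS area can be treated as a flat byte range.  The
names struct nvs_region_sketch, nvs_save(), nvs_restore() and nvs_free()
are hypothetical, and malloc() stands in for whatever regular RAM the
image kernel would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One ACPI NVS range plus its copy in regular RAM (hypothetical layout). */
struct nvs_region_sketch {
	unsigned char *area;   /* the NVS area itself */
	size_t size;           /* its length in bytes */
	unsigned char *saved;  /* copy in regular RAM; ends up in the image */
};

/* Before creating the image: copy the NVS contents into regular RAM.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
static int nvs_save(struct nvs_region_sketch *r)
{
	assert(r->area != NULL && r->size > 0);
	if (!r->saved)
		r->saved = malloc(r->size);
	if (!r->saved)
		return -1;
	memcpy(r->saved, r->area, r->size);
	return 0;
}

/* After the image kernel gets control back and before _WAK would be
 * called (e.g. from something like platform->leave()): copy it back. */
static void nvs_restore(struct nvs_region_sketch *r)
{
	if (r->saved)
		memcpy(r->area, r->saved, r->size);
}

/* Drop the RAM copy once it is no longer needed. */
static void nvs_free(struct nvs_region_sketch *r)
{
	free(r->saved);
	r->saved = NULL;
}
```

Because the copy lives in regular RAM, it is saved and restored with the
rest of the image, so the image kernel knows its location.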

Thanks,
Rafael
--

From: Alan Stern
Date: Saturday, March 22, 2008 - 1:49 pm

That sounds by far the simplest solution.  If the boot kernel can tell
(by looking at some header field in the image or any other way) that
the hibernation used S5 instead of S4, then it should just turn off 
ACPI before passing control to the image kernel.  Then the image kernel 
can turn ACPI back on and all should be well.  If you do this, does the 
NVS region still need to be preserved?

Alan Stern

--

From: Rafael J. Wysocki
Date: Saturday, March 22, 2008 - 2:29 pm

The spec doesn't say much about that, so we'll need to carry out some
experiments.

Still, as far as I can figure out what the spec authors _might_ mean, I think
that it would be inappropriate to restore the ACPI NVS area if S5 was entered
on "power off".  The idea seems to be that the restoration of the ACPI NVS area
should complement whatever has been preserved by the platform over the
hibernation/resume cycle.

IMO, if S5 was entered on "power off", there are two possible ways to go.
Either ACPI is initialized by the boot kernel, in which case the image kernel
should not touch things like _WAK and similar, just throw away whatever
ACPI-related state it got from the image and try to rebuild the ACPI-related
data from scratch.  Or the boot kernel doesn't touch ACPI and the image kernel
initializes it in the same way as during a fresh boot (that might be difficult,
though).
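
One reading of the first option, written as a small decision table in C.
It is a sketch only: enum poweroff_mode, struct resume_plan and
plan_resume() are hypothetical names, and the sketch assumes the image
kernel can tell which state was entered on "power off".

```c
#include <assert.h>
#include <stdbool.h>

/* Which sleep state was entered on "power off" (assumed to be known). */
enum poweroff_mode { POWEROFF_S4, POWEROFF_S5 };

/* What the image kernel does once it gets control back. */
struct resume_plan {
	bool restore_nvs;         /* copy the saved ACPI NVS data back */
	bool call_wak;            /* run _WAK as on a real S4 resume */
	bool rebuild_acpi_state;  /* discard ACPI state from the image */
};

static struct resume_plan plan_resume(enum poweroff_mode mode)
{
	struct resume_plan plan = { false, false, false };

	assert(mode == POWEROFF_S4 || mode == POWEROFF_S5);
	if (mode == POWEROFF_S4) {
		/* The platform preserved context: complement it. */
		plan.restore_nvs = true;
		plan.call_wak = true;
	} else {
		/* From ACPI's point of view this was an ordinary boot. */
		plan.rebuild_acpi_state = true;
	}
	return plan;
}
```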

Thanks,
Rafael
--

From: Eric W. Biederman
Date: Wednesday, May 14, 2008 - 3:38 pm

Just an added partial data point.  In the kexec case I have not heard
anyone screaming at me that ACPI doesn't work after we switch kernels.

So I expect shutting down ACPI and restarting it should work reliably
and that is easy to test as that is already implemented with kexec.

Eric
--

From: Rafael J. Wysocki
Date: Wednesday, May 14, 2008 - 4:47 pm

You can't program devices to generate wakeup events without ACPI, among
other things.

Anyway, I don't think you should focus so much on replacing the current
hibernation code entirely.

Thanks,
Rafael
--

From: Eric W. Biederman
Date: Thursday, May 15, 2008 - 1:55 pm

No.  I am talking about the second half of S5, when we go from the boot
kernel to the restored kernel.

That path is exactly what happens successfully in the kexec case.
Transitioning from one kernel to another.

If that path works reliably in kexec then we are talking about
something that can be solved without respect to any specific
ACPI implementation.

Eric


--

From: Rafael J. Wysocki
Date: Thursday, May 15, 2008 - 2:20 pm

Well, you don't remove the power from devices doing that, do you?

I was referring to the fact that you remove the power from devices after saving
the image (ie. in the "poweroff" stage).  Then, you initialize them and pass
all that to the restored kernel and the question here is:
(a) Should they be reinitialized before the restored kernel has a chance to
    access them?
(b) If they should, what state should they be in when the restored kernel
    accesses them?

That basically depends on how you're going to handle the resuming of devices,
especially on the ACPI bus, in the restored kernel.

If we are to follow ACPI, the answer to (a) is "no", except for devices used to
read the image and it's better if the boot kernel doesn't touch ACPI at all.
Then, the benefit of putting the system into S4 during the "poweroff" stage is
that (a) the resume can be carried out faster and (b) the restored kernel may
use some context preserved by the platform over the sleep state.

Also, that allows you to use the wake up capabilities of some devices that
need not be available from S5.

In any case, however, I don't think that doing the kexec jump before
creating the image is really necessary.  The kexec jump during resume is in
fact very similar to what the current hibernation code does, but it's slightly
more complicated. :-)

Thanks,
Rafael
--
