I don't know what exactly the i915_suspend() and i915_resume() are
supposed to do because it works better without them.
After inserting "return 0;" right at the top of those two functions,
suspend (and power-off properly), and resume (without green screen) works
just fine.
I would like to know what they're for.
Tested suspend-to-ram, and suspend-to-disk, both console and X on notebook
internal LCD display, all works without these two functions.
But, anyway, got down to just one line in i915_drv.c causing the hang
during suspend. "pci_set_power_state(dev->pdev, PCI_D3hot);".
And green screen problem during resume is caused by i915_restore_vga(dev);
So, let me where to go from here.
Thanks,
Jeff.
--- linux/drivers/char/drm/i915_drv.c.bad 2008-02-20
11:29:14 +0800
+++ linux/drivers/char/drm/i915_drv.c 2008-02-21 00:58:37 +0800
@@ -369,7 +369,7 @@
if (state.event == PM_EVENT_SUSPEND) {
/* Shut down the device */
pci_disable_device(dev->pdev);
- pci_set_power_state(dev->pdev, PCI_D3hot);
+ //pci_set_power_state(dev->pdev, PCI_D3hot);
}
return 0;
@@ -521,7 +521,7 @@
for (i = 0; i < 3; i++)
I915_WRITE(SWF30 + (i << 2), dev_priv->saveSWF2[i]);
- i915_restore_vga(dev);
+ //i915_restore_vga(dev);
return 0;
}
--
Tried "idle=poll" but it has not effect. Thanks, Jeff. --
Try suspend-and-resume without X. Also, try it on one of the more modern laptops - even *with* X. Basically, the kernel wants to be able to do what X does, because it means that when it works, it works _so_ much better than doing it in X. So getting it working is definitely worth it. That said, before you do anything else, try if suspend-to-RAM works. That's the primary goal for this code anyway, and if it works that gives a good hint. Suspend-to-disk is fundamentally different, and it's entirely possible that for the suspend-to-disk case we should just say "screw trying to suspend/resume graphics", since you'll have the BIOS resuming text-mode anyway, and there are no performance or debugging advantages. Linus --
Ahh. You're using the BIOS to re-initialize your video, aren't you? If STR works without X, then you have something else resuming graphics, and that may be what then interacts badly with the fact that the kernel Let's try to narrow it down to what the interaction is. Are you using something like acpi_sleep=s3_bios or similar? That's what the kernel support is supposed to make unnecessary in the long run, along with all the video mode flickering (ie we should be able to resume to the video mode we want, not flicker through unnecessary modes). Linus --
Well, as far as I know, s2ram could be doing vbe save/restore for you. My laptop (a Toshiba satellite U305, intel 945GM chipset, used to need s2ram -f -p -m to STR correctly. In 2.6.25-rc2 I can simply STR with echo mem > /sys/power/state. Romano I imagine this will be received as blasphemy, but if only ndiswrapper were not horribly broken, this will be my day-by-day kernel. I just hope ath5k will arrive to my chipset soon... -- La presente comunicación tiene carácter confidencial y es para el exclusivo uso del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, le informamos que cualquier forma de distribución, reproducción o uso de esta comunicación y/o de la información contenida en la misma están estrictamente prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por favor, notifíquelo inmediatamente al remitente contestando a este mensaje y proceda a continuación a destruirlo. Gracias por su colaboración. This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation. --
Really? This contains confidential information? I'd better notify you and destroy this message now... :) Jesse --
STR is always working on my X60s. No green screen, no hang. Both s2ram and "echo mem > /sys/power/state". It's STD that's having problem. But strange thing is I could even restore console after STD even without agp and i915 module loaded, so I don't know how the console vga got saved and restored. Thanks, Jeff. --
Jeff, can you please test hibernation with the patch I've just sent to Jesse
(reproduced below for convenience)?
Thanks,
Rafael
---
drivers/char/drm/i915_drv.c | 5 +++--
include/linux/suspend.h | 2 ++
kernel/power/disk.c | 10 +++++++++-
3 files changed, 14 insertions(+), 3 deletions(-)
Index: linux-2.6/drivers/char/drm/i915_drv.c
===================================================================
--- linux-2.6.orig/drivers/char/drm/i915_drv.c
+++ linux-2.6/drivers/char/drm/i915_drv.c
@@ -27,6 +27,7 @@
*
*/
+#include <linux/suspend.h>
#include "drmP.h"
#include "drm.h"
#include "i915_drm.h"
@@ -222,6 +223,7 @@ static void i915_restore_vga(struct drm_
dev_priv->saveGR[0x18]);
/* Attribute controller registers */
+ inb(st01); /* switch back to index mode */
for (i = 0; i < 20; i++)
i915_write_ar(st01, i, dev_priv->saveAR[i], 0);
inb(st01); /* switch back to index mode */
@@ -366,9 +368,8 @@ static int i915_suspend(struct drm_devic
i915_save_vga(dev);
- if (state.event == PM_EVENT_SUSPEND) {
+ if (state.event == PM_EVENT_SUSPEND && !in_hibernation_power_off()) {
/* Shut down the device */
- pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);
}
Index: linux-2.6/include/linux/suspend.h
===================================================================
--- linux-2.6.orig/include/linux/suspend.h
+++ linux-2.6/include/linux/suspend.h
@@ -209,6 +209,7 @@ extern unsigned long get_safe_page(gfp_t
extern void hibernation_set_ops(struct platform_hibernation_ops *ops);
extern int hibernate(void);
+extern bool in_hibernation_power_off(void);
#else /* CONFIG_HIBERNATION */
static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
static inline void swsusp_set_page_free(struct page *p) {}
@@ -216,6 +217,7 @@ static inline void swsusp_unset_page_fre
static inline void hibernation_set_ops(struct platform_hibernation_ops *ops) {}
static inline int ...Great news. It works. STD (platform) does not hang at suspend. And the annoying green is gone! STR still works. Thanks, Jeff. --
Great, thanks for testing. If Jesse confirms that it also works for him, I'll prepare a cleaner final fix tomorrow. Thanks, Rafael --
I don't understand why hibernation just doesn't use a PM_EVENT_HIBERNATE, and be done with it? Why should it be called PM_EVENT_SUSPEND when it isn't? Adding some external global variables is absolutely the wrong way to fix this. It's not even like there are very many drivers who actually care about "state.event" anyway: a 'git grep' returns just 35 users in the whole tree, so if this was done this ugly way just to avoid double-chcking the other cases that compare against PM_EVENT_SUSPEND, then it really wasn't worth it. Linus --
Please relax, we're debugging the thing right now and the patch doesn't even seem to help on the other affected box. The issue appears to be more complicated than we initially thought. Thanks, Rafael --
Actually, looks like I forgot to reboot between tests (just rmmod'd & modprobed i915), your patch actually does work. However, making new PM event messages might be a good thing anyway, assuming Linus takes it for 2.6.25, since it should make the migration to ->hibernate callbacks easier. Jesse --
Rafael, I'd actually prefer these changes to the i915 driver. One is to avoid the "green screen" problem and the other is to actually save state at hibernate time in case we don't do a POST coming out of S4 (probably not common but hey). Jesse Make sure hibernation works by not shutting down the video device during hibernation power off. This is important because later stages of the hibernation cycle end up touching the video device, which may cause a hang if it was disabled early on. Also make sure the restoration correctly restores the AR registers by flipping the ARX register into index mode before doing anything. Depends on Rafael's patch which exports hibernation state to drivers. Signed-off-by: Jesse Barnes <jesse.barnes@intel.com> diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c index 35758a6..5e73869 100644 --- a/drivers/char/drm/i915_drv.c +++ b/drivers/char/drm/i915_drv.c @@ -27,6 +27,7 @@ * */ +#include <linux/suspend.h> #include "drmP.h" #include "drm.h" #include "i915_drm.h" @@ -222,6 +223,7 @@ static void i915_restore_vga(struct drm_device *dev) dev_priv->saveGR[0x18]); /* Attribute controller registers */ + inb(st01); for (i = 0; i < 20; i++) i915_write_ar(st01, i, dev_priv->saveAR[i], 0); inb(st01); /* switch back to index mode */ @@ -364,8 +366,8 @@ static int i915_suspend(struct drm_device *dev) i915_save_vga(dev); /* Shut down the device */ - pci_disable_device(dev->pdev); - pci_set_power_state(dev->pdev, PCI_D3hot); + if (!in_hibernation_power_off()) + pci_set_power_state(dev->pdev, PCI_D3hot); return 0; } --
Below is a patch that introduces PM_EVENT_HIBERNATE as requested by Linus and (hopefully) makes hibernation with i915 work correctly. I must admit I don't feel very comfortable introduces PM_EVENT_HIBERNATE at this point, since such changes tend to introduce unexpected issues all over the place, but hopefully this time it won't break anything. I have tested it on the nx6325. Please review and tell me if it looks good. Jesse and Jeff, please check if your boxes hibernate correctly with this patch applied. Thanks, Rafael --- Documentation/power/devices.txt | 13 ++++++++----- drivers/ata/ahci.c | 3 ++- drivers/ata/ata_piix.c | 3 ++- drivers/ata/libata-core.c | 2 +- drivers/char/drm/i915_drv.c | 4 +++- drivers/ide/ppc/pmac.c | 6 ++++-- drivers/macintosh/mediabay.c | 4 +++- drivers/pci/pci.c | 1 + drivers/scsi/aic7xxx/aic79xx_osm_pci.c | 2 +- drivers/scsi/aic7xxx/aic7xxx_osm_pci.c | 2 +- drivers/scsi/mesh.c | 1 + drivers/scsi/sd.c | 5 +++-- drivers/usb/host/sl811-hcd.c | 1 + drivers/usb/host/u132-hcd.c | 10 +++++++--- drivers/video/chipsfb.c | 3 ++- drivers/video/nvidia/nvidia.c | 3 ++- include/linux/pm.h | 5 +++++ kernel/power/disk.c | 4 ++-- net/rfkill/rfkill.c | 3 ++- 19 files changed, 51 insertions(+), 24 deletions(-) Index: linux-2.6/kernel/power/disk.c =================================================================== --- linux-2.6.orig/kernel/power/disk.c +++ linux-2.6/kernel/power/disk.c @@ -391,7 +391,7 @@ int hibernation_platform_enter(void) goto Close; suspend_console(); - error = device_suspend(PMSG_SUSPEND); + error = device_suspend(PMSG_HIBERNATE); if (error) goto Resume_console; @@ -404,7 +404,7 @@ ...
I'm doing this part separately, please drop it - it has nothing to do with
the rest of the patch.
I'd also suggest that you just add a helper function like
int pm_event_powerdown(struct pm_message mesg)
{
return mesg.event >= PM_EVENT_SUSPEND;
}
or something, so that you can have code like
if (pm_event_powerdown(mesg))
pci_set_power_state(pdev, PCI_D3hot);
instead of the test for EVENT_SUSPEND/HIBERNATE explicitly.
Of course, the places that already do a switch-statement are much better
Didn't you miss the apci_pci_choose_state() thing that also needs this
Looks like a missing close-brace to me there - you removed the final '}'.
Or am I blind?
Apart from those issues it looks fine to me.
Linus
--
In the revised patch below I redefined the PM_EVENT_* things as flags so
that I can "or" them and defined PM_EVENT_SLEEP in analogy with
OK, please have a look at the modified patch below.
Thanks,
Rafael
---
Documentation/power/devices.txt | 13 ++++++++-----
drivers/ata/ahci.c | 2 +-
drivers/ata/ata_piix.c | 2 +-
drivers/ata/libata-core.c | 2 +-
drivers/ide/ppc/pmac.c | 4 ++--
drivers/macintosh/mediabay.c | 3 ++-
drivers/pci/pci.c | 1 +
drivers/scsi/aic7xxx/aic79xx_osm_pci.c | 2 +-
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c | 2 +-
drivers/scsi/mesh.c | 1 +
drivers/scsi/sd.c | 3 +--
drivers/usb/host/sl811-hcd.c | 1 +
drivers/usb/host/u132-hcd.c | 11 ++++++++---
drivers/video/chipsfb.c | 2 +-
drivers/video/nvidia/nvidia.c | 2 +-
include/linux/pm.h | 9 ++++++++-
kernel/power/disk.c | 4 ++--
net/rfkill/rfkill.c | 2 +-
18 files changed, 42 insertions(+), 24 deletions(-)
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -391,7 +391,7 @@ int hibernation_platform_enter(void)
goto Close;
suspend_console();
- error = device_suspend(PMSG_SUSPEND);
+ error = device_suspend(PMSG_HIBERNATE);
if (error)
goto Resume_console;
@@ -404,7 +404,7 @@ int hibernation_platform_enter(void)
goto Finish;
local_irq_disable();
- error = device_power_down(PMSG_SUSPEND);
+ error = device_power_down(PMSG_HIBERNATE);
if (!error) {
hibernation_ops->enter();
/* We should never get here */
Index: linux-2.6/include/linux/pm.h
===================================================================
--- ...All right, I'm fine with it. Now we just need to confirm that it works for people.. Linus --
On Sat, Feb 23, 2008 at 10:07 AM, Linus Torvalds Looks good. Applied Rafael patch on top of your latest git tree that has Jesse's i915 fix. No green screen. Tested with STD (platform), STR, and plain echo mem > /sys/power/state. Thanks, Jeff. --
Thanks for testing. Below is the final version of the patch with a changelog etc. Thanks, Rafael --- From: Rafael J. Wysocki <rjw@sisk.pl> During the last step of hibernation in the "platform" mode (with the help of ACPI) we use the suspend code, including the devices' ->suspend() methods, to prepare the system for entering the ACPI S4 system sleep state. At least for some devices the operations performed by the ->suspend() callback in that case must be different from its operations during regular suspend. For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and pass it to the device drivers' ->suspend() methods during the last phase of hibernation, so that they can distinguish this case and handle it as appropriate. Modify the drivers that handle PM_EVENT_SUSPEND in a special way and need to handle PM_EVENT_HIBERNATE in the same way. These changes are necessary to fix a hibernation regression related to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Jeff Chua <jeff.chua.linux@gmail.com> --- Documentation/power/devices.txt | 13 ++++++++----- drivers/ata/ahci.c | 2 +- drivers/ata/ata_piix.c | 2 +- drivers/ata/libata-core.c | 2 +- drivers/ide/ppc/pmac.c | 4 ++-- drivers/macintosh/mediabay.c | 3 ++- drivers/pci/pci.c | 1 + drivers/scsi/aic7xxx/aic79xx_osm_pci.c | 2 +- drivers/scsi/aic7xxx/aic7xxx_osm_pci.c | 2 +- drivers/scsi/mesh.c | 1 + drivers/scsi/sd.c | 3 +-- drivers/usb/host/sl811-hcd.c | 1 + drivers/usb/host/u132-hcd.c | 11 ++++++++--- drivers/video/chipsfb.c | 2 +- drivers/video/nvidia/nvidia.c | 2 +- include/linux/pm.h | 9 ++++++++- kernel/power/disk.c ...
Thanks, applied. With this, I also find that I dislike the use of suspend/resume for freezing for STD a lot less. It's still too easy to get confused, but at least now drivers always have total knowledge about what is really going on. I'd not like this interface as a driver writer, but now it's not fundamentally broken any more, just slightly confusing. Linus --
Heh, thanks :-). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
On Sun, Feb 24, 2008 at 2:43 AM, Linus Torvalds Tested and working. Thanks again, Jeff. --
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=EF=BB=BFHmm. Doesn't compile for me in 2.6.25-rc2-git7:
CC [M] drivers/usb/host/u132-hcd.o
drivers/usb/host/u132-hcd.c: In function =E2=80=98u132_suspend=E2=80=99:
drivers/usb/host/u132-hcd.c:3224: error: expected expression before
=E2=80=98int=E2=80=99
drivers/usb/host/u132-hcd.c:3225: error: =E2=80=98ports=E2=80=99 undeclared=
(first use
in this function)
drivers/usb/host/u132-hcd.c:3225: error: (Each undeclared identifier is
reported only once
drivers/usb/host/u132-hcd.c:3225: error: for each function it appears
in.)
make[3]: *** [drivers/usb/host/u132-hcd.o] Error 1
make[2]: *** [drivers/usb/host] Error 2
make[1]: *** [drivers/usb] Error 2
make: *** [drivers] Error 2
This fixes it:
Thanks
Mirco
---
From: Mirco Tischler <mt-ml@gmx.de>
Fixes the following compile error caused by commit
3a2d5b700132f35401f1d9e22fe3c2cab02c2549
...
CC [M] drivers/usb/host/u132-hcd.o
drivers/usb/host/u132-hcd.c: In function =E2=80=98u132_suspend=E2=80=99:
drivers/usb/host/u132-hcd.c:3224: error: expected expression before
=E2=80=98int=E2=80=99
drivers/usb/host/u132-hcd.c:3225: error: =E2=80=98ports=E2=80=99 undeclared=
(first use
in this function)
...
Signed-off-by: Mirco Tischler <mt-ml@gmx.de>
---
drivers/usb/host/u132-hcd.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/usb/host/u132-hcd.c b/drivers/usb/host/u132-hcd.c
index 6fca069..58830b2 100644
--- a/drivers/usb/host/u132-hcd.c
+++ b/drivers/usb/host/u132-hcd.c
@@ -3214,6 +3214,7 @@ static int u132_suspend(struct platform_device
*pdev, pm_message_t state)
return -ESHUTDOWN;
} else {
int retval =3D 0;
+ int ports =3D 0;
=20
switch (state.event) {
case PM_EVENT_FREEZE:
@@ -3221,7 +3222,7 @@ static int u132_suspend(struct platform_device
*pdev, pm_message_t state)
break;
case PM_EVENT_SUSPEND:
case PM_EVENT_HIBERNATE:
- ...Hm, why not to do:
---
drivers/usb/host/u132-hcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/drivers/usb/host/u132-hcd.c
===================================================================
--- linux-2.6.orig/drivers/usb/host/u132-hcd.c
+++ linux-2.6/drivers/usb/host/u132-hcd.c
@@ -3214,6 +3214,7 @@ static int u132_suspend(struct platform_
return -ESHUTDOWN;
} else {
int retval = 0;
+ int ports = MAX_U132_PORTS;
switch (state.event) {
case PM_EVENT_FREEZE:
@@ -3221,7 +3222,6 @@ static int u132_suspend(struct platform_
break;
case PM_EVENT_SUSPEND:
case PM_EVENT_HIBERNATE:
- int ports = MAX_U132_PORTS;
while (ports-- > 0) {
port_power(u132, ports, 0);
}
--
Ah, I see it's merged already. Thanks for the fix, btw! :-) --
Seems to work here after basic tests. ACK. (I discovered that -rc2 swsusp will not power down in some cases, but it was here before the patch, too...) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Your system (either your distro suspend/resume scripts or your platform) must be running the video BIOS at resume time, otherwise it would probably come back blank. Jesse --
But I don't think so, unless acpid is doing just that. In my suspend/resume event script, it's just doing a simple s2ram (without options), and exit after resume. Jeff. --
Your s2ram script is doing your STD also? Seems counterintuitive. Anyway, some machines also re-POST the GPU on resume from S3; maybe yours is doing that. Jesse --
It's s2ram to do STR, not STD. Sorry for the confusion. But the key point is there's no GREEN for STR. Mr Green only appear with STD. Thanks, Jeff. --
Ah, ok that makes sense. So typically, what you'd see at suspend time is this ugly call chain: 1) user requests suspend or hibernate 2) kernel kicks users off VT 3) X calls LeaveVT function of X driver 4) X driver restores whatever video state it felt like saving when it started up 5) kernel calls suspend methods 6) machine goes to sleep then on resume: 1) user requests wakeup 2) kernel calls resume methods 3) X takes back the VT, calling driver EnterVT methods 4) X driver EnterVT routine runs, doing whatever it wants 5) you're back to where you were (on a good day anyway) So, on your machine, I suspect your firmware is doing enough that X doesn't have to save/restore full video state around Enter/Leave VT (the same functions called at VT switch time when you press ctl-alt-fx), otherwise you'd be missing things like your backlight or text consoles. So the advantage of the kernel suspend/resume hooks for the DRM layer is that the kernel video drivers can do full state save/restore (which X usually doesn't do, and isn't really designed to do), so that if your platform *doesn't* do it all, you'll still end up with a usable machine in the end. The fact that you'd started running into problems since we merged this just means your platform was taking care of it for you (lucky you) and that we have some bugs in the hibernate code that we're just discovering. Jesse --
That's the main reason for moving to the X series. It's seems to work very well on Linux. Thanks, Jeff. --
Well, I'm also hoping that eventually we could even just not do the VT switch at all, and the kernel can treat X as "just another user process" that it freezes. At least from a mode setting standpoint. We'd still want to make sure that X repaints the screen if the contents were lost, of course. And this is going to depend very intimately on the type of graphics card and whether the video RAM is saved by STR or not - for the Intel integrated graphics kind of situation, the video RAM will be refreshed along with all the other memory, but for other cards we may end up having to do the VT switch not so much for modesetting reasons as just a way to get X to save and restore all the *other* state. How close is the i915 driver from not having to even signal X? Or is that just a pipedream of mine? Linus --
Drivers supporting kernel modesetting will have to stuff their VRAM somewhere, It's there in the modesetting tree (though the requisite changes to avoid VT notification aren't done, it should all work fine). Jesse --
The s2ram command has a built-in whitelist used to set up video rePOSTing. If you want to test, reboot and do echo mem >/sys/power/state without i915 being loaded. If you get a console back on resume then the platform is reinitialising video for you, but otherwise it's your userspace. -- Matthew Garrett | mjg59@srcf.ucam.org --
btw., why isnt there an in-kernel whitelist, with perhaps a dynamic, convenient /debug/s2r/whitelist append-API for distros (and testers) to add more entries to the whitelist/blacklist? (for cases where the kernel whitelist has not caught up yet) Which would eventually converge to Utopia: s2ram that just works out of box. This would be a lot more flexible (people could even temporarily extend the whitelist via rc.local if need to be, etc.), a lot more robust and a lot more user friendly than the "dont use /sys/power/state, rely on some user-space tool to work around bugs" approach. Really, i couldnt make the s2ram API/quirks situation much worse even if i deliberately tried to design the whole code to be as hard to use and as confusing as possible :-/ These types of half-kernel half-userspace solutions usually result in constant finger-pointing and constant lag, and they result in about the crappiest user experience that is possible to achieve physically. ( Sorry about the strong words, while there's lots of good and positive development lately i havent seen much change in this particular area of s2ram in the past 1-2 years, and the whole chain is only as strong as the weakest link - so someone finally has to deliver this message to the cozy fire of s2r hackers while our testers and users are standing out in the cold rain ... ) Ingo --
The problem with the whitelists is that they have to use quite a lot of data to reliably match the system. The s2ram whitelist is not 100% reliable, because it uses too little information to distinguish different versions of the same machine model, for example. Plus, in an ideal world, we should be able to match all possible working graphics/chipset/BIOS combinations and that would be quite a bit of a database. Also, there are some quirks that need to be run from the user land, AFAICS (eg. in an i86 real-mode compatible manner). IMO, whitelisting is not a solution. It's only a sort-of-working workaround and as such it shouldn't be put into the kernel. Thanks, Rafael --
The big problem with that is - the people who know about the devices are usually not kernel people - the workarounds that the whitelist requires is quite often not a kernel workaround. In other words, the most common workarounds for the s2ram whitelist is usually to do things like running vbetool in user-level to do VGA register save/restore (VBE_POST and VGE_SAVE). Sure, the kernel could do that with usermodehelper etc, but s2ram also has those things as command line flags etc, so... Linus --
Because all of these video quirks are just workarounds for the fact that the kernel doesn't work properly. In general, you really don't want to call a real-mode video bios from the kernel, so punting it to userspace (and leaving the whitelisting there) is somewhat more straightforward. In addition, we can then extend the whitelist without requiring kernel We've got i915 suspend/resume now, which already fixes this for a large number of users. Recent ATI is easy, now that we actually have specs for ATOM. The nouveau guys are almost at the point where we can do it for nvidia. That basically just leaves VIA. The other s2r issues are pretty much just driver bugs at this point. -- Matthew Garrett | mjg59@srcf.ucam.org --
Linus, guess I missed this part ... so before touch anything, I did tried suspend-to-ram, and it works on console and in X. And suspend-to-disk hangs, but I can still press and hold the power button to power it off. Then upon powering on and resume, I get the ugly green "console" screen. I can still type and move around. Starting X runs fine. Ctrl-Alt-Del or switching back to console will get back to the green screen. Thanks, Jeff. --
The "press and hold for five seconds" is actually a hardware feature of the southbridge (well, I guess there is "software" in there too, but it's the embedded kind). So the fact that it powers off at that point means nothing, it just means that ok, your kernel is hung, but the hardware still works ;) This *sounds* like some part of the suspend-to-disk sequence is doing something stupid like trying to access the screen after it has been turned off, which doesn't surprise me at all. My oft-stated opinion has been that suspend-to-disk isn't a suspend at all, and should never have been confused with "suspending" anything. It's "snapshot-and-restore", and my opinion is that: - it should *never* call "suspend()"/"resume()" at all (that should be reserved purely for suspend-to-RAM and has real power management issues!) - it should have a totally separate "halt/unhalt/restore" thing that has nothing what-so-ever to do with power management, and is purely about stopping the hardware for things like USB and network cards (which otherwise do things like scan their command lists asynchronously) and making sure that the driver state is consistent with that stopped hw state. - the people who confuse snapshot/restore with suspend/resume are horrible people that cause problems exactly because driver people then get those things mixed up, and something like the video suspend/resume should probably never have impacted suspend-to-disk in the first place! .. so this implies that while the laptop apparently hung at the end of the snapshotting, the snapshotting did actually work, and it must have hung at the very end, presumably when it tried to actually turn the power off. So there seems to be two (probably largely independent) problems: - the hang at shutdown that requires you to press-and-hold the power button to actually cut power. At a guess: putting the VGA device into D3hot makes the ACPI code that ...
Here's an interesting discovery. After I found that "echo reboot > /sys/power/disk" does reboot, I tried "echo shutdown > /sys/power/disk", it does shutdown properly. With "platform" it refuses to shutdown. Both reboot and shutdown still Yes. Thanks, Jeff.
That kind of suggests that the ACPI platform code is hitting the hardware directly - we've seen similar issues with PATA controllers. The right thing to do here is almost certainly just to avoid explicitly powering down hardware on hibernation. -- Matthew Garrett | mjg59@srcf.ucam.org --
Totally agreed. I remember when I started getting hibernation bug reports against this new code and boggling at how hibernate was actually done. The driver actually gets its ->suspend routine called twice with two different pm_message_t values. We tried to do different stuff depending on the pm_message_t (like only putting the device in D3hot if PM_EVENT_SUSPEND), but Sounds like a good theory... now if we could just use set_power_state in the suspend case only. That's what the latest code *tries* to do... JEsse --
Hmm, entering S4 seems like good place to call suspend() for... unless you want separate freeze()/unfreeze(), suspend()/resume(), suspend_s4() and halt() callbacks. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
They're for saving and restoring GPU state across suspend/resume. They're particularly useful if your machine doesn't re-POST at resume time. In that case your GPU may be totally uninitialized, so either the kernel or X has to I know I fixed that problem in at least one configuration... Can you try: # echo test > /sys/power/disk # echo disk > /sys/power/state and see if that also turns your screen green? Also, getting a GPU register dump would be helpful. The intel_reg_dumper tool is built as part of the xf86-video-driver build (git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel), can you pull that down and try it out? Thanks, Jesse --
Yes, still green. But I got it to actual reboot with ... echo reboot > /sys/power/disk Attached are the two dumps from console. One prior to suspend, and one after resume. Thanks, Jeff.
Looks like the AR registers are hosed, which is what I thought I fixed... Can you attach your i915_drv.c file just so I can sanity check it? Thanks, Jesse --
Ok, so Linus' theory about something later in the resume path trying to touch video is looking good. Hm, looks right. Let me see if I can reproduce this on my T61. Thanks, Jesse --
Given the way the PM core works, do we need to set a flag like this? I really
hope there's a better way of doing this...
Thanks,
Jesse
diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c
index 4048f39..a2d6242 100644
--- a/drivers/char/drm/i915_drv.c
+++ b/drivers/char/drm/i915_drv.c
@@ -238,6 +238,13 @@ static void i915_restore_vga(struct drm_device *dev)
}
+/*
+ * If we're doing a suspend to disk, we don't want to power off the device.
+ * Unfortunately, the PM core doesn't tell us if we're headed for a regular
+ * S3 state or that it's about to shut down the machine, so we use this flag.
+ */
+static int i915_hibernate;
+
static int i915_suspend(struct drm_device *dev, pm_message_t state)
{
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -252,6 +259,9 @@ static int i915_suspend(struct drm_device *dev,
pm_message_t state)
if (state.event == PM_EVENT_PRETHAW)
return 0;
+ if (state.event == PM_EVENT_FREEZE)
+ i915_hibernate = 1;
+
pci_save_state(dev->pdev);
pci_read_config_byte(dev->pdev, LBB, &dev_priv->saveLBB);
@@ -366,7 +376,7 @@ static int i915_suspend(struct drm_device *dev,
pm_message_t state)
i915_save_vga(dev);
- if (state.event == PM_EVENT_SUSPEND) {
+ if (!i915_hibernate) {
/* Shut down the device */
pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);
@@ -385,6 +395,8 @@ static int i915_resume(struct drm_device *dev)
if (pci_enable_device(dev->pdev))
return -1;
+ i915_hibernate = 0;
+
pci_write_config_byte(dev->pdev, LBB, dev_priv->saveLBB);
/* Pipe & plane A info */
--
Then, the .resume() called after the image creation will clear the flag and I don't think it's safe to allow it to survive i915_resume() ... Thanks, Rafael --
Yeah. By *not* using "->suspend()" for freezing or hibernate. Please, Rafael - just make the f*cking suspend-to-disk use other routines already. 99% of all hardware needs to do exactly *nothing* on suspend-to-disk, and the ones that really do need things tend to need to not do a whole lot. For example, the "freeze" action for USB (which is one of the hardest things to suspend) should literally be something like just setting the controller STOP bit, and waiting for it to have stopped. The "unfreeze" should be to just clear the stop bit, while the "restart" should be just a controller reset to use the current memory image. NONE OF THIS HAS ABSOLUTELY ANYTHING TO DO WITH SUSPEND. It never did. I've told people so for years. Maybe actually seeing the problems will make people realize. So please, we shouldn't call "->suspend[_late]" or "->resume[_early]" at all. Not with PMSG_FREEZE, not with PMSG_*anything*. Can we please get this fixed some day? Linus --
In talking with Rafael on IRC about this, I think we're agreed that we need separate entry points. Even with a kexec based hibernate, we'll probably want ->hibernate callbacks so we don't end up shutting down the device. The current callback system looks like this (according to Rafael and the last time I looked): ->suspend(PMSG_FREEZE) ->resume() ->suspend(PMSG_SUSPEND) *enter S3 or power off* ->resume() The fact that we get suspend/resume called once before suspend again in the hibernate case is somewhat obnoxious, but it's even worse that we don't know what we're about to enter after ->suspend(PMSG_SUSPEND). So in the short term it would be nice to at least get the target state exported. And in the long term we could have: ->suspend() *enter S3* ->resume() or: ->hibernate() *kexec to another kernel to save image* *power off* ->return_from_hibernate() (or somesuch) Jesse --
Yes, it's very messy. It's messy for a few different reasons: - the one you hit: a driver actually has a really hard time telling what PMSG_SUSPEND really means. - more importantly, we generally don't want to "suspend/resume" the hardware at all around a power-off, because we're going to resume with the state at the time of the PMSG_FREEZE, which means that the hardware has actually *changed* and been used in between! that second case is very fundamental for things like USB devices, which in theory you can hold alive over a real suspend event (ie a STR event), but which absolutely MUST NOT be resumed over a suspend-to-disk event, because all the low-level request state is bogus! So the "->resume" really isn't a resume at all. It's much closer to a "->reset". Of course, the "solution" to this all right now is that we have to reset everything even if it *is* a suspend event, so it basically means that STR ends up using the much weaker model that snapshot-to-disk uses. The fundamental problem being that the two really have nothing Yes, apart from all the complexities (suspend_late/resume_early). So in reality it's more than that, but the suspend/resume things are clearly nesting, and they have the potential to actually keep state around (because we *know* this machine is not going to mess with the devices in between). IOW, here we actually can have as an option "assume the device is there Enough people don't trust kexec that I suspect the right thing simply is ->freeze() // stop dma, synchronize device state *snapshot* ->unfreeze(); // resume dma *save image* [ optionally ->poweroff() ] // do we really care? I'd say no *power off* ->restore() // reset device to the frozen one which may have four entry-points that can be illogically mapped to the suspend/resume ones like we do now, but they really have nothing to do with suspending/resuming. And notice how while "freeze/restore" kind of pairs like a ...
Well, it seems like we'll have to fix drivers in either case, and isn't a kexec approach fundamentally more sound and simple, design-wise? Rafael pointed out some problems with properly setting wakeup states, but I think Yeah, definitely. It has to be much more robust and deal with configuration changes, etc. (within reason). Jesse --
Absolutely. Two big reasons: - debuggability I know we don't do this correctly right now, but I want to be able to at least feel like we can some day actually do printk's etc through 99% of the suspend/resume cycle. It's a *huge* thing for debugging problems that happen in the wild, and one of the biggest issues is that we currently usualyl just get a "the machine died" message when suspend or resume doesn't work. Yes, doing printk's to the Intel management flash stuff can help a lot here, and I want that too, but I'd really like to shut down consoles individually rather than having the "big hammer" approach that shuts them up entirely over the whole suspend/resume sequence (or not at all, if you use "no_console_suspend"). And I'd *really* like to do things like VGA-console shutdown in the late phase (and resume early). - it's actually likely *much* simpler for some devices. Simple devices (and that includes things like PCI bridges etc, but also potentially USB host controllers etc) are things that can often be trivially suspended - all the complexity is really not in the controller itself, but beyond, in the bus that it actually drives. And the late-suspend/early-resume means that you don't have to worry about things like interrupts happening while you're suspended. Yes, putting the device into D3 will disable interrupts from that device too (unless there are bugs), *BUT* you may be sharing an interrupt line, and interrupts may be posted and delayed, so an earlier interrupt may well be pending etc. suspending late and resuming early just avoids those issues entirely. Sometimes these things interact. For example, firewire is certainly not trivial to suspend as a "subsystem" thing (ie all the devices behind the firewire bridge need to do magic things, like spinning down etc that obviously can not happen in the final "late" phase), but the firewire ...
I've been watching for kexec hibernate for a little while now, and the last I saw was that acpi was incompatible with the kexec hibernate (but the suspend folks were still claiming that devices needed to be put in the 'right mode' not just powered off. I've been waiting to see this resolved. David Lang --
david@lang.hm wrote: .. Yeah, exactly. What's so special about poweroff on hibernation? Why even bother with the special "S4" state there? I want a real full poweroff, or at least I think I do. Why wouldn't I? ???? --
(1) To be able to wake up with the help of devices that can't wake
the system up from S5 (power off)
You may want that, some people may not want it.
We are supposed to handle S4, the BIOS/platform may expect us to do that, so
IMO this is a good enough reason to do it. Especially that we can.
Thanks,
Rafael
--
.. That's the theory. I've read about it, but have yet to imagine any real-life situation where it applies. But this isn't my speciality, so.. do you have experience with any real examples? --
Yup. The fan in my notebook behaves incorrectly after a resume from hibernation if S5 is entered instead of S4 during it. I don't know why exactly it happens, but that's how it goes. Also, some machines are reported to behave incorrectly after a "shutdown" mode hibernation, while the same machines work just fine after a "platform" mode hibernation. So at least for these machines it seems to matter. Thanks, Rafael --
so if you power off your laptop the fan doesn't work when you turn it back given that we don't have a pure "shutdown" option available to try I don't see how this can be said to have been tested. currently any attempts to do a shutdown type hibernate are tangled in the other code that is there for the suspend modes. this makes it _very_ hard to say that the hardware requires something as opposed to the strong possibility that the software is doing something wrong. there are also a _lot_ of people who are not able to reliably use the existing "platform" mode hibernation, so it's not a fair statement to say that it's the 'right' thing to do. If you want to make it an option, fine. But please give those of us who don't care about these other wakeup options, and who want to be able to use other OS's while linux is stopped an option as well. David Lang --
There is such an option. Put # echo shutdown > /sys/power/disk into the init scripts and it will do the trick. Thanks, Rafael --
In fact the driver can find out in which state to put the device into, We do, if there are devices that wake us up from S4 and don't wake us up from S5, for example. Plus this f*cking fan in my box that doesn't work after the Agreed. Thanks, Rafael --
And by "right low power states" you mean "wrong low-power states", right? The thing is, they really *are* the wrong states for 99% of all hardware. If you really have a piece of hardware that you want to have the "->poweroff()" thing do the same as "->suspend()", then hey, just use the same function (or better yet, use two different functions with a call to a shared part). Because IT IS NOT TRUE that ->suspend() puts the devices in the "right power state". The power states are likely to be totally different for S3 and for poweroff, and they are going to differ in different ways depending on the device type. One example would be the one that started this version of the whole discussion (shock horror! We're on subject!) ie when you do a system shutdown, you generally do not even *want* to put individual devices into low-power states at all, because the actual "power off the system" thing will take care of it for you much better. So to take just something as simple as VGA as an example: you really do not want to suspend that device, because you want to see the poweroff messages until the very end. So that final device ->poweroff function really has absolutely *nothing* in common with the device ->suspend[_late] functions, simply because almost any sane driver would decide to do different things. Of course, we can continue to do the insane thing and just continue to use inappropriate and misleadign function callback names, and then encodign what the *real* action should be in the argument and/or in magic system-wide state parameters. So in that sense, it's certainly totally the same thing whether we call it ->shutdown or ->poweroff or ->eat_a_banana, since you could always just look at the argument and other clues, and decide that *this* time, for *this* kind of device, the "eat a banana" callback actually means that we should power it off, but wouldn't it be a lot more logical to just make it clear in the first place that they aren't ...
In fact we have acpi_pci_choose_state() that tells the driver which power state to put the device into in ->suspend(). If that is used, the device ends No. Again, if there are devices that wake us up from S4, but not from S5, they need to be handled differently in the *enter S4* case (hibernation) and Yes, it would. Still, the common thing is, it (ie. ->poweroff) _may_ want to To clarify, I agree that we should use different callbacks for hibernation. Well, I agree with that. As I said before, that's mainly because I've been busy with other stuff recently. Now, with the Alex's help, I'm hoping to take care of it soon. Thanks, Rafael --
First off, nobody should *ever* use that directly anyway. Secondly, the one that people should use ("pci_choose_state()") doesn't actually do what you claim it does. It does all kinds of wrong things, and And again, what does this have to do with (the example I used) the graphics hardware? Answer: nothing. The example I gave you we simply DO THE WRONG THING FOR. Same thing for things like USB devices - where pci_choose_state() doesn't work to begin with. Why do we call "suspend()" on such a thing when we don't want to suspend it? We shouldn't. We should call "freeze/unfreeze" (which are no-ops) and then finally perhaps "poweroff", and that final stage might want to spin things down or similar. But *none* of it has anything to do with suspend, and none of it has anything to do with pci_choose_state() (much less acpi_pci_choose_state) The fact is, we should let the driver decide, and we should make it clear to the driver writer what he is deciding about - rather than basically lie and say "suspend the device and put it into D3" even when that's the last thing it should ever do. Linus --
Well, if platform_pci_choose_state() is defined, pci_choose_state() returns its result and on ACPI systems that points to acpi_pci_choose_state(), so in I'm already convinced, really. :-) Thanks, Rafael --
Did you check closer? I repeat: acpi_pci_choose_state() (when called from pci_choose_state()) doesn't even look at the target 'state'. It just blindly assumes that you want the deepest sleep-state you can have. Which happens to be correct for normal suspend, but means that if you want to test other states (through '/sys/devices/.../power'), that sounds broken. I didn't check any closer, but go check it yourself. The short and sweet: acpi_pci_choose_state() totally ignores its 'state' argument. Do you really think that's correct? But yes, "pci_choose_state()' effectively does that too, apart from PM_EVENT_ON, which is never used. (But the whole and only point of pci_choose_state() was to do the PM_EVENT_FREEZE thing differently, which it doesn't do, so I think the real issue here is that the interface is really rather mis-designed) I suspect most people who ever really looked and worked on this code had a specific device in mind, and I'm sure that all of the code individually always ends up making sense from the standpoint of some specific device driver. It's just that it never seems to make sense from a bigger issues standpoint, and often seems senseless from the standpoint of other devices of other types. Linus --
acpi_pm_device_sleep_state() (that is called by acpi_pci_choose_state()) takes the target state directly from the ACPI layer. We just want to get rid of the argument passed to ->suspend() eventually, but there may be many _suspend_ states available (eg. "mem" and "standby") and for each of them there may be different constraints on the device's state. We have to tell the driver which device states are possible in the target system sleep state. Right now we arbitrarily choose the one with the lowest power This interface is not available any more (ie. there's only "wakeup" in You're wrong, sorry. With PM_EVENT_FREEZE it wouldn't even be necessary. It's there, because potentially there are many possibilities with PM_EVENT_SUSPEND and in fact it shouldn't even be used with PM_EVENT_FREEZE. All of this is more or less orthogonal to the issue at hand, which boils down to the fact that we use the _suspend_ callbacks for hibernation and we shouldn't be doing that. Thanks, Rafael --
.. Something I've never understood, is why we would ever want to bother with *S4* at all? I actually like hibernation (great for travelling), but I treat it as if it were a complete power-off (S5?). I pull batteries, unplug drives, boot other operating systems, etc.. And when I put it all back together again with the Linux disk inserted, I fully expect it to "resume" from the hibernation of 3 months ago. And it does. Why would I ever want anything less than a full poweroff for hibernation ???? Thanks. --
Hi. No. AFAICS, kexec is going to be more complex and ugly in many ways. To summarise, a kexec based hibernation is going to need the following additional requirements to just replace what we already have: - get the original kernel to allocate storage while racing against the rest of the system (currently allocation is done post-atomic copy & post-freezing - no racing). This makes it potentially slower, too; - get the original kernel to transfer the information about what swap was allocated to the kexec'd kernel, probably together with a lot of other information (which pages are nosave etc). - get the original kernel to keep memory free for the kexec'd kernel which would otherwise be usable. Not a biggy on desktops or laptops, but think about embedded. - people keep talking about hibernating to an ext3 fs mounted on fuse as a limitation of the freezer. To do that with kexec, you're still going to have to bmap the ext3 fs and pass the block list (in which case we can also do it without kexec) or umount all the ext3/fuse part and remount in the kexec'd kernel. Sort of defeats the purpose, doesn't it? I also wonder about how much of a pain it's going to be setting up userspace for this kexec'd kernel. Will you need a separate partition just for it? If not, will the userspace be loaded into memory all the time (more memory wasted for normal use), or loaded from ordinary partitions at kexec time (how to do safely? - more info to transfer between kernels?). I'd love it if kexec really was the panacea to the freezer issues, but problems like these make me think it isn't a viable solution. Nigel --
No, with a freezer-based model you can basically *never* suspend to anything related to FUSE or a userspace USB device or anything involving userspace iSCSI initiators or whatever. Sure, there are cases where moving away from the current model doesn't buy you anything, but that doesn't mean that the current model is a good thing. It's not. The You're looking at a tiny amount of memory when compared to current systems. It's really not a problem. -- Matthew Garrett | mjg59@srcf.ucam.org --
Hi. Putting drivers and filesystems in userspace is the fundamentally broken concept. Not just when it comes to the freezer. The whole idea is inherently racy. You can draw silly diagrams about how the freezer supposedly works in LCA slides and spread FUD as much as you like. In the end, though, it's not nearly as hit-and-miss as you say, and replacing the freezer with a kexec based freezer is only going to create Please, quantify 'tiny'. In embedded, 5MB can be too much. I've worked on embedded solutions. I'm not pulling problems out of thin air. Regards, Nigel --
Racy with regards to other things becides trying to suspend a machine? If so, what? thanks, greg k-h --
Hi. That depends on what sort of tangled web you want to weave. Low memory situations is one other situation that occurs to me quickly, especially (though not only) if your ability to swap were to depend upon a userspace driver and/or filesystem. Regards, Nigel --
Lots of them :) We have tanks running Linux using userspace USB drivers for vision control systems (scary, I know...) They seem to be successfully running for many years now, and I'm interested in making sure those kinds of things keep working. We also have laser welding robots with userspace PCI drivers in car manufacturing plants. And other laser cutting robots slicing wood in patterns moving at a rate of over 3 meters a second. Again, with userspace drivers and Linux. Those users would also love to know of any potential problems you know Sure, swap over a userspace filesystem or driver isn't a sane idea. And neither is swaping over NFS over a PPP connection attached to a USB to serial device. Yes, it's possible, and all in the kernel, but not a wise decision. Other than foolish configurations, if you come up with other issues surrounding userspace drivers that could cause problems, please let me know. thanks, greg k-h --
Hi Greg. A simple OOM condition isn't an issue? Surely a driver stalling because some of its memory gets swapped out just before it goes to use it would be a problem if it resulted in getting the length of a cut wrong or caused some distorted vision or a late turn :> Am I missing something? Maybe these drivers mlock memory to avoid those issues or something like that? Regards, Nigel --
I think the mlock their memory to prevent this from happening, it's not hard when you control all the applications on the box :) thanks, greg k-h --
I'm really not interested in debating the matter. There are all sorts of potential uses for the freezer, but hibernation isn't one of them. We *need* to get rid of the freezer for suspend to RAM (because a band-aid to ensure atomicity is kind of pointless when the operation you're entering is inherently atomic), and once all the drivers are able to deal with that then it's trivial to get rid of it for hibernation as well. Arguing that the reality of userspace drivers is broken doesn't Then the in-kernel solution has already lost anyway, and I'm desperately unconcerned about out of tree stuff. -- Matthew Garrett | mjg59@srcf.ucam.org --
Hi. Re suspend to ram, I agree. No argument there. Re hibernation, I think your assertion that it will be trivial to get rid of it for hibernation is just plain wrong. Perhaps you don't understand the issues as well as you think you do. Re arguing that the reality of userspace drivers is broken doesn't help here: Yeah, I know. But sometimes if you point out broken ideas for long enough, people do actually listen. Or you learn. Or both. Frankly, I don't want to debate the issue either. What I really want is just to have a hibernation implementation that works, is flexibile, reliable and quick, and one that I don't have to keep maintaining. Unfortunately for me, most people seem to be more concerned with fixing hypothetical problems than with giving users something they can actually I know. I'd submit it, or work on breaking it into pieces and submitting them one at a time, but that seems to me to be a waste of time. Nigel --
Okay, I think I'll just start sending patches for that, but rather not earlier than in the 2.6.27 time frame. No one else works on that and I've been busy with other things recently. Besides, I'm not even a full time kernel Yes, we can (hopefully). Thanks, Rafael --
Rafael, If I can help, please say so. Regards, --
On Wednesday 20 February 2008 at 3:29 pm, Linus Torvalds penned I can't say I even come close to understand what's going on but getting s2ram to work on my Dell M4300 has been a nightmare. Even after writing up how to get it to work (posted on the suspend-devel list - but no one answered .. yet again), I'm having some quirks. If I had a bizillion $'s, I'd buy an M4300 for Linus and give him a million to get it to s2ram! :p Cheers, -- Pablo Sanchez - Blueoak Database Engineering, Inc Ph: 819.459.1926 Toll free: 888.459.1926 Fax: 603.720.7723 (US) Text Page: pablo_p@blueoakdb.com --
Ok, can you give this patch a try with the 'platform' method? It should at
least tell us what ACPI would like the device to do at suspend time, but it
probably won't fix the hang.
Thanks,
Jesse
diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c
index 4048f39..d8aa2c9 100644
--- a/drivers/char/drm/i915_drv.c
+++ b/drivers/char/drm/i915_drv.c
@@ -366,11 +366,11 @@ static int i915_suspend(struct drm_device *dev,
pm_message_t state)
i915_save_vga(dev);
- if (state.event == PM_EVENT_SUSPEND) {
- /* Shut down the device */
- pci_disable_device(dev->pdev);
- pci_set_power_state(dev->pdev, PCI_D3hot);
- }
+ /* Ask ACPI which state the device should be put in */
+ pci_disable_device(dev->pdev);
+ printk("calling pci_set_power_state with %d\n",
+ acpi_pci_choose_state(dev, state));
+ pci_set_power_state(dev->pdev, acpi_pci_choose_state(dev, state));
return 0;
}
@@ -380,7 +380,7 @@ static int i915_resume(struct drm_device *dev)
struct drm_i915_private *dev_priv = dev->dev_private;
int i;
- pci_set_power_state(dev->pdev, PCI_D0);
+ pci_set_power_state(dev->pdev, acpi_pci_choose_state(dev, state));
pci_restore_state(dev->pdev);
if (pci_enable_device(dev->pdev))
return -1;
--
I can't get it to compile. drivers/char/drm/i915_drv.c: In function 'i915_suspend': drivers/char/drm/i915_drv.c:372: error: implicit declaration of function 'acpi_pci_choose_state' drivers/char/drm/i915_drv.c: In function 'i915_resume': drivers/char/drm/i915_drv.c:383: error: 'state' undeclared (first use in this function) drivers/char/drm/i915_drv.c:383: error: (Each undeclared identifier is reported only once drivers/char/drm/i915_drv.c:383: error: for each function it appears in.) make[3]: *** [drivers/char/drm/i915_drv.o] Error 1 make[2]: *** [drivers/char/drm] Error 2 make[1]: *** [drivers/char] Error 2 make: *** [drivers] Error 2 Thanks, Jeff. --
And this change should just be reverted (leave it as PCI_D0). Thanks, Jesse --
drivers/char/drm/i915_drv.c: In function 'i915_suspend': drivers/char/drm/i915_drv.c:372: warning: passing argument 1 of 'pci_choose_state' from incompatible pointer type drivers/char/drm/i915_drv.c:373: warning: passing argument 1 of 'pci_choose_state' from incompatible pointer type I hope those are just warning that can just be ignored. Ok, rebooting and will get back shortly. Thanks, Jeff. --
Oops again, should be dev->pdev. Silly DRM layer obfuscation. Jesse --
I was just about to write that the test didn't work. Both std str hangs even before attempting to suspend. Anyway, I'm compiling and rebooting now. Thanks, Jeff. --
It says "calling pci_set_power_state with 3". Then after all then it still hangs, and then resume with Mr Green. PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.00 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done. PM: Shrinking memory... ^H-^Hdone (0 pages freed) PM: Freed 0 kbytes in 0.20 seconds (0.00 MB/s) ACPI: Preparing to enter system sleep state S4 Suspending console(s) sd 0:0:0:0: [sda] Synchronizing SCSI cache drm_sysfs_suspend ACPI: PCI interrupt for device 0000:00:02.0 disabled calling pci_set_power_state with 3 ACPI: PCI interrupt for device 0000:00:1d.7 disabled ACPI: PCI interrupt for device 0000:00:1d.3 disabled ACPI: PCI interrupt for device 0000:00:1d.2 disabled ACPI: PCI interrupt for device 0000:00:1d.1 disabled ACPI: PCI interrupt for device 0000:00:1d.0 disabled ACPI: PCI interrupt for device 0000:00:1b.0 disabled Disabling non-boot CPUs ... PM: Creating hibernation image: PM: Need to copy 25136 pages tick-braodcast: ignoring broadcast for offline CPU #1 PM: Writing back config space on device 0000:00:02.0 at offset 1 (was 900007, writing 900003) ACPI: PCI Interrupt 0000:00:1b.0[B] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1b.0 to 64 PCI: Setting latency timer of device 0000:00:1c.0 to 64 PCI: Setting latency timer of device 0000:00:1c.1 to 64 ... Thanks, Jeff. --
So it returns the right value. Jeff, Jesse, please check one thing for me. Please boot 2.6.25-rc2 (or better, the current head of the Linus' tree) with no_console_suspend and try to do the following: # echo 8 > /proc/sys/kernel/printk # echo core > /sys/power/pm_test # echo disk > /sys/power/state (that will run a test of the freeze/unfreeze code without creating the image) and then # echo mem > /sys/power/state (that will run a test of the suspend/resume code without actually suspending). I'd like to know if that works. Thanks, Rafael --
That comes back for me, without creating the green screen. There's a long delay between it saying "entering S4" and actually resuming back to my This also works (after doing the echo disk > ...) above. There's still a delay between "entering S3" and the resume to my console though. Jesse --
There's an intentional 5 sec. wait. If the delay is longer that 5 sec., that's a
If that's 5 sec., it's fine.
Please apply the appended patch and try to hibernate. I wonder if you get the
reboot or it hangs earlier.
Thanks,
Rafael
---
kernel/power/disk.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -405,11 +405,7 @@ int hibernation_platform_enter(void)
local_irq_disable();
error = device_power_down(PMSG_SUSPEND);
- if (!error) {
- hibernation_ops->enter();
- /* We should never get here */
- while (1);
- }
+ mdelay(1000);
local_irq_enable();
/*
@@ -424,6 +420,7 @@ int hibernation_platform_enter(void)
resume_console();
Close:
hibernation_ops->end();
+ kernel_restart(NULL);
return error;
}
--
Below is a patch that should work around the issue. Please try it and let me
know if it helps.
Thanks,
Rafael
---
drivers/char/drm/i915_drv.c | 3 +++
include/linux/suspend.h | 2 ++
kernel/power/disk.c | 9 ++++++++-
3 files changed, 13 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/suspend.h
===================================================================
--- linux-2.6.orig/include/linux/suspend.h
+++ linux-2.6/include/linux/suspend.h
@@ -209,6 +209,7 @@ extern unsigned long get_safe_page(gfp_t
extern void hibernation_set_ops(struct platform_hibernation_ops *ops);
extern int hibernate(void);
+extern bool in_hibernation_power_off(void);
#else /* CONFIG_HIBERNATION */
static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
static inline void swsusp_set_page_free(struct page *p) {}
@@ -216,6 +217,7 @@ static inline void swsusp_unset_page_fre
static inline void hibernation_set_ops(struct platform_hibernation_ops *ops) {}
static inline int hibernate(void) { return -ENOSYS; }
+static inline bool in_hibernation_power_off(void) { return false; }
#endif /* CONFIG_HIBERNATION */
#ifdef CONFIG_PM_SLEEP
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -24,7 +24,7 @@
#include "power.h"
-
+static bool entering_sleep_state;
static int noresume = 0;
static char resume_file[256] = CONFIG_PM_STD_PARTITION;
dev_t swsusp_resume_device;
@@ -381,6 +381,7 @@ int hibernation_platform_enter(void)
if (!hibernation_ops)
return -ENOSYS;
+ entering_sleep_state = true;
/*
* We have cancelled the power transition by running
* hibernation_ops->finish() before saving the image, so we should let
@@ -412,6 +413,7 @@ int hibernation_platform_enter(void)
}
local_irq_enable();
+ entering_sleep_state = false;
/*
* We don't need to reenable the ...I ended up applying the below patch instead, so it would build, and
unfortunately it still hung at suspend time.
So at this point, the known workarounds to the hang at suspend time are to
remove the device power down call or to boot with 'no_console_suspend'.
The 'screen turns green' problem is fixed by the extra 'inb' added in the
patch below (at least for me).
Jesse
diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c
index 35758a6..35b5a60 100644
--- a/drivers/char/drm/i915_drv.c
+++ b/drivers/char/drm/i915_drv.c
@@ -27,6 +27,7 @@
*
*/
+#include <linux/suspend.h>
#include "drmP.h"
#include "drm.h"
#include "i915_drm.h"
@@ -222,6 +223,7 @@ static void i915_restore_vga(struct drm_device *dev)
dev_priv->saveGR[0x18]);
/* Attribute controller registers */
+ inb(st01); /* switch back to index mode */
for (i = 0; i < 20; i++)
i915_write_ar(st01, i, dev_priv->saveAR[i], 0);
inb(st01); /* switch back to index mode */
@@ -249,6 +251,9 @@ static int i915_suspend(struct drm_device *dev)
return -ENODEV;
}
+ if (in_hibernation_power_off())
+ return 0;
+
pci_save_state(dev->pdev);
pci_read_config_byte(dev->pdev, LBB, &dev_priv->saveLBB);
@@ -364,7 +369,6 @@ static int i915_suspend(struct drm_device *dev)
i915_save_vga(dev);
/* Shut down the device */
- pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);
return 0;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 1d7d4c5..58d9f67 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -209,6 +209,7 @@ extern unsigned long get_safe_page(gfp_t gfp_mask);
extern void hibernation_set_ops(struct platform_hibernation_ops *ops);
extern int hibernate(void);
+extern bool in_hibernation_power_off(void);
#else /* CONFIG_HIBERNATION */
static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
static inline void swsusp_set_page_free(struct page *p) {}
@@ -216,6 +217,7 ...This thing should make i915_suspend() a noop in the last phase of hibernation,
so if it still only works when you remove the
pci_set_power_state(dev->pdev, PCI_D3hot), then I don't get it.
Can you please try the pach below instead?
Thanks,
Rafael
I ended up applying the below patch instead, so it would build, and
unfortunately it still hung at suspend time.
So at this point, the known workarounds to the hang at suspend time are to
remove the device power down call or to boot with 'no_console_suspend'.
The 'screen turns green' problem is fixed by the extra 'inb' added in the
patch below (at least for me).
Jesse
---
drivers/char/drm/i915_drv.c | 5 +++--
include/linux/suspend.h | 2 ++
kernel/power/disk.c | 10 +++++++++-
3 files changed, 14 insertions(+), 3 deletions(-)
Index: linux-2.6/drivers/char/drm/i915_drv.c
===================================================================
--- linux-2.6.orig/drivers/char/drm/i915_drv.c
+++ linux-2.6/drivers/char/drm/i915_drv.c
@@ -27,6 +27,7 @@
*
*/
+#include <linux/suspend.h>
#include "drmP.h"
#include "drm.h"
#include "i915_drm.h"
@@ -222,6 +223,7 @@ static void i915_restore_vga(struct drm_
dev_priv->saveGR[0x18]);
/* Attribute controller registers */
+ inb(st01); /* switch back to index mode */
for (i = 0; i < 20; i++)
i915_write_ar(st01, i, dev_priv->saveAR[i], 0);
inb(st01); /* switch back to index mode */
@@ -366,9 +368,8 @@ static int i915_suspend(struct drm_devic
i915_save_vga(dev);
- if (state.event == PM_EVENT_SUSPEND) {
+ if (state.event == PM_EVENT_SUSPEND && !in_hibernation_power_off()) {
/* Shut down the device */
- pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);
}
Index: linux-2.6/include/linux/suspend.h
===================================================================
--- linux-2.6.orig/include/linux/suspend.h
+++ linux-2.6/include/linux/suspend.h
@@ -209,6 +209,7 @@ extern ...I encountered the same patching problem, but realized that it was due to earlier patch that you had wanted me to test, so if you revert your patch back to the current git, Rafael's patch will apply and compile cleanly. Thanks, Jeff. --
Jeff, for the hang on suspend problem, I know suspect something else in 2.6.25-rc2 caused that. Can you try the 2.6.25-rc1 version of i915_drv.c (in fact all of drivers/char/drm from 2.6.25-rc1) but in a 2.6.25-rc2 kernel? I ask because 2.6.25-rc1 suspends to disk just fine for me and resumes w/o a green screen, while 2.6.25-rc2 fails to suspend (hangs like you say) and gives me a green screen. Were there other changes in ACPI or the PM core that might have caused this I wonder? Thanks, Jesse --
Looks like 2.6.25-rc1 also had broken suspend (my test was broken). IIRC, Dave and I had it working at LCA using the out of tree DRM modules on 2.6.23.14 or 15... Maybe you could give that a try? Thanks, Jesse --
And just to confirm that, I just tested the current DRM modules against a 2.6.23.15 kernel. It suspends to disk correctly (w/o a hang) and doesn't give me a green screen, so something in 2.6.25 must be causing that (even 2.6.25-rc1 seems to have the problem). Also, this patch against 2.6.25-rc1 seemed to prevent the 'green screen' problem. 2.6.25-rc2 already has part of it... Anyway, let me know how your testing goes. Thanks, Jesse --
In 2.6.23.x there's no second ->suspend() during hibernation, so no wonder. I'll figure out how to work around this issue in the current mainline, but a real fix will only be possible when we have separate callbacks for hibernation. Thanks, Rafael --
In 2.6.23 it's just: ->suspend() ->resume() *S4* ? I ask because we still do the D3hot call in the DRM tree, so the hang should Ok, thanks. Jesse --
->shutdown() (that breaks wake up from S4 with many devices, including but not limited to Thanks, Rafael --
.. Does this machine have more than one CPU core? If so.. Does your kernel have CONFIG_HOTPLUG_CPU=y (if not, enable it). ?? --
