Re: 2.6.35.5: hibernation broken... AGAIN

Previous thread: [PATCH] fbdev: sh_mipi_dsi: Require two I/O resources V2 by Magnus Damm on Wednesday, November 17, 2010 - 2:53 am. (1 message)

Next thread: Re: [PATCH] hwmon: (lm95241) Rewritten without using macros by Davide Rizzo on Wednesday, November 17, 2010 - 4:05 am. (4 messages)
From: Ondrej Zary
Date: Wednesday, November 17, 2010 - 3:39 am

Hello,
the nasty memory-corrupting hibernation bug 
https://bugzilla.kernel.org/show_bug.cgi?id=15753 is back since 2.6.35.5.
2.6.35.4 works fine, 2.6.35.5 crashes after two days.

It seems to be caused by b77c254d8d66e5e9aa81239fedba9f3d568097d9.

-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Wednesday, November 17, 2010 - 1:53 pm

Can you, _please_, always put all of the basic commit info (subject, author,
who signed it off) in bug reports?

In this particular case I don't have a -stable tree handy at the moment and
can't even see what the commit is.

Thanks,
Rafael
--

From: Andrew Morton
Date: Wednesday, November 17, 2010 - 2:04 pm

On Wed, 17 Nov 2010 21:53:52 +0100

You can simply do a google search for the commit ID.  It is


commit b77c254d8d66e5e9aa81239fedba9f3d568097d9
Author: Hugh Dickins <hughd@google.com>
Date:   Thu Sep 9 16:38:09 2010 -0700

    swap: prevent reuse during hibernation
 
--

From: Rafael J. Wysocki
Date: Wednesday, November 17, 2010 - 2:12 pm

So I guess we should Cc Hugh ...
--

From: Ondrej Zary
Date: Wednesday, November 17, 2010 - 2:18 pm

Oops, my bad. I was in hurry and somehow missed that he's not on the Cc list.

-- 
Ondrej Zary
--

From: Hugh Dickins
Date: Wednesday, November 17, 2010 - 4:46 pm

Embarrassing: I suspect that I've been confused, not for the first
time, by the fork-like nature of hibernation and its images.
I wonder if this patch below fixes it, Ondrej?

(And is it kernel swsusp or user swsusp that you're using?  May not
matter at all, but will help us to think more clearly about it,
if the corruption remains after this patch.)

Rafael, do you agree that this patch was actually required even for
your original commit 452aa6999e6703ffbddd7f6ea124d3968915f3e3
mm/pm: force GFP_NOIO during suspend/hibernation and resume?

Or am I still just as confused?  Or if not, are there more forking
places which require a similar patch?

Not signing it off yet,
Hugh

---

 kernel/power/hibernate.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- 2.6.37-rc2/kernel/power/hibernate.c	2010-11-01 13:01:31.000000000 -0700
+++ linux/kernel/power/hibernate.c	2010-11-17 15:23:36.000000000 -0800
@@ -348,16 +348,19 @@ int hibernation_snapshot(int platform_mo
 		goto Recover_platform;
 
 	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
+	/* Control returns here after successful restore,
+	 * and also after creating the image in memory (or failing to do so).
+	 */
 
  Resume_devices:
 	/* We may need to release the preallocated image pages here. */
-	if (error || !in_suspend)
+	if (error || !in_suspend) {
 		swsusp_free();
+		set_gfp_allowed_mask(saved_mask);
+	}
 
 	dpm_resume_end(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-	set_gfp_allowed_mask(saved_mask);
 	resume_console();
  Close:
 	platform_end(platform_mode);
--

From: Ondrej Zary
Date: Saturday, November 20, 2010 - 8:07 am

It seems to work with this patch. 2nd day and no crashes yet.
2.6.35.4 is OK
2.6.35.4 + b77c254d8d66e5e9aa81239fedba9f3d568097d9 is bad




-- 
Ondrej Zary
--

From: Ondrej Zary
Date: Friday, November 26, 2010 - 4:43 am

-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Friday, November 26, 2010 - 4:10 pm

Can you check if the problem is also fixed by the patch below, please?

Rafael

---
 kernel/power/hibernate.c |   36 ++++++++++++++++++++++++++----------
 kernel/power/power.h     |    1 +
 kernel/power/user.c      |    1 +
 3 files changed, 28 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -29,6 +29,21 @@
 #include "power.h"
 
 
+static gfp_t saved_gfp_mask;
+
+static void hibernate_restrict_gfp_mask(void)
+{
+	saved_gfp_mask = clear_gfp_allowed_mask(GFP_IOFS);
+}
+
+void hibernate_restore_gfp_mask(void)
+{
+	if (saved_gfp_mask) {
+		set_gfp_allowed_mask(saved_gfp_mask);
+		saved_gfp_mask = 0;
+	}
+}
+
 static int nocompress = 0;
 static int noresume = 0;
 static char resume_file[256] = CONFIG_PM_STD_PARTITION;
@@ -327,7 +342,6 @@ static int create_image(int platform_mod
 int hibernation_snapshot(int platform_mode)
 {
 	int error;
-	gfp_t saved_mask;
 
 	error = platform_begin(platform_mode);
 	if (error)
@@ -339,7 +353,7 @@ int hibernation_snapshot(int platform_mo
 		goto Close;
 
 	suspend_console();
-	saved_mask = clear_gfp_allowed_mask(GFP_IOFS);
+	hibernate_restrict_gfp_mask();
 	error = dpm_suspend_start(PMSG_FREEZE);
 	if (error)
 		goto Recover_platform;
@@ -348,7 +362,10 @@ int hibernation_snapshot(int platform_mo
 		goto Recover_platform;
 
 	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
+	/*
+	 * Control returns here (1) after the image has been created or the
+	 * image creation has failed and (2) after a successful restore.
+	 */
 
  Resume_devices:
 	/* We may need to release the preallocated image pages here. */
@@ -357,7 +374,10 @@ int hibernation_snapshot(int platform_mo
 
 	dpm_resume_end(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-	set_gfp_allowed_mask(saved_mask);
+
+	if ...
From: Ondrej Zary
Date: Saturday, November 27, 2010 - 7:58 am

The patch does not apply to 2.6.35.4, 2.6.35.5 and also 2.6.36. What version 



-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Saturday, November 27, 2010 - 1:47 pm

The patch was against the current mainline.

The one below was rebased on top of 2.6.36, so please test it with this kernel.

Thanks,
Rafael

---
 kernel/power/hibernate.c |   36 ++++++++++++++++++++++++++----------
 kernel/power/power.h     |    1 +
 kernel/power/user.c      |    2 ++
 3 files changed, 29 insertions(+), 10 deletions(-)

Index: suspend-2.6/kernel/power/hibernate.c
===================================================================
--- suspend-2.6.orig/kernel/power/hibernate.c
+++ suspend-2.6/kernel/power/hibernate.c
@@ -29,6 +29,21 @@
 #include "power.h"
 
 
+static gfp_t saved_gfp_mask;
+
+static void hibernate_restrict_gfp_mask(void)
+{
+	saved_gfp_mask = clear_gfp_allowed_mask(GFP_IOFS);
+}
+
+void hibernate_restore_gfp_mask(void)
+{
+	if (saved_gfp_mask) {
+		set_gfp_allowed_mask(saved_gfp_mask);
+		saved_gfp_mask = 0;
+	}
+}
+
 static int noresume = 0;
 static char resume_file[256] = CONFIG_PM_STD_PARTITION;
 dev_t swsusp_resume_device;
@@ -326,7 +341,6 @@ static int create_image(int platform_mod
 int hibernation_snapshot(int platform_mode)
 {
 	int error;
-	gfp_t saved_mask;
 
 	error = platform_begin(platform_mode);
 	if (error)
@@ -338,7 +352,7 @@ int hibernation_snapshot(int platform_mo
 		goto Close;
 
 	suspend_console();
-	saved_mask = clear_gfp_allowed_mask(GFP_IOFS);
+	hibernate_restrict_gfp_mask();
 	error = dpm_suspend_start(PMSG_FREEZE);
 	if (error)
 		goto Recover_platform;
@@ -347,7 +361,10 @@ int hibernation_snapshot(int platform_mo
 		goto Recover_platform;
 
 	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
+	/*
+	 * Control returns here (1) after the image has been created or the
+	 * image creation has failed and (2) after a successful restore.
+	 */
 
  Resume_devices:
 	/* We may need to release the preallocated image pages here. */
@@ -356,7 +373,10 @@ int hibernation_snapshot(int platform_mo
 
 	dpm_resume_end(in_suspend ?
 		(error ? PMSG_RECOVER : ...
From: Hugh Dickins
Date: Tuesday, November 30, 2010 - 3:16 pm

Sorry for going quiet for so long.

Thanks for testing, Ondrej.  I completely agree with Rafael that my
patch was inadequate: it turned out to be good enough for your useful
feedback on the main path through hibernation, but missed a number
of details.  We do need something like Rafael's patch - thanks.

(And I realize that it's tiresome and frustrating for you to be
trying first that and then this etc; but we had bugs here precisely
because this is a confusing area, so I think it is worth some effort


Trivial point, I suppose, but it bothers me that PM is accumulating
wrappers around wrappers around gfp_allowed_mask.  Looks like
clear_gfp_allowed_mask and set_gfp_allowed_mask (oddly asymmetrical)
were not really what we need.  How about scrapping them, and putting


I'm worried that it's hard to find and maintain the places that need
this restoration - and if we miss one, we won't find out about it for
ages.  It would help a lot if the gfp restoration always accompanies
some other essential stage - thaw_processes() looks to be right,
so could we skip conditionally restoring here if we do it then?

I suggest that in part because I cannot find where you restore in
the case that hibernate()'s swsusp_write() fails - that was the


But this could be left until software_resume()'s thaw_processes()?



Right, here you have one next to thaw_processes().
But the SNAPSHOT ioctls are a nightmarish maze to me,

That might be good safety, but it does strongly suggest that you

Hugh
--

From: Rafael J. Wysocki
Date: Tuesday, November 30, 2010 - 4:07 pm

Good point.  I'm not totally sure if coupling this with thaw_processes() is
the right thing yet, but I guess it is, except for the s2disk-specific things


Well, sorry about that.  A few of them should just be dropped, but they are
lingering so that oldish user space works with the new kernels.


This is because of how s2disk works.  If it fails to allocate enough swap,
it tries again with minimum image size without thawing the other tasks.
Now, because hibernation_snapshot() saves the "original" GFP mask, it has

Thanks,
Rafael
--

From: Rafael J. Wysocki
Date: Tuesday, November 30, 2010 - 5:38 pm

Below is an updated patch in which I tried to address your comments.

I didn't find it very useful to couple pm_restore_gfp_mask() with the thawing
of tasks, but nevertheless I think all of the spots where it's needed are
covered now.

The patch has only been compile-tested for now, so caveat emptor.

Thanks,
Rafael

---
 include/linux/gfp.h      |    4 ++--
 kernel/power/hibernate.c |   22 ++++++++++++----------
 kernel/power/suspend.c   |    5 ++---
 kernel/power/user.c      |    2 ++
 mm/page_alloc.c          |   18 +++++++++++-------
 5 files changed, 29 insertions(+), 22 deletions(-)

Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -327,7 +327,6 @@ static int create_image(int platform_mod
 int hibernation_snapshot(int platform_mode)
 {
 	int error;
-	gfp_t saved_mask;
 
 	error = platform_begin(platform_mode);
 	if (error)
@@ -339,7 +338,7 @@ int hibernation_snapshot(int platform_mo
 		goto Close;
 
 	suspend_console();
-	saved_mask = clear_gfp_allowed_mask(GFP_IOFS);
+	pm_restrict_gfp_mask();
 	error = dpm_suspend_start(PMSG_FREEZE);
 	if (error)
 		goto Recover_platform;
@@ -348,7 +347,10 @@ int hibernation_snapshot(int platform_mo
 		goto Recover_platform;
 
 	error = create_image(platform_mode);
-	/* Control returns here after successful restore */
+	/*
+	 * Control returns here (1) after the image has been created or the
+	 * image creation has failed and (2) after a successful restore.
+	 */
 
  Resume_devices:
 	/* We may need to release the preallocated image pages here. */
@@ -357,7 +359,10 @@ int hibernation_snapshot(int platform_mo
 
 	dpm_resume_end(in_suspend ?
 		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
-	set_gfp_allowed_mask(saved_mask);
+
+	if (error || !in_suspend)
+		pm_restore_gfp_mask();
+
 	resume_console();
  Close:
 	platform_end(platform_mode);
@@ -452,17 ...
From: KAMEZAWA Hiroyuki
Date: Tuesday, November 30, 2010 - 5:53 pm

On Wed, 1 Dec 2010 01:38:53 +0100



	if (atomic_dec_return(&gfp_mask_save_mode_counter))

	if (atomic_inc_return(&gfp_mask_save_mode_counter) > 1)
		WARN_ONCE()

or some ?

Thanks,
-Kame

--

From: Rafael J. Wysocki
Date: Wednesday, December 1, 2010 - 2:25 pm

What exactly would that be useful for?

Rafael
--

From: Rafael J. Wysocki
Date: Wednesday, December 1, 2010 - 3:23 pm

Please note that pm_restore_gfp_mask() can be legitimately called before
pm_restrict_gfp_mask() via the SNAPSHOT_CREATE_IMAGE hibernate ioctl, so the
test you're suggesting wouldn't really work.

Thanks,
Rafael
--

From: KAMEZAWA Hiroyuki
Date: Wednesday, December 1, 2010 - 4:37 pm

On Wed, 1 Dec 2010 23:23:31 +0100

Hm, I just wonder some tests not for breaking gfp_allowed_mask by
 - double call
 - forget to restore
That will be fatal.

I'm not very interested in implementation detail

Thanks,
-Kame



--

From: Rafael J. Wysocki
Date: Wednesday, December 1, 2010 - 5:00 pm

That's easy.  It's sufficient to do WARN_ON(saved_gfp_mask) at the beginning
of pm_restrict_gfp_mask().  This also takes care of the "forget to restore"
case to some extent, because pm_restrict_gfp_mask() will generally be only

This is kind of difficult in general.  You can't detect a missing
pm_restore_gfp_mask() other than by checking if pm_restrict_gfp_mask() is not


Well ...

Thanks,
Rafael
--

From: Hugh Dickins
Date: Tuesday, November 30, 2010 - 6:21 pm

So far as I can see, yes, they are: I'm still a bit worried about
future changes leaving the wrong gfp mask behind by mistake;
but I trust you more than I trust me on that, so you can add my
Acked-by: Hugh Dickins <hughd@google.com> when it comes to
sending in the patch.

Thanks,
--

From: Rafael J. Wysocki
Date: Wednesday, December 1, 2010 - 1:57 pm

Thanks a lot!

Andrew, are the changes in page_alloc.c fine from your standpoint?


--

From: Ondrej Zary
Date: Monday, December 6, 2010 - 2:09 pm

-- 
Ondrej Zary
--

From: Rafael J. Wysocki
Date: Monday, December 6, 2010 - 3:57 pm

Thanks for testing!


--

From: Rafael J. Wysocki
Date: Friday, November 26, 2010 - 1:24 pm

(Sorry for the late response).

Well, not exactly.

First, IMO set_gfp_allowed_mask(saved_mask) should not be called before
dpm_resume_end(), because IO allocations may try to use suspended devices in
theory (although that's not likely due to the swsusp_free() before).  So the
right thing to do appears to be:

if (error || !in_suspend) {
  		swsusp_free();

dpm_resume_end(in_suspend ?
  		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);

if (error || !in_suspend)
  		set_gfp_allowed_mask(saved_mask);

resume_console();

Second, since we don't call set_gfp_allowed_mask(saved_mask) in the
(in_suspend && !error) case, hibernation_platform_enter() also should be
updated (I think we don't need to use set_gfp_allowed_mask() in it at all).

There's one more subtlety.  Namely, the saving of an image by s2disk
may be aborted and in that case we need to restore the original
gfp_allowed_mask too, but I need to look deeper into the code to see how to do
it cleanly.

Thanks,
Rafael
--

Previous thread: [PATCH] fbdev: sh_mipi_dsi: Require two I/O resources V2 by Magnus Damm on Wednesday, November 17, 2010 - 2:53 am. (1 message)

Next thread: Re: [PATCH] hwmon: (lm95241) Rewritten without using macros by Davide Rizzo on Wednesday, November 17, 2010 - 4:05 am. (4 messages)