Re: [PATCH 3/3] PM: Do not destroy/create devices while suspended in cpuid.c

To: pm list <linux-pm@...>
Cc: ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Sunday, December 23, 2007 - 8:55 pm

Hi,

Some device drivers register CPU hotplug notifiers and use them to destroy
device objects when removing the corresponding CPUs and to create these objects
when adding the CPUs back.

Unfortunately, this is not the right thing to do during suspend/hibernation,
since in those cases the CPU hotplug notifiers are called after suspending
devices and before resuming them, so the operations in question are carried
out on the objects representing suspended devices, which shouldn't be
unregistered behind the PM core's back. Although right now it usually doesn't
lead to any practical complications, it will predictably deadlock if
gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.

The solution is to prevent drivers from removing/adding devices from within
CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
in the notifier's action argument. The following three patches modify the
MSR, x86-64 MCE and cpuid drivers along these lines.

Thanks,
Rafael
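
[Illustration] The FROZEN-bit filtering described in the cover letter above
can be sketched in user space roughly as follows. This is only a minimal
sketch, not the patch itself: the numeric values are assumptions modeled on
the notifier constants of that era, and demo_cpu_callback() and
demo_device_present[] are hypothetical stand-ins for a driver's callback and
its per-CPU device objects.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, modeled on the kernel's notifier constants of that era. */
#define CPU_UP_PREPARE		0x0003
#define CPU_UP_CANCELED		0x0004
#define CPU_DEAD		0x0007
#define CPU_TASKS_FROZEN	0x0010	/* set when called during suspend/resume */

#define CPU_UP_PREPARE_FROZEN	(CPU_UP_PREPARE | CPU_TASKS_FROZEN)
#define CPU_UP_CANCELED_FROZEN	(CPU_UP_CANCELED | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN		(CPU_DEAD | CPU_TASKS_FROZEN)

#define NR_DEMO_CPUS 8

/* Hypothetical per-CPU "device object exists" flags. */
bool demo_device_present[NR_DEMO_CPUS];

/* Hypothetical stand-in for a driver's CPU hotplug callback. */
int demo_cpu_callback(unsigned long action, unsigned int cpu)
{
	assert(cpu < NR_DEMO_CPUS);

	switch (action) {
	case CPU_UP_PREPARE:
		demo_device_present[cpu] = true;	/* create device */
		break;
	case CPU_UP_CANCELED:
	case CPU_DEAD:
		demo_device_present[cpu] = false;	/* destroy device */
		break;
	default:
		/*
		 * The *_FROZEN variants carry the extra bit, so they match
		 * none of the cases above and the device is left alone.
		 */
		break;
	}
	return 0;
}
```

With this shape, a CPU_DEAD_FROZEN notification leaves the device object in
place, while a plain CPU_DEAD still destroys it.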

--

To: Rafael J. Wysocki <rjw@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Monday, December 24, 2007 - 11:51 am

Do we need to worry about the possibility that when the system wakes up
from hibernation, the set of usable CPUs might be smaller than it was
beforehand? Is any special handling needed for this, or is it already
accounted for?

Alan Stern

--

To: Alan Stern <stern@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 12:21 pm

Hm, well. The cleanest thing would be to allow the drivers to remove the
device objects on CPU_UP_CANCELED_FROZEN, which means that we weren't able to
bring the CPU up during a resume, but still that will deadlock with
gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch.

Greetings,
Rafael

--

To: Alan Stern <stern@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 3:21 pm

Hmm. In principle, device objects may be destroyed on CPU_UP_CANCELED_FROZEN
without acquiring the device locks, since in fact we know these objects won't
be accessed concurrently at that time (the locks are already held by the PM
core, but the PM core is not going to actually access the devices before the
subsequent resume).

Comments?

Thanks,
Rafael
--

To: Rafael J. Wysocki <rjw@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 11:33 pm

How about delaying the CPU_UP_CANCELED_FROZEN announcements until it's
really safe to send them out? That is, after all devices have been
resumed and the PM core no longer holds any of their locks. (Should
this be before or after tasks leave the freezer? -- I'm not sure.)

So the idea is to send appropriate announcements at the usual time for
CPUs that do come back up normally, and not to send anything right away
for CPUs that fail to come up. Just keep track of which ones failed,
and then take care of them later.

Alan Stern
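
[Illustration] The deferral suggested above could look roughly like this in
a user-space model. It is a sketch under stated assumptions, not proposed
kernel code: every name here (demo_failed_cpus, demo_note_cpu_up_failed(),
demo_flush_deferred()) is hypothetical, and a plain bitmask stands in for
whatever bookkeeping the PM core would really use.

```c
#include <assert.h>

#define NR_DEMO_CPUS 8

/* Bit n set: CPU n failed to come up during resume (hypothetical). */
unsigned int demo_failed_cpus;
/* Bit n set: the device object for CPU n has been destroyed. */
unsigned int demo_destroyed_cpus;

/* Called at the point where the failure is detected; nothing is sent yet. */
void demo_note_cpu_up_failed(unsigned int cpu)
{
	assert(cpu < NR_DEMO_CPUS);
	demo_failed_cpus |= 1u << cpu;
}

/*
 * Called once all devices have been resumed and no device locks are held.
 * Handles every recorded failure and returns how many were processed.
 */
unsigned int demo_flush_deferred(void)
{
	unsigned int cpu, handled = 0;

	for (cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		if (demo_failed_cpus & (1u << cpu)) {
			/* Stand-in for the delayed CPU_UP_CANCELED_FROZEN work. */
			demo_destroyed_cpus |= 1u << cpu;
			handled++;
		}
	}
	demo_failed_cpus = 0;
	return handled;
}
```

The point of the split is that recording a failure touches no device object,
so it is safe while the locks are held; the destructive part runs only from
the flush step.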

--

To: Alan Stern <stern@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Wednesday, December 26, 2007 - 11:12 am

However, we don't want to execute .resume() for device objects that correspond
to the "dead" CPUs, so at a minimum we should remove them from the dpm_off
list on CPU_UP_CANCELED_FROZEN. For this purpose, we can define a
callback that will remove the device from dpm_off immediately and schedule its
destruction after all devices have been resumed.

Rafael
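
[Illustration] A rough user-space model of the two-step handling described
above: drop the device from the resume list right away, destroy it only
after the resume pass. This is a sketch, not the actual PM core; flag arrays
stand in for the dpm_off list, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_DEMO_CPUS 4

bool demo_exists[NR_DEMO_CPUS];			/* device object is allocated */
bool demo_on_dpm_off[NR_DEMO_CPUS];		/* device is queued for .resume() */
bool demo_destroy_pending[NR_DEMO_CPUS];	/* destruction was scheduled */
unsigned int demo_resume_calls[NR_DEMO_CPUS];

/* Model "all per-CPU devices were suspended and moved to dpm_off". */
void demo_suspend_all(void)
{
	for (unsigned int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		demo_exists[cpu] = true;
		demo_on_dpm_off[cpu] = true;
		demo_destroy_pending[cpu] = false;
		demo_resume_calls[cpu] = 0;
	}
}

/* Model of the proposed callback for CPU_UP_CANCELED_FROZEN. */
void demo_cpu_up_canceled_frozen(unsigned int cpu)
{
	assert(cpu < NR_DEMO_CPUS);
	demo_on_dpm_off[cpu] = false;		/* never call .resume() for it */
	demo_destroy_pending[cpu] = true;	/* destroy later, not now */
}

/* Resume everything still queued, then carry out scheduled destructions. */
void demo_resume_all(void)
{
	for (unsigned int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		if (demo_on_dpm_off[cpu]) {
			demo_resume_calls[cpu]++;
			demo_on_dpm_off[cpu] = false;
		}
	}
	for (unsigned int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		if (demo_destroy_pending[cpu]) {
			demo_exists[cpu] = false;
			demo_destroy_pending[cpu] = false;
		}
	}
}
```

In this model the device for a dead CPU still exists while the other devices
are being resumed, but its resume counter stays at zero.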
--

To: Alan Stern <stern@...>
Cc: Rafael J. Wysocki <rjw@...>, pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 8:33 am

That should not happen... but it does in some error cases.... so
handling it would be a bonus.

Waking up with one cpu out of 8 is bad, but still way better than not
waking up at all ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--

To: pm list <linux-pm@...>
Cc: ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Sunday, December 23, 2007 - 8:56 pm

From: Rafael J. Wysocki <rjw@sisk.pl>

The MSR driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/x86/kernel/msr.c | 3 ---
1 file changed, 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/msr.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/msr.c
+++ linux-2.6/arch/x86/kernel/msr.c
@@ -155,13 +155,10 @@ static int __cpuinit msr_class_cpu_callb

 	switch (action) {
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
 		err = msr_device_create(cpu);
 		break;
 	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		msr_device_destroy(cpu);
 		break;
 	}

--

To: Rafael J. Wysocki <rjw@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 8:33 am

To: pm list <linux-pm@...>
Cc: ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Sunday, December 23, 2007 - 8:57 pm

From: Rafael J. Wysocki <rjw@sisk.pl>

The cpuid driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/x86/kernel/cpuid.c | 3 ---
1 file changed, 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/cpuid.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpuid.c
+++ linux-2.6/arch/x86/kernel/cpuid.c
@@ -157,13 +157,10 @@ static int __cpuinit cpuid_class_cpu_cal

 	switch (action) {
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
 		err = cpuid_device_create(cpu);
 		break;
 	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		cpuid_device_destroy(cpu);
 		break;
 	}
--

To: Rafael J. Wysocki <rjw@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 8:34 am

To: pm list <linux-pm@...>
Cc: ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Pavel Machek <pavel@...>, Ingo Molnar <mingo@...>
Date: Sunday, December 23, 2007 - 8:57 pm

From: Rafael J. Wysocki <rjw@sisk.pl>

The x86-64 MCE driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 2 --
1 file changed, 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -862,11 +862,9 @@ mce_cpu_callback(struct notifier_block *

 	switch (action) {
 	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
 		mce_create_device(cpu);
 		break;
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		mce_remove_device(cpu);
 		break;
 	}

--

To: Rafael J. Wysocki <rjw@...>
Cc: pm list <linux-pm@...>, ACPI Devel Maling List <linux-acpi@...>, Alan Stern <stern@...>, Andrew Morton <akpm@...>, Len Brown <lenb@...>, LKML <linux-kernel@...>, Ingo Molnar <mingo@...>
Date: Tuesday, December 25, 2007 - 8:34 am
