Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

To: Ingo Molnar <mingo@...>
Cc: Jiri Slaby <jirislaby@...>, Andrew Morton <akpm@...>, <linux-kernel@...>, Rafael J. Wysocki <rjw@...>, Arjan van de Ven <arjan@...>, Thomas Gleixner <tglx@...>, Linux-pm mailing list <linux-pm@...>, Dipankar Sarma <dipankar@...>
Date: Monday, December 10, 2007 - 4:19 am

On Sun, Dec 09, 2007 at 08:46:47AM +0100, Ingo Molnar wrote:
Hi Ingo, 

From the code I fail to see how get_online_cpus() can help us. 

+		/*
+		 * Only do the hung-tasks check on one CPU:
+		 */
+		get_online_cpus();
+		check_cpu = any_online_cpu(cpu_online_map);
+		put_online_cpus();

check_cpu can go offline here, no?

+
+		if (this_cpu != check_cpu)
+			continue;
+
+		if (sysctl_hung_task_timeout_secs)
+			check_hung_uninterruptible_tasks(this_cpu);

Furthermore, calling get_online_cpus() here can cause a deadlock,
because the call is made from the watchdog thread's context, and that
thread is going to be kthread_stop'ed from a cpu-hotplug context.
I think this is what was happening in the case reported by Jiri.
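The same dependency cycle can be sketched in user space with pthreads. This is only an analogy under stated assumptions: hotplug_lock stands in for cpu_hotplug.lock, pthread_join() stands in for kthread_stop(), and the names watchdog() and simulate_cpu_offline() are hypothetical. The watchdog uses a one-second timed lock so that the demo reports the cycle instead of hanging; the kernel path has no such timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for cpu_hotplug.lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int watchdog_blocked;

/* Stand-in for the watchdog thread calling get_online_cpus(). */
static void *watchdog(void *unused)
{
	struct timespec deadline;

	(void)unused;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;

	/* The real call would block here without any timeout. */
	if (pthread_mutex_timedlock(&hotplug_lock, &deadline) == ETIMEDOUT) {
		watchdog_blocked = 1;
		return NULL;
	}
	pthread_mutex_unlock(&hotplug_lock);
	return NULL;
}

/* Returns 1 if the watchdog was stuck behind the hotplug lock. */
int simulate_cpu_offline(void)
{
	pthread_t tid;

	watchdog_blocked = 0;

	/* The hotplug operation begins and holds the lock throughout. */
	pthread_mutex_lock(&hotplug_lock);
	if (pthread_create(&tid, NULL, watchdog, NULL) != 0) {
		pthread_mutex_unlock(&hotplug_lock);
		return -1;
	}

	/* Like kthread_stop(): wait for the thread while holding the lock. */
	pthread_join(tid, NULL);

	pthread_mutex_unlock(&hotplug_lock);
	return watchdog_blocked;
}
```

Calling simulate_cpu_offline() from a small main() takes about a second and returns 1: the joiner holds the lock the thread needs, and only the artificial timeout breaks the cycle.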

Please find the patch below.

Thanks and Regards
gautham.

commit 15bfb662b35c609490185fba2fd4713d230b9374
Author: Gautham R Shenoy <ego@in.ibm.com>
Date:   Mon Dec 10 13:41:45 2007 +0530

softlockup: remove get_online_cpus() which doesn't help here.

The get_online_cpus() protection in kernel/softlockup.c is bogus,
because the cpu cached in check_cpu can go offline as soon as
we do a put_online_cpus().

It can also cause a deadlock during a cpu offline, as follows:

WATCHDOG_THREAD:		OFFLINE_CPU:
				mutex_lock(&cpu_hotplug.lock);
				/* All subsequent get_online_cpus
				 * will be blocked till we're
				 * done with this cpu-hotplug
				 * operation.
				 */

get_online_cpus();
/* watchdog is blocked
   Thus we cannot
   go further until
   the cpu-hotplug
   operation completes
 */
 				CPU_DEAD:
    				kthread_stop(watchdog_thread);

    				/* we're trying to stop a
    				 * thread which is blocked
    				 * waiting for us to finish.
    				 *
    				 * Since we cannot finish until
    				 * the thread stops, we deadlock here!
    				 */
    
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jiri Slaby <jirislaby@gmail.com>

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e50b44a..576eb9c 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -219,9 +219,7 @@ static int watchdog(void *__bind_cpu)
 		/*
 		 * Only do the hung-tasks check on one CPU:
 		 */
-		get_online_cpus();
 		check_cpu = any_online_cpu(cpu_online_map);
-		put_online_cpus();
 
 		if (this_cpu != check_cpu)
 			continue;

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"