Re: broken suspend (sched related) [Was: 2.6.24-rc4-mm1]

From: Gautham R Shenoy
Date: Monday, December 10, 2007 - 1:19 am

On Sun, Dec 09, 2007 at 08:46:47AM +0100, Ingo Molnar wrote:
Hi Ingo, 

From the code I fail to see how get_online_cpus() can help us. 

+		/*
+		 * Only do the hung-tasks check on one CPU:
+		 */
+		get_online_cpus();
+		check_cpu = any_online_cpu(cpu_online_map);
+		put_online_cpus();

check_cpu can go offline here, no?

+
+		if (this_cpu != check_cpu)
+			continue;
+
+		if (sysctl_hung_task_timeout_secs)
+			check_hung_uninterruptible_tasks(this_cpu);
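
To make the window after put_online_cpus() concrete, here is a minimal user-space sketch. It is only a model, not kernel code: struct hotplug_model and the model_*() helpers are hypothetical names invented for illustration, and a writer that would block in the kernel simply fails here so that the example stays deterministic.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these are kernel functions. */
struct hotplug_model {
	unsigned long online_mask;	/* bit n set => CPU n is online */
	int readers;			/* references taken via model_get() */
};

static void model_get(struct hotplug_model *m) { m->readers++; }
static void model_put(struct hotplug_model *m) { m->readers--; }

/* Lowest online CPU in the mask, or -1 if none is online. */
static int model_first_online(const struct hotplug_model *m)
{
	for (int cpu = 0; cpu < (int)(8 * sizeof(m->online_mask)); cpu++)
		if (m->online_mask & (1UL << cpu))
			return cpu;
	return -1;
}

/*
 * A writer may take a CPU down only while no reader holds a reference.
 * (Assumption of this model: the writer fails instead of blocking.)
 */
static bool model_offline(struct hotplug_model *m, int cpu)
{
	if (m->readers > 0)
		return false;
	m->online_mask &= ~(1UL << cpu);
	return true;
}

static bool model_is_online(const struct hotplug_model *m, int cpu)
{
	return (m->online_mask >> cpu) & 1UL;
}

/*
 * Mirrors the quoted hunk: sample under the reference, drop it, use later.
 * Returns true if the cached CPU was offline by the time it was used.
 */
bool cached_cpu_goes_stale(void)
{
	struct hotplug_model m = { .online_mask = 0x3, .readers = 0 };
	int check_cpu;

	model_get(&m);
	check_cpu = model_first_online(&m);
	model_put(&m);

	/* Window: nothing stops a writer from offlining check_cpu here. */
	model_offline(&m, check_cpu);

	return !model_is_online(&m, check_cpu);
}

/* While the reference is held, the writer in this model cannot proceed. */
bool reference_blocks_offline(void)
{
	struct hotplug_model m = { .online_mask = 0x3, .readers = 0 };
	int check_cpu;
	bool went_offline;

	model_get(&m);
	check_cpu = model_first_online(&m);
	went_offline = model_offline(&m, check_cpu);
	model_put(&m);

	return !went_offline;
}
```

In this model the reference only pins the CPU while it is held, so a value sampled inside the section and used outside it is no better protected than one sampled with no reference at all.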

Furthermore, this can cause a deadlock, since we're calling
get_online_cpus() from the watchdog thread's context, and that
thread is going to be kthread_stop'ed from a cpu-hotplug context.
This is what I think was happening in the case reported by Jiri.
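
The circular wait can be sketched as a tiny wait-for graph. This is an illustrative user-space model under stated assumptions, not kernel code: the task ids, has_wait_cycle() and offline_deadlocks() are hypothetical names, and the model assumes the interleaving where the hotplug path already holds the lock when the watchdog asks for a reference.

```c
#include <stdbool.h>

/* Hypothetical task ids for the two contexts involved. */
#define NR_TASKS 2
#define WATCHDOG 0
#define HOTPLUG  1
#define NOBODY   (-1)

/*
 * waits_on[t] is the task that t is blocked on, or NOBODY.
 * Follow the chain from 'start' and report whether it leads back to it.
 */
static bool has_wait_cycle(const int waits_on[NR_TASKS], int start)
{
	int t = waits_on[start];

	for (int steps = 0; steps < NR_TASKS && t != NOBODY; steps++) {
		if (t == start)
			return true;
		t = waits_on[t];
	}
	return false;
}

/*
 * Models a CPU offline in which the hotplug path holds the lock and
 * then stops the watchdog thread. Returns true if the two wait on
 * each other.
 */
bool offline_deadlocks(bool watchdog_takes_hotplug_ref)
{
	int waits_on[NR_TASKS] = { NOBODY, NOBODY };

	/* The watchdog blocks on the lock held by the hotplug path. */
	if (watchdog_takes_hotplug_ref)
		waits_on[WATCHDOG] = HOTPLUG;

	/* Stopping the thread means waiting for it to exit. */
	waits_on[HOTPLUG] = WATCHDOG;

	return has_wait_cycle(waits_on, HOTPLUG);
}
```

With the reference taken, each side waits on the other and the cycle closes; without it, the watchdog waits on nobody and the stop can complete.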

Please find the patch below.

Thanks and Regards
gautham.

commit 15bfb662b35c609490185fba2fd4713d230b9374
Author: Gautham R Shenoy <ego@in.ibm.com>
Date:   Mon Dec 10 13:41:45 2007 +0530

softlockup: remove get_online_cpus() which doesn't help here.

The get_online_cpus() protection in kernel/softlockup.c seems to be
bogus, as the cpu cached in check_cpu can go offline once we do a
put_online_cpus().

This can also cause a deadlock during a cpu offline, as follows:

WATCHDOG_THREAD:		OFFLINE_CPU:
				mutex_lock(&cpu_hotplug.lock);
				/* All subsequent get_online_cpus
				 * will be blocked till we're
				 * done with this cpu-hotplug
				 * operation.
				 */

get_online_cpus();
/* The watchdog is blocked
 * here and cannot go
 * further until the
 * cpu-hotplug operation
 * completes.
 */
				CPU_DEAD:
				kthread_stop(watchdog_thread);

				/* We're trying to stop a
				 * thread which is blocked
				 * waiting for us to finish.
				 *
				 * Since we cannot finish until
				 * the thread stops, we deadlock here!
				 */
    
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jiri Slaby <jirislaby@gmail.com>

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e50b44a..576eb9c 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -219,9 +219,7 @@ static int watchdog(void *__bind_cpu)
 		/*
 		 * Only do the hung-tasks check on one CPU:
 		 */
-		get_online_cpus();
 		check_cpu = any_online_cpu(cpu_online_map);
-		put_online_cpus();
 
 		if (this_cpu != check_cpu)
 			continue;

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"