Re: [BUG] CFS vs cpu hotplug

Previous thread: [PATCH] eCryptfs: Do not try to open device files on mknod by Michael Halcrow on Wednesday, July 9, 2008 - 6:21 pm. (2 messages)

Next thread: Re: [Bug 11063] New: lack of GNU_STACK header doesn't result in rwx stack on i386 by Andrew Morton on Wednesday, July 9, 2008 - 6:28 pm. (1 message)
To: Ingo Molnar <mingo@...>
Cc: <miaox@...>, Lai Jiangshan <laijs@...>, Ingo Molnar <mingo@...>, Heiko Carstens <heiko.carstens@...>, Peter Zijlstra <a.p.zijlstra@...>, Avi Kivity <avi@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Wednesday, July 9, 2008 - 6:32 pm

hm, while looking at this code again...

Ingo,

I think we may have a race between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu(),
in which the latter may end up looping endlessly.

Subject: sched: prevent a potentially endless loop in move_task_off_dead_cpu()

Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is called, so we may get a race
between try_to_wake_up() and migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
a task off a dead CPU, causing the latter to loop endlessly.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

---
diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..9397b87 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5621,8 +5621,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)

 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
-	if (task_cpu(p) != src_cpu)
+	if (task_cpu(p) != src_cpu) {
+		ret = 1;
 		goto out;
+	}
 	/* Affinity changed (again). */
 	if (!cpu_isset(dest_cpu, p->cpus_allowed))
 		goto out;

---

--
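
The race described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: struct task, model_migrate_task() and model_move_task_off_dead_cpu() are hypothetical stand-ins, and it assumes that move_task_off_dead_cpu() keeps retrying until __migrate_task() returns non-zero. The max_tries bound exists only so that the model terminates; in the unpatched case it stands in for the endless loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct: tracks only the task's CPU. */
struct task {
	int cpu;
};

/*
 * Simplified model of __migrate_task(). Returns 1 when the caller may
 * stop retrying, 0 otherwise. "patched" selects the behaviour after the
 * fix (ret = 1 in the "already moved" branch).
 */
static int model_migrate_task(struct task *p, int src_cpu, int dest_cpu,
			      bool patched)
{
	/* Already moved, e.g. by a concurrent try_to_wake_up(). */
	if (p->cpu != src_cpu)
		return patched ? 1 : 0;

	p->cpu = dest_cpu;
	return 1;
}

/*
 * Model of the assumed retry loop in move_task_off_dead_cpu(). Returns
 * the number of attempts made, or -1 once max_tries is exhausted, which
 * stands in for "loops endlessly".
 */
static int model_move_task_off_dead_cpu(struct task *p, int dead_cpu,
					int dest_cpu, bool patched,
					int max_tries)
{
	assert(max_tries > 0);

	for (int tries = 1; tries <= max_tries; tries++) {
		if (model_migrate_task(p, dead_cpu, dest_cpu, patched))
			return tries;
	}
	return -1;
}
```

In this model, a task that the wakeup path has already moved from dead CPU 1 to CPU 2 never satisfies the unpatched check, so the loop runs until max_tries is exhausted. With the patched return value the loop exits on the first attempt and leaves the task where the wakeup path put it.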

To: Dmitry Adamushko <dmitry.adamushko@...>
Cc: Ingo Molnar <mingo@...>, <miaox@...>, Lai Jiangshan <laijs@...>, Peter Zijlstra <a.p.zijlstra@...>, Avi Kivity <avi@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Thursday, July 10, 2008 - 3:30 am

That's exactly what explains a dump I got yesterday. Thanks for fixing! :)

Will apply your patch and let you know if it fixes the problem.
--

To: Heiko Carstens <heiko.carstens@...>
Cc: Dmitry Adamushko <dmitry.adamushko@...>, <miaox@...>, Lai Jiangshan <laijs@...>, Peter Zijlstra <a.p.zijlstra@...>, Avi Kivity <avi@...>, <linux-kernel@...>, Andrew Morton <akpm@...>
Date: Thursday, July 10, 2008 - 3:39 am

Applied to tip/sched/urgent via the commit below. Let's see whether we
can still get it into v2.6.26.

Ingo

---------------->
commit dc7fab8b3bb388c57c6c4a43ba68c8a32ca25204
Author: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Date: Thu Jul 10 00:32:40 2008 +0200

sched: fix cpu hotplug

I think we may have a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu() when the latter one
may end up looping endlessly.

Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is
called so we may get a race between try_to_wake_up() and
migrate_live_tasks() -> move_task_off_dead_cpu(). The former one may push
a task out of a dead CPU causing the latter one to loop endlessly.

Heiko Carstens observed:

| That's exactly what explains a dump I got yesterday. Thanks for fixing! :)

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: miaox@cn.fujitsu.com
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..9397b87 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5621,8 +5621,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)

 	double_rq_lock(rq_src, rq_dest);
 	/* Already moved. */
-	if (task_cpu(p) != src_cpu)
+	if (task_cpu(p) != src_cpu) {
+		ret = 1;
 		goto out;
+	}
 	/* Affinity changed (again). */
 	if (!cpu_isset(dest_cpu, p->cpus_allowed))
 		goto out;
--
