Re: [BUG] CFS vs cpu hotplug

From: Miao Xie
Date: Monday, July 7, 2008 - 3:26 am

on 3:59 Lai Jiangshan wrote:
[snip]

I tested it with Dmitry's patch and found that all the tasks on the offline
cpu were migrated to an online cpu by migrate_live_tasks() in migration_call().
But some tasks (such as klogd) were moved back to the offline cpu immediately,
before the BUG_ON(rq->nr_running != 0) check and even before rq's lock was
acquired.

	static int __cpuinit
	migration_call(struct notifier_block *nfb, unsigned long action, void *
	{
		...
		switch (action) {
		...
		case CPU_DEAD:
		case CPU_DEAD_FROZEN:
			cpuset_lock();
			migrate_live_tasks(cpu);
			rq = cpu_rq(cpu);
			...
			spin_lock_irq(&rq->lock);
			...
			migrate_dead_tasks(cpu);
			spin_unlock_irq(&rq->lock);
			cpuset_unlock();
			migrate_nr_uninterruptible(rq);
			BUG_ON(rq->nr_running != 0);
			...
			break;
		}
		...
	}

By debugging, I found that this bug was caused by select_task_rq_fair().
After the tasks on the offline cpu are migrated to an online cpu, the kernel
soon wakes these migrated tasks up via try_to_wake_up(). try_to_wake_up()
invokes select_task_rq_fair() to find a less loaded cpu in the sched domains
for them. But the sched domains have not been updated yet, so the offline cpu
is still in them. select_task_rq_fair() may therefore return the offline cpu's
id, and then the bug occurs.
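
The sequence above can be reproduced in a small user-space model. This is only
a sketch under stated assumptions, not kernel code: struct model,
model_select_cpu() and model_cpu_dead() are hypothetical names, and the
"pick the least loaded cpu in the domain span" rule is a simplification of what
select_task_rq_fair() does. The point it illustrates is that a selector which
consults only the (stale) domain span can hand back a cpu that is already
offline, and that the offline cpu looks attractive because its load is zero.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical user-space stand-ins for kernel state (not kernel APIs). */
struct model {
	unsigned int online_mask;	/* bit n set: cpu n is online */
	unsigned int domain_span;	/* bit n set: cpu n is in the sched domain */
	unsigned long load[NR_CPUS];	/* per-cpu load */
};

static bool model_cpu_online(const struct model *m, int cpu)
{
	return (m->online_mask >> cpu) & 1u;
}

/*
 * Pick the least loaded cpu in the domain span, starting from orig_cpu.
 * Note that it never looks at online_mask.
 */
static int model_select_cpu(const struct model *m, int orig_cpu)
{
	int cpu, best = orig_cpu;

	assert(orig_cpu >= 0 && orig_cpu < NR_CPUS);
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!((m->domain_span >> cpu) & 1u))
			continue;
		if (m->load[cpu] < m->load[best])
			best = cpu;
	}
	return best;
}

/*
 * Take a cpu offline and move its load to dest, but leave domain_span
 * untouched, which models the window in which the sched domains are stale.
 */
static void model_cpu_dead(struct model *m, int cpu, int dest)
{
	m->online_mask &= ~(1u << cpu);
	m->load[dest] += m->load[cpu];
	m->load[cpu] = 0;	/* the dead cpu now looks like the idlest one */
}
```

With four cpus in the span and cpu 1 taken offline, a wakeup whose original cpu
is 0 is steered back to cpu 1 in this model, even though cpu 1 is no longer
online.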

I fixed the bug simply by checking select_task_rq_fair()'s return value in
try_to_wake_up(): if the returned cpu is offline, the task stays on its
original cpu.
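
The shape of that check can be sketched outside the kernel as follows. This is
an illustrative model only: cpu_online_model[], select_fn, pick_wake_cpu() and
always_cpu1() are hypothetical names introduced here, and the selector is
passed in as a callback merely to stand in for the sched_class hook. The
fallback presumes that the original cpu is itself online, which holds in the
scenario described here because the task has just been migrated to an online
cpu.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's online map (not a kernel API). */
static bool cpu_online_model[NR_CPUS] = { true, true, true, true };

/* Stand-in for the per-class "select a runqueue" hook. */
typedef int (*select_fn)(int orig_cpu);

/*
 * Ask the selector for a target cpu, but fall back to the original cpu
 * when the selector returns one that is offline.
 */
static int pick_wake_cpu(select_fn select, int orig_cpu)
{
	int cpu;

	assert(orig_cpu >= 0 && orig_cpu < NR_CPUS);
	cpu = select(orig_cpu);
	if (!cpu_online_model[cpu])
		cpu = orig_cpu;
	return cpu;
}

/* A selector working from stale data: it always proposes cpu 1. */
static int always_cpu1(int orig_cpu)
{
	(void)orig_cpu;
	return 1;
}
```

While cpu 1 is marked offline, pick_wake_cpu(always_cpu1, 2) keeps the task on
cpu 2. Once cpu 1 is online again, the selector's choice is used unchanged.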

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

---
 kernel/sched.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 94ead43..15b5ddf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2103,6 +2103,9 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 		goto out_activate;
 
 	cpu = p->sched_class->select_task_rq(p, sync);
+	if (unlikely(cpu_is_offline(cpu)))
+		cpu = orig_cpu;
+
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
-- 
1.5.4.rc3


