move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/sched.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
--- 34-rc1/kernel/sched.c~2_MTODC_TAKE_RQ_LOCK 2010-03-15 09:40:16.000000000 +0100
+++ 34-rc1/kernel/sched.c 2010-03-15 09:40:31.000000000 +0100
@@ -5509,29 +5509,29 @@ static int migration_thread(void *data)
}
#ifdef CONFIG_HOTPLUG_CPU
-
-static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
-{
- int ret;
-
- local_irq_disable();
- ret = __migrate_task(p, src_cpu, dest_cpu);
- local_irq_enable();
- return ret;
-}
-
/*
* Figure out where task on dead CPU should go, use force if necessary.
*/
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
- int dest_cpu;
-
+ struct rq *rq = cpu_rq(dead_cpu);
+ int needs_cpu, dest_cpu;
+ unsigned long flags;
again:
- dest_cpu = select_fallback_rq(dead_cpu, p);
+ local_irq_save(flags);
+
+ raw_spin_lock(&rq->lock);
+ needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != ...Hmm. looks like my gcc is more friendly... OK. certainly I'll send the updated patch, if this series passes your review otherwise. Oleg. --
Hrm, that and I'm apparently unable to read, it said dest_cpu, not Yeah, you made a few good points in 0/6, am now staring at the code on how to close those holes, hope to post something sensible soon. --
Yes, great. Speaking of 0/6, I forgot to ask a couple more question... try_to_wake_up() does task_rq_lock() which checks TASK_WAKING. Perhaps it shouldn't ? I mean, perhaps try_to_wake_up() can take rq->lock without checking task->state. It can never race with the owner of TASK_WAKING, before anything else we check "p->state & state". And. Without the change above, any owner of TASK_WAKING must disable preemption and clear irqs. What do you think? And a stupid question. While doing these changes I was really, really puzzled by task_rq_lock() which does local_irq_save(*flags); rq = task_rq(p); raw_spin_lock(&rq->lock); to the point, I even tried to read the comment which says: Note the ordering: we can safely lookup the task_rq without explicitly disabling preemption. Could you please explain what does this mean? IOW, why can't we do rq = task_rq(p); raw_spin_lock_irqsave(&rq->lock, flags); instead? Of course, this doesn't really matter, but I'd like to understand what I have missed here. Thanks, Oleg. --
You're right, but creating a special task_rq_lock() for ttwu() went a little far, but now that we can remove all that again, this too should I'm not sure why that is the case, v2.6.14:kernel/sched.c already has that. Ingo can you remember any reason for this or should we change the code like Oleg suggests? --
Commit-ID: 1445c08d06c5594895b4fae952ef8a457e89c390 Gitweb: http://git.kernel.org/tip/1445c08d06c5594895b4fae952ef8a457e89c390 Author: Oleg Nesterov <oleg@redhat.com> AuthorDate: Mon, 15 Mar 2010 10:10:10 +0100 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 2 Apr 2010 20:12:01 +0200 sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq() move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed lockless. We can race with set_cpus_allowed() running in parallel. Change it to take rq->lock around select_fallback_rq(). Note that it is not trivial to move this spin_lock() into select_fallback_rq(), we must recheck the task was not migrated after we take the lock and other callers do not need this lock. To avoid the races with other callers of select_fallback_rq() which rely on TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise. The owner of TASK_WAKING must update ->cpus_allowed and choose the correct CPU anyway, and the subsequent __migrate_task() is just meaningless because p->se.on_rq must be false. Alternatively, we could change select_task_rq() to take rq->lock right after it calls sched_class->select_task_rq(), but this looks a bit ugly. Also, change it to not assume irqs are disabled and absorb __migrate_task_irq(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091010.GA9131@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 30 +++++++++++++++--------------- 1 files changed, 15 insertions(+), 15 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index c0b3ebc..27774b5 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -5448,29 +5448,29 @@ static int migration_thread(void *data) } #ifdef CONFIG_HOTPLUG_CPU - -static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu) -{ - int ret; - - local_irq_disable(); - ret = __migrate_task(p, ...
