I think this could be what was happening: between the two seconds, CPU 0
becomes idle and it pulls the nice 19 task over via pull_task(), which
calls set_task_cpu(), which changes the task's vruntime to the current
min_vruntime of CPU 0 (in my patch). Then, after set_task_cpu(), CPU 0
calls activate_task(), which calls enqueue_task() and in turn
update_curr(). Now, nr_running on CPU 0 is 0, so sync_vruntime() gets
called and CPU 0's min_vruntime gets synced to the system max. Thus, the
nice 19 task now has a vruntime less than CPU 0's min_vruntime. I think
this can be fixed by adding the following code in set_task_cpu() before we
adjust p->vruntime:

	if (!new_rq->cfs.nr_running)
		sync_vruntime(new_rq);

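To make the ordering problem concrete, here is a rough user-space toy model
of the sequence above. It is only a sketch, not kernel code: the toy_*
names, the struct layouts, and the numbers (100, 1000, 5000) are all made
up for illustration, and toy_sync_vruntime() just assumes that syncing
means raising the queue's min_vruntime to the system max.

```c
/* Toy model only: every name and value here is hypothetical. */
struct toy_rq {
	unsigned long nr_running;
	unsigned long long min_vruntime;
};

struct toy_task {
	unsigned long long vruntime;
};

/* Stand-in for the largest min_vruntime across all CPUs. */
static unsigned long long toy_system_max;

/* Assumed behavior of sync_vruntime(): raise min_vruntime to the system max. */
static void toy_sync_vruntime(struct toy_rq *rq)
{
	if (rq->min_vruntime < toy_system_max)
		rq->min_vruntime = toy_system_max;
}

/* Migration step; apply_fix enables the proposed check before the adjust. */
static void toy_set_task_cpu(struct toy_task *p, struct toy_rq *new_rq,
			     int apply_fix)
{
	if (apply_fix && !new_rq->nr_running)
		toy_sync_vruntime(new_rq);
	p->vruntime = new_rq->min_vruntime;
}

/* Enqueue step; models update_curr() syncing when the queue is empty. */
static void toy_enqueue_task(struct toy_rq *rq)
{
	if (!rq->nr_running)
		toy_sync_vruntime(rq);
	rq->nr_running++;
}

/* Returns how far the pulled task ends up below CPU 0's min_vruntime. */
static long long toy_pull(int apply_fix)
{
	struct toy_rq cpu0 = { 0, 100 };	/* idle CPU, stale min_vruntime */
	struct toy_task nice19 = { 5000 };	/* vruntime on the old CPU */

	toy_system_max = 1000;
	toy_set_task_cpu(&nice19, &cpu0, apply_fix);
	toy_enqueue_task(&cpu0);
	return (long long)(cpu0.min_vruntime - nice19.vruntime);
}
```

In this model, toy_pull(0) leaves the task 900 units below CPU 0's
min_vruntime, while toy_pull(1) leaves it at exactly min_vruntime, which is
the effect the check in set_task_cpu() is meant to have.
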
I think this rq->lock check works because it prevents the above scenario
(CPU 0 is in pull_task so it must hold the rq lock). But my concern is
that it may be too conservative, since sync_vruntime is called by
update_curr, which mostly gets called in enqueue_task() and
dequeue_task(), both of which are often invoked with the rq lock being
held. Thus, if we don't allow sync_vruntime when rq lock is held, the sync
will occur much less frequently and thus weaken cross-CPU fairness.
tong