Re: current linux-2.6.git: cpusets completely broken

From: Dmitry Adamushko
Date: Monday, July 14, 2008 - 3:38 pm

On Sat, 12 Jul 2008, Linus Torvalds wrote:

(please correct me if I misinterpreted your point)

cpu_clear(cpu, cpu_active_map) _alone_ does not guarantee that, once
it completes, no new tasks can appear on (be migrated to) 'cpu'.

cpu_clear() may race against migration operations that are already in
progress on other CPUs: ones that have already passed the
!cpu_active(cpu) check but have not yet done the actual migration [*]
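
To make the window concrete, here is a small userspace sketch that replays the interleaving by hand in a single thread. It is only a model: cpu_active_sim, nr_tasks_sim, migration_check(), migration_commit(), clear_active() and replay_race() are hypothetical stand-ins I made up for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define NR_CPUS 4

static bool cpu_active_sim[NR_CPUS] = { true, true, true, true };
static int  nr_tasks_sim[NR_CPUS];

/* Step 1 of a migration: the check against the active map. */
static bool migration_check(int dst_cpu)
{
    return cpu_active_sim[dst_cpu];
}

/* Step 2 of a migration: actually place the task on dst_cpu. */
static void migration_commit(int dst_cpu)
{
    nr_tasks_sim[dst_cpu]++;
}

/* Stand-in for cpu_clear(cpu, cpu_active_map) in cpu_down(). */
static void clear_active(int cpu)
{
    cpu_active_sim[cpu] = false;
}

/*
 * Replays the problematic ordering by hand:
 *   CPU A: the check passes
 *   CPU B: clears the active bit
 *   CPU A: commits the migration, relying on the stale check
 * Returns how many tasks landed on 'cpu' after it was marked inactive.
 */
static int replay_race(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    bool ok = migration_check(cpu);      /* CPU A */
    clear_active(cpu);                   /* CPU B */
    int before = nr_tasks_sim[cpu];
    if (ok)
        migration_commit(cpu);           /* CPU A */
    return nr_tasks_sim[cpu] - before;
}
```

In this model, replay_race() reports one task arriving on a cpu whose active bit is already clear, which is exactly the case [*] is about.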

Am I missing something?

[ If not, then what I dare to say below is: (a) with only
cpu_clear(cpu, cpu_active_map) in cpu_down(), "cpu_active_map" is
perhaps not much better than (alternatively) using the existing
"cpu_online_map" to check whether a task can be migrated to 'cpu',
_and_ (b) a few (rough) speculations on how to fix [*] ]

New tasks may appear on (soon-to-be-dead) 'cpu' at any point until
_cpu_down() calls

__stop_machine_run() -> [ next is called by 'kstopmachine' ] do_stop()
-> stop_machine()

stop_machine() starts an RT high-prio thread on each online cpu and
waits until these threads get scheduled in (take control of the cpus).
That guarantees that a re-schedule has taken place on each CPU.
In turn, it means that none of the CPUs is in the middle of a
task-migration operation [**] and that further task-migration
operations cannot race against cpu_down() -> cpu_clear() (in a sense,
stop_machine() is a synchronization point).

[**] migration operations are done with rq->lock being held.

OTOH, cpu_clear(cpu, cpu_online_map) takes place right after
stop_machine() : do_stop() -> take_cpu_down() (via smdata->fn()) ->
__cpu_disable().

Let's imagine we update all places in the scheduler where
task-migration may take place with a check for either
(a) !cpu_active(cpu) _or_ (b) cpu_offline(cpu) :

then in both cases new tasks may appear on a 'cpu' for which cpu_down()
is in progress, and in both cases only until __stop_machine_run() -> ...
-> stop_machine() gets called.

Hm?

In any case, the scheduler does not depend on sched-domains to do
migration, and migration to offline cpus is not possible (although
migration to soon-to-be-offline cpus still is), but OTOH we depend on
the internals of __stop_machine_run() [ it acts as a sync. point ].

To solve both, we might introduce a special synchronization point
right after cpu_clear(cpu, cpu_active_map) gets called in cpu_down().

[ simplest (probably stupid) approaches ]

(a)

per-cpu rw_lock, readers' part is taken by task-migration code,
writer's part is in cpu_down():

rw_write_lock(per_cpu(migration_lock, cpu));
cpu_clear(cpu, cpu_active_map);
rw_write_unlock(...);

(b)

add rq->migration counter (per-cpu)

inc(rq->migration);
if (cpu_active(dst_cpu))
        do_migration(dst_cpu);
dec(rq->migration);

cpu_active_sync(cpu)
{
        for_each_online_cpu(i)
                while (cpu_rq(i)->migration)
                        cpu_relax();
}

(c)

per-cpu "migration_counter" so per_cpu(migration_counter, dst_cpu)
gets +1 while a migration operation _to_ this cpu is in progress and
then

cpu_active_sync(to_be_offline_cpu)
{
   while (per_cpu(migration_counter, to_be_offline_cpu) != 0) { cpu_relax(); }
}
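
A rough userspace model of (c), with C11 atomics standing in for whatever the kernel would really use. The names (migration_counter, cpu_active_sim, nr_tasks_sim, sim_init(), try_migrate_to(), deactivate_and_sync()) are assumptions for the sketch, and sched_yield() is only a stand-in for cpu_relax(). Note that the ordering matters: the migrating side must bump the counter _before_ it checks the active bit, and the sketch relies on the default sequentially consistent atomics for that.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of option (c). */
#define NR_CPUS 4

static atomic_int  migration_counter[NR_CPUS]; /* in-flight migrations _to_ each cpu */
static atomic_bool cpu_active_sim[NR_CPUS];
static atomic_int  nr_tasks_sim[NR_CPUS];

static void sim_init(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        atomic_store(&migration_counter[i], 0);
        atomic_store(&cpu_active_sim[i], true);
        atomic_store(&nr_tasks_sim[i], 0);
    }
}

/* Migration path: announce the intent first, then check, then migrate. */
static bool try_migrate_to(int dst_cpu)
{
    bool moved = false;

    assert(dst_cpu >= 0 && dst_cpu < NR_CPUS);

    atomic_fetch_add(&migration_counter[dst_cpu], 1);
    if (atomic_load(&cpu_active_sim[dst_cpu])) {
        atomic_fetch_add(&nr_tasks_sim[dst_cpu], 1); /* "do_migration" */
        moved = true;
    }
    atomic_fetch_sub(&migration_counter[dst_cpu], 1);
    return moved;
}

/*
 * cpu_down() side: clear the bit, then wait until every migration that
 * might have seen the old value has finished.
 */
static void deactivate_and_sync(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    atomic_store(&cpu_active_sim[cpu], false);
    while (atomic_load(&migration_counter[cpu]) != 0)
        sched_yield(); /* stand-in for cpu_relax() */
}
```

With sequentially consistent operations, either the migrating side sees the cleared bit, or deactivate_and_sync() sees a non-zero counter and waits; with weaker ordering that guarantee would need explicit barriers.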


-- 
Best regards,
Dmitry Adamushko
