Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain management (take 2)

To: Gregory Haskins <ghaskins@...>
Cc: Peter Zijlstra <a.p.zijlstra@...>, <mingo@...>, <dmitry.adamushko@...>, <torvalds@...>, <pj@...>, <linux-kernel@...>
Date: Tuesday, July 22, 2008 - 1:10 am

Gregory Haskins wrote:

Sorry for the delay. I finally had a chance to read through this thread again
and to look at the rq->rd->online logic.

One thing that came up during the original discussion with Linus and Dmitry is
that cpuset can call partition_sched_domains() at random (whenever a user writes
into /dev/cpuset/...), but the scheduler used to rely on a certain sequence of
domain creation/deletion during cpu hotplug. That's exactly what caused the
problem in the first place: my patch that fixed domain destruction by the
hotplug events changed the sequence the scheduler relied on, and things broke.

Greg, correct me if I'm wrong, but we seem to have the exact same issue with
the rq->rd->online map. Let's take "cpu going down" for example. We're clearing
the rq->rd->online bit on the DYING event, but nothing AFAICS prevents another
cpu from calling rebuild_sched_domains()->partition_sched_domains() in the
middle of the hotplug sequence.
partition_sched_domains() will happily reset the rq->rd->online mask and things
will fail. I'm talking about this path:

__build_sched_domains() -> cpu_attach_domain() -> rq_attach_root()
	...
	cpu_set(rq->cpu, rd->span);
	if (cpu_isset(rq->cpu, cpu_online_map))
		set_rq_online(rq);
	...

--

Btw, why didn't we convert sched*.c to use rq->rd->online when it was
introduced, i.e. instead of using cpu_online_map directly?

Max
--

Messages in current thread:
Re: [PATCH] cpu hotplug, sched:Introduce cpu_active_map and ..., Max Krasnyansky, (Tue Jul 22, 1:10 am)