Re: [PATCH 3/4] scheduler: replace migration_thread with cpu_stop

Previous thread: [PATCH 1/4] cpu_stop: implement stop_cpu[s]() by Tejun Heo on Thursday, April 22, 2010 - 9:09 am. (10 messages)

Next thread: defconfig strangeness by Nicholas Mc Guire on Thursday, April 22, 2010 - 9:07 am. (3 messages)
From: Tejun Heo
Date: Thursday, April 22, 2010 - 9:09 am

Hello, all.

cpu_hog has been renamed to cpu_stop and moved into
kernel/stop_machine.c per Peter Zijlstra's suggestion.  This patchset
is feature-wise identical to the second take of cpuhog[L].  The only
changes are the rename, relocation and refresh against the current
sched/core.

The following API renames took place.

- hog_one_cpu()		-> stop_one_cpu()
- hog_one_cpu_nowait()	-> stop_one_cpu_nowait()
- hog_cpus()		-> stop_cpus()
- try_hog_cpus()	-> try_stop_cpus()
- *_hog() callbacks	-> *_cpu_stop()

Internal names have been renamed accordingly.  e.g. cpuhog thread
became cpu_stopper thread and so on.

This patchset contains the following four patches.

 0001-cpu_stop-implement-stop_cpu-s.patch
 0002-stop_machine-reimplement-using-cpu_stop.patch
 0003-scheduler-replace-migration_thread-with-cpu_stop.patch
 0004-scheduler-kill-paranoia-check-in-synchronize_sched_e.patch

The patches are against the current linux-2.6-tip/sched/core
(09a40af5240de02d848247ab82440ad75b31ab11) and are available in the
following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpu_stop

I retained the original acked/reviewed-by's as the changes are mostly
cosmetic.  If you disagree, please let me know.  I'll try to push this
through sched/core again once Peter acks.

diffstat follows.

 Documentation/RCU/torture.txt |   10 
 arch/s390/kernel/time.c       |    1 
 drivers/xen/manage.c          |   14 -
 include/linux/rcutiny.h       |    2 
 include/linux/rcutree.h       |    1 
 include/linux/stop_machine.h  |   59 ++--
 kernel/cpu.c                  |    8 
 kernel/module.c               |   14 -
 kernel/rcutorture.c           |    2 
 kernel/sched.c                |  271 +++------------------
 kernel/sched_fair.c           |   42 ++-
 kernel/stop_machine.c         |  525 ++++++++++++++++++++++++++++++++----------
 12 files changed, 514 insertions(+), 435 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/962635
--

From: Tejun Heo
Date: Thursday, April 22, 2010 - 9:09 am

Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched.  All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.

This patch kills migration_thread and replaces all three uses with
cpu_stop.  The three different roles of migration_thread() are
split into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.
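
The split into separate callbacks can be illustrated with a small
userspace model.  This is only a sketch, not kernel code: every model_*
name, the fixed-size queue and the counters are assumptions made for
illustration.  The one thing taken from cpu_stop is the callback shape,
a function that takes a void * argument and returns an int.

```c
#include <stddef.h>

/* Userspace model only.  All model_* names are hypothetical. */
typedef int (*model_stop_fn_t)(void *arg);

struct model_stop_work {
	model_stop_fn_t fn;
	void *arg;
};

#define MODEL_NR_CPUS	4
#define MODEL_QUEUE_LEN	8

/* One FIFO of pending works per modeled CPU. */
struct model_stopper {
	struct model_stop_work queue[MODEL_QUEUE_LEN];
	size_t nr_queued;
};

static struct model_stopper model_stoppers[MODEL_NR_CPUS];

/* Queue fn(arg) for the given CPU; returns 0 on success, -1 on failure. */
static int model_queue_work(unsigned int cpu, model_stop_fn_t fn, void *arg)
{
	struct model_stopper *st;

	if (cpu >= MODEL_NR_CPUS)
		return -1;
	st = &model_stoppers[cpu];
	if (st->nr_queued == MODEL_QUEUE_LEN)
		return -1;
	st->queue[st->nr_queued].fn = fn;
	st->queue[st->nr_queued].arg = arg;
	st->nr_queued++;
	return 0;
}

/* Run every queued work for one CPU in FIFO order; returns how many ran. */
static int model_run_stopper(unsigned int cpu)
{
	struct model_stopper *st;
	size_t i, n;

	if (cpu >= MODEL_NR_CPUS)
		return 0;
	st = &model_stoppers[cpu];
	n = st->nr_queued;
	for (i = 0; i < n; i++)
		st->queue[i].fn(st->queue[i].arg);
	st->nr_queued = 0;
	return (int)n;
}

/* The three roles become independent callbacks; here each bumps a counter. */
struct model_counters {
	int migrations;
	int balances;
	int expedited;
};

static int model_migration_stop(void *arg)
{
	((struct model_counters *)arg)->migrations++;
	return 0;
}

static int model_active_balance_stop(void *arg)
{
	((struct model_counters *)arg)->balances++;
	return 0;
}

static int model_sync_sched_stop(void *arg)
{
	((struct model_counters *)arg)->expedited++;
	return 0;
}
```

In this model the stopper never has to work out which job it was woken
for; each caller queues the callback it needs and the stopper just runs
whatever is queued.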

synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.

synchronize_sched_expedited() also implemented a check to detect cases
where not all the callbacks got executed on their assigned cpus and
fall back to synchronize_sched().  If called with cpu hotplug blocked,
cpu_stop already guarantees that and the condition cannot happen;
otherwise, stop_machine() would break.  However, this patch preserves
the paranoid check using a cpumask to record on which cpus the stopper
ran so that it can serve as a bisection point if something actually
goes wrong there.
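
The shape of that paranoid check can be sketched in userspace as
follows.  This is a rough model under stated assumptions, not the
patch's code: a 64-bit word stands in for the cpumask, C11 atomics stand
in for the kernel's cpumask operations, and all model_* names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only.  A 64-bit word stands in for a cpumask. */
struct model_expedited_state {
	atomic_ullong ran_mask;	/* bit N set once the callback ran on cpu N */
};

static void model_expedited_init(struct model_expedited_state *st)
{
	atomic_init(&st->ran_mask, 0ULL);
}

/* Called from the per-cpu callback: record that this cpu was visited. */
static void model_record_cpu(struct model_expedited_state *st,
			     unsigned int cpu)
{
	if (cpu < 64)
		atomic_fetch_or(&st->ran_mask, 1ULL << cpu);
}

/*
 * After all callbacks returned: if the recorded mask differs from the
 * set of cpus that should have run, the caller falls back to the slow
 * path (synchronize_sched() in the description above).
 */
static bool model_needs_fallback(struct model_expedited_state *st,
				 unsigned long long online_mask)
{
	return atomic_load(&st->ran_mask) != online_mask;
}
```

With four modeled cpus (mask 0xf), the fallback is requested for as long
as any of the four bits is still clear.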

Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.

This patch also renames cpu_stop threads from "stopper/%d" to
"migration/%d".  The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.

With this patch applied, stop_machine() and sched now share the same
resources.  stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar ...
--

From: Peter Zijlstra
Date: Monday, May 3, 2010 - 6:26 am

So who guarantees busiest->active_balance_work isn't already enqueued by
some other cpu's load-balancer run?


--

From: Tejun Heo
Date: Tuesday, May 4, 2010 - 12:17 am

Hello,


Hmmm... maybe I'm mistaken but isn't that guaranteed by
busiest->active_balance which is protected by the rq lock?
active_load_balance_cpu_stop is scheduled iff busiest->active_balance
was changed from zero and only active_load_balance_cpu_stop() can
clear it at the end of its execution at which point the
active_balance_work is safe to reuse.
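
The argument above can be modeled in userspace roughly as follows.  This
is a sketch under assumptions, not the scheduler code: a pthread mutex
stands in for the rq lock, struct model_rq holds only the flag, and the
model_* names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only.  The mutex stands in for the rq lock. */
struct model_rq {
	pthread_mutex_t lock;
	int active_balance;	/* nonzero while the single work item is in use */
};

/*
 * Load-balancer side: only the caller that flips active_balance from
 * zero may queue the work item.  Everyone else sees it set and backs off.
 */
static bool model_try_start_active_balance(struct model_rq *rq)
{
	bool queue_it = false;

	pthread_mutex_lock(&rq->lock);
	if (!rq->active_balance) {
		rq->active_balance = 1;
		queue_it = true;
	}
	pthread_mutex_unlock(&rq->lock);
	return queue_it;
}

/*
 * Stopper side: the callback clears the flag at the end of its run,
 * after which the work item is free to be queued again.
 */
static int model_active_balance_stop(void *arg)
{
	struct model_rq *rq = arg;

	pthread_mutex_lock(&rq->lock);
	/* the actual balancing would happen here */
	rq->active_balance = 0;
	pthread_mutex_unlock(&rq->lock);
	return 0;
}
```

In the model, a second model_try_start_active_balance() call returns
false until model_active_balance_stop() has run, so the single work item
is never queued twice.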

Thanks.

-- 
tejun
--

From: Peter Zijlstra
Date: Tuesday, May 4, 2010 - 5:45 am

Ah, indeed. It wasn't obvious from looking at the patch, but when
looking at the full code it's fairly easy to see.


--

From: Tejun Heo
Date: Tuesday, May 4, 2010 - 5:49 am

Hmmm... it's probably worthwhile to note tho.  I'll add a comment and
send out the updated patches soon.

Thanks.

-- 
tejun
--
