Re: [RFC PATCH v2] nohz/sched: disable ilb on !mc_capable()

From: Dominik Brodowski
Date: Monday, April 26, 2010 - 1:31 pm

From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Thu, 8 Apr 2010 21:51:18 +0200
Subject: [PATCH] nohz/sched: disable ilb on !mc_capable()

On my dual-core, !mc_capable() CPU, the idle load balancer (ilb) is one
of the main reasons ticks are not stopped: Under moderate load (~98 % idle),
up to half of the calls to tick_nohz_stop_sched_tick() are aborted due
to calls to select_nohz_load_balancer(1).

I suspect this is caused by the following phenomenon:

    CPU0				CPU1
    <active>				<active>
    tick_nohz_stop_sched_tick(1)
    select_nohz_load_balancer(1)
     => CPU0 becomes ilb owner,		<CPU1 becomes idle a bit later>
        tick is not stopped,		tick_nohz_stop_sched_tick(1)
        CPU0 goes to sleep for		 => CPU1 isn't the ilb owner,
        exactly 1 tick.			    tick is stopped.
    <short sleep>			<long sleep>
    ---> scheduler_tick()
    tick_nohz_stop_sched_tick(0)
    tick_nohz_stop_sched_tick(1)
     => is ilb owner, all CPUs are
        idle, CPU0 may go to sleep.

If all CPU cores have hardly anything to do, letting the active CPU do
idle load balancing allows us to enter deep sleep states earlier, and for
longer periods of time. Furthermore, on !mc_capable() systems, it seems that
the ilb algorithm isn't needed at all. Let's show this for a 2-core system:

- if both cores are active, ilb is deactivated
- if no core is active, ilb is deactivated
- if only one core is active, it attempts to balance its load off to other
  CPUs on each tick anyway. ilb wouldn't act quicker.
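
The three cases can be written down as a tiny truth table. This is a
userspace sketch for illustration only; enum toy_ilb_verdict and
toy_ilb_verdict_2core() are hypothetical names and do not exist in the
kernel.

```c
#include <stdbool.h>

/* Hypothetical labels for the three cases listed above. */
enum toy_ilb_verdict {
	TOY_ILB_OFF_ALL_BUSY,	/* both cores active: ilb is deactivated */
	TOY_ILB_OFF_ALL_IDLE,	/* no core active: ilb is deactivated */
	TOY_ILB_REDUNDANT,	/* one core active: it balances on its own tick */
};

/* Classify a 2-core system by which cores are currently busy. */
static enum toy_ilb_verdict toy_ilb_verdict_2core(bool cpu0_busy, bool cpu1_busy)
{
	if (cpu0_busy && cpu1_busy)
		return TOY_ILB_OFF_ALL_BUSY;
	if (!cpu0_busy && !cpu1_busy)
		return TOY_ILB_OFF_ALL_IDLE;
	return TOY_ILB_REDUNDANT;
}
```

None of the four input combinations yields a case in which the ilb would do
useful work, which is the argument made above for the 2-core system.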

This patch decreases the number of wakeups on my completely idle notebook by
about two thirds.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5a5ea2c..8ad8a03 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3290,6 +3290,9 @@ int select_nohz_load_balancer(int stop_tick)
 	if (stop_tick) {
 		cpu_rq(cpu)->in_nohz_recently = 1;
 
+		if (!mc_capable())
+			return 0;
+
 		if ...
From: Peter Zijlstra
Date: Tuesday, May 4, 2010 - 6:14 am

Right, so I think the !mc_capable() check is buggy, at the very least on
sparc64 which is 'creative' with its sched_domain maps.

I'm also not sure what a single socket AMD Magny-Cours will do.

On a single socket Nehalem we will have a non trivial sched_domain
because we also have the threads included.

I think we can only do your optimization for machines that end up having

--

From: Dominik Brodowski
Date: Tuesday, May 4, 2010 - 1:14 pm

Is there an easy way to determine there's just a single sched_domain?

Best,
	Dominik
--

From: Suresh Siddha
Date: Wednesday, May 5, 2010 - 4:03 pm

Dominik, we have posted some patches in the past to solve this issue.

http://lkml.org/lkml/2009/12/10/470

I will be re-posting the cleaned up patches shortly to address this.

thanks,
suresh

--
